Randomized Algorithms
This note is part of my learning notes on Data Structures & Algorithms.
Introduction
Randomized Algorithms are algorithms that solve a problem step by step by making random choices from a finite set of possible steps. We call an algorithm randomized if its behavior is determined not only by its input but also by values produced by a random number generator. Hence, a randomized algorithm may produce different outputs when applied more than once to the same input.
We use randomized algorithms because they can speed up the solution of a problem, often resulting in a shorter expected running time (reduced time complexity).
Randomized algorithms can be designed either as a Las Vegas algorithm or a Monte Carlo algorithm. A Monte Carlo algorithm gives a weaker guarantee (it may return a wrong answer), so it is often easier to design a Monte Carlo algorithm first and, when the output can be verified, convert it into a Las Vegas algorithm by repeating until a correct answer is found.
- Monte Carlo algorithm: may produce incorrect results with a certain (bounded) probability, but it stops after a fixed amount of time or after a certain number of iterations. A common example is the Karger-Stein minimum cut algorithm.
  - output is a random variable
  - running time is bounded by something deterministic
- Las Vegas algorithm: keeps running until it produces the correct result, but the running time may vary. A common example is the randomized quicksort algorithm.
  - output is deterministic (always correct)
  - running time is a random variable
Indicator Random Variable
An indicator random variable is a function that assigns a real value to an outcome in the sample space of a random experiment. It provides a convenient method for converting between probabilities and expectations. Suppose we are given a sample space $S$ and an event $A$. Then the indicator random variable $I\{A\}$ associated with event $A$ is defined as:
$$I\{A\} = \begin{cases} 1 & \text{if } A \text{ occurs} \\ 0 & \text{if } A \text{ does not occur} \end{cases}$$
A useful consequence is that $E[I\{A\}] = \Pr(A)$, which is exactly what lets us convert between probabilities and expectations.
High Probability Bounds
In randomized algorithms, high probability bounds refer to probabilistic guarantees on the performance or behavior of the algorithm. Specifically, these bounds indicate that the algorithm will achieve certain outcomes with high probability, meaning that the probability of failure is very low.
For example, if an algorithm has a high probability bound of $1 - \delta$, it means that the algorithm will produce the correct result or behave as expected with probability at least $1 - \delta$, where $\delta$ is a small constant typically representing the probability of failure (often $\delta = 1/n^c$ for some constant $c \ge 1$ and input size $n$).
Markov’s Inequality
Markov’s inequality provides a loose upper bound on the probability that a non-negative random variable is greater than or equal to a positive constant. It is useful for providing a quick and simple estimate of the upper bound on the probability that a random variable deviates from its expected value. This is particularly useful in algorithms where the exact distribution of the variable is unknown but its expectation is known.
Theorem: for any non-negative random variable $X$ and any positive constant $a$:
$$\Pr[X \ge a] \le \frac{E[X]}{a}$$
where:
- $\Pr[X \ge a]$ is the probability that the random variable $X$ is greater than or equal to $a$.
- $E[X]$ is the expected value (mean) of the random variable $X$.
- $a$ is the threshold ($X$ is at least $a$).
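As a quick sanity check, here is a minimal Python sketch (the two-dice example and the sample size are my own illustrative choices, not part of these notes) that compares the empirical tail probability of a non-negative random variable with the Markov bound:

```python
import random

# Empirically check Markov's inequality, Pr[X >= a] <= E[X]/a,
# for an illustrative non-negative random variable: the sum of two fair dice.
random.seed(0)
samples = [random.randint(1, 6) + random.randint(1, 6) for _ in range(100_000)]

mean = sum(samples) / len(samples)  # empirical E[X], close to 7
for a in (8, 10, 12):
    empirical = sum(x >= a for x in samples) / len(samples)
    bound = mean / a
    print(f"a={a}: empirical Pr[X >= a] = {empirical:.3f}, Markov bound = {bound:.3f}")
```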
Chebyshev’s Inequality
Chebyshev’s inequality is an extension of Markov’s Inequality. It provides an upper bound on the probability and helps quantify how much a random variable can deviate from its mean, which is critical in algorithms where high reliability or consistency is required despite the presence of randomness.
Theorem: for any random variable $X$ with finite mean $\mu$ and finite non-zero variance $\sigma^2$, and for any positive constant $k$:
$$\Pr[|X - \mu| \ge k\sigma] \le \frac{1}{k^2}$$
where:
- $\Pr[|X - \mu| \ge k\sigma]$ is the probability that the absolute deviation of the random variable $X$ from its mean is greater than or equal to $k$ times the standard deviation $\sigma$.
- $\mu$ is the mean (expected value) of the random variable $X$.
- $\sigma^2$ is the variance of the random variable $X$.
Chernoff Bounds
Chernoff Bounds are particularly useful in settings where an algorithm needs to ensure that the probability of an undesirable outcome (e.g., large deviation from the mean) is extremely low.
Given a sequence of independent random variables $X_1, X_2, \ldots, X_n$ with $X_i \in \{0, 1\}$ and their sum $X = \sum_{i=1}^{n} X_i$, the Chernoff bounds give upper bounds on the probability that $X$ deviates significantly from its expected value $\mu = E[X]$.
There are several forms of Chernoff bounds, but one common form is the following: for any $0 < \delta \le 1$,
$$\Pr[X \ge (1+\delta)\mu] \le e^{-\delta^2 \mu / 3} \qquad \text{and} \qquad \Pr[X \le (1-\delta)\mu] \le e^{-\delta^2 \mu / 2}$$
Comparison between Markov, Chebyshev, and Chernoff Bounds:
The bound given by Markov’s inequality is the “weakest” one: for a fixed relative deviation it is constant and does not change as $n$ increases. The bound given by Chebyshev’s inequality is stronger than the one given by Markov’s inequality; it decreases polynomially in $n$ (for example, like $O(1/n)$ for a sum of $n$ independent coin flips). The strongest bound is the Chernoff bound, which goes to zero exponentially fast in $n$.
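To make this concrete, here is a small Python sketch (the coin-flip setup and the chosen values of $n$ are my own illustration) that bounds $\Pr[X \ge 3n/4]$ for $X$ the number of heads in $n$ fair coin flips, using all three inequalities:

```python
import math

# Bound Pr[X >= 3n/4] where X = number of heads in n fair coin flips.
# E[X] = n/2 and Var(X) = n/4.
for n in (100, 1_000, 10_000):
    mean, var = n / 2, n / 4
    a = 3 * n / 4
    markov = mean / a                            # Pr[X >= a] <= E[X]/a  (constant 2/3)
    chebyshev = var / (a - mean) ** 2            # Pr[|X - mu| >= t] <= Var(X)/t^2  (= 4/n)
    delta = (a - mean) / mean                    # relative deviation, here 1/2
    chernoff = math.exp(-delta**2 * mean / 3)    # Pr[X >= (1+d)mu] <= exp(-d^2 mu / 3)
    print(f"n={n}: Markov={markov:.3f}, Chebyshev={chebyshev:.5f}, Chernoff={chernoff:.2e}")
```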
Binomial Distribution
The binomial distribution is the discrete probability distribution of the number of successes in a sequence of $n$ independent Bernoulli trials (each with two possible outcomes, and with the same probability of success $p$).
Probability mass function:
$$\Pr[X = k] = \binom{n}{k} p^k (1-p)^{n-k}$$
Cumulative distribution function:
$$\Pr[X \le k] = \sum_{i=0}^{k} \binom{n}{i} p^i (1-p)^{n-i}$$
Normal Distribution
Normal distribution is also known as the Gaussian distribution. It describes data that tends to cluster around a mean. Many random processes tend towards a normal distribution as the number of trials increases (Central Limit Theorem).
When $np$ and $n(1-p)$ are large enough, the binomial distribution can be approximated with a normal distribution with mean $\mu = np$ and variance $\sigma^2 = np(1-p)$, where:
- $n$: number of trials
- $p$: probability of success in each trial
The density of a normal distribution with mean $\mu$ and variance $\sigma^2$ is:
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
Poisson Distribution
Poisson distribution describes the probability of a given number of events happening in a fixed interval of time or space, under the assumption that these events happen with a known constant mean rate and independently of the time since the last event.
In the analysis of randomized algorithms, particularly those involving random sampling, the number of rare events that occur over a large number of independent trials can often be modeled using a Poisson distribution (especially in Monte Carlo methods).
If $n$ is huge and $p$ is very small, the binomial distribution can be approximated by a Poisson distribution with $\lambda = np$.
The probability mass function of a Poisson distribution with mean $\lambda$ is:
$$\Pr[X = k] = \frac{\lambda^k e^{-\lambda}}{k!}$$
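The following Python sketch illustrates both approximations (the specific values of $n$, $p$, and $k$ are arbitrary choices of mine for demonstration): the Poisson approximation when $n$ is huge and $p$ is tiny, and the normal approximation when $np$ and $n(1-p)$ are both large:

```python
import math

def binom_pmf(n, p, k):
    """Exact binomial probability Pr[X = k]."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(lam, k):
    """Poisson probability Pr[X = k] with mean lam."""
    return lam**k * math.exp(-lam) / math.factorial(k)

def normal_pdf(mu, var, x):
    """Normal density with mean mu and variance var."""
    return math.exp(-(x - mu)**2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Poisson approximation: n huge, p small, lambda = n*p.
n, p = 10_000, 0.0003
for k in (0, 3, 6):
    print(k, binom_pmf(n, p, k), poisson_pmf(n * p, k))

# Normal approximation: np and n(1-p) both large; mean np, variance np(1-p).
n, p = 1_000, 0.4
for k in (380, 400, 420):
    print(k, binom_pmf(n, p, k), normal_pdf(n * p, n * p * (1 - p), k))
```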
Example: Coupon Collector
Suppose there is a lucky draw in which you can get $n$ different types of coupons. Each time you collect a coupon, it is chosen uniformly at random from the $n$ available types. The goal is to determine the expected number of draws needed to get a full collection of all $n$ coupon types.
Algorithm:
- Initialize an empty set to keep track of the types of coupons collected.
- Repeat the following steps until all $n$ types of coupons are collected:
  a. Choose a coupon uniformly at random from the $n$ available types.
  b. If the chosen coupon is not already in the set, add it to the set.
- Count the total number of coupons drawn until all types are collected.
Analysis:
Let $X$ be the number of draws needed to get a full collection, and let $X_i$ be the number of draws needed to get the $i$-th distinct coupon type, given that you already have $i - 1$ distinct types. Then $X = \sum_{i=1}^{n} X_i$, and each $X_i$ is geometrically distributed with success probability $p_i = \frac{n - i + 1}{n}$, so $E[X_i] = \frac{n}{n - i + 1}$. By linearity of expectation:
$$E[X] = \sum_{i=1}^{n} \frac{n}{n - i + 1} = n \sum_{i=1}^{n} \frac{1}{i} = n H_n \approx n \ln n$$
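Here is a minimal simulation sketch in Python (the value $n = 50$ and the number of trials are my own choices) comparing the simulated average number of draws with the exact expectation $n H_n$:

```python
import random

def coupon_collector_draws(n: int) -> int:
    """Simulate drawing uniformly random coupons until all n types are collected."""
    collected = set()
    draws = 0
    while len(collected) < n:
        collected.add(random.randrange(n))
        draws += 1
    return draws

# Compare the simulated average with the exact expectation n * H_n.
n, trials = 50, 2_000
avg = sum(coupon_collector_draws(n) for _ in range(trials)) / trials
harmonic = sum(1 / i for i in range(1, n + 1))
print(f"simulated ~ {avg:.1f}, n * H_n = {n * harmonic:.1f}")
```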
Example: Randomized Quicksort
Quicksort is a popular sorting algorithm. It works by first choosing a pivot element and then partitioning the numbers around that pivot. A randomized quicksort chooses the pivot at random to decrease the chance of hitting the worst-case time complexity of $O(n^2)$.
The running time of a randomized quicksort is independent of the input order, and no assumptions need to be made about the input distribution. The worst case is determined only by the output of the random-number generator.
Algorithm:
- Randomly select a number $x$ to be the pivot
- Divide the remaining numbers into two groups:
  - those smaller than $x$
  - those larger than $x$
- Sort each group recursively, and concatenate them (see the sketch after this list).
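A minimal Python sketch of the algorithm above (handling duplicates by keeping elements equal to the pivot in their own group, which is one common implementation choice):

```python
import random

def randomized_quicksort(arr):
    """Return a sorted copy of arr using a uniformly random pivot at each step."""
    if len(arr) <= 1:
        return list(arr)
    pivot = random.choice(arr)                       # step 1: random pivot
    smaller = [x for x in arr if x < pivot]          # step 2: partition around the pivot
    equal   = [x for x in arr if x == pivot]
    larger  = [x for x in arr if x > pivot]
    # step 3: sort each group recursively and concatenate
    return randomized_quicksort(smaller) + equal + randomized_quicksort(larger)

print(randomized_quicksort([5, 3, 8, 1, 9, 2, 7]))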
Analysis:
We analyze the number of comparisons needed to fully sort all the numbers.
Let $X$ be the random variable for the number of comparisons made by randomized quicksort on an input of size $n$, where the random pivot choices are independent.
For $1 \le i < j \le n$, let $z_i$ denote the $i$-th smallest element and define the indicator random variable $X_{ij} = I\{z_i \text{ is compared with } z_j\}$, so that $X = \sum_{i < j} X_{ij}$.
Since every element of $\{z_i, \ldots, z_j\}$ is equally likely to be the first pivot chosen from that range (assuming the elements are distinct), and $z_i$ and $z_j$ are compared exactly when one of them is chosen first, we get:
$$\Pr[X_{ij} = 1] = \frac{2}{j - i + 1}, \qquad E[X] = \sum_{i < j} \frac{2}{j - i + 1} = O(n \log n)$$
In practice, randomized quicksort often outperforms other $O(n \log n)$ algorithms such as heapsort and merge sort thanks to its small constant factors and good cache behavior, even though its worst case remains $O(n^2)$ (which now occurs only with very small probability).
Example: Karger-Stein’s Algorithm
Karger’s contraction algorithm (later refined by Karger and Stein) is used to find minimum cuts in graphs by iteratively contracting randomly chosen edges until only two vertices remain.
Given a connected graph $G$ which is undirected and unweighted, with $n$ vertices and $m$ edges, we want to compute the smallest set of edges whose removal makes $G$ disconnected.
A cut in a graph is a partition of the vertices into two non-empty parts. An edge crosses the cut if it goes from one side of the cut to the other. A minimum cut is a cut that has the fewest possible edges crossing it.
Algorithm:
- Pick a random edge
- Contract the chosen edge (merge the two endpoints together)
- Repeat steps 1 and 2, until there are only two vertices left
- Return the cut that corresponds to those two vertices
Theorem: the probability that a single run of the contraction algorithm returns a minimum cut in a graph with $n$ vertices is at least $\frac{2}{n(n-1)} = \binom{n}{2}^{-1}$. Repeating the algorithm $O(n^2 \log n)$ times and keeping the best cut found makes the failure probability polynomially small.
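Here is a minimal Python sketch of the contraction algorithm, assuming the graph is given as a plain list of undirected edges and using a simple union-find structure to track merged vertices; the example graph at the bottom is my own illustration:

```python
import random

def karger_min_cut(edges, num_runs=100):
    """Contraction algorithm: repeatedly contract random edges until 2 super-vertices remain.

    edges: list of (u, v) pairs of an undirected multigraph.
    Returns the smallest cut size found over num_runs independent runs.
    """
    best = float("inf")
    for _ in range(num_runs):
        # Each vertex starts in its own group; contracting an edge merges two groups.
        parent = {v: v for e in edges for v in e}

        def find(v):
            while parent[v] != v:
                parent[v] = parent[parent[v]]
                v = parent[v]
            return v

        groups = len(parent)
        while groups > 2:
            u, v = random.choice(edges)          # pick a random edge
            ru, rv = find(u), find(v)
            if ru != rv:                         # contract: merge its two endpoints
                parent[ru] = rv
                groups -= 1
        # Edges whose endpoints ended up in different groups cross the resulting cut.
        cut = sum(1 for u, v in edges if find(u) != find(v))
        best = min(best, cut)
    return best

# Example: two triangles joined by a single bridge edge (minimum cut = 1).
example = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(karger_min_cut(example))
```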
Example: Polynomial Identity Testing
Given a polynomial $P(x_1, \ldots, x_n)$, we want to find out whether this polynomial is identically equal to 0.
By the fundamental theorem of algebra, a single-variable polynomial of degree $d$ has at most $d$ roots; more generally, an $n$-variable polynomial of total degree $d$ vanishes on at most a $\frac{d}{|S|}$ fraction of the points of $S^n$ for any finite set $S$ (the Schwartz-Zippel lemma).
Analysis:
For a finite set $S$ from which we sample:
- Choose $n$ random values $r_1, \ldots, r_n$ independently and uniformly from $S$.
- Evaluate $P(r_1, \ldots, r_n)$.
When $P \equiv 0$ is true, the evaluation is always 0, regardless of the chosen values.
When $P \not\equiv 0$ and has total degree $d$, the probability that the evaluation is nevertheless zero (a false positive) is bounded by the Schwartz-Zippel lemma:
$$\Pr[P(r_1, \ldots, r_n) = 0 \mid P \not\equiv 0] \le \frac{d}{|S|}$$
Repeating the test $t$ times with fresh random values drives the error probability down to $(d/|S|)^t$.
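A minimal Python sketch of the test, assuming the polynomial is supplied as a callable and that we sample points modulo a large prime (the helper name and parameters are hypothetical, for illustration only):

```python
import random

def is_probably_zero(poly, num_vars, degree, field_size=10**9 + 7, trials=20):
    """Randomized identity test: evaluate poly at random points.

    poly: a callable taking num_vars integer arguments.
    If poly is identically zero, this always returns True.
    If not, each trial wrongly reports zero with probability <= degree/field_size
    (Schwartz-Zippel), so the overall error probability is at most
    (degree / field_size) ** trials.
    """
    for _ in range(trials):
        point = [random.randrange(field_size) for _ in range(num_vars)]
        if poly(*point) % field_size != 0:
            return False   # definitely not the zero polynomial
    return True            # zero with high probability

# Example: (x + y)^2 - (x^2 + 2xy + y^2) is identically zero.
print(is_probably_zero(lambda x, y: (x + y)**2 - (x**2 + 2*x*y + y**2), 2, 2))
# Example: x*y - 1 is not identically zero.
print(is_probably_zero(lambda x, y: x*y - 1, 2, 2))
```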
Further Notes
Expectation
The expectation (or expected value) of a random variable $X$ is denoted by $E[X]$; it is a measure of the central tendency of the distribution of $X$.
- For discrete random variables: $E[X] = \sum_{x} x \cdot \Pr[X = x]$
- For continuous random variables: $E[X] = \int_{-\infty}^{\infty} x \, f(x) \, dx$, where $f$ is the probability density function of $X$
Variance
The variance of a random variable $X$ is denoted by $\mathrm{Var}(X)$ or $\sigma^2$. It measures the spread or dispersion of the distribution of $X$:
$$\mathrm{Var}(X) = E\big[(X - E[X])^2\big] = E[X^2] - (E[X])^2$$
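As a tiny illustration of these formulas (the fair-die example is my own), expectation and variance can be computed directly from the discrete definitions:

```python
# Expectation and variance of a fair six-sided die, from the discrete formulas.
values = range(1, 7)
p = 1 / 6                                   # uniform probability of each face

expectation = sum(x * p for x in values)    # E[X] = sum x * Pr[X = x] = 3.5
second_moment = sum(x * x * p for x in values)
variance = second_moment - expectation**2   # Var(X) = E[X^2] - (E[X])^2 ~ 2.92

print(expectation, variance)
```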