This note is part of my learning notes on Data Structures & Algorithms.


Introduction

Randomized algorithms are algorithms that solve a problem step by step by making random choices from a finite set of possible steps. We call an algorithm randomized if its behavior is determined not only by its input but also by values produced by a random number generator (say, a value $r \in \{1, \dots, R\}$). Hence, a randomized algorithm may produce different outputs when applied more than once to the same input.

We use randomized algorithms because they can speed up computation, often giving a shorter expected running time (reduced time complexity) than comparable deterministic algorithms.

Randomized algorithms can be designed either as a Las Vegas algorithm or a Monte Carlo algorithm. A Monte Carlo algorithm gives a weaker guarantee than a Las Vegas algorithm (it may return a wrong answer); if the output can be verified efficiently, a Monte Carlo algorithm can be turned into a Las Vegas one by re-running it until the result is correct. A small sketch contrasting the two follows the list below.

  • Monte Carlo algorithm: may produce incorrect results with a certain probability, but it will stop after a fixed amount of time or after a certain number of iterations. A common example is the Karger-Stein algorithm.
    • output is a random variable
    • running time is bounded by something deterministic
  • Las Vegas algorithm: keep running the algorithm until it produces the correct result, but the running time may vary. A common example is the randomized quicksort algorithm.
    • output is deterministic
    • running time is a random variable
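
As a toy illustration (my own example, not from the notes), the sketch below contrasts the two flavors on the task of finding the index of a 1 in an array in which half the entries are 1; the function names and the cutoff `k` are illustrative choices.

```python
import random

def las_vegas_find_one(arr):
    """Las Vegas: always returns a correct index, but the number of
    random probes it makes is a random variable."""
    while True:
        i = random.randrange(len(arr))
        if arr[i] == 1:
            return i

def monte_carlo_find_one(arr, k=20):
    """Monte Carlo: makes at most k probes (deterministic bound on work),
    but may fail and return None with probability (1/2)^k here."""
    for _ in range(k):
        i = random.randrange(len(arr))
        if arr[i] == 1:
            return i
    return None  # possibly wrong: a 1 exists but was not found

arr = [1, 0] * 50  # half the entries are 1
print(las_vegas_find_one(arr), monte_carlo_find_one(arr))
```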

Indicator Random Variable

An indicator random variable is a function that assigns a real value to an outcome in the sample space of a random experiment. It provides a convenient method for converting between probabilities and expectations. Suppose we are given a sample space $S$ and an event $A$. Then the indicator random variable $I\{A\}$ associated with event $A$ is defined as:

$$I\{A\} = \begin{cases} 1, & \text{if } A \text{ occurs} \\ 0, & \text{if } A \text{ does not occur} \end{cases}$$
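
A minimal sketch of the key identity $E[I\{A\}] = \Pr(A)$, using a fair die as an assumed example event:

```python
import random

# Estimate Pr(A) for the event A = "a fair die shows 6" as the empirical
# mean of the indicator I{A}, using E[I{A}] = Pr(A).
trials = 100_000
indicator_sum = sum(1 if random.randint(1, 6) == 6 else 0 for _ in range(trials))
print(indicator_sum / trials)  # close to 1/6 ≈ 0.1667
```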


High Probability Bounds

In randomized algorithms, high probability bounds refer to probabilistic guarantees on the performance or behavior of the algorithm. Specifically, these bounds indicate that the algorithm will achieve certain outcomes with high probability, meaning that the probability of failure is very low.

For example, if an algorithm has a high probability bound of $1 - \delta$, it means that the algorithm will produce the correct result or behave as expected with probability at least $1 - \delta$, where $\delta$ is a small constant typically representing the probability of failure.

Markov’s Inequality

Markov’s inequality provides a loose upper-bound on the probability that a non-negative random variable is greater than or equal to a positive constant. It is useful for providing a quick and simple estimate of the upper bound on the probability that a random variable deviates from its expected value. This is particularly useful in algorithms where the exact distribution of the variable is unknown but its expectation is known.

Theorem: for any non-negative random variable $X$ and any constant $\delta > 0$:

$$P(X \geq \delta) \leq \frac{E[X]}{\delta}, \quad \text{for } X \geq 0,\ \delta > 0$$

where:

  • $P(X \geq \delta)$ is the probability that the random variable $X$ is greater than or equal to $\delta$.
  • $E[X]$ is the expected value (mean) of the random variable $X$.
  • $\delta$ is the threshold ($X$ is at least $\delta$).
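
As a quick numerical check (the die example is my own): for one roll of a fair six-sided die, $E[X] = 3.5$, so Markov gives $\Pr[X \geq 5] \leq 3.5/5 = 0.7$, while the true probability is $1/3$.

```python
import random

# Empirical check of Markov's inequality for X = one roll of a fair die.
trials = 100_000
hits = sum(1 for _ in range(trials) if random.randint(1, 6) >= 5)
print("empirical Pr[X >= 5]:", hits / trials)  # about 0.333
print("Markov bound        :", 3.5 / 5)        # 0.7 (loose, as expected)
```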

Chebyshev’s Inequality

Chebyshev’s inequality is an extension of Markov’s inequality. It provides an upper bound on the probability that a random variable deviates from its mean by at least a given amount, which is critical in algorithms where high reliability or consistency is required despite the presence of randomness.

Theorem: for any random variable $X$ with finite mean $\mu$ and finite non-zero variance $\sigma^2$, and for any constant $\delta > 0$:

$$P(|X - \mu| \geq \delta) \leq \frac{Var[X]}{\delta^2}$$

where:

  • $P(|X-\mu| \geq \delta)$ is the probability that the absolute deviation of the random variable $X$ from its mean $\mu$ is greater than or equal to $\delta$.
  • $\mu = E(X)$ is the mean (expected value) of the random variable $X$.
  • $Var[X] = \sigma^2$ is the variance of the random variable $X$.
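
Continuing the same assumed die example: $\mu = 3.5$ and $Var[X] = 35/12 \approx 2.92$, so Chebyshev gives $\Pr[|X - 3.5| \geq 2.5] \leq 2.92/2.5^2 \approx 0.47$, while the true probability (rolling a 1 or a 6) is $1/3$.

```python
import random

# Empirical check of Chebyshev's inequality for X = one roll of a fair die.
mu, var, delta = 3.5, 35 / 12, 2.5
trials = 100_000
hits = sum(1 for _ in range(trials) if abs(random.randint(1, 6) - mu) >= delta)
print("empirical Pr[|X - mu| >= 2.5]:", hits / trials)     # about 0.333
print("Chebyshev bound              :", var / delta ** 2)  # about 0.467
```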

Chernoff Bounds

Chernoff Bounds are particularly useful in settings where an algorithm needs to ensure that the probability of an undesirable outcome (e.g., large deviation from the mean) is extremely low.

Given a sequence of independent random variables $X_1, X_2, \dots, X_n$ with $0 \leq X_i \leq 1$ and their sum $S = X_1 + X_2 + \dots + X_n$, the Chernoff bounds give upper bounds on the probability that $S$ deviates significantly from its expected value $E[S]$.

There are several forms of Chernoff bounds, but one common form is the following:

$$P(S \geq (1 + \delta)E[S]) \leq e^{-\frac{\delta^2 E[S]}{3}}, \quad 0 < \delta \leq 1$$

$$P(S \leq (1 - \delta)E[S]) \leq e^{-\frac{\delta^2 E[S]}{2}}, \quad 0 < \delta < 1$$

Comparison between Markov, Chebyshev, and Chernoff Bounds:

For the sum $S$ of $n$ independent variables above, the bound given by Markov’s inequality is the “weakest” one: it is a constant that does not change as $n$ increases. The bound given by Chebyshev’s inequality is “stronger” than the one given by Markov’s inequality: it decreases polynomially in $n$. The strongest bound is the Chernoff bound, which goes to zero exponentially fast.
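
A small sketch (my own illustration) that makes the comparison concrete: for $S$ the sum of $n$ independent fair coin flips, it computes the three bounds on $\Pr[S \geq 0.75n]$, i.e. $\delta = 1/2$.

```python
import math

# Tail bounds on Pr[S >= 0.75 n] for S = sum of n fair coin flips:
# E[S] = n/2, Var[S] = n/4, and delta = 1/2 in the Chernoff bound above.
for n in (10, 100, 1000):
    markov    = (n / 2) / (0.75 * n)                 # E[S]/threshold  -> 2/3, constant
    chebyshev = (n / 4) / (0.25 * n) ** 2            # Var[S]/(n/4)^2  -> 4/n
    chernoff  = math.exp(-(0.5 ** 2) * (n / 2) / 3)  # exp(-delta^2 E[S]/3) -> e^(-n/24)
    print(f"n={n:5d}  Markov={markov:.3f}  Chebyshev={chebyshev:.4f}  Chernoff={chernoff:.2e}")
```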

Binomial Distribution

The binomial distribution is the discrete probability distribution of the number of successes in a sequence of $n$ independent Bernoulli trials (each with two possible outcomes and the same probability of success $p$).

Probability mass function:

$$f(k;n,p) = \Pr[X=k] = \binom{n}{k} p^k (1-p)^{n-k}, \quad 0 \leq k \leq n$$

Cumulative distribution function:

$$F(k;n,p) = \Pr[X \leq k] = \sum^{k}_{i=0} \binom{n}{i} p^i (1-p)^{n-i}, \quad 0 \leq k \leq n$$
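
A minimal sketch of the two formulas above using Python’s standard library (`math.comb`); the parameter values in the usage lines are arbitrary:

```python
from math import comb

def binom_pmf(k, n, p):
    """Pr[X = k] for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, n, p):
    """Pr[X <= k] for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

print(binom_pmf(5, 10, 0.5))  # 0.24609375
print(binom_cdf(5, 10, 0.5))  # 0.623046875
```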

Normal Distribution

Normal distribution is also known as the Gaussian distribution. It describes data that tends to cluster around a mean. Many random processes tend towards a normal distribution as the number of trials increases (Central Limit Theorem).

When $\mu = E[X] = np$ and $\sigma^2 = Var[X] = np(1-p)$ are large enough, the binomial distribution can be approximated by a normal distribution.

$n$: number of trials
$p$: probability of success in each trial

The normal distribution with mean $\mu$ and variance $\sigma^2$ is:

$$f(x;\mu,\sigma^2) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}, \quad x \in \mathbb{R}$$
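
A small sketch of the density above, checking the normal approximation to the binomial at its mean (the parameters $n=100$, $p=0.5$, $k=50$ are my own choice):

```python
import math
from math import comb

def normal_pdf(x, mu, sigma2):
    """Density of N(mu, sigma^2) at x."""
    sigma = math.sqrt(sigma2)
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Binomial(100, 0.5) at k = 50 vs its normal approximation (mu = np, sigma^2 = np(1-p)).
n, p, k = 100, 0.5, 50
exact = comb(n, k) * p**k * (1 - p)**(n - k)
approx = normal_pdf(k, n * p, n * p * (1 - p))
print(exact, approx)  # roughly 0.0796 vs 0.0798
```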

Poisson Distribution

Poisson distribution describes the probability of a given number of events happening in a fixed interval of time or space, under the assumption that these events happen with a known constant mean rate and independently of the time since the last event.

In the analysis of randomized algorithms, particularly those involving random sampling, the number of events that occur within a fixed number of trials or a fixed time window can often be modeled using a Poisson distribution (especially for Monte Carlo methods).

If $n$ is large and $p$ is small, the binomial distribution can be approximated by a Poisson distribution with $\mu = np$.

The Poisson distribution with mean $\mu > 0$ is:

$$f(k;\mu) = \frac{\mu^k e^{-\mu}}{k!}, \quad k=0,1,2,\dots$$
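
A small sketch of the Poisson approximation to the binomial for large $n$ and small $p$ (the values $n=1000$, $p=0.01$, $k=10$ are illustrative):

```python
import math
from math import comb

def poisson_pmf(k, mu):
    """Pr[X = k] for X ~ Poisson(mu)."""
    return mu**k * math.exp(-mu) / math.factorial(k)

# Binomial(1000, 0.01) at k = 10 vs Poisson with mu = np = 10.
n, p, k = 1000, 0.01, 10
binom = comb(n, k) * p**k * (1 - p)**(n - k)
print(binom, poisson_pmf(k, n * p))  # about 0.126 vs 0.125
```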


Example: Coupon Collector

Suppose there is a lucky draw in which you can get $n$ different types of coupons. Each time you draw a coupon, its type is chosen uniformly at random from all $n$ types. The goal is to determine the expected number of draws needed to collect all $n$ types.

Algorithm:

  1. Initialize an empty set to keep track of the types of coupons collected.
  2. Repeat the following steps until all types of coupons are collected:
    a. Choose a coupon uniformly at random from all $n$ types.
    b. If the chosen coupon is not already in the set, add it to the set.
  3. Count the total number of coupons collected until all types are collected.

Analysis:
Let $X$ be the number of draws needed to get a full collection, and let $X_i$ be the number of draws needed to get the next new coupon type, given that you already have $i-1$ different types.

$$\begin{aligned} X &= \sum^{n}_{i=1} X_i \\ &= X_1 + X_2 + X_3 + \dots + X_n \end{aligned}$$

$$E[X_i] = \frac{n}{n-i+1}$$

$$\begin{aligned} E[X] &= \sum^{n}_{i=1} E[X_i] \\ &= \sum^{n}_{i=1} \frac{n}{n-i+1} \\ &= n \left( \frac{1}{n} + \frac{1}{n-1} + \dots + \frac{1}{2} + \frac{1}{1} \right) \\ &= n\log n + O(n) \end{aligned}$$
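
A minimal simulation of the process above, comparing the empirical average number of draws with $n H_n = n\left(\frac{1}{1} + \dots + \frac{1}{n}\right)$; the values of $n$ and the number of trials are arbitrary:

```python
import random

def coupon_collector(n):
    """Number of uniform random draws needed to see all n coupon types."""
    seen, draws = set(), 0
    while len(seen) < n:
        seen.add(random.randrange(n))
        draws += 1
    return draws

n, trials = 50, 2000
empirical = sum(coupon_collector(n) for _ in range(trials)) / trials
harmonic = n * sum(1 / i for i in range(1, n + 1))  # n * H_n
print(empirical, harmonic)  # both around 225 for n = 50
```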


Example: Randomized Quicksort

Quicksort is a popular sorting algorithm: it first chooses a pivot element, then partitions the numbers around that pivot and sorts each part recursively. Randomized quicksort chooses the pivot at random to decrease the chance of hitting the worst-case time complexity of $O(n^2)$.

The running time of randomized quicksort is independent of the input order, and no assumptions need to be made about the input distribution. The worst case is determined only by the output of the random-number generator.

Algorithm:

  1. Randomly select a number $x$ to be the pivot
  2. Divide the remaining numbers into two groups:
    • those smaller than $x$
    • those larger than $x$
  3. Sort each group recursively, then concatenate the results (a code sketch follows below).
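
A minimal sketch of the steps above (elements equal to the pivot are kept as a separate group, since the analysis below assumes distinct elements):

```python
import random

def randomized_quicksort(nums):
    """Sort nums by partitioning around a uniformly random pivot."""
    if len(nums) <= 1:
        return nums
    pivot = random.choice(nums)                # step 1: random pivot
    smaller = [x for x in nums if x < pivot]   # step 2: partition
    equal = [x for x in nums if x == pivot]
    larger = [x for x in nums if x > pivot]
    # step 3: sort each group recursively, then concatenate
    return randomized_quicksort(smaller) + equal + randomized_quicksort(larger)

print(randomized_quicksort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```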

Analysis:
We analyze the number of comparisons needed to fully sort all the numbers.

Let $T(n)$ be the random variable for the running time of randomized quicksort on an input of size $n$, where the random pivot choices are independent.

For $k=0,1,\dots,n-1$, the indicator random variable is as follows:

$$X_k = \begin{cases} 1, & \text{if the partition generates a } k : (n-k-1) \text{ split} \\ 0, & \text{otherwise} \end{cases}$$

Since all splits are equally likely (assuming the elements are distinct), we have:

$$E[X_k] = \Pr[X_k = 1] = \frac{1}{n}$$

Carrying the analysis through the recursion gives an expected running time of $E[T(n)] = O(n \log n)$.

In practice, randomized quicksort often outperforms deterministic algorithms such as heapsort and mergesort, although its worst case is still $O(n^2)$.


Example: Karger-Stein’s Algorithm

The Karger-Stein algorithm is used to find minimum cuts in graphs by iteratively contracting randomly chosen edges until only two vertices remain.

Given a connected, undirected, unweighted graph $G=(V,E)$ with $|V|=n$ vertices and $|E|=m$ edges, we want to compute the smallest set of edges whose removal disconnects $G$.

A cut in a graph is a partition of the vertices into two non-empty parts. An edge crosses the cut if it goes from one side of the cut to the other. A minimum cut is a cut that has the fewest possible edges crossing it.

Algorithm:

  1. Pick a random edge
  2. Contract the chosen edge (merge the two endpoints together)
  3. Repeat steps 1 and 2, until there are only two vertices left
  4. Return the cut that corresponds to those two vertices

Theorem: the probability that a single run of this random contraction procedure returns a minimum cut in a graph with $n$ vertices is at least $\frac{2}{n(n-1)}$. Repeating the procedure and keeping the smallest cut found makes the failure probability arbitrarily small.
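
A minimal sketch of the contraction procedure, using the equivalent random-edge-ordering formulation with a small union-find and repeating the run to boost the success probability; the function name and the example graph are illustrative:

```python
import random

def contraction_min_cut(edges, num_runs=100):
    """Estimate the minimum cut of an undirected multigraph given as (u, v) pairs.
    Each run contracts edges in a uniformly random order until two
    super-vertices remain; the best cut over all runs is returned."""
    best = None
    for _ in range(num_runs):
        parent = {v: v for e in edges for v in e}  # each vertex is its own super-vertex

        def find(v):
            while parent[v] != v:
                parent[v] = parent[parent[v]]      # path halving
                v = parent[v]
            return v

        groups = len(parent)
        order = edges[:]
        random.shuffle(order)
        for u, v in order:
            if groups == 2:
                break
            ru, rv = find(u), find(v)
            if ru != rv:                           # contract: merge the two endpoints
                parent[ru] = rv
                groups -= 1
        cut = sum(1 for u, v in edges if find(u) != find(v))
        best = cut if best is None else min(best, cut)
    return best

# square 1-2-3-4 plus the diagonal 1-3: the minimum cut has 2 edges
print(contraction_min_cut([(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]))  # 2
```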


Example: Polynomial Identity Testing

Given the polynomial $P(x_1,x_2,x_3) = x_3^2(x_1+x_2)(x_1-x_2) + \left( x_1^2+(1+x_2)(1-x_2)\right)x_3^2 - x_3^2(x_1 + x_2)(x_1 - x_2)$, we want to find out whether this polynomial is identically zero.

By the fundamental theorem of algebra, a non-zero single-variable polynomial of degree $d$ has at most $d$ roots. The Schwartz-Zippel lemma extends this to $n$ variables: a non-zero polynomial of total degree $d$ evaluates to zero at a point chosen uniformly at random from $S^n$ with probability at most $\frac{d}{|S|}$.

Analysis:
For $P(x_1,x_2,\dots,x_n)$ and a finite set $S$ of values to sample from, the test works as follows (a code sketch follows the steps):

  1. Choose $n$ random values $r_1, r_2, \dots, r_n$ independently and uniformly from $S$.
  2. Evaluate $P(r_1,r_2,\dots,r_n)$:

$$\begin{cases} \text{if the evaluation is } 0, & \text{output } P \equiv 0 \\ \text{otherwise,} & \text{output } P \not\equiv 0 \end{cases}$$
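
A minimal sketch of the test applied to the polynomial above; the sample set `S` and the number of trials are arbitrary choices (repeating the test drives the error probability down):

```python
import random

def P(x1, x2, x3):
    """The example polynomial; it simplifies to x3^2 * (x1^2 - x2^2 + 1)."""
    return (x3**2 * (x1 + x2) * (x1 - x2)
            + (x1**2 + (1 + x2) * (1 - x2)) * x3**2
            - x3**2 * (x1 + x2) * (x1 - x2))

def is_probably_zero(poly, num_vars, S=range(-1000, 1000), trials=20):
    """Evaluate poly at random points from S; a non-zero polynomial of total
    degree d vanishes at a random point with probability at most d/|S|."""
    for _ in range(trials):
        point = [random.choice(S) for _ in range(num_vars)]
        if poly(*point) != 0:
            return False  # a non-zero evaluation is a certificate that P is not 0
    return True

print(is_probably_zero(P, 3))  # False: this P is not identically zero
```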

For the induction step ($n > 1$), write $P(x_1,x_2,\dots,x_n) = x_1^k \cdot Q(x_2,\dots,x_n) + T(x_1,x_2,\dots,x_n)$, where $k$ is the largest power of $x_1$ appearing in $P$ and the degree of $x_1$ in $T$ is less than $k$. Now evaluate at the random point $(x_1,x_2,\dots,x_n) = (r_1, r_2, \dots, r_n)$.

The probability that $Q$ is equal to zero (by induction, since $Q$ has total degree at most $d-k$):

$$\Pr[Q(r_2,\dots,r_n)=0] \leq \frac{d-k}{|S|}$$

The probability that $Q$ is not equal to zero:

$$\Pr[Q(r_2,\dots,r_n) \neq 0] \geq 1 - \frac{d-k}{|S|}$$

When $Q(r_2,\dots,r_n) \neq 0$, the polynomial $P(x_1, r_2, \dots, r_n)$ is a non-zero single-variable polynomial of degree $k$ in $x_1$, so:

$$\Pr[P(r_1,\dots,r_n) = 0 \mid Q(r_2,\dots,r_n) \neq 0] \leq \frac{k}{|S|}$$

Combining the two cases gives $\Pr[P(r_1,\dots,r_n)=0] \leq \frac{d-k}{|S|} + \frac{k}{|S|} = \frac{d}{|S|}$, which is the bound used by the test above.


Further Notes

Expectation

The expectation (or expected value) of a random variable XX is denoted by E(X)E(X), which is a measure of the central tendency of the distribution of XX.

  • For discrete random variables

$$E(X) = \sum_{i} x_i\, p_i, \quad \text{where } p_i = \Pr[X = x_i]$$

  • For continuous random variables

$$E(X) = \int^{\infty}_{-\infty} x f(x)\, dx$$

Variance

The variance of a random variable $X$ is denoted by $Var(X)$ or $\sigma^2$. It measures the spread or dispersion of the distribution of $X$.

$$Var(X) = E\left[(X-E(X))^2\right]$$
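
A tiny worked example of both definitions for a fair six-sided die (my own example):

```python
# Expectation and variance of a fair six-sided die, straight from the definitions.
values = [1, 2, 3, 4, 5, 6]
p = 1 / 6
mean = sum(x * p for x in values)                # E(X)   = 3.5
var = sum((x - mean) ** 2 * p for x in values)   # Var(X) = 35/12 ≈ 2.917
print(mean, var)
```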

