Lecture 9 on 02/25/2026 - Chernoff Bounds and Hashing with Chaining

  • Chernoff bounds for sums of independent Bernoulli random variables
  • Expectation and concentration (tail bounds intuition)
  • Applying Chernoff bounds to hashing with chaining
  • High-probability guarantees on bucket sizes (chain lengths)

Let $X_1, X_2, \dots, X_n$ be independent Bernoulli random variables, where

$$X_i = \begin{cases} 1 & \text{with probability } p_i \\ 0 & \text{with probability } 1 - p_i \end{cases}$$

Define the sum

$$X = \sum_{i=1}^{n} X_i.$$

This represents the number of successes (e.g., heads in $n$ biased coin tosses).

By linearity of expectation,

$$\mu = \mathbb{E}[X] = \sum_{i=1}^{n}\mathbb{E}[X_i] = \sum_{i=1}^{n} p_i.$$
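As a quick sanity check, the expectation formula can be simulated directly; the parameters below ($n = 100$ trials, each $p_i = 0.3$) are an arbitrary illustrative choice:

```python
import random

def bernoulli_sum(ps, rng):
    """One draw of X = X_1 + ... + X_n with X_i ~ Bernoulli(p_i)."""
    return sum(1 for p in ps if rng.random() < p)

rng = random.Random(0)
ps = [0.3] * 100                      # illustrative: n = 100, each p_i = 0.3
mu = sum(ps)                          # mu = sum of the p_i = 30 by linearity
samples = [bernoulli_sum(ps, rng) for _ in range(10_000)]
empirical_mean = sum(samples) / len(samples)
print(mu, empirical_mean)             # empirical mean concentrates near mu
```

The empirical average over many draws lands close to $\mu = 30$, as linearity of expectation predicts.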

The Chernoff bounds apply because the variables $X_1,\dots,X_n$ are independent. Independence is essential for obtaining strong concentration guarantees.

The goal is to bound the probability that $X$ deviates significantly from its expectation $\mu$ (tail probabilities). Chernoff bounds provide exponentially small probabilities for large deviations.

For $0 < \delta \le 1$,

$$\Pr\bigl[X \ge (1+\delta)\mu\bigr] \le \exp\!\left(-\frac{\delta^2 \mu}{3}\right).$$

For $0 < \delta < 1$,

$$\Pr\bigl[X \le (1-\delta)\mu\bigr] \le \exp\!\left(-\frac{\delta^2 \mu}{2}\right).$$

These inequalities show that $X$ is highly concentrated around its expectation $\mu$. As $\mu$ grows, the probability of large deviations decreases exponentially.

Intuitively, although each trial is random, averaging many independent trials makes extreme outcomes extremely unlikely.
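To see this concretely, one can compare the upper-tail bound with an empirical tail frequency; the parameters ($n = 200$ fair coins, $\delta = 0.3$, so $\mu = 100$) are an arbitrary illustration:

```python
import math
import random

rng = random.Random(1)
n, p = 200, 0.5                       # illustrative: 200 fair coin flips
mu = n * p                            # mu = 100
delta = 0.3

# Empirical estimate of Pr[X >= (1 + delta) * mu] over many trials.
num_trials = 20_000
hits = sum(
    1 for _ in range(num_trials)
    if sum(1 for _ in range(n) if rng.random() < p) >= (1 + delta) * mu
)
empirical = hits / num_trials

chernoff = math.exp(-delta**2 * mu / 3)   # upper-tail bound from the notes
print(empirical, chernoff)
```

Here the bound is $e^{-3} \approx 0.05$, while the true tail probability is far smaller; Chernoff bounds are not tight, but they shrink exponentially in $\mu$, which is what the analysis needs.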

An event occurs with high probability if it happens with probability at least $1 - 1/n$ (or another quantity approaching $1$ as $n$ grows).

We hash $n$ elements into $m$ buckets using a random hash function, where each element independently lands in any bucket with probability $1/m$.

Fix a particular bucket. For each element $i$, define

$$X_i = \begin{cases} 1 & \text{if element $i$ hashes to the bucket} \\ 0 & \text{otherwise}. \end{cases}$$

Then the bucket size is

$$X = \sum_{i=1}^{n} X_i.$$

Since $\Pr[X_i = 1] = 1/m$, we obtain

$$\mu = \mathbb{E}[X] = \frac{n}{m}.$$

The quantity $\frac{n}{m}$ is called the load factor (expected chain length).

Because the bucket size is a sum of independent Bernoulli indicator variables, Chernoff bounds can be applied directly to analyze load balance.
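This setup is easy to simulate: throw elements into uniformly random buckets and inspect the chain lengths. The sizes below ($n = 10{,}000$, $m = 1{,}000$, so load factor $10$) are arbitrary illustrative choices:

```python
import random
from collections import Counter

rng = random.Random(2)
n, m = 10_000, 1_000                  # illustrative sizes: load factor mu = n/m = 10
mu = n / m                            # expected chain length per bucket

# Each element hashes to a uniformly random bucket, independently.
buckets = Counter(rng.randrange(m) for _ in range(n))
longest = max(buckets.values())       # longest chain in this sample
print(mu, longest)
```

In a typical run the longest chain stays within a small constant factor of $\mu$, matching the exponential tail decay.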

Using the upper-tail Chernoff bound,

$$\Pr\bigl[X \ge (1+\delta)\mu\bigr] \le \exp\!\left(-\frac{\delta^2 \mu}{3}\right).$$

Thus, the probability that a bucket becomes much larger than its expected size decreases exponentially in $\mu$.

From One Bucket to All Buckets (Union Bound)


The Chernoff bound controls the size of a fixed bucket. To guarantee that no bucket becomes too large, we apply a union bound over all $m$ buckets.

This shows that with high probability, every bucket size remains close to its expectation.
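One standard way to carry out this step (a sketch, with arbitrary illustrative sizes) is to choose $\delta$ just large enough that $m \cdot \exp(-\delta^2 \mu / 3) \le 1/n$; solving for $\delta$ gives $\delta = \sqrt{3 \ln(nm)/\mu}$:

```python
import math

n, m = 1_000_000, 100_000             # illustrative sizes; mu = n/m = 10
mu = n / m

# Choose delta so that m * exp(-delta^2 * mu / 3) <= 1/n.
# Solving the equality gives delta = sqrt(3 * ln(n * m) / mu).
delta = math.sqrt(3 * math.log(n * m) / mu)
per_bucket = math.exp(-delta**2 * mu / 3)   # upper-tail bound for one bucket
all_buckets = m * per_bucket                # union bound over all m buckets
print(delta, all_buckets)
```

With this choice, the probability that *any* bucket exceeds $(1+\delta)\mu$ is at most $1/n$, i.e., every bucket is small with high probability.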

A classical result states that when $n$ elements are hashed into $n$ buckets, the maximum bucket size is

$$O\!\left(\frac{\log n}{\log\log n}\right) \quad \text{with high probability}.$$

This implies that hashing with chaining supports operations in nearly constant time with high probability.
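The classical bound is easy to eyeball experimentally; the value $n = 100{,}000$ below is an arbitrary choice:

```python
import math
import random
from collections import Counter

rng = random.Random(3)
n = 100_000                            # hash n elements into n buckets
buckets = Counter(rng.randrange(n) for _ in range(n))
longest = max(buckets.values())

# The classical benchmark from the notes, up to constant factors.
benchmark = math.log(n) / math.log(math.log(n))
print(longest, benchmark)
```

For $n = 100{,}000$ the benchmark $\log n / \log\log n$ is about $4.7$, and the observed maximum load in a run is typically a small single-digit multiple of it, even though the average load is just $1$.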

With high probability:

  • Bucket sizes remain close to the expected load $\frac{n}{m}$
  • Very long chains are unlikely
  • Hash table operations (search, insert, delete) remain efficient
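To make the data structure itself concrete, here is a minimal sketch of hashing with chaining (the class name and fixed bucket count are illustrative choices, not from the lecture):

```python
class ChainedHashTable:
    """Minimal hashing-with-chaining sketch: one Python list per bucket."""

    def __init__(self, m=8):
        self.buckets = [[] for _ in range(m)]

    def _chain(self, key):
        # Map the key to its bucket's chain.
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, value):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)   # overwrite an existing key
                return
        chain.append((key, value))

    def search(self, key):
        for k, v in self._chain(key):
            if k == key:
                return v
        return None                        # key absent

    def delete(self, key):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain.pop(i)
                return True
        return False
```

Each operation scans only its own bucket's chain, so its cost is proportional to that chain's length, which the analysis above bounds with high probability.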

Chernoff bounds are a central tool in analyzing randomized algorithms and data structures. They show strong concentration for sums of independent random variables and provide high-probability guarantees, such as bounding chain lengths in hashing with chaining.

Randomized algorithms often achieve near-deterministic performance: undesirable outcomes occur only with exponentially small probability.