Lecture Notes for 02/25/2026 - Chernoff Bounds and Hashing with Chaining

  • Chernoff bounds for sums of independent Bernoulli random variables
  • Expectation and concentration (tail bounds intuition)
  • Applying Chernoff bounds to hashing with chaining
  • High-probability guarantees on bucket sizes (chain lengths)

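Let $X_1, X_2, \dots, X_n$ be independent Bernoulli random variables, where

$$\Pr[X_i = 1] = p_i, \qquad \Pr[X_i = 0] = 1 - p_i.$$

Define the sum

$$X = \sum_{i=1}^{n} X_i.$$

This represents the number of successes (e.g., heads in biased coin tosses).

By linearity of expectation,

$$\mu = E[X] = \sum_{i=1}^{n} E[X_i] = \sum_{i=1}^{n} p_i.$$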

The Chernoff bounds apply because the variables are independent. Independence is essential for obtaining strong concentration guarantees.

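The goal is to bound the probability that the sum $X$ deviates significantly from its expectation $\mu = E[X]$ (tail probabilities). Chernoff bounds provide exponentially small probabilities for large deviations. One standard form is the following.

For $\delta > 0$,

$$\Pr[X \ge (1+\delta)\mu] \le \exp\!\left(-\frac{\delta^2 \mu}{2+\delta}\right).$$

For $0 < \delta < 1$,

$$\Pr[X \le (1-\delta)\mu] \le \exp\!\left(-\frac{\delta^2 \mu}{2}\right).$$

These inequalities show that $X$ is highly concentrated around its expectation $\mu$. As $\mu$ grows, the probability of a deviation by a fixed relative amount $\delta$ decreases exponentially.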

Intuitively, although each trial is random, averaging many independent trials makes extreme outcomes extremely unlikely.
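
To make the concentration concrete, the following sketch compares an empirical upper-tail frequency with a Chernoff bound. It assumes the common upper-tail form $\exp(-\delta^2\mu/(2+\delta))$ for $\delta > 0$, which may differ from the exact form used in lecture; the function names and parameter values are illustrative only.

```python
import math
import random


def chernoff_upper(mu: float, delta: float) -> float:
    """Assumed upper-tail bound exp(-delta^2 * mu / (2 + delta)), delta > 0."""
    return math.exp(-delta * delta * mu / (2.0 + delta))


def empirical_upper_tail(n: int, p: float, delta: float,
                         trials: int, seed: int = 0) -> float:
    """Fraction of trials in which a sum of n Bernoulli(p) variables
    reaches at least (1 + delta) times its mean n * p."""
    rng = random.Random(seed)
    threshold = (1.0 + delta) * n * p
    hits = 0
    for _ in range(trials):
        x = sum(1 for _ in range(n) if rng.random() < p)
        if x >= threshold:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    n, p, delta = 200, 0.5, 0.2
    print("empirical tail:", empirical_upper_tail(n, p, delta, trials=2000))
    print("Chernoff bound:", chernoff_upper(n * p, delta))
```

The empirical frequency should fall below the bound. The bound is typically far from tight, but it holds for every $n$ and needs no simulation.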

An event occurs with high probability if it happens with probability at least $1 - 1/n$ (or another quantity approaching $1$ as $n$ grows).

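We hash $n$ elements into $m$ buckets using a random hash function, where each element independently lands in any given bucket with probability $1/m$.

Fix a particular bucket. For each element $i$, define

$$X_i = \begin{cases} 1 & \text{if element } i \text{ hashes to this bucket}, \\ 0 & \text{otherwise}. \end{cases}$$

Then the bucket size is

$$X = \sum_{i=1}^{n} X_i.$$

Since $\Pr[X_i = 1] = 1/m$, we obtain

$$E[X] = \sum_{i=1}^{n} \frac{1}{m} = \frac{n}{m}.$$

The quantity $\alpha = n/m$ is called the load factor (expected chain length).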
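
As a concrete picture of the setup, here is a minimal sketch of a hash table with chaining. The class and method names are hypothetical, and the idealized random hash function is modeled by drawing a uniformly random bucket the first time a key is seen and remembering it, which is an assumption made for illustration rather than a practical hash function.

```python
import random


class ChainedHashTable:
    """Toy hash table with chaining (illustrative, not from the notes)."""

    def __init__(self, m: int, seed: int = 0):
        self.m = m
        self.buckets = [[] for _ in range(m)]  # one chain per bucket
        self._rng = random.Random(seed)
        self._slot = {}  # memoized random bucket per key

    def _hash(self, key) -> int:
        # Model an idealized random hash function: each new key gets an
        # independent, uniformly random bucket, fixed from then on.
        if key not in self._slot:
            self._slot[key] = self._rng.randrange(self.m)
        return self._slot[key]

    def insert(self, key) -> None:
        chain = self.buckets[self._hash(key)]
        if key not in chain:  # cost is proportional to the chain length
            chain.append(key)

    def contains(self, key) -> bool:
        return key in self.buckets[self._hash(key)]

    def load_factor(self) -> float:
        return sum(len(chain) for chain in self.buckets) / self.m
```

Inserting 1000 distinct keys into a table with `m = 100` gives a load factor of exactly 10, while individual chain lengths fluctuate around that value.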

Because the bucket size is a sum of independent Bernoulli indicator variables, Chernoff bounds can be applied directly to analyze load balance.

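Using the upper-tail Chernoff bound with $\mu = \alpha$, the size $X$ of the fixed bucket satisfies, for any $\delta > 0$,

$$\Pr[X \ge (1+\delta)\alpha] \le \exp\!\left(-\frac{\delta^2 \alpha}{2+\delta}\right).$$

Thus, the probability that a bucket exceeds its expected size by a factor of $1+\delta$ decreases exponentially in $\alpha$.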
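
As a worked illustration, assume the upper-tail bound in the form $\exp(-\delta^2\alpha/(2+\delta))$ (one common statement, possibly not the exact one used in lecture) and take a load factor $\alpha = 10$ with $\delta = 1$, so the threshold is twice the expected chain length:

$$\Pr[X \ge 20] \le \exp\!\left(-\frac{1^2 \cdot 10}{2+1}\right) = e^{-10/3} \approx 0.036.$$

With $\alpha = 100$ and the same $\delta = 1$, the bound becomes $e^{-100/3} \approx 3.3 \times 10^{-15}$, which shows how quickly the tail shrinks as the expected bucket size grows.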

From One Bucket to All Buckets (Union Bound)

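The Chernoff bound controls the size of a fixed bucket. To guarantee that no bucket becomes too large, we apply a union bound over all $m$ buckets. Writing $B_j$ for the size of bucket $j$, for any threshold $t$,

$$\Pr\!\left[\max_{1 \le j \le m} B_j \ge t\right] \le \sum_{j=1}^{m} \Pr[B_j \ge t] = m \cdot \Pr[B_1 \ge t].$$

If $t$ is chosen so that the single-bucket tail probability is at most $1/(mn)$, the right-hand side is at most $1/n$. This shows that with high probability, every bucket size remains below $t$, that is, close to its expectation.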
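
The union-bound step can be turned into a small calculation: find the smallest integer threshold $t$ for which $m$ times the single-bucket bound is at most $1/n$. This is a sketch under the assumption that the upper-tail bound has the form $\exp(-\delta^2\alpha/(2+\delta))$ with $\alpha = n/m$ and $t = (1+\delta)\alpha$; the function names are hypothetical.

```python
import math


def upper_tail(alpha: float, t: float) -> float:
    """Assumed Chernoff bound on Pr[bucket size >= t], valid for t > alpha."""
    delta = t / alpha - 1.0
    return math.exp(-delta * delta * alpha / (2.0 + delta))


def whp_threshold(n: int, m: int) -> int:
    """Smallest integer t > n/m with m * upper_tail(n/m, t) <= 1/n."""
    alpha = n / m
    t = math.floor(alpha) + 1  # start just above the expected bucket size
    while m * upper_tail(alpha, t) > 1.0 / n:
        t += 1
    return t


if __name__ == "__main__":
    for n in (10**3, 10**4, 10**5):
        print(n, whp_threshold(n, n))
```

With this simplified form of the bound the threshold grows like $\log n$ when $m = n$. Obtaining a tighter threshold generally requires a sharper form of the Chernoff bound.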

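A classical result states that when $n$ elements are hashed into $n$ buckets, the maximum bucket size is

$$\Theta\!\left(\frac{\log n}{\log \log n}\right)$$

with high probability.

This implies that hashing with chaining supports each operation in $O(\log n / \log \log n)$ time, which is nearly constant, with high probability.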
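
A quick simulation can illustrate how slowly the longest chain grows. The sketch below throws $n$ balls into $n$ bins uniformly at random and reports the maximum load next to $\ln n / \ln \ln n$. The function name and the choice of `n` values are illustrative, and since the constant hidden in the $\Theta(\cdot)$ is not specified, only the growth trend is meaningful.

```python
import math
import random
from collections import Counter


def max_load(n: int, seed: int = 0) -> int:
    """Throw n balls into n bins uniformly at random; return the fullest bin's size."""
    rng = random.Random(seed)
    counts = Counter(rng.randrange(n) for _ in range(n))
    return max(counts.values())


if __name__ == "__main__":
    for n in (10**3, 10**4, 10**5):
        reference = math.log(n) / math.log(math.log(n))
        print(f"n={n:>6}  max load={max_load(n)}  ln n / ln ln n={reference:.2f}")
```

Even as `n` grows by a factor of 100, the maximum load typically increases by only a few elements.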

With high probability:

  • Bucket sizes remain close to the expected load
  • Very long chains are unlikely
  • Hash table operations (search, insert, delete) remain efficient

Chernoff bounds are a central tool in analyzing randomized algorithms and data structures. They show strong concentration for sums of independent random variables and provide high-probability guarantees, such as bounding chain lengths in hashing with chaining.

Randomized algorithms often achieve near-deterministic performance: undesirable outcomes occur only with exponentially small probability.