Lecture 02/18/2026 - Chernoff Bound and Birthday Paradox
Scribe: Mauricio Monje
Summary of the Lecture
- Recap FKS Hashing and HWC Analysis
- The Chernoff Bound
- The Birthday Paradox
- Balls and Bins Framework
Review Previous Lecture: FKS Hashing Analysis
Comparison: FKS vs HWC
Previously, we concluded that steps 1 and 2 of FKS Hashing take
| Operation | FKS | HWC |
|---|---|---|
| Preprocessing | | |
| Query | | |
We’ve proved that FKS hashing has
We also saw that, by Markov:
And that by Chebyshev:
We also briefly discussed that, by Chernoff:
If we were to concisely discuss what the Chernoff bound is, then we would say that it bounds tail probabilities by
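For reference, the three inequalities named above are usually stated as follows. These are the standard textbook forms, given here as a reminder under stated assumptions; they are not necessarily the exact statements or constants used in lecture. The symbols $X$, $a$, $\mu$, and $\delta$ are introduced only for this summary.

$$
\begin{aligned}
\text{Markov } (X \ge 0,\ a > 0): \quad & \Pr[X \ge a] \le \frac{\mathbb{E}[X]}{a} \\
\text{Chebyshev } (a > 0): \quad & \Pr\big[\,|X - \mathbb{E}[X]| \ge a\,\big] \le \frac{\operatorname{Var}(X)}{a^2} \\
\text{Chernoff } (0 < \delta \le 1): \quad & \Pr[X \ge (1 + \delta)\mu] \le e^{-\delta^2 \mu / 3}
\end{aligned}
$$

The Chernoff form above assumes $X$ is a sum of independent 0/1 random variables with $\mu = \mathbb{E}[X]$. Markov uses only the mean, Chebyshev also uses the variance, and Chernoff uses independence to obtain a bound that decays exponentially.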
Birthday Paradox
Problem
Suppose there are 24 people in a room. What is the probability that some two of them share a birthday?
When an event’s probability is difficult to compute directly, we can instead compute the probability of its complement and subtract from 1.
Solution
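We consider the complement event, that no two people share a birthday, meaning that all 24 people have different birthdays. Assuming 365 equally likely birthdays, the first person can have any birthday, the second must avoid 1 day, the third must avoid 2 days, and so on:
$$
\Pr[\text{all 24 birthdays differ}] = \frac{365}{365} \cdot \frac{364}{365} \cdot \frac{363}{365} \cdots \frac{342}{365} \approx 0.4617
$$
Therefore:
$$
\Pr[\text{some two people share a birthday}] = 1 - \Pr[\text{all 24 birthdays differ}] \approx 0.5383
$$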
So in a room with just 24 people, there is more than a 50% chance that some two will share a birthday.
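As an empirical check of this claim, here is a minimal Monte Carlo sketch. It assumes 365 equally likely birthdays and independent people; the function name, trial count, and seed are illustrative choices, not part of the lecture.

```python
import random

def estimate_shared_birthday(people=24, days=365, trials=20_000, seed=0):
    """Estimate P(some two people share a birthday) by repeated sampling."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(trials):
        birthdays = [rng.randrange(days) for _ in range(people)]
        # A shared birthday exists exactly when the set is smaller than the list.
        if len(set(birthdays)) < people:
            hits += 1
    return hits / trials

estimate = estimate_shared_birthday()
print(f"estimated probability of a shared birthday: {estimate:.3f}")
```

The estimate should land a little above one half, with sampling noise that shrinks as `trials` grows.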
Balls and Bins
Motivation: Scheduling Problem
Imagine you are a scheduler with 10 machines. People come to you with different tasks (programs they want to run), and your job is to assign a machine to each task. These are called scheduling problems: you have a certain number of jobs and a certain number of machines, and you want to assign jobs to machines so that they finish in the least amount of time (or so that some other objective is optimized).
One of the simplest approaches is random scheduling: when someone comes with a task, you just randomly pick one of your machines and assign the task to it.
To analyze these randomized algorithms, we use a framework called balls and bins. This framework is also useful for understanding hashing algorithms, since a hash function takes keys and hashes them into cells of a hash table, where all cells are equally likely.
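The random scheduling rule described above can be sketched in a few lines. This is only an illustration: the function name `random_schedule`, the choice of 100 tasks, and the fixed seed are assumptions made for the example.

```python
import random

def random_schedule(num_tasks, num_machines=10, seed=0):
    """Assign each task to a machine chosen uniformly at random.

    Returns a list where entry k is the number of tasks given to machine k.
    In balls-and-bins terms, tasks are balls and machines are bins.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    loads = [0] * num_machines
    for _ in range(num_tasks):
        machine = rng.randrange(num_machines)  # uniform over all machines
        loads[machine] += 1
    return loads

loads = random_schedule(num_tasks=100)
print("tasks per machine:", loads)
print("busiest machine has", max(loads), "tasks")
```

The average load is always `num_tasks / num_machines`; the interesting question, and the one the balls and bins analysis addresses, is how far individual machines deviate from that average.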
The Balls and Bins Framework
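When we throw $m$ balls independently and uniformly at random into $n$ bins, the probability that no bin has 2 or more balls is:
$$
\Pr[\text{no bin has 2 or more balls}] = \left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right) \cdots \left(1 - \frac{m-1}{n}\right)
$$
Interpretation of Each Term: Each term in the product represents the probability that the next ball goes into a new bin. The second ball has probability $1 - \frac{1}{n}$ of avoiding the one occupied bin, the third ball has probability $1 - \frac{2}{n}$ of avoiding the two occupied bins, and in general ball $i + 1$ has probability $1 - \frac{i}{n}$ of avoiding the $i$ bins already occupied.
Note: This formula assumes $m \le n$ (the number of balls does not exceed the number of bins). If $m > n$, then by the pigeonhole principle, we are guaranteed to have at least one collision, so $\Pr[\text{no bin has 2 or more balls}] = 0$.
Connection to Birthday Paradox: If we relate people to balls and birthdays to bins, then throwing 24 balls into 365 bins gives us the birthday paradox. Here, 365 corresponds to the number of days in a year, and we’re asking for the probability that no date gets assigned more than one person, which is the same as everyone having a different birthday. When we plug in $m = 24$ and $n = 365$, we recover the birthday problem's probability that all 24 people have different birthdays.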
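The product described above translates directly into code. The sketch below assumes balls are thrown independently and uniformly at random; the names `prob_no_collision`, `m` (balls), and `n` (bins) are chosen for this example.

```python
def prob_no_collision(m, n):
    """Probability that m balls thrown into n bins all land in distinct bins."""
    if m > n:
        # Pigeonhole principle: more balls than bins forces a collision.
        return 0.0
    p = 1.0
    for i in range(1, m):
        # Ball i + 1 must avoid the i bins that are already occupied.
        p *= 1 - i / n
    return p

p_distinct = prob_no_collision(24, 365)
print(f"P(all 24 birthdays differ)  = {p_distinct:.4f}")
print(f"P(some two share a birthday) = {1 - p_distinct:.4f}")
```

Calling `prob_no_collision(366, 365)` returns `0.0`, matching the pigeonhole note.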
Simplification Using $1 - x \approx e^{-x}$
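To simplify this product, we use the following limits:
$$
\lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^n = e \qquad \text{and} \qquad \lim_{n \to \infty} \left(1 - \frac{1}{n}\right)^n = \frac{1}{e}
$$
Note: We won’t go into the proof of these limits, but for now we’ll just take them as given.
For large $n$, the second limit gives:
$$
\left(1 - \frac{1}{n}\right)^n \approx e^{-1}
$$
If we raise both sides to the power of $\frac{1}{n}$, we get:
$$
1 - \frac{1}{n} \approx e^{-1/n}
$$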
Generalization
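We can generalize the above to:
$$
1 - x \approx e^{-x} \quad \text{for small } x, \qquad \text{in particular} \quad 1 - \frac{i}{n} \approx e^{-i/n} \ \text{ when } i \ll n
$$
This is an important approximation to remember, as it applies to any term in our product formula.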
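A quick numerical look at how tight this kind of approximation is for the terms that appear in the birthday setting. The sketch assumes the approximation in question is $1 - x \approx e^{-x}$ with $x = i/365$; the helper name `approximation_gap` is made up for this example.

```python
import math

def approximation_gap(x):
    """Return e^(-x) - (1 - x), the error of approximating 1 - x by e^(-x)."""
    return math.exp(-x) - (1 - x)

for i in (1, 10, 23):
    x = i / 365
    print(f"i = {i:2d}: 1 - x = {1 - x:.6f}, e^-x = {math.exp(-x):.6f}, "
          f"gap = {approximation_gap(x):.2e}")
```

For $x \ge 0$ the gap is nonnegative and at most $x^2 / 2$, so it stays small as long as $i$ is much smaller than the number of bins.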
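Now we apply this to the product formula we saw earlier. With $m$ balls and $n$ bins, the probability that no bin has 2 or more balls is:
$$
\prod_{i=1}^{m-1} \left(1 - \frac{i}{n}\right)
$$
Using our approximation, we replace each term:
$$
\prod_{i=1}^{m-1} \left(1 - \frac{i}{n}\right) \approx \prod_{i=1}^{m-1} e^{-i/n} = e^{-1/n} \cdot e^{-2/n} \cdots e^{-(m-1)/n}
$$
When we multiply exponentials with the same base, we add the exponents:
$$
e^{-1/n} \cdot e^{-2/n} \cdots e^{-(m-1)/n} = e^{-(1 + 2 + \cdots + (m-1))/n}
$$
The sum $1 + 2 + \cdots + (m-1)$ equals $\frac{m(m-1)}{2}$, so the probability that no bin has 2 or more balls is approximately $e^{-m(m-1)/(2n)}$.
Therefore, the probability that some bin has at least 2 balls is:
$$
\Pr[\text{some bin has at least 2 balls}] \approx 1 - e^{-m(m-1)/(2n)}
$$
So, instead of manually calculating the probability for specific values of $m$ and $n$, we can plug them into this closed-form approximation.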
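To see how the closed form compares with the exact product, here is a small sketch. It assumes the approximation takes the form $1 - e^{-m(m-1)/(2n)}$, where $m$ is the number of balls and $n$ the number of bins; the function names are illustrative.

```python
import math

def exact_collision(m, n):
    """Exact P(some bin has >= 2 balls) for m balls in n bins."""
    if m > n:
        return 1.0  # pigeonhole principle
    p_distinct = 1.0
    for i in range(1, m):
        p_distinct *= 1 - i / n
    return 1 - p_distinct

def approx_collision(m, n):
    """Closed-form approximation 1 - exp(-m(m-1) / (2n))."""
    return 1 - math.exp(-m * (m - 1) / (2 * n))

for m in (10, 24, 40):
    print(f"m = {m:2d}: exact = {exact_collision(m, 365):.4f}, "
          f"approx = {approx_collision(m, 365):.4f}")
```

Because $e^{-x} \ge 1 - x$, the approximation slightly underestimates the collision probability, and the gap grows as $m$ becomes a larger fraction of $n$.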
Sanity Check
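For $m = 23$ people and $n = 365$ days, the approximation $1 - e^{-m(m-1)/(2n)}$ gives $1 - e^{-253/365} \approx 0.500$, while the exact product gives about $0.507$.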
This confirms that the approximation works well for the birthday paradox with 23 people.
Inverting the Question
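If we flip the question around, then instead of being given the number of people and asked for the probability, we ask:
How many people should we have in a room so that the probability that some two of them share a birthday is 80%?
In this case, $n = 365$ and the number of people $m$ is the unknown. Using the approximation $1 - e^{-m(m-1)/(2n)}$ for the collision probability, we set:
$$
1 - e^{-m(m-1)/(2 \cdot 365)} = 0.8
$$
If we rearrange this equation by isolating the exponential term, we get:
$$
e^{-m(m-1)/730} = 0.2
$$
Taking the natural logarithm of both sides and dividing by -1, we have:
$$
\frac{m(m-1)}{730} = -\ln 0.2 = \ln 5 \approx 1.609, \qquad \text{so} \quad m(m-1) \approx 1174.9 \quad \text{and} \quad m \approx 34.8
$$
Since we cannot have a fractional number of people, we round up: with 35 people in a room, there is at least an 80% chance that some two will share a birthday.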
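The inverted question can also be answered programmatically. The sketch below assumes the approximation $1 - e^{-m(m-1)/(2n)}$ and solves the resulting quadratic in $m$, then cross-checks against the exact product; both function names and the `target` parameter are illustrative.

```python
import math

def people_needed_approx(target, n=365):
    """Smallest m with 1 - exp(-m(m-1)/(2n)) >= target, for 0 < target < 1."""
    # m(m-1) = rhs is the quadratic m^2 - m - rhs = 0; take the positive root.
    rhs = 2 * n * math.log(1 / (1 - target))
    m = (1 + math.sqrt(1 + 4 * rhs)) / 2
    return math.ceil(m)

def people_needed_exact(target, n=365):
    """Smallest m whose exact collision probability is at least target."""
    p_distinct = 1.0  # probability that the first m people are all distinct
    m = 1
    while 1 - p_distinct < target:
        p_distinct *= 1 - m / n  # person m + 1 must avoid m taken days
        m += 1
    return m

print("approximation:", people_needed_approx(0.8))
print("exact product:", people_needed_exact(0.8))
```

For an 80% target both methods agree on 35 people, which matches the hand calculation.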