Lecture 02/18/2026 - Chernoff Bound and Birthday Paradox

Scribe: Mauricio Monje

  • Recap FKS Hashing and HWC Analysis
  • The Chernoff Bound
  • The Birthday Paradox
  • Balls and Bins Framework

Review Previous Lecture: FKS Hashing Analysis

Previously, we concluded that steps 1 and 2 of FKS Hashing take $O(n)$ time in expectation.

| Operation | FKS | HWC |
|---|---|---|
| Preprocessing | $O(n)$ expected | $O(n)$ |
| Query | $O(1)$ | $O(1)$ expected |

We’ve proved that FKS hashing has $O(1)$ worst-case query time and $O(n)$ expected preprocessing time.

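We also saw that, by Markov, for a non-negative random variable $X$ and any $a > 0$:

$$\Pr[X \ge a] \le \frac{\mathbb{E}[X]}{a}$$

And that by Chebyshev, for any $a > 0$:

$$\Pr\big[\,|X - \mathbb{E}[X]| \ge a\,\big] \le \frac{\mathrm{Var}(X)}{a^2}$$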

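We also briefly discussed that, by Chernoff (stated here in one standard form), for a sum $X = X_1 + \cdots + X_n$ of independent 0/1 random variables with $\mu = \mathbb{E}[X]$ and any $0 < \delta \le 1$:

$$\Pr[X \ge (1 + \delta)\mu] \le e^{-\delta^2 \mu / 3}$$

Put concisely, the Chernoff bound bounds tail probabilities by a term that decays exponentially in $\mu$. Note that it is only applicable when the random variable is a sum of independent random variables.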
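To get a feel for how the three bounds compare, here is a minimal sketch for the number of heads $X$ in 100 fair coin flips. The example, the function names, and the particular form of each bound (Markov $\mathbb{E}[X]/a$, Chebyshev $\mathrm{Var}(X)/a^2$, and the multiplicative Chernoff form $e^{-\delta^2\mu/3}$ for $0 < \delta \le 1$) are assumptions chosen for illustration, not necessarily the exact statements used in lecture.

```python
import math

def markov_bound(mean: float, a: float) -> float:
    """Markov: Pr[X >= a] <= E[X] / a for non-negative X."""
    return mean / a

def chebyshev_bound(variance: float, a: float) -> float:
    """Chebyshev: Pr[|X - E[X]| >= a] <= Var(X) / a^2."""
    return variance / a ** 2

def chernoff_bound(mean: float, delta: float) -> float:
    """Chernoff (one standard form): Pr[X >= (1 + delta) * mu] <= exp(-delta^2 * mu / 3),
    valid for sums of independent 0/1 variables and 0 < delta <= 1."""
    return math.exp(-delta ** 2 * mean / 3)

# X = number of heads in 100 fair coin flips (illustrative example).
flips = 100
mean = flips / 2        # E[X] = 50
variance = flips / 4    # Var(X) = 25

# Bound Pr[X >= 75] three ways; 75 = (1 + 0.5) * 50 and 75 - 50 = 25.
print("Markov:   ", markov_bound(mean, 75))
print("Chebyshev:", chebyshev_bound(variance, 25))
print("Chernoff: ", chernoff_bound(mean, 0.5))
```

In this example each bound is tighter than the previous one, which reflects how much more each inequality assumes about $X$ (only the mean, then the variance, then independence of the summands).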

The Birthday Paradox

Suppose there are 24 people in a room. What is the probability that some two of them share a birthday?

When an event’s probability is difficult to compute directly, we can instead compute the probability of its complement and subtract from 1.

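We consider the complement, that no two people share a birthday, meaning that everyone has a different birthday. Assuming 365 equally likely and independent birthdays:

$$\Pr[\text{all 24 birthdays differ}] = \frac{365}{365} \cdot \frac{364}{365} \cdot \frac{363}{365} \cdots \frac{342}{365} \approx 0.4616$$

Therefore:

$$\Pr[\text{some two share a birthday}] = 1 - 0.4616 = 0.5384$$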

So in a room with just 24 people, there is more than a 50% chance that some two will share a birthday.
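The complement calculation is easy to reproduce numerically. The following is a minimal sketch, assuming 365 equally likely and independent birthdays; the function name `prob_all_distinct` is made up for illustration.

```python
def prob_all_distinct(people: int, days: int = 365) -> float:
    """Probability that `people` uniformly random birthdays are all different."""
    if people > days:
        return 0.0  # pigeonhole: some day must repeat
    p = 1.0
    for already_taken in range(people):
        # The next person must avoid the `already_taken` birthdays used so far.
        p *= (days - already_taken) / days
    return p

p_distinct = prob_all_distinct(24)
p_shared = 1 - p_distinct
print(f"all different: {p_distinct:.4f}, some pair shares: {p_shared:.4f}")
```

Calling `prob_all_distinct` with other group sizes shows how quickly the probability of all-distinct birthdays falls as the room fills up.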

Balls and Bins Framework

Imagine you are a scheduler with 10 machines. People come to you with different tasks (programs they want to run), and your job is to assign a machine to each task. These are called scheduling problems: you have a certain number of jobs and a certain number of machines, and you want to assign jobs to machines to finish them in the least amount of time (or optimize some other objective).

One of the simplest approaches is random scheduling: when someone comes with a task, you just randomly pick one of your machines and assign the task to it.

To analyze these randomized algorithms, we use a framework called balls and bins. This framework is also useful for understanding hashing algorithms, since a hash function takes keys and hashes them into cells of a hash table, where all cells are equally likely.

When we throw $m$ balls into $n$ bins, the probability that no bin has 2 or more balls is:

$$\Pr[\text{no collision}] = \left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right) \cdots \left(1 - \frac{m-1}{n}\right)$$

Interpretation of Each Term: Each term in the product represents the probability that the next ball goes into a new bin, given that all previous balls landed in distinct bins. The second ball has probability $1 - \frac{1}{n}$ of not going into the first ball’s bin, which would be a collision. The third ball has probability $1 - \frac{2}{n}$ of not going into either of the first two balls’ bins. This continues until the $m$-th ball, which has probability $1 - \frac{m-1}{n}$ of going into a new bin, since $m - 1$ bins are already occupied by the previous balls.

Note: This formula assumes $m \le n$ (the number of balls does not exceed the number of bins). If $m > n$, then by the pigeonhole principle, we are guaranteed to have at least one collision, so $\Pr[\text{no collision}] = 0$.

Connection to Birthday Paradox: If we relate people to balls and birthdays to bins, then throwing 24 balls into 365 bins gives us the birthday paradox. Here, 365 corresponds to the number of days in a year, and we’re asking for the probability that no date gets assigned more than one person, which is the same as everyone having a different birthday. When we plug $m = 24$ and $n = 365$ into the product above, we get the same 0.4616 that we calculated earlier for the birthday paradox.
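The balls-and-bins view can also be checked empirically. Below is a rough Monte Carlo sketch that throws 24 balls into 365 bins many times and estimates the probability that no bin receives two balls. The function names, trial count, and seed are arbitrary choices for illustration.

```python
import random

def has_collision(balls: int, bins: int, rng: random.Random) -> bool:
    """Throw `balls` balls into `bins` bins uniformly at random.
    Return True as soon as some bin receives a second ball."""
    occupied = set()
    for _ in range(balls):
        chosen = rng.randrange(bins)
        if chosen in occupied:
            return True
        occupied.add(chosen)
    return False

def estimate_no_collision(balls: int, bins: int, trials: int = 20_000, seed: int = 0) -> float:
    """Estimate Pr[no bin has 2 or more balls] by repeated random trials."""
    rng = random.Random(seed)
    collisions = sum(has_collision(balls, bins, rng) for _ in range(trials))
    return 1 - collisions / trials

# Birthday setting: 24 people (balls), 365 days (bins).
print(f"estimated Pr[no collision]: {estimate_no_collision(24, 365):.3f}")
```

The estimate fluctuates from run to run (and with the seed), but with enough trials it settles near the value given by the product formula.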

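To simplify this product, we use the following limits:

$$\lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^n = e \qquad \text{and} \qquad \lim_{n \to \infty} \left(1 - \frac{1}{n}\right)^n = \frac{1}{e}$$

Note: We won’t go into the proof of these limits, but for now we’ll just take them as given.

For $n$ large enough, we can approximate:

$$\left(1 - \frac{1}{n}\right)^n \approx e^{-1}$$

If we raise both sides to the power of $\frac{1}{n}$, we get:

$$1 - \frac{1}{n} \approx e^{-1/n}$$

We can generalize the above to any $k$ that is small relative to $n$:

$$1 - \frac{k}{n} \approx e^{-k/n}$$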

This is an important approximation to remember, as it applies to any term in our product formula.
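A quick numerical check shows how good it is to replace a product term $1 - \frac{k}{n}$ by $e^{-k/n}$. This is a small sketch with $n = 365$ and a few arbitrarily chosen values of $k$; the helper name `approx_error` is hypothetical.

```python
import math

def approx_error(k: int, n: int) -> float:
    """Absolute gap between the exact term 1 - k/n and its estimate exp(-k/n)."""
    return abs((1 - k / n) - math.exp(-k / n))

n = 365
for k in (1, 5, 23, 100):
    exact = 1 - k / n
    estimate = math.exp(-k / n)
    print(f"k={k:3d}  exact={exact:.6f}  estimate={estimate:.6f}  gap={approx_error(k, n):.6f}")
```

The gap grows roughly like $\frac{1}{2}(k/n)^2$, so the estimate is very accurate while $k$ is small compared to $n$ and degrades as $k$ approaches $n$.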

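Now we apply this to the product formula we saw earlier. The probability that no bin has 2 or more balls when throwing $m$ balls into $n$ bins is:

$$\Pr[\text{no collision}] = \prod_{k=1}^{m-1} \left(1 - \frac{k}{n}\right)$$

Using our approximation, we replace each term:

$$\Pr[\text{no collision}] \approx \prod_{k=1}^{m-1} e^{-k/n} = e^{-1/n} \cdot e^{-2/n} \cdots e^{-(m-1)/n}$$

When we multiply exponentials with the same base, we add the exponents:

$$\Pr[\text{no collision}] \approx e^{-\frac{1 + 2 + \cdots + (m-1)}{n}}$$

The sum $1 + 2 + \cdots + (m-1) = \frac{m(m-1)}{2} \approx \frac{m^2}{2}$ (for large $m$), so we get:

$$\Pr[\text{no collision}] \approx e^{-m^2/(2n)}$$

Therefore, the probability that some bin has at least 2 balls is:

$$\Pr[\text{collision}] \approx 1 - e^{-m^2/(2n)}$$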

So, instead of manually calculating the probability for specific values of $m$ and $n$ in the original product, we can use this approximation to quickly estimate the probability of collisions in the balls and bins framework.

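For $m = 23$ and $n = 365$:

$$\Pr[\text{collision}] \approx 1 - e^{-23^2/(2 \cdot 365)} = 1 - e^{-0.7247} \approx 0.5155$$

The exact product gives about 0.5073 for 23 people, so this confirms that the approximation works well for the birthday paradox with 23 people.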
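To see how close the estimate is, the sketch below compares the exact product with the closed-form approximation $1 - e^{-m^2/(2n)}$ for a few group sizes with $n = 365$. The function names are illustrative only.

```python
import math

def exact_collision(m: int, n: int) -> float:
    """Exact Pr[some bin has 2+ balls] via the product of (1 - k/n), k = 1..m-1."""
    p_no_collision = 1.0
    for k in range(1, m):
        p_no_collision *= 1 - k / n
    return 1 - p_no_collision

def approx_collision(m: int, n: int) -> float:
    """Approximate Pr[some bin has 2+ balls] as 1 - exp(-m^2 / (2n))."""
    return 1 - math.exp(-m * m / (2 * n))

n = 365
for m in (10, 23, 24, 35):
    print(f"m={m:2d}  exact={exact_collision(m, n):.4f}  approx={approx_collision(m, n):.4f}")
```

For these values the two numbers differ by about a percentage point or less, which is usually enough for back-of-the-envelope estimates.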

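If we flip the question around, then instead of being given the number of people and asked for the probability, we ask:

How many people should we have in a room so that $\Pr[\text{some two share a birthday}] \ge 0.8$?

In this case, $m$ (the number of people) is unknown and $n = 365$. We know that:

$$\Pr[\text{some two share a birthday}] \approx 1 - e^{-m^2/(2n)}$$

If we set this equal to 0.8 and rearrange the equation by isolating the exponential term, we get:

$$e^{-m^2/(2 \cdot 365)} = 0.2$$

Taking the natural logarithm of both sides and dividing by $-1$, we have:

$$\frac{m^2}{2 \cdot 365} = -\ln 0.2 = \ln 5 \approx 1.609 \quad \Longrightarrow \quad m \approx \sqrt{2 \cdot 365 \cdot 1.609} \approx 34.3$$

Since we cannot have a fractional number of people, we round up: if we have 35 people in a room, there is at least an 80% chance that some two will share a birthday.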
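The same inversion works for any target probability. Here is a minimal sketch that solves $1 - e^{-m^2/(2n)} = p$ for $m$ and rounds up; it relies on the approximation rather than the exact product, and the function name `people_needed` is made up for illustration.

```python
import math

def people_needed(target: float, n: int = 365) -> int:
    """Smallest integer m with 1 - exp(-m^2 / (2n)) >= target (approximate formula).
    Requires 0 < target < 1."""
    # Solve exp(-m^2 / (2n)) = 1 - target for m, then round up to a whole person.
    m_real = math.sqrt(2 * n * math.log(1 / (1 - target)))
    return math.ceil(m_real)

print(people_needed(0.8))  # 35
print(people_needed(0.5))  # 23
```

Because $m$ grows only like $\sqrt{n}$, even high target probabilities need surprisingly few people compared to the 365 possible birthdays.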