Lecture 7 on 02/18/2026 - Chernoff Bound and Birthday Paradox

Scribe: Mauricio Monje

  • Recap FKS Hashing and HWC Analysis
  • The Chernoff Bound
  • The Birthday Paradox
  • Balls and Bins Framework

Suppose there are 24 people in a room. What is the probability that some two of them share a birthday?

At first, this might seem unlikely. After all, there are 365 possible days, so intuitively you might think you’d need many more people for a collision. However, when you actually compute this probability, the result is surprisingly high, which is why it’s called the “paradox.” It’s not truly a mathematical paradox, but rather a result that contradicts most people’s intuition.

When an event’s probability is difficult to compute directly, we can instead compute the probability of its complement (the opposite event) and subtract from 1. The complement of “some 2 people share a birthday” is “no 2 people share a birthday,” which means everyone has a different birthday. This complementary event is actually much easier to count.

We set up the calculation using the complement:

$$\Pr(\text{some 2 share a birthday}) = 1 - \Pr(\text{everyone has a different birthday})$$

To compute the probability of the complement, we count favorable outcomes over total possible outcomes.

Counting total possible outcomes: Each of the 24 people independently has any of 365 possible birthdays (assumed equally likely), so there are $365^{24}$ total configurations.

Counting favorable outcomes: For everyone to have different birthdays:

  • The first person can have any of 365 days
  • The second person can have any of the remaining 364 days
  • The third person can have any of the remaining 363 days
  • And so on…

This gives us $365 \cdot 364 \cdot 363 \cdots 342$ favorable outcomes (24 factors), which can also be written as $\binom{365}{24} \cdot 24! = \frac{365!}{(365-24)!}$.

$$\Pr(\text{everyone has a different birthday}) = \frac{365 \cdot 364 \cdots 342}{365^{24}} \approx 0.4616$$

Therefore:

$$\Pr(\text{some 2 share a birthday}) \approx 1 - 0.4616 = 0.5384 \approx 54\%$$

What this means: In a room with just 24 people, there is more than a 50% chance that some two will share a birthday. This is the surprising result that motivates the name “birthday paradox.”
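The counting argument above can be checked numerically. The following is a minimal sketch, assuming uniform and independent birthdays over 365 days; the function name `prob_all_distinct` is chosen here for illustration and is not from the lecture.

```python
from math import perm


def prob_all_distinct(people: int, days: int = 365) -> float:
    """Exact probability that `people` uniform, independent birthdays all differ."""
    # perm(days, people) = days * (days - 1) * ... * (days - people + 1),
    # i.e. the number of favorable outcomes; days ** people is the total count.
    return perm(days, people) / days ** people


p_distinct = prob_all_distinct(24)
print(f"everyone different: {p_distinct:.4f}")
print(f"some pair shares:   {1 - p_distinct:.4f}")
```

Because Python integers are exact, the ratio is only rounded once, at the final division, so the printed values can be compared directly with the hand calculation.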

Imagine you are a scheduler with 10 machines. People come to you with different tasks (codes they want to run), and your job is to assign a machine to each task. These are called scheduling problems: you have a certain number of jobs and a certain number of machines, and you want to assign jobs to machines to finish them in the least amount of time (or optimize some other objective).

One of the simplest approaches is random scheduling: when someone comes with a task, you just randomly pick one of your machines and assign the task to it.

To analyze these randomized algorithms, we use a framework called balls and bins. This framework is also useful for understanding hashing algorithms, since a hash function takes keys and hashes them into cells of a hash table, where all cells are equally likely.
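To make the correspondence concrete, here is a minimal sketch of random scheduling viewed as balls and bins: tasks are the balls and machines are the bins. The function name `random_schedule`, the seed, and the choice of 10 tasks are assumptions made for illustration only.

```python
import random
from collections import Counter


def random_schedule(num_tasks: int, num_machines: int, rng: random.Random) -> Counter:
    """Assign each task to a uniformly random machine; return machine -> task count."""
    # Start every machine at zero so idle machines also appear in the result.
    loads = Counter({machine: 0 for machine in range(num_machines)})
    for _ in range(num_tasks):
        loads[rng.randrange(num_machines)] += 1  # throw one ball into a random bin
    return loads


rng = random.Random(0)  # fixed seed so the run is repeatable
loads = random_schedule(num_tasks=10, num_machines=10, rng=rng)
print(sorted(loads.items()))
print("busiest machine has", max(loads.values()), "tasks")
```

A machine that receives two or more tasks corresponds to a bin with two or more balls, which is the collision event analyzed next.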

When we throw $m$ balls into $n$ bins, independently and uniformly at random, the probability that no bin has 2 or more balls is:

$$\Pr(\text{no bin has 2 or more balls}) = \left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right) \cdots \left(1 - \frac{m-1}{n}\right)$$

Understanding Each Term: Let’s think through this product carefully. For no collisions to occur, each ball must land in an empty bin.

  • First ball: Can go anywhere, so its “probability of success” (going into an empty bin) is 1.
  • Second ball: The first ball occupies 1 bin. So there are $n-1$ empty bins out of $n$ total. The probability it avoids collision is $\frac{n-1}{n} = 1 - \frac{1}{n}$.
  • Third ball: The first two balls occupy 2 different bins. So there are $n-2$ empty bins out of $n$ total. The probability it avoids collision is $\frac{n-2}{n} = 1 - \frac{2}{n}$.
  • $m$-th ball: The first $m-1$ balls occupy $m-1$ different bins. So there are $n-(m-1)$ empty bins. The probability it avoids collision is $\frac{n-(m-1)}{n} = 1 - \frac{m-1}{n}$.

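To have no collisions at all, every ball must succeed. Each factor above is the conditional probability that a ball avoids a collision given that all earlier balls landed in distinct bins, so by the chain rule the probability of no collision is the product of these factors.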

Note: This formula assumes $m \leq n$ (the number of balls does not exceed the number of bins). If $m > n$, then by the pigeonhole principle, we are guaranteed to have at least one collision, so $\Pr(\text{no bin has 2 or more balls}) = 0$.

Connection to Birthday Paradox: If we relate people to balls and birthdays to bins, then throwing 24 balls into 365 bins gives us the birthday paradox. Here, 365 corresponds to the number of days in a year, and we’re asking for the probability that no date gets assigned more than one person, which is the same as everyone having a different birthday. When we plug in $m = 24$ and $n = 365$ into the product above, we get the same 0.4616 that we calculated earlier for the birthday paradox.
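As a rough check of the product formula, the sketch below evaluates it directly and compares it with a Monte Carlo estimate obtained by actually throwing balls into bins. The function names, the seed, and the trial count are illustrative assumptions, not part of the lecture.

```python
import random


def prob_no_collision(m: int, n: int) -> float:
    """Product (1 - 1/n)(1 - 2/n)...(1 - (m-1)/n) for m balls and n bins."""
    if m > n:
        return 0.0  # pigeonhole: some bin must hold at least two balls
    prob = 1.0
    for k in range(1, m):
        prob *= 1 - k / n  # ball k+1 must avoid the k bins already occupied
    return prob


def simulate_no_collision(m: int, n: int, trials: int, rng: random.Random) -> float:
    """Fraction of trials in which m uniform throws land in m distinct bins."""
    successes = 0
    for _ in range(trials):
        bins = [rng.randrange(n) for _ in range(m)]
        if len(set(bins)) == m:  # no bin was hit twice
            successes += 1
    return successes / trials


rng = random.Random(7)  # fixed seed so the run is repeatable
exact = prob_no_collision(24, 365)
estimate = simulate_no_collision(24, 365, trials=20_000, rng=rng)
print(f"formula:    {exact:.4f}")
print(f"simulation: {estimate:.4f}")
```

The simulated fraction fluctuates from run to run, but with this many trials it should land close to the value given by the formula.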

Simplifying Products with the Constant $e$

The product we have contains many terms like $\left(1 - \frac{1}{n}\right)$, $\left(1 - \frac{2}{n}\right)$, etc. To make this more manageable, we use a fundamental mathematical constant, $e \approx 2.71828$.

Approximation: When $k/n$ is small (for example, a fixed $k$ and a large $n$), we have the relationship:

$$1 - \frac{k}{n} \approx e^{-k/n}$$

This approximation is closely related to the well-known limit:

$$\lim_{n \to \infty} \left(1 - \frac{1}{n}\right)^n = e^{-1}$$

We won’t prove this here, but it is a fundamental result in calculus. Equivalently, the Taylor series $e^{-x} = 1 - x + \frac{x^2}{2} - \cdots$ shows that $1 - x \approx e^{-x}$ with an error of about $\frac{x^2}{2}$ when $x$ is small. The intuition is that when $k/n$ is small, the term $\left(1 - \frac{k}{n}\right)$ is very close to $e^{-k/n}$.

Why this helps: When we convert each term like $\left(1 - \frac{k}{n}\right)$ into $e^{-k/n}$, the product becomes much easier to work with. Instead of multiplying many complicated terms, we can add exponents.
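To get a feel for how good the approximation is, the following sketch tabulates both sides for a few values of $k$. The helper name `approx_gap` and the sample values of `k` are assumptions made for illustration; `n = 365` is chosen to match the birthday setting.

```python
import math


def approx_gap(k: int, n: int) -> float:
    """Return e^(-k/n) - (1 - k/n), the error of the approximation.

    In exact arithmetic this gap is never negative, since e^(-x) >= 1 - x.
    """
    x = k / n
    return math.exp(-x) - (1 - x)


n = 365  # illustrative choice, matching the number of days in a year
for k in (1, 10, 23, 100, 300):
    print(
        f"k={k:3d}  1-k/n={1 - k / n:.4f}  "
        f"e^(-k/n)={math.exp(-k / n):.4f}  gap={approx_gap(k, n):.4f}"
    )
```

The gap is tiny while $k/n$ is small and grows as $k$ approaches $n$, which is why the approximation is stated for small $k/n$.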

Applying the Approximation to Balls and Bins

Now we apply our approximation to the product formula. We start with:

$$\Pr(\text{no bin has 2 or more balls}) = \left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right) \cdots \left(1 - \frac{m-1}{n}\right)$$

Step 1: Replace each term using our approximation $1 - \frac{k}{n} \approx e^{-k/n}$:

$$\approx e^{-1/n} \cdot e^{-2/n} \cdot e^{-3/n} \cdots e^{-(m-1)/n}$$

Step 2: Combine exponentials by adding exponents (since $e^a \cdot e^b = e^{a+b}$):

$$= e^{-\frac{1+2+3+\cdots+(m-1)}{n}}$$

Step 3: Simplify the sum in the exponent. We know that $1 + 2 + \cdots + (m-1) = \frac{m(m-1)}{2}$. For large $m$, this is approximately $\frac{m^2}{2}$:

$$\approx e^{-\frac{m^2}{2n}}$$

Step 4: Find the probability of at least one collision by subtracting from 1:

$$\Pr(\text{some bin has at least 2 balls}) \approx 1 - e^{-m^2/(2n)}$$

What this gives us: Instead of laboriously calculating the product for specific values, we now have a simple closed-form approximation. This is much faster to use and reveals the essential behavior of the system.
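The sketch below compares the exact product with the closed form, and also with the intermediate form from Step 2 that keeps $\frac{m(m-1)}{2}$ in the exponent. The function names are chosen here for illustration, and the code assumes $m \leq n$.

```python
import math


def exact_collision(m: int, n: int) -> float:
    """1 minus the exact product (1 - 1/n)...(1 - (m-1)/n); assumes m <= n."""
    no_collision = 1.0
    for k in range(1, m):
        no_collision *= 1 - k / n
    return 1 - no_collision


def approx_collision(m: int, n: int) -> float:
    """Closed form 1 - e^(-m^2 / (2n)) from Step 4."""
    return 1 - math.exp(-m * m / (2 * n))


def approx_collision_tight(m: int, n: int) -> float:
    """Same idea, but keeping m(m-1)/2 instead of rounding it to m^2/2."""
    return 1 - math.exp(-m * (m - 1) / (2 * n))


n = 365
for m in (10, 23, 24, 50, 100):
    print(
        f"m={m:3d}  exact={exact_collision(m, n):.4f}  "
        f"m^2 form={approx_collision(m, n):.4f}  "
        f"m(m-1) form={approx_collision_tight(m, n):.4f}"
    )
```

Replacing $m(m-1)$ with $m^2$ pushes the estimate up slightly, while replacing each factor with an exponential pushes it down, so the two simplifications partly offset each other.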

Sanity Check: Comparing with the Birthday Paradox

Let’s verify our approximation works by checking it against the birthday paradox numbers we calculated earlier.

For the birthday paradox, we have $m = 23$ people and $n = 365$ days. Let’s compute:

$$\Pr(\text{some 2 people share a birthday}) \approx 1 - e^{-m^2/(2n)}$$

Substituting our values:

$$= 1 - e^{-23^2/(2 \cdot 365)} = 1 - e^{-529/730} \approx 1 - e^{-0.725}$$

Evaluating $e^{-0.725} \approx 0.484$, we get:

$$\approx 1 - 0.484 = 0.516, \text{ or about } 51.6\%$$

Comparison: Earlier we calculated the exact probability for 24 people to be about 54%; for 23 people, the exact probability is about 50.7%. Our formula with 23 people gives about 51.6%, which is within one percentage point of the exact value. This shows that our approximation using $e$ captures the behavior of the birthday paradox well, and that it is accurate even for moderate values of $n$.

Inverting the Question: How Many People Do We Need?

So far we’ve been given the number of people and calculated the probability. Now let’s flip it: How many people do we need for an 80% chance of a collision?

With $m$ unknown and $n = 365$, we set up:

$$\Pr(\text{some 2 share a birthday}) = 0.8$$

Using our formula:

$$1 - e^{-m^2/(2 \cdot 365)} = 0.8$$

Step 1: Isolate the exponential. Subtract 1 from both sides and multiply by $-1$:

$$e^{-m^2/730} = 0.2$$

Step 2: Take the natural logarithm. This helps us get $m$ out of the exponent:

$$\ln\left(e^{-m^2/730}\right) = \ln(0.2)$$

$$-\frac{m^2}{730} = \ln(0.2)$$

Step 3: Solve for $m$. Multiply both sides by $-730$:

$$m^2 = -730 \ln(0.2) = 730 \ln(5)$$

(Note: $\ln(0.2) = \ln(1/5) = -\ln(5)$, so $-730 \ln(0.2) = 730 \ln(5)$.)

Taking the square root:

$$m = \sqrt{730 \ln(5)} \approx \sqrt{730 \cdot 1.609} \approx \sqrt{1174.6} \approx 34.27$$

Conclusion: Since we need a whole number of people, we need 35 people in a room to have approximately an 80% chance that two will share a birthday.
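The inversion generalizes to any target probability $p$: solving $1 - e^{-m^2/(2n)} = p$ gives $m = \sqrt{2n \ln\frac{1}{1-p}}$. The sketch below applies this and cross-checks the answer against the exact product. The function names are illustrative assumptions, and the target must lie strictly between 0 and 1.

```python
import math


def people_needed(target: float, n: int = 365) -> int:
    """Smallest whole m with 1 - e^(-m^2 / (2n)) >= target, per the approximation.

    Requires 0 < target < 1.
    """
    return math.ceil(math.sqrt(2 * n * math.log(1 / (1 - target))))


def exact_collision(m: int, n: int = 365) -> float:
    """Exact probability that m uniform throws into n bins collide; assumes m <= n."""
    no_collision = 1.0
    for k in range(1, m):
        no_collision *= 1 - k / n
    return 1 - no_collision


m = people_needed(0.8)
print("people needed (approximation):", m)
print(f"exact probability with {m - 1} people: {exact_collision(m - 1):.4f}")
print(f"exact probability with {m} people: {exact_collision(m):.4f}")
```

Checking the neighboring value `m - 1` against the exact product is a cheap way to confirm that rounding up in the approximate formula did not overshoot or undershoot the true threshold.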