
Lecture 10 on 03/02/2026 - Linear Probing and Hashing Complexity

Scribes: Xuexiong Wu

Review: Dictionary/Membership Problem by HWC and FSK

|  | Pre-processing | Query |
| --- | --- | --- |
| HWC | $O(n)$ worst case ($O(n)$ expected) | $E(Q) = 1$; $\Pr(Q > t) < e^{(-t \ln t)/3}$ (Chernoff); $Q$ is $O\!\left(\frac{\ln n}{\ln \ln n}\right)$ w.h.p., which also bounds the slowest query w.h.p. |
| FSK | $O(n)$ expected | $O(1)$ worst case |

What does Chernoff actually mean? At its core, Chernoff tells us something powerful: if a random variable is the sum of independent Bernoulli variables, then it stays close to its expectation with very high probability.

To understand this intuitively: imagine flipping many independent coins and counting heads. You expect about half to be heads, but you’re curious—how far can the actual count deviate from this expectation? Chernoff answers this: deviations become exponentially less likely the further you go from expectation. This is remarkable because for other types of random variables, deviations might be quite common.

More formally, Chernoff bounds give us precise probabilities: the probability that a sum-of-Bernoullis random variable deviates far from its expectation is very small. If you understand the expectation of such a random variable, you can confidently trust that the actual value will stay close to that expectation.
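To see this concentration concretely, here is a small simulation (illustrative; the specific numbers are chosen for this sketch, not taken from the lecture). We flip 1000 fair coins per trial and count how often the head count strays 10% or more from its expectation of 500:

```python
import random

random.seed(0)
n, trials = 1000, 2000      # 1000 fair coin flips per trial
mean = n / 2                # expected number of heads: 500

# Count trials where the head count strays 10% or more from its expectation
far = 0
for _ in range(trials):
    heads = sum(random.random() < 0.5 for _ in range(n))
    if abs(heads - mean) >= 0.1 * mean:
        far += 1

print(far / trials)         # a tiny fraction: large deviations are rare
```

Even a modest 10% deviation is already several standard deviations out, so almost no trial exhibits it, exactly the behavior Chernoff predicts.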

When can we use Chernoff? Chernoff applies only to random variables that are sums of independent Bernoulli variables. This is crucial: if your random variable has this structure, Chernoff gives you very strong (exponential) bounds. When the random variable is NOT a sum of independent Bernoullis, we can only use Markov or Chebyshev, which give weaker bounds.

Why keep Markov and Chebyshev? You might ask: if Chernoff is stronger, why do we ever use Markov or Chebyshev? The answer is that Chernoff only works when you have the specific structure of a sum of independent Bernoullis. Many random variables don’t have this structure. In those cases, Markov and Chebyshev are the only tools available, so they’re still essential.

HWC: The Pre-Processing time is $O(n)$ in the worst case. The expected Query time is $E(Q) = 1$. For the tail, $\Pr(Q > t) < e^{(-t \ln t)/3}$ by Chernoff, $\Pr(Q > t) \leq \frac{1}{t}$ by Markov, or $\Pr(Q > t) \leq \frac{1}{t^2}$ by Chebyshev.
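To compare the three tails numerically, say at $t = 10$ (a quick sketch using the bounds exactly as stated above):

```python
import math

# Tail bounds on Pr(Q > t) for an HWC query at t = 10
t = 10
markov = 1 / t                                  # 0.1
chebyshev = 1 / t**2                            # 0.01
chernoff = math.exp(-(t * math.log(t)) / 3)     # ~ 4.6e-4, far smaller

print(markov, chebyshev, chernoff)
```

Already at $t = 10$ the Chernoff bound is more than 20 times tighter than Chebyshev, and the gap widens exponentially as $t$ grows.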

FSK: The Pre-Processing time is $O(n)$ expected, and the Query time is $O(1)$ in the worst case.

Linear Probing

Let $S = \{X_1, X_2, \dots, X_n\}$ be a set of keys, and let the hash table have $m$ cells. We hash the keys one by one.

  • If the cell $h(X_i)$ is empty, we store key $X_i$ in cell $h(X_i)$.
  • If the cell $h(X_i)$ is already occupied, we walk to the right from $h(X_i)$ until we find an empty slot. Suppose we walk $t$ steps; then we store $X_i$ in cell $h(X_i) + t$.

To answer a membership query $q$:

  • Compute $h(q)$ and look at cell $h(q)$.
  • If $q$ is there, return yes.
  • If $q$ is not there, move to the right until we find $q$ or an empty slot. If $q$ is found, return yes; if an empty slot is found, return no.
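The insertion and query rules above can be sketched in a few lines (a toy sketch: the identity `h` stands in for a random hash function, and we assume the table never fills up, since insertion into a full table would loop forever):

```python
def insert(table, h, x):
    """Insert key x using linear probing: walk right from h(x) to the first empty cell."""
    i = h(x) % len(table)
    while table[i] is not None:       # cell occupied: keep walking right
        i = (i + 1) % len(table)      # wrap around at the end of the table
    table[i] = x

def query(table, h, q):
    """Return True iff q is in the table."""
    i = h(q) % len(table)
    while table[i] is not None:
        if table[i] == q:
            return True               # found q
        i = (i + 1) % len(table)
    return False                      # hit an empty slot: q is absent

# Tiny demo: m = 11 cells, identity "hash", three keys that all land in cell 3
m = 11
table = [None] * m
h = lambda x: x
for key in (3, 14, 25):
    insert(table, h, key)             # 3 -> cell 3, 14 -> cell 4, 25 -> cell 5
```

Note how the three colliding keys form a contiguous run of occupied cells, which is exactly the "block" structure analyzed below.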

Before discussing Knuth’s remarkable analysis of linear probing, it’s worth knowing who Donald Knuth is. Knuth is one of the most influential computer scientists of all time. He authored The Art of Computer Programming, a legendary multi-volume series considered the encyclopedia of algorithms and the definitive reference on many topics in computer science. These volumes contain an incredible collection of problems, some of which remain open research questions today.

Interestingly, Knuth still offers rewards (originally one hexadecimal dollar, i.e. $2.56) for anyone who finds an error in his books, a testament to their precision. Additionally, Knuth invented TeX, the typesetting system underlying the LaTeX you’re likely using to write your lecture notes!

$$E(\text{Succ. Query}) = 1 + \frac{1}{1-\alpha}, \qquad \alpha \text{ (Load Factor)} = \frac{\text{number of keys}}{\text{number of cells}} = \frac{n}{m} < 1$$

$$E(\text{Unsucc. Query}) = 1 + \frac{1}{(1-\alpha)^2}$$

Here $\frac{1}{1-\alpha}$ (or $\frac{1}{(1-\alpha)^2}$) accounts for the expected number of steps we walk to the right until we find the key or an empty cell. Therefore, as $\alpha$ increases, the expected query time increases.
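Plugging a few load factors into these formulas shows how sharply the cost grows as $\alpha \to 1$ (a quick numeric sketch):

```python
# Evaluate the expected query costs at a few load factors
for alpha in (0.2, 0.5, 0.9):
    succ = 1 + 1 / (1 - alpha)           # expected successful-query cost
    unsucc = 1 + 1 / (1 - alpha) ** 2    # expected unsuccessful-query cost
    print(f"alpha={alpha}: succ={succ:.2f}, unsucc={unsucc:.2f}")
```

At $\alpha = 0.2$ both costs are small constants, while at $\alpha = 0.9$ an unsuccessful query already costs about 101 probes in expectation.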

Assume $\alpha = \frac{1}{5}$; we will show $E(Q) = O(1)$.

To understand the behavior of linear probing, we need to introduce the concept of a block. This is the key to understanding why linear probing works well.

Intuitive idea: Imagine the hash table as a row of cells. When you insert keys, some cells become occupied and some remain empty. A block is simply a contiguous group of occupied cells sandwiched between two empty cells.

Formal definition: A sequence of $B$ consecutive cells is called a block if:

  • The cell immediately before this sequence is empty (no one hashes there)
  • The cell immediately after this sequence is empty (no one hashes there)
  • Exactly $B$ keys have hashed into this block

(Diagram omitted: a block of occupied cells bounded on both sides by empty cells.)

Why blocks matter for query time: When you do a query and your hash value lands in a block, you have to walk through the entire block to find an empty cell. Therefore, your query time is determined by the size of the block you land in. The longer the blocks, the longer your queries take.

What makes a block “bad”? We call a block of $B$ cells bad when it is completely full (all $B$ cells occupied). When you insert a new element whose hash location falls into a bad block, you must walk past all $B$ occupied cells before finding an empty slot. This is expensive.

How many keys end up in a block? Let’s think about expectations. When we have $n$ keys and $m$ cells with load factor $\alpha = \frac{n}{m}$, the expected number of keys hashing to a single cell is $\alpha$. For a block of $B$ consecutive cells, we expect:

$$E[\text{keys in block}] = \alpha B$$

Now here’s the intuition: If $\alpha < 1$ (fewer keys than cells), then we expect fewer than $B$ keys in any $B$-sized block. But what if, by bad luck, a block actually gets $B$ or more keys? That would be a significant deviation from expectation. This is exactly where Chernoff bounds help: they quantify how unlikely such “bad” scenarios are.

Let $X$ be the random variable representing the number of keys hashing to a particular block of size $B$.

What we know:

  • $E(X) = \alpha B$ (expected keys in the block)
  • A block is bad if $X \geq B$ (at least $B$ keys in the block)
  • We want to bound $\Pr(\text{block is bad}) = \Pr(X \geq B)$

Concrete example: Assume $\alpha = \frac{1}{5}$ (5 times more cells than keys). Then:

  • For a block of size $B = 10$: we expect only $\frac{10}{5} = 2$ keys in the block
  • But the block is bad if 10 keys somehow hash into it
  • We need to find the probability of this worst-case $5\times$ deviation from expectation

Setting up Chernoff: We want to apply the Chernoff bound, which comes in the following form for sums of Bernoullis:

$$\Pr(X \geq (1+\delta) E(X)) \leq e^{-\frac{1}{3}\delta^2 E(X)}$$

This tells us: the probability that $X$ exceeds its expectation by a factor of $(1+\delta)$ decays exponentially in $\delta^2 E(X)$.

Finding $\delta$ for our case: We want to find when a block is bad, which means $X \geq B$. So we need to express this in the form $(1+\delta)E(X)$ by solving:

$$(1+\delta) E(X) = B$$

Since $E(X) = \alpha B$, we substitute:

$$(1+\delta) \alpha B = B$$

Dividing both sides by $\alpha B$:

$$1 + \delta = \frac{1}{\alpha}$$

Therefore:

$$\delta = \frac{1}{\alpha} - 1$$

Concrete numbers: For $\alpha = \frac{1}{5}$ (meaning we have 5 times more cells than keys):

$$\delta = \frac{1}{1/5} - 1 = 5 - 1 = 4$$

This means a bad block requires a $5\times$ deviation from the expected number of keys.
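Carrying out this example numerically (a sketch using the Chernoff form above, with the block size $B = 10$ from the earlier concrete example):

```python
import math

alpha, B = 1/5, 10
EX = alpha * B                              # expected keys in the block: 2
delta = 1 / alpha - 1                       # 4, so B = (1 + delta) * EX
bound = math.exp(-(1/3) * delta**2 * EX)    # Chernoff bound on Pr(X >= B)
print(delta, bound)
```

So a block of 10 cells is bad with probability at most $e^{-32/3} \approx 2 \times 10^{-5}$: already vanishingly small.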

Plugging into Chernoff: Now we substitute our expression for $\delta$ into the Chernoff bound:

$$\Pr(X \geq B) = \Pr(X \geq (1+\delta)E(X)) \leq e^{-\frac{1}{3}\delta^2 E(X)}$$

Substituting $\delta = \frac{1}{\alpha} - 1$ and $E(X) = \alpha B$:

$$\Pr(X \geq B) \leq e^{-\frac{1}{3}\left(\frac{1}{\alpha}-1\right)^2 \alpha B}$$

Important simplification: Look at the exponent:

$$-\frac{1}{3}\left(\frac{1}{\alpha}-1\right)^2 \alpha B$$

Notice that the term $\frac{1}{3}\left(\frac{1}{\alpha}-1\right)^2 \alpha$ is a constant: it depends only on our choice of $\alpha$, not on $B$. Let’s call this constant $c$:

$$c = \frac{1}{3}\left(\frac{1}{\alpha}-1\right)^2 \alpha$$

Then we can rewrite our probability bound more simply as:

$$\Pr(X \geq B) \leq e^{-cB} = \left(e^{-c}\right)^B$$

Define $C = e^{-c}$. Since $c > 0$, we have $C < 1$. Therefore:

$$\Pr(X \geq B) \leq C^B$$
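For the running example $\alpha = \frac{1}{5}$, the constants work out as follows (a quick sketch):

```python
import math

alpha = 1/5
delta = 1 / alpha - 1                 # 4
c = (1/3) * delta**2 * alpha          # 16/15 ~ 1.067
C = math.exp(-c)                      # ~ 0.344 < 1
print(c, C)
```

So at load factor $\frac{1}{5}$ each extra cell of block length multiplies the bad-block probability by roughly $0.34$.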

What this means (exponential decay): The probability that a block is bad decreases exponentially with block size. This is crucial. For example, if $C = 0.7$:

  • $B=1$: probability $\leq 0.7$ (70%)
  • $B=2$: probability $\leq (0.7)^2 = 0.49$ (49%)
  • $B=5$: probability $\leq (0.7)^5 \approx 0.168$ (17%)
  • $B=10$: probability $\leq (0.7)^{10} \approx 0.028$ (2.8%)

You can see how rapidly the probabilities shrink. This exponential decay is exactly what we need: it ensures that long blocks are very rare, which means our expected insertion time stays constant.

Expected insertion time: The insertion (and query) time is proportional to the length of the bad block containing the hash location. Thus:

$$E[\text{insertion time}] = E[\text{length of bad block containing the query}]$$

Computing Expected Insertion Time via Series Summation


Now let’s compute the expected insertion time. When you hash a query to position $h(q)$, you might collide with blocks of various sizes. How many blocks of size $B$ could contain your hash position?

Blocks containing a position: A hash position $h(q)$ can be contained in up to $B$ different blocks of size $B$ (one where $h(q)$ is the rightmost position, one where it’s second from the right, …, one where it’s the leftmost position). So there are $B$ possible $B$-sized blocks that could contain any given hash position.

(Diagram omitted: the $B$ possible size-$B$ blocks containing position $h(q)$.)

Expected insertion time formula:

$$E[\text{insertion time}] = \sum_{B=1}^{M} B \times \Pr(\text{at least one block of size } B \text{ is bad})$$

By overestimating (even if all $B$ blocks of size $B$ are bad), we get:

$$E[\text{insertion time}] \leq \sum_{B=1}^{M} B \times B \times \Pr(\text{a specific block of size } B \text{ is bad})$$

Substituting our Chernoff bound:

$$E[\text{insertion time}] \leq \sum_{B=1}^{M} B^2 \times C^B$$

Why Chernoff is Essential: Comparison with Markov


Now let’s see why we needed Chernoff and why simpler inequalities don’t work.

What if we used Markov? Recall that Markov’s inequality says:

$$\Pr(X \geq B) \leq \frac{E[X]}{B}$$

In our case, with $E[X] = \alpha B$:

$$\Pr(X \geq B) \leq \frac{\alpha B}{B} = \alpha$$

Notice something troubling: this bound doesn’t depend on $B$ at all! Whether we’re looking at blocks of size 1 or size 1000, Markov gives us the same bound $\alpha$. This is too weak.

Why this breaks our analysis: If we tried to compute expected insertion time using Markov’s bound, we’d get:

$$E[\text{insertion time}] \leq \sum_{B=1}^{M} B^2 \times \Pr(\text{a block of size } B \text{ is bad}) \leq \sum_{B=1}^{M} B^2 \times \alpha = \alpha \sum_{B=1}^{M} B^2$$

The sum $\sum_{B=1}^{M} B^2$ grows as $O(M^3)$. Since $M = O(n)$, we’d conclude that expected insertion time is $O(n^3)$, a terrible result!
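A quick numeric contrast of the two bounds (a sketch; $C = 0.35$ approximates $e^{-c}$ for $\alpha = \frac{1}{5}$, rounded):

```python
alpha, C = 0.2, 0.35   # C ~ e^{-c} for alpha = 1/5 (rounded)

for M in (10, 100, 1000):
    markov_sum = alpha * sum(B * B for B in range(1, M + 1))   # grows ~ M^3
    chernoff_sum = sum(B * B * C**B for B in range(1, M + 1))  # converges
    print(M, markov_sum, chernoff_sum)
```

The Markov-based sum explodes with the table size, while the Chernoff-based sum is already stable by $M = 10$ and stays below a small constant forever.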

Why Chernoff succeeds: Chernoff gives us the exponential bound $C^B$ where $C < 1$. The exponential decay dominates the polynomial growth of $B^2$, making the sum converge to a constant. This is the power of Chernoff: it exploits the specific structure of sums of independent Bernoullis, while Markov only uses the expectation.

Series Convergence: Showing the Sum is O(1)


Now we need to compute the sum of all our probabilities over all possible block sizes:

$$E[\text{insertion time}] \leq \sum_{B=1}^{M} B^2 C^B$$

The question is: does this sum stay bounded as MM grows? Or does it blow up?

Intuition—Exponential beats polynomial: This is where the magic happens. We have:

  • $B^2$ grows polynomially (slowly)
  • $C^B$ decays exponentially (very fast)

When exponential decay competes with polynomial growth, exponential always wins.

Making this rigorous: For large enough $B$, the term $C^B$ becomes incredibly small, eventually smaller than $\frac{1}{B^4}$ (the exact threshold depends on $C$; for concreteness, suppose this holds from $B = 4$ on). Then $B^2 C^B \leq \frac{1}{B^2}$ for those $B$, so:

$$\sum_{B=1}^{\infty} B^2 C^B = \underbrace{\sum_{B=1}^{3} B^2 C^B}_{\text{finitely many terms}} + \underbrace{\sum_{B=4}^{\infty} B^2 C^B}_{\text{bounded by } \sum_{B=4}^{\infty} \frac{1}{B^2}}$$

The first sum has just a few terms, so it’s finite. The second sum is bounded by the convergent series:

$$\sum_{B=4}^{\infty} \frac{1}{B^2} \leq \sum_{B=1}^{\infty} \frac{1}{B^2} = \frac{\pi^2}{6} \approx 1.64$$
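A one-liner can confirm this convergent bound numerically (sketch):

```python
import math

# Partial Basel sum: sum of 1/B^2 approaches pi^2 / 6 ~ 1.6449
s = sum(1 / (B * B) for B in range(1, 100_001))
print(s, math.pi**2 / 6)
```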

Conclusion: The entire series $\sum_{B=1}^{M} B^2 C^B$ is bounded by some constant (which may depend on $\alpha$, but not on $M$ or $n$). Therefore:

$$E[\text{insertion time}] = O(1) \text{ (constant!)}$$

This completes the argument. With the right load factor (like $\alpha = \frac{1}{5}$), linear probing achieves constant expected insertion and query time.

The complete analysis can be summarized as follows:

  1. Long runs are very improbable: If you have enough cells compared to your keys (load factor $\alpha < 1$), then finding long contiguous runs of packed elements should be rare.

  2. They become more improbable as they get longer: The probability that a block is bad decreases exponentially with its length ($C^B$ where $C < 1$). This exponential decay dominates the polynomial growth of block sizes, making the expected insertion time sum to a constant.

Together, these two observations guarantee that linear probing, despite being simple to implement, achieves constant expected insertion and query time when the load factor is kept as a small constant (like $\frac{1}{5}$).