
Lecture 10 on 03/02/2026 - Linear Probing and Hashing Complexity

Scribes: Xuexiong Wu

Review: Dictionary/Membership Problem by HWC and FSK

|  | Pre-processing | Query |
| --- | --- | --- |
| HWC | $O(n)$ worst case ($O(n)$ expected) | $E(Q) = 1$; $\Pr(Q > t) < e^{(-t \ln t)/3}$ (Chernoff); $Q$ is $O\!\left(\frac{\ln n}{\ln \ln n}\right)$ w.h.p., which also bounds the slowest query w.h.p. |
| FSK | $O(n)$ expected | $O(1)$ worst case |

What does Chernoff actually mean? At its core, Chernoff tells us something powerful: if a random variable is the sum of independent Bernoulli variables, then it stays close to its expectation with very high probability.

To understand this intuitively: imagine flipping many independent coins and counting heads. You expect about half to be heads, but you’re curious—how far can the actual count deviate from this expectation? Chernoff answers this: deviations become exponentially less likely the further you go from expectation. This is remarkable because for other types of random variables, deviations might be quite common.

More formally, Chernoff bounds give us precise probabilities: the probability that a sum-of-Bernoullis random variable deviates far from its expectation is very small. If you understand the expectation of such a random variable, you can confidently trust that the actual value will stay close to that expectation.
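To see this concentration concretely, here is a small simulation (illustrative; the specific numbers are chosen for this sketch, not taken from the lecture). We flip 1000 fair coins per trial and count how often the head count strays 10% or more from its expectation of 500:

```python
import random

random.seed(0)
n, trials = 1000, 2000      # 1000 fair coin flips per trial
mean = n / 2                # expected number of heads: 500

# Count trials where the head count strays 10% or more from its expectation
far = 0
for _ in range(trials):
    heads = sum(random.random() < 0.5 for _ in range(n))
    if abs(heads - mean) >= 0.1 * mean:
        far += 1

print(far / trials)         # a tiny fraction: large deviations are rare
```

Even a modest 10% deviation is already several standard deviations out, so almost no trial exhibits it, exactly the behavior Chernoff predicts.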

When can we use Chernoff? Chernoff applies only to random variables that are sums of independent Bernoulli variables. This is crucial: if your random variable has this structure, Chernoff gives you very strong (exponential) bounds. When the random variable is NOT a sum of independent Bernoullis, we can only use Markov or Chebyshev, which give weaker bounds.

Why keep Markov and Chebyshev? You might ask: if Chernoff is stronger, why do we ever use Markov or Chebyshev? The answer is that Chernoff only works when you have the specific structure of a sum of independent Bernoullis. Many random variables don’t have this structure. In those cases, Markov and Chebyshev are the only tools available, so they’re still essential.

HWC: The Pre-Processing time is $O(n)$ in the worst case. The expected Query time is $E(Q) = 1$. For the tail, $\Pr(Q > t) < e^{(-t \ln t)/3}$ by Chernoff, $\Pr(Q > t) \leq \frac{1}{t}$ by Markov, or $\Pr(Q > t) \leq \frac{1}{t^2}$ by Chebyshev.
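To compare the three tails numerically, say at $t = 10$ (a quick sketch using the bounds exactly as stated above):

```python
import math

# Tail bounds on Pr(Q > t) for an HWC query at t = 10
t = 10
markov = 1 / t                                  # 0.1
chebyshev = 1 / t**2                            # 0.01
chernoff = math.exp(-(t * math.log(t)) / 3)     # ~ 4.6e-4, far smaller

print(markov, chebyshev, chernoff)
```

Already at $t = 10$ the Chernoff bound is more than 20 times tighter than Chebyshev, and the gap widens exponentially as $t$ grows.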

FSK: The Pre-Processing time is $O(n)$ expected, and the Query time is $O(1)$ in the worst case.

Linear Probing

Let $S = \{X_1, X_2, \dots, X_n\}$ be a set of keys, and let the hash table have $m$ cells. We hash the keys one by one.

  • If the cell $h(X_i)$ is empty, we store key $X_i$ in cell $h(X_i)$.
  • If the cell $h(X_i)$ is already occupied, we walk to the right from $h(X_i)$ until we find an empty slot. Suppose we walk $t$ steps; then we store $X_i$ in cell $h(X_i) + t$.

To answer a membership query $q$:

  • Compute $h(q)$ and look at cell $h(q)$.
  • If $q$ is there, return yes.
  • If $q$ is not there, move to the right until we find $q$ or an empty slot. If $q$ is found, return yes; if an empty slot is found, return no.
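The insertion and query rules above can be sketched in a few lines (a toy sketch: the identity `h` stands in for a random hash function, and we assume the table never fills up, since insertion into a full table would loop forever):

```python
def insert(table, h, x):
    """Insert key x using linear probing: walk right from h(x) to the first empty cell."""
    i = h(x) % len(table)
    while table[i] is not None:       # cell occupied: keep walking right
        i = (i + 1) % len(table)      # wrap around at the end of the table
    table[i] = x

def query(table, h, q):
    """Return True iff q is in the table."""
    i = h(q) % len(table)
    while table[i] is not None:
        if table[i] == q:
            return True               # found q
        i = (i + 1) % len(table)
    return False                      # hit an empty slot: q is absent

# Tiny demo: m = 11 cells, identity "hash", three keys that all land in cell 3
m = 11
table = [None] * m
h = lambda x: x
for key in (3, 14, 25):
    insert(table, h, key)             # 3 -> cell 3, 14 -> cell 4, 25 -> cell 5
```

Note how the three colliding keys form a contiguous run of occupied cells, which is exactly the "block" structure analyzed below.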

Before discussing Knuth’s remarkable analysis of linear probing, it’s worth knowing who Donald Knuth is. Knuth is one of the most influential computer scientists of all time. He authored The Art of Computer Programming, a legendary multi-volume series considered the encyclopedia of algorithms and the definitive reference on many topics in computer science. These volumes contain an incredible collection of problems, some of which remain open research questions today.

Interestingly, Knuth still offers rewards (originally one hexadecimal dollar, i.e. $2.56) for anyone who finds an error in his books, a testament to their precision. Additionally, Knuth invented TeX, the typesetting system underlying the LaTeX you’re likely using to write your lecture notes!

$$E(\text{Succ. Query}) = 1 + \frac{1}{1-\alpha}, \qquad \alpha \text{ (Load Factor)} = \frac{\text{number of keys}}{\text{number of cells}} = \frac{n}{m} < 1$$

$$E(\text{Unsucc. Query}) = 1 + \frac{1}{(1-\alpha)^2}$$

Here $\frac{1}{1-\alpha}$ (or $\frac{1}{(1-\alpha)^2}$) accounts for the expected number of steps we walk to the right until we find the key or an empty cell. Therefore, as $\alpha$ increases, the expected query time increases.
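Plugging a few load factors into these formulas shows how sharply the cost grows as $\alpha \to 1$ (a quick numeric sketch):

```python
# Evaluate the expected query costs at a few load factors
for alpha in (0.2, 0.5, 0.9):
    succ = 1 + 1 / (1 - alpha)           # expected successful-query cost
    unsucc = 1 + 1 / (1 - alpha) ** 2    # expected unsuccessful-query cost
    print(f"alpha={alpha}: succ={succ:.2f}, unsucc={unsucc:.2f}")
```

At $\alpha = 0.2$ both costs are small constants, while at $\alpha = 0.9$ an unsuccessful query already costs about 101 probes in expectation.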

Assume $\alpha = \frac{1}{5}$; we will show $E(Q) = O(1)$.

To understand the behavior of linear probing, we need to introduce the concept of a block. This is the key to understanding why linear probing works well.

Intuitive idea: Imagine the hash table as a row of cells. When you insert keys, some cells become occupied and some remain empty. A block is simply a contiguous group of occupied cells sandwiched between two empty cells.

Formal definition: A sequence of $B$ consecutive cells is called a block if:

  • The cell immediately before this sequence is empty (no one hashes there)
  • The cell immediately after this sequence is empty (no one hashes there)
  • Exactly $B$ keys have hashed into this block

(Diagram omitted: a block of occupied cells bounded on both sides by empty cells.)

Why blocks matter for query time: When you do a query and your hash value lands in a block, you have to walk through the entire block to find an empty cell. Therefore, your query time is determined by the size of the block you land in. The longer the blocks, the longer your queries take.

What makes a block “bad”? We call a block of $B$ cells bad when it is completely full (all $B$ cells occupied). When you insert a new element whose hash location falls into a bad block, you must walk past all $B$ occupied cells before finding an empty slot. This is expensive.

How many keys end up in a block? Let’s think about expectations. When we have $n$ keys and $m$ cells with load factor $\alpha = \frac{n}{m}$, the expected number of keys hashing to a single cell is $\alpha$. For a block of $B$ consecutive cells, we expect:

$$E[\text{keys in block}] = \alpha B$$

Now here’s the intuition: If $\alpha < 1$ (fewer keys than cells), then we expect fewer than $B$ keys in any $B$-sized block. But what if, by bad luck, a block actually gets $B$ or more keys? That would be a significant deviation from expectation. This is exactly where Chernoff bounds help: they quantify how unlikely such “bad” scenarios are.

Let $X$ be the random variable representing the number of keys hashing to a particular block of size $B$.

What we know:

  • $E(X) = \alpha B$ (expected keys in the block)
  • A block is bad if $X \geq B$ (at least $B$ keys in the block)
  • We want to bound $\Pr(\text{block is bad}) = \Pr(X \geq B)$

Concrete example: Assume $\alpha = \frac{1}{5}$ (5 times more cells than keys). Then:

  • For a block of size $B = 10$: we expect only $\frac{10}{5} = 2$ keys in the block
  • But the block is bad if 10 keys somehow hash into it
  • We need to find the probability of this worst-case $5\times$ deviation from expectation

Setting up Chernoff: We want to apply the Chernoff bound, which comes in the following form for sums of Bernoullis:

$$\Pr(X \geq (1+\delta) E(X)) \leq e^{-\frac{1}{3}\delta^2 E(X)}$$

This tells us: the probability that $X$ exceeds its expectation by a factor of $(1+\delta)$ decays exponentially in $\delta^2 E(X)$.

Finding $\delta$ for our case: We want to find when a block is bad, which means $X \geq B$. So we need to express this in the form $(1+\delta)E(X)$ by solving:

$$(1+\delta) E(X) = B$$

Since $E(X) = \alpha B$, we substitute:

$$(1+\delta) \alpha B = B$$

Dividing both sides by $\alpha B$:

$$1 + \delta = \frac{1}{\alpha}$$

Therefore:

$$\delta = \frac{1}{\alpha} - 1$$

Concrete numbers: For $\alpha = \frac{1}{5}$ (meaning we have 5 times more cells than keys):

$$\delta = \frac{1}{1/5} - 1 = 5 - 1 = 4$$

This means a bad block requires a $5\times$ deviation from the expected number of keys.
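Carrying out this example numerically (a sketch using the Chernoff form above, with the block size $B = 10$ from the earlier concrete example):

```python
import math

alpha, B = 1/5, 10
EX = alpha * B                              # expected keys in the block: 2
delta = 1 / alpha - 1                       # 4, so B = (1 + delta) * EX
bound = math.exp(-(1/3) * delta**2 * EX)    # Chernoff bound on Pr(X >= B)
print(delta, bound)
```

So a block of 10 cells is bad with probability at most $e^{-32/3} \approx 2 \times 10^{-5}$: already vanishingly small.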

Plugging into Chernoff: Now we substitute our expression for $\delta$ into the Chernoff bound:

$$\Pr(X \geq B) = \Pr(X \geq (1+\delta)E(X)) \leq e^{-\frac{1}{3}\delta^2 E(X)}$$

Substituting $\delta = \frac{1}{\alpha} - 1$ and $E(X) = \alpha B$:

$$\Pr(X \geq B) \leq e^{-\frac{1}{3}\left(\frac{1}{\alpha}-1\right)^2 \alpha B}$$

Important simplification: Look at the exponent:

$$-\frac{1}{3}\left(\frac{1}{\alpha}-1\right)^2 \alpha B$$

Notice that the term $\frac{1}{3}\left(\frac{1}{\alpha}-1\right)^2 \alpha$ is a constant: it depends only on our choice of $\alpha$, not on $B$. Let’s call this constant $c$:

$$c = \frac{1}{3}\left(\frac{1}{\alpha}-1\right)^2 \alpha$$

Then we can rewrite our probability bound more simply as:

$$\Pr(X \geq B) \leq e^{-cB} = \left(e^{-c}\right)^B$$

Define $C = e^{-c}$. Since $c > 0$, we have $C < 1$. Therefore:

$$\Pr(X \geq B) \leq C^B$$
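For the running example $\alpha = \frac{1}{5}$, the constants work out as follows (a quick sketch):

```python
import math

alpha = 1/5
delta = 1 / alpha - 1                 # 4
c = (1/3) * delta**2 * alpha          # 16/15 ~ 1.067
C = math.exp(-c)                      # ~ 0.344 < 1
print(c, C)
```

So at load factor $\frac{1}{5}$ each extra cell of block length multiplies the bad-block probability by roughly $0.34$.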

What this means (exponential decay): The probability that a block is bad decreases exponentially with block size. This is crucial. For example, if $C = 0.7$:

  • $B=1$: probability $\leq 0.7$ (70%)
  • $B=2$: probability $\leq (0.7)^2 = 0.49$ (49%)
  • $B=5$: probability $\leq (0.7)^5 \approx 0.168$ (17%)
  • $B=10$: probability $\leq (0.7)^{10} \approx 0.028$ (2.8%)

You can see how rapidly the probabilities shrink. This exponential decay is exactly what we need: it ensures that long blocks are very rare, which means our expected insertion time stays constant.

Expected insertion time: The insertion (and query) time is proportional to the length of the bad block containing the hash location. Thus:

$$E[\text{insertion time}] = E[\text{length of bad block containing the query}]$$

Computing Expected Insertion Time via Series Summation


Now let’s compute the expected insertion time. When you hash a query to position $h(q)$, you might collide with blocks of various sizes. How many blocks of size $B$ could contain your hash position?

Blocks containing a position: A hash position $h(q)$ can be contained in up to $B$ different blocks of size $B$ (one where $h(q)$ is the rightmost position, one where it’s second from the right, …, one where it’s the leftmost position). So there are $B$ possible $B$-sized blocks that could contain any given hash position.

(Diagram omitted: the $B$ possible size-$B$ blocks containing position $h(q)$.)

Expected insertion time formula:

$$E[\text{insertion time}] = \sum_{B=1}^{M} B \times \Pr(\text{at least one block of size } B \text{ is bad})$$

By overestimating (even if all $B$ blocks of size $B$ are bad), we get:

$$E[\text{insertion time}] \leq \sum_{B=1}^{M} B \times B \times \Pr(\text{a specific block of size } B \text{ is bad})$$

Substituting our Chernoff bound:

$$E[\text{insertion time}] \leq \sum_{B=1}^{M} B^2 \times C^B$$

Why Chernoff is Essential: Comparison with Markov


Now let’s see why we needed Chernoff and why simpler inequalities don’t work.

What if we used Markov? Recall that Markov’s inequality says:

$$\Pr(X \geq B) \leq \frac{E[X]}{B}$$

In our case, with $E[X] = \alpha B$:

$$\Pr(X \geq B) \leq \frac{\alpha B}{B} = \alpha$$

Notice something troubling: this bound doesn’t depend on $B$ at all! Whether we’re looking at blocks of size 1 or size 1000, Markov gives us the same bound $\alpha$. This is too weak.

Why this breaks our analysis: If we tried to compute expected insertion time using Markov’s bound, we’d get:

$$E[\text{insertion time}] \leq \sum_{B=1}^{M} B^2 \times \Pr(\text{a block of size } B \text{ is bad}) \leq \sum_{B=1}^{M} B^2 \times \alpha = \alpha \sum_{B=1}^{M} B^2$$

The sum $\sum_{B=1}^{M} B^2$ grows as $O(M^3)$. Since $M = O(n)$, we’d conclude that expected insertion time is $O(n^3)$, a terrible result!
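A quick numeric contrast of the two bounds (a sketch; $C = 0.35$ approximates $e^{-c}$ for $\alpha = \frac{1}{5}$, rounded):

```python
alpha, C = 0.2, 0.35   # C ~ e^{-c} for alpha = 1/5 (rounded)

for M in (10, 100, 1000):
    markov_sum = alpha * sum(B * B for B in range(1, M + 1))   # grows ~ M^3
    chernoff_sum = sum(B * B * C**B for B in range(1, M + 1))  # converges
    print(M, markov_sum, chernoff_sum)
```

The Markov-based sum explodes with the table size, while the Chernoff-based sum is already stable by $M = 10$ and stays below a small constant forever.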

Why Chernoff succeeds: Chernoff gives us the exponential bound $C^B$ where $C < 1$. The exponential decay dominates the polynomial growth of $B^2$, making the sum converge to a constant. This is the power of Chernoff: it exploits the specific structure of sums of independent Bernoullis, while Markov only uses the expectation.

Series Convergence: Showing the Sum is O(1)


Now we need to compute the sum of all our probabilities over all possible block sizes:

$$E[\text{insertion time}] \leq \sum_{B=1}^{M} B^2 C^B$$

The question is: does this sum stay bounded as MM grows? Or does it blow up?

Intuition—Exponential beats polynomial: This is where the magic happens. We have:

  • $B^2$ grows polynomially (slowly)
  • $C^B$ decays exponentially (very fast)

When exponential decay competes with polynomial growth, exponential always wins.

Making this rigorous: For large enough $B$, the term $C^B$ becomes incredibly small, eventually smaller than $\frac{1}{B^4}$ (the exact threshold depends on $C$; for concreteness, suppose this holds from $B = 4$ on). Then $B^2 C^B \leq \frac{1}{B^2}$ for those $B$, so:

$$\sum_{B=1}^{\infty} B^2 C^B = \underbrace{\sum_{B=1}^{3} B^2 C^B}_{\text{finitely many terms}} + \underbrace{\sum_{B=4}^{\infty} B^2 C^B}_{\text{bounded by } \sum_{B=4}^{\infty} \frac{1}{B^2}}$$

The first sum has just a few terms, so it’s finite. The second sum is bounded by the convergent series:

$$\sum_{B=4}^{\infty} \frac{1}{B^2} \leq \sum_{B=1}^{\infty} \frac{1}{B^2} = \frac{\pi^2}{6} \approx 1.64$$
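A one-liner can confirm this convergent bound numerically (sketch):

```python
import math

# Partial Basel sum: sum of 1/B^2 approaches pi^2 / 6 ~ 1.6449
s = sum(1 / (B * B) for B in range(1, 100_001))
print(s, math.pi**2 / 6)
```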

Conclusion: The entire series $\sum_{B=1}^{M} B^2 C^B$ is bounded by some constant (which may depend on $\alpha$, but not on $M$ or $n$). Therefore:

$$E[\text{insertion time}] = O(1) \text{ (constant!)}$$

This completes the argument. With the right load factor (like $\alpha = \frac{1}{5}$), linear probing achieves constant expected insertion and query time.

The complete analysis can be summarized as follows:

  1. Long runs are very improbable: If you have enough cells compared to your keys (load factor $\alpha < 1$), then finding long contiguous runs of packed elements should be rare.

  2. They become more improbable as they get longer: The probability that a block is bad decreases exponentially with its length ($C^B$ where $C < 1$). This exponential decay dominates the polynomial growth of block sizes, making the expected insertion time sum to a constant.

Together, these two observations guarantee that linear probing, despite being simple to implement, achieves constant expected insertion and query time when the load factor is kept as a small constant (like $\frac{1}{5}$).