
Lecture 5 on 02/09/2026 - FKS Hashing and Tail Bounds

Scribes: Carlos Aucacama and Mohammed Zaid

  • Hashing with Chaining has expected constant query time.
  • Using stronger inequalities gives dramatically better tail bounds.
  • Recognizing when a random variable is a sum of independent Bernoullis is powerful.

Professor Goswami began by discussing the three guarantees of hashing with chaining:

1. Expected Query Time

Let Q denote the query time. Then

E[Q] = \frac{n}{M}

If M = \Theta(n), then

E[Q] = O(1)

2. Markov Inequality Guarantee

With M = n (so E[Q] = 1), Markov's inequality gives

\Pr(Q > T) \le \frac{E[Q]}{T} = \frac{1}{T}

3. Chebyshev Inequality Guarantee

Assuming M = n, we have

\mathrm{Var}(Q) \le 1

Applying Chebyshev:

\Pr(Q > T) \le \frac{1}{T^2}
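As a sanity check, the expected-query-time guarantee can be simulated. This is a sketch: the sizes `n`, `M`, and the trial count are illustrative choices, not from the lecture.

```python
import random

# Simulate hashing with chaining under a perfectly random hash:
# each stored key independently picks a uniform bucket.
n = M = 500           # n keys, M buckets, so E[Q] = n/M = 1
trials = 2000

total = 0
for _ in range(trials):
    query_bucket = random.randrange(M)
    # Q = number of the n stored keys that land in the query's bucket.
    Q = sum(1 for _ in range(n) if random.randrange(M) == query_bucket)
    total += Q

print(total / trials)  # empirical E[Q]; should be close to n/M = 1
```

The empirical mean stays near 1 regardless of how large n grows, as long as M grows with it.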

Variance Analysis of Query Time in Hashing with Chaining


Query Time as a Sum of Bernoulli Variables


Let Q denote the query time. We write

Q = \sum_{i=1}^{n} X_i

where

X_i = \begin{cases} 1 & \text{if the $i$-th key hashes to the same bucket as the query,} \\ 0 & \text{otherwise.} \end{cases}

Thus, Q counts the number of keys that hash to the same bucket as the query.

  • The chance that the i-th key hashes to the same bucket as the query is \frac{1}{M}.
  • Since h is perfectly random, the X_i are independent, so the variance of Q is the sum of the variances of the X_i.
  • The variance of a Bernoulli is p(1-p), where p = \frac{1}{M}. Summing over the n variables gives

\mathrm{Var}(Q) = n \cdot \frac{1}{M} \left(1 - \frac{1}{M}\right) = np(1-p)

Since 1 - \frac{1}{M} \le 1, replacing it by 1 gives an upper bound:

\mathrm{Var}(Q) \le \frac{n}{M}

If M = n, then

\mathrm{Var}(Q) \le 1

We compute the variance of Q in order to apply Chebyshev’s inequality: Markov’s inequality only requires the expectation, but Chebyshev requires both the expectation and the variance. Assuming M = n, we have

E[Q] = 1 \quad \text{and} \quad \mathrm{Var}(Q) \le 1

Applying Chebyshev:

\Pr(|Q - 1| > t) \le \frac{1}{t^2}

Since Q > T implies |Q - 1| > T - 1, setting t = T - 1 gives

\Pr(Q > T) \le \frac{1}{(T-1)^2} \approx \frac{1}{T^2}

This is much stronger than the Markov bound,

\Pr(Q > T) \le \frac{1}{T}

For example, when T=50T = 50, Markov gives approximately 2%, while Chebyshev gives 0.04%. There is no contradiction: Chebyshev uses more information (variance), so it gives a tighter bound.

Tail Bounds for Random Variables in Hashing with Chaining

Markov:

\Pr(X \ge t) \le \frac{E[X]}{t}, \quad X \ge 0

  • Applies to any nonnegative random variable.
  • Only requires the expectation E[X].

Chebyshev:

\Pr(|X - E[X]| \ge t) \le \frac{\mathrm{Var}(X)}{t^2}

  • Applies to any random variable.
  • Requires both the expectation E[X] and the variance \mathrm{Var}(X).

Chernoff:

Q = \sum_{i=1}^{n} X_i, \quad X_i \sim \text{Bernoulli}(p), \text{ independent}

\Pr(Q > t) \le \frac{1}{e^t}, \quad e \approx 2.718

  • Only applies to sums of independent Bernoulli random variables (here with E[Q] = 1).
  • Gives a much tighter bound than Markov or Chebyshev for large deviations.

Comparison for Large Deviations (e.g., t = 50)

\Pr(Q > 50) \le \begin{cases} \text{Markov: } \frac{1}{50} = 0.02 \\ \text{Chebyshev: } \frac{1}{50^2} = 0.0004 \\ \text{Chernoff: } \frac{1}{2^{50}} \approx 0 \end{cases}
  • Tail bounds estimate the probability of extreme events (tails) when exact probabilities are difficult to compute.
  • The choice of bound depends on the type of random variable:
    • Markov: expectation only
    • Chebyshev: expectation + variance
    • Chernoff: sum of independent Bernoullis
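The three bounds at t = 50 can be evaluated directly; the snippet below just computes the same numbers as the case analysis above.

```python
# Evaluate the three tail bounds at t = 50.
t = 50
markov = 1 / t          # needs expectation only
chebyshev = 1 / t**2    # needs expectation + variance
chernoff = 1 / 2**t     # needs a sum of independent Bernoullis
print(markov, chebyshev, chernoff)  # 0.02, 0.0004, ~8.9e-16
```

The Chernoff value is smaller than machine epsilon, which is why the notes write it as approximately zero.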

Binary search has query time

O(\log n)

which increases as nn increases.

Hashing with chaining improves this:

  • Preprocessing time: O(n) (always)
  • Query time: O(1) in expectation

However, the worst-case query time is not constant because collisions may occur.

We now design a hashing scheme with:

\text{Preprocessing time: } O(n) \text{ in expectation}

\text{Query time: } O(1) \text{ worst case (always)}

This scheme is called FKS hashing (Fredman–Komlós–Szemerédi). The randomness is moved entirely into preprocessing.

FKS Hashing Two-Level Hashing Construction

Section titled “FKS Hashing Two-Level Hashing Construction”

Let S = \{x_1, \dots, x_n\} be the set of keys.

Pick a perfectly random hash function

h : U \to \{1,2,\dots,n\}

Hash all keys into nn buckets.

For each bucket i, define

b_i = \text{number of keys mapped to bucket } i

The bucket sizes sum to n:

\sum_{i=1}^{n} b_i = n

Compute the sum of squared bucket sizes:

\sum_{i=1}^{n} b_i^2

If the sum of squares is greater than 4n, that is,

\sum_{i=1}^{n} b_i^2 > 4n

discard the hash function h and choose a new one. Repeat until the sum of squares is at most 4n:

\sum_{i=1}^{n} b_i^2 \le 4n

Since E\left[\sum_{i=1}^{n} b_i^2\right] \le 2n for a perfectly random h, Markov's inequality gives \Pr\left(\sum_i b_i^2 > 4n\right) \le \frac{1}{2}, so the expected number of retries is at most 2 and Step 1 takes O(n) expected time.

This completes Step 1.
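Step 1 can be sketched as follows. This is an idealized sketch: the "perfectly random" h is modeled as a table of uniform bucket choices, and the helper name `pick_first_level` and the key set are illustrative, not from the lecture.

```python
import random

def pick_first_level(keys):
    """Retry a random h until the sum of squared bucket sizes is <= 4n."""
    n = len(keys)
    while True:
        # Model a perfectly random h : U -> {0, ..., n-1} as a lookup table.
        h = {x: random.randrange(n) for x in keys}
        sizes = [0] * n
        for x in keys:
            sizes[h[x]] += 1
        if sum(b * b for b in sizes) <= 4 * n:
            return h, sizes          # accept h
        # otherwise discard h and retry; each trial succeeds w.p. >= 1/2

keys = list(range(100))              # illustrative key set
h, sizes = pick_first_level(keys)
```

Because each trial succeeds with probability at least 1/2, the loop runs a constant expected number of times.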

For each bucket i:

  • There are b_i keys in bucket i.
  • Allocate a second-level table of size 2b_i^2 for each bucket.
  • Choose a random hash function g_i mapping those b_i keys into this table.
  • If any collision occurs, discard g_i and choose another hash function.

Repeat until all b_i keys map to distinct cells. (With a table of size 2b_i^2, the expected number of colliding pairs is \binom{b_i}{2} \cdot \frac{1}{2b_i^2} < \frac{1}{4}, so each trial succeeds with probability at least \frac{3}{4}.)

Thus, every second-level table satisfies:

  • Each cell contains at most one key.
  • No collisions occur.

This completes preprocessing.
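Step 2, for a single bucket, might look like the sketch below; g_i is again modeled as a random lookup table, and the helper name `build_second_level` is mine, not from the lecture.

```python
import random

def build_second_level(bucket_keys):
    """Retry a random g_i until the b_i keys occupy distinct cells
    of a table of size 2 * b_i^2."""
    b = len(bucket_keys)
    size = max(1, 2 * b * b)
    while True:
        # Model a random g_i as a lookup table into {0, ..., size-1}.
        g = {x: random.randrange(size) for x in bucket_keys}
        table = [None] * size
        collision = False
        for x in bucket_keys:
            if table[g[x]] is not None:  # collision: discard g_i, retry
                collision = True
                break
            table[g[x]] = x
        if not collision:
            return g, table

g, table = build_second_level([3, 7, 11])  # illustrative bucket contents
```

Each trial succeeds with probability at least 3/4, so this also finishes in a constant expected number of retries per bucket.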

First-level table uses:

n \text{ cells}

Second-level tables use:

\sum_{i=1}^{n} 2 b_i^2 = 2 \sum_{i=1}^{n} b_i^2

From Step 1:

\sum_{i=1}^{n} b_i^2 \le 4n

Therefore:

2 \sum_{i=1}^{n} b_i^2 \le 8n

Total space:

n + 8n = 9n

Thus the total space is

O(n)

Given a query key q:

  • Compute i = h(q).
  • Compute j = g_i(q).
  • Inspect the cell in bucket i at position j.

Since second-level tables contain no collisions:

  • If the cell contains q, return YES.
  • Otherwise, return NO.
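The query procedure can be sketched as follows, over a tiny hand-built instance. The toy hashes and table below are illustrative only and were not produced by the real preprocessing.

```python
def fks_query(q, h, g, tables):
    i = h(q)                  # first-level bucket
    j = g[i](q)               # cell in bucket i's second-level table
    return tables[i][j] == q  # no collisions, so one lookup decides

# Hand-built instance: one bucket whose second-level table stores the key 7.
tables = [[None] * 8]
tables[0][7] = 7
h = lambda q: 0               # toy first-level hash: everything to bucket 0
g = [lambda q: q % 8]         # toy second-level hash for bucket 0

print(fks_query(7, h, g, tables))  # True
print(fks_query(3, h, g, tables))  # False
```

Whatever the query key, exactly two hash evaluations and one cell inspection happen, which is the worst-case O(1) guarantee.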

The query performs:

  • One evaluation of h
  • One evaluation of g_i
  • One table lookup

Therefore:

\text{Query time} = O(1) \text{ worst case}

In summary:

\text{Preprocessing time: } O(n) \text{ in expectation}

\text{Space: } O(n)

\text{Query time: } O(1) \text{ worst case}