
Lecture 02/09/2026 - FKS Hashing and Tail Bounds

Scribes: Carlos Aucacama and Mohammed Zaid

  • Hashing with Chaining has expected constant query time.
  • Using stronger inequalities gives dramatically better tail bounds.
  • Recognizing when a random variable is a sum of independent Bernoullis is powerful.

Professor Goswami began by discussing the three guarantees of hashing with chaining:

1. Expected Query Time

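Let $T$ denote the query time, measured as the number of keys in the query's bucket when $n$ keys are hashed into $m$ buckets. Then

$$\mathbb{E}[T] = \frac{n}{m}.$$

If $m = n$, then $\mathbb{E}[T] = 1$, so the expected query time is constant.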

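2. Markov Inequality Guarantee

Since $T \ge 0$, for any $t > 0$:

$$\Pr[T \ge t] \le \frac{\mathbb{E}[T]}{t} = \frac{n}{mt},$$

which equals $1/t$ when $m = n$.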

3. Chebyshev Inequality Guarantee

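Assuming $m = n$, we have $\mathbb{E}[T] = 1$ and $\operatorname{Var}(T) \le 1$ (see the variance analysis below).

Applying Chebyshev, for any $t > 1$:

$$\Pr[T \ge t] \le \frac{\operatorname{Var}(T)}{(t-1)^2} \le \frac{1}{(t-1)^2}.$$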

Variance Analysis of Query Time in Hashing with Chaining


Query Time as a Sum of Bernoulli Variables


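Let $T$ denote the query time. We write

$$T = \sum_{i=1}^{n} X_i,$$

where

$$X_i = \begin{cases} 1 & \text{if } h(k_i) = h(q), \\ 0 & \text{otherwise.} \end{cases}$$

Thus, $T$ counts the number of keys that hash to the same bucket as the query $q$.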

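  • The probability that the $i$-th key hashes to the same bucket as the query is $1/m$.
  • Because $h$ is perfectly random, the $X_i$ are independent, so the variance of the query time is the sum of the variances of the $X_i$.
  • The variance of a Bernoulli($p$) variable is $p(1-p)$, where here $p = 1/m$. Summing over the $n$ variables gives

$$\operatorname{Var}(T) = n \cdot \frac{1}{m}\left(1 - \frac{1}{m}\right).$$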

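Since $1 - \frac{1}{m} \le 1$, replacing it by $1$ gives an upper bound:

$$\operatorname{Var}(T) \le \frac{n}{m}.$$

If $m = n$, then

$$\operatorname{Var}(T) \le 1.$$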
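As a sanity check on $\mathbb{E}[T] = n/m$ and $\operatorname{Var}(T) = \frac{n}{m}\left(1 - \frac{1}{m}\right)$, the Monte Carlo sketch below models a perfectly random hash function by drawing an independent uniform bucket for every key and for the query. The function name `simulate_query_time` and the parameters ($n = m = 100$, 10,000 trials, fixed seed) are illustrative assumptions, not part of the lecture.

```python
import random
import statistics


def simulate_query_time(n: int, m: int, trials: int, seed: int = 0) -> list[int]:
    """Sample T = number of the n keys that land in the query's bucket.

    A perfectly random hash function is modeled by drawing an independent,
    uniform bucket in {0, ..., m-1} for every key and for the query.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        q_bucket = rng.randrange(m)
        # X_i = 1 exactly when key i lands in the query's bucket; T = sum of X_i.
        t = sum(1 for _ in range(n) if rng.randrange(m) == q_bucket)
        samples.append(t)
    return samples


if __name__ == "__main__":
    n = m = 100
    samples = simulate_query_time(n, m, trials=10_000)
    print("mean:", statistics.mean(samples))           # should be near n/m = 1
    print("variance:", statistics.pvariance(samples))  # should be near (n/m)(1 - 1/m) = 0.99
```

The empirical variance should sit just below $1$, matching the upper bound $\operatorname{Var}(T) \le n/m$.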

We compute the variance of $T$ in order to apply Chebyshev's inequality: Markov's inequality only requires the expectation, but Chebyshev requires both the expectation and the variance. Assuming $m = n$, we have

$$\mathbb{E}[T] = 1, \qquad \operatorname{Var}(T) \le 1.$$

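Applying Chebyshev, for any $k > 0$:

$$\Pr\big[\,|T - 1| \ge k\,\big] \le \frac{\operatorname{Var}(T)}{k^2} \le \frac{1}{k^2}.$$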

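Since $T \ge t$ is equivalent to $T - 1 \ge t - 1$, which implies $|T - 1| \ge t - 1$, we get for any $t > 1$:

$$\Pr[T \ge t] \le \Pr\big[\,|T - 1| \ge t - 1\,\big] \le \frac{1}{(t-1)^2}.$$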

This is much stronger than the Markov bound,

$$\Pr[T \ge t] \le \frac{\mathbb{E}[T]}{t} = \frac{1}{t}.$$

For example, when $t = 50$, Markov gives $1/50 = 2\%$, while Chebyshev gives $1/49^2 \approx 0.04\%$. There is no contradiction: Chebyshev uses more information (the variance), so for large $t$ it gives a much tighter bound.
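The two bounds can be compared numerically. This sketch assumes the setting above ($m = n$, so $\mathbb{E}[T] = 1$ and $\operatorname{Var}(T) \le 1$); the helper names `markov_bound` and `chebyshev_bound` are hypothetical.

```python
def markov_bound(t: float, mean: float = 1.0) -> float:
    """Markov: Pr[T >= t] <= E[T] / t, valid for nonnegative T and t > 0."""
    if t <= 0:
        raise ValueError("Markov's inequality needs t > 0")
    return mean / t


def chebyshev_bound(t: float, mean: float = 1.0, var: float = 1.0) -> float:
    """Chebyshev used one-sidedly: for t > E[T],
    Pr[T >= t] <= Pr[|T - E[T]| >= t - E[T]] <= Var(T) / (t - E[T])^2.
    """
    if t <= mean:
        raise ValueError("this form of Chebyshev needs t > E[T]")
    return var / (t - mean) ** 2


if __name__ == "__main__":
    for t in (2, 5, 10, 50):
        print(f"t={t:>2}  Markov <= {markov_bound(t):.4%}  "
              f"Chebyshev <= {chebyshev_bound(t):.4%}")
```

Note that for small $t$ the ordering can flip: at $t = 2$ the Chebyshev bound is the trivial $1/(2-1)^2 = 1$, while Markov gives $1/2$. The variance only pays off in the tail.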

Tail Bounds for Random Variables in Hashing with Chaining


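Markov's Inequality

For a nonnegative random variable $X$ and any $a > 0$:

$$\Pr[X \ge a] \le \frac{\mathbb{E}[X]}{a}$$

  • Applies to any nonnegative random variable.
  • Only requires the expectation $\mathbb{E}[X]$.

Chebyshev's Inequality

For any $k > 0$:

$$\Pr\big[\,|X - \mathbb{E}[X]| \ge k\,\big] \le \frac{\operatorname{Var}(X)}{k^2}$$

  • Applies to any random variable with finite variance.
  • Requires the expectation $\mathbb{E}[X]$ and the variance $\operatorname{Var}(X)$.

Chernoff Bound

For $X = \sum_i X_i$ with independent Bernoulli $X_i$, $\mu = \mathbb{E}[X]$, and any $\delta > 0$:

$$\Pr[X \ge (1+\delta)\mu] \le \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu}$$

  • Applies to sums of independent Bernoulli random variables (the setting used here).
  • Gives a much tighter bound than Markov or Chebyshev for large deviations.

Choosing a Bound

  • Tail bounds estimate the probability of extreme events (tails) when exact probabilities are difficult to compute.
  • The right choice depends on what is known about the random variable:
    • Markov: expectation only
    • Chebyshev: expectation + variance
    • Chernoff: sum of independent Bernoullis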
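To see how much sharper a Chernoff bound is for hashing with chaining, note that $T$ is a sum of independent Bernoulli($1/m$) variables, so the standard multiplicative bound $\Pr[X \ge (1+\delta)\mu] \le \left(e^{\delta}/(1+\delta)^{1+\delta}\right)^{\mu}$ applies with $\mu = \mathbb{E}[T]$. The sketch below evaluates it in log space to avoid overflow and compares it with the Markov and Chebyshev bounds, assuming $m = n$ so that $\mu = 1$ and $\operatorname{Var}(T) \le 1$. The function name `chernoff_upper_tail` is hypothetical.

```python
import math


def chernoff_upper_tail(mu: float, t: float) -> float:
    """Chernoff: Pr[X >= t] <= (e^d / (1+d)^(1+d))^mu, where t = (1+d) * mu and d > 0.

    X must be a sum of independent Bernoulli variables with E[X] = mu.
    Computed as exp(mu * (d - (1+d) * ln(1+d))) so large d does not overflow.
    """
    d = t / mu - 1.0
    if d <= 0:
        raise ValueError("need t > mu")
    return math.exp(mu * (d - (1.0 + d) * math.log1p(d)))


if __name__ == "__main__":
    mu = 1.0  # E[T] when m = n; also Var(T) <= 1
    for t in (5, 10, 50):
        print(f"t={t:>2}  Markov <= {mu / t:.3g}  "
              f"Chebyshev <= {1 / (t - mu) ** 2:.3g}  "
              f"Chernoff <= {chernoff_upper_tail(mu, t):.3g}")
```

At $t = 50$ the Chernoff bound is on the order of $10^{-64}$, versus roughly $4 \times 10^{-4}$ for Chebyshev. For small deviations the simpler bounds can win (at $t = 2$ this form gives about $0.68$, while Markov gives $0.5$), which is why Chernoff is described as the tool for large deviations.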

Binary search over $n$ sorted keys has query time

$$O(\log n),$$

which increases as $n$ increases.

Hashing with chaining improves this:

  • Preprocessing time: $O(n)$ (always)
  • Query time: $O(1)$ in expectation

However, the worst-case query time is not constant: in the worst case many keys (up to all $n$) can collide in the same bucket.

We now design a hashing scheme with:

  • Query time: $O(1)$ in the worst case
  • Space: $O(n)$
  • Preprocessing time: $O(n)$ in expectation

This scheme is called FKS hashing (Fredman–Komlós–Szemerédi). The randomness is moved entirely into preprocessing.

FKS Hashing: Two-Level Hashing Construction


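Let $S = \{k_1, k_2, \dots, k_n\}$ be the set of keys.

Pick a perfectly random hash function

$$h : U \to \{0, 1, \dots, n-1\},$$

where $U$ is the universe of possible keys.

Hash all $n$ keys into $n$ buckets.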

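For each bucket $j$, define

$$b_j = \big|\{\, i : h(k_i) = j \,\}\big|,$$

the number of keys in bucket $j$.

The bucket sizes sum to $n$:

$$\sum_{j=0}^{n-1} b_j = n.$$

Square each bucket size and sum:

$$\sum_{j=0}^{n-1} b_j^2.$$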

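If the sum of squares is greater than $cn$ for a fixed constant $c$ (for example, $c = 4$):

$$\sum_{j=0}^{n-1} b_j^2 > cn,$$

discard $h$ and choose a new hash function. Repeat this process until the sum of squares is at most $cn$:

$$\sum_{j=0}^{n-1} b_j^2 \le cn.$$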
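A short calculation shows why Step 1 rarely repeats. Assuming a perfectly random $h$ with $n$ buckets, and taking $c = 4$ for concreteness, write each $b_j^2$ as $b_j + 2\binom{b_j}{2}$, so that $\sum_j b_j^2$ is $n$ plus twice the number of colliding pairs of keys:

$$
\begin{aligned}
\sum_{j} b_j^2 &= \sum_j b_j + 2\sum_j \binom{b_j}{2} = n + 2\sum_{i<\ell} \mathbf{1}\big[h(k_i) = h(k_\ell)\big], \\
\mathbb{E}\Big[\sum_j b_j^2\Big] &= n + 2\binom{n}{2}\cdot\frac{1}{n} = n + (n-1) = 2n - 1, \\
\Pr\Big[\sum_j b_j^2 > 4n\Big] &\le \frac{2n-1}{4n} < \frac{1}{2} \qquad \text{(Markov)}.
\end{aligned}
$$

Each attempt therefore succeeds with probability greater than $1/2$, so the expected number of attempts is less than $2$, and Step 1 runs in expected $O(n)$ time.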

This completes Step 1.
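Step 1 can be sketched in Python as follows. The lecture assumes a perfectly random $h$; since such a function cannot be stored compactly, this sketch substitutes the Carter–Wegman universal family $h(x) = ((ax + b) \bmod p) \bmod n$, whose pairwise collision probability is at most $1/n$, which is the only property the analysis of $\sum_j b_j^2$ uses. Integer keys in $[0, p)$, the prime $p = 2^{31} - 1$, the threshold $c = 4$, and the names `random_universal_hash` and `fks_step1` are assumptions for illustration.

```python
import random

P = 2_147_483_647  # the Mersenne prime 2^31 - 1; keys are assumed to be ints in [0, P)


def random_universal_hash(m: int, rng: random.Random):
    """Draw h(x) = ((a*x + b) mod P) mod m from the Carter-Wegman family.

    Stand-in for a perfectly random hash: two distinct keys collide w.p. <= 1/m.
    """
    a = rng.randrange(1, P)
    b = rng.randrange(0, P)
    return lambda x: ((a * x + b) % P) % m


def fks_step1(keys: list[int], c: int = 4, seed: int = 0):
    """Resample h until the bucket sizes b_j satisfy sum_j b_j^2 <= c * n.

    Returns (h, buckets, attempts), where buckets[j] lists the keys with h(k) = j.
    """
    rng = random.Random(seed)
    n = len(keys)
    attempts = 0
    while True:
        attempts += 1
        h = random_universal_hash(n, rng)
        buckets = [[] for _ in range(n)]
        for k in keys:
            buckets[h(k)].append(k)
        if sum(len(bkt) ** 2 for bkt in buckets) <= c * n:
            return h, buckets, attempts


if __name__ == "__main__":
    keys = random.Random(42).sample(range(10**6), 1000)
    h, buckets, attempts = fks_step1(keys)
    print("attempts:", attempts, "sum of squares:", sum(len(bkt) ** 2 for bkt in buckets))
```

The `attempts` counter makes the expected-constant number of retries visible; rejected hash functions are simply discarded.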

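For each bucket $j$:

  • There are $b_j$ keys in bucket $j$.
  • Allocate a second-level table of size $b_j^2$ for bucket $j$.
  • Choose a random hash function $h_j$ mapping those keys into this table.
  • If any collision occurs, discard $h_j$ and choose another hash function.

Repeat until all $b_j$ keys map to distinct cells. Since each of the $\binom{b_j}{2}$ pairs collides with probability $1/b_j^2$, a union bound shows that an attempt fails with probability at most $\binom{b_j}{2}/b_j^2 < 1/2$, so fewer than two attempts are needed in expectation.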

Thus, every second-level table satisfies:

(1) Each cell contains at most one key.

(2) No collisions occur.

This completes preprocessing.
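Step 2 for a single bucket can be sketched the same way. As an assumption, the universal family $((ax + b) \bmod p) \bmod b_j^2$ stands in for the lecture's random $h_j$; its pairwise collision probability is at most $1/b_j^2$, which is enough to make each attempt fail with probability below $1/2$. The names `build_second_level` and `random_universal_hash` are hypothetical, and keys are assumed to be distinct integers in $[0, 2^{31} - 1)$.

```python
import random

P = 2_147_483_647  # prime 2^31 - 1; keys are assumed to be distinct ints in [0, P)


def random_universal_hash(m: int, rng: random.Random):
    # Carter-Wegman universal hash, standing in for a perfectly random h_j.
    a, b = rng.randrange(1, P), rng.randrange(0, P)
    return lambda x: ((a * x + b) % P) % m


def build_second_level(bucket: list[int], rng: random.Random):
    """Return (h_j, table) where table has b_j^2 cells and no collisions.

    An empty bucket gets (None, []) since there is nothing to store.
    """
    size = len(bucket) ** 2
    if size == 0:
        return None, []
    while True:
        hj = random_universal_hash(size, rng)
        table = [None] * size
        for k in bucket:
            pos = hj(k)
            if table[pos] is not None:
                break  # collision: discard h_j and try another
            table[pos] = k
        else:
            return hj, table  # every key landed in its own cell


if __name__ == "__main__":
    rng = random.Random(1)
    hj, table = build_second_level([5, 17, 42, 99, 123456], rng)
    print(len(table), [k for k in table if k is not None])
```

The `for ... else` clause runs only when no `break` occurred, i.e. when $h_j$ produced no collision; otherwise the outer loop draws a fresh $h_j$.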

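First-level table uses:

$$O(n)$$

Second-level tables use:

$$\sum_{j=0}^{n-1} b_j^2$$

From Step 1:

$$\sum_{j=0}^{n-1} b_j^2 \le cn$$

Therefore the second-level tables use at most $cn$ space.

Total space:

$$n + \sum_{j=0}^{n-1} b_j^2 \le n + cn = (1 + c)\,n$$

Thus total space is $O(n)$.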

Given a query key $q$:

  • Compute $j = h(q)$.
  • Compute $p = h_j(q)$.
  • Inspect cell $p$ of bucket $j$'s second-level table.

Since second-level tables contain no collisions:

  • If the cell contains , return YES.
  • Otherwise (including when bucket $j$ is empty), return NO.

The query performs:

  • One evaluation of $h$
  • One evaluation of $h_j$
  • One table lookup

Therefore the query time is $O(1)$ in the worst case.
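Putting the pieces together, here is a compact sketch of the whole static dictionary: preprocessing (Steps 1 and 2) followed by the worst-case $O(1)$ membership query. It substitutes the Carter–Wegman universal family $((ax + b) \bmod p) \bmod m$ for the lecture's perfectly random hash functions, and assumes distinct integer keys in $[0, 2^{31} - 1)$, the threshold $c = 4$, and the hypothetical class name `FKSTable`.

```python
import random

P = 2_147_483_647  # prime 2^31 - 1; keys are assumed to be distinct ints in [0, P)


def _random_hash(m: int, rng: random.Random):
    # Carter-Wegman universal hash, a stand-in for a perfectly random function.
    a, b = rng.randrange(1, P), rng.randrange(0, P)
    return lambda x: ((a * x + b) % P) % m


class FKSTable:
    """Static set: O(n) space, expected O(n) build time, worst-case O(1) queries."""

    def __init__(self, keys: list[int], c: int = 4, seed: int = 0):
        rng = random.Random(seed)
        n = max(len(keys), 1)  # keep at least one first-level bucket
        # Step 1: resample h until sum_j b_j^2 <= c * n.
        while True:
            self.h = _random_hash(n, rng)
            buckets = [[] for _ in range(n)]
            for k in keys:
                buckets[self.h(k)].append(k)
            if sum(len(bkt) ** 2 for bkt in buckets) <= c * n:
                break
        # Step 2: a collision-free table of size b_j^2 for every bucket j.
        self.second = [self._build_bucket(bkt, rng) for bkt in buckets]

    @staticmethod
    def _build_bucket(bucket: list[int], rng: random.Random):
        size = len(bucket) ** 2
        if size == 0:
            return None, []
        while True:
            hj = _random_hash(size, rng)
            table = [None] * size
            for k in bucket:
                pos = hj(k)
                if table[pos] is not None:
                    break  # collision: resample h_j
                table[pos] = k
            else:
                return hj, table

    def contains(self, q: int) -> bool:
        # One evaluation of h, one of h_j, and one table lookup.
        hj, table = self.second[self.h(q)]
        if hj is None:  # empty bucket
            return False
        return table[hj(q)] == q


if __name__ == "__main__":
    keys = random.Random(7).sample(range(10**6), 500)
    fks = FKSTable(keys)
    print(all(fks.contains(k) for k in keys), fks.contains(10**6 + 1))
```

All retries happen inside `__init__`; `contains` never loops, so its cost does not depend on how the keys happened to hash. This is the sense in which the randomness is moved entirely into preprocessing.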