Lecture 02/09/2026 - FKS Hashing and Tail Bounds
Scribes: Carlos Aucacama and Mohammed Zaid
Summary of Lecture
- Hashing with chaining has expected constant query time.
- Using stronger inequalities gives dramatically better tail bounds.
- Recognizing when a random variable is a sum of independent Bernoullis is powerful.
3 Guarantees
Professor Goswami began by discussing the three guarantees of hashing with chaining:
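1. Expected Query Time
Let $n$ be the number of stored keys, $m$ the number of buckets, and $X$ the number of stored keys in the bucket that the query hashes to, so the query time is $O(1 + X)$. Then $\mathbb{E}[X] = n/m$.
If $m = n$, then $\mathbb{E}[X] = 1$, so the expected query time is $O(1)$.
2. Markov Inequality Guarantee
For any $t > 0$, $\Pr[X \ge t] \le \mathbb{E}[X]/t$, which is $1/t$ when $m = n$.
3. Chebyshev Inequality Guarantee
Assuming $m = n$, the variance satisfies $\mathrm{Var}[X] \le 1$.
Applying Chebyshev: $\Pr[|X - \mathbb{E}[X]| \ge t] \le \mathrm{Var}[X]/t^2 \le 1/t^2$.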
Variance Analysis of Query Time in Hashing with Chaining
Query Time as a Sum of Bernoulli Variables
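Let $X$ be the number of stored keys that share a bucket with the query, when $n$ keys are hashed into $m$ buckets:
$$X = \sum_{i=1}^{n} X_i,$$
where $X_i = 1$ if the $i$-th key hashes to the same bucket as the query and $X_i = 0$ otherwise.
Thus, by linearity of expectation, $\mathbb{E}[X] = \sum_{i=1}^{n} \mathbb{E}[X_i] = n/m$.
- The chance that the $i$-th key hashes to the same bucket as the query is $1/m$.
- The variance of the query time is the sum of the variances of the $X_i$, because the $X_i$ are independent.
- The variance of a Bernoulli variable is $p(1 - p)$, where $p = 1/m$.
Summing over the $n$ variables gives
$$\mathrm{Var}[X] = n \cdot \frac{1}{m}\left(1 - \frac{1}{m}\right).$$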
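Since $1 - 1/m \le 1$, we get $\mathrm{Var}[X] \le n/m$.
If $m = n$, then $\mathbb{E}[X] = 1$.
We compute the variance of $X$ in this case: $\mathrm{Var}[X] = n \cdot \frac{1}{n}\left(1 - \frac{1}{n}\right) = 1 - \frac{1}{n} \le 1$.
Applying Chebyshev:
$$\Pr\big[|X - \mathbb{E}[X]| \ge t\big] \le \frac{\mathrm{Var}[X]}{t^2} \le \frac{1}{t^2}.$$
Since $\mathbb{E}[X] = 1$, this gives $\Pr[X \ge 1 + t] \le 1/t^2$.
This is much stronger than the Markov bound, $\Pr[X \ge t] \le \mathbb{E}[X]/t = 1/t$.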
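For example, when $t = 10$ and $m = n$, the Markov bound on $\Pr[X \ge 10]$ is $1/10$, while the Chebyshev bound on $\Pr[X \ge 11]$ is $1/100$.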
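The gap between the two bounds can be checked empirically. The sketch below is a small simulation, not part of the lecture. It assumes a perfectly random hash function, so each of the $n$ stored keys lands in the query's bucket independently with probability $1/m$. The function name `sample_query_cost` and the values of `n`, `trials`, and `t` are illustrative choices.

```python
import random

def sample_query_cost(n, m, rng):
    """Count how many of the n stored keys land in the query's bucket (bucket 0)."""
    return sum(1 for _ in range(n) if rng.randrange(m) == 0)

rng = random.Random(2026)   # fixed seed so the run is repeatable
n = m = 200                 # as many buckets as keys (illustrative size)
trials = 5000
t = 3                       # deviation above the mean E[X] = 1

samples = [sample_query_cost(n, m, rng) for _ in range(trials)]
mean = sum(samples) / trials
var = sum((x - mean) ** 2 for x in samples) / trials
tail = sum(1 for x in samples if x >= 1 + t) / trials

markov = 1 / (1 + t)        # Pr[X >= 1 + t] <= E[X] / (1 + t)
chebyshev = 1 / t ** 2      # Pr[|X - 1| >= t] <= Var[X] / t^2 <= 1 / t^2

print(f"mean={mean:.3f} var={var:.3f}")
print(f"empirical Pr[X >= {1 + t}] = {tail:.4f}")
print(f"Markov bound = {markov:.4f}, Chebyshev bound = {chebyshev:.4f}")
```

Both bounds hold for any valid run. The empirical tail frequency is typically well below the Chebyshev value, which in turn is below the Markov value.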
Tail Bounds for Random Variables in Hashing with Chaining
Markov Inequality
- Applies to any nonnegative random variable.
- Only requires the expectation $\mathbb{E}[X]$.
Chebyshev Inequality
- Applies to any random variable.
- Requires the expectation $\mathbb{E}[X]$ and the variance $\mathrm{Var}[X]$.
Chernoff Bound
- Only applies to sums of independent Bernoulli random variables.
- Gives a much tighter bound than Markov or Chebyshev for large deviations.
Comparison for Large Deviations
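To illustrate how the three bounds compare, the sketch below evaluates each one for the query cost $X$ with $m = n$, taking $\mathbb{E}[X] = 1$ and $\mathrm{Var}[X] \le 1$. The Chernoff form used, $\Pr[X \ge (1+\delta)\mu] \le \left(e^{\delta}/(1+\delta)^{1+\delta}\right)^{\mu}$, is one standard multiplicative version. It is an assumption here, since these notes do not record which form was stated in lecture. The function names and the values of $t$ are also illustrative.

```python
import math

def markov_bound(mean, t):
    """Pr[X >= t] <= E[X] / t for a nonnegative X."""
    return min(1.0, mean / t)

def chebyshev_bound(mean, var, t):
    """Pr[X >= t] <= Var[X] / (t - E[X])^2 for t above the mean."""
    if t <= mean:
        return 1.0
    return min(1.0, var / (t - mean) ** 2)

def chernoff_bound(mean, t):
    """Pr[X >= (1+d)*mu] <= (e^d / (1+d)^(1+d))^mu for a sum of independent Bernoullis."""
    if t <= mean:
        return 1.0
    d = t / mean - 1
    # Evaluate in log space so large t does not overflow.
    return min(1.0, math.exp(mean * (d - (1 + d) * math.log(1 + d))))

mean, var = 1.0, 1.0
for t in (5, 10, 20):
    print(t, markov_bound(mean, t), chebyshev_bound(mean, var, t), chernoff_bound(mean, t))
```

For $t = 10$, the Markov bound is $0.1$, the Chebyshev bound is $1/81 \approx 0.012$, and the Chernoff bound is below $10^{-6}$.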
Purpose of Tail Bounds
- Estimate the probability of extreme events (tails) when exact probabilities are difficult to compute.
- The choice of bound depends on what is known about the random variable:
  - Markov: expectation only
  - Chebyshev: expectation and variance
  - Chernoff: sum of independent Bernoullis
Motivation for New Algorithm
Binary search has query time $O(\log n)$, which increases as $n$ grows.
Hashing with chaining improves this:
- Preprocessing time: $O(n)$ (always)
- Query time: $O(1)$ in expectation
However, the worst-case query time is not constant because collisions may occur.
We now design a hashing scheme with:
- Preprocessing time: $O(n)$ in expectation
- Space: $O(n)$
- Query time: $O(1)$ in the worst case
This scheme is called FKS hashing (Fredman–Komlós–Szemerédi). The randomness is moved entirely into preprocessing.
FKS Hashing Two-Level Hashing Construction
Let $S$ be the set of $n$ keys to be stored.
Step 1: First-Level Hashing
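Pick a perfectly random hash function $h$ that maps each key to one of $n$ buckets.
Hash all $n$ keys into the $n$ buckets.
For each bucket $i$, let $n_i$ be the number of keys that land in it.
The bucket sizes sum to the number of keys: $\sum_i n_i = n$.
Square each bucket size and add the squares: $\sum_i n_i^2$.
If the sum of squares is greater than $cn$, for the fixed constant $c$ of the construction (a common choice is $c = 4$, since $\mathbb{E}\left[\sum_i n_i^2\right] < 2n$ and Markov then bounds the failure probability by $1/2$):
Discard hash function $h$, pick a new one, and repeat until the condition holds.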
This completes Step 1.
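Step 1 can be sketched as a retry loop. This is a minimal illustration under stated assumptions, not the lecture's implementation. Keys are assumed to be distinct nonnegative integers below the prime `P`. The perfectly random hash function is replaced by the universal family $h(x) = ((ax + b) \bmod P) \bmod m$, which also gives collision probability at most $1/m$ for any two distinct keys. The threshold constant `c = 4` is assumed. The names `random_hash` and `first_level` are hypothetical.

```python
import random

P = (1 << 61) - 1  # Mersenne prime; assumed larger than every key

def random_hash(m, rng):
    """Draw h(x) = ((a*x + b) mod P) mod m from a universal family (stand-in for a random function)."""
    a = rng.randrange(1, P)
    b = rng.randrange(0, P)
    return lambda x: ((a * x + b) % P) % m

def first_level(keys, rng, c=4):
    """Redraw h until the bucket sizes satisfy sum(n_i^2) <= c * n (c = 4 is an assumption)."""
    n = len(keys)
    while True:
        h = random_hash(n, rng)
        buckets = [[] for _ in range(n)]
        for k in keys:
            buckets[h(k)].append(k)
        if sum(len(b) ** 2 for b in buckets) <= c * n:
            return h, buckets

rng = random.Random(0)
keys = rng.sample(range(10 ** 9), 1000)   # 1000 distinct illustrative keys
h, buckets = first_level(keys, rng)
print(sum(len(b) ** 2 for b in buckets), "<=", 4 * len(keys))
```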
Step 2: Second-Level Hashing
For each bucket $i$:
- There are $n_i$ keys in bucket $i$.
- Allocate a second-level table of size $n_i^2$ for the bucket.
- Choose a random hash function $h_i$ mapping those keys into this table.
- If any collision occurs, discard $h_i$ and choose another hash function.
Repeat until all buckets have collision-free second-level tables.
Thus, every second-level table satisfies:
(1) Each cell contains at most one key.
(2) No collisions occur.
This completes preprocessing.
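The per-bucket retry in Step 2 can be sketched as follows. This is an illustration under assumptions rather than the lecture's code. Keys are distinct nonnegative integers below the prime `P`, and the random hash function is drawn from the universal family $((ax + b) \bmod P) \bmod m$. The names `random_hash` and `build_second_level` and the sample bucket are hypothetical.

```python
import random

P = (1 << 61) - 1  # Mersenne prime; assumed larger than every key

def random_hash(m, rng):
    """Draw ((a*x + b) mod P) mod m from a universal family (stand-in for a random function)."""
    a = rng.randrange(1, P)
    b = rng.randrange(0, P)
    return lambda x: ((a * x + b) % P) % m

def build_second_level(bucket_keys, rng):
    """Place n_i distinct keys into a table of size n_i^2, redrawing h_i on any collision."""
    size = len(bucket_keys) ** 2
    if size == 0:
        return None, []                 # empty bucket: nothing to store
    while True:
        h_i = random_hash(size, rng)
        table = [None] * size
        for k in bucket_keys:
            j = h_i(k)
            if table[j] is not None:    # collision: abandon this h_i
                break
            table[j] = k
        else:
            return h_i, table           # loop finished with no collision

rng = random.Random(1)
bucket_keys = [12, 47, 93, 1001]        # illustrative contents of one bucket
h_i, table = build_second_level(bucket_keys, rng)
print(len(table), [k for k in table if k is not None])
```

With a table of size $n_i^2$ and pairwise collision probability at most $1/n_i^2$, the expected number of colliding pairs is at most $\binom{n_i}{2}/n_i^2 < 1/2$, so each attempt succeeds with probability above $1/2$.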
Space Analysis
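First-level table uses: $O(n)$ space, one entry per bucket.
Second-level tables use: $\sum_i n_i^2$ cells in total.
From Step 1: the accepted hash function satisfies $\sum_i n_i^2 \le cn$, where $cn$ is the Step 1 threshold for a constant $c$.
Therefore: $\sum_i n_i^2 = O(n)$.
Total space: $O(n) + O(n) = O(n)$.
Thus total space is $O(n)$.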
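One way to see why the Step 1 condition is reached after few retries is to compute the expected sum of squares. The derivation below is a sketch under two assumptions that these notes do not fix: the hash function sends any two distinct keys to the same bucket with probability $1/n$, and the threshold is $4n$.

```latex
\begin{align*}
\sum_{i} n_i^2
  &= \#\{(x, y) : h(x) = h(y)\}
   = n + \sum_{x \ne y} \mathbf{1}[h(x) = h(y)], \\
\mathbb{E}\Big[\sum_{i} n_i^2\Big]
  &= n + n(n-1) \cdot \frac{1}{n} = 2n - 1 < 2n, \\
\Pr\Big[\sum_{i} n_i^2 > 4n\Big]
  &\le \frac{2n - 1}{4n} < \frac{1}{2}
  \quad \text{(Markov)}.
\end{align*}
```

Under these assumptions each draw of $h$ is accepted with probability above $1/2$, so the expected number of draws is less than $2$.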
Query Analysis
Given a query key $x$:
- Compute $i = h(x)$.
- Compute $j = h_i(x)$.
- Inspect the cell in bucket $i$ at position $j$.
Since second-level tables contain no collisions:
- If the cell contains $x$, return YES.
- Otherwise, return NO.
The query performs:
- One evaluation of $h$
- One evaluation of $h_i$
- One table lookup
Therefore, the query time is $O(1)$ in the worst case.
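Putting the two levels and the query together, the whole scheme can be sketched as one small class. This is an illustration under assumptions, not the lecture's implementation. Keys are integers below the prime `P`. Both levels draw from the universal family $((ax + b) \bmod P) \bmod m$ in place of a perfectly random function. The Step 1 threshold uses an assumed constant `c = 4`. The class name `FKSTable` and its methods are hypothetical.

```python
import random

P = (1 << 61) - 1  # Mersenne prime; keys are assumed to be integers in [0, P)

def random_hash(m, rng):
    """Draw ((a*x + b) mod P) mod m from a universal family (stand-in for a random function)."""
    a = rng.randrange(1, P)
    b = rng.randrange(0, P)
    return lambda x: ((a * x + b) % P) % m

class FKSTable:
    def __init__(self, keys, seed=0, c=4):
        rng = random.Random(seed)
        keys = list(set(keys))                 # the construction assumes distinct keys
        m = max(len(keys), 1)
        # Step 1: redraw h until sum(n_i^2) <= c * n (c = 4 is an assumption).
        while True:
            self.h = random_hash(m, rng)
            buckets = [[] for _ in range(m)]
            for k in keys:
                buckets[self.h(k)].append(k)
            if sum(len(b) ** 2 for b in buckets) <= c * m:
                break
        # Step 2: one collision-free table of size n_i^2 per bucket.
        self.second = [self._build_bucket(b, rng) for b in buckets]

    @staticmethod
    def _build_bucket(bucket, rng):
        size = len(bucket) ** 2
        if size == 0:
            return None, []
        while True:
            h_i = random_hash(size, rng)
            table = [None] * size
            for k in bucket:
                j = h_i(k)
                if table[j] is not None:       # collision: redraw h_i
                    break
                table[j] = k
            else:
                return h_i, table

    def contains(self, x):
        """Worst-case O(1): one h evaluation, one h_i evaluation, one lookup."""
        h_i, table = self.second[self.h(x)]
        if h_i is None:
            return False
        return table[h_i(x)] == x

keys = [3, 17, 42, 1001, 65537, 10 ** 9 + 7]
table = FKSTable(keys)
print(table.contains(42), table.contains(43))  # prints: True False
```

All randomness is spent in the constructor. `contains` does a fixed amount of work regardless of how the keys were distributed.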