
Lecture 6 on 02/11/2026 - FKS Hashing Analysis and Preprocessing

Scribes: Olivia Xu and Laura Torres

  • Comparison of FKS Hashing vs. Chaining (Worst-case vs. Expected)
  • Mathematical Intuition: Minimizing Sum of Squares for Equality
  • Analysis of Step 2: Probability of Second-level Collisions
  • Analysis of Step 1: Expected Collisions and Markov Bound

Professor Goswami began by distinguishing the guarantees:

  • Hashing with Chaining: Query time is O(1) expected. Preprocessing is always O(n).
  • FKS Hashing: Query time is O(1) worst-case. Preprocessing is O(n) expected.

First a Brief Summary of FKS Hashing Preprocessing Phase


Before analyzing the FKS hashing preprocessing, let’s review its steps.

  • Step 1: We hash the n keys into n buckets using a universal hash function h, letting b_i denote the number of keys that land in bucket i. We accept h only if \sum b_i^2 \leq 4n; otherwise we resample h.
  • Step 2: We take each b_i row from the first step and find a hash function for each row that distributes its keys into a row of length 2b_i^2 such that there are no collisions.
  • By the end, we will have used at most n+1 hash functions: 1 in step 1 and at most n in step 2.
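The two preprocessing steps above can be sketched in Python. This is a minimal illustrative sketch, not the lecture's exact construction: it assumes integer keys smaller than a prime P, and draws hash functions of the form h(x) = ((a·x + b) mod P) mod m from the standard multiply-mod-prime universal family.

```python
import random

P = (1 << 31) - 1  # a Mersenne prime larger than any key (assumption)

def random_hash(m):
    """Draw h(x) = ((a*x + b) mod P) mod m from a universal family."""
    a = random.randrange(1, P)
    b = random.randrange(0, P)
    return lambda x: ((a * x + b) % P) % m

def fks_build(keys):
    n = len(keys)
    # Step 1: hash n keys into n buckets; resample h until sum(b_i^2) <= 4n.
    while True:
        h = random_hash(n)
        buckets = [[] for _ in range(n)]
        for x in keys:
            buckets[h(x)].append(x)
        if sum(len(b) ** 2 for b in buckets) <= 4 * n:
            break
    # Step 2: for each bucket, resample until a collision-free hash
    # into 2 * b_i^2 cells is found.
    tables = []
    for bucket in buckets:
        bi = len(bucket)
        if bi == 0:
            tables.append((None, []))
            continue
        m = 2 * bi * bi
        while True:
            g = random_hash(m)
            cells = [None] * m
            ok = True
            for x in bucket:
                j = g(x)
                if cells[j] is not None:   # second-level collision: retry
                    ok = False
                    break
                cells[j] = x
            if ok:
                tables.append((g, cells))
                break
    return h, tables

def fks_query(h, tables, x):
    """O(1) worst-case membership query: two hash evaluations."""
    g, cells = tables[h(x)]
    return g is not None and cells[g(x)] == x
```

Queries always cost exactly two hash evaluations plus one comparison, regardless of the input; only the build phase is randomized.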

To understand why Step 1 limits the sum of squares, we consider a calculus problem:

Problem: Given x + y = 1 and x, y \geq 0, minimize f(x,y) = x^2 + y^2.

  • Substituting y = 1-x, we get f(x) = x^2 + (1-x)^2 = 2x^2 - 2x + 1.
  • To minimize it, we differentiate once and set the derivative equal to zero:
f'(x) = 4x - 2 = 0 \implies x = 1/2, y = 1/2
  • Since the second derivative is positive, f''(x) = 4 > 0, the point x = 1/2, y = 1/2 is a minimum.
  • Insight: Minimizing the sum of squares is a mathematical way to enforce an equal distribution.
  • In FKS, \sum b_i = n. The sum of squares \sum b_i^2 ranges from n (perfectly equal, every b_i = 1) to n^2 (all keys in one bucket). FKS accepts any h where \sum b_i^2 \leq 4n.
    • The most equal distribution occurs when there are no collisions and each b_i equals 1. Then \sum b_i^2 = n:
b_1^2 + b_2^2 + \dots + b_n^2 = 1^2 + 1^2 + \dots + 1^2 = n
    • The most unequal distribution occurs when all keys go into the same bucket, so every other bucket has 0 keys. Then \sum b_i^2 = n^2:
b_1^2 + b_2^2 + \dots + b_n^2 = n^2 + 0^2 + \dots + 0^2 = n^2
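A quick numeric check of these two extremes (a tiny illustration, not part of the lecture):

```python
import random

# With sum(b_i) = n, the sum of squares is minimized by the all-equal
# distribution and maximized by putting every key in one bucket.
n = 8
equal = [1] * n                # b_i = 1 for every bucket
skewed = [n] + [0] * (n - 1)   # all n keys in one bucket
assert sum(b * b for b in equal) == n        # minimum: n
assert sum(b * b for b in skewed) == n * n   # maximum: n^2

# Any other way of dropping n keys into n buckets lands in between.
mid = [0] * n
for _ in range(n):
    mid[random.randrange(n)] += 1
assert n <= sum(b * b for b in mid) <= n * n
```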

Step 2 Analysis: Second-level Collision Probability


We need to prove that Step 2 terminates quickly.

Scenario: We hash b_i keys into m_i = 2b_i^2 cells.

  • Let C be the total number of collisions within the bucket.
  • Using a Universal Hash Family, for any pair of keys, \Pr(\text{collision}) \leq 1/m_i = 1/(2b_i^2).
  • The total number of pairs is \binom{b_i}{2} = \frac{b_i(b_i-1)}{2}.
E[C] = \sum_{\text{pairs}} \Pr(\text{collision}) = \frac{b_i(b_i-1)}{2} \cdot \frac{1}{2b_i^2} < \frac{b_i^2}{4b_i^2} = 1/4
  • By Markov’s Inequality: \Pr(C \geq 1) \leq \frac{E[C]}{1} < 1/4.
  • Conclusion: Since the failure probability is < 1/4, each try succeeds with probability > 3/4. The number of tries for a bucket is therefore a Geometric Random Variable with success probability p > 3/4, so the expected number of tries is 1/p < 4/3 \leq 2.
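To see this bound in action, here is a small simulation. As an assumption on top of the lecture's setup, an idealized fully-random hash stands in for the universal family:

```python
import random

def tries_until_perfect(bi, rng):
    """Count how many random hash functions we sample before b_i keys
    land in 2 * b_i^2 cells with no collision (fully-random model)."""
    m = 2 * bi * bi
    tries = 1
    while True:
        cells = [rng.randrange(m) for _ in range(bi)]
        if len(set(cells)) == bi:   # no two keys share a cell
            return tries
        tries += 1

rng = random.Random(0)
avg = sum(tries_until_perfect(20, rng) for _ in range(2000)) / 2000
assert avg <= 2   # empirically well under the bound of 2 expected tries
```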

Step 1 Analysis: Expected Collisions and Markov Bound

We now prove that we don’t need to resample h too many times: Step 1 succeeds with probability at least 1/2, so the expected number of tries is at most 2.

Key observation: The total number of collisions is related to how keys distribute across buckets.

When b_i keys hash to bucket i, how many collision pairs are there? If we count ordered pairs (so each unordered pair is counted twice), bucket i contributes b_i(b_i-1) ordered collision pairs. Summing over all buckets:

C = \sum_{i=1}^{n} b_i(b_i-1) = \sum_{i=1}^{n} (b_i^2 - b_i) = \sum_{i=1}^{n} b_i^2 - \sum_{i=1}^{n} b_i = \sum b_i^2 - n

since \sum b_i = n (all keys must go somewhere). Therefore:

\sum b_i^2 = C + n

This connects the sum of squares directly to the collision count, tying Step 1’s success condition to a probability argument.
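The identity C = \sum b_i^2 - n is exact for any assignment of keys to buckets, not just in expectation. A brute-force check with a random assignment (illustrative only):

```python
import random

rng = random.Random(1)
n = 60
bucket_of = [rng.randrange(n) for _ in range(n)]  # bucket of each key

# Tally bucket sizes b_i.
b = [0] * n
for i in bucket_of:
    b[i] += 1

# Count ordered colliding pairs (j, k), j != k, directly.
C = sum(1 for j in range(n) for k in range(n)
        if j != k and bucket_of[j] == bucket_of[k])

assert C == sum(bi * bi for bi in b) - n   # C = sum(b_i^2) - n, exactly
```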

When hashing n keys into n buckets using a universal hash function:

What’s the expected number of keys that collide with one particular key x_j?

  • There are n-1 other keys
  • Each hashes to the same bucket as x_j with probability at most 1/n
  • Expected collisions with x_j: (n-1) \cdot \frac{1}{n} < 1

Since this holds for every key:

E[C] = \sum_{j=1}^{n} E[\text{collisions with } x_j] < n

Important fact: When hashing n keys into n buckets, E[C] < n.

Using the result above where E[C] < n, and C = \sum b_i^2 - n:

E[\sum b_i^2 - n] = E[C] < n \implies E[\sum b_i^2] < 2n

Step 1 fails when \sum b_i^2 > 4n. By Markov’s inequality:

\Pr\left(\sum b_i^2 > 4n\right) \leq \frac{E[\sum b_i^2]}{4n} < \frac{2n}{4n} = \frac{1}{2}
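Markov's inequality only guarantees a failure probability below 1/2; in simulation (again substituting an idealized fully-random hash, an assumption beyond the lecture), Step 1 almost never fails:

```python
import random

def step1_fails(n, rng):
    """Hash n keys into n buckets and test the rejection condition."""
    b = [0] * n
    for _ in range(n):
        b[rng.randrange(n)] += 1
    return sum(bi * bi for bi in b) > 4 * n

rng = random.Random(0)
n = 100
fail_rate = sum(step1_fails(n, rng) for _ in range(1000)) / 1000
assert fail_rate < 1 / 2   # far below the Markov bound in practice
```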

Conclusion:

  • Probability Step 1 succeeds: \geq 1/2
  • Expected number of tries: \leq 2 (geometric with success probability \geq 1/2)
  • Total expected work in Step 1: 2 \cdot O(n) = O(n)
  • Query: O(1) worst-case (two hash function evaluations).
  • Space: O(n) (since \sum 2b_i^2 \leq 8n by Step 1’s acceptance condition).
  • Preprocessing: O(n) expected (geometric number of trials at both levels).