Lecture 6 on 02/11/2026 - FKS Hashing Analysis and Preprocessing
Scribes: Olivia Xu and Laura Torres
Summary of Lecture
- Comparison of FKS Hashing vs. Chaining (Worst-case vs. Expected)
- Mathematical Intuition: Minimizing Sum of Squares for Equality
- Analysis of Step 2: Probability of Second-level Collisions
- Analysis of Step 1: Expected Collisions and Markov Bound
FKS Hashing vs. Hashing with Chaining
Professor Goswami began by distinguishing the guarantees:
- Hashing with Chaining: Query time is $O(1)$ expected. Preprocessing is always $O(n)$.
- FKS Hashing: Query time is $O(1)$ worst-case. Preprocessing is $O(n)$ expected.
First a Brief Summary of FKS Hashing Preprocessing Phase
Before analyzing the FKS hashing preprocessing, let’s review its steps.
- Step 1: We hash the $n$ keys into $n$ rows (buckets) with a hash function $h$ drawn from a universal family, resampling $h$ until the row sizes $b_i$ satisfy $\sum_i b_i^2 \le 4n$.
- Step 2: For each row $i$ from Step 1, we find a hash function $h_i$ that distributes its $b_i$ keys into a row of length $b_i^2$ with no collisions.
- By the end, we will have used $n + 1$ hash functions: 1 in Step 1 and $n$ in Step 2.
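As a rough illustration of the two steps above, here is a minimal Python sketch. It assumes a non-empty set of distinct integer keys smaller than the prime `P`, and it uses the family $h(x) = ((ax + b) \bmod P) \bmod m$ as the universal family. Neither choice comes from the lecture, and `random_hash`, `build_fks`, and `contains` are hypothetical names.

```python
import random

P = 2_147_483_647  # prime 2^31 - 1; assumption: every key is an int in [0, P)

def random_hash(m, rng):
    """Draw h(x) = ((a*x + b) mod P) mod m (an assumed universal family)."""
    a = rng.randrange(1, P)
    b = rng.randrange(0, P)
    return lambda x: ((a * x + b) % P) % m

def build_fks(keys, rng):
    """Build a two-level table for a non-empty list of distinct keys."""
    n = len(keys)
    # Step 1: resample the top-level function until sum of b_i^2 <= 4n.
    while True:
        h = random_hash(n, rng)
        buckets = [[] for _ in range(n)]
        for k in keys:
            buckets[h(k)].append(k)
        if sum(len(b) ** 2 for b in buckets) <= 4 * n:
            break
    # Step 2: give each bucket of size b_i a row of b_i^2 cells and
    # resample its hash function until no two keys share a cell.
    rows = []
    for bucket in buckets:
        size = len(bucket) ** 2
        while True:
            g = random_hash(size, rng) if size else None
            cells = [None] * size
            collision = False
            for k in bucket:
                j = g(k)
                if cells[j] is not None:
                    collision = True
                    break
                cells[j] = k
            if not collision:
                break
        rows.append((g, cells))
    return h, rows

def contains(table, key):
    """Query with two hash evaluations: one per level."""
    h, rows = table
    g, cells = rows[h(key)]
    return bool(cells) and cells[g(key)] == key
```

A possible usage: `table = build_fks([3, 17, 42], random.Random(0))` followed by `contains(table, 17)`. The retry loops are the resampling that the rest of these notes analyze.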
Intuition: Why $\sum b_i^2$?
To understand why Step 1 limits the sum of squares, we consider a calculus problem:
Problem: Given $x + y = n$ and $x, y \ge 0$, minimize $x^2 + y^2$.
- Substituting $y = n - x$, we get $f(x) = x^2 + (n - x)^2$.
- To minimize it, we differentiate once and set the derivative equal to zero: $f'(x) = 2x - 2(n - x) = 4x - 2n = 0$, so $x = n/2$.
- Since the second derivative is positive, $f''(x) = 4 > 0$, the point $x = y = n/2$ is a minimum.
- Insight: Minimizing the sum of squares is a mathematical way to enforce an equal distribution.
- In FKS, $\sum_i b_i = n$. The sum of squares ranges from $n$ (perfectly equal, $b_i = 1$) to $n^2$ (all in one bucket). FKS accepts any $h$ where $\sum_i b_i^2 \le 4n$.
- The most equal distribution occurs when there are no collisions and each $b_i$ is equal to 1. Then $\sum_i b_i^2 = \sum_{i=1}^{n} 1^2 = n$.
- The most unequal distribution occurs when all keys go into the same bucket, so all the other buckets have 0 keys: $\sum_i b_i^2 = n^2 + 0 + \dots + 0 = n^2$.
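To make the range of the sum of squares concrete, the following small Python sketch compares three bucket-size vectors for $n = 8$ keys. The specific vectors and the helper name `sum_of_squares` are illustrative assumptions, not examples from the lecture.

```python
def sum_of_squares(bucket_sizes):
    """Return the sum of b_i^2 over all buckets."""
    return sum(b * b for b in bucket_sizes)

n = 8
perfectly_equal = [1] * n                   # every bucket holds exactly one key
somewhat_uneven = [2, 2, 1, 1, 1, 1, 0, 0]  # a few collisions
all_in_one = [n] + [0] * (n - 1)            # one bucket holds every key

for sizes in (perfectly_equal, somewhat_uneven, all_in_one):
    assert sum(sizes) == n  # all keys must land somewhere
    print(sizes, sum_of_squares(sizes), sum_of_squares(sizes) <= 4 * n)
```

The three sums are 8, 12, and 64. Only the last exceeds the $4n = 32$ threshold, so a function that produced it would be resampled.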
Step 2 Analysis: Second-level Collision Probability
We need to prove that Step 2 terminates quickly.
Scenario: We hash $b$ keys into $b^2$ cells.
- Let $C$ be the total number of collisions (colliding pairs of keys) in a bucket.
- Using a Universal Hash Family, for any pair of keys, $\Pr[\text{collision}] \le 1/b^2$.
- Total number of pairs is $\binom{b}{2}$, so $E[C] \le \binom{b}{2} \cdot \frac{1}{b^2} = \frac{b - 1}{2b} < \frac{1}{2}$.
- By Markov’s Inequality: $\Pr[C \ge 1] \le E[C] < \frac{1}{2}$.
- Conclusion: Since the failure probability is less than $1/2$, the number of tries follows a Geometric Random Variable with success probability $p > 1/2$. Expected tries $= 1/p < 2$.
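The bound on the expected number of second-level collisions can be checked with exact arithmetic. This sketch evaluates $\binom{b}{2} / b^2$ for a few bucket sizes and confirms it stays below $1/2$. The function name `collision_bound` and the sample values of `b` are assumptions chosen for illustration.

```python
from fractions import Fraction
from math import comb

def collision_bound(b):
    """Upper bound on E[C] for b keys in b*b cells: C(b, 2) * (1 / b^2)."""
    return Fraction(comb(b, 2), b * b)

for b in (2, 3, 10, 1000):
    bound = collision_bound(b)
    # The bound simplifies to (b - 1) / (2b), which is always below 1/2.
    assert bound == Fraction(b - 1, 2 * b)
    assert bound < Fraction(1, 2)
    print(b, bound)
```

The bound approaches $1/2$ as $b$ grows but never reaches it, which is why a quadratic row length is enough for a constant expected number of retries.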
Step 1 Analysis: Preprocessing Time
We prove that we don’t need to resample too many times. Step 1 succeeds with probability at least $1/2$, so the expected number of tries is at most 2.
From Collisions to Sum of Squares
Key observation: The total number of collisions is related to how keys distribute across buckets.
When $b_i$ keys hash to bucket $i$, how many collision pairs are there? If we count ordered pairs (each unordered pair counted twice), we get $b_i(b_i - 1)$ ordered collision pairs from bucket $i$. Summing over all buckets:
$$\sum_i b_i(b_i - 1) = \sum_i b_i^2 - \sum_i b_i$$
Since $\sum_i b_i = n$ (all keys must go somewhere), we get:
$$\sum_i b_i^2 = n + \sum_i b_i(b_i - 1)$$
This connects the sum of squares directly to collision count, tying Step 1’s success condition to a probability argument.
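The relation between ordered collision pairs and the sum of squares can be checked by brute force on a small example. The bucket assignment below is a made-up instance and `ordered_collision_pairs` is a hypothetical helper.

```python
from collections import Counter
from itertools import permutations

def ordered_collision_pairs(assignment):
    """Count ordered pairs (x, y) with x != y that land in the same bucket."""
    return sum(
        1
        for x, y in permutations(range(len(assignment)), 2)
        if assignment[x] == assignment[y]
    )

assignment = [0, 0, 0, 1, 1, 2]  # hypothetical bucket index for each of 6 keys
n = len(assignment)
sizes = list(Counter(assignment).values())  # bucket sizes b_i: 3, 2, 1

pairs = ordered_collision_pairs(assignment)
from_squares = sum(b * b for b in sizes) - n
assert pairs == from_squares
print(pairs, from_squares)
```

Here the bucket sizes are 3, 2, and 1, so the sum of squares is 14 and both counts equal 8.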
Expected Number of Collisions
When hashing $n$ keys into $n$ buckets using a universal hash function:
What’s the expected number of keys that collide with one particular key $x$?
- There are $n - 1$ other keys
- Each hashes to the same bucket as $x$ with probability at most $1/n$
- Expected collisions with $x$: at most $(n - 1) \cdot \frac{1}{n} < 1$
Since this holds for every key:
$$E\Big[\sum_i b_i(b_i - 1)\Big] \le n \cdot \frac{n - 1}{n} = n - 1 < n$$
Important fact: When hashing $n$ keys into $n$ buckets, the expected number of ordered collision pairs is less than $n$.
Markov Bound for Step 1
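Using the result above, where $E\big[\sum_i b_i(b_i - 1)\big] < n$, and $\sum_i b_i = n$:
$$E\Big[\sum_i b_i^2\Big] = E\Big[\sum_i b_i(b_i - 1)\Big] + n < 2n$$
Step 1 fails when $\sum_i b_i^2 > 4n$. By Markov’s inequality:
$$\Pr\Big[\sum_i b_i^2 > 4n\Big] \le \frac{E\big[\sum_i b_i^2\big]}{4n} < \frac{2n}{4n} = \frac{1}{2}$$
Conclusion:
- Probability Step 1 succeeds: at least $1/2$
- Expected number of tries: at most 2 (geometric with success probability at least $1/2$)
- Total expected work in Step 1: $O(n)$, since each try takes $O(n)$ time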
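A small simulation can show how loose the Markov bound is in practice. The sketch below repeats Step 1 many times on a fixed key set and records how many hash functions were drawn before $\sum_i b_i^2 \le 4n$ held. It assumes the family $((ax + b) \bmod P) \bmod n$ with a fixed prime `P`, random integer keys, and a fixed seed. All of these are illustrative choices, and `step1_tries` is a hypothetical name.

```python
import random

P = 2_147_483_647  # prime 2^31 - 1; assumption: larger than every key

def step1_tries(keys, rng):
    """Return (number of draws, accepted sum of squares) for one run of Step 1."""
    n = len(keys)
    tries = 0
    while True:
        tries += 1
        a, b = rng.randrange(1, P), rng.randrange(P)
        counts = [0] * n
        for k in keys:
            counts[((a * k + b) % P) % n] += 1
        total = sum(c * c for c in counts)
        if total <= 4 * n:  # Step 1's acceptance condition
            return tries, total

rng = random.Random(2026)
keys = rng.sample(range(10**6), 100)
results = [step1_tries(keys, rng) for _ in range(200)]
mean_tries = sum(t for t, _ in results) / len(results)
print(mean_tries)
```

The analysis only guarantees an expected number of tries of at most 2. On typical inputs the observed average tends to sit much closer to 1, because Markov’s inequality is a worst-case bound.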
Summary of Guarantees
- Query: $O(1)$ worst-case (two hash function evaluations).
- Space: $O(n)$ (since $\sum_i b_i^2 \le 4n$ from Step 1’s condition).
- Preprocessing: $O(n)$ expected (Geometric trials for both levels).