Lecture 02/11/2026 - FKS Hashing Analysis and Preprocessing
Scribes: Olivia Xu and Laura Torres
Summary of Lecture
- Comparison of FKS Hashing vs. Chaining (Worst-case vs. Expected)
- Mathematical Intuition: Minimizing Sum of Squares for Equality
- Analysis of Step 2: Probability of Second-level Collisions
- Analysis of Step 1: Expected Collisions and Markov Bound
FKS Hashing vs. Hashing with Chaining
Professor Goswami began by distinguishing the guarantees:
- Hashing with Chaining: Query time is $O(1)$ expected. Preprocessing is always $O(n)$.
- FKS Hashing: Query time is $O(1)$ worst-case. Preprocessing is $O(n)$ expected.
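To make the contrast concrete, here is a minimal sketch of hashing with chaining. It assumes integer keys and the universal family $h(x) = ((a x + b) \bmod P) \bmod m$ of Carter and Wegman; the lecture does not fix a family, and `ChainedTable` and `P` are hypothetical names. The build loop never retries, which is why preprocessing is always $O(n)$. In contrast, `contains` scans one chain whose length is bounded only in expectation.

```python
import random

P = 2_147_483_647  # prime 2^31 - 1; assumes integer keys in [0, P)


class ChainedTable:
    """Hashing with chaining: one universal hash function, a list (chain) per slot."""

    def __init__(self, keys, rng=None):
        rng = rng or random.Random()
        self.m = max(len(keys), 1)  # m = n slots, i.e. load factor 1
        self.a = rng.randrange(1, P)
        self.b = rng.randrange(0, P)
        self.slots = [[] for _ in range(self.m)]
        for k in keys:  # O(n) build: one append per key, never resampled
            self.slots[self._h(k)].append(k)

    def _h(self, x):
        return ((self.a * x + self.b) % P) % self.m

    def contains(self, x):
        # Scans a single chain; its length is O(1) only in expectation over (a, b).
        return x in self.slots[self._h(x)]
```

An unlucky choice of `(a, b)` can put many keys in one chain. Chaining accepts that risk at query time, while FKS pays for it up front by resampling during preprocessing.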
A Brief Summary of the FKS Hashing Preprocessing Phase
Before analyzing the FKS hashing preprocessing, let’s review its steps.
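- Step 1: Pick a hash function $h$ from a universal family that maps the $n$ keys into $n$ rows. Let $n_i$ be the number of keys in row $i$. If $\sum_i n_i^2 > 4n$, discard $h$ and resample.
- Step 2: For each row $i$ from the first step, find a hash function $h_i$ that distributes the row's $n_i$ keys into a secondary row of length $n_i^2$ with no collisions, resampling $h_i$ until this happens.
- By the end, we will have used $n + 1$ hash functions: 1 in Step 1 and $n$ in Step 2.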
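The two preprocessing steps can be sketched as follows. This is an illustration under stated assumptions, not the lecture's reference implementation. It assumes distinct integer keys, the universal family $((a x + b) \bmod P) \bmod m$, and a Step 1 acceptance threshold of $\sum_i n_i^2 \le 4n$ (the constant 4 is an assumed choice). `random_hash`, `build_fks`, and `contains` are hypothetical names.

```python
import random

P = 2_147_483_647  # prime 2^31 - 1; assumes every key is an int in [0, P)


def random_hash(m, rng):
    """Draw h(x) = ((a*x + b) mod P) mod m from a universal family (assumed choice)."""
    a = rng.randrange(1, P)
    b = rng.randrange(0, P)
    return lambda x: ((a * x + b) % P) % m


def build_fks(keys, rng=None):
    """Two-level FKS table for a non-empty list of distinct int keys."""
    if not keys:
        raise ValueError("keys must be non-empty")
    rng = rng or random.Random()
    n = len(keys)
    # Step 1: resample the first-level hash until sum of n_i^2 <= 4n (assumed constant).
    while True:
        h = random_hash(n, rng)
        rows = [[] for _ in range(n)]
        for k in keys:
            rows[h(k)].append(k)
        if sum(len(r) ** 2 for r in rows) <= 4 * n:
            break
    # Step 2: per row, resample until its n_i keys fit in n_i^2 slots collision-free.
    second = []
    for row in rows:
        size = len(row) ** 2
        while True:
            g = random_hash(max(size, 1), rng)  # max() avoids "mod 0" for empty rows
            slots = [None] * size
            for k in row:
                j = g(k)
                if slots[j] is not None:
                    break  # collision: discard g and try again
                slots[j] = k
            else:
                break  # every key placed without a collision
        second.append((g, slots))
    return h, second


def contains(table, x):
    """Query with at most two hash evaluations: h, then the row's g."""
    h, second = table
    g, slots = second[h(x)]
    return bool(slots) and slots[g(x)] == x
```

Randomness affects only how long `build_fks` runs. Once the table is built, every lookup costs the same two hash evaluations regardless of which functions were drawn.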
Intuition: Why $\sum_i n_i^2$?
To understand why Step 1 limits the sum of squares, we consider a calculus problem:
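Problem: Given $x + y = n$ with $x, y \ge 0$, minimize $x^2 + y^2$.
- Substituting $y = n - x$, we get $f(x) = x^2 + (n - x)^2$.
- To minimize it, we differentiate once and set the derivative equal to zero: $f'(x) = 2x - 2(n - x) = 4x - 2n = 0$, so $x = n/2$.
- Since the second derivative is positive, $f''(x) = 4 > 0$, the critical point $x = y = n/2$ is a minimum.
- Insight: Minimizing the sum of squares is a mathematical way to enforce an equal distribution.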
- In FKS, $\sum_i n_i = n$. The sum of squares ranges from $n$ (perfectly equal, every $n_i = 1$) to $n^2$ (all keys in one bucket). FKS accepts any $h$ where $\sum_i n_i^2 \le 4n$.
  - The most equal distribution occurs when there are no collisions and each $n_i$ equals 1. Then $\sum_i n_i^2 = \sum_{i=1}^{n} 1 = n$.
  - The most unequal distribution occurs when all keys go into the same bucket, so all the other buckets have 0 keys. Then $\sum_i n_i^2 = n^2 + 0 + \dots + 0 = n^2$.
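The calculus argument and the two extreme distributions can be checked numerically. The sketch below uses hypothetical helper names. `best_split` brute-forces the integer split of $n$ into $x + (n - x)$ that minimizes the sum of squares, and `sum_of_squares` evaluates $\sum_i n_i^2$ for a list of bucket counts.

```python
def sum_of_squares(counts):
    """Sum of n_i^2 over the bucket counts n_i."""
    return sum(c * c for c in counts)


def best_split(n):
    """Brute-force the integer x in [0, n] minimizing x^2 + (n - x)^2."""
    return min(range(n + 1), key=lambda x: x * x + (n - x) ** 2)


n = 8
print(best_split(n))                        # 4: the equal split n/2
print(sum_of_squares([1] * n))              # 8: no collisions gives n
print(sum_of_squares([n] + [0] * (n - 1)))  # 64: all keys in one bucket gives n^2
```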
Step 2 Analysis: Second-level Collision Probability
We need to prove that Step 2 terminates quickly, i.e., that each row needs only a constant expected number of resamples.
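Scenario: We hash the $n_i$ keys of row $i$ into $n_i^2$ slots.
- Let $C_i$ be the total number of collisions in the bucket, i.e., the number of pairs of keys that land in the same slot.
- Using a Universal Hash Family, for any pair of distinct keys $x \ne y$, $\Pr[h_i(x) = h_i(y)] \le 1/n_i^2$.
- The total number of pairs is $\binom{n_i}{2} = \frac{n_i(n_i - 1)}{2}$.
- By linearity of expectation, $E[C_i] \le \binom{n_i}{2} \cdot \frac{1}{n_i^2} = \frac{n_i - 1}{2 n_i} < \frac{1}{2}$.
- By Markov’s Inequality: $\Pr[C_i \ge 1] \le E[C_i] < \frac{1}{2}$.
- Conclusion: Since the failure probability is less than $1/2$, the number of tries follows a Geometric Random Variable with success probability $p > 1/2$. Expected tries $= 1/p < 2$.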
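Because the probability here is taken over the random choice of hash function, it can be computed exactly for a small family by enumerating every member. The sketch assumes the family $((a x + b) \bmod p) \bmod m$ with a small prime $p = 101$, and uses a hypothetical row of $n_i = 5$ keys. It averages $C_i$ over all $100 \cdot 101$ functions and measures the fraction that are collision-free. The analysis predicts $E[C_i] \le \binom{5}{2}/25 = 0.4$, and Markov then guarantees a success rate of at least $0.6$.

```python
from itertools import combinations

p = 101                      # small prime so the whole family can be enumerated
keys = [3, 17, 42, 58, 90]   # hypothetical row with n_i = 5 distinct keys, all < p
m = len(keys) ** 2           # second-level row length n_i^2 = 25


def collisions(a, b):
    """Number of colliding pairs C_i under h(x) = ((a*x + b) mod p) mod m."""
    h = [((a * x + b) % p) % m for x in keys]
    return sum(1 for u, v in combinations(h, 2) if u == v)


family = [(a, b) for a in range(1, p) for b in range(p)]
counts = [collisions(a, b) for a, b in family]
mean_c = sum(counts) / len(family)
success = sum(1 for c in counts if c == 0) / len(family)
print(f"E[C_i] = {mean_c:.3f} (bound 0.4)")
print(f"P[no collision] = {success:.3f} (Markov guarantees >= 0.6)")
```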
Step 1 Analysis: Preprocessing Time
We prove that we do not need to resample the first-level hash function $h$ many times: the expected number of tries in Step 1 is a constant.
Definitions
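- Total collisions $C = \sum_i \binom{n_i}{2}$, the number of pairs of keys that share a row.
- It can be shown that $\sum_i n_i^2 = n + 2C$, since $n_i^2 = n_i + 2\binom{n_i}{2}$ and $\sum_i n_i = n$.
- As shown in the “pink fact”: When hashing $n$ keys into $n$ buckets, the expected number of collisions is $E[C] \le n/2$.
- (More accurately, $E[C] \le \binom{n}{2} \cdot \frac{1}{n} = \frac{n - 1}{2}$.)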
Markov Bound for Step 1
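- We know $E[C] \le n/2$.
- Thus, $E\left[\sum_i n_i^2\right] = n + 2E[C] \le 2n$.
- We want to find the probability that Step 1 fails, i.e., $\Pr\left[\sum_i n_i^2 > 4n\right]$.
- By Markov: $\Pr\left[\sum_i n_i^2 > 4n\right] \le \frac{E\left[\sum_i n_i^2\right]}{4n} \le \frac{2n}{4n} = \frac{1}{2}$.
- Conclusion: The expected number of tries for Step 1 is at most 2. Each try hashes all $n$ keys and costs $O(n)$, so the total expected work is $O(n)$.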
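The Step 1 bound can also be verified exactly on a small family. The sketch assumes the family $((a x + b) \bmod p) \bmod n$ with $p = 101$, a hypothetical set of $n = 10$ keys, and a threshold of $4n$, which is the assumed acceptance constant. It enumerates every $(a, b)$, then reports the average collision count against the bound $(n-1)/2$ and the fraction of functions Step 1 would accept.

```python
p = 101
keys = [5, 12, 23, 31, 47, 56, 64, 78, 85, 99]  # hypothetical n = 10 distinct keys < p
n = len(keys)


def stats(a, b):
    """Return (C, sum of n_i^2) for h(x) = ((a*x + b) mod p) mod n."""
    counts = [0] * n
    for x in keys:
        counts[((a * x + b) % p) % n] += 1
    C = sum(c * (c - 1) // 2 for c in counts)
    return C, sum(c * c for c in counts)


family = [(a, b) for a in range(1, p) for b in range(p)]
results = [stats(a, b) for a, b in family]
mean_C = sum(C for C, _ in results) / len(family)
pass_rate = sum(1 for _, s in results if s <= 4 * n) / len(family)
print(f"E[C] = {mean_C:.2f} (bound (n-1)/2 = {(n - 1) / 2})")
print(f"P[sum n_i^2 <= 4n] = {pass_rate:.3f} (Markov guarantees >= 1/2)")
```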
Summary of Guarantees
- Query: $O(1)$ worst-case (two hash function evaluations).
- Space: $O(n)$ (since $\sum_i n_i^2 \le 4n$ from Step 1’s condition).
- Preprocessing: $O(n)$ expected (Geometric trials for both levels).