
Lecture 02/11/2026 - FKS Hashing Analysis and Preprocessing

Scribes: Olivia Xu and Laura Torres

  • Comparison of FKS Hashing vs. Chaining (Worst-case vs. Expected)
  • Mathematical Intuition: Minimizing Sum of Squares for Equality
  • Analysis of Step 2: Probability of Second-level Collisions
  • Analysis of Step 1: Expected Collisions and Markov Bound

Professor Goswami began by distinguishing the guarantees:

  • Hashing with Chaining: Query time is $O(1)$ expected (but $O(n)$ in the worst case). Preprocessing is always $O(n)$.
  • FKS Hashing: Query time is $O(1)$ worst-case. Preprocessing is $O(n)$ expected.

First a Brief Summary of FKS Hashing Preprocessing Phase


Before analyzing the FKS hashing preprocessing, let’s review its steps.

  • Step 1: We hash all $n$ keys into a first-level table of $n$ buckets using a hash function drawn from a universal family; let $n_i$ denote the number of keys landing in bucket $i$. We keep the function only if $\sum_i n_i^2 \le 4n$.
  • Step 2: We take each row from the first step and find a hash function for each row that distributes its $n_i$ keys into a row of length $n_i^2$ such that there are no collisions.
  • By the end, we will have used $n + 1$ hash functions: $1$ in Step 1 and $n$ in Step 2.
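The two preprocessing steps just summarized can be sketched in code. This is a minimal illustration, not the lecture's implementation: it assumes a nonempty set of non-negative integer keys below a prime $P$, and uses the standard universal family $h(x) = ((ax + b) \bmod P) \bmod m$ as a stand-in for whatever universal family the course uses.

```python
import random

P = 10**9 + 7  # prime larger than every key (assumption for this sketch)

def random_hash(m):
    """Draw a hash function into m cells from the family ((a*x + b) % P) % m."""
    a = random.randrange(1, P)
    b = random.randrange(P)
    return lambda x: ((a * x + b) % P) % m

def fks_preprocess(keys):
    n = len(keys)
    # Step 1: resample until the sum of squared bucket sizes is at most 4n.
    while True:
        h = random_hash(n)
        buckets = [[] for _ in range(n)]
        for x in keys:
            buckets[h(x)].append(x)
        if sum(len(b) ** 2 for b in buckets) <= 4 * n:
            break
    # Step 2: for each bucket, resample until its n_i keys occupy
    # distinct cells in a row of length n_i^2.
    tables = []
    for b in buckets:
        if not b:
            tables.append((None, []))
            continue
        m = len(b) ** 2
        while True:
            g = random_hash(m)
            cells = [None] * m
            collision = False
            for x in b:
                j = g(x)
                if cells[j] is not None:
                    collision = True
                    break
                cells[j] = x
            if not collision:
                break
        tables.append((g, cells))
    return h, tables

def fks_query(h, tables, x):
    """Worst-case O(1): one first-level and one second-level evaluation."""
    g, cells = tables[h(x)]
    return g is not None and cells[g(x)] == x
```

With $n$ keys this uses exactly $n + 1$ hash functions, one per while-loop success, matching the count above.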

To understand why Step 1 limits the sum of squares, we consider a calculus problem:

Problem: Given $n_1 + n_2 = n$ with $n_1, n_2 \ge 0$, minimize $f = n_1^2 + n_2^2$.

  • Substituting $n_2 = n - n_1$, we get $f(n_1) = n_1^2 + (n - n_1)^2$.
  • To minimize it, we differentiate it once and set it equal to zero:

    $f'(n_1) = 2n_1 - 2(n - n_1) = 4n_1 - 2n = 0 \implies n_1 = \frac{n}{2}$

  • Since the second derivative is positive, $f''(n_1) = 4 > 0$, this means that $n_1 = n_2 = \frac{n}{2}$ is a minimum.
  • Insight: Minimizing the sum of squares is a mathematical way to enforce an equal distribution.
  • In FKS, $\sum_{i=1}^{n} n_i = n$. The sum of squares $\sum_i n_i^2$ ranges from $n$ (perfectly equal, every $n_i = 1$) to $n^2$ (all keys in one bucket). FKS accepts any hash function for which $\sum_i n_i^2 \le 4n$.
    • The most equal distribution will be when there are no collisions and each $n_i$ is equal to 1. Then:

      $\sum_{i=1}^{n} n_i^2 = \underbrace{1 + 1 + \dots + 1}_{n} = n$

  • The most unequal distribution will occur when all keys go into the same bucket, so all the other buckets have 0 keys:

      $\sum_{i=1}^{n} n_i^2 = n^2 + 0 + \dots + 0 = n^2$
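The two extremes can be checked with a tiny computation (an illustrative $n = 8$, not a value from the lecture):

```python
# Sum of squares at the two extremes for n = 8 keys in 8 buckets.
n = 8
equal = [1] * n                 # no collisions: every n_i = 1
unequal = [n] + [0] * (n - 1)   # all keys in one bucket

def sum_sq(sizes):
    return sum(s * s for s in sizes)

print(sum_sq(equal))    # n   -> 8
print(sum_sq(unequal))  # n^2 -> 64
```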
Step 2 Analysis: Second-level Collision Probability


We need to prove that Step 2 terminates quickly.

Scenario: We hash $n_i$ keys into $n_i^2$ cells.

  • Let $X$ be the total number of collisions (colliding pairs) in a bucket.
  • Using a Universal Hash Family, for any pair of distinct keys $x, y$, $\Pr[h(x) = h(y)] \le \frac{1}{n_i^2}$.
  • Total number of pairs is $\binom{n_i}{2} = \frac{n_i(n_i - 1)}{2}$, so by linearity of expectation:

    $E[X] \le \binom{n_i}{2} \cdot \frac{1}{n_i^2} = \frac{n_i(n_i - 1)}{2 n_i^2} < \frac{1}{2}$

  • By Markov’s Inequality: $\Pr[X \ge 1] \le \frac{E[X]}{1} < \frac{1}{2}$.
  • Conclusion: Since the failure probability is less than $\frac{1}{2}$, the number of tries follows a Geometric Random Variable with success probability $p > \frac{1}{2}$. Expected tries $= \frac{1}{p} < 2$.

Step 1 Analysis: Expected Collisions and Markov Bound

We prove that we don’t need to resample the first-level hash function too many times.

  • Let the total number of collisions (colliding pairs) at the first level be $C = \sum_{i=1}^{n} \binom{n_i}{2}$.
  • It can be shown that $\sum_{i=1}^{n} n_i^2 = n + 2C$, since $n_i^2 = n_i + 2\binom{n_i}{2}$.
  • As shown in the “pink fact”: When hashing $n$ keys into $n$ buckets with a universal family, the expected number of collisions $E[C] \le \binom{n}{2} \cdot \frac{1}{n} < \frac{n}{2}$.
  • (More accurately, $E[C] \le \frac{n(n-1)}{2} \cdot \frac{1}{n} = \frac{n-1}{2}$.)
  • We know $E\left[\sum_i n_i^2\right] = n + 2\,E[C]$.
  • Thus, $E\left[\sum_i n_i^2\right] < n + 2 \cdot \frac{n}{2} = 2n$.
  • We want to find the probability that Step 1 fails, i.e., $\sum_i n_i^2 > 4n$.
  • By Markov: $\Pr\left[\sum_i n_i^2 > 4n\right] \le \frac{2n}{4n} = \frac{1}{2}$.
  • Conclusion: Expected number of tries for Step 1 is $< 2$. Total expected work is $O(n)$.
  • Query: $O(1)$ worst-case (two hash function evaluations).
  • Space: $O(n)$ (since $\sum_i n_i^2 \le 4n$ from Step 1’s condition).
  • Preprocessing: $O(n)$ expected (Geometric number of trials at both levels).
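As a closing sanity check (my addition, with an idealized uniform random hash rather than a universal family), the Step 1 bound $E\left[\sum_i n_i^2\right] < 2n$ can be verified empirically:

```python
import random

def sum_of_squares(n, rng):
    """Throw n keys into n buckets uniformly; return sum of squared sizes."""
    counts = [0] * n
    for _ in range(n):
        counts[rng.randrange(n)] += 1
    return sum(c * c for c in counts)

rng = random.Random(1)
n, trials = 50, 2000
avg = sum(sum_of_squares(n, rng) for _ in range(trials)) / trials
# For a truly random hash, E[sum n_i^2] = n + 2 * (n choose 2)/n = 2n - 1.
print(avg)  # concentrates near 2n - 1 = 99, comfortably below 4n = 200
```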