
Lecture 02/11/2026 - FKS Hashing Analysis and Preprocessing

Scribes: Olivia Xu and Laura Torres

  • Comparison of FKS Hashing vs. Chaining (Worst-case vs. Expected)
  • Mathematical Intuition: Minimizing Sum of Squares for Equality
  • Analysis of Step 2: Probability of Second-level Collisions
  • Analysis of Step 1: Expected Collisions and Markov Bound

Professor Goswami began by distinguishing the guarantees:

  • Hashing with Chaining: Query time is $O(1)$ expected (but $O(n)$ in the worst case). Preprocessing is always $O(n)$.
  • FKS Hashing: Query time is $O(1)$ worst-case. Preprocessing is $O(n)$ expected.

First a Brief Summary of FKS Hashing Preprocessing Phase


Before analyzing the FKS hashing preprocessing, let’s review its steps.

  • Step 1: We hash all $n$ keys into a first-level table of $n$ buckets using a hash function drawn from a universal family; let $n_i$ denote the number of keys landing in bucket $i$. We keep the function only if $\sum_i n_i^2 \le 4n$.
  • Step 2: We take each row from the first step and find a hash function for each row that distributes its $n_i$ keys into a row of length $n_i^2$ such that there are no collisions.
  • By the end, we will have used $n + 1$ hash functions: $1$ in Step 1 and $n$ in Step 2.
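The two preprocessing steps just summarized can be sketched in code. This is a minimal illustration, not the lecture's implementation: it assumes a nonempty set of non-negative integer keys below a prime $P$, and uses the standard universal family $h(x) = ((ax + b) \bmod P) \bmod m$ as a stand-in for whatever universal family the course uses.

```python
import random

P = 10**9 + 7  # prime larger than every key (assumption for this sketch)

def random_hash(m):
    """Draw a hash function into m cells from the family ((a*x + b) % P) % m."""
    a = random.randrange(1, P)
    b = random.randrange(P)
    return lambda x: ((a * x + b) % P) % m

def fks_preprocess(keys):
    n = len(keys)
    # Step 1: resample until the sum of squared bucket sizes is at most 4n.
    while True:
        h = random_hash(n)
        buckets = [[] for _ in range(n)]
        for x in keys:
            buckets[h(x)].append(x)
        if sum(len(b) ** 2 for b in buckets) <= 4 * n:
            break
    # Step 2: for each bucket, resample until its n_i keys occupy
    # distinct cells in a row of length n_i^2.
    tables = []
    for b in buckets:
        if not b:
            tables.append((None, []))
            continue
        m = len(b) ** 2
        while True:
            g = random_hash(m)
            cells = [None] * m
            collision = False
            for x in b:
                j = g(x)
                if cells[j] is not None:
                    collision = True
                    break
                cells[j] = x
            if not collision:
                break
        tables.append((g, cells))
    return h, tables

def fks_query(h, tables, x):
    """Worst-case O(1): one first-level and one second-level evaluation."""
    g, cells = tables[h(x)]
    return g is not None and cells[g(x)] == x
```

With $n$ keys this uses exactly $n + 1$ hash functions, one per while-loop success, matching the count above.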

To understand why Step 1 limits the sum of squares, we consider a calculus problem:

Problem: Given $n_1 + n_2 = n$ with $n_1, n_2 \ge 0$, minimize $f = n_1^2 + n_2^2$.

  • Substituting $n_2 = n - n_1$, we get $f(n_1) = n_1^2 + (n - n_1)^2$.
  • To minimize it, we differentiate it once and set it equal to zero:

    $f'(n_1) = 2n_1 - 2(n - n_1) = 4n_1 - 2n = 0 \implies n_1 = \frac{n}{2}$

  • Since the second derivative is positive, $f''(n_1) = 4 > 0$, this means that $n_1 = n_2 = \frac{n}{2}$ is a minimum.
  • Insight: Minimizing the sum of squares is a mathematical way to enforce an equal distribution.
  • In FKS, $\sum_{i=1}^{n} n_i = n$. The sum of squares $\sum_i n_i^2$ ranges from $n$ (perfectly equal, every $n_i = 1$) to $n^2$ (all keys in one bucket). FKS accepts any hash function for which $\sum_i n_i^2 \le 4n$.
    • The most equal distribution will be when there are no collisions and each $n_i$ is equal to 1. Then:

      $\sum_{i=1}^{n} n_i^2 = \underbrace{1 + 1 + \dots + 1}_{n} = n$

  • The most unequal distribution will occur when all keys go into the same bucket, so all the other buckets have 0 keys:

      $\sum_{i=1}^{n} n_i^2 = n^2 + 0 + \dots + 0 = n^2$
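The two extremes can be checked with a tiny computation (an illustrative $n = 8$, not a value from the lecture):

```python
# Sum of squares at the two extremes for n = 8 keys in 8 buckets.
n = 8
equal = [1] * n                 # no collisions: every n_i = 1
unequal = [n] + [0] * (n - 1)   # all keys in one bucket

def sum_sq(sizes):
    return sum(s * s for s in sizes)

print(sum_sq(equal))    # n   -> 8
print(sum_sq(unequal))  # n^2 -> 64
```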
Step 2 Analysis: Second-level Collision Probability


We need to prove that Step 2 terminates quickly.

Scenario: We hash $n_i$ keys into $n_i^2$ cells.

  • Let $X$ be the total number of collisions (colliding pairs) in a bucket.
  • Using a Universal Hash Family, for any pair of distinct keys $x, y$, $\Pr[h(x) = h(y)] \le \frac{1}{n_i^2}$.
  • Total number of pairs is $\binom{n_i}{2} = \frac{n_i(n_i - 1)}{2}$, so by linearity of expectation:

    $E[X] \le \binom{n_i}{2} \cdot \frac{1}{n_i^2} = \frac{n_i(n_i - 1)}{2 n_i^2} < \frac{1}{2}$

  • By Markov’s Inequality: $\Pr[X \ge 1] \le \frac{E[X]}{1} < \frac{1}{2}$.
  • Conclusion: Since the failure probability is less than $\frac{1}{2}$, the number of tries follows a Geometric Random Variable with success probability $p > \frac{1}{2}$. Expected tries $= \frac{1}{p} < 2$.

Step 1 Analysis: Expected Collisions and Markov Bound

We prove that we don’t need to resample the first-level hash function too many times.

  • Let the total number of collisions (colliding pairs) at the first level be $C = \sum_{i=1}^{n} \binom{n_i}{2}$.
  • It can be shown that $\sum_{i=1}^{n} n_i^2 = n + 2C$, since $n_i^2 = n_i + 2\binom{n_i}{2}$.
  • As shown in the “pink fact”: When hashing $n$ keys into $n$ buckets with a universal family, the expected number of collisions $E[C] \le \binom{n}{2} \cdot \frac{1}{n} < \frac{n}{2}$.
  • (More accurately, $E[C] \le \frac{n(n-1)}{2} \cdot \frac{1}{n} = \frac{n-1}{2}$.)
  • We know $E\left[\sum_i n_i^2\right] = n + 2\,E[C]$.
  • Thus, $E\left[\sum_i n_i^2\right] < n + 2 \cdot \frac{n}{2} = 2n$.
  • We want to find the probability that Step 1 fails, i.e., $\sum_i n_i^2 > 4n$.
  • By Markov: $\Pr\left[\sum_i n_i^2 > 4n\right] \le \frac{2n}{4n} = \frac{1}{2}$.
  • Conclusion: Expected number of tries for Step 1 is $< 2$. Total expected work is $O(n)$.
  • Query: $O(1)$ worst-case (two hash function evaluations).
  • Space: $O(n)$ (since $\sum_i n_i^2 \le 4n$ from Step 1’s condition).
  • Preprocessing: $O(n)$ expected (Geometric number of trials at both levels).
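As a closing sanity check (my addition, with an idealized uniform random hash rather than a universal family), the Step 1 bound $E\left[\sum_i n_i^2\right] < 2n$ can be verified empirically:

```python
import random

def sum_of_squares(n, rng):
    """Throw n keys into n buckets uniformly; return sum of squared sizes."""
    counts = [0] * n
    for _ in range(n):
        counts[rng.randrange(n)] += 1
    return sum(c * c for c in counts)

rng = random.Random(1)
n, trials = 50, 2000
avg = sum(sum_of_squares(n, rng) for _ in range(trials)) / trials
# For a truly random hash, E[sum n_i^2] = n + 2 * (n choose 2)/n = 2n - 1.
print(avg)  # concentrates near 2n - 1 = 99, comfortably below 4n = 200
```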