Lecture 02/02/2026 - Independence, Geometric Random Variables, and Hashing

Scribes: Joshua Sin and Ye Htut Muang

Summary of the Lecture

Geometric Random Variables, what are they and how are they useful
The Coupon Collector Problem, its problem statement and how we can solve it using geometric random variables
The Membership/Dictionary Problem, its problem statement and how we can solve it using hashing with chaining
What is hashing with chaining and how we can use it to solve the Membership/Dictionary Problem

Independence

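Two random variables $X$ and $Y$ are independent if, for all values $x$ and $y$,

$\Pr[X = x, Y = y] = \Pr[X = x] \cdot \Pr[Y = y]$

Here $\Pr[X = x, Y = y]$ is the joint distribution of $X$ and $Y$. Independence means that the value taken by one random variable does not affect the distribution of the other.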

Types of Independence

For a sequence of random variables $X_1, X_2, \dots, X_n$, there are two notions of independence:

(1) Pairwise independence

(2) Mutual independence

Pairwise Independence / 2-wise Independence

For all $i \neq j$, $X_i$ and $X_j$ are independent. This means that learning $X_i$ does not help in determining $X_j$. However, learning $X_i$ and $X_j$ together can help in determining a third variable $X_k$.

Mutual Independence / n-wise Independence

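Learning all variables except $X_i$ (that is, learning $X_1, \dots, X_{i-1}, X_{i+1}, \dots, X_n$) does not give any information about $X_i$. This is a much stronger requirement than pairwise independence and is rarely available in practice: in computer science, truly independent, perfectly random variables are generally not obtainable.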
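The gap between the two notions can be seen in a small example that is not from the lecture: take two fair independent bits and let a third bit be their XOR. The sketch below enumerates the sample space exactly and checks the product rule for every pair and for the full triple. The function names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space: two fair, independent bits (x, y); the third bit is z = x XOR y.
outcomes = [(x, y, x ^ y) for x, y in product((0, 1), repeat=2)]

def prob(event):
    """Exact probability of an event under the uniform distribution on outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def pairwise_independent():
    """Check Pr[A=a, B=b] == Pr[A=a] * Pr[B=b] for every pair of coordinates."""
    for i, j in ((0, 1), (0, 2), (1, 2)):
        for a, b in product((0, 1), repeat=2):
            joint = prob(lambda o: o[i] == a and o[j] == b)
            if joint != prob(lambda o: o[i] == a) * prob(lambda o: o[j] == b):
                return False
    return True

def mutually_independent():
    """Check the product rule for all three coordinates at once.

    A failure here is enough to rule out mutual independence.
    """
    for a, b, c in product((0, 1), repeat=3):
        joint = prob(lambda o: o == (a, b, c))
        marginals = (prob(lambda o: o[0] == a) * prob(lambda o: o[1] == b)
                     * prob(lambda o: o[2] == c))
        if joint != marginals:
            return False
    return True

print(pairwise_independent())  # True
print(mutually_independent())  # False
```

Any two of the three bits look independent, yet any two of them determine the third.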

Linearity of Variance for a Sequence of Random Variables

For any independent random variables $X$ and $Y$:

$\mathrm{Var}[X + Y] = \mathrm{Var}[X] + \mathrm{Var}[Y]$

Suppose $X_1, X_2, \dots, X_n$ are pairwise independent (pairwise independence is all this property needs):

$\mathrm{Var}[X_1 + \dots + X_n] = \mathrm{Var}[X_1] + \dots + \mathrm{Var}[X_n]$

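By applying linearity of variance, let $X = \sum_{i=1}^{n} X_i$ and let $X_1, \dots, X_n$ be pairwise independent:

$\mathrm{Var}[X] = \sum_{i=1}^{n} \mathrm{Var}[X_i]$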
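To see that pairwise independence really is enough, here is a small exact check on a hypothetical example (not from the lecture): $X$ and $Y$ are fair independent bits and $Z = X \oplus Y$, a triple that is pairwise but not mutually independent. The helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Hypothetical example: X, Y are fair independent bits and Z = X XOR Y,
# so (X, Y, Z) is pairwise independent but not mutually independent.
outcomes = [(x, y, x ^ y) for x, y in product((0, 1), repeat=2)]

def expectation(f):
    """Exact expectation of f under the uniform distribution on outcomes."""
    return sum(Fraction(f(o)) for o in outcomes) / len(outcomes)

def variance(f):
    """Exact variance, computed as E[(f - E[f])^2]."""
    mean = expectation(f)
    return expectation(lambda o: (f(o) - mean) ** 2)

var_of_sum = variance(lambda o: o[0] + o[1] + o[2])
sum_of_vars = sum(variance(lambda o, i=i: o[i]) for i in range(3))
print(var_of_sum, sum_of_vars)  # 3/4 3/4
```

The two quantities agree even though the three variables are not mutually independent.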

Geometric Random Variable

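Say we have a coin that turns up heads with probability $p$. We toss it $X$ times until we get the first heads. $X$ is a random variable such that $\Pr[X = k] = (1-p)^{k-1} p$ for $k = 1, 2, \dots$. Then, $X \sim \mathrm{Geom}(p)$.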

Expectation - Alternative Definition
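One standard alternative definition, for a random variable $X$ taking values in the non-negative integers, is the tail-sum formula. Assuming this is the definition intended here, it can be written and applied to a geometric random variable with success probability $p$ as follows:

```latex
E[X] = \sum_{k=1}^{\infty} \Pr[X \ge k]
\qquad \text{(for $X$ taking values in $\{0, 1, 2, \dots\}$)}

% Applied to X ~ Geom(p): X >= k means the first k-1 tosses were all tails.
\Pr[X \ge k] = (1-p)^{k-1}
\quad\Longrightarrow\quad
E[X] = \sum_{k=1}^{\infty} (1-p)^{k-1} = \frac{1}{1-(1-p)} = \frac{1}{p}
```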

Expectation and Variance of Geometric Random Variable

For $X \sim \mathrm{Geom}(p)$:

$E[X] = \frac{1}{p}, \quad \mathrm{Var}[X] = \frac{1-p}{p^2}$
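As a sanity check, the standard formulas $E[X] = 1/p$ and $\mathrm{Var}[X] = (1-p)/p^2$ can be compared against a simulation of the coin-tossing experiment. This is a minimal sketch. The value $p = 0.25$, the trial count, and the seed are arbitrary choices for illustration.

```python
import random

def sample_geometric(p, rng):
    """Toss a p-biased coin until the first heads; return the number of tosses."""
    tosses = 1
    while rng.random() >= p:   # this toss is tails, with probability 1 - p
        tosses += 1
    return tosses

def empirical_mean_and_variance(p, trials=100_000, seed=0):
    """Estimate mean and variance of Geom(p) from seeded random samples."""
    rng = random.Random(seed)
    samples = [sample_geometric(p, rng) for _ in range(trials)]
    mean = sum(samples) / trials
    var = sum((s - mean) ** 2 for s in samples) / trials
    return mean, var

p = 0.25
mean, var = empirical_mean_and_variance(p)
print(mean, 1 / p)             # empirical mean next to 1/p = 4
print(var, (1 - p) / p ** 2)   # empirical variance next to (1-p)/p^2 = 12
```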

Coupon Collector Problem

Pokemon Character Example

Let there be 20 unique characters in Pokemon. Pokemon is collaborating with a cereal company, and when you buy cereal, you get one Pokemon character randomly. How many boxes of cereal in expectation do you need to buy to get all 20 Pokemon characters?

Let’s start with a simple question: If I already have 19 characters, what is the chance that I get the 20th unique character in my next cereal box? The answer is $\frac{1}{20}$ because we are choosing the 1 unseen character out of 20 total characters.

Now, let’s find the pattern of the probability of each case, as the probability will change every time we pick a new character:

If we have 0 characters, we have a $\frac{20}{20}$ or 100% chance that we get a new character.
If we have 1 character, we have a $\frac{19}{20}$ chance that we get a new character.
If we have 2 characters, we have an $\frac{18}{20}$ chance that we get a new character.
If we have 19 characters, we have a $\frac{1}{20}$ chance that we get a new character.

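Let $X$ be the number of cereal boxes bought until we collect all 20 Pokemon characters. So, we want $E[X]$. We break $X$ down into $X = X_1 + X_2 + \dots + X_{20}$. Consider $X_i$ to be the number of boxes bought after getting $(i-1)$ characters, up to and including the box that gives the $i$-th new character. The variables $X_i$ are geometric random variables: each box in phase $i$ gives a new character with probability $p_i = \frac{20 - (i-1)}{20}$, so $X_i \sim \mathrm{Geom}(p_i)$.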

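$X_1$ = the number of boxes bought until the 1st new character; $p_1 = \frac{20}{20}$ and $E[X_1] = \frac{20}{20}$
$X_2$ = the number of boxes bought after getting the 1st character until the 2nd new character; $p_2 = \frac{19}{20}$ and $E[X_2] = \frac{20}{19}$
$X_3$ = the number of boxes bought after getting the 2nd character until the 3rd new character; $p_3 = \frac{18}{20}$ and $E[X_3] = \frac{20}{18}$
$X_{20}$ = the number of boxes bought after getting the 19th character until the 20th new character; $p_{20} = \frac{1}{20}$ and $E[X_{20}] = \frac{20}{1}$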

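Now, we find the expectation of the total number of cereal boxes we need to buy to get all 20 Pokemon characters. By linearity of expectation:

$E[X] = \sum_{i=1}^{20} E[X_i] = \frac{20}{20} + \frac{20}{19} + \dots + \frac{20}{1} = 20 \sum_{i=1}^{20} \frac{1}{i}$

Since $\sum_{i=1}^{n} \frac{1}{i} = H_n = \ln n + O(1)$:

$E[X] = 20 \cdot H_{20} \approx 72$

When we talk about expected runtime, it will be $O(n \log n)$ for $n$ characters.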
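The 20-character calculation can be checked numerically. The sketch below sums the geometric expectations exactly and compares the result with a seeded simulation of buying boxes. It assumes each box contains a uniformly random character; the trial count and seed are arbitrary.

```python
import random
from fractions import Fraction

def boxes_until_complete(n, rng):
    """Buy boxes (each with a uniformly random character) until all n are seen."""
    seen = set()
    boxes = 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        boxes += 1
    return boxes

def exact_expectation(n):
    """Sum of E[X_i] = n / (n - i + 1) over the n geometric phases."""
    return sum(Fraction(n, n - i + 1) for i in range(1, n + 1))

rng = random.Random(0)
trials = 20_000
average = sum(boxes_until_complete(20, rng) for _ in range(trials)) / trials
print(float(exact_expectation(20)))  # about 71.95
print(average)                       # should land close to the exact value
```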

Generalization of Coupon Collector Problem

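With $n$ distinct coupons, $X_i \sim \mathrm{Geom}\left(\frac{n - i + 1}{n}\right)$, so $E[X_i] = \frac{n}{n - i + 1}$. By the linearity of expectation, the total expected number of boxes is:

$E[X] = \sum_{i=1}^{n} E[X_i] = \sum_{i=1}^{n} \frac{n}{n - i + 1} = n \sum_{i=1}^{n} \frac{1}{i} = n H_n$

Since $H_n = \ln n + O(1)$:

$E[X] = n \ln n + O(n) = \Theta(n \log n)$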
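For a feel of how the general bound behaves, the exact value $n H_n$ can be tabulated next to the leading term $n \ln n$. This is only an illustrative sketch, and the chosen values of $n$ are arbitrary.

```python
import math

def expected_boxes(n):
    """Exact expectation n * H_n for n equally likely coupons."""
    return n * sum(1 / i for i in range(1, n + 1))

for n in (20, 100, 1000):
    exact = expected_boxes(n)
    approx = n * math.log(n)
    # Columns: n, exact n*H_n, leading term n*ln(n), and their ratio.
    print(n, round(exact, 1), round(approx, 1), round(exact / approx, 3))
```

The ratio moves toward 1 as $n$ grows, since the gap between the two quantities is only linear in $n$.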

Membership/Dictionary Problem

Problem Statement

In the membership/dictionary problem, we have a set of $n$ keys, $S = \{x_1, x_2, \dots, x_n\}$, from a universe $U$. The goal is to store $S$ in a data structure such that, given a query element $q \in U$, we can quickly determine whether $q \in S$ or not.

Methods to Solve

Brute Force: We take $q$ and compare it with every key $x_i \in S$. If it exists, we return the key.

Runtime: $O(n)$ per query

Binary Search: First, we sort $S$, which takes $O(n \log n)$ preprocessing time, and then we answer each query using binary search. If the element exists, we return the key.

Runtime: $O(\log n)$ per query
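The two baseline methods can be sketched as follows. The key set is hypothetical, and the functions return a boolean rather than the key itself to keep the membership test explicit.

```python
from bisect import bisect_left

def linear_member(keys, q):
    """Brute force: compare q against every key, O(n) per query."""
    for key in keys:
        if key == q:
            return True
    return False

def binary_member(sorted_keys, q):
    """Binary search on a sorted list, O(log n) per query."""
    i = bisect_left(sorted_keys, q)
    return i < len(sorted_keys) and sorted_keys[i] == q

keys = [22, 5, 17, 9]          # hypothetical key set
sorted_keys = sorted(keys)     # O(n log n) preprocessing
print(linear_member(keys, 17), binary_member(sorted_keys, 17))  # True True
print(linear_member(keys, 4), binary_member(sorted_keys, 4))    # False False
```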

Is it possible to solve this problem and achieve constant query time ($O(1)$)?

Algorithm 1 - Hashing with Chains

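Given a hash function $h : U \to \{0, 1, \dots, m-1\}$ (where $m$ is the space we will use), we assume $h$ is perfectly random (meaning $h$ is chosen uniformly at random from the family of all such functions, so every hash value is equally likely). For any key $x \in U$ and any $j \in \{0, 1, \dots, m-1\}$:

$\Pr[h(x) = j] = \frac{1}{m}$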

Algorithm:

(1) Initialize $m$ linked lists, one for each bucket

(2) For $i = 1$ to $n$:

compute $h(x_i)$
append $x_i$ to the linked list in bucket $h(x_i)$

Note: Query time depends on which bucket you have to search.
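The algorithm above can be sketched in a few lines. Two assumptions are made for illustration: keys are integers, and the perfectly random $h$ is replaced by a multiply-add function with random coefficients modulo a prime, which is not truly random. Python lists stand in for linked lists, and the class name `ChainedHashTable` is hypothetical.

```python
import random

class ChainedHashTable:
    """Hashing with chaining: m buckets, each holding a chain (list) of keys."""

    def __init__(self, keys, m, seed=0):
        self.m = m
        # Stand-in for a perfectly random h (an assumption, not truly random):
        # h(x) = ((a*x + b) mod prime) mod m with random a and b.
        rng = random.Random(seed)
        self.prime = 2_147_483_647  # the Mersenne prime 2^31 - 1
        self.a = rng.randrange(1, self.prime)
        self.b = rng.randrange(self.prime)
        self.buckets = [[] for _ in range(m)]      # step (1)
        for key in keys:                           # step (2)
            self.buckets[self.h(key)].append(key)

    def h(self, key):
        """Map an integer key to a bucket index in 0..m-1."""
        return ((self.a * key + self.b) % self.prime) % self.m

    def contains(self, q):
        """Search only the chain of bucket h(q)."""
        return q in self.buckets[self.h(q)]

table = ChainedHashTable(keys=[5, 12, 22, 31, 40], m=8)
print(table.contains(22))  # True
print(table.contains(23))  # False
```

The cost of `contains` is proportional to the length of one chain, which is why the query time depends on the bucket being searched.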

Example

Suppose we want to support membership queries on a set $S$ of integer keys, using a hash table of size $m = 5$ with hash function $h(x) = x \bmod 5$.

Each table entry stores a chain of elements that hash to the same index.

Index	Stored Keys
0	5
1	(empty)
2	22
3	(empty)
4	(empty)

To answer a membership query for a key $q$, we compute $h(q)$ and search only the corresponding chain.

Example Queries

Query $q = 22$: $h(22) = 2$. Searching the chain at index 2 finds 22, so $22 \in S$.
Query $q$ with $h(q) = 4$: The chain at index 4 is empty, so $q \notin S$.
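The example can be replayed in code under stated assumptions: the hash function is taken to be $h(x) = x \bmod 5$ (consistent with 5 landing at index 0 and 22 at index 2), only the keys 5 and 22 that the example mentions are stored, and 9 is a hypothetical query key that hashes to index 4.

```python
def build_table(keys, m):
    """Place each key in bucket key mod m (the assumed hash function)."""
    buckets = [[] for _ in range(m)]
    for key in keys:
        buckets[key % m].append(key)
    return buckets

def member(buckets, q):
    """Search only the chain at index q mod m."""
    return q in buckets[q % len(buckets)]

buckets = build_table([5, 22], 5)   # only the keys mentioned in the example
print(buckets)                      # [[5], [], [22], [], []]
print(member(buckets, 22))          # True
print(member(buckets, 9))           # False; 9 is a hypothetical key with 9 % 5 == 4
```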