
Lecture 02/02/2026 - Independence, Geometric Random Variables, and Hashing

Scribes: Joshua Sin and Ye Htut Muang

  • Geometric Random Variables, what are they and how are they useful
  • The Coupon Collector Problem, its problem statement and how we can solve it using geometric random variables
  • The Membership/Dictionary Problem, its problem statement and how we can solve it using hashing with chaining
  • What is hashing with chaining and how we can use it to solve the Membership/Dictionary Problem

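Two random variables $X$ and $Y$ are independent if, for all values $x$ and $y$,

$$\Pr[X = x, Y = y] = \Pr[X = x] \cdot \Pr[Y = y]$$

Here $\Pr[X = x, Y = y]$ is the joint distribution of $X$ and $Y$. Intuitively, independence means that the value of one random variable does not affect the distribution of the other.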

For a sequence of random variables $X_1, X_2, \dots, X_n$, there are two notions of independence:

(1) Pairwise independence

(2) Mutual independence

Pairwise Independence / 2-wise Independence


For all $i \neq j$, $X_i$ and $X_j$ are independent. This means knowing $X_i$ does not help in determining $X_j$. However, knowing $X_i$ and $X_j$ together can help determine a third variable $X_k$.

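Mutual independence: knowing all variables except $X_i$ (that is, knowing $X_1, \dots, X_{i-1}, X_{i+1}, \dots, X_n$) does not give any information about $X_i$. This type of independence is very rare. In computer science, obtaining fully independent random variables with perfect randomness is, in practice, impossible.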
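The gap between the two notions can be checked on a small example. The sketch below uses a standard construction that is an assumption here, not something taken from the lecture: $X$ and $Y$ are independent fair bits and $Z = X \oplus Y$. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space (assumed example): X and Y are independent fair bits, Z = X XOR Y.
outcomes = [(x, y, x ^ y) for x, y in product([0, 1], repeat=2)]
prob = Fraction(1, len(outcomes))  # each of the 4 outcomes has probability 1/4


def pr(event):
    """Probability that event(outcome) holds."""
    return sum(prob for o in outcomes if event(o))


def pairwise_independent():
    """Check Pr[A=a, B=b] = Pr[A=a] * Pr[B=b] for every pair of variables."""
    for i, j in [(0, 1), (0, 2), (1, 2)]:
        for a, b in product([0, 1], repeat=2):
            joint = pr(lambda o: o[i] == a and o[j] == b)
            if joint != pr(lambda o: o[i] == a) * pr(lambda o: o[j] == b):
                return False
    return True


def triple_factorizes():
    """Check the three-way product rule, which mutual independence also requires."""
    for a, b, c in product([0, 1], repeat=3):
        joint = pr(lambda o: o == (a, b, c))
        marginals = (pr(lambda o: o[0] == a)
                     * pr(lambda o: o[1] == b)
                     * pr(lambda o: o[2] == c))
        if joint != marginals:
            return False
    return True


print(pairwise_independent(), triple_factorizes())
```

In this example any two of the bits look independent, but two of them together determine the third, so the three-way product rule fails and the variables are not mutually independent.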

Linearity of Variance for a Sequence of Random Variables


For any independent random variables $X$ and $Y$:

$$\mathbb{V}\mathrm{ar}(X + Y) = \mathbb{V}\mathrm{ar}(X) + \mathbb{V}\mathrm{ar}(Y)$$

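Suppose $X_1, X_2, \dots, X_n$ are pairwise independent (pairwise independence is all this property needs):

$$\mathbb{V}\mathrm{ar}(X_1 + X_2 + \dots + X_n) = \mathbb{V}\mathrm{ar}(X_1) + \mathbb{V}\mathrm{ar}(X_2) + \dots + \mathbb{V}\mathrm{ar}(X_n)$$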
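As a sanity check that pairwise independence is enough, the sketch below computes both sides exactly on an assumed example (not from the lecture): two independent fair bits and their XOR, which are pairwise but not mutually independent. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Assumed example: X, Y independent fair bits and Z = X XOR Y.
# The four outcomes are equally likely.
outcomes = [(x, y, x ^ y) for x, y in product([0, 1], repeat=2)]


def expectation(f):
    """Exact expectation of f over the uniform sample space."""
    return sum(Fraction(f(o)) for o in outcomes) / len(outcomes)


def variance(f):
    """Exact variance of f: E[(f - E[f])^2]."""
    mean = expectation(f)
    return expectation(lambda o: (f(o) - mean) ** 2)


# Left-hand side: variance of the sum X + Y + Z.
var_of_sum = variance(lambda o: o[0] + o[1] + o[2])

# Right-hand side: sum of the three individual variances.
sum_of_vars = sum(variance(lambda o, i=i: o[i]) for i in range(3))

print(var_of_sum, sum_of_vars)
```

Both sides come out equal even though the three variables are not mutually independent, which is consistent with the remark that pairwise independence suffices.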

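By applying linearity of variance with $X = \sum_{i=1}^{n} X_i$ for pairwise independent $X_1, \dots, X_n$:

$$\mathbb{V}\mathrm{ar}(X) = \sum_{i=1}^{n} \mathbb{V}\mathrm{ar}(X_i)$$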

Say we have a coin which turns up heads with probability $p$. We toss it $X$ times until we get a heads. $X$ is random such that $\Pr[X = k] = (1 - p)^{k-1} p$ for $k = 1, 2, \dots$. Then, $X \sim \mathrm{Geom}(p)$.

Expectation and Variance of Geometric Random Variable


$$\mathbb{E}[X] = \frac{1}{p}, \qquad \mathbb{V}\mathrm{ar}(X) = \frac{1 - p}{p^2}$$
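The standard closed forms for a geometric random variable, $\mathbb{E}[X] = 1/p$ and $\mathbb{V}\mathrm{ar}(X) = (1-p)/p^2$, can be checked numerically by summing the series directly. This is a minimal sketch; the function name and the truncation length are assumptions chosen for illustration.

```python
def geometric_moments(p, terms=10_000):
    """Approximate E[X] and Var(X) for X ~ Geom(p) by truncating the series.

    Uses Pr[X = k] = (1 - p)^(k - 1) * p for k = 1, 2, ..., terms.
    """
    mean = 0.0
    second_moment = 0.0
    for k in range(1, terms + 1):
        pmf = (1 - p) ** (k - 1) * p
        mean += k * pmf
        second_moment += k * k * pmf
    return mean, second_moment - mean ** 2


p = 0.25
mean, var = geometric_moments(p)
print(mean, 1 / p)             # truncated series vs. closed form 1/p
print(var, (1 - p) / p ** 2)   # truncated series vs. closed form (1-p)/p^2
```

The truncation is harmless for moderate $p$ because the tail terms decay geometrically.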

Let there be 20 unique characters in Pokemon. Pokemon is collaborating with a cereal company, and when you buy cereal, you get one Pokemon character randomly. How many boxes of cereal in expectation do you need to buy to get all 20 Pokemon characters?

Let’s start with a simple question: If I already have 19 characters, what is the chance that I get the 20th unique character in my next cereal box? The answer is $\frac{1}{20}$ because we are choosing the 1 unseen character out of 20 total characters.

Now, let’s find the pattern of the probability of each case, as the probability will change every time we pick a new character:

  • If we have 0 characters, we have $\frac{20}{20}$ or 100% chance that we get a new character.
  • If we have 1 character, we have $\frac{19}{20}$ chance that we get a new character.
  • If we have 2 characters, we have $\frac{18}{20}$ chance that we get a new character.
  • If we have 19 characters, we have $\frac{1}{20}$ chance that we get a new character.

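Let $X$ be the number of cereal boxes bought until we collect all 20 Pokemon characters. So, $X = \sum_{i=1}^{20} X_i$. We break down $X$ into $X_1, X_2, \dots, X_{20}$. Consider $X_i$ to be the number of boxes bought after getting ($i - 1$) characters, up to and including the box that gives the $i$-th character. The variables $X_i$ are geometric random variables with success probability $p_i = \frac{20 - (i - 1)}{20}$.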

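  • $X_1$ = the number of boxes bought until the 1st new character; $p_1 = \frac{20}{20}$ and $\mathbb{E}[X_1] = \frac{20}{20}$
  • $X_2$ = the number of boxes bought after getting the 1st character, until the 2nd new character; $p_2 = \frac{19}{20}$ and $\mathbb{E}[X_2] = \frac{20}{19}$
  • $X_3$ = the number of boxes bought after getting the 2nd character, until the 3rd new character; $p_3 = \frac{18}{20}$ and $\mathbb{E}[X_3] = \frac{20}{18}$
  • $X_{20}$ = the number of boxes bought after getting the 19th character, until the 20th new character; $p_{20} = \frac{1}{20}$ and $\mathbb{E}[X_{20}] = \frac{20}{1}$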

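Now, we find the expectation of the total number of cereal boxes we need to buy to get all 20 Pokemon characters. By linearity of expectation:

$$\mathbb{E}[X] = \sum_{i=1}^{20} \mathbb{E}[X_i] = \frac{20}{20} + \frac{20}{19} + \dots + \frac{20}{1} = 20 \sum_{i=1}^{20} \frac{1}{i}$$

Since $\sum_{i=1}^{n} \frac{1}{i} \approx \ln n$:

$$\mathbb{E}[X] \approx 20 \ln 20 \approx 60$$

(The exact value of $20 \sum_{i=1}^{20} \frac{1}{i}$ is about 72; the $\ln n$ approximation drops a lower-order term.)

When we state this asymptotically, as we would an expected runtime, it is $O(n \log n)$ for $n$ characters.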
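The arithmetic for the 20-character case can be reproduced exactly with rational numbers. This is a small sketch, with variable names chosen for illustration; it sums the expected waiting times $\frac{20}{20}, \frac{20}{19}, \dots, \frac{20}{1}$ and compares the result with the $n \ln n$ approximation.

```python
import math
from fractions import Fraction

n = 20  # number of distinct characters

# With i characters already owned, a box is new with probability (n - i) / n,
# so the expected wait for the next new character is n / (n - i).
expected_boxes = sum(Fraction(n, n - i) for i in range(n))

approximation = n * math.log(n)  # the n ln n estimate

print(float(expected_boxes))  # exact expectation, as a float
print(approximation)          # logarithmic approximation
```

The exact expectation is larger than $n \ln n$ because the harmonic sum exceeds $\ln n$ by roughly a constant, which is invisible in the asymptotic $O(n \log n)$ statement.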

Generalization of Coupon Collector Problem


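With $n$ distinct characters in total, let $X_i \sim \mathrm{Geom}\left(\frac{n - i + 1}{n}\right)$ be the number of boxes needed to go from $i - 1$ to $i$ distinct characters. By the linearity of expectation, the total expected number of boxes is:

$$\mathbb{E}[X] = \sum_{i=1}^{n} \mathbb{E}[X_i] = \sum_{i=1}^{n} \frac{n}{n - i + 1} = n \sum_{i=1}^{n} \frac{1}{i}$$

Since $\sum_{i=1}^{n} \frac{1}{i} \approx \ln n$:

$$\mathbb{E}[X] \approx n \ln n = O(n \log n)$$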
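The general formula $n \sum_{i=1}^{n} \frac{1}{i}$ can also be compared against a simulation. The sketch below assumes each box contains a character chosen uniformly at random, independently of the others; the function names, seed, and trial count are illustrative choices.

```python
import random


def boxes_until_complete(n, rng):
    """Buy boxes until all n characters have been seen; return the box count."""
    seen = set()
    boxes = 0
    while len(seen) < n:
        seen.add(rng.randrange(n))  # uniform character in {0, ..., n-1}
        boxes += 1
    return boxes


def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / i for i in range(1, n + 1))


def average_boxes(n, trials, seed=0):
    """Empirical mean number of boxes over repeated independent runs."""
    rng = random.Random(seed)
    return sum(boxes_until_complete(n, rng) for _ in range(trials)) / trials


n = 20
estimate = average_boxes(n, trials=20_000)
print(estimate, n * harmonic(n))  # simulated mean vs. n * H_n
```

With enough trials the simulated mean should settle close to $n H_n$; the fixed seed only makes the run reproducible.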

In the membership/dictionary problem, we have a set of $n$ keys, $S$, from a universe $U$. The goal is to store $S$ in a data structure such that, given a query element $x \in U$, we can quickly determine whether $x \in S$ or not.

Brute Force: We take $x$ and compare it with every key $s \in S$. If it exists, we return the key.

  • Runtime: $O(n)$ per query

Binary Search: First, we sort $S$, which takes $O(n \log n)$ preprocessing time, and then we answer each query using binary search. If the element exists, we return the key.

  • Runtime: $O(\log n)$ per query
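The two baselines can be sketched as follows. The key set is hypothetical, and the functions return a boolean rather than the key itself to keep the example short; `bisect_left` from the Python standard library does the binary search.

```python
from bisect import bisect_left


def contains_linear(keys, x):
    """Brute force: compare x with every key, O(n) per query."""
    return any(k == x for k in keys)


def contains_sorted(sorted_keys, x):
    """Binary search on a sorted list, O(log n) per query."""
    i = bisect_left(sorted_keys, x)  # leftmost position where x could be inserted
    return i < len(sorted_keys) and sorted_keys[i] == x


keys = [17, 5, 22, 9]       # hypothetical key set S
sorted_keys = sorted(keys)  # one-time O(n log n) preprocessing

print(contains_linear(keys, 22), contains_sorted(sorted_keys, 22))
print(contains_linear(keys, 4), contains_sorted(sorted_keys, 4))
```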

Is it possible to solve this problem and achieve constant query time ($O(1)$)?

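Given a hash function $h : U \to \{0, 1, \dots, m - 1\}$ (where $m$ is the space we will use), we assume $h$ is perfectly random (meaning $h$ is drawn uniformly at random from the family of all functions from $U$ to the $m$ buckets, so every hash value is equally likely for each key). For any key $x \in U$ and any bucket $i \in \{0, 1, \dots, m - 1\}$:

$$\Pr[h(x) = i] = \frac{1}{m}$$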

Algorithm:

(1) Initialize $m$ linked lists, one for each bucket

(2) For $i = 1$ to $n$ (over the keys $s_1, \dots, s_n$ of $S$):

  • compute $h(s_i)$
  • append $s_i$ to the linked list in bucket $h(s_i)$

Note: Query time depends on which bucket you have to search.
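The algorithm above can be sketched in a few lines. Several things here are assumptions made for illustration: Python lists stand in for linked lists, the class and method names are hypothetical, and the "perfectly random" hash function is imitated by drawing a uniform bucket the first time a key is seen and remembering it. That memo is itself a dictionary, so this only illustrates the behavior and is not a practical hash function.

```python
import random


class ChainedHashTable:
    """Hashing with chaining over m buckets (illustrative sketch)."""

    def __init__(self, keys, m, seed=0):
        self.m = m
        self._rng = random.Random(seed)
        self._hash_of = {}                     # memo imitating a perfectly random h
        self.buckets = [[] for _ in range(m)]  # step (1): one chain per bucket
        for key in keys:                       # step (2): insert every key of S
            self.buckets[self.h(key)].append(key)

    def h(self, key):
        """Uniform bucket in {0, ..., m-1}, fixed once chosen for a key."""
        if key not in self._hash_of:
            self._hash_of[key] = self._rng.randrange(self.m)
        return self._hash_of[key]

    def contains(self, x):
        """Search only the chain of bucket h(x)."""
        return x in self.buckets[self.h(x)]


table = ChainedHashTable(range(0, 100, 3), m=16)
print(table.contains(33), table.contains(34))
print(max(len(chain) for chain in table.buckets))  # longest chain = worst query cost
```

The cost of a query is the length of the one chain it scans, which is why the bucket a key lands in determines the query time.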

Suppose we want to support membership queries on the set

using a hash table of size $m = 5$ with hash function $h(x) = x \bmod 5$.

Each table entry stores a chain of elements that hash to the same index.

Index   Stored Keys
0       5
1       (empty)
2
3       (empty)
4       (empty)
To answer a membership query for a key $x$, we compute $h(x)$ and search only the corresponding chain.

  • Query $22$: $h(22) = 2$. Searching the chain at index 2 finds 22, so $22 \in S$.
  • Query a key $x$ with $h(x) = 4$: the chain at index 4 is empty, so $x \notin S$.
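The two queries can be replayed in code. This sketch rests on assumptions: the hash function is taken to be $h(x) = x \bmod 5$, only the two keys that the example names explicitly (5 and 22) are inserted, so the chain at index 2 may be shorter than in the original table, and the key 9 is a hypothetical query chosen because it hashes to index 4.

```python
M = 5  # table size from the example


def h(x):
    """Assumed hash function: x mod 5."""
    return x % M


def build_table(keys):
    """Place each key in the chain of its bucket."""
    table = [[] for _ in range(M)]
    for key in keys:
        table[h(key)].append(key)
    return table


def query(table, x):
    """Membership query: scan only the chain at index h(x)."""
    return x in table[h(x)]


table = build_table([5, 22])  # only the keys named in the example

print(h(22), query(table, 22))  # bucket index and result for key 22
print(h(9), query(table, 9))    # hypothetical key that lands in an empty bucket
```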