
CSCI 328 Midterm Fundamentals Problem Sets

Problem 1: Linearity of Expectation

(a) If we flip a fair coin 100 times, what is the expected number of heads?

(b) Using linearity of expectation, explain why the expectation of a binomial random variable $X \sim \text{Binomial}(n, p)$ is $np$.

(c) If the expected number of heads is 50, what is the expected number of tails?

Solution

(a) For $X \sim \text{Binomial}(100, 1/2)$, we have $E[X] = np = 100 \cdot (1/2) = 50$ heads.

(b) We can write $X = X_1 + X_2 + \cdots + X_n$ where each $X_i \sim \text{Bernoulli}(p)$. By linearity of expectation (which holds even without independence):

$$E[X] = E[X_1] + E[X_2] + \cdots + E[X_n] = p + p + \cdots + p = np$$

(c) The number of tails is $100 - X$, so $E[100 - X] = 100 - E[X] = 100 - 50 = 50$ tails.
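A quick Monte Carlo check of parts (a) and (c); the seed and trial count are arbitrary choices made only for reproducibility:

```python
import random

random.seed(1)  # fixed seed so the estimate is reproducible

def heads_in_flips(n_flips):
    """Count heads in n_flips fair-coin flips."""
    return sum(random.random() < 0.5 for _ in range(n_flips))

trials = 20_000
avg_heads = sum(heads_in_flips(100) for _ in range(trials)) / trials
avg_tails = 100 - avg_heads  # linearity: E[100 - X] = 100 - E[X]
```

Both averages land within sampling noise of 50.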


Problem 2: Bernoulli Random Variables and Variance


Consider flipping a fair coin once. Let $X = 1$ if we get heads, and $X = 0$ if we get tails.

(a) What is $E[X]$ and $\text{Var}(X)$?

(b) Now flip the coin 100 times. Let $Y$ be the total number of heads. Express $Y$ as a sum of independent Bernoulli random variables and use linearity to find $E[Y]$ and $\text{Var}(Y)$.

(c) For $n = 1000$ coin flips with $Y \sim \text{Binomial}(1000, 1/2)$, compute $E[Y]$ and $\text{Var}(Y)$.

Solution

(a) $X \sim \text{Bernoulli}(1/2)$, so:

$$E[X] = \frac{1}{2}, \quad \text{Var}(X) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$$

(b) $Y = X_1 + X_2 + \cdots + X_{100}$ where each $X_i \sim \text{Bernoulli}(1/2)$. By linearity (and, for the variance, by the independence of the flips, since variances add only for independent summands):

$$E[Y] = \sum_{i=1}^{100} \frac{1}{2} = 50, \quad \text{Var}(Y) = \sum_{i=1}^{100} \frac{1}{4} = 25$$

(c) For $n = 1000$:

$$E[Y] = 1000 \cdot \frac{1}{2} = 500, \quad \text{Var}(Y) = 1000 \cdot \frac{1}{4} = 250$$
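The $n = 1000$ numbers can be checked empirically (sample size and seed are arbitrary choices):

```python
import random
import statistics

random.seed(7)

# Draw 5000 samples of Y ~ Binomial(1000, 1/2) by direct simulation
samples = [sum(random.random() < 0.5 for _ in range(1000)) for _ in range(5000)]

mean_y = statistics.mean(samples)       # theory: n*p = 500
var_y = statistics.pvariance(samples)   # theory: n*p*(1-p) = 250
```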

Problem 3: Hashing with Chaining - Indicator Variables


In hashing with chaining, we query for an element $q$. Let $X_i$ be an indicator variable: $X_i = 1$ if element $x_i$ collides with $q$ (i.e., $h(q) = h(x_i)$), and $X_i = 0$ otherwise. Assume the hash function is perfectly random.

(a) If we have $n$ elements and $m$ buckets, what is $\Pr(X_i = 1)$ for any element $x_i$?

(b) The query time is $Q = \sum_{i=1}^{n} X_i$. Using linearity of expectation, what is $E[Q]$?

(c) If $m = n$, what should $\text{Var}(Q)$ be approximately?

Solution

(a) Since the hash function is perfectly random, each element hashes to each of the $m$ buckets with equal probability:

$$\Pr(X_i = 1) = \Pr(h(q) = h(x_i)) = \frac{1}{m}$$

(b) Each $X_i$ is a Bernoulli random variable with $E[X_i] = 1/m$. By linearity of expectation:

$$E[Q] = \sum_{i=1}^{n} E[X_i] = \sum_{i=1}^{n} \frac{1}{m} = \frac{n}{m}$$

(c) For a Bernoulli variable with $p = 1/m$:

$$\text{Var}(X_i) = p(1-p) = \frac{1}{m}\left(1 - \frac{1}{m}\right)$$

Since a perfectly random hash function makes the $X_i$ independent:

$$\text{Var}(Q) = \sum_{i=1}^{n} \text{Var}(X_i) = n \cdot \frac{1}{m}\left(1 - \frac{1}{m}\right)$$

If $m = n$, then $\text{Var}(Q) \approx 1$ (for large $n$).
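The $E[Q] = n/m$ result can be checked by simulation, using bucket load counts as a stand-in for chain lengths (seed and trial count are arbitrary):

```python
import random

random.seed(3)
n = m = 1000      # n keys, m buckets (load factor 1)
trials = 2_000

total_cost = 0
for _ in range(trials):
    loads = [0] * m
    for _ in range(n):                 # hash each key to a uniform bucket
        loads[random.randrange(m)] += 1
    # query cost for a fresh element q = chain length at bucket h(q)
    total_cost += loads[random.randrange(m)]

avg_q = total_cost / trials            # theory: E[Q] = n/m = 1
```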


Problem 4: Hashing with Chaining - Expected Query Time


For hashing with chaining, assume $n$ keys are stored in $m$ buckets using a perfectly random hash function.

(a) What is the expected length of a chain?

(b) If $m = n$ (load factor $\alpha = 1$), what is $E[Q]$ where $Q$ is the query time?

(c) Using Markov’s inequality, bound $\Pr(Q > 10)$ when $E[Q] = 1$.

Solution

(a) Each of the $n$ keys hashes to one of $m$ buckets uniformly at random. By linearity of expectation:

$$E[\text{chain length}] = \frac{n}{m}$$

(b) With $m = n$:

$$E[Q] = \frac{n}{n} = 1$$

(c) By Markov’s inequality:

$$\Pr(Q > 10) \le \frac{E[Q]}{10} = \frac{1}{10} = 0.1$$

Problem 5: Perfectly Random Hash Functions

A hash function is “perfectly random” if $\Pr(h(x) = i) = 1/m$ for any key $x$ and bucket $i$.

(a) What does it mean for a hash function to be perfectly random?

(b) Why does the theoretical analysis of hashing depend on this assumption?

(c) If a hash function is biased (not perfectly random), how would it affect the expected query time?

Solution

(a) A perfectly random hash function maps each key uniformly to any of the $m$ buckets with equal probability $1/m$, independent of other keys.

(b) The $O(1 + \alpha)$ expected query time analysis assumes that each key is equally likely to land in any bucket, collisions are independent, and the expected chain length is $n/m$. Without perfect randomness, some buckets might be overloaded.

(c) If the hash function is biased, some buckets receive more keys than expected, the expected chain length increases above $n/m$, and the expected query time becomes worse than $O(1 + \alpha)$. In the worst case (all keys hash to one bucket), query time is $O(n)$.



Problem 6: Balls and Bins - Foundational Analysis


You throw $n = 1000$ balls (keys) uniformly at random into $n = 1000$ bins (hash slots).

(a) What is the expected load (number of balls per bin)?

(b) What is the expected number of empty bins?

(c) Using Chernoff bounds, estimate the probability that a particular bin receives at least 10 balls.

(d) Using a union bound, estimate the probability that the maximum load exceeds 10.

Solution

(a) Each ball lands in a uniformly random bin. By linearity of expectation:

$$E[\text{load}] = n \cdot \frac{1}{n} = 1$$

(b) A bin is empty if no balls land in it. The probability a given bin is empty is:

$$\Pr(\text{empty}) = (1 - 1/n)^n \approx e^{-1} \approx 0.368$$

Expected number of empty bins:

$$E[\text{empty bins}] = n \cdot e^{-1} = 1000 \cdot 0.368 \approx 368$$

(c) Let $X$ = load of a particular bin. Then $X \sim \text{Binomial}(n, 1/n)$ with $E[X] = 1$.

We want $\Pr(X \geq 10)$. With $(1 + \delta) \cdot 1 = 10$, we have $\delta = 9$:

$$\Pr(X \geq 10) = \Pr(X \geq (1+9) \cdot E[X]) \le e^{-E[X] \cdot 9^2 / 3} = e^{-27} \approx 10^{-12}$$

(d) The probability that any bin (out of $n = 1000$) exceeds load 10:

$$\Pr(\text{max load} \geq 10) \le n \cdot \Pr(X \geq 10) = 1000 \cdot 10^{-12} = 10^{-9}$$
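A simulation (seed and trial count are arbitrary choices) reproduces the ~368 empty bins from (b), and loads anywhere near 10 never appear in a modest number of trials, consistent with the tiny probabilities in (c) and (d):

```python
import random

random.seed(5)
n = 1000
trials = 200

empty_counts, max_loads = [], []
for _ in range(trials):
    loads = [0] * n
    for _ in range(n):                    # throw n balls into n bins
        loads[random.randrange(n)] += 1
    empty_counts.append(loads.count(0))
    max_loads.append(max(loads))

avg_empty = sum(empty_counts) / trials    # theory: ~ n/e = 368
worst_max = max(max_loads)                # loads of 10+ should essentially never occur
```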

Problem 7: Balls and Bins - Comprehensive Analysis


Balls and Bins: throw $n$ balls uniformly at random into $n$ bins.

(a) What is the probability that a specific bin receives no balls?

(b) By linearity of expectation, what is the expected number of empty bins?

(c) Write out the explicit product expression for the probability that all $n$ balls land in different bins (no collisions).

(d) Write the generalized closed form formula for the probability that all $n$ balls land in different bins when thrown into $n$ bins.

(e) Using the balls and bins framework, explain why collisions (two items hashing to the same bucket) are likely even when the number of items equals the number of buckets.

Solution

(a) Each ball has probability $1 - 1/n$ of missing a specific bin. With $n$ independent balls:

$$\Pr(\text{bin empty}) = \left(1 - \frac{1}{n}\right)^n \approx e^{-1} \approx 0.368$$

(b) By linearity of expectation, the expected number of empty bins is:

$$E[\text{\# empty}] = n \cdot \left(1 - \frac{1}{n}\right)^n \approx n \cdot e^{-1} \approx 0.368n$$

(c) The probability that all $n$ balls land in different bins is:

$$\Pr(\text{no collision}) = \frac{n}{n} \cdot \frac{n-1}{n} \cdot \frac{n-2}{n} \cdots \frac{1}{n} = \frac{n!}{n^n}$$

(d) The generalized closed form for $m$ balls and $n$ bins is:

$$\Pr(\text{no collision}) \approx e^{-m^2/2n}$$

(This comes from the first-order approximation $\ln(1 - i/n) \approx -i/n$, so it is accurate when $m \ll n$; for $m$ close to $n$ it is only a rough estimate.) For our case, with equal numbers of balls and bins ($m = n$):

$$\Pr(\text{no collision}) \approx e^{-n^2/2n} = e^{-n/2}$$

(e) If the expected number of empty bins is about $0.368n$, then the expected number of non-empty bins is about $0.632n < n$. By the pigeonhole principle, if fewer than $n$ bins are used for $n$ balls, some bins must contain more than one ball.



Problem 8: Birthday Paradox and Hash Collisions


Suppose you have a hash table with $n = 100$ slots and you insert $k = 10$ keys using a random hash function.

(a) Using the birthday paradox, estimate the probability that at least two keys collide (hash to the same slot).

(b) Use Chernoff bounds to bound the probability that a particular slot receives 3 or more keys.

(c) Using a union bound, estimate the probability that any slot receives 3 or more keys.

Solution

(a) We want $\Pr(\text{at least one collision})$. It’s easier to compute the complement:

$$\Pr(\text{no collision}) = \frac{n}{n} \cdot \frac{n-1}{n} \cdot \frac{n-2}{n} \cdots \frac{n-k+1}{n} = \prod_{i=0}^{k-1} \left(1 - \frac{i}{n}\right) \approx e^{-k(k-1)/(2n)}$$

With $n = 100$ and $k = 10$:

$$\Pr(\text{no collision}) \approx e^{-10 \cdot 9 / 200} = e^{-0.45} \approx 0.638$$

$$\Pr(\text{at least one collision}) \approx 1 - 0.638 = 0.362 \approx 36\%$$

(b) Let $X$ = number of keys hashing to a particular slot. Then $X \sim \text{Binomial}(k, 1/n) = \text{Binomial}(10, 0.01)$ with $E[X] = k/n = 0.1$.

We want $\Pr(X \geq 3)$. Using Chernoff with $(1 + \delta)\mu = 3$:

$$(1 + \delta) \cdot 0.1 = 3 \implies \delta = 29$$

$$\Pr(X \geq 3) \le e^{-0.1 \cdot 29^2 / 3} = e^{-28.03} \approx 10^{-13}$$

(c) The probability that any of the $n = 100$ slots has 3 or more keys is at most:

$$\Pr(\exists \text{ slot with} \geq 3 \text{ keys}) \le n \cdot \Pr(X \geq 3 \text{ for a given slot}) \le 100 \cdot 10^{-13} = 10^{-11}$$
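The part (a) arithmetic is easy to check directly, comparing the exact birthday product against the exponential approximation:

```python
import math

n, k = 100, 10
# exact product over the k keys, and the e^{-k(k-1)/(2n)} approximation
p_no_collision = math.prod(1 - i / n for i in range(k))
p_approx = math.exp(-k * (k - 1) / (2 * n))      # e^{-0.45}
p_collision = 1 - p_no_collision
```

The exact value (~0.628) and the approximation (~0.638) agree to about one percentage point.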

Problem 9: The Birthday Paradox

(a) In the birthday paradox with 365 days, what is the probability that no two people in a room of 24 people share a birthday?

(b) Using the complement, what is the probability that at least two people share a birthday?

(c) How is the birthday paradox related to the “balls and bins” framework?

(d) Explain why this probability is counterintuitive to many people.

Solution

(a) The probability that all 24 people have different birthdays is:

$$\Pr(\text{all different}) = \frac{365}{365} \cdot \frac{364}{365} \cdot \frac{363}{365} \cdots \frac{342}{365} \approx 0.4616$$

(b) Using the complement rule:

$$\Pr(\text{at least 2 share}) = 1 - 0.4616 \approx 0.5384 \approx 54\%$$

(c) In balls and bins, we have 24 balls (people) and 365 bins (days). The probability formula for no collisions is exactly the same as in the birthday paradox.

(d) Many people intuitively think about the probability of sharing a specific birthday (which is low) rather than the probability of any two people sharing a birthday (which is much higher). With 24 people, there are $\binom{24}{2} = 276$ pairs to compare, which is why collisions are likely.
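The product in (a) can be evaluated exactly in a couple of lines:

```python
import math

# probability that all 24 birthdays are distinct
p_all_diff = math.prod((365 - i) / 365 for i in range(24))
p_shared = 1 - p_all_diff
```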



Problem 10: Markov’s Inequality - First Concentration Inequality


(a) State Markov’s inequality and identify what information you need to apply it.

(b) Using Markov’s inequality, if the expected query time for hashing is $E[Q] = 1$ and you want to bound $\Pr(Q > 50)$, what is the bound?

(c) Why is Markov’s inequality weaker than Chebyshev’s inequality for the same random variable?

Solution

(a) Markov’s Inequality: For a non-negative random variable $X$ and threshold $t > 0$:

$$\Pr(X > t) \le \frac{E[X]}{t}$$

You only need to know the expectation $E[X]$, and $X$ must be non-negative.

(b) With $E[Q] = 1$ and $t = 50$:

$$\Pr(Q > 50) \le \frac{1}{50} = 0.02 = 2\%$$

(c) Markov uses only the expectation, while Chebyshev uses both expectation and variance. If the variance is small, Chebyshev gives a much tighter bound. In the same example, Chebyshev would give $\Pr(Q > 50) \le \Pr(|Q - 1| \geq 49) \le \frac{\text{Var}(Q)}{49^2}$, which is typically much smaller than $1/50$ when the variance is small.


Problem 11: Chebyshev’s Inequality

(a) State Chebyshev’s inequality.

(b) For a random variable with $E[X] = 50$ and $\text{Var}(X) = 25$, use Chebyshev’s inequality to bound $\Pr(|X - 50| \geq 10)$.

(c) What does Chebyshev’s inequality tell you about how concentrated a random variable is around its mean?

Solution

(a) Chebyshev’s Inequality: For any random variable $X$ and threshold $t > 0$:

$$\Pr(|X - E[X]| \geq t) \le \frac{\text{Var}(X)}{t^2}$$

(b) With $E[X] = 50$, $\text{Var}(X) = 25$, and $t = 10$:

$$\Pr(|X - 50| \geq 10) \le \frac{25}{10^2} = \frac{25}{100} = \frac{1}{4}$$

(c) Chebyshev’s inequality says that the probability of being far from the mean (by more than $t$ units) is bounded by $\text{Var}(X) / t^2$. Small variance means the random variable is tightly concentrated around its mean. As $t$ increases, the probability of large deviations decreases quadratically.


Problem 12: Chernoff Bounds

(a) State Chernoff bounds for a sum of independent Bernoulli random variables.

(b) For $X \sim \text{Binomial}(1000, 1/2)$ with $\mu = E[X] = 500$, use the simplified Chernoff bound for $0 < \delta \leq 1$ to bound $\Pr(X \geq 550)$.

(c) Compare this to what Chebyshev’s inequality would give for the same event. Which bound is tighter?

Solution

(a) Chernoff Bounds: For independent Bernoulli random variables $X_1, \ldots, X_n$ with $X = \sum X_i$ and $\mu = E[X]$:

For $0 < \delta \leq 1$:

$$\Pr(X \geq (1+\delta)\mu) \le e^{-\mu\delta^2/3}$$

(b) We have $\Pr(X \geq 550) = \Pr(X \geq (1 + 0.1) \cdot 500)$, so $\delta = 0.1$.

$$\Pr(X \geq 550) \le e^{-500 \cdot (0.1)^2 / 3} = e^{-5/3} \approx 0.188$$

(c) For Chebyshev with $\text{Var}(X) = 250$ and deviation $t = 50$:

$$\Pr(|X - 500| \geq 50) \le \frac{250}{50^2} = 0.1$$

Chernoff gives $\approx 0.188$ while Chebyshev gives $0.1$. Chebyshev is tighter in this case, but Chernoff becomes exponentially better for larger deviations.
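It is instructive to put both bounds next to the true tail probability, which is computable exactly with integer arithmetic:

```python
import math

n, p, t = 1000, 0.5, 550
mu = n * p                                 # 500
var = n * p * (1 - p)                      # 250

delta = (t - mu) / mu                      # 0.1
chernoff = math.exp(-mu * delta ** 2 / 3)  # e^{-5/3} ~ 0.189
chebyshev = var / (t - mu) ** 2            # 250/2500 = 0.1
# exact tail Pr(X >= 550) via the binomial pmf (Python ints are exact)
exact = sum(math.comb(n, k) for k in range(t, n + 1)) / 2 ** n
```

The exact tail is below $10^{-3}$, so both bounds are loose here; the point of Chernoff is how it scales for larger deviations.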


Problem 13: Comparing Markov, Chebyshev, and Chernoff


Compare three concentration bounds (Markov, Chebyshev, Chernoff) applied to the maximum load in balls-and-bins with $n$ balls and $n$ bins.

(a) State each inequality and the requirements for applying it.

(b) For $n = 1000$, compute each bound on $\Pr(\text{max load} \geq 2 \ln n)$.

(c) Explain why Chernoff is exponentially tighter than the others.

(d) Which bound would you use in practice, and why?

Solution

(a)

Markov’s Inequality: $\Pr(X \geq t) \le \frac{E[X]}{t}$ - Requires only $E[X]$ (non-negative RV).

Chebyshev’s Inequality: $\Pr(|X - E[X]| \geq c) \le \frac{\text{Var}(X)}{c^2}$ - Requires both $E[X]$ and $\text{Var}(X)$.

Chernoff Bound: $\Pr(X \geq (1+\delta)\mu) \le e^{-\mu \delta^2/3}$ - Requires $X$ to be a sum of independent Bernoulli RVs.

(b) For $n = 1000$, bounding $\Pr(\text{max load} \geq 2 \ln 1000) = \Pr(M \geq 13.8)$:

Markov (with the rough estimate $E[M] \approx \ln n \approx 6.9$): $\Pr(M \geq 13.8) \le \frac{6.9}{13.8} \approx 0.5$

Chebyshev (per bin, then a union bound): $\Pr(M \geq 13.8) \le n \cdot \frac{\text{Var}(L_i)}{(13.8 - 1)^2} \approx 6.1$, which exceeds 1 and is therefore vacuous

Chernoff (per bin, $\mu = 1$, $\delta = 12.8$): $\Pr(L_i \geq 13.8) \le e^{-54} \approx 10^{-24}$; union bound over all $n$ bins: $10^{-21}$

(c) Why Chernoff is exponentially better:

  • Markov uses only the expectation (weakest information)
  • Chebyshev uses expectation plus variance (better, but still limited)
  • Chernoff exploits the structure: when $X = \sum X_i$ with independent Bernoullis, the concentration is exponential

(d) In practice:

Use Chernoff when: you are analyzing sums of independent Bernoulli RVs (common in randomized algorithms) or need tight bounds for large-scale systems.

Use Chebyshev when: you lack full independence (pairwise independence suffices for the variance calculation) or the distribution is otherwise unknown.

Use Markov when: you only know the expectation or need a simple, quick bound.

For balls-and-bins, Chernoff is the right choice because each bin’s load is literally a sum of independent Bernoullis.


Problem 14: Comparing Markov, Chebyshev, and Chernoff with Specific Example


Consider $X \sim \text{Binomial}(100, 1/2)$ with $E[X] = 50$ and $\text{Var}(X) = 25$.

(a) Use all three inequalities (Markov, Chebyshev, Chernoff) to bound $\Pr(X \geq 75)$.

(b) Rank the three bounds from loosest to tightest. Which inequality is most useful for this tail event, and why?

(c) As we ask for increasingly extreme deviations (e.g., $\Pr(X \geq 100)$), which inequality’s advantage grows?

Solution

(a)

  • Markov: $\Pr(X \geq 75) \le \frac{50}{75} \approx 0.667$
  • Chebyshev: $\Pr(|X - 50| \geq 25) \le \frac{25}{25^2} = 0.04$
  • Chernoff: With $\delta = 0.5$, we get $\Pr(X \geq 75) \le e^{-50 \cdot (0.5)^2/3} \approx 0.015$

(b) Ranking from loosest to tightest: Markov (0.667) > Chebyshev (0.04) > Chernoff (0.015).

Chernoff is most useful because it exploits the structure of independent Bernoullis and gives exponentially small bounds for tail events.

(c) As deviations grow larger, Chernoff’s exponential decay becomes increasingly powerful. For instance, at $X \geq 100$ (i.e., $\delta = 1$), Chernoff still gives an exponentially small bound, while the Markov and Chebyshev bounds shrink only polynomially in the deviation.
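Evaluating the three bounds alongside the exact tail makes the ranking concrete:

```python
import math

n, p, t = 100, 0.5, 75
mu, var = n * p, n * p * (1 - p)               # 50, 25

markov = mu / t                                # ~ 0.667
chebyshev = var / (t - mu) ** 2                # 25/625 = 0.04
delta = (t - mu) / mu                          # 0.5
chernoff = math.exp(-mu * delta ** 2 / 3)      # e^{-25/6} ~ 0.015
# exact tail Pr(X >= 75), computed with exact integer arithmetic
exact = sum(math.comb(n, k) for k in range(t, n + 1)) / 2 ** n
```

The exact tail is smaller than $10^{-5}$: even Chernoff is conservative, but it is the only bound in the right ballpark.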



Problem 15: Coupon Collector

Consider the Coupon Collector problem: you want to collect all 6 unique Pokemon characters by buying cereal boxes. Each box contains one random character with equal probability.

(a) If you already have 3 unique characters, what is the probability that the next box gives you a new character?

(b) Let $X_i$ be the number of boxes bought after collecting $i-1$ characters but before collecting the $i$-th character. What distribution does $X_i$ follow, and what is $E[X_i]$ for $i = 4$?

(c) What is the expected total number of boxes needed to collect all 6 characters?

Solution

(a) You already have 3 of the 6 characters, so 3 remain new. The probability of getting a new character is $\frac{6-3}{6} = \frac{1}{2}$.

(b) $X_i$ follows a geometric distribution with success probability $p = \frac{6 - (i-1)}{6}$. For $i = 4$:

$$p = \frac{3}{6} = \frac{1}{2}, \quad E[X_4] = \frac{1}{p} = 2$$

(c) By linearity of expectation:

$$E[X] = \sum_{i=1}^{6} E[X_i] = 6 \left(1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{5} + \frac{1}{6}\right) \approx 14.7 \text{ boxes}$$
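A simulation (seed and trial count are arbitrary) confirms the $6 H_6 \approx 14.7$ answer:

```python
import random

random.seed(2)

def boxes_needed(u):
    """Buy boxes until all u characters are collected; return the count."""
    seen, boxes = set(), 0
    while len(seen) < u:
        seen.add(random.randrange(u))
        boxes += 1
    return boxes

trials = 20_000
avg_boxes = sum(boxes_needed(6) for _ in range(trials)) / trials
theory = 6 * sum(1 / j for j in range(1, 7))   # 6 * H_6 = 14.7
```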

Problem 16: Coupon Collector - General Analysis


In the Coupon Collector problem, you repeatedly draw coupons uniformly at random from a set of $U$ distinct types. The goal is to understand how many draws are needed to collect at least one coupon of every type.

(a) Let $W_i$ denote the number of coupons needed to go from having $i-1$ distinct types to $i$ distinct types. What distribution does $W_i$ follow?

(b) Compute $E[W_i]$ and derive the total expected number of coupons $E[W]$ needed to collect all $U$ types.

(c) For $U = 365$ (as in the birthday problem analog), compute the expected total number of coupons needed to see all 365 types.

Solution

(a) When we have $i-1$ types, there are $U - (i-1)$ unseen types out of $U$ total. Each drawn coupon is new with probability $p_i = \frac{U - (i-1)}{U}$. Thus $W_i$ follows a geometric distribution with success probability $p_i$.

(b) For a geometric RV with success probability $p_i$:

$$E[W_i] = \frac{1}{p_i} = \frac{U}{U - (i-1)}$$

By linearity of expectation:

$$E[W] = \sum_{i=1}^{U} E[W_i] = U \sum_{j=1}^{U} \frac{1}{j} = U \cdot H_U$$

For large $U$: $E[W] \approx U \ln(U)$.

(c) For $U = 365$:

$$E[W] = 365 \cdot H_{365} \approx 365 \cdot (\ln 365 + 0.577) \approx 2364 \text{ coupons}$$
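The harmonic sum and its logarithmic approximation can be compared directly:

```python
import math

U = 365
H_U = sum(1 / j for j in range(1, U + 1))   # harmonic number H_365
expected = U * H_U                          # exact E[W] = U * H_U
approx = U * (math.log(U) + 0.5772)         # Euler-Mascheroni approximation
```

The two agree to within a coupon or so, both near 2364.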

Problem 17: Applying Concentration Inequalities to Collision Analysis


In FKS hashing, we hash $n$ keys into $n$ buckets. Let $b_i$ be the number of keys in bucket $i$, and let $C = \sum_i b_i(b_i-1)$ be the total number of collisions (ordered pairs of distinct keys in the same bucket). Note that $C = \sum_i b_i^2 - n$.

(a) When hashing $n$ keys uniformly into $n$ buckets, compute $E[C]$ by considering: for each key $x_j$, how many other keys collide with $x_j$ in expectation?

(b) Use Markov’s inequality to bound $\Pr(\sum b_i^2 > 4n)$ given that $E[\sum b_i^2] < 2n$.

(c) Suppose instead we use Chebyshev. If $\text{Var}(\sum b_i^2) = O(n^2)$, can Chebyshev give a strong enough bound to show the probability is less than 1/2? Discuss why or why not.

Solution

(a) For key $x_j$, there are $n-1$ other keys. Each collides with $x_j$ with probability $1/n$. So:

$$E[\text{collisions with } x_j] = (n-1) \cdot \frac{1}{n} < 1$$

Summing over all $n$ keys counts each ordered pair exactly once, so $E[C] < n$.

(b) Using $C = \sum b_i^2 - n$ and $E[C] < n$, we have $E[\sum b_i^2] < 2n$.

By Markov:

$$\Pr\left(\sum b_i^2 > 4n\right) \le \frac{E[\sum b_i^2]}{4n} < \frac{2n}{4n} = \frac{1}{2}$$

(c) Chebyshev would give:

$$\Pr\left(\left|\sum b_i^2 - 2n\right| \geq 2n\right) \le \frac{\text{Var}(\sum b_i^2)}{(2n)^2} = \frac{O(n^2)}{4n^2} = O(1)$$

This is not strong enough. This illustrates why Markov’s bound is actually better for this problem: the deviation we care about ($4n$ vs. a mean below $2n$) is only a constant factor of 2, and Markov is perfectly calibrated for this regime.
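A simulation (the parameters $n = 500$, 500 trials, and the seed are assumed for illustration) shows $E[\sum b_i^2]$ sitting just below $2n$, and that exceeding $4n$ is far rarer than Markov's $1/2$ guarantee:

```python
import random

random.seed(11)
n = 500
trials = 500

exceed = 0
sum_sq_total = 0
for _ in range(trials):
    b = [0] * n
    for _ in range(n):                  # hash n keys into n buckets
        b[random.randrange(n)] += 1
    s = sum(x * x for x in b)
    sum_sq_total += s
    if s > 4 * n:
        exceed += 1

avg_sum_sq = sum_sq_total / trials      # theory: E = 2n - 1 < 2n
fraction_exceeding = exceed / trials    # Markov guarantees this is < 1/2
```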



Problem 18: Linear Probing - Basics

Linear probing resolves collisions by placing keys in consecutive empty slots: if slot $h(x)$ is occupied, try $h(x) + 1$, $h(x) + 2$, etc.

(a) Describe the insertion and query operations in linear probing.

(b) What is “primary clustering,” and why does it occur?

(c) Using Donald Knuth’s analysis, what is the expected query time for a successful lookup with load factor $\alpha = 1/5$?

Solution

(a) Insertion of key $x$:

  1. Compute $h(x)$
  2. If slot $h(x)$ is empty, insert $x$ there
  3. Otherwise, probe $h(x) + 1, h(x) + 2, \ldots$ until finding an empty slot

Query for key $q$:

  1. Compute $h(q)$
  2. Probe $h(q), h(q) + 1, h(q) + 2, \ldots$ until finding $q$ or an empty slot

(b) Primary clustering is the formation of long contiguous runs of occupied slots.

Why it occurs: A key $x$ hashes to slot $h(x)$. If occupied, it is placed at the next free slot. Any future key $y$ with $h(y)$ in the occupied run probes into it, extending the run further. The run grows like a “snowball.”

(c) By Donald Knuth’s analysis, the expected query time for a successful lookup is:

$$E[\text{Succ. Query}] = 1 + \frac{1}{1 - \alpha}$$

With $\alpha = 1/5$:

$$E[\text{Succ. Query}] = 1 + \frac{1}{4/5} = 1 + 1.25 = 2.25$$
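A minimal sketch of the two operations from (a), assuming the table always has at least one free slot (otherwise the probe loop would not terminate):

```python
def lp_insert(table, h, key):
    """Insert key into an open-addressing table (None marks empty slots),
    probing linearly from h(key). Assumes a free slot exists."""
    m = len(table)
    i = h(key) % m
    while table[i] is not None:
        i = (i + 1) % m                 # wrap around at the end of the table
    table[i] = key

def lp_query(table, h, key):
    """Return True iff key is present; the first empty slot ends the probe."""
    m = len(table)
    i = h(key) % m
    while table[i] is not None:
        if table[i] == key:
            return True
        i = (i + 1) % m
    return False

# tiny demo with the identity hash: keys 3 and 13 both map to slot 3,
# so 13 is displaced to slot 4 and 4 is displaced to slot 5
table = [None] * 10
for key in (3, 13, 4):
    lp_insert(table, lambda x: x, key)
```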

Problem 19: Linear Probing - Detailed Analysis


In a hash table with $m$ cells and $n$ keys (load factor $\alpha = n/m$), let $L_i$ be the length of chain $i$.

(a) What is $E[L_i]$?

(b) For $\alpha = 0.5$ and $m = 1000$, use Chernoff bounds to bound the probability that any chain exceeds length 3.

(c) Using Chebyshev’s inequality, explain why a union bound on the variance would not give a tight bound for this problem.

Solution

(a) Each key independently hashes to cell $i$ with probability $1/m$. Thus $L_i \sim \text{Binomial}(n, 1/m)$.

$$E[L_i] = n \cdot \frac{1}{m} = \alpha$$

(b) For $\alpha = 0.5$ and $m = 1000$, we have $n = 500$ keys. For a single cell with $E[L_i] = 0.5$:

$$\Pr(L_i > 3) = \Pr(L_i > (1 + \delta) E[L_i]) \text{ where } \delta = 5$$

By Chernoff:

$$\Pr(L_i > 3) \le e^{-0.5 \cdot 25/3} \approx 0.015$$

By union bound over $m = 1000$ cells:

$$\Pr(\exists \text{ cell} > 3) \le 1000 \cdot 0.015 = 15$$

This exceeds 1, so at threshold 3 even Chernoff’s union bound is vacuous; the bound becomes meaningful at only slightly larger thresholds (e.g., at length 6, $\delta = 11$ gives a union bound of about $1000 \cdot e^{-0.5 \cdot 121/3} \approx 2 \times 10^{-6}$).

(c) For a chain with $L_i \sim \text{Binomial}(n, 1/m)$, we have $\text{Var}(L_i) \approx 0.5$. Using Chebyshev to bound $\Pr(L_i > 3)$:

$$\Pr(|L_i - E[L_i]| \geq 2.5) \le \frac{0.5}{6.25} \approx 0.08$$

By union bound:

$$\Pr(\exists \text{ cell} > 3) \le 1000 \cdot 0.08 = 80$$

A bound of 80 is completely useless (any probability bound above 1 is vacuous), and unlike Chernoff it does not improve quickly as the threshold grows: Chebyshev’s bound shrinks only quadratically in the deviation, while Chernoff exploits the structure of sums of independent Bernoullis and shrinks exponentially.
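Plugging in the numbers from (b) and (c), plus one larger threshold to show where the Chernoff union bound becomes meaningful:

```python
import math

m, mu = 1000, 0.5

chernoff_cell = math.exp(-mu * 5 ** 2 / 3)   # Pr(L_i > 3) via Chernoff, delta = 5
chebyshev_cell = 0.5 / 2.5 ** 2              # two-sided Chebyshev at deviation 2.5

union_chernoff = m * chernoff_cell           # ~ 15.5: above 1, vacuous at threshold 3
union_chebyshev = m * chebyshev_cell         # 80: even worse
# at threshold 6 (delta = 11) Chernoff's exponential decay takes over:
union_chernoff_at_6 = m * math.exp(-mu * 11 ** 2 / 3)
```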



Problem 20: FKS Hashing

FKS hashing uses a two-level approach: a primary hash table with secondary hash tables in each bucket.

(a) Why does FKS guarantee $O(1)$ worst-case lookup time?

(b) What is the expected preprocessing time for FKS, and how does it compare to hashing with chaining?

Solution

(a) FKS uses:

  • Primary table: Hash $n$ keys into $m = n$ buckets using function $h_1$
  • Secondary tables: For each bucket $i$ with $b_i$ keys, use a secondary table of size $m_i = 2b_i^2$ with function $h_2^{(i)}$

A secondary table of this size is collision-free with probability at least $1/2$ per attempt, so after an expected constant number of retries every secondary table has zero collisions. Therefore:

  • Query = 1 probe in primary + 1 probe in secondary = $O(1)$ worst-case

(b)

  • Hashing with chaining: $O(n)$ worst-case preprocessing (insert each key once)
  • FKS: $O(n)$ expected preprocessing

The difference: FKS may need to retry hash functions if the choice is unlucky (i.e., $\sum b_i^2 > 4n$ at the primary level, or a collision at a secondary level), but by Markov’s inequality the expected number of retries is constant.
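A runnable sketch of the two-level construction. It uses the standard universal family $h(x) = ((a x + b) \bmod P) \bmod m$ with an assumed prime $P$ larger than every key; this family, the seeds, and the key set are illustrative choices, not the course's exact construction:

```python
import random

P = 2_000_003  # a prime larger than every key below (assumption of this sketch)

def make_hash(rng, m):
    """Draw a random function from the family h(x) = ((a*x + b) mod P) mod m."""
    a, b = rng.randrange(1, P), rng.randrange(P)
    return lambda x: ((a * x + b) % P) % m

def build_fks(keys, seed=42):
    rng = random.Random(seed)
    n = len(keys)
    # level 1: retry until the sum of squared bucket sizes is at most 4n
    while True:
        h1 = make_hash(rng, n)
        buckets = [[] for _ in range(n)]
        for x in keys:
            buckets[h1(x)].append(x)
        if sum(len(b) ** 2 for b in buckets) <= 4 * n:
            break
    # level 2: per bucket, retry until a table of size 2*b_i^2 is collision-free
    tables = []
    for b in buckets:
        size = max(1, 2 * len(b) ** 2)
        while True:
            h2 = make_hash(rng, size)
            positions = [h2(x) for x in b]
            if len(set(positions)) == len(b):   # no secondary collisions
                slots = [None] * size
                for x, pos in zip(b, positions):
                    slots[pos] = x
                tables.append((h2, slots))
                break
    return h1, tables

def fks_query(fks, x):
    """Exactly two probes: one primary, one secondary."""
    h1, tables = fks
    h2, slots = tables[h1(x)]
    return slots[h2(x)] == x

keys = random.Random(1).sample(range(1, 1_000_000), 200)
fks = build_fks(keys)
```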


Problem 21: Hashing with Chaining vs. FKS

Compare Hashing with Chaining (HWC) and FKS Hashing.

(a) What are the query time guarantees for each?

(b) What are the preprocessing time guarantees for each?

(c) When would you choose HWC over FKS, and vice versa?

Solution

(a) Query time:

  • HWC: $E[Q] = O(1 + \alpha)$ expected; worst case is $O(n)$
  • FKS: $O(1)$ worst-case (exactly 2 probes)

(b) Preprocessing time:

  • HWC: $O(n)$ always (insert each key once)
  • FKS: $O(n)$ expected (may retry hash functions, but the expected number of retries is constant)

(c)

Choose HWC when: Simplicity and ease of implementation matter, insertions and deletions are frequent, the load factor might be high, or constant factors are important.

Choose FKS when: Worst-case query time guarantees are critical, the set is static or rarely updated, or you need theoretical guarantees.



Problem 22: Bloom Filters - Basic Operations


A Bloom filter uses $m$ bits and $k$ hash functions to test set membership with false positives but no false negatives.

(a) Describe the insert and query operations.

(b) Why is the false positive probability approximately $\left(1 - e^{-kn/m}\right)^k$?

(c) What is the optimal number of hash functions $k$ to minimize the false positive rate?

Solution

(a) Insert element $x$ into set $S$:

  1. Compute $h_1(x), h_2(x), \ldots, h_k(x)$
  2. Set the bits at these positions to 1

Query whether $x \in S$:

  1. Compute $h_1(x), h_2(x), \ldots, h_k(x)$
  2. If all bits at these positions are 1, return “possibly in $S$”
  3. If any bit is 0, return “definitely not in $S$”

(b) After inserting $n$ elements with $k$ hash functions:

The probability that a specific bit is not set by any of the $kn$ hash function invocations is:

$$\Pr[\text{bit is } 0] = \left(1 - \frac{1}{m}\right)^{kn} \approx e^{-kn/m}$$

So the probability a bit is set to 1:

$$\Pr[\text{bit is } 1] = 1 - e^{-kn/m}$$

A false positive occurs when all $k$ queried bits are 1:

$$\Pr[\text{false positive}] = \left(1 - e^{-kn/m}\right)^k$$

(c) To minimize the false positive probability, take the derivative with respect to $k$ and set it to zero. The optimal value is:

$$k = \frac{m}{n} \ln 2$$

At this optimal $k$, the false positive probability becomes:

$$\left(\frac{1}{2}\right)^k$$
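The operations in (a) fit in a short class. Deriving the $k$ hash functions by salting SHA-256 is an illustrative choice for this sketch, not a prescribed construction:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter sketch (one byte per bit, for simplicity)."""

    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = bytearray(m)

    def _positions(self, item):
        # simulate k hash functions by salting a single strong hash
        for salt in range(self.k):
            digest = hashlib.sha256(f"{salt}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def insert(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def query(self, item):
        # True means "possibly in S"; False means "definitely not in S"
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter(m=10_000, k=7)
for i in range(1_000):                  # kn/m = 0.7, the same ratio as Problem 23
    bf.insert(f"key{i}")

# no false negatives by construction; count false positives on non-members
false_positives = sum(bf.query(f"other{i}") for i in range(10_000))
```

With $kn/m = 0.7$ the false positive rate should be roughly $(1 - e^{-0.7})^7 \approx 0.8\%$, i.e., on the order of 80 hits out of 10,000 non-member queries.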

Problem 23: Bloom Filters - Comprehensive Analysis


A Bloom filter uses $m = 100{,}000$ bits and $k = 7$ hash functions to store a set of $n = 10{,}000$ elements.

(a) What is the expected number of bits set to 1?

(b) Estimate the false positive probability.

(c) If you want to reduce the false positive rate to $\varepsilon = 0.001$, how should you adjust $m$ and/or $k$?

Solution

(a) Each of the $kn = 70{,}000$ hash function evaluations sets some bit to 1.

The probability a given bit remains 0 is:

$$\Pr(\text{bit } 0) = (1 - 1/m)^{kn} \approx e^{-kn/m} = e^{-0.7} \approx 0.497$$

Expected number of bits set to 1:

$$E[\text{bits set}] = m \cdot (1 - e^{-kn/m}) \approx 50{,}300$$

(b) For a query on a non-member, all $k$ hashed positions must coincidentally be set to 1:

$$\Pr(\text{false positive}) = (1 - e^{-kn/m})^k = (0.503)^7 \approx 0.82\%$$

(c) From the formula $\Pr(\text{FP}) = (1 - e^{-kn/m})^k \le \varepsilon$:

For fixed $n = 10{,}000$, achieving $\varepsilon = 0.001$ requires increasing $m$. At the optimal $k = \frac{m}{n} \ln 2$ the false positive rate is $(1/2)^k$, so we need $k = \log_2(1/\varepsilon) \approx 10$; solving $m = \frac{n \ln(1/\varepsilon)}{(\ln 2)^2}$ then gives:

$$m \approx 144{,}000 \text{ bits}, \quad k \approx 10 \text{ hash functions}$$
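The sizing in (c) follows from the standard formulas $m = n \ln(1/\varepsilon)/(\ln 2)^2$ and $k = (m/n)\ln 2$; a small helper (the function name is mine) computes them, along with the part (b) rate:

```python
import math

def bloom_parameters(n, eps):
    """Optimal Bloom filter sizing for target false-positive rate eps."""
    m = math.ceil(n * math.log(1 / eps) / math.log(2) ** 2)   # bits
    k = round(m / n * math.log(2))                            # hash functions
    return m, k

m_bits, k_hashes = bloom_parameters(10_000, 0.001)

# sanity check of part (b): k = 7, m = 100000, n = 10000, so kn/m = 0.7
fp_part_b = (1 - math.exp(-0.7)) ** 7
```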

Problem 24: Bloom Filter False Positive Dynamics


In Bloom filters, how does the false positive rate change as more elements are inserted?

(a) As $n$ increases, does the false positive probability increase or decrease?

(b) If you fix $m$ (the number of bits) and insert more and more elements, what happens to the false positive rate?

(c) How should you adjust $m$ and $k$ as $n$ grows to maintain a constant false positive rate?

Solution

(a) The false positive probability increases as $n$ increases.

With $n$ elements inserted and optimal $k = \frac{m}{n} \ln 2$:

$$\Pr[\text{false positive}] = \left(\frac{1}{2}\right)^k = \left(\frac{1}{2}\right)^{(m/n) \ln 2}$$

As $n$ increases, the exponent $(m/n) \ln 2$ decreases (for fixed $m$), so $(1/2)$ raised to a smaller power gives a larger result.

(b) If you fix $m$ and keep inserting elements:

  • More elements $\to$ more bits are set to 1
  • A higher fraction of 1 bits $\to$ a higher chance that all $k$ queried bits are 1
  • The false positive rate increases monotonically

(c) To maintain a constant false positive rate $\varepsilon$ as $n$ grows, increase $m$ proportionally to $n$ while adjusting $k$ to maintain the target rate.

In practice:

  • Space: use $O(n \log(1/\varepsilon))$ bits (linear in $n$, logarithmic in the target error)
  • Hash functions: $k = \frac{m}{n} \ln 2$ (if the ratio $m/n$ is held constant, $k$ stays constant too)


Problem 25: Linear Probing vs Bloom Filters


Compare Linear Probing and Bloom Filters.

(a) What is the main difference in what they do?

(b) In terms of space, which is more efficient for very large sets?

(c) When would you use Linear Probing vs Bloom Filter?

Solution

(a) Linear Probing:

  • Solves the dictionary problem: store $n$ keys and support insert, delete, and lookup with exact answers
  • Uses $O(n)$ space
  • Gives exact answers

Bloom Filter:

  • Solves the approximate membership problem: tests whether an element is in the set, allowing false positives
  • Uses $O(n \log(1/\varepsilon))$ space
  • Gives probabilistic answers (no false negatives, but false positives allowed)

(b) Space comparison: For a set of $n$ elements from a large universe:

  • Linear Probing: Must store each full key. Space is $O(n \log U)$ bits
  • Bloom Filter: Uses only $O(n \log(1/\varepsilon))$ bits, independent of universe size

For large $U$, the Bloom filter is far more space-efficient: it avoids the $\log U$ factor entirely.

(c)

Use Linear Probing when: You need exact membership testing, you need to support insertions and deletions, the set fits in memory, or false positives are unacceptable.

Use Bloom Filter when: Space is critical, false positives are tolerable, you only need membership testing (no need to retrieve associated data), or the universe is very large.


Problem 26: FKS Hashing vs. Bloom Filters

Which should you use for a membership test on $n$ elements?

(a) Compare the space used by each.

(b) Compare the query time of each.

(c) What are the key tradeoffs?

Solution

(a) Space:

  • FKS: $O(n)$ words (each word stores a key). Total: $O(n \log U)$ bits
  • Bloom Filter: $O(n \log(1/\varepsilon))$ bits, independent of $U$

For universe size $U = 2^{32}$ and error rate $\varepsilon = 0.01$:

  • FKS: $32n$ bits
  • Bloom: $\approx 10n$ bits (about 3× smaller)

(b) Query time:

  • FKS: $O(1)$ worst-case (exactly 2 probes)
  • Bloom: $O(k)$ where $k = O(\log(1/\varepsilon))$ hash function evaluations

(c) Tradeoffs:

| Aspect | FKS | Bloom |
| --- | --- | --- |
| Exactness | Exact answers | False positives allowed |
| Space | $O(n \log U)$ bits | $O(n \log(1/\varepsilon))$ bits |
| Speed | $O(1)$ worst-case | $O(\log(1/\varepsilon))$ hashes |
| Dynamism | Hard to insert/delete | Cannot delete |
| Use case | Exact dictionary | Space-critical membership |

Decision rule:

  • Use FKS for exact membership when you can afford the space
  • Use Bloom when space is critical and false positives are acceptable