
CSCI 328 Midterm Fundamentals Problem Sets

Problem 1: Linearity of Expectation

(a) If we flip a fair coin 100 times, what is the expected number of heads?

(b) Using linearity of expectation, explain why the expectation of a binomial random variable $X \sim \text{Binomial}(n, p)$ is $np$.

(c) If the expected number of heads is 50, what is the expected number of tails?

Solution

(a) For $X \sim \text{Binomial}(100, 1/2)$, we have $E[X] = np = 100 \cdot (1/2) = 50$ heads.

(b) We can write $X = X_1 + X_2 + \cdots + X_n$ where each $X_i \sim \text{Bernoulli}(p)$. By linearity of expectation (which holds even without independence):

$$E[X] = E[X_1] + E[X_2] + \cdots + E[X_n] = p + p + \cdots + p = np$$

(c) The number of tails is $100 - X$, so $E[100 - X] = 100 - E[X] = 100 - 50 = 50$ tails.
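A quick Monte Carlo check of parts (a) and (c); the seed and trial count are arbitrary choices made only for reproducibility:

```python
import random

random.seed(1)  # fixed seed so the estimate is reproducible

def heads_in_flips(n_flips):
    """Count heads in n_flips fair-coin flips."""
    return sum(random.random() < 0.5 for _ in range(n_flips))

trials = 20_000
avg_heads = sum(heads_in_flips(100) for _ in range(trials)) / trials
avg_tails = 100 - avg_heads  # linearity: E[100 - X] = 100 - E[X]
```

Both averages land within sampling noise of 50.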


Problem 2: Bernoulli Random Variables and Variance


Consider flipping a fair coin once. Let $X = 1$ if we get heads, and $X = 0$ if we get tails.

(a) What is $E[X]$ and $\text{Var}(X)$?

(b) Now flip the coin 100 times. Let $Y$ be the total number of heads. Express $Y$ as a sum of independent Bernoulli random variables and use linearity to find $E[Y]$ and $\text{Var}(Y)$.

(c) For $n = 1000$ coin flips with $Y \sim \text{Binomial}(1000, 1/2)$, compute $E[Y]$ and $\text{Var}(Y)$.

Solution

(a) $X \sim \text{Bernoulli}(1/2)$, so:

$$E[X] = \frac{1}{2}, \quad \text{Var}(X) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$$

(b) $Y = X_1 + X_2 + \cdots + X_{100}$ where each $X_i \sim \text{Bernoulli}(1/2)$. By linearity (and, for the variance, by the independence of the flips, since variances add only for independent summands):

$$E[Y] = \sum_{i=1}^{100} \frac{1}{2} = 50, \quad \text{Var}(Y) = \sum_{i=1}^{100} \frac{1}{4} = 25$$

(c) For $n = 1000$:

$$E[Y] = 1000 \cdot \frac{1}{2} = 500, \quad \text{Var}(Y) = 1000 \cdot \frac{1}{4} = 250$$
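The $n = 1000$ numbers can be checked empirically (sample size and seed are arbitrary choices):

```python
import random
import statistics

random.seed(7)

# Draw 5000 samples of Y ~ Binomial(1000, 1/2) by direct simulation
samples = [sum(random.random() < 0.5 for _ in range(1000)) for _ in range(5000)]

mean_y = statistics.mean(samples)       # theory: n*p = 500
var_y = statistics.pvariance(samples)   # theory: n*p*(1-p) = 250
```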

Problem 3: Hashing with Chaining - Indicator Variables


In hashing with chaining, we query for an element $q$. Let $X_i$ be an indicator variable: $X_i = 1$ if element $x_i$ collides with $q$ (i.e., $h(q) = h(x_i)$), and $X_i = 0$ otherwise. Assume the hash function is perfectly random.

(a) If we have $n$ elements and $m$ buckets, what is $\Pr(X_i = 1)$ for any element $x_i$?

(b) The query time is $Q = \sum_{i=1}^{n} X_i$. Using linearity of expectation, what is $E[Q]$?

(c) If $m = n$, what should $\text{Var}(Q)$ be approximately?

Solution

(a) Since the hash function is perfectly random, each element hashes to each of the $m$ buckets with equal probability:

$$\Pr(X_i = 1) = \Pr(h(q) = h(x_i)) = \frac{1}{m}$$

(b) Each $X_i$ is a Bernoulli random variable with $E[X_i] = 1/m$. By linearity of expectation:

$$E[Q] = \sum_{i=1}^{n} E[X_i] = \sum_{i=1}^{n} \frac{1}{m} = \frac{n}{m}$$

(c) For a Bernoulli variable with $p = 1/m$:

$$\text{Var}(X_i) = p(1-p) = \frac{1}{m}\left(1 - \frac{1}{m}\right)$$

Since a perfectly random hash function makes the $X_i$ independent:

$$\text{Var}(Q) = \sum_{i=1}^{n} \text{Var}(X_i) = n \cdot \frac{1}{m}\left(1 - \frac{1}{m}\right)$$

If $m = n$, then $\text{Var}(Q) \approx 1$ (for large $n$).
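The $E[Q] = n/m$ result can be checked by simulation, using bucket load counts as a stand-in for chain lengths (seed and trial count are arbitrary):

```python
import random

random.seed(3)
n = m = 1000      # n keys, m buckets (load factor 1)
trials = 2_000

total_cost = 0
for _ in range(trials):
    loads = [0] * m
    for _ in range(n):                 # hash each key to a uniform bucket
        loads[random.randrange(m)] += 1
    # query cost for a fresh element q = chain length at bucket h(q)
    total_cost += loads[random.randrange(m)]

avg_q = total_cost / trials            # theory: E[Q] = n/m = 1
```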


Problem 4: Hashing with Chaining - Expected Query Time


For hashing with chaining, assume $n$ keys are stored in $m$ buckets using a perfectly random hash function.

(a) What is the expected length of a chain?

(b) If $m = n$ (load factor $\alpha = 1$), what is $E[Q]$ where $Q$ is the query time?

(c) Using Markov’s inequality, bound $\Pr(Q > 10)$ when $E[Q] = 1$.

Solution

(a) Each of the $n$ keys hashes to one of $m$ buckets uniformly at random. By linearity of expectation:

$$E[\text{chain length}] = \frac{n}{m}$$

(b) With $m = n$:

$$E[Q] = \frac{n}{n} = 1$$

(c) By Markov’s inequality:

$$\Pr(Q > 10) \le \frac{E[Q]}{10} = \frac{1}{10} = 0.1$$

Problem 5: Perfectly Random Hash Functions

A hash function is “perfectly random” if $\Pr(h(x) = i) = 1/m$ for any key $x$ and bucket $i$.

(a) What does it mean for a hash function to be perfectly random?

(b) Why does the theoretical analysis of hashing depend on this assumption?

(c) If a hash function is biased (not perfectly random), how would it affect the expected query time?

Solution

(a) A perfectly random hash function maps each key uniformly to any of the $m$ buckets with equal probability $1/m$, independent of other keys.

(b) The $O(1 + \alpha)$ expected query time analysis assumes that each key is equally likely to land in any bucket, collisions are independent, and the expected chain length is $n/m$. Without perfect randomness, some buckets might be overloaded.

(c) If the hash function is biased, some buckets receive more keys than expected, the expected chain length increases above $n/m$, and the expected query time becomes worse than $O(1 + \alpha)$. In the worst case (all keys hash to one bucket), query time is $O(n)$.



Problem 6: Balls and Bins - Foundational Analysis


You throw $n = 1000$ balls (keys) uniformly at random into $n = 1000$ bins (hash slots).

(a) What is the expected load (number of balls per bin)?

(b) What is the expected number of empty bins?

(c) Using Chernoff bounds, estimate the probability that a particular bin receives at least 10 balls.

(d) Using a union bound, estimate the probability that the maximum load exceeds 10.

Solution

(a) Each ball lands in a uniformly random bin. By linearity of expectation:

$$E[\text{load}] = n \cdot \frac{1}{n} = 1$$

(b) A bin is empty if no balls land in it. The probability a given bin is empty is:

$$\Pr(\text{empty}) = (1 - 1/n)^n \approx e^{-1} \approx 0.368$$

Expected number of empty bins:

$$E[\text{empty bins}] = n \cdot e^{-1} = 1000 \cdot 0.368 \approx 368$$

(c) Let $X$ = load of a particular bin. Then $X \sim \text{Binomial}(n, 1/n)$ with $E[X] = 1$.

We want $\Pr(X \geq 10)$. With $(1 + \delta) \cdot 1 = 10$, we have $\delta = 9$:

$$\Pr(X \geq 10) = \Pr(X \geq (1+9) \cdot E[X]) \le e^{-E[X] \cdot 9^2 / 3} = e^{-27} \approx 10^{-12}$$

(d) The probability that any bin (out of $n = 1000$) exceeds load 10:

$$\Pr(\text{max load} \geq 10) \le n \cdot \Pr(X \geq 10) = 1000 \cdot 10^{-12} = 10^{-9}$$
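A simulation (seed and trial count are arbitrary choices) reproduces the ~368 empty bins from (b), and loads anywhere near 10 never appear in a modest number of trials, consistent with the tiny probabilities in (c) and (d):

```python
import random

random.seed(5)
n = 1000
trials = 200

empty_counts, max_loads = [], []
for _ in range(trials):
    loads = [0] * n
    for _ in range(n):                    # throw n balls into n bins
        loads[random.randrange(n)] += 1
    empty_counts.append(loads.count(0))
    max_loads.append(max(loads))

avg_empty = sum(empty_counts) / trials    # theory: ~ n/e = 368
worst_max = max(max_loads)                # loads of 10+ should essentially never occur
```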

Problem 7: Balls and Bins - Comprehensive Analysis


Balls and Bins: throw $n$ balls uniformly at random into $n$ bins.

(a) What is the probability that a specific bin receives no balls?

(b) By linearity of expectation, what is the expected number of empty bins?

(c) Write out the explicit product expression for the probability that all $n$ balls land in different bins (no collisions).

(d) Write the generalized closed form formula for the probability that all $n$ balls land in different bins when thrown into $n$ bins.

(e) Using the balls and bins framework, explain why collisions (two items hashing to the same bucket) are likely even when the number of items equals the number of buckets.

Solution

(a) Each ball has probability $1 - 1/n$ of missing a specific bin. With $n$ independent balls:

$$\Pr(\text{bin empty}) = \left(1 - \frac{1}{n}\right)^n \approx e^{-1} \approx 0.368$$

(b) By linearity of expectation, the expected number of empty bins is:

$$E[\text{\# empty}] = n \cdot \left(1 - \frac{1}{n}\right)^n \approx n \cdot e^{-1} \approx 0.368n$$

(c) The probability that all $n$ balls land in different bins is:

$$\Pr(\text{no collision}) = \frac{n}{n} \cdot \frac{n-1}{n} \cdot \frac{n-2}{n} \cdots \frac{1}{n} = \frac{n!}{n^n}$$

(d) The generalized closed form for $m$ balls and $n$ bins is:

$$\Pr(\text{no collision}) \approx e^{-m^2/2n}$$

(This comes from the first-order approximation $\ln(1 - i/n) \approx -i/n$, so it is accurate when $m \ll n$; for $m$ close to $n$ it is only a rough estimate.) For our case, with equal numbers of balls and bins ($m = n$):

$$\Pr(\text{no collision}) \approx e^{-n^2/2n} = e^{-n/2}$$

(e) If the expected number of empty bins is about $0.368n$, then the expected number of non-empty bins is about $0.632n < n$. By the pigeonhole principle, if fewer than $n$ bins are used for $n$ balls, some bins must contain more than one ball.



Problem 8: Birthday Paradox and Hash Collisions


Suppose you have a hash table with $n = 100$ slots and you insert $k = 10$ keys using a random hash function.

(a) Using the birthday paradox, estimate the probability that at least two keys collide (hash to the same slot).

(b) Use Chernoff bounds to bound the probability that a particular slot receives 3 or more keys.

(c) Using a union bound, estimate the probability that any slot receives 3 or more keys.

Solution

(a) We want $\Pr(\text{at least one collision})$. It’s easier to compute the complement:

$$\Pr(\text{no collision}) = \frac{n}{n} \cdot \frac{n-1}{n} \cdot \frac{n-2}{n} \cdots \frac{n-k+1}{n} = \prod_{i=0}^{k-1} \left(1 - \frac{i}{n}\right) \approx e^{-k(k-1)/(2n)}$$

With $n = 100$ and $k = 10$:

$$\Pr(\text{no collision}) \approx e^{-10 \cdot 9 / 200} = e^{-0.45} \approx 0.638$$

$$\Pr(\text{at least one collision}) \approx 1 - 0.638 = 0.362 \approx 36\%$$

(b) Let $X$ = number of keys hashing to a particular slot. Then $X \sim \text{Binomial}(k, 1/n) = \text{Binomial}(10, 0.01)$ with $E[X] = k/n = 0.1$.

We want $\Pr(X \geq 3)$. Using Chernoff with $(1 + \delta)\mu = 3$:

$$(1 + \delta) \cdot 0.1 = 3 \implies \delta = 29$$

$$\Pr(X \geq 3) \le e^{-0.1 \cdot 29^2 / 3} = e^{-28.03} \approx 10^{-13}$$

(c) The probability that any of the $n = 100$ slots has 3 or more keys is at most:

$$\Pr(\exists \text{ slot with} \geq 3 \text{ keys}) \le n \cdot \Pr(X \geq 3 \text{ for a given slot}) \le 100 \cdot 10^{-13} = 10^{-11}$$
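The part (a) arithmetic is easy to check directly, comparing the exact birthday product against the exponential approximation:

```python
import math

n, k = 100, 10
# exact product over the k keys, and the e^{-k(k-1)/(2n)} approximation
p_no_collision = math.prod(1 - i / n for i in range(k))
p_approx = math.exp(-k * (k - 1) / (2 * n))      # e^{-0.45}
p_collision = 1 - p_no_collision
```

The exact value (~0.628) and the approximation (~0.638) agree to about one percentage point.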

Problem 9: The Birthday Paradox

(a) In the birthday paradox with 365 days, what is the probability that no two people in a room of 24 people share a birthday?

(b) Using the complement, what is the probability that at least two people share a birthday?

(c) How is the birthday paradox related to the “balls and bins” framework?

(d) Explain why this probability is counterintuitive to many people.

Solution

(a) The probability that all 24 people have different birthdays is:

$$\Pr(\text{all different}) = \frac{365}{365} \cdot \frac{364}{365} \cdot \frac{363}{365} \cdots \frac{342}{365} \approx 0.4616$$

(b) Using the complement rule:

$$\Pr(\text{at least 2 share}) = 1 - 0.4616 \approx 0.5384 \approx 54\%$$

(c) In balls and bins, we have 24 balls (people) and 365 bins (days). The probability formula for no collisions is exactly the same as in the birthday paradox.

(d) Many people intuitively think about the probability of sharing a specific birthday (which is low) rather than the probability of any two people sharing a birthday (which is much higher). With 24 people, there are $\binom{24}{2} = 276$ pairs to compare, which is why collisions are likely.
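The product in (a) can be evaluated exactly in a couple of lines:

```python
import math

# probability that all 24 birthdays are distinct
p_all_diff = math.prod((365 - i) / 365 for i in range(24))
p_shared = 1 - p_all_diff
```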



Problem 10: Markov’s Inequality - First Concentration Inequality


(a) State Markov’s inequality and identify what information you need to apply it.

(b) Using Markov’s inequality, if the expected query time for hashing is $E[Q] = 1$ and you want to bound $\Pr(Q > 50)$, what is the bound?

(c) Why is Markov’s inequality weaker than Chebyshev’s inequality for the same random variable?

Solution

(a) Markov’s Inequality: For a non-negative random variable $X$ and threshold $t > 0$:

$$\Pr(X > t) \le \frac{E[X]}{t}$$

You only need to know the expectation $E[X]$, and $X$ must be non-negative.

(b) With $E[Q] = 1$ and $t = 50$:

$$\Pr(Q > 50) \le \frac{1}{50} = 0.02 = 2\%$$

(c) Markov uses only the expectation, while Chebyshev uses both expectation and variance. If the variance is small, Chebyshev gives a much tighter bound. In the same example, Chebyshev would give $\Pr(Q > 50) \le \Pr(|Q - 1| \geq 49) \le \frac{\text{Var}(Q)}{49^2}$, which is typically much smaller than $1/50$ when the variance is small.


Problem 11: Chebyshev’s Inequality

(a) State Chebyshev’s inequality.

(b) For a random variable with $E[X] = 50$ and $\text{Var}(X) = 25$, use Chebyshev’s inequality to bound $\Pr(|X - 50| \geq 10)$.

(c) What does Chebyshev’s inequality tell you about how concentrated a random variable is around its mean?

Solution

(a) Chebyshev’s Inequality: For any random variable $X$ and threshold $t > 0$:

$$\Pr(|X - E[X]| \geq t) \le \frac{\text{Var}(X)}{t^2}$$

(b) With $E[X] = 50$, $\text{Var}(X) = 25$, and $t = 10$:

$$\Pr(|X - 50| \geq 10) \le \frac{25}{10^2} = \frac{25}{100} = \frac{1}{4}$$

(c) Chebyshev’s inequality says that the probability of being far from the mean (by more than $t$ units) is bounded by $\text{Var}(X) / t^2$. Small variance means the random variable is tightly concentrated around its mean. As $t$ increases, the probability of large deviations decreases quadratically.


Problem 12: Chernoff Bounds

(a) State Chernoff bounds for a sum of independent Bernoulli random variables.

(b) For $X \sim \text{Binomial}(1000, 1/2)$ with $\mu = E[X] = 500$, use the simplified Chernoff bound for $0 < \delta \leq 1$ to bound $\Pr(X \geq 550)$.

(c) Compare this to what Chebyshev’s inequality would give for the same event. Which bound is tighter?

Solution

(a) Chernoff Bounds: For independent Bernoulli random variables $X_1, \ldots, X_n$ with $X = \sum X_i$ and $\mu = E[X]$:

For $0 < \delta \leq 1$:

$$\Pr(X \geq (1+\delta)\mu) \le e^{-\mu\delta^2/3}$$

(b) We have $\Pr(X \geq 550) = \Pr(X \geq (1 + 0.1) \cdot 500)$, so $\delta = 0.1$.

$$\Pr(X \geq 550) \le e^{-500 \cdot (0.1)^2 / 3} = e^{-5/3} \approx 0.188$$

(c) For Chebyshev with $\text{Var}(X) = 250$ and deviation $t = 50$:

$$\Pr(|X - 500| \geq 50) \le \frac{250}{50^2} = 0.1$$

Chernoff gives $\approx 0.188$ while Chebyshev gives $0.1$. Chebyshev is tighter in this case, but Chernoff becomes exponentially better for larger deviations.
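It is instructive to put both bounds next to the true tail probability, which is computable exactly with integer arithmetic:

```python
import math

n, p, t = 1000, 0.5, 550
mu = n * p                                 # 500
var = n * p * (1 - p)                      # 250

delta = (t - mu) / mu                      # 0.1
chernoff = math.exp(-mu * delta ** 2 / 3)  # e^{-5/3} ~ 0.189
chebyshev = var / (t - mu) ** 2            # 250/2500 = 0.1
# exact tail Pr(X >= 550) via the binomial pmf (Python ints are exact)
exact = sum(math.comb(n, k) for k in range(t, n + 1)) / 2 ** n
```

The exact tail is below $10^{-3}$, so both bounds are loose here; the point of Chernoff is how it scales for larger deviations.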


Problem 13: Comparing Markov, Chebyshev, and Chernoff


Compare three concentration bounds (Markov, Chebyshev, Chernoff) applied to the maximum load in balls-and-bins with $n$ balls and $n$ bins.

(a) State each inequality and the requirements for applying it.

(b) For $n = 1000$, compute each bound on $\Pr(\text{max load} \geq 2 \ln n)$.

(c) Explain why Chernoff is exponentially tighter than the others.

(d) Which bound would you use in practice, and why?

Solution

(a)

Markov’s Inequality: $\Pr(X \geq t) \le \frac{E[X]}{t}$ - Requires only $E[X]$ (non-negative RV).

Chebyshev’s Inequality: $\Pr(|X - E[X]| \geq c) \le \frac{\text{Var}(X)}{c^2}$ - Requires both $E[X]$ and $\text{Var}(X)$.

Chernoff Bound: $\Pr(X \geq (1+\delta)\mu) \le e^{-\mu \delta^2/3}$ - Requires $X$ to be a sum of independent Bernoulli RVs.

(b) For $n = 1000$, bounding $\Pr(\text{max load} \geq 2 \ln 1000) = \Pr(M \geq 13.8)$:

Markov (with the rough estimate $E[M] \approx \ln n \approx 6.9$): $\Pr(M \geq 13.8) \le \frac{6.9}{13.8} \approx 0.5$

Chebyshev (per bin, then a union bound): $\Pr(M \geq 13.8) \le n \cdot \frac{\text{Var}(L_i)}{(13.8 - 1)^2} \approx 6.1$, which exceeds 1 and is therefore vacuous

Chernoff (per bin, $\mu = 1$, $\delta = 12.8$): $\Pr(L_i \geq 13.8) \le e^{-54} \approx 10^{-24}$; union bound over all $n$ bins: $10^{-21}$

(c) Why Chernoff is exponentially better:

  • Markov uses only the expectation (weakest information)
  • Chebyshev uses expectation plus variance (better, but still limited)
  • Chernoff exploits the structure: when $X = \sum X_i$ with independent Bernoullis, the concentration is exponential

(d) In practice:

Use Chernoff when: you are analyzing sums of independent Bernoulli RVs (common in randomized algorithms) or need tight bounds for large-scale systems.

Use Chebyshev when: you lack full independence (pairwise independence suffices for the variance calculation) or the distribution is otherwise unknown.

Use Markov when: you only know the expectation or need a simple, quick bound.

For balls-and-bins, Chernoff is the right choice because each bin’s load is literally a sum of independent Bernoullis.


Problem 14: Comparing Markov, Chebyshev, and Chernoff with Specific Example


Consider $X \sim \text{Binomial}(100, 1/2)$ with $E[X] = 50$ and $\text{Var}(X) = 25$.

(a) Use all three inequalities (Markov, Chebyshev, Chernoff) to bound $\Pr(X \geq 75)$.

(b) Rank the three bounds from loosest to tightest. Which inequality is most useful for this tail event, and why?

(c) As we ask for increasingly extreme deviations (e.g., $\Pr(X \geq 100)$), which inequality’s advantage grows?

Solution

(a)

  • Markov: $\Pr(X \geq 75) \le \frac{50}{75} \approx 0.667$
  • Chebyshev: $\Pr(|X - 50| \geq 25) \le \frac{25}{25^2} = 0.04$
  • Chernoff: With $\delta = 0.5$, we get $\Pr(X \geq 75) \le e^{-50 \cdot (0.5)^2/3} \approx 0.015$

(b) Ranking from loosest to tightest: Markov (0.667) > Chebyshev (0.04) > Chernoff (0.015).

Chernoff is most useful because it exploits the structure of independent Bernoullis and gives exponentially small bounds for tail events.

(c) As deviations grow larger, Chernoff’s exponential decay becomes increasingly powerful. For instance, at $X \geq 100$ (i.e., $\delta = 1$), Chernoff still gives an exponentially small bound, while the Markov and Chebyshev bounds shrink only polynomially in the deviation.
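Evaluating the three bounds alongside the exact tail makes the ranking concrete:

```python
import math

n, p, t = 100, 0.5, 75
mu, var = n * p, n * p * (1 - p)               # 50, 25

markov = mu / t                                # ~ 0.667
chebyshev = var / (t - mu) ** 2                # 25/625 = 0.04
delta = (t - mu) / mu                          # 0.5
chernoff = math.exp(-mu * delta ** 2 / 3)      # e^{-25/6} ~ 0.015
# exact tail Pr(X >= 75), computed with exact integer arithmetic
exact = sum(math.comb(n, k) for k in range(t, n + 1)) / 2 ** n
```

The exact tail is smaller than $10^{-5}$: even Chernoff is conservative, but it is the only bound in the right ballpark.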



Problem 15: Coupon Collector

Consider the Coupon Collector problem: you want to collect all 6 unique Pokemon characters by buying cereal boxes. Each box contains one random character with equal probability.

(a) If you already have 3 unique characters, what is the probability that the next box gives you a new character?

(b) Let $X_i$ be the number of boxes bought after collecting $i-1$ characters but before collecting the $i$-th character. What distribution does $X_i$ follow, and what is $E[X_i]$ for $i = 4$?

(c) What is the expected total number of boxes needed to collect all 6 characters?

Solution

(a) You already have 3 of the 6 characters, so 3 remain new. The probability of getting a new character is $\frac{6-3}{6} = \frac{1}{2}$.

(b) $X_i$ follows a geometric distribution with success probability $p = \frac{6 - (i-1)}{6}$. For $i = 4$:

$$p = \frac{3}{6} = \frac{1}{2}, \quad E[X_4] = \frac{1}{p} = 2$$

(c) By linearity of expectation:

$$E[X] = \sum_{i=1}^{6} E[X_i] = 6 \left(1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{5} + \frac{1}{6}\right) \approx 14.7 \text{ boxes}$$
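A simulation (seed and trial count are arbitrary) confirms the $6 H_6 \approx 14.7$ answer:

```python
import random

random.seed(2)

def boxes_needed(u):
    """Buy boxes until all u characters are collected; return the count."""
    seen, boxes = set(), 0
    while len(seen) < u:
        seen.add(random.randrange(u))
        boxes += 1
    return boxes

trials = 20_000
avg_boxes = sum(boxes_needed(6) for _ in range(trials)) / trials
theory = 6 * sum(1 / j for j in range(1, 7))   # 6 * H_6 = 14.7
```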

Problem 16: Coupon Collector - General Analysis


In the Coupon Collector problem, you repeatedly draw coupons uniformly at random from a set of $U$ distinct types. The goal is to understand how many draws are needed to collect at least one coupon of every type.

(a) Let $W_i$ denote the number of coupons needed to go from having $i-1$ distinct types to $i$ distinct types. What distribution does $W_i$ follow?

(b) Compute $E[W_i]$ and derive the total expected number of coupons $E[W]$ needed to collect all $U$ types.

(c) For $U = 365$ (as in the birthday problem analog), compute the expected total number of coupons needed to see all 365 types.

Solution

(a) When we have $i-1$ types, there are $U - (i-1)$ unseen types out of $U$ total. Each drawn coupon is new with probability $p_i = \frac{U - (i-1)}{U}$. Thus $W_i$ follows a geometric distribution with success probability $p_i$.

(b) For a geometric RV with success probability $p_i$:

$$E[W_i] = \frac{1}{p_i} = \frac{U}{U - (i-1)}$$

By linearity of expectation:

$$E[W] = \sum_{i=1}^{U} E[W_i] = U \sum_{j=1}^{U} \frac{1}{j} = U \cdot H_U$$

For large $U$: $E[W] \approx U \ln(U)$.

(c) For $U = 365$:

$$E[W] = 365 \cdot H_{365} \approx 365 \cdot (\ln 365 + 0.577) \approx 2364 \text{ coupons}$$
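The harmonic sum and its logarithmic approximation can be compared directly:

```python
import math

U = 365
H_U = sum(1 / j for j in range(1, U + 1))   # harmonic number H_365
expected = U * H_U                          # exact E[W] = U * H_U
approx = U * (math.log(U) + 0.5772)         # Euler-Mascheroni approximation
```

The two agree to within a coupon or so, both near 2364.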

Problem 17: Applying Concentration Inequalities to Collision Analysis


In FKS hashing, we hash $n$ keys into $n$ buckets. Let $b_i$ be the number of keys in bucket $i$, and let $C = \sum_i b_i(b_i-1)$ be the total number of collisions (ordered pairs of distinct keys in the same bucket). Note that $C = \sum_i b_i^2 - n$.

(a) When hashing $n$ keys uniformly into $n$ buckets, compute $E[C]$ by considering: for each key $x_j$, how many other keys collide with $x_j$ in expectation?

(b) Use Markov’s inequality to bound $\Pr(\sum b_i^2 > 4n)$ given that $E[\sum b_i^2] < 2n$.

(c) Suppose instead we use Chebyshev. If $\text{Var}(\sum b_i^2) = O(n^2)$, can Chebyshev give a strong enough bound to show the probability is less than 1/2? Discuss why or why not.

Solution

(a) For key $x_j$, there are $n-1$ other keys. Each collides with $x_j$ with probability $1/n$. So:

$$E[\text{collisions with } x_j] = (n-1) \cdot \frac{1}{n} < 1$$

Summing over all $n$ keys counts each ordered pair exactly once, so $E[C] < n$.

(b) Using $C = \sum b_i^2 - n$ and $E[C] < n$, we have $E[\sum b_i^2] < 2n$.

By Markov:

$$\Pr\left(\sum b_i^2 > 4n\right) \le \frac{E[\sum b_i^2]}{4n} < \frac{2n}{4n} = \frac{1}{2}$$

(c) Chebyshev would give:

$$\Pr\left(\left|\sum b_i^2 - 2n\right| \geq 2n\right) \le \frac{\text{Var}(\sum b_i^2)}{(2n)^2} = \frac{O(n^2)}{4n^2} = O(1)$$

This is not strong enough. This illustrates why Markov’s bound is actually better for this problem: the deviation we care about ($4n$ vs. a mean below $2n$) is only a constant factor of 2, and Markov is perfectly calibrated for this regime.
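A simulation (the parameters $n = 500$, 500 trials, and the seed are assumed for illustration) shows $E[\sum b_i^2]$ sitting just below $2n$, and that exceeding $4n$ is far rarer than Markov's $1/2$ guarantee:

```python
import random

random.seed(11)
n = 500
trials = 500

exceed = 0
sum_sq_total = 0
for _ in range(trials):
    b = [0] * n
    for _ in range(n):                  # hash n keys into n buckets
        b[random.randrange(n)] += 1
    s = sum(x * x for x in b)
    sum_sq_total += s
    if s > 4 * n:
        exceed += 1

avg_sum_sq = sum_sq_total / trials      # theory: E = 2n - 1 < 2n
fraction_exceeding = exceed / trials    # Markov guarantees this is < 1/2
```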



Problem 18: Linear Probing - Basics

Linear probing resolves collisions by placing keys in consecutive empty slots: if slot $h(x)$ is occupied, try $h(x) + 1$, $h(x) + 2$, etc.

(a) Describe the insertion and query operations in linear probing.

(b) What is “primary clustering,” and why does it occur?

(c) Using Donald Knuth’s analysis, what is the expected query time for a successful lookup with load factor $\alpha = 1/5$?

Solution

(a) Insertion of key $x$:

  1. Compute $h(x)$
  2. If slot $h(x)$ is empty, insert $x$ there
  3. Otherwise, probe $h(x) + 1, h(x) + 2, \ldots$ until finding an empty slot

Query for key $q$:

  1. Compute $h(q)$
  2. Probe $h(q), h(q) + 1, h(q) + 2, \ldots$ until finding $q$ or an empty slot

(b) Primary clustering is the formation of long contiguous runs of occupied slots.

Why it occurs: A key $x$ hashes to slot $h(x)$. If occupied, it is placed at the next free slot. Any future key $y$ with $h(y)$ in the occupied run probes into it, extending the run further. The run grows like a “snowball.”

(c) By Donald Knuth’s analysis, the expected query time for a successful lookup is:

$$E[\text{Succ. Query}] = 1 + \frac{1}{1 - \alpha}$$

With $\alpha = 1/5$:

$$E[\text{Succ. Query}] = 1 + \frac{1}{4/5} = 1 + 1.25 = 2.25$$
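A minimal sketch of the two operations from (a), assuming the table always has at least one free slot (otherwise the probe loop would not terminate):

```python
def lp_insert(table, h, key):
    """Insert key into an open-addressing table (None marks empty slots),
    probing linearly from h(key). Assumes a free slot exists."""
    m = len(table)
    i = h(key) % m
    while table[i] is not None:
        i = (i + 1) % m                 # wrap around at the end of the table
    table[i] = key

def lp_query(table, h, key):
    """Return True iff key is present; the first empty slot ends the probe."""
    m = len(table)
    i = h(key) % m
    while table[i] is not None:
        if table[i] == key:
            return True
        i = (i + 1) % m
    return False

# tiny demo with the identity hash: keys 3 and 13 both map to slot 3,
# so 13 is displaced to slot 4 and 4 is displaced to slot 5
table = [None] * 10
for key in (3, 13, 4):
    lp_insert(table, lambda x: x, key)
```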

Problem 19: Linear Probing - Detailed Analysis


In a hash table with $m$ cells and $n$ keys (load factor $\alpha = n/m$), let $L_i$ be the length of chain $i$.

(a) What is $E[L_i]$?

(b) For $\alpha = 0.5$ and $m = 1000$, use Chernoff bounds to bound the probability that any chain exceeds length 3.

(c) Using Chebyshev’s inequality, explain why a union bound on the variance would not give a tight bound for this problem.

Solution

(a) Each key independently hashes to cell $i$ with probability $1/m$. Thus $L_i \sim \text{Binomial}(n, 1/m)$.

$$E[L_i] = n \cdot \frac{1}{m} = \alpha$$

(b) For $\alpha = 0.5$ and $m = 1000$, we have $n = 500$ keys. For a single cell with $E[L_i] = 0.5$:

$$\Pr(L_i > 3) = \Pr(L_i > (1 + \delta) E[L_i]) \text{ where } \delta = 5$$

By Chernoff:

$$\Pr(L_i > 3) \le e^{-0.5 \cdot 25/3} \approx 0.015$$

By union bound over $m = 1000$ cells:

$$\Pr(\exists \text{ cell} > 3) \le 1000 \cdot 0.015 = 15$$

This exceeds 1, so at threshold 3 even Chernoff’s union bound is vacuous; the bound becomes meaningful at only slightly larger thresholds (e.g., at length 6, $\delta = 11$ gives a union bound of about $1000 \cdot e^{-0.5 \cdot 121/3} \approx 2 \times 10^{-6}$).

(c) For a chain with $L_i \sim \text{Binomial}(n, 1/m)$, we have $\text{Var}(L_i) \approx 0.5$. Using Chebyshev to bound $\Pr(L_i > 3)$:

$$\Pr(|L_i - E[L_i]| \geq 2.5) \le \frac{0.5}{6.25} \approx 0.08$$

By union bound:

$$\Pr(\exists \text{ cell} > 3) \le 1000 \cdot 0.08 = 80$$

A bound of 80 is completely useless (any probability bound above 1 is vacuous), and unlike Chernoff it does not improve quickly as the threshold grows: Chebyshev’s bound shrinks only quadratically in the deviation, while Chernoff exploits the structure of sums of independent Bernoullis and shrinks exponentially.
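Plugging in the numbers from (b) and (c), plus one larger threshold to show where the Chernoff union bound becomes meaningful:

```python
import math

m, mu = 1000, 0.5

chernoff_cell = math.exp(-mu * 5 ** 2 / 3)   # Pr(L_i > 3) via Chernoff, delta = 5
chebyshev_cell = 0.5 / 2.5 ** 2              # two-sided Chebyshev at deviation 2.5

union_chernoff = m * chernoff_cell           # ~ 15.5: above 1, vacuous at threshold 3
union_chebyshev = m * chebyshev_cell         # 80: even worse
# at threshold 6 (delta = 11) Chernoff's exponential decay takes over:
union_chernoff_at_6 = m * math.exp(-mu * 11 ** 2 / 3)
```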



Problem 20: FKS Hashing

FKS hashing uses a two-level approach: a primary hash table with secondary hash tables in each bucket.

(a) Why does FKS guarantee $O(1)$ worst-case lookup time?

(b) What is the expected preprocessing time for FKS, and how does it compare to hashing with chaining?

Solution

(a) FKS uses:

  • Primary table: Hash $n$ keys into $m = n$ buckets using function $h_1$
  • Secondary tables: For each bucket $i$ with $b_i$ keys, use a secondary table of size $m_i = 2b_i^2$ with function $h_2^{(i)}$

A secondary table of this size is collision-free with probability at least $1/2$ per attempt, so after an expected constant number of retries every secondary table has zero collisions. Therefore:

  • Query = 1 probe in primary + 1 probe in secondary = $O(1)$ worst-case

(b)

  • Hashing with chaining: $O(n)$ worst-case preprocessing (insert each key once)
  • FKS: $O(n)$ expected preprocessing

The difference: FKS may need to retry hash functions if the choice is unlucky (i.e., $\sum b_i^2 > 4n$ at the primary level, or a collision at a secondary level), but by Markov’s inequality the expected number of retries is constant.
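A runnable sketch of the two-level construction. It uses the standard universal family $h(x) = ((a x + b) \bmod P) \bmod m$ with an assumed prime $P$ larger than every key; this family, the seeds, and the key set are illustrative choices, not the course's exact construction:

```python
import random

P = 2_000_003  # a prime larger than every key below (assumption of this sketch)

def make_hash(rng, m):
    """Draw a random function from the family h(x) = ((a*x + b) mod P) mod m."""
    a, b = rng.randrange(1, P), rng.randrange(P)
    return lambda x: ((a * x + b) % P) % m

def build_fks(keys, seed=42):
    rng = random.Random(seed)
    n = len(keys)
    # level 1: retry until the sum of squared bucket sizes is at most 4n
    while True:
        h1 = make_hash(rng, n)
        buckets = [[] for _ in range(n)]
        for x in keys:
            buckets[h1(x)].append(x)
        if sum(len(b) ** 2 for b in buckets) <= 4 * n:
            break
    # level 2: per bucket, retry until a table of size 2*b_i^2 is collision-free
    tables = []
    for b in buckets:
        size = max(1, 2 * len(b) ** 2)
        while True:
            h2 = make_hash(rng, size)
            positions = [h2(x) for x in b]
            if len(set(positions)) == len(b):   # no secondary collisions
                slots = [None] * size
                for x, pos in zip(b, positions):
                    slots[pos] = x
                tables.append((h2, slots))
                break
    return h1, tables

def fks_query(fks, x):
    """Exactly two probes: one primary, one secondary."""
    h1, tables = fks
    h2, slots = tables[h1(x)]
    return slots[h2(x)] == x

keys = random.Random(1).sample(range(1, 1_000_000), 200)
fks = build_fks(keys)
```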


Problem 21: Hashing with Chaining vs. FKS

Compare Hashing with Chaining (HWC) and FKS Hashing.

(a) What are the query time guarantees for each?

(b) What are the preprocessing time guarantees for each?

(c) When would you choose HWC over FKS, and vice versa?

Solution

(a) Query time:

  • HWC: $E[Q] = O(1 + \alpha)$ expected; worst case is $O(n)$
  • FKS: $O(1)$ worst-case (exactly 2 probes)

(b) Preprocessing time:

  • HWC: $O(n)$ always (insert each key once)
  • FKS: $O(n)$ expected (may retry hash functions, but the expected number of retries is constant)

(c)

Choose HWC when: Simplicity and ease of implementation matter, insertions and deletions are frequent, the load factor might be high, or constant factors are important.

Choose FKS when: Worst-case query time guarantees are critical, the set is static or rarely updated, or you need theoretical guarantees.



Problem 22: Bloom Filters - Basic Operations


A Bloom filter uses $m$ bits and $k$ hash functions to test set membership with false positives but no false negatives.

(a) Describe the insert and query operations.

(b) Why is the false positive probability approximately $\left(1 - e^{-kn/m}\right)^k$?

(c) What is the optimal number of hash functions $k$ to minimize the false positive rate?

Solution

(a) Insert element $x$ into set $S$:

  1. Compute $h_1(x), h_2(x), \ldots, h_k(x)$
  2. Set the bits at these positions to 1

Query whether $x \in S$:

  1. Compute $h_1(x), h_2(x), \ldots, h_k(x)$
  2. If all bits at these positions are 1, return “possibly in $S$”
  3. If any bit is 0, return “definitely not in $S$”

(b) After inserting $n$ elements with $k$ hash functions:

The probability that a specific bit is not set by any of the $kn$ hash function invocations is:

$$\Pr[\text{bit is } 0] = \left(1 - \frac{1}{m}\right)^{kn} \approx e^{-kn/m}$$

So the probability a bit is set to 1:

$$\Pr[\text{bit is } 1] = 1 - e^{-kn/m}$$

A false positive occurs when all $k$ queried bits are 1:

$$\Pr[\text{false positive}] = \left(1 - e^{-kn/m}\right)^k$$

(c) To minimize the false positive probability, take the derivative with respect to $k$ and set it to zero. The optimal value is:

$$k = \frac{m}{n} \ln 2$$

At this optimal $k$, the false positive probability becomes:

$$\left(\frac{1}{2}\right)^k$$
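The operations in (a) fit in a short class. Deriving the $k$ hash functions by salting SHA-256 is an illustrative choice for this sketch, not a prescribed construction:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter sketch (one byte per bit, for simplicity)."""

    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = bytearray(m)

    def _positions(self, item):
        # simulate k hash functions by salting a single strong hash
        for salt in range(self.k):
            digest = hashlib.sha256(f"{salt}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def insert(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def query(self, item):
        # True means "possibly in S"; False means "definitely not in S"
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter(m=10_000, k=7)
for i in range(1_000):                  # kn/m = 0.7, the same ratio as Problem 23
    bf.insert(f"key{i}")

# no false negatives by construction; count false positives on non-members
false_positives = sum(bf.query(f"other{i}") for i in range(10_000))
```

With $kn/m = 0.7$ the false positive rate should be roughly $(1 - e^{-0.7})^7 \approx 0.8\%$, i.e., on the order of 80 hits out of 10,000 non-member queries.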

Problem 23: Bloom Filters - Comprehensive Analysis


A Bloom filter uses $m = 100{,}000$ bits and $k = 7$ hash functions to store a set of $n = 10{,}000$ elements.

(a) What is the expected number of bits set to 1?

(b) Estimate the false positive probability.

(c) If you want to reduce the false positive rate to $\varepsilon = 0.001$, how should you adjust $m$ and/or $k$?

Solution

(a) Each of the $kn = 70{,}000$ hash function evaluations sets some bit to 1.

The probability a given bit remains 0 is:

$$\Pr(\text{bit } 0) = (1 - 1/m)^{kn} \approx e^{-kn/m} = e^{-0.7} \approx 0.497$$

Expected number of bits set to 1:

$$E[\text{bits set}] = m \cdot (1 - e^{-kn/m}) \approx 50{,}300$$

(b) For a query on a non-member, all $k$ hashed positions must coincidentally be set to 1:

$$\Pr(\text{false positive}) = (1 - e^{-kn/m})^k = (0.503)^7 \approx 0.82\%$$

(c) From the formula $\Pr(\text{FP}) = (1 - e^{-kn/m})^k \le \varepsilon$:

For fixed $n = 10{,}000$, achieving $\varepsilon = 0.001$ requires increasing $m$. At the optimal $k = \frac{m}{n} \ln 2$ the false positive rate is $(1/2)^k$, so we need $k = \log_2(1/\varepsilon) \approx 10$; solving $m = \frac{n \ln(1/\varepsilon)}{(\ln 2)^2}$ then gives:

$$m \approx 144{,}000 \text{ bits}, \quad k \approx 10 \text{ hash functions}$$
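The sizing in (c) follows from the standard formulas $m = n \ln(1/\varepsilon)/(\ln 2)^2$ and $k = (m/n)\ln 2$; a small helper (the function name is mine) computes them, along with the part (b) rate:

```python
import math

def bloom_parameters(n, eps):
    """Optimal Bloom filter sizing for target false-positive rate eps."""
    m = math.ceil(n * math.log(1 / eps) / math.log(2) ** 2)   # bits
    k = round(m / n * math.log(2))                            # hash functions
    return m, k

m_bits, k_hashes = bloom_parameters(10_000, 0.001)

# sanity check of part (b): k = 7, m = 100000, n = 10000, so kn/m = 0.7
fp_part_b = (1 - math.exp(-0.7)) ** 7
```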

Problem 24: Bloom Filter False Positive Dynamics


In Bloom filters, how does the false positive rate change as more elements are inserted?

(a) As $n$ increases, does the false positive probability increase or decrease?

(b) If you fix $m$ (the number of bits) and insert more and more elements, what happens to the false positive rate?

(c) How should you adjust $m$ and $k$ as $n$ grows to maintain a constant false positive rate?

Solution

(a) The false positive probability increases as $n$ increases.

With $n$ elements inserted and optimal $k = \frac{m}{n} \ln 2$:

$$\Pr[\text{false positive}] = \left(\frac{1}{2}\right)^k = \left(\frac{1}{2}\right)^{(m/n) \ln 2}$$

As $n$ increases, the exponent $(m/n) \ln 2$ decreases (for fixed $m$), so $(1/2)$ raised to a smaller power gives a larger result.

(b) If you fix $m$ and keep inserting elements:

  • More elements $\to$ more bits are set to 1
  • A higher fraction of 1 bits $\to$ a higher chance that all $k$ queried bits are 1
  • The false positive rate increases monotonically

(c) To maintain a constant false positive rate $\varepsilon$ as $n$ grows, increase $m$ proportionally to $n$ while adjusting $k$ to maintain the target rate.

In practice:

  • Space: use $O(n \log(1/\varepsilon))$ bits (linear in $n$, logarithmic in the target error)
  • Hash functions: $k = \frac{m}{n} \ln 2$ (if the ratio $m/n$ is held constant, $k$ stays constant too)


Problem 25: Linear Probing vs Bloom Filters


Compare Linear Probing and Bloom Filters.

(a) What is the main difference in what they do?

(b) In terms of space, which is more efficient for very large sets?

(c) When would you use Linear Probing vs Bloom Filter?

Solution

(a) Linear Probing:

  • Solves the dictionary problem: store $n$ keys and support insert, delete, and lookup with exact answers
  • Uses $O(n)$ space
  • Gives exact answers

Bloom Filter:

  • Solves the approximate membership problem: tests whether an element is in the set, allowing false positives
  • Uses $O(n \log(1/\varepsilon))$ space
  • Gives probabilistic answers (no false negatives, but false positives allowed)

(b) Space comparison: For a set of $n$ elements from a large universe:

  • Linear Probing: Must store each full key. Space is $O(n \log U)$ bits
  • Bloom Filter: Uses only $O(n \log(1/\varepsilon))$ bits, independent of universe size

For large $U$, the Bloom filter is far more space-efficient: it avoids the $\log U$ factor entirely.

(c)

Use Linear Probing when: You need exact membership testing, you need to support insertions and deletions, the set fits in memory, or false positives are unacceptable.

Use Bloom Filter when: Space is critical, false positives are tolerable, you only need membership testing (no need to retrieve associated data), or the universe is very large.


Problem 26: FKS Hashing vs. Bloom Filters

Which should you use for a membership test on $n$ elements?

(a) Compare the space used by each.

(b) Compare the query time of each.

(c) What are the key tradeoffs?

Solution

(a) Space:

  • FKS: $O(n)$ words (each word stores a key). Total: $O(n \log U)$ bits
  • Bloom Filter: $O(n \log(1/\varepsilon))$ bits, independent of $U$

For universe size $U = 2^{32}$ and error rate $\varepsilon = 0.01$:

  • FKS: $32n$ bits
  • Bloom: $\approx 10n$ bits (about 3× smaller)

(b) Query time:

  • FKS: $O(1)$ worst-case (exactly 2 probes)
  • Bloom: $O(k)$ where $k = O(\log(1/\varepsilon))$ hash function evaluations

(c) Tradeoffs:

| Aspect | FKS | Bloom |
| --- | --- | --- |
| Exactness | Exact answers | False positives allowed |
| Space | $O(n \log U)$ bits | $O(n \log(1/\varepsilon))$ bits |
| Speed | $O(1)$ worst-case | $O(\log(1/\varepsilon))$ hashes |
| Dynamism | Hard to insert/delete | Cannot delete |
| Use case | Exact dictionary | Space-critical membership |

Decision rule:

  • Use FKS for exact membership when you can afford the space
  • Use Bloom when space is critical and false positives are acceptable