
CSCI 328 Midterm Example 1

The exam has 5 questions, adding up to 120 points. Max score is 100. Please do the problems in order.

Chernoff Bound: Let $\{X_i\}_{i=1}^n$ be i.i.d. Bernoulli random variables and $X = \sum_{i=1}^n X_i$. Let $\mu = E[X]$. Then for any $0 < \delta \le 1$:

$$Pr(X \ge (1+\delta)\mu) \le e^{-\mu\delta^2/3}$$
  1. Let $u$ and $v$ be two random $n$-bitvectors, i.e., $u$ and $v$ are bitvectors of length $n$ such that each position is equally likely to be 0 or 1.

    • (15 points) The Hamming distance between two bitvectors is defined as the number of positions in which they differ. For example, $HD((1,0,0), (1,1,1)) = 2$. Calculate $E(u,v)$, i.e., the expected Hamming distance between $v$ and $u$. Using Chebyshev, give an upper bound on the probability that the Hamming distance between $u$ and $v$ is larger than $3n/4$ or less than $n/4$.

    • (20 points) The dot product of two vectors $u$ and $v$ is defined as $u^T v = \sum_{i=1}^n u[i]v[i]$. Calculate $E[u^T v]$, i.e., the expected dot product of $u$ and $v$. Using Chernoff, give an upper bound on the probability that the dot product of $u$ and $v$ is larger than $n/3$.
    Solution

    Part (a): Hamming Distance

    Let’s start by understanding what we’re looking for. The Hamming distance counts how many positions differ between the two vectors. Let’s denote it as $HD$.

    Finding the expected Hamming distance:

    Here’s the key insight: we can think of the total Hamming distance as the sum of indicator variables for each position. At position $i$, the bits differ if $u[i] \neq v[i]$.

    Let’s calculate the probability that position $i$ differs:

    • If $u[i] = 0$ and $v[i] = 1$: probability is $(1/2)(1/2) = 1/4$
    • If $u[i] = 1$ and $v[i] = 0$: probability is $(1/2)(1/2) = 1/4$

    $$P(u[i] \neq v[i]) = \frac{1}{4} + \frac{1}{4} = \frac{1}{2}$$

    Since we have $n$ independent positions, and at each position there’s a $1/2$ chance of differing:

    $$E[HD] = n \cdot \frac{1}{2} = \frac{n}{2}$$

    Finding the variance:

    To use Chebyshev’s inequality, we need the variance. Think of each position as an independent trial: position $i$ contributes 1 to the Hamming distance (if it differs) or 0 (if it matches). Each position is a Bernoulli(1/2) random variable.

    For a Bernoulli(1/2) random variable, the variance is $(1/2)(1/2) = 1/4$. Since all $n$ positions are independent:

    $$\text{Var}(HD) = n \cdot \frac{1}{4} = \frac{n}{4}$$

    Applying Chebyshev’s inequality:

    We want to bound the probability that $HD$ deviates significantly from its expected value. Specifically, we want the probability that $HD$ is either very large ($> 3n/4$) or very small ($< n/4$). Notice that both of these thresholds are exactly $n/4$ away from the mean $n/2$:

    $$P(HD > 3n/4 \text{ or } HD < n/4) = P(|HD - n/2| > n/4)$$

    By Chebyshev’s inequality:

    $$P(|HD - n/2| > n/4) \leq \frac{\text{Var}(HD)}{(n/4)^2} = \frac{n/4}{n^2/16} = \frac{4}{n}$$
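    As an optional sanity check (not part of the required solution), a short Monte Carlo sketch; the function names `hamming_trial` and `estimate`, and the choices $n = 64$ with 20000 trials, are illustrative, not from the exam:

```python
import random

def hamming_trial(n, rng):
    # Hamming distance of two random n-bitvectors:
    # count positions where two independently drawn bits differ.
    return sum(rng.getrandbits(1) != rng.getrandbits(1) for _ in range(n))

def estimate(n=64, trials=20000, seed=0):
    rng = random.Random(seed)
    dists = [hamming_trial(n, rng) for _ in range(trials)]
    mean = sum(dists) / trials  # should be close to n/2 = 32
    # Empirical probability of deviating more than n/4 from the mean n/2.
    tail = sum(abs(d - n / 2) > n / 4 for d in dists) / trials
    return mean, tail

mean, tail = estimate()
# The Chebyshev bound 4/n = 0.0625 should dominate the empirical tail.
```

    In practice the empirical tail is far smaller than $4/n$; Chebyshev is a loose, distribution-free bound.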

    Part (b): Dot Product

    The dot product $u^T v = \sum_{i=1}^n u[i]v[i]$ sums the products of corresponding bits. Let’s think about when $u[i]v[i] = 1$: this happens only when both $u[i] = 1$ AND $v[i] = 1$.

    Finding the expected dot product:

    Since each bit is independently 0 or 1 with probability 1/2:

    • Probability that $u[i] = 1$ is $1/2$
    • Probability that $v[i] = 1$ is $1/2$
    • These are independent, so $P(u[i] = 1 \text{ and } v[i] = 1) = (1/2)(1/2) = 1/4$

    Each position contributes an expected value of $1/4$ to the dot product:

    $$E[u^T v] = n \cdot \frac{1}{4} = \frac{n}{4}$$

    Applying the Chernoff bound:

    We want to bound $P(u^T v > n/3)$. The Chernoff bound is useful here because it gives us exponentially small probabilities. (It applies because $u^T v$ is a sum of $n$ i.i.d. Bernoulli(1/4) random variables.)

    The Chernoff bound has the form $P(X \geq (1 + \delta)\mu) \leq e^{-\mu\delta^2/3}$, where $\mu = E[X]$.

    We need to express our target $n/3$ in this form. We have $\mu = n/4$, so:

    $$n/3 = (1 + \delta) \cdot \frac{n}{4}$$

    Solving for $\delta$:

    $$1 + \delta = \frac{n/3}{n/4} = \frac{4}{3} \implies \delta = \frac{1}{3}$$

    Now we apply the Chernoff bound with $\mu = n/4$ and $\delta = 1/3$:

    $$P(u^T v > n/3) \leq e^{-\mu\delta^2/3} = e^{-(n/4)(1/3)^2/3} = e^{-(n/4)(1/27)} = e^{-n/108}$$

    This exponential bound tells us that as $n$ grows, the probability of the dot product exceeding $n/3$ decreases exponentially, which is very reassuring!
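    Again as an optional sanity check (the names and the choice $n = 216$, which makes the bound exactly $e^{-2}$, are our own):

```python
import math
import random

def dot_trial(n, rng):
    # Dot product of two random n-bitvectors:
    # counts positions where both bits are 1.
    return sum(rng.getrandbits(1) & rng.getrandbits(1) for _ in range(n))

def tail_vs_chernoff(n=216, trials=20000, seed=1):
    rng = random.Random(seed)
    # Empirical probability that the dot product exceeds n/3.
    tail = sum(dot_trial(n, rng) > n / 3 for _ in range(trials)) / trials
    bound = math.exp(-n / 108)  # the Chernoff bound derived in the solution
    return tail, bound

tail, bound = tail_vs_chernoff()  # bound = e^{-2} ≈ 0.135 for n = 216
```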

  2. (15 points) Consider the following wheel-of-fortune game: A player bets on one of the numbers 1 through 6. Three dice are then rolled, and if the number bet by the player appears $i$ times, $i = 1, 2, 3$, then the player wins $\$4i^2$ (note that it doesn’t matter what the actual number the player bet is, only how many times it appears). If the number bet by the player does not appear on any of the dice, then the player loses $\$10$. Decide if this game is fair to the player by calculating the expected winnings from one play of the game.

    Solution

    Let $X$ denote the number of times the bet number appears on the three dice. We need to find the probability distribution of $X$, since the payout depends on $X$.

    Setting up the probabilities:

    For each die:

    • Probability the bet number appears: $1/6$
    • Probability it doesn’t appear: $5/6$

    Let’s calculate $P(X = k)$ for each possible value:

    When $X = 0$ (the number doesn’t appear on any die):

    $$P(X = 0) = \left(\frac{5}{6}\right)^3 = \frac{125}{216}$$

    When $X = 1$ (appears on exactly one die):

    $$P(X = 1) = 3 \cdot \frac{1}{6} \cdot \left(\frac{5}{6}\right)^2 = \frac{75}{216}$$

    When $X = 2$ (appears on exactly two dice):

    $$P(X = 2) = 3 \cdot \left(\frac{1}{6}\right)^2 \cdot \frac{5}{6} = \frac{15}{216}$$

    When $X = 3$ (appears on all three dice):

    $$P(X = 3) = \left(\frac{1}{6}\right)^3 = \frac{1}{216}$$

    Computing expected winnings:

    Now recall the payout structure:

    • If $X = 0$: lose $10$ dollars
    • If $X = 1$: win $4(1)^2 = 4$ dollars
    • If $X = 2$: win $4(2)^2 = 16$ dollars
    • If $X = 3$: win $4(3)^2 = 36$ dollars

    By the definition of expectation:

    $$E[\text{winnings}] = \frac{75 \cdot 4 + 15 \cdot 16 + 1 \cdot 36 - 125 \cdot 10}{216} = \frac{300 + 240 + 36 - 1250}{216} = -\frac{337}{108} \approx -3.12$$

    Conclusion:

    The expected winnings are negative, approximately $-\$3.12$ per play. This means that on average, over many plays, a player loses money. Therefore, the game is not fair to the player.
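    The arithmetic can be double-checked exactly with rationals (a throwaway sketch; the variable names are our own):

```python
from fractions import Fraction
from math import comb

# X ~ Binomial(3, 1/6): number of dice showing the bet number.
p = Fraction(1, 6)
pmf = {k: comb(3, k) * p**k * (1 - p)**(3 - k) for k in range(4)}

# Payouts from the problem: lose $10 on zero hits, win $4i^2 on i hits.
payout = {0: -10, 1: 4, 2: 16, 3: 36}

expected = sum(pmf[k] * payout[k] for k in range(4))
# expected == Fraction(-337, 108), about -3.12 dollars per play
```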

  3. (20 points) Suppose you are given that $1 + 1/2^3 + 1/3^3 + 1/4^3 + \cdots = 1.20205\ldots$ (the actual value doesn’t matter, just that it is a constant) and that $1 + 1/2^2 + 1/3^2 + 1/4^2 + \cdots = \pi^2/6$.

    Using this information, design a random variable $X$ that can take any value $n > 1$ (so infinitely many values), such that the expectation of $X$ is finite, but the variance of $X$ is infinite.

    Solution

    Designing the random variable:

    The key is to use the given series to control the behavior of expectation and variance. Let’s define:

    $$P(X = n) = \frac{c}{n^3}, \quad n = 2, 3, 4, \ldots$$

    where $c$ is chosen to make this a valid probability distribution (normalized so the probabilities sum to 1).

    Finding the normalization constant:

    For probabilities to sum to 1:

    $$\sum_{n=2}^{\infty} \frac{c}{n^3} = 1 \implies c = \frac{1}{\sum_{n=2}^{\infty} 1/n^3}$$

    We’re given that $\sum_{n=1}^{\infty} \frac{1}{n^3} = 1.20205\ldots$, so:

    $$\sum_{n=2}^{\infty} \frac{1}{n^3} = 1.20205\ldots - 1 = 0.20205\ldots$$

    Computing the expectation:

    Now let’s check if the expectation is finite:

    $$E[X] = \sum_{n=2}^{\infty} n \cdot \frac{c}{n^3} = c \sum_{n=2}^{\infty} \frac{1}{n^2}$$

    We’re told that $\sum_{n=1}^{\infty} \frac{1}{n^2} = \pi^2/6 \approx 1.645$. Therefore:

    $$\sum_{n=2}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6} - 1 < \infty$$

    Since $c$ is a finite constant and the sum is finite, $E[X]$ is finite.

    Computing the second moment:

    For the variance, we need $E[X^2]$:

    $$E[X^2] = \sum_{n=2}^{\infty} n^2 \cdot \frac{c}{n^3} = c \sum_{n=2}^{\infty} \frac{1}{n}$$

    Here’s the crucial difference: $\sum_{n=2}^{\infty} \frac{1}{n}$ is (the tail of) the harmonic series, which diverges to infinity. This is a famous result!

    Therefore $E[X^2] = \infty$, which means:

    $$\text{Var}(X) = E[X^2] - (E[X])^2 = \infty - (\text{finite})^2 = \infty$$

    We’ve successfully constructed a random variable with finite expectation but infinite variance!
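    The contrast is easy to see numerically: the partial sums for $E[X]$ stabilize while those for $E[X^2]$ keep growing. A small sketch (the function name and cutoff values are our own choices):

```python
def partial_moments(N):
    # Truncate the distribution P(X = n) = c / n^3 at n = N.
    c = 1 / sum(1 / n**3 for n in range(2, N + 1))  # approximate normalizer
    ex = c * sum(1 / n**2 for n in range(2, N + 1))  # partial sum for E[X]
    ex2 = c * sum(1 / n for n in range(2, N + 1))    # partial sum for E[X^2]
    return ex, ex2

# ex converges (to roughly c * (pi^2/6 - 1) ≈ 3.19),
# while ex2 grows without bound, like c * ln(N).
for N in (100, 10_000, 1_000_000):
    ex, ex2 = partial_moments(N)
```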

  4. (30 points) Alice and her younger brother Bob play chess. Alice is the better player, and wins any match with probability 0.75. They play a tournament of 200 matches, with ice-cream as the prize. What is the expected number of games that Alice wins?

    Bob wants to negotiate a deal where his chances of getting ice-cream are at least $1 - e^{-2} \approx 0.864$. He can stipulate a number $X < 200$, and demand that Alice win at least $X$ games in order to win the tournament. Obviously, Bob could stipulate $X = 200$, and he would almost surely get ice-cream, but that would be cheating. What is the lowest value of $X$ at which Bob can still get ice-cream with probability at least $1 - e^{-2}$?

    Solution

    Part 1: Expected Number of Games Alice Wins

    Let $Y$ be the number of games Alice wins. Since $Y \sim \text{Binomial}(200, 0.75)$:

    $$E[Y] = 200 \cdot 0.75 = 150$$

    Part 2: Finding the Lowest Value of X

    Bob gets ice-cream if Alice wins fewer than $X$ games, i.e., if $Y < X$. So Bob needs $P(Y < X) \geq 1 - e^{-2}$, or equivalently $P(Y \geq X) \leq e^{-2}$.

    We want to find the smallest $X$ such that $P(Y \geq X) \leq e^{-2}$.

    Using the Chernoff bound, for $Y \geq (1 + \delta)\mu$ where $\mu = 150$:

    $$P(Y \geq (1 + \delta)\mu) \leq e^{-\mu\delta^2/3}$$

    We want $e^{-\mu\delta^2/3} \leq e^{-2}$, which requires:

    $$\frac{\mu\delta^2}{3} \geq 2 \implies \delta^2 \geq \frac{6}{\mu} = \frac{6}{150} = \frac{1}{25}$$

    Therefore $\delta \geq 1/5$.

    With $\delta = 1/5$:

    $$X = (1 + \delta)\mu = \left(1 + \frac{1}{5}\right) \cdot 150 = \frac{6}{5} \cdot 150 = 180$$

    We verify: $P(Y \geq 180) \leq e^{-150 \cdot (1/5)^2 / 3} = e^{-150 \cdot (1/25) / 3} = e^{-2}$.

    Therefore, the lowest value of $X$ that the Chernoff bound certifies is $X = 180$.
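    As an optional cross-check, the exact binomial tail can be computed directly (a sketch; `binom_tail` is our own helper, not a library function):

```python
from math import comb, exp

def binom_tail(n, p, x):
    # Exact P(Y >= x) for Y ~ Binomial(n, p).
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x, n + 1))

tail = binom_tail(200, 0.75, 180)
bound = exp(-2)  # the Chernoff guarantee, about 0.135
# The exact tail is far below e^{-2}, so X = 180 comfortably works;
# Chernoff is loose, and the true smallest admissible X is in fact lower.
```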

  5. (20 points) What is the main advantage/disadvantage of using Cuckoo Hashing over Hashing with Chaining? What is the main advantage/disadvantage of using Hashing with Chaining over FKS hashing?