
CSCI 328 Midterm Example 1

The exam has 5 questions, adding up to 120 points. Max score is 100. Please do the problems in order.

Chernoff Bound: Let $\{X_i\}_{i=1}^n$ be i.i.d. Bernoulli random variables and $X = \sum_{i=1}^n X_i$. Let $\mu = E[X]$. Then for any $0 < \delta \le 1$:

$$Pr(X \ge (1+\delta)\mu) \le e^{-\mu\delta^2/3}$$
  1. Let $u$ and $v$ be two random $n$-bitvectors, i.e., $u$ and $v$ are bitvectors of length $n$ such that each position is equally likely to be 0 or 1.

    • (15 points) The Hamming distance between two bitvectors is defined as the number of positions in which they differ. For example, $HD((1,0,0), (1,1,1)) = 2$. Calculate $E(u,v)$, i.e., the expected Hamming distance between $v$ and $u$. Using Chebyshev, give an upper bound on the probability that the Hamming distance between $u$ and $v$ is larger than $3n/4$ or less than $n/4$.

    • (20 points) The dot product of two vectors $u$ and $v$ is defined as $u^T v = \sum_{i=1}^n u[i]v[i]$. Calculate $E[u^T v]$, i.e., the expected dot product of $u$ and $v$. Using Chernoff, give an upper bound on the probability that the dot product of $u$ and $v$ is larger than $n/3$.
    Solution

    Part (a): Hamming Distance

    Let’s start by understanding what we’re looking for. The Hamming distance counts how many positions differ between the two vectors. Let’s denote it as $HD$.

    Finding the expected Hamming distance:

    Here’s the key insight: we can think of the total Hamming distance as the sum of indicator variables for each position. At position $i$, the bits differ if $u[i] \neq v[i]$.

    Let’s calculate the probability that position $i$ differs:

    • If $u[i] = 0$ and $v[i] = 1$: probability is $(1/2)(1/2) = 1/4$
    • If $u[i] = 1$ and $v[i] = 0$: probability is $(1/2)(1/2) = 1/4$

    $$P(u[i] \neq v[i]) = \frac{1}{4} + \frac{1}{4} = \frac{1}{2}$$

    Since we have $n$ independent positions, and at each position there’s a $1/2$ chance of differing:

    $$E[HD] = n \cdot \frac{1}{2} = \frac{n}{2}$$

    Finding the variance:

    To use Chebyshev’s inequality, we need the variance. Think of each position as an independent trial: position $i$ contributes 1 to the Hamming distance (if it differs) or 0 (if it matches). Each position is a Bernoulli(1/2) random variable.

    For a Bernoulli(1/2) random variable, the variance is $(1/2)(1/2) = 1/4$. Since all $n$ positions are independent:

    $$\text{Var}(HD) = n \cdot \frac{1}{4} = \frac{n}{4}$$

    Applying Chebyshev’s inequality:

    We want to bound the probability that $HD$ deviates significantly from its expected value. Specifically, we want the probability that $HD$ is either very large ($> 3n/4$) or very small ($< n/4$). Notice that both of these thresholds are exactly $n/4$ away from the mean $n/2$:

    $$P(HD > 3n/4 \text{ or } HD < n/4) = P(|HD - n/2| > n/4)$$

    By Chebyshev’s inequality:

    $$P(|HD - n/2| > n/4) \leq \frac{\text{Var}(HD)}{(n/4)^2} = \frac{n/4}{n^2/16} = \frac{4}{n}$$
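    As an optional sanity check (not part of the required solution), a short Monte Carlo sketch; the function names `hamming_trial` and `estimate`, and the choices $n = 64$ with 20000 trials, are illustrative, not from the exam:

```python
import random

def hamming_trial(n, rng):
    # Hamming distance of two random n-bitvectors:
    # count positions where two independently drawn bits differ.
    return sum(rng.getrandbits(1) != rng.getrandbits(1) for _ in range(n))

def estimate(n=64, trials=20000, seed=0):
    rng = random.Random(seed)
    dists = [hamming_trial(n, rng) for _ in range(trials)]
    mean = sum(dists) / trials  # should be close to n/2 = 32
    # Empirical probability of deviating more than n/4 from the mean n/2.
    tail = sum(abs(d - n / 2) > n / 4 for d in dists) / trials
    return mean, tail

mean, tail = estimate()
# The Chebyshev bound 4/n = 0.0625 should dominate the empirical tail.
```

    In practice the empirical tail is far smaller than $4/n$; Chebyshev is a loose, distribution-free bound.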

    Part (b): Dot Product

    The dot product $u^T v = \sum_{i=1}^n u[i]v[i]$ sums the products of corresponding bits. Let’s think about when $u[i]v[i] = 1$: this happens only when both $u[i] = 1$ AND $v[i] = 1$.

    Finding the expected dot product:

    Since each bit is independently 0 or 1 with probability 1/2:

    • Probability that $u[i] = 1$ is $1/2$
    • Probability that $v[i] = 1$ is $1/2$
    • These are independent, so $P(u[i] = 1 \text{ and } v[i] = 1) = (1/2)(1/2) = 1/4$

    Each position contributes an expected value of $1/4$ to the dot product:

    $$E[u^T v] = n \cdot \frac{1}{4} = \frac{n}{4}$$

    Applying the Chernoff bound:

    We want to bound $P(u^T v > n/3)$. The Chernoff bound is useful here because it gives us exponentially small probabilities. (It applies because $u^T v$ is a sum of $n$ i.i.d. Bernoulli(1/4) random variables.)

    The Chernoff bound has the form $P(X \geq (1 + \delta)\mu) \leq e^{-\mu\delta^2/3}$, where $\mu = E[X]$.

    We need to express our target $n/3$ in this form. We have $\mu = n/4$, so:

    $$n/3 = (1 + \delta) \cdot \frac{n}{4}$$

    Solving for $\delta$:

    $$1 + \delta = \frac{n/3}{n/4} = \frac{4}{3} \implies \delta = \frac{1}{3}$$

    Now we apply the Chernoff bound with $\mu = n/4$ and $\delta = 1/3$:

    $$P(u^T v > n/3) \leq e^{-\mu\delta^2/3} = e^{-(n/4)(1/3)^2/3} = e^{-(n/4)(1/27)} = e^{-n/108}$$

    This exponential bound tells us that as $n$ grows, the probability of the dot product exceeding $n/3$ decreases exponentially, which is very reassuring!
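    Again as an optional sanity check (the names and the choice $n = 216$, which makes the bound exactly $e^{-2}$, are our own):

```python
import math
import random

def dot_trial(n, rng):
    # Dot product of two random n-bitvectors:
    # counts positions where both bits are 1.
    return sum(rng.getrandbits(1) & rng.getrandbits(1) for _ in range(n))

def tail_vs_chernoff(n=216, trials=20000, seed=1):
    rng = random.Random(seed)
    # Empirical probability that the dot product exceeds n/3.
    tail = sum(dot_trial(n, rng) > n / 3 for _ in range(trials)) / trials
    bound = math.exp(-n / 108)  # the Chernoff bound derived in the solution
    return tail, bound

tail, bound = tail_vs_chernoff()  # bound = e^{-2} ≈ 0.135 for n = 216
```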

  2. (15 points) Consider the following wheel-of-fortune game: A player bets on one of the numbers 1 through 6. Three dice are then rolled, and if the number bet by the player appears $i$ times, $i = 1, 2, 3$, then the player wins $\$4i^2$ (note that it doesn’t matter what the actual number the player bet is, only how many times it appears). If the number bet by the player does not appear on any of the dice, then the player loses $\$10$. Decide if this game is fair to the player by calculating the expected winnings from one play of the game.

    Solution

    Let $X$ denote the number of times the bet number appears on the three dice. We need to find the probability distribution of $X$, since the payout depends on $X$.

    Setting up the probabilities:

    For each die:

    • Probability the bet number appears: $1/6$
    • Probability it doesn’t appear: $5/6$

    Let’s calculate $P(X = k)$ for each possible value:

    When $X = 0$ (the number doesn’t appear on any die):

    $$P(X = 0) = \left(\frac{5}{6}\right)^3 = \frac{125}{216}$$

    When $X = 1$ (appears on exactly one die):

    $$P(X = 1) = 3 \cdot \frac{1}{6} \cdot \left(\frac{5}{6}\right)^2 = \frac{75}{216}$$

    When $X = 2$ (appears on exactly two dice):

    $$P(X = 2) = 3 \cdot \left(\frac{1}{6}\right)^2 \cdot \frac{5}{6} = \frac{15}{216}$$

    When $X = 3$ (appears on all three dice):

    $$P(X = 3) = \left(\frac{1}{6}\right)^3 = \frac{1}{216}$$

    Computing expected winnings:

    Now recall the payout structure:

    • If $X = 0$: lose $10$ dollars
    • If $X = 1$: win $4(1)^2 = 4$ dollars
    • If $X = 2$: win $4(2)^2 = 16$ dollars
    • If $X = 3$: win $4(3)^2 = 36$ dollars

    By the definition of expectation:

    $$E[\text{winnings}] = \frac{75 \cdot 4 + 15 \cdot 16 + 1 \cdot 36 - 125 \cdot 10}{216} = \frac{300 + 240 + 36 - 1250}{216} = -\frac{337}{108} \approx -3.12$$

    Conclusion:

    The expected winnings are negative, approximately $-\$3.12$ per play. This means that on average, over many plays, a player loses money. Therefore, the game is not fair to the player.
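    The arithmetic can be double-checked exactly with rationals (a throwaway sketch; the variable names are our own):

```python
from fractions import Fraction
from math import comb

# X ~ Binomial(3, 1/6): number of dice showing the bet number.
p = Fraction(1, 6)
pmf = {k: comb(3, k) * p**k * (1 - p)**(3 - k) for k in range(4)}

# Payouts from the problem: lose $10 on zero hits, win $4i^2 on i hits.
payout = {0: -10, 1: 4, 2: 16, 3: 36}

expected = sum(pmf[k] * payout[k] for k in range(4))
# expected == Fraction(-337, 108), about -3.12 dollars per play
```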

  3. (20 points) Suppose you are given that $1 + 1/2^3 + 1/3^3 + 1/4^3 + \cdots = 1.20205\ldots$ (the actual value doesn’t matter, just that it is a constant) and that $1 + 1/2^2 + 1/3^2 + 1/4^2 + \cdots = \pi^2/6$.

    Using this information, design a random variable $X$ that can take any value $n > 1$ (so infinitely many values), such that the expectation of $X$ is finite, but the variance of $X$ is infinite.

    Solution

    Designing the random variable:

    The key is to use the given series to control the behavior of expectation and variance. Let’s define:

    $$P(X = n) = \frac{c}{n^3}, \quad n = 2, 3, 4, \ldots$$

    where $c$ is chosen to make this a valid probability distribution (normalized so the probabilities sum to 1).

    Finding the normalization constant:

    For probabilities to sum to 1:

    $$\sum_{n=2}^{\infty} \frac{c}{n^3} = 1 \implies c = \frac{1}{\sum_{n=2}^{\infty} 1/n^3}$$

    We’re given that $\sum_{n=1}^{\infty} \frac{1}{n^3} = 1.20205\ldots$, so:

    $$\sum_{n=2}^{\infty} \frac{1}{n^3} = 1.20205\ldots - 1 = 0.20205\ldots$$

    Computing the expectation:

    Now let’s check if the expectation is finite:

    $$E[X] = \sum_{n=2}^{\infty} n \cdot \frac{c}{n^3} = c \sum_{n=2}^{\infty} \frac{1}{n^2}$$

    We’re told that $\sum_{n=1}^{\infty} \frac{1}{n^2} = \pi^2/6 \approx 1.645$. Therefore:

    $$\sum_{n=2}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6} - 1 < \infty$$

    Since $c$ is a finite constant and the sum is finite, $E[X]$ is finite.

    Computing the second moment:

    For the variance, we need $E[X^2]$:

    $$E[X^2] = \sum_{n=2}^{\infty} n^2 \cdot \frac{c}{n^3} = c \sum_{n=2}^{\infty} \frac{1}{n}$$

    Here’s the crucial difference: $\sum_{n=2}^{\infty} \frac{1}{n}$ is (the tail of) the harmonic series, which diverges to infinity. This is a famous result!

    Therefore $E[X^2] = \infty$, which means:

    $$\text{Var}(X) = E[X^2] - (E[X])^2 = \infty - (\text{finite})^2 = \infty$$

    We’ve successfully constructed a random variable with finite expectation but infinite variance!
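    The contrast is easy to see numerically: the partial sums for $E[X]$ stabilize while those for $E[X^2]$ keep growing. A small sketch (the function name and cutoff values are our own choices):

```python
def partial_moments(N):
    # Truncate the distribution P(X = n) = c / n^3 at n = N.
    c = 1 / sum(1 / n**3 for n in range(2, N + 1))  # approximate normalizer
    ex = c * sum(1 / n**2 for n in range(2, N + 1))  # partial sum for E[X]
    ex2 = c * sum(1 / n for n in range(2, N + 1))    # partial sum for E[X^2]
    return ex, ex2

# ex converges (to roughly c * (pi^2/6 - 1) ≈ 3.19),
# while ex2 grows without bound, like c * ln(N).
for N in (100, 10_000, 1_000_000):
    ex, ex2 = partial_moments(N)
```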

  4. (30 points) Alice and her younger brother Bob play chess. Alice is the better player, and wins any match with probability 0.75. They play a tournament of 200 matches, with ice-cream as the prize. What is the expected number of games that Alice wins?

    Bob wants to negotiate a deal where his chances of getting ice-cream are at least $1 - e^{-2} \approx 0.864$. He can stipulate a number $X < 200$, and demand that Alice win at least $X$ games in order to win the tournament. Obviously, Bob could stipulate $X = 200$, and he would almost surely get ice-cream, but that would be cheating. What is the lowest value of $X$ at which Bob can still get ice-cream with probability at least $1 - e^{-2}$?

    Solution

    Part 1: Expected Number of Games Alice Wins

    Let $Y$ be the number of games Alice wins. Since $Y \sim \text{Binomial}(200, 0.75)$:

    $$E[Y] = 200 \cdot 0.75 = 150$$

    Part 2: Finding the Lowest Value of X

    Bob gets ice-cream if Alice wins fewer than $X$ games, i.e., if $Y < X$. So Bob needs $P(Y < X) \geq 1 - e^{-2}$, or equivalently $P(Y \geq X) \leq e^{-2}$.

    We want to find the smallest $X$ such that $P(Y \geq X) \leq e^{-2}$.

    Using the Chernoff bound, for $Y \geq (1 + \delta)\mu$ where $\mu = 150$:

    $$P(Y \geq (1 + \delta)\mu) \leq e^{-\mu\delta^2/3}$$

    We want $e^{-\mu\delta^2/3} \leq e^{-2}$, which requires:

    $$\frac{\mu\delta^2}{3} \geq 2 \implies \delta^2 \geq \frac{6}{\mu} = \frac{6}{150} = \frac{1}{25}$$

    Therefore $\delta \geq 1/5$.

    With $\delta = 1/5$:

    $$X = (1 + \delta)\mu = \left(1 + \frac{1}{5}\right) \cdot 150 = \frac{6}{5} \cdot 150 = 180$$

    We verify: $P(Y \geq 180) \leq e^{-150 \cdot (1/5)^2 / 3} = e^{-150 \cdot (1/25) / 3} = e^{-2}$.

    Therefore, the lowest value of $X$ that the Chernoff bound certifies is $X = 180$.
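    As an optional cross-check, the exact binomial tail can be computed directly (a sketch; `binom_tail` is our own helper, not a library function):

```python
from math import comb, exp

def binom_tail(n, p, x):
    # Exact P(Y >= x) for Y ~ Binomial(n, p).
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x, n + 1))

tail = binom_tail(200, 0.75, 180)
bound = exp(-2)  # the Chernoff guarantee, about 0.135
# The exact tail is far below e^{-2}, so X = 180 comfortably works;
# Chernoff is loose, and the true smallest admissible X is in fact lower.
```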

  5. (20 points) What is the main advantage/disadvantage of using Cuckoo Hashing over Hashing with Chaining? What is the main advantage/disadvantage of using Hashing with Chaining over FKS hashing?