
Lecture 13 on 03/11/2026 - Streaming: Approximate Counting; Epsilon-Delta Guarantee; Morris Counter

Counting Problem: Given a stream $S = \{x_1, \ldots, x_n\}$, how many keys in total have appeared so far?

For this problem, we simply count $n$.

Naive Approach: Increment a counter on each arrival. At the end, the counter stores $C = n$, which requires $\log n$ bits of space.

Question: Can we do better? For example, can we use $\log \log n$ space?

Application: Tracking hits to Wikipedia pages. We cannot track counts exactly with fewer than $\log n$ bits. However, for most applications, we don’t need the exact value of $n$. We’ll define $\varepsilon$ to be the relative error:

$$\frac{|\hat{X} - X|}{X} \le \varepsilon$$

In general, if we want to estimate some quantity $Q(S)$ from a stream $S$, we define a guarantee for the random variable estimate $\hat{Q}(S)$.

Definition: A random variable $\hat{Q}(S)$ has an $(\varepsilon, \delta)$-guarantee if:

$$\Pr\left(\frac{|\hat{Q}(S) - Q(S)|}{Q(S)} \le \varepsilon\right) \ge 1 - \delta$$

Where:

  • $\varepsilon$: accuracy, or relative error
  • $\delta$: failure probability

This can equivalently be written as:

$$\Pr(|\hat{Q} - Q| \le \varepsilon Q) \ge 1 - \delta$$

All algorithms in the streaming section will aim to achieve an $(\varepsilon, \delta)$-guarantee.
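To make the definition concrete, the guarantee can be checked empirically over repeated runs of a randomized estimator. A minimal sketch (the helper name is our own, not from the lecture):

```python
def empirical_failure_rate(estimates, Q, eps):
    """Fraction of runs whose relative error exceeds eps.

    An estimator meets an (eps, delta)-guarantee when this fraction
    is at most delta (in the limit of many independent runs).
    """
    failures = sum(1 for q_hat in estimates if abs(q_hat - Q) > eps * Q)
    return failures / len(estimates)
```

Running an estimator many times and comparing this rate against $\delta$ is also how we will sanity-check the algorithms below.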

Approximate Counting with (ε,δ)-Guarantee


Goal: We want an estimate $\hat{Z}$ of $n$ such that $\hat{Z}$ has an $(\varepsilon, \delta)$-guarantee:

$$\Pr\left(\frac{|\hat{Z} - n|}{n} \le \varepsilon\right) \ge 1 - \delta$$

Equivalently:

$$\Pr(|\hat{Z} - n| \le \varepsilon n) \ge 1 - \delta$$


Algorithm:

  1. Start with a counter $C_0 = 0$
  2. When a key $x_{i+1}$ appears, increment $C$ by 1 with probability $\frac{1}{2^{C_i}}$; otherwise keep $C$ unchanged
  3. At the end, when asked for the count, return $2^{C_n} - 1$
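The three steps above can be sketched in Python (a minimal illustration; names are our own):

```python
import random

def morris_count(stream_length):
    """Morris counter: process stream_length arrivals, return 2^C - 1.

    The counter C only reaches about log2(n), so storing C takes
    O(log log n) bits instead of the log n bits of an exact counter.
    """
    c = 0
    for _ in range(stream_length):
        # Step 2: increment with probability 1 / 2^C
        if random.random() < 1.0 / (1 << c):
            c += 1
    # Step 3: return the estimate 2^C - 1
    return (1 << c) - 1
```

A single counter is a high-variance estimate; averaging many independent copies (the boosting step later in these notes) tightens it.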

Theorem:

$$E\left(2^{C_n} - 1\right) = n \qquad \text{Var}\left(2^{C_n}\right) \approx \frac{n^2}{2}$$
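The expectation claim follows from a short induction (a sketch of the standard argument, not taken verbatim from the lecture). Let $Z_i = 2^{C_i}$. Conditioning on the counter value after $i$ keys:

$$E(Z_{i+1} \mid C_i = c) = \left(1 - \frac{1}{2^c}\right)2^c + \frac{1}{2^c}\cdot 2^{c+1} = 2^c + 1$$

so $E(Z_{i+1}) = E(Z_i) + 1$. Since $Z_0 = 2^0 = 1$, induction gives $E(Z_n) = n + 1$, i.e. $E(2^{C_n} - 1) = n$.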

Steps to Obtain an (ε,δ)-Guarantee Algorithm


Suppose we want to estimate a quantity $Q$.

Step 1: Find an unbiased estimator

Find an estimate $\hat{Q}$ such that $E(\hat{Q}) = Q$.

Step 2: Compute variance

Compute $\text{Var}(\hat{Q})$.

Step 3: Boost (run multiple copies)

Run $t$ independent copies of $\hat{Q}$: $\hat{Q}_1, \hat{Q}_2, \ldots, \hat{Q}_t$

Return the average:

$$\hat{P} = \frac{\hat{Q}_1 + \hat{Q}_2 + \cdots + \hat{Q}_t}{t}$$

Guarantees from boosting:

  1. Unbiasedness: $E(\hat{P}) = Q$

  2. Concentration (by Chebyshev’s inequality):

    $$\Pr(|\hat{P} - Q| \ge c) \le \frac{\text{Var}(\hat{P})}{c^2} = \frac{\text{Var}(\hat{Q})}{t c^2}$$

    This shows that by running $t$ copies and averaging, we reduce the variance by a factor of $t$, giving a concentration bound that improves with more copies.
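The boosting step can be sketched generically (the `boost` helper and the toy Gaussian estimator are illustrative assumptions, not from the lecture):

```python
import random

def boost(estimator, t):
    """Average t independent copies of an unbiased estimator.

    E(P_hat) stays Q, while Var(P_hat) = Var(Q_hat) / t, so by
    Chebyshev the failure probability at a given accuracy shrinks by t.
    """
    return sum(estimator() for _ in range(t)) / t

def noisy():
    """Hypothetical unbiased estimator of Q = 100 with std 20."""
    return random.gauss(100, 20)
```

With $t = 100$ copies, the variance of `boost(noisy, 100)` drops from $400$ to about $4$.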