
Lecture 13 on 03/11/2026 - Streaming: Approximate Counting; Epsilon-Delta Guarantee; Morris Counter

Counting Problem: Given a stream $S = \{x_1, \ldots, x_n\}$, how many keys in total have appeared so far?

For this problem, we simply count $n$.

Naive Approach: Increment a counter on each arrival. At the end, the counter stores $C = n$, which requires $\log n$ bits of space.

Question: Can we do better? For example, can we use $\log \log n$ space?

Application: Tracking hits to Wikipedia pages. We cannot track counts exactly with fewer than $\log n$ bits. However, for most applications, we don’t need the exact value of $n$. We’ll define $\varepsilon$ to be the relative error:

$$\frac{|\hat{X} - X|}{X} \le \varepsilon$$

In general, if we want to estimate some quantity $Q(S)$ from a stream $S$, we define a guarantee for the random variable estimate $\hat{Q}(S)$.

Definition: A random variable $\hat{Q}(S)$ has an $(\varepsilon, \delta)$-guarantee if:

$$\Pr\left(\frac{|\hat{Q}(S) - Q(S)|}{Q(S)} \le \varepsilon\right) \ge 1 - \delta$$

Where:

  • $\varepsilon$: accuracy, or relative error
  • $\delta$: failure probability

This can equivalently be written as:

$$\Pr(|\hat{Q} - Q| \le \varepsilon Q) \ge 1 - \delta$$

All algorithms in the streaming section will aim to achieve an $(\varepsilon, \delta)$-guarantee.
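To make the definition concrete, the guarantee can be checked empirically over repeated runs of a randomized estimator. A minimal sketch (the helper name is our own, not from the lecture):

```python
def empirical_failure_rate(estimates, Q, eps):
    """Fraction of runs whose relative error exceeds eps.

    An estimator meets an (eps, delta)-guarantee when this fraction
    is at most delta (in the limit of many independent runs).
    """
    failures = sum(1 for q_hat in estimates if abs(q_hat - Q) > eps * Q)
    return failures / len(estimates)
```

Running an estimator many times and comparing this rate against $\delta$ is also how we will sanity-check the algorithms below.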

Approximate Counting with (ε,δ)-Guarantee


Goal: We want an estimate $\hat{Z}$ of $n$ such that $\hat{Z}$ has an $(\varepsilon, \delta)$-guarantee:

$$\Pr\left(\frac{|\hat{Z} - n|}{n} \le \varepsilon\right) \ge 1 - \delta$$

Equivalently:

$$\Pr(|\hat{Z} - n| \le \varepsilon n) \ge 1 - \delta$$


Algorithm:

  1. Start with a counter $C_0 = 0$
  2. When a key $x_{i+1}$ appears, increment $C$ by 1 with probability $\frac{1}{2^{C_i}}$; otherwise keep $C$ unchanged
  3. At the end, when asked for the count, return $2^{C_n} - 1$
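The three steps above can be sketched in Python (a minimal illustration; names are our own):

```python
import random

def morris_count(stream_length):
    """Morris counter: process stream_length arrivals, return 2^C - 1.

    The counter C only reaches about log2(n), so storing C takes
    O(log log n) bits instead of the log n bits of an exact counter.
    """
    c = 0
    for _ in range(stream_length):
        # Step 2: increment with probability 1 / 2^C
        if random.random() < 1.0 / (1 << c):
            c += 1
    # Step 3: return the estimate 2^C - 1
    return (1 << c) - 1
```

A single counter is a high-variance estimate; averaging many independent copies (the boosting step later in these notes) tightens it.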

Theorem:

$$E\left(2^{C_n} - 1\right) = n \qquad \text{Var}\left(2^{C_n}\right) \approx \frac{n^2}{2}$$
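The expectation claim follows from a short induction (a sketch of the standard argument, not taken verbatim from the lecture). Let $Z_i = 2^{C_i}$. Conditioning on the counter value after $i$ keys:

$$E(Z_{i+1} \mid C_i = c) = \left(1 - \frac{1}{2^c}\right)2^c + \frac{1}{2^c}\cdot 2^{c+1} = 2^c + 1$$

so $E(Z_{i+1}) = E(Z_i) + 1$. Since $Z_0 = 2^0 = 1$, induction gives $E(Z_n) = n + 1$, i.e. $E(2^{C_n} - 1) = n$.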

Steps to Obtain an (ε,δ)-Guarantee Algorithm


Suppose we want to estimate a quantity $Q$.

Step 1: Find an unbiased estimator

Find an estimate $\hat{Q}$ such that $E(\hat{Q}) = Q$.

Step 2: Compute variance

Compute $\text{Var}(\hat{Q})$.

Step 3: Boost (run multiple copies)

Run $t$ independent copies of $\hat{Q}$: $\hat{Q}_1, \hat{Q}_2, \ldots, \hat{Q}_t$

Return the average:

$$\hat{P} = \frac{\hat{Q}_1 + \hat{Q}_2 + \cdots + \hat{Q}_t}{t}$$

Guarantees from boosting:

  1. Unbiasedness: $E(\hat{P}) = Q$

  2. Concentration (by Chebyshev’s inequality):

    $$\Pr(|\hat{P} - Q| \ge c) \le \frac{\text{Var}(\hat{P})}{c^2} = \frac{\text{Var}(\hat{Q})}{t c^2}$$

    This shows that by running $t$ copies and averaging, we reduce the variance by a factor of $t$, giving a concentration bound that improves with more copies.
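The boosting step can be sketched generically (the `boost` helper and the toy Gaussian estimator are illustrative assumptions, not from the lecture):

```python
import random

def boost(estimator, t):
    """Average t independent copies of an unbiased estimator.

    E(P_hat) stays Q, while Var(P_hat) = Var(Q_hat) / t, so by
    Chebyshev the failure probability at a given accuracy shrinks by t.
    """
    return sum(estimator() for _ in range(t)) / t

def noisy():
    """Hypothetical unbiased estimator of Q = 100 with std 20."""
    return random.gauss(100, 20)
```

With $t = 100$ copies, the variance of `boost(noisy, 100)` drops from $400$ to about $4$.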