Lecture 13 on 03/11/2026 - Streaming: Approximate Counting; Epsilon-Delta Guarantee; Morris Counter
Approximate Counting
Problem Setup
Section titled “Problem Setup”Counting Problem: Given a stream , how many keys in total have appeared so far?
For this problem, we simply count $n$, the length of the stream.
Naive Approach: Increment a counter each time a key appears. At the end, the counter stores $n$, which requires $O(\log n)$ bits of space.
Question: Can we do better? For example, can we use $O(\log \log n)$ space?
Application: Tracking hits to Wikipedia pages. We cannot track counts exactly with fewer than $\log n$ bits. However, for most applications, we don't need the exact value of $n$. For an estimate $\hat{n}$, we'll define the relative error:

$$\frac{|\hat{n} - n|}{n}$$
The (ε,δ)-Guarantee
In general, if we want to estimate some quantity $X$ from a stream $\sigma$, we define a guarantee for the random-variable estimate $\hat{X}$.
Definition: A random variable $\hat{X}$ has an $(\varepsilon, \delta)$-guarantee if:

$$\Pr\left[\,|\hat{X} - X| > \varepsilon X\,\right] \le \delta$$
Where:
- $\varepsilon$: accuracy, or relative error
- $\delta$: failure probability
This can equivalently be written as:

$$\Pr\left[\,(1 - \varepsilon) X \le \hat{X} \le (1 + \varepsilon) X\,\right] \ge 1 - \delta$$
All algorithms in the streaming section will aim to achieve an $(\varepsilon, \delta)$-guarantee.
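To make the definition concrete, here is a small Python sketch that estimates the failure probability $\Pr[\,|\hat{X} - X| > \varepsilon X\,]$ by simulation. The function name and the toy Gaussian estimator are illustrative, not from the lecture:

```python
import random

def empirical_failure_rate(estimator, true_value, eps, trials=2000):
    """Estimate Pr[|X_hat - X| > eps * X] by repeated simulation.

    `estimator` is any zero-argument function returning a random
    estimate of `true_value` (a hypothetical helper, for illustration).
    """
    failures = sum(
        abs(estimator() - true_value) > eps * true_value
        for _ in range(trials)
    )
    return failures / trials

# Toy example: an unbiased but noisy estimator of X = 100.
random.seed(0)
rate = empirical_failure_rate(lambda: random.gauss(100, 10), 100, eps=0.2)
# An estimator satisfies an (eps, delta)-guarantee when this rate
# (in the limit of many trials) is at most delta.
```

An estimator meets the $(\varepsilon, \delta)$-guarantee exactly when this empirical rate converges to something at most $\delta$.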
Approximate Counting with (ε,δ)-Guarantee
Goal: We want an estimate $\hat{n}$ of $n$ such that $\hat{n}$ has an $(\varepsilon, \delta)$-guarantee:

$$\Pr\left[\,|\hat{n} - n| > \varepsilon n\,\right] \le \delta$$
Equivalently:

$$\Pr\left[\,(1 - \varepsilon) n \le \hat{n} \le (1 + \varepsilon) n\,\right] \ge 1 - \delta$$
Morris Counter
Algorithm
Goal: We want to maintain an estimate of $n$ with an $(\varepsilon, \delta)$-guarantee.
Algorithm:
- Start with a counter $X = 0$
- When a key appears, increment $X$ by 1 with probability $2^{-X}$; otherwise keep $X$ unchanged
- At the end, when asked for the count, return $\hat{n} = 2^X - 1$
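The steps above can be sketched in Python (a minimal illustration; the class and variable names are my own):

```python
import random

class MorrisCounter:
    """Sketch of the Morris counter: stores X, which is roughly
    log2(n), so only about log log n bits are needed."""

    def __init__(self):
        self.x = 0

    def increment(self):
        # Increment X with probability 2^{-X}; otherwise leave it unchanged.
        if random.random() < 2.0 ** (-self.x):
            self.x += 1

    def estimate(self):
        # Return n_hat = 2^X - 1.
        return 2 ** self.x - 1

random.seed(42)
counter = MorrisCounter()
for _ in range(100_000):   # stream of 100,000 keys
    counter.increment()
# counter.x ends up around log2(100_000) ≈ 17, while counter.estimate()
# is a (noisy) unbiased estimate of 100_000.
```

Note that the counter itself only ever holds the small value $X$, never $n$.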
Theorem: $\mathbb{E}\left[2^X\right] = n + 1$, so $\hat{n} = 2^X - 1$ is an unbiased estimate: $\mathbb{E}[\hat{n}] = n$.
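The expectation claim can be checked by induction on $n$; a standard derivation sketch (the notation $X_n$ for the counter after $n$ keys is mine, not from the lecture):

```latex
% Claim: E[2^{X_n}] = n + 1.  Base case: X_0 = 0, so E[2^{X_0}] = 1.
% Inductive step: condition on the value of X_n.
\begin{aligned}
\mathbb{E}\left[2^{X_{n+1}}\right]
  &= \sum_{j \ge 0} \Pr[X_n = j]
     \left( 2^{-j} \cdot 2^{\,j+1} + \left(1 - 2^{-j}\right) 2^{\,j} \right) \\
  &= \sum_{j \ge 0} \Pr[X_n = j] \left( 2^{\,j} + 1 \right)
   = \mathbb{E}\left[2^{X_n}\right] + 1 = (n + 1) + 1 .
\end{aligned}
```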
Steps to Obtain an (ε,δ)-Guarantee Algorithm
Suppose we want to estimate a quantity $X$.
Step 1: Find an unbiased estimator
Find an estimate $\hat{X}$ such that $\mathbb{E}[\hat{X}] = X$.
Step 2: Compute variance
Compute $\mathrm{Var}(\hat{X})$.
Step 3: Boost (run multiple copies)
Run $k$ independent copies of $\hat{X}$: $\hat{X}_1, \hat{X}_2, \ldots, \hat{X}_k$.
Return the average:

$$\bar{X} = \frac{1}{k} \sum_{i=1}^{k} \hat{X}_i$$
Guarantees from boosting:
- Unbiasedness: $\mathbb{E}[\bar{X}] = \frac{1}{k} \sum_{i=1}^{k} \mathbb{E}[\hat{X}_i] = X$
- Concentration (by Chebyshev's inequality):

$$\Pr\left[\,|\bar{X} - X| \ge \varepsilon X\,\right] \le \frac{\mathrm{Var}(\bar{X})}{\varepsilon^2 X^2} = \frac{\mathrm{Var}(\hat{X})}{k \, \varepsilon^2 X^2}$$
This shows that by running $k$ copies and averaging, we reduce the variance by a factor of $k$, giving a concentration bound that improves with more copies: choosing $k \ge \frac{\mathrm{Var}(\hat{X})}{\delta \, \varepsilon^2 X^2}$ drives the failure probability down to at most $\delta$.
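A quick simulation of this variance reduction, using a toy Gaussian estimator in place of a real streaming sketch (all names and parameters here are illustrative, not part of the lecture):

```python
import random

def boosted_estimate(single_estimator, k):
    """Average k independent copies of an estimator."""
    return sum(single_estimator() for _ in range(k)) / k

def variance(xs):
    """Population variance of a list of samples."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def noisy():
    # Toy unbiased estimator of X = 100 with standard deviation 30.
    return random.gauss(100, 30)

random.seed(1)
single_runs = [noisy() for _ in range(2000)]
boosted_runs = [boosted_estimate(noisy, k=25) for _ in range(2000)]
# variance(boosted_runs) should be roughly variance(single_runs) / 25,
# while both empirical means stay near the true value 100.
```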