
Final Exam Problem Sets

Problem 27: Morris Counter - Streaming Algorithms


(a) Describe the Morris Counter algorithm and analyze its space complexity compared to a naive counting approach. What is the trade-off being made?

(b) What are the mean and variance of the Morris Counter’s output? Why is the variance a problem?

(c) Describe three successive strategies for improving the Morris Counter’s reliability. How does each strategy reduce the failure probability, and what is the total space cost?

(d) For an application requiring count estimation within $\varepsilon N$ of the true count with confidence $\geq 1 - \delta$, where $\varepsilon = 0.1$ and $\delta = 0.01$, how many parallel Morris counters are needed in each of the three strategies from part (c)?

Solution

(a) Morris Counter algorithm:

  1. Initialize counter $C = 0$
  2. On each new stream element, increment $C$ to $C+1$ with probability $p = 1/2^C$
  3. Return estimate $\hat{n} = 2^C - 1$
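As a sketch, the three steps above translate directly into code (the function and parameter names here are illustrative, not from the problem statement):

```python
import random

def morris_counter(stream_length, rng=random.random):
    """One pass of the Morris Counter; returns the estimate 2^C - 1.

    `stream_length` and `rng` are illustrative names (assumptions).
    """
    c = 0  # the only state stored; C stays near log2(n), i.e. O(log log n) bits
    for _ in range(stream_length):
        if rng() < 1.0 / (1 << c):  # increment with probability 1/2^C
            c += 1
    return (1 << c) - 1  # estimate n_hat = 2^C - 1
```

Note that the counter never stores the stream length itself; only the exponent $C$ is kept, and the estimate is reconstructed at query time.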

Space complexity: Only $C$ is stored, and $C \approx \log_2 n$ with high probability. Storing $C$ therefore requires $O(\log \log n)$ bits.

Naive approach: Storing the exact count requires $\Theta(\log n)$ bits.

Trade-off: By randomizing increments, we achieve exponential space savings ($O(\log \log n)$ vs. $O(\log n)$), but accept a random error in the estimate. The output is approximate, not exact.


(b) Mean and variance:

From the Morris Counter algorithm:

  • $E[\hat{n}] = E[2^C - 1] = N$ (unbiased)
  • $\text{Var}(\hat{n}) = \text{Var}(2^C - 1) \approx \frac{N(N-1)}{2} = O(N^2)$

Why it’s a problem: The standard deviation is $\Theta(N)$, the same order as the count itself, so with constant probability the estimate is off by a constant multiple of $N$. A single Morris Counter is therefore unreliable for high-confidence estimates.
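Both moments can be checked empirically. The sketch below (stream length, trial count, and seed are my own choices) estimates the mean and variance over repeated independent runs:

```python
import random

def morris_estimate(n, rng):
    """Run one Morris Counter over a stream of n elements."""
    c = 0
    for _ in range(n):
        if rng.random() < 1.0 / (1 << c):
            c += 1
    return (1 << c) - 1

rng = random.Random(42)
n, trials = 1000, 5000
samples = [morris_estimate(n, rng) for _ in range(trials)]
mean = sum(samples) / trials
var = sum((s - mean) ** 2 for s in samples) / trials
# Theory: E[n_hat] = N = 1000 and Var(n_hat) = N(N-1)/2 ~ 5 * 10^5,
# so the standard deviation (~707) is on the order of N itself.
```

The empirical mean lands near $N$, while the empirical standard deviation is a large fraction of $N$, illustrating why averaging is needed.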


(c) Three successive strategies:

Strategy 1: Morris (basic)

  • Run a single Morris Counter
  • Success probability: undefined (no confidence guarantee)
  • Failure probability: unbounded
  • Space: $O(\log \log N)$ bits

Strategy 2: Morris+ (weak estimate via averaging)

  • Run $T = \lceil 1/\varepsilon^2 \rceil$ independent Morris Counters in parallel
  • Return the average of the TT estimates
  • By Chebyshev’s inequality (or Exercise 4.9(b)), this gives an $(\varepsilon, 1/4)$-estimate
  • Success probability: $\geq 3/4$ (failure probability $\le 1/4$)
  • Space: $O(1/\varepsilon^2 \cdot \log \log N)$ bits

Strategy 3: Morris++ (strong estimate via boosting)

  • Run $M = \lceil \log(1/\delta) \rceil$ independent Morris+ instances in parallel
  • Return the median of their outputs
  • By the median-of-weak-estimates trick (Exercise 4.9(c)), this gives an $(\varepsilon, \delta)$-estimate
  • Success probability: $\geq 1 - \delta$ (arbitrary confidence)
  • Space: $O(\log(1/\delta) \cdot 1/\varepsilon^2 \cdot \log \log N)$ bits

Each strategy improves failure probability at the cost of more parallel instances (and more space).
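A minimal sketch of how the three strategies nest (function names and parameters are illustrative; the copy counts follow the formulas above, with $\log$ taken base 2):

```python
import math
import random
import statistics

def morris(n, rng):
    """Basic Morris Counter over a stream of n elements."""
    c = 0
    for _ in range(n):
        if rng.random() < 1.0 / (1 << c):
            c += 1
    return (1 << c) - 1

def morris_plus(n, rng, eps):
    """Morris+: average T = ceil(1/eps^2) independent counters (Chebyshev step)."""
    t = math.ceil(1 / eps ** 2)
    return sum(morris(n, rng) for _ in range(t)) / t

def morris_plus_plus(n, rng, eps, delta):
    """Morris++: median of M = ceil(log2(1/delta)) Morris+ runs (boosting step)."""
    m = math.ceil(math.log2(1 / delta))
    return statistics.median(morris_plus(n, rng, eps) for _ in range(m))
```

For example, with $\varepsilon = 0.3$ and $\delta = 0.05$, Morris++ runs $\lceil \log_2 20 \rceil = 5$ Morris+ instances of $\lceil 1/0.09 \rceil = 12$ counters each, 60 basic counters in total.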


(d) Numerical calculation for $\varepsilon = 0.1$, $\delta = 0.01$:

Morris: No reliability guarantee.

Morris+:

$$T = \lceil 1/\varepsilon^2 \rceil = \lceil 1/(0.1)^2 \rceil = \lceil 100 \rceil = 100 \text{ counters}$$

Morris++:

$$M = \lceil \log(1/\delta) \rceil = \lceil \log(1/0.01) \rceil = \lceil \log(100) \rceil = \lceil 6.64 \rceil = 7 \text{ instances of Morris+}$$

Total counters needed:

$$M \times T = 7 \times 100 = 700 \text{ Morris Counters}$$
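The arithmetic can be verified in a couple of lines (taking $\log$ base 2, as the value 6.64 indicates):

```python
import math

eps, delta = 0.1, 0.01
T = math.ceil(1 / eps ** 2)          # Morris+ copies: 1/0.01 = 100 counters
M = math.ceil(math.log2(1 / delta))  # Morris++ medians: ceil(log2 100) = ceil(6.64) = 7
total = M * T                        # 7 * 100 = 700 basic Morris Counters
```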