Final Exam Problem Sets
Problem 27: Morris Counter - Streaming Algorithms
Section titled “Problem 27: Morris Counter - Streaming Algorithms”(a) Describe the Morris Counter algorithm and analyze its space complexity compared to a naive counting approach. What is the trade-off being made?
(b) What are the mean and variance of the Morris Counter’s output? Why is the variance a problem?
(c) Describe three successive strategies for improving the Morris Counter’s reliability. How does each strategy reduce the failure probability, and what is the total space cost?
(d) For an application requiring count estimation within of the true count with confidence , where and , how many parallel Morris counters are needed in each of the three strategies from part (c)?
Solution
(a) Morris Counter algorithm:
- Initialize counter
- On each new stream element, increment to with probability
- Return estimate
Space complexity: Only is stored, where . This requires bits.
Naive approach: Storing the exact count requires bits.
Trade-off: By randomizing increments, we achieve exponential space savings ( vs. ), but accept a random error in the estimate. The output is approximate, not exact.
(b) Mean and variance:
From the Morris Counter algorithm:
- (unbiased)
Why it’s a problem: The standard deviation is , so the output can deviate wildly from the true count. With high probability, the estimate could be off by or more, making a single Morris Counter unreliable for high-confidence estimates.
(c) Three successive strategies:
Strategy 1: Morris (basic)
- Run a single Morris Counter
- Success probability: undefined (no confidence guarantee)
- Failure probability: unbounded
- Space: bits
Strategy 2: Morris+ (weak estimate via averaging)
- Run independent Morris Counters in parallel
- Return the average of the estimates
- By Chebyshev’s inequality (or Exercise 4.9(b)), this gives an -estimate
- Success probability: (failure probability )
- Space: bits
Strategy 3: Morris++ (strong estimate via boosting)
- Run independent Morris+ instances in parallel
- Return the median of their outputs
- By the median-of-weak-estimates trick (Exercise 4.9(c)), this gives a -estimate
- Success probability: (arbitrary confidence)
- Space: bits
Each strategy improves failure probability at the cost of more parallel instances (and more space).
(d) Numerical calculation for , :
Morris: No reliability guarantee.
Morris+:
Morris++:
Total counters needed: