
Final Exam Problem Sets

Problem 27: Morris Counter - Streaming Algorithms


(a) Describe the Morris Counter algorithm and analyze its space complexity compared to a naive counting approach. What is the trade-off being made?

(b) What are the mean and variance of the Morris Counter’s output? Why is the variance a problem?

(c) Describe three successive strategies for improving the Morris Counter’s reliability. How does each strategy reduce the failure probability, and what is the total space cost?

(d) For an application requiring count estimation within $\varepsilon N$ of the true count with confidence $\geq 1 - \delta$, where $\varepsilon = 0.1$ and $\delta = 0.01$, how many parallel Morris counters are needed in each of the three strategies from part (c)?

Solution

(a) Morris Counter algorithm:

  1. Initialize counter $C = 0$
  2. On each new stream element, increment $C$ to $C+1$ with probability $p = 1/2^C$
  3. Return estimate $\hat{n} = 2^C - 1$
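As a sketch, the three steps above translate directly into code (the function and parameter names here are illustrative, not from the problem statement):

```python
import random

def morris_counter(stream_length, rng=random.random):
    """One pass of the Morris Counter; returns the estimate 2^C - 1.

    `stream_length` and `rng` are illustrative names (assumptions).
    """
    c = 0  # the only state stored; C stays near log2(n), i.e. O(log log n) bits
    for _ in range(stream_length):
        if rng() < 1.0 / (1 << c):  # increment with probability 1/2^C
            c += 1
    return (1 << c) - 1  # estimate n_hat = 2^C - 1
```

Note that the counter never stores the stream length itself; only the exponent $C$ is kept, and the estimate is reconstructed at query time.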

Space complexity: Only $C$ is stored, and $C \approx \log_2 n$ with high probability. Storing $C$ therefore requires $O(\log \log n)$ bits.

Naive approach: Storing the exact count requires $\Theta(\log n)$ bits.

Trade-off: By randomizing increments, we achieve exponential space savings ($O(\log \log n)$ vs. $O(\log n)$), but accept a random error in the estimate. The output is approximate, not exact.


(b) Mean and variance:

From the Morris Counter algorithm:

  • $E[\hat{n}] = E[2^C - 1] = N$ (unbiased)
  • $\text{Var}(\hat{n}) = \text{Var}(2^C - 1) \approx \frac{N(N-1)}{2} = O(N^2)$

Why it’s a problem: The standard deviation is $\Theta(N)$, the same order as the count itself, so with constant probability the estimate is off by a constant multiple of $N$. A single Morris Counter is therefore unreliable for high-confidence estimates.
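Both moments can be checked empirically. The sketch below (stream length, trial count, and seed are my own choices) estimates the mean and variance over repeated independent runs:

```python
import random

def morris_estimate(n, rng):
    """Run one Morris Counter over a stream of n elements."""
    c = 0
    for _ in range(n):
        if rng.random() < 1.0 / (1 << c):
            c += 1
    return (1 << c) - 1

rng = random.Random(42)
n, trials = 1000, 5000
samples = [morris_estimate(n, rng) for _ in range(trials)]
mean = sum(samples) / trials
var = sum((s - mean) ** 2 for s in samples) / trials
# Theory: E[n_hat] = N = 1000 and Var(n_hat) = N(N-1)/2 ~ 5 * 10^5,
# so the standard deviation (~707) is on the order of N itself.
```

The empirical mean lands near $N$, while the empirical standard deviation is a large fraction of $N$, illustrating why averaging is needed.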


(c) Three successive strategies:

Strategy 1: Morris (basic)

  • Run a single Morris Counter
  • Success probability: undefined (no confidence guarantee)
  • Failure probability: unbounded
  • Space: $O(\log \log N)$ bits

Strategy 2: Morris+ (weak estimate via averaging)

  • Run $T = \lceil 1/\varepsilon^2 \rceil$ independent Morris Counters in parallel
  • Return the average of the TT estimates
  • By Chebyshev’s inequality (or Exercise 4.9(b)), this gives an $(\varepsilon, 1/4)$-estimate
  • Success probability: $\geq 3/4$ (failure probability $\le 1/4$)
  • Space: $O(1/\varepsilon^2 \cdot \log \log N)$ bits

Strategy 3: Morris++ (strong estimate via boosting)

  • Run $M = \lceil \log(1/\delta) \rceil$ independent Morris+ instances in parallel
  • Return the median of their outputs
  • By the median-of-weak-estimates trick (Exercise 4.9(c)), this gives an $(\varepsilon, \delta)$-estimate
  • Success probability: $\geq 1 - \delta$ (arbitrary confidence)
  • Space: $O(\log(1/\delta) \cdot 1/\varepsilon^2 \cdot \log \log N)$ bits

Each strategy improves failure probability at the cost of more parallel instances (and more space).
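A minimal sketch of how the three strategies nest (function names and parameters are illustrative; the copy counts follow the formulas above, with $\log$ taken base 2):

```python
import math
import random
import statistics

def morris(n, rng):
    """Basic Morris Counter over a stream of n elements."""
    c = 0
    for _ in range(n):
        if rng.random() < 1.0 / (1 << c):
            c += 1
    return (1 << c) - 1

def morris_plus(n, rng, eps):
    """Morris+: average T = ceil(1/eps^2) independent counters (Chebyshev step)."""
    t = math.ceil(1 / eps ** 2)
    return sum(morris(n, rng) for _ in range(t)) / t

def morris_plus_plus(n, rng, eps, delta):
    """Morris++: median of M = ceil(log2(1/delta)) Morris+ runs (boosting step)."""
    m = math.ceil(math.log2(1 / delta))
    return statistics.median(morris_plus(n, rng, eps) for _ in range(m))
```

For example, with $\varepsilon = 0.3$ and $\delta = 0.05$, Morris++ runs $\lceil \log_2 20 \rceil = 5$ Morris+ instances of $\lceil 1/0.09 \rceil = 12$ counters each, 60 basic counters in total.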


(d) Numerical calculation for $\varepsilon = 0.1$, $\delta = 0.01$:

Morris: No reliability guarantee.

Morris+:

$$T = \lceil 1/\varepsilon^2 \rceil = \lceil 1/(0.1)^2 \rceil = \lceil 100 \rceil = 100 \text{ counters}$$

Morris++:

$$M = \lceil \log(1/\delta) \rceil = \lceil \log(1/0.01) \rceil = \lceil \log(100) \rceil = \lceil 6.64 \rceil = 7 \text{ instances of Morris+}$$

Total counters needed:

$$M \times T = 7 \times 100 = 700 \text{ Morris Counters}$$
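The arithmetic can be verified in a couple of lines (taking $\log$ base 2, as the value 6.64 indicates):

```python
import math

eps, delta = 0.1, 0.01
T = math.ceil(1 / eps ** 2)          # Morris+ copies: 1/0.01 = 100 counters
M = math.ceil(math.log2(1 / delta))  # Morris++ medians: ceil(log2 100) = ceil(6.64) = 7
total = M * T                        # 7 * 100 = 700 basic Morris Counters
```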