
Lecture 15 on 03/18/2026 - Epsilon-Delta Approximate Median; Morris+ and ++; Variance of Morris Counter

In an ideal scenario, we would compute the exact median: the element with rank exactly $\frac{N}{2}$. However, this is often infeasible in streaming contexts, where we cannot afford to store or revisit all elements. Instead, we settle for an approximate median: an element whose rank is close enough to the true median rank.

By “close enough,” we mean within an additive error band: we allow the rank to deviate by up to $\varepsilon N$ in either direction. This gives us a practical solution that uses limited space and time while still providing a meaningful approximation.

$$\text{rank}(\text{median}) = \frac{N}{2}$$

Input: A stream $S = \{x_1, x_2, \ldots, x_N\}$ and $\varepsilon, \delta \in (0,1)$

Goal: Return $m$ such that $\left(\frac{1}{2} - \varepsilon\right)N \leq \text{rank}(m) \leq \left(\frac{1}{2} + \varepsilon\right)N$ with probability $\geq 1-\delta$

The algorithm succeeds if the returned median's rank falls within the acceptable window. Consider the sorted stream of $N$ elements. The acceptable range for the rank of the median spans from $(1/2 - \varepsilon)N$ (the left boundary) to $(1/2 + \varepsilon)N$ (the right boundary).

The size of this acceptable range is:

$$\left(\frac{1}{2} + \varepsilon\right)N - \left(\frac{1}{2} - \varepsilon\right)N = 2\varepsilon N$$

The remaining $(1 - 2\varepsilon)N$ elements fall in the two bad regions combined. If the returned element's rank falls outside the acceptable range, either below $(1/2 - \varepsilon)N$ or above $(1/2 + \varepsilon)N$, the algorithm fails.

The algorithm is surprisingly simple and relies on the power of sampling:

  • Uniformly sample $t = O(\varepsilon^{-2} \log \delta^{-1})$ elements independently from the stream
  • Compute the median of these $t$ samples
  • Return this sampled median as our estimate
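The steps above can be sketched in code. A minimal sketch, with names and constants of my own choosing: since $N$ is unknown in advance, each of the $t$ samples is maintained by an independent size-1 reservoir sampler, giving $t$ independent uniform samples (i.e., sampling with replacement, as the analysis assumes).

```python
import math
import random
import statistics

def approx_median(stream, eps, delta, seed=None):
    """Sampling-based approximate median (sketch of the lecture's algorithm).

    Runs t = O(eps^-2 log(1/delta)) independent size-1 reservoir samplers;
    each one holds a uniform element of the stream when it ends, and the
    samplers are mutually independent. Per-item cost is O(t), a tradeoff
    made here for simplicity.
    """
    rng = random.Random(seed)
    t = math.ceil(3 / eps**2 * math.log(2 / delta))  # constant from the analysis
    reservoir = [None] * t
    n = 0
    for x in stream:
        n += 1
        for i in range(t):
            # Each sampler independently replaces its element with
            # probability 1/n, the standard size-1 reservoir update.
            if rng.random() < 1.0 / n:
                reservoir[i] = x
    return statistics.median_low(reservoir)
```

On the stream $1, \ldots, 2000$ with $\varepsilon = 0.1$, the returned value should lie between ranks $800$ and $1200$ with probability well above $1 - \delta$.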

The intuition is that a uniform sample lands in the “bad” regions in proportion to their share of the stream. If we sample enough elements, it becomes unlikely that more than half of our samples fall into one bad region, which is what it would take for the median of our samples to also be bad.

Let $m$ be the output of the algorithm. Then,

$$\left(\frac{1}{2} - \varepsilon\right)N \leq \text{rank}(m) \leq \left(\frac{1}{2} + \varepsilon\right)N$$

with probability at least $1-\delta$.

To understand why the algorithm works, let’s first identify when it would fail.

Partition the sorted stream into three regions:

  • Left bad region ($S_L$): the smallest $\left(\frac{1}{2} - \varepsilon\right)N$ elements
  • Good region (middle): the middle $2\varepsilon N$ elements; this is our acceptable range
  • Right bad region ($S_R$): the largest $\left(\frac{1}{2} - \varepsilon\right)N$ elements

Let $T$ be the set of $t$ samples. If the median of our samples falls in $S_L$, then by the definition of the median, at least half of the $t$ samples must also be in $S_L$. The same argument applies symmetrically for $S_R$.

To formalize this, let $T_L = T \cap S_L$ and $T_R = T \cap S_R$ be the samples that fall in the left and right bad regions, respectively. Then the algorithm fails if and only if $|T_L| > \frac{t}{2}$ or $|T_R| > \frac{t}{2}$.

Now we’ll use concentration inequalities to bound how likely it is that too many samples fall into bad regions.

Step 1: Probability of sampling a bad element

When we uniformly sample an element from the stream, the probability that it lands in $S_L$ is simply:

$$\Pr(\text{sampled element} \in S_L) = \frac{|S_L|}{N} = \frac{(1/2 - \varepsilon)N}{N} = \frac{1}{2} - \varepsilon$$

Since we sample independently, each of the $t$ samples is a Bernoulli trial with success probability $p = \frac{1}{2} - \varepsilon$ (where “success” means landing in $S_L$).

Step 2: Expected number of bad samples

By linearity of expectation:

$$E[|T_L|] = t \cdot \left(\frac{1}{2} - \varepsilon\right)$$

This tells us that we expect about a $\left(\frac{1}{2} - \varepsilon\right)$ fraction of our samples to fall in $S_L$. Since $\frac{1}{2} - \varepsilon < \frac{1}{2}$, we expect fewer than half of the samples to be bad, which is good! But we need to bound how much the actual count can deviate from this expectation.

Step 3: Setting up the concentration problem

We want to bound the probability that $|T_L|$ exceeds a critical threshold. The critical threshold is when more than half our samples are bad: $|T_L| > \frac{t}{2}$.

To apply Chernoff’s bound, we need to express this event as a multiplicative deviation from the expectation. Let’s rewrite:

$$P\left(|T_L| > \frac{t}{2}\right) = P\left(|T_L| > (1 + \delta') E[|T_L|]\right)$$

where $\delta'$ is a deviation parameter (not to be confused with the failure probability $\delta$). Solving for $\delta'$:

$$\frac{t}{2} = (1 + \delta')\left(\frac{1}{2} - \varepsilon\right)t$$

Dividing both sides by $\left(\frac{1}{2} - \varepsilon\right)t$:

$$\delta' = \frac{t/2}{(1/2 - \varepsilon)t} - 1 = \frac{1/2}{1/2 - \varepsilon} - 1 = \frac{\varepsilon}{1/2 - \varepsilon}$$

Step 4: Applying Chernoff’s bound

Chernoff's bound tells us that for a sum of independent Bernoulli random variables with expectation $\mu$, the probability of exceeding $(1+\delta')\mu$ decays exponentially (this form of the bound requires $0 < \delta' \leq 1$, which holds here whenever $\varepsilon \leq 1/4$):

$$P(X > (1+\delta')\mu) \leq e^{-\frac{\delta'^2 \mu}{3}}$$

Plugging in our values ($\mu = E[|T_L|] = (1/2 - \varepsilon)t$ and $\delta' = \frac{\varepsilon}{1/2 - \varepsilon}$):

$$P\left(|T_L| > \frac{t}{2}\right) \leq e^{-\frac{1}{3} \cdot \left(\frac{\varepsilon}{1/2 - \varepsilon}\right)^2 \cdot (1/2 - \varepsilon)t} = e^{-\frac{\varepsilon^2}{3(1/2 - \varepsilon)} \cdot t}$$

Similarly:

$$P\left(|T_R| > \frac{t}{2}\right) \leq e^{-\frac{1}{3} \cdot \frac{\varepsilon^2}{1/2 - \varepsilon} \cdot t}$$

By union bound:

$$P\left(|T_L| > \frac{t}{2} \text{ or } |T_R| > \frac{t}{2}\right) \leq P\left(|T_L| > \frac{t}{2}\right) + P\left(|T_R| > \frac{t}{2}\right) \leq 2e^{-\frac{1}{3} \cdot \frac{\varepsilon^2}{1/2 - \varepsilon} \cdot t}$$

We want to bound $2e^{-\frac{1}{3} \cdot \frac{\varepsilon^2}{1/2 - \varepsilon} \cdot t}$ by $\delta$:

$$2e^{-\frac{1}{3} \cdot \frac{\varepsilon^2}{1/2 - \varepsilon} \cdot t} < \delta \iff -\frac{1}{3} \cdot \frac{\varepsilon^2}{1/2 - \varepsilon} \cdot t < \ln\frac{\delta}{2}$$

Solving for tt:

$$t > 3 \cdot \frac{1/2 - \varepsilon}{\varepsilon^2} \ln\frac{2}{\delta}$$

Conclusion: Since $\frac{1}{2} - \varepsilon < 1$, it suffices to take $t > \frac{3}{\varepsilon^2} \ln\frac{2}{\delta}$, in which case the algorithm fails with probability at most $\delta$.

Hence, if $t = O(\varepsilon^{-2} \log \delta^{-1})$ with a suitable constant (matching $t > \frac{3}{\varepsilon^2} \ln\frac{2}{\delta}$ up to constant factors), then with probability at least $1-\delta$ the algorithm succeeds.
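To get a feel for the numbers (the values below are illustrative, not from the lecture), the bound can be evaluated directly; note that the required sample count depends only on $\varepsilon$ and $\delta$, not on the stream length $N$:

```python
import math

def sample_bound(eps, delta):
    """Samples sufficient for an (eps, delta) approximate median,
    per the bound t > (3/eps^2) * ln(2/delta) derived above."""
    return math.ceil(3 / eps**2 * math.log(2 / delta))

print(sample_bound(0.05, 0.01))  # 6358 samples, regardless of N
```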

Input: A stream $x_1, x_2, \ldots, x_N$

Goal: Count the number of elements using $\log \log N$ bits.

Let $C_i$ denote the current counter value.

Update rule: $C_i \leftarrow C_i + 1$ with probability $\frac{1}{2^{C_i}}$. Output: $\text{Out} = 2^{C} - 1$, where $C$ is the final counter value.

  • $E[\text{Out}] = N$
  • $\text{Var}(\text{Out}) = \frac{N(N-1)}{2}$
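The update rule above fits in a few lines. A minimal sketch (class and method names are my own):

```python
import random

class MorrisCounter:
    """Sketch of the Morris counter described above.

    Stores only the counter C (about log log N bits in principle);
    the estimate Out = 2^C - 1 satisfies E[Out] = N and
    Var(Out) = N(N-1)/2.
    """
    def __init__(self, rng=None):
        self.c = 0
        self.rng = rng or random.Random()

    def update(self):
        # Increment with probability 1 / 2^C.
        if self.rng.random() < 2.0 ** -self.c:
            self.c += 1

    def estimate(self):
        return 2 ** self.c - 1
```

Averaging many independent copies recovers $N$ in expectation, which is exactly the idea behind Morris+ below.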

To compute the variance, use the formula $\text{Var}(x) = E[x^2] - E[x]^2$.

We need $E\left[(2^{C_{i+1}})^2\right]$, which can be found in a similar fashion to $E[2^{C_{i+1}}]$.

Morris Counter provides an elegant solution using only $O(\log \log N)$ bits, with the following attributes:

  • Mean: $E[\text{Out}] = N$ (correct on average)
  • Variance: $\text{Var}(\text{Out}) = O(N^2)$ (highly variable, a significant drawback)

The large variance means that while the expected output is correct, the actual output can deviate wildly from $N$: with constant probability, the estimate can be off by an amount on the order of $N$ itself, rendering it unreliable.

The natural solution is to run multiple independent Morris Counters and average their outputs. Exercise 4.9(b) tells us that to get an $(\varepsilon, 3/4)$-estimate (within $\varepsilon N$ with probability $3/4$), we need:

$$\text{Number of parallel instances} = O\left(\frac{r^2}{\varepsilon^2}\right)$$

where $r = \frac{\sqrt{\text{Var}(\text{Out})}}{E[\text{Out}]} = \frac{\sqrt{N(N-1)/2}}{N} \approx \frac{1}{\sqrt{2}}$ is a constant. So we need $O(1/\varepsilon^2)$ parallel machines.

By averaging the outputs of $O(1/\varepsilon^2)$ Morris Counters, we get an estimate within $\varepsilon N$ with probability at least $3/4$.

But we still have a problem: the failure probability is $1/4$, which may be too high. We can boost confidence further by running $O(\log(1/\delta))$ copies of Morris+ in parallel and taking the median of their outputs.

By Exercise 4.9(c), taking the median of $O(\log(1/\delta))$ independent weak estimates (each with failure probability $1/4$) yields a strong estimate with failure probability at most $\delta$.

Final result: Morris++ uses $O\left(\log(1/\delta) \cdot \frac{1}{\varepsilon^2}\right)$ parallel machines and achieves an $(\varepsilon, \delta)$-estimate of the count.
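Putting the pieces together, here is a sketch of Morris++; the constants choosing the number of copies are illustrative (not the ones from Exercise 4.9), and the function names are my own.

```python
import math
import random
import statistics

def morris_step(c, rng):
    """One Morris update: increment c with probability 1/2^c."""
    return c + 1 if rng.random() < 2.0 ** -c else c

def morris_pp(stream_len, eps, delta, rng=None):
    """Morris++ sketch: median of O(log(1/delta)) Morris+ estimates,
    each Morris+ averaging O(1/eps^2) basic Morris counters."""
    rng = rng or random.Random()
    k = math.ceil(18 * math.log(2 / delta))  # copies of Morris+ (illustrative)
    m = math.ceil(3 / eps**2)                # counters per Morris+ (illustrative)
    counters = [[0] * m for _ in range(k)]
    for _ in range(stream_len):              # simulate the stream of updates
        for group in counters:
            for i in range(m):
                group[i] = morris_step(group[i], rng)
    # Average within each Morris+ group, then take the median across groups.
    averages = [sum(2 ** c - 1 for c in group) / m for group in counters]
    return statistics.median_low(averages)
```

The averaging drives the variance down by a factor of $m$; the median step then makes the failure probability decay exponentially in $k$.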

Let $X_n$ denote the counter's state after $n$ items have passed.

$$E[(2^{X_n})^2] = E[2^{2X_n}] = \sum_{j=0}^{\infty} P(X_{n-1} = j)\, E[2^{2X_n} \mid X_{n-1} = j]$$

Expanding based on the transition probabilities:

$$= \sum_j P(X_{n-1} = j) \left(\frac{1}{2^j} \cdot 2^{2(j+1)} + \left(1 - \frac{1}{2^j}\right) \cdot 2^{2j}\right)$$

Simplifying, using $\frac{1}{2^j} \cdot 2^{2(j+1)} = 2^{j+2}$ and $2^{j+2} - 2^j = 3 \cdot 2^j$:

$$= \sum_j P(X_{n-1} = j) \left(2^{2j} + 3 \cdot 2^j\right)$$

Breaking this into two separate expectations:

$$= E[2^{2X_{n-1}}] + 3\, E[2^{X_{n-1}}]$$

Note: From the Morris Counter analysis, $E[2^{X_n}] = n + 1$, so $E[2^{X_{n-1}}] = n$. Therefore:

$$E[2^{2X_n}] = E[2^{2X_{n-1}}] + 3n$$

Solving by telescoping (with base case $E[2^{2X_0}] = 2^0 = 1$):

$$\begin{aligned} E[2^{2X_n}] &= E[2^{2X_0}] + 3(1 + 2 + \cdots + n) \\ &= 1 + 3 \cdot \frac{n(n+1)}{2} \\ &= \frac{3}{2}n^2 + \frac{3}{2}n + 1 \end{aligned}$$

Therefore, the variance is:

$$\text{Var}[2^{X_n}] = \left(\frac{3}{2}n^2 + \frac{3}{2}n + 1\right) - (n+1)^2 = \frac{n(n-1)}{2} \leq n^2$$
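Both moments can be sanity-checked by simulation. A small Monte Carlo sketch (the trial count and tolerances are arbitrary choices of mine) that estimates $E[2^{X_n}]$ and $\text{Var}[2^{X_n}]$ for $n = 50$, where the derivation predicts $n + 1 = 51$ and $\frac{n(n-1)}{2} = 1225$:

```python
import random

def run_counter(n, rng):
    """Run a basic Morris counter over n updates; return 2^X_n."""
    c = 0
    for _ in range(n):
        if rng.random() < 2.0 ** -c:
            c += 1
    return 2 ** c

# Monte Carlo check of E[2^{X_n}] = n + 1 and Var[2^{X_n}] = n(n-1)/2.
rng = random.Random(1)
n, trials = 50, 100_000
vals = [run_counter(n, rng) for _ in range(trials)]
mean = sum(vals) / trials
var = sum((v - mean) ** 2 for v in vals) / trials
print(mean, var)  # expected to be close to 51 and 1225
```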