Lecture 19 (04/15/2026) - AMS Sampling: Guarantee Boosting; Counting Distinct Elements; Uniform RV
Scribes: Vinesh Seepersaud, Anastasiia Tcyrenzhapova
Summary
- Frequency moment estimation in streams
- AMS sampling for estimating $F_2$
- Boosting expectation to an $(\varepsilon, \delta)$ guarantee
- Using a non-Bernoulli Chernoff bound
- Proof of a key lemma used to bound the number of copies
Frequency Moment Estimation
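Let the stream be

$$\sigma = \langle a_1, a_2, \dots, a_m \rangle, \qquad a_j \in [n] = \{1, \dots, n\}.$$

For each item $i \in [n]$, let $f_i$ be its frequency, i.e. the number of positions $j$ with $a_j = i$. The goal is to estimate

$$F_2 = \sum_{i=1}^{n} f_i^2.$$

More generally, one may estimate higher moments such as $F_k = \sum_{i=1}^{n} f_i^k$.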
AMS Sampling
The AMS (Alon-Matias-Szegedy) estimator works as follows:
- Sample a position $J$ uniformly at random from $\{1, \dots, m\}$.
- Let the sampled item be $a_J$.
- Let $r$ be the number of occurrences of $a_J$ at or after position $J$.
- Output $X = m(2r - 1)$.
For the $k$-th moment, the estimator becomes

$$X = m\left(r^k - (r-1)^k\right).$$

A key fact is that $\mathbb{E}[X] = F_2$ for the output $X = m(2r - 1)$, so one run of AMS gives an unbiased estimator for the second frequency moment.
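To see why, suppose the sampled position $J$ lands on item $i$, and that this is the $\ell$-th occurrence of $i$ in the stream ($\ell = 1$ being the first). Then there are $f_i - \ell + 1$ occurrences of $i$ from position $J$ onward, so $r = f_i - \ell + 1$. Since each position is equally likely to be sampled (with probability $1/m$), the contribution to $\mathbb{E}[X]$ from item $i$ is:

$$\sum_{\ell=1}^{f_i} \frac{1}{m} \cdot m\bigl(2(f_i - \ell + 1) - 1\bigr) = \sum_{r=1}^{f_i} (2r - 1) = f_i^2.$$

The last equality is the fact that the sum of the first $f_i$ odd numbers equals $f_i^2$: $1 + 3 + \dots + (2f_i - 1) = f_i^2$. Summing the contributions over all distinct items gives $\mathbb{E}[X] = \sum_i f_i^2 = F_2$.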
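The unbiasedness argument can be checked on a small example. The sketch below is illustrative only: it stores the whole stream in a list (a real streaming implementation would not), and the function names and the sample stream are hypothetical. It computes the exact expectation by averaging the estimator's output over all $m$ equally likely positions.

```python
import random
from collections import Counter

def ams_f2_single(stream, rng):
    """One run of the AMS estimator for F_2 (stream held in memory for illustration)."""
    m = len(stream)
    j = rng.randrange(m)                           # uniform position, 0-indexed
    r = sum(1 for x in stream[j:] if x == stream[j])  # occurrences at or after j
    return m * (2 * r - 1)

def ams_f2_expectation(stream):
    """Exact E[X]: average the estimator's output over all m positions."""
    m = len(stream)
    total = 0
    for j in range(m):
        r = sum(1 for x in stream[j:] if x == stream[j])
        total += m * (2 * r - 1)
    return total // m                              # every term is a multiple of m

def f2(stream):
    """True second frequency moment, computed from exact counts."""
    return sum(f * f for f in Counter(stream).values())

stream = ["a", "b", "a", "c", "a", "b"]            # hypothetical example stream
print(f2(stream))                                  # 3^2 + 2^2 + 1^2 = 14
print(ams_f2_expectation(stream))                  # 14
```

A single call to `ams_f2_single` can be far from 14 (here it returns 6, 18, or 30), which is why the notes go on to average many copies.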
Boosting to an $(\varepsilon, \delta)$ Guarantee
We want an output $\hat{F}$ such that

$$\Pr\left[\,|\hat{F} - F_2| \le \varepsilon F_2\,\right] \ge 1 - \delta.$$

This means that with probability at least $1 - \delta$, the estimate has relative error at most $\varepsilon$.
Averaging Independent Copies
Run the AMS estimator independently $t$ times, producing $X_1, \dots, X_t$. Define

$$\hat{F} = \frac{1}{t} \sum_{i=1}^{t} X_i.$$

By linearity of expectation, $\mathbb{E}[\hat{F}] = F_2$.
Chernoff Bound for Non-Bernoulli Variables
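Because each $X_i$ is not Bernoulli, we use a generalized Chernoff bound. If $X_1, \dots, X_t$ are i.i.d. random variables in $[0, a]$ and $\mu = \mathbb{E}[X_i]$, then for $0 < \varepsilon < 1$,

$$\Pr\left[\,\left|\frac{1}{t}\sum_{i=1}^{t} X_i - \mu\right| \ge \varepsilon \mu\,\right] \le 2\exp\left(-\frac{\varepsilon^2 t \mu}{3a}\right).$$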
Bounding the Output Range
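Let $f_{\max} = \max_i f_i$. Since $X = m(2r - 1)$ and $1 \le r \le f_{\max}$, we have $0 \le X \le m(2f_{\max} - 1) \le 2 m f_{\max}$, hence $X \in [0, 2 m f_{\max}]$. So we may take $a = 2 m f_{\max}$.
Substituting $a = 2 m f_{\max}$ and $\mu = F_2$ gives

$$\Pr\left[\,|\hat{F} - F_2| \ge \varepsilon F_2\,\right] \le 2\exp\left(-\frac{\varepsilon^2 t F_2}{6 m f_{\max}}\right).$$

To make this at most $\delta$, it is enough that

$$t \ge \frac{6 m f_{\max}}{\varepsilon^2 F_2} \ln\frac{2}{\delta}.$$

The issue is that both $f_{\max}$ (the max frequency) and $F_2$ (the quantity we're trying to estimate in the first place) are unknown while the stream is being processed. So even though we know the right formula for $t$, we can't compute it yet.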
Key Lemma
To get a computable bound, we replace the unknown ratio $m f_{\max} / F_2$ with an upper bound involving only things we know ahead of time. The key fact is:

$$\frac{m f_{\max}}{F_2} \le \sqrt{n}.$$

Since $n$, the universe size, is known before the stream begins, this gives us a concrete value to use. Substituting $\sqrt{n}$ for $m f_{\max} / F_2$ may overestimate the true minimum number of copies needed, but that is fine: we just run a few more copies than strictly necessary, and the guarantee still holds.
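Using this, it suffices to choose $t \ge \frac{6\sqrt{n}}{\varepsilon^2} \ln\frac{2}{\delta}$. Therefore, running

$$t = O\left(\frac{\sqrt{n}}{\varepsilon^2} \log\frac{1}{\delta}\right)$$

independent copies of AMS and averaging them yields an $(\varepsilon, \delta)$-approximation for $F_2$.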
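The boosting step can be sketched as follows. This is a minimal illustration, not a space-efficient streaming implementation: the stream is kept in a list, the helper names are hypothetical, and the constant 6 in `num_copies` assumes the Chernoff bound with $3a$ in the exponent and the range bound $a = 2 m f_{\max}$; a different form of the bound would change the constant.

```python
import math
import random

def num_copies(n, eps, delta):
    """Copies t >= (6 * sqrt(n) / eps^2) * ln(2 / delta); the constant 6 is an assumption."""
    return math.ceil(6 * math.sqrt(n) / eps**2 * math.log(2 / delta))

def ams_f2_single(stream, rng):
    """One AMS run: sample a position, count occurrences from there on."""
    m = len(stream)
    j = rng.randrange(m)
    r = stream[j:].count(stream[j])
    return m * (2 * r - 1)

def ams_f2_boosted(stream, n, eps, delta, rng):
    """Average t independent AMS runs, with t chosen from the universe size n."""
    t = num_copies(n, eps, delta)
    return sum(ams_f2_single(stream, rng) for _ in range(t)) / t

rng = random.Random(0)
n = 4                                              # hypothetical universe size
stream = [rng.randrange(n) for _ in range(200)]    # hypothetical stream
true_f2 = sum(stream.count(i) ** 2 for i in range(n))
estimate = ams_f2_boosted(stream, n, eps=0.2, delta=0.05, rng=rng)
print(num_copies(n, 0.2, 0.05), true_f2, estimate)
```

Note that `num_copies` depends only on $n$, $\varepsilon$, and $\delta$, all of which are available before the stream starts.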
Proof of the Key Lemma
We prove that $\frac{m f_{\max}}{F_2} \le \sqrt{n}$.
First note that

$$m = \sum_{i=1}^{n} f_i, \qquad F_2 = \sum_{i=1}^{n} f_i^2.$$

We need two lower bounds on $F_2$. The first comes from asking: among all ways to distribute total frequency $m$ across $n$ items, which minimizes $F_2$? The answer is to spread frequency evenly, giving each item $f_i = m/n$. This is the same principle as: for a fixed sum, the sum of squares is minimized when all values are equal. The minimum is then $n \cdot (m/n)^2 = m^2/n$. So:

$$F_2 \ge \frac{m^2}{n}.$$

The second bound is simpler: one of the terms in $F_2 = \sum_i f_i^2$ is $f_{\max}^2$, so trivially $F_2 \ge f_{\max}^2$.
Now consider two cases, depending on whether $f_{\max}$ is "small" or "large" relative to $m/\sqrt{n}$. Why that threshold? It's where both bounds on $F_2$ become equal: if $f_{\max} = m/\sqrt{n}$, then $f_{\max}^2 = m^2/n$, so the two lower bounds coincide. Below that threshold the first bound ($F_2 \ge m^2/n$) does the work; above it the second bound ($F_2 \ge f_{\max}^2$) does the work.
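Case 1. Suppose $f_{\max}$ is small:

$$f_{\max} \le \frac{m}{\sqrt{n}}.$$

Multiplying both sides by $m$:

$$m f_{\max} \le \frac{m^2}{\sqrt{n}}.$$

Now divide by $F_2$ and substitute the lower bound $F_2 \ge m^2/n$ into the denominator (a smaller denominator makes the fraction larger, so replacing $F_2$ with the smaller value $m^2/n$ only increases the right-hand side, keeping the inequality valid):

$$\frac{m f_{\max}}{F_2} \le \frac{m^2/\sqrt{n}}{F_2} \le \frac{m^2/\sqrt{n}}{m^2/n} = \sqrt{n}.$$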
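Case 2. Suppose $f_{\max}$ is large:

$$f_{\max} > \frac{m}{\sqrt{n}}.$$

Since $f_{\max} > m/\sqrt{n}$, we have $m < \sqrt{n}\, f_{\max}$. Multiplying both sides by $f_{\max}$ gives

$$m f_{\max} < \sqrt{n}\, f_{\max}^2.$$

Substituting the lower bound $F_2 \ge f_{\max}^2$ into the denominator:

$$\frac{m f_{\max}}{F_2} < \frac{\sqrt{n}\, f_{\max}^2}{f_{\max}^2} = \sqrt{n},$$

and so $\frac{m f_{\max}}{F_2} \le \sqrt{n}$ in both cases.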
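As a sanity check on the lemma (not a substitute for the proof), the following sketch brute-forces the ratio $m f_{\max} / F_2$ over all small frequency vectors for a fixed universe size and compares the largest value found with $\sqrt{n}$. The universe size and the frequency cap are arbitrary choices made for illustration.

```python
import itertools
import math

def ratio(freqs):
    """Compute m * f_max / F_2 for a frequency vector."""
    m = sum(freqs)
    f_max = max(freqs)
    f2 = sum(f * f for f in freqs)
    return m * f_max / f2

n = 4                       # hypothetical universe size
cap = 6                     # largest frequency tried per item (illustrative)

# Enumerate every frequency vector in {0, ..., cap}^n except the all-zero one.
worst = max(
    ratio(freqs)
    for freqs in itertools.product(range(cap + 1), repeat=n)
    if sum(freqs) > 0
)
print(worst, math.sqrt(n))  # the first value should never exceed the second
```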
Conclusion
This lecture followed the standard boosting pattern for streaming algorithms:
- Design an unbiased estimator. Find a one-shot randomized algorithm that returns the right answer in expectation, but may be far off on any individual run.
- Repeat independently. Run $t$ copies in parallel and average their outputs.
- Determine how many copies suffice. Apply a concentration bound (here, the non-Bernoulli Chernoff bound) to find a $t$ large enough to achieve the $(\varepsilon, \delta)$ guarantee.
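For AMS sampling, a subtlety arose in the third step: the required $t$ involved unknown quantities ($f_{\max}$ and $F_2$). The Key Lemma resolved this by bounding $\frac{m f_{\max}}{F_2} \le \sqrt{n}$, replacing unknown values with the known universe size $n$.
Running $t = O\left(\frac{\sqrt{n}}{\varepsilon^2} \log\frac{1}{\delta}\right)$ copies of AMS sampling and averaging gives an output $\hat{F}$ satisfying

$$\Pr\left[\,|\hat{F} - F_2| \le \varepsilon F_2\,\right] \ge 1 - \delta.$$

This was for $F_2$. What about $F_k$?
Example: Running $O\left(\frac{n^{1 - 1/k}}{\varepsilon^2} \log\frac{1}{\delta}\right)$ copies suffices for estimating $F_k$ when $k$ is treated as a constant. Note that when $k = 2$ this becomes $O\left(\frac{\sqrt{n}}{\varepsilon^2} \log\frac{1}{\delta}\right)$, matching the result above. For $k > 2$, the exponent $1 - 1/k$ grows toward 1, meaning more copies are needed for higher moments.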
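The unbiasedness argument for $F_2$ carries over to higher moments by telescoping. As a short worked derivation, assuming the $k$-th moment estimator has the form $X = m\left(r^k - (r-1)^k\right)$ with $r$ the number of occurrences of the sampled item at or after the sampled position:

```latex
\mathbb{E}[X]
  = \sum_{i=1}^{n} \sum_{r=1}^{f_i} \frac{1}{m} \cdot m\left(r^k - (r-1)^k\right)
  = \sum_{i=1}^{n} \left(f_i^k - 0^k\right)
  = \sum_{i=1}^{n} f_i^k
  = F_k .
```

For $k = 2$ the inner term is $r^2 - (r-1)^2 = 2r - 1$, which recovers the sum of odd numbers used for $F_2$.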
Counting Distinct Elements
Context: You're working for Amazon, you have a stream of purchases made by people, and at any moment you want to answer how many distinct products the company has sold today.
Given a stream $a_1, a_2, \dots, a_m$ with $a_j \in [n]$, output $d$: how many distinct elements have appeared in the stream so far.
For example, for this stream , the output should be 4 because we have 4 different distinct items in the stream: .
The naive approach: start with an empty set, and whenever you see the next item in the stream, check whether it's in the set. If it is not, add it. In the end, the set contains exactly the distinct elements, so it requires space at least linear in the number of distinct elements $d$. So for a stream where $d = \Theta(n)$, we'll need at least $\Omega(n)$ bits.
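The naive approach can be sketched in a few lines. The stream below is a hypothetical example, not one from the lecture; the point is that the set grows with the number of distinct items, which is what the streaming algorithm is meant to avoid.

```python
def count_distinct_exact(stream):
    """Exact distinct count; memory grows with the number of distinct items."""
    seen = set()
    for item in stream:
        seen.add(item)      # adding an item already present is a no-op
    return len(seen)

stream = [3, 1, 3, 7, 1, 9]             # hypothetical stream
print(count_distinct_exact(stream))     # 4 (the distinct items are 3, 1, 7, 9)
```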
The Flajolet-Martin algorithm addresses this problem:
- First: “Idealized algorithm”
- Next: Practical algorithm
Ideal Algorithm
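You have a hash function that takes an item and gives you a uniformly random number between 0 and 1:

$$h : [n] \to [0, 1],$$

where $n$ is the upper bound on the number of distinct elements and $[0, 1]$ is the continuous interval. Hash each item $a_j$ to a value $h(a_j)$, and maintain only $Y = \min_j h(a_j)$, the smallest hash value seen so far.
Output: $\hat{d} = \frac{1}{Y} - 1$.
The intuition behind this output formula: if there are $d$ distinct elements, each gets an independent uniform hash in $[0, 1]$ (repeated items hash to the same value, so duplicates do not change the minimum), so the minimum of $d$ such values has expected value $\frac{1}{d+1}$ (the Uniform Random Variable section below sets up this calculation). Inverting gives approximately $d + 1$, and subtracting 1 gives $d$. So $\frac{1}{Y} - 1$ is a natural estimator for $d$. It is not exactly unbiased, because $\mathbb{E}[1/Y] \ne 1/\mathbb{E}[Y]$ in general; the formula should be read as a heuristic inversion of $\mathbb{E}[Y] = \frac{1}{d+1}$.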
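The idealized algorithm can be sketched as below. A truly random real-valued hash is not implementable, so this sketch substitutes SHA-256 scaled into $[0, 1]$ as a stand-in; that substitution, and the function names, are assumptions made for illustration and not the practical Flajolet-Martin algorithm. A single minimum is a noisy estimate, so the output may be far from the true count.

```python
import hashlib

def hash_to_unit(x):
    """Map x to a float in [0, 1] via SHA-256 (stand-in for an ideal uniform hash)."""
    digest = hashlib.sha256(str(x).encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def fm_ideal_estimate(stream):
    """Track only the smallest hash value Y and return 1/Y - 1."""
    y = 1.0                                 # minimum so far; 1.0 means "nothing seen"
    for item in stream:
        y = min(y, hash_to_unit(item))
    if y == 0.0:                            # guard against division by zero
        return float("inf")
    return 1.0 / y - 1.0

once = fm_ideal_estimate(range(1000))
repeated = fm_ideal_estimate(list(range(1000)) * 3)
print(once, repeated)                       # identical: duplicates do not move the minimum
```

The only state kept between items is the single float `y`, in contrast to the set used by the naive approach.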
Uniform Random Variable
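Continuous random variable $X \sim \mathrm{Unif}[0, 1]$.
Density:

$$f_X(x) = \begin{cases} 1 & \text{if } 0 \le x \le 1, \\ 0 & \text{otherwise.} \end{cases}$$

Cumulative distribution function:

$$F_X(x) = \Pr[X \le x] = \begin{cases} 0 & \text{if } x < 0, \\ x & \text{if } 0 \le x \le 1, \\ 1 & \text{if } x > 1. \end{cases}$$

For a uniform r.v. we do not talk about the probability of a single point but only about the probability of a segment: $\Pr[a \le X \le b] = b - a$ for $0 \le a \le b \le 1$ (a single point always has probability zero).
The expectation of a continuous r.v. is defined via its density:

$$\mathbb{E}[X] = \int_{-\infty}^{\infty} x \, f_X(x) \, dx.$$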
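As a short worked example, applying this definition to a uniform random variable on $[0, 1]$, whose density is 1 on $[0, 1]$ and 0 elsewhere (the symbols $X$ and $f_X$ are notation assumed here):

```latex
\mathbb{E}[X]
  = \int_{-\infty}^{\infty} x \, f_X(x) \, dx
  = \int_{0}^{1} x \cdot 1 \, dx
  = \left[ \frac{x^2}{2} \right]_{0}^{1}
  = \frac{1}{2} .
```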
As we close this lecture, we are posed the question: suppose you throw two darts independently, $X_1 \sim \mathrm{Unif}[0, 1]$ and $X_2 \sim \mathrm{Unif}[0, 1]$.
What is the expected position of the smaller of the two darts?
What is the expected position of the larger of the two darts?
We’ll explore this further in the next lecture.