Lecture 18 (04/13/2026) - Frequency Moments; AMS Sampling Algorithm
Scribes: Simrandeep Singh and Amrina Qayyum
Plan of the Remaining Course
In the streaming algorithms part of the course, we have already seen:
- sampling,
- counting,
- approximate median,
- heavy hitters using Count-Min Sketch.
The remaining topics in the streaming section are:
- frequency estimation,
- counting distinct elements.
After that, the course will move to other topics.
A course summary shown in class listed the following topics:
- Dictionary problem:
  - HWC
  - FKS
  - linear probing
  - cuckoo hashing
- Approximate membership problem:
  - Bloom filter
- Streaming algorithms:
  - sampling
  - counting: Morris, Morris+, Morris++
  - approximate median
  - heavy hitters: CMS with guarantee
  - frequency estimation
  - HyperLogLog (counting distinct elements)
- External memory algorithms
- Nearest neighbor, dimensionality reduction
- Gen AI models, privacy (differential privacy)
The probability tools covered in the course: Bernoulli, Geometric, Coupon Collector, Expectation, Variance, Markov, Chebyshev, Chernoff, Balls & Bins.
Summary of Lecture
In this lecture, we covered the following topics:
- Heavy Hitters,
- Count-Min Sketch,
- Bloom Filter Discussion,
- Frequency Moment Estimation,
- AMS Sampling,
- Chernoff Bound for Non-Bernoulli Variables.
This lecture began with a brief review of heavy hitters and the Count-Min Sketch. The discussion then moved to practical questions and limitations related to Bloom filters, including locality of memory access, adaptive handling of false positives, counting variants, and range- or distance-based queries. The second main part of the lecture introduced frequency moment estimation, with particular emphasis on the second frequency moment. The AMS sampling method was presented for estimating this quantity, followed by a proof that the estimator is correct in expectation. The lecture ended with the idea of boosting accuracy by repetition, a connection to Exercise 4.9, and a Chernoff-style concentration bound for non-Bernoulli random variables. The remaining issue of the unknown maximum frequency was postponed to the next class, which will continue with HyperLogLog for counting distinct elements.
Heavy Hitters
Consider a stream
$$S = (a_1, a_2, \dots, a_m).$$
Assume that the stream contains numbers from $\{1, \dots, n\}$.
For any $x \in \{1, \dots, n\}$, define
$$f_x = |\{\, j : a_j = x \,\}|,$$
that is, the number of times item $x$ appears in the stream.
For example,
The goal is to keep track of the top-$K$ most frequent items, called the heavy hitters.
Why Allow Error?
If no error is allowed, then even for $K = 1$ we may need to store the full stream. For this reason, the lecture allows approximation.
How to Maintain Top-$K$
It is enough to build a data structure that answers queries of the form "given an item $x$, what is its frequency $f_x$?"
Then we maintain a min-heap of size $K$.
When a new item $x$ appears:
- estimate $f_x$ using the data structure,
- compare this estimate with the minimum-frequency item in the heap,
- if the new estimate is larger, remove the minimum item and insert the new one,
- if the item is already in the heap, update its value in the heap.
Thus, the method uses space proportional to $K$, rather than storing frequencies for all items.
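The maintenance steps above can be sketched as follows. This is a minimal illustration, not the lecture's implementation: `track_top_k` and the callback `estimate_after_update` are hypothetical names, the size-$K$ "heap" is a plain dict scanned for its minimum (which keeps the update-in-place step short, at $O(K)$ cost per item), and exact counts stand in for the approximate frequency structure. It assumes $K \ge 1$.

```python
from collections import Counter


def track_top_k(stream, k, estimate_after_update):
    """Maintain the k items with the largest estimated frequency.

    estimate_after_update(x) is a hypothetical callback: it records one
    occurrence of x and returns the current estimate of its frequency.
    """
    top = {}  # item -> latest estimate; plays the role of the size-k min-heap
    for x in stream:
        est = estimate_after_update(x)
        if x in top or len(top) < k:
            top[x] = est                     # update in place, or fill a free slot
        else:
            weakest = min(top, key=top.get)  # item with the minimum estimate
            if est > top[weakest]:
                del top[weakest]             # evict the minimum ...
                top[x] = est                 # ... and insert the new item
    return top


# Usage, with exact counts standing in for the frequency estimator.
counts = Counter()


def exact_estimate(x):
    counts[x] += 1
    return counts[x]


print(track_top_k([1, 1, 2, 3, 1, 2, 4, 2, 2], 2, exact_estimate))
```

Swapping `exact_estimate` for an approximate structure changes only the callback; the top-$K$ bookkeeping stays the same.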
Count-Min Sketch
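Let $S = (a_1, a_2, \dots, a_m)$ be the stream.
The goal is to store the stream so as to answer frequency queries: given an item $x$, return an estimate of its frequency $f_x$.
Parameters
The Count-Min Sketch has an $(\varepsilon, \delta)$ guarantee. Take a width $w$ proportional to $1/\varepsilon$ and a depth $d$ proportional to $\log(1/\delta)$.
For each $i \in \{1, \dots, d\}$, choose a hash function $h_i$ that maps items to $\{1, \dots, w\}$.
Data Structure
The structure is a $d \times w$ matrix of counters ($d$ rows, $w$ columns).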
Update Rule
When an item $x$ arrives:
- compute $h_1(x), \dots, h_d(x)$,
- increment the counters in those cells (column $h_i(x)$ of row $i$) by $1$.
Query Rule
To estimate the frequency of an item $x$:
- compute $h_1(x), \dots, h_d(x)$,
- look at the corresponding counters,
- output the minimum of those values.
Call the returned value $\hat f_x$.
Basic Property
Count-Min Sketch never underestimates:
$$\hat f_x \ge f_x.$$
Hence it only overestimates frequencies.
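The update and query rules can be sketched as below. This is a minimal illustration for integer items; the class name, the seeded generator, and the hash family $h(x) = ((ax + b) \bmod p) \bmod w$ with $p = 2^{61} - 1$ are assumptions of this sketch, not details given in the lecture.

```python
import random


class CountMinSketch:
    """Minimal Count-Min Sketch for integer items (illustrative sketch)."""

    PRIME = (1 << 61) - 1  # modulus of the assumed hash family

    def __init__(self, width, depth, seed=0):
        rng = random.Random(seed)
        self.width = width
        # One (a, b) pair per row: h_i(x) = ((a*x + b) mod PRIME) mod width.
        self.coeffs = [(rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
                       for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]

    def _cell(self, row, x):
        a, b = self.coeffs[row]
        return ((a * x + b) % self.PRIME) % self.width

    def update(self, x):
        # Increment one counter per row.
        for row in range(len(self.table)):
            self.table[row][self._cell(row, x)] += 1

    def query(self, x):
        # Minimum over the counters that x maps to.
        return min(self.table[row][self._cell(row, x)]
                   for row in range(len(self.table)))


stream = [3, 1, 3, 7, 3, 1]
cms = CountMinSketch(width=8, depth=3)
for x in stream:
    cms.update(x)
print(cms.query(3) >= stream.count(3))  # True: the sketch never underestimates
```

Every occurrence of an item increments all of its counters, so each counter is at least the true frequency and the minimum is too.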
Guarantee
Let $m$ be the length of the stream. Then
$$\Pr[\hat f_x \le f_x + \varepsilon m] \ge 1 - \delta.$$
Equivalently,
$$\Pr[\hat f_x > f_x + \varepsilon m] \le \delta.$$
Important Note
This is an additive error guarantee with respect to the stream length $m$, not a relative error guarantee.
Proof Sketch
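Fix one row, and let $C$ be the corresponding counter for item $x$ in that row. Since every true occurrence of $x$ increments that counter,
$$C \ge f_x.$$
From the class proof, the other items contribute at most $m/w$ to this counter in expectation, where $m$ is the stream length and $w$ is the number of counters per row:
$$E[C] \le f_x + \frac{m}{w}.$$
Define the error in this row as
$$Y = C - f_x \ge 0.$$
Then
$$E[Y] \le \frac{m}{w}.$$
By Markov’s inequality, for one row,
$$\Pr[Y \ge \varepsilon m] \le \frac{E[Y]}{\varepsilon m} \le \frac{1}{\varepsilon w}.$$
Now suppose we have $d$ rows. For the Count-Min Sketch estimate to exceed $f_x + \varepsilon m$, every row must have error at least $\varepsilon m$. Therefore, since the rows use independent hash functions,
$$\Pr[\hat f_x > f_x + \varepsilon m] \le \left(\frac{1}{\varepsilon w}\right)^{d}.$$
Hence, once $w$ and $d$ are chosen so that $(1/(\varepsilon w))^d \le \delta$ (for example $w = 2/\varepsilon$ and $d = \log_2(1/\delta)$),
$$\Pr[\hat f_x \le f_x + \varepsilon m] \ge 1 - \delta.$$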
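To make the amplification step concrete, here is a small sketch that sizes the table and evaluates the resulting failure bound. The constants $w = 2/\varepsilon$ and $d = \log_2(1/\delta)$ are an assumption (one common choice that makes each row fail with probability at most $1/2$); the constants used in class may differ, and both function names are hypothetical.

```python
import math


def cms_dimensions(eps, delta):
    """Width and depth under the ASSUMED constants w = 2/eps, d = log2(1/delta)."""
    width = math.ceil(2 / eps)
    depth = math.ceil(math.log2(1 / delta))
    return width, depth


def failure_bound(eps, width, depth):
    """Bound (1/(eps*w))**d on the chance that every row errs by at least eps*m."""
    return (1 / (eps * width)) ** depth


w, d = cms_dimensions(0.5, 0.25)
print(w, d, failure_bound(0.5, w, d))  # 4 2 0.25
```

With these constants the per-row Markov bound is $1/2$, so each extra row halves the failure probability.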
Questions About Bloom Filters and Related Variants
After reviewing heavy hitters and Count-Min Sketch, the lecture discussed several questions about Bloom filters and related structures.
(a) Can We Avoid Reading Scattered Cells?
In Bloom filters and Count-Min Sketch, the query algorithm reads multiple cells that are usually scattered in memory. A natural question is whether one can design a structure with similar guarantees but with a query algorithm that avoids such scattered access. A related structure mentioned in class was the Quotient Filter (Bender et al.).
(b) Can a Bloom Filter Fix Its Mistakes?
A Bloom filter can return a false positive. Suppose a query is not in the true data set, but the Bloom filter says “yes.” If the real data set is then checked and this is confirmed to be a false positive, the question is whether the filter can be updated so that the same incorrect answer does not occur again. This idea was discussed in class in connection with a Broom Filter.
(c) What if the Set is Actually a Multiset?
If the stored object is a multiset, then items may repeat. Instead of asking only whether $x \in S$, one may ask for the multiplicity of $x$ in $S$.
That is, how many times does $x$ appear? This leads to counting filters / counting Bloom filters, which are closely related to Count-Min Sketch.
(d) Range Query
Instead of asking whether one element belongs to the set, one may ask whether the set contains any element in a given range $[a, b]$, that is, whether $S \cap [a, b] \neq \emptyset$.
(e) Distance Query
Another possible query is: given $x$, does the set contain some element $y$ whose distance to $x$ is at most a given threshold?
This leads to distance-sensitive Bloom filters.
Frequency Moment Estimation
The next main topic in class was frequency moment estimation.
Consider the stream
Its item frequencies are:
Therefore,
Moments
If $X$ is a random variable, then:
- $E[X^2]$ is the second moment,
- $E[X^3]$ is the third moment,
- in general, $E[X^k]$ is the $k$-th moment.
For stream frequencies:
- the $0$-th moment is the number of distinct elements,
- the $1$-st moment is the sum of frequencies, i.e. the stream length,
- the $2$-nd moment is the sum of squared frequencies.
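As a quick check of these three definitions, the following sketch computes frequency moments exactly by storing all frequencies, which is precisely what a streaming algorithm tries to avoid. The stream is a made-up example, not the one used in class, and `frequency_moment` is a hypothetical helper name.

```python
from collections import Counter


def frequency_moment(stream, k):
    """Exact k-th frequency moment: sum of f_i**k over the items that occur."""
    freqs = Counter(stream)
    return sum(f ** k for f in freqs.values())


stream = [1, 2, 1, 3, 1, 2]  # hypothetical stream: f_1 = 3, f_2 = 2, f_3 = 1
print(frequency_moment(stream, 0))  # 3  (distinct elements)
print(frequency_moment(stream, 1))  # 6  (stream length)
print(frequency_moment(stream, 2))  # 14 (sum of squared frequencies)
```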
Problem Statement
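Given a stream
$$S = (a_1, a_2, \dots, a_m),$$
define
$$f_i = |\{\, j : a_j = i \,\}| \quad \text{(the number of occurrences of item } i\text{)}.$$
The goal is to output
$$F_2 = \sum_i f_i^2.$$
This is the second frequency moment.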
Why care about this quantity? One well-known application is the Gini index from economics, which measures income inequality. Imagine the stream items are salary brackets, and each bracket’s frequency is the number of people earning in that range. When income is concentrated — a few brackets have very high frequencies — the sum of squared frequencies is large. When income is spread more evenly across brackets, the sum is smaller. The Gini index uses exactly this structure to quantify how unequal a distribution is.
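More generally, if $g$ is any function with $g(0) = 0$, then the goal can be generalized to
$$\sum_i g(f_i).$$
Examples: $g(x) = x^2$ gives the second frequency moment, and $g(x) = x^k$ with $k \ge 1$ gives the $k$-th frequency moment.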
AMS Sampling
The algorithm introduced in class was AMS sampling (Alon-Matias-Szegedy).
Algorithm
- Sample an element from the stream uniformly at random, that is, pick a uniformly random position $J$. Suppose the sampled element is $a_J$.
- Count how many times $a_J$ appears at or after position $J$ in the stream. Call this value $r$.
- Output $m\,(r^2 - (r-1)^2) = m\,(2r - 1)$ for the second moment, where $m$ is the stream length.
For the $k$-th moment, the output becomes
$$m\,(r^k - (r-1)^k).$$
For the general function $g$, the output becomes
$$m\,(g(r) - g(r-1)).$$
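The three steps can be carried out in one pass over the stream. The sketch below is one possible way to do so, assuming reservoir sampling for the uniform choice (replace the current sample with probability $1/m$ when the $m$-th element arrives); the function name `ams_estimate` and its `g` and `rng` parameters are choices of this sketch rather than the lecture's.

```python
import random


def ams_estimate(stream, g=lambda r: r * r, rng=random):
    """One AMS sample: returns m * (g(r) - g(r - 1)); g defaults to r**2."""
    m = 0          # number of elements seen so far
    sample = None  # currently sampled element
    r = 0          # occurrences of `sample` at or after its sampled position
    for a in stream:
        m += 1
        if rng.randrange(m) == 0:  # happens with probability 1/m
            sample, r = a, 1       # restart the count at the new position
        elif a == sample:
            r += 1
    return m * (g(r) - g(r - 1))


print(ams_estimate([1, 2, 3, 4]))  # 4: all items distinct, so r = 1 on every run
print(ams_estimate([5, 5, 5, 5]))  # one of 4, 12, 20, 28, depending on the sample
```

Resetting `r` whenever the sample is replaced ensures that only occurrences at or after the sampled position are counted.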
Example
Take
Suppose the sampled value is and from that sampled position onward it appears times. Then the AMS output is
The true value of the second moment in this stream is , so one run gives a random estimate rather than the exact answer.
To develop some intuition for why this works on average, consider tracing the algorithm across all possible sampled elements. One item in this stream has $3$ occurrences, so it could be sampled at any of its three positions in the stream:
- Sampling its first occurrence: count $r = 3$, output $11 \cdot (2 \cdot 3 - 1) = 55$.
- Sampling its second occurrence: count $r = 2$, output $11 \cdot (2 \cdot 2 - 1) = 33$.
- Sampling its third occurrence: count $r = 1$, output $11 \cdot (2 \cdot 1 - 1) = 11$.
Sampling an item that appears exactly once gives $r = 1$, output $11$.
Each of the $11$ stream positions is equally likely to be sampled. If you computed the AMS output for every position and averaged all 11 results, you would get exactly $F_2 = \sum_i f_i^2$, the true second moment. The theorem below proves this holds in general.
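The averaging claim is easy to check by brute force. The sketch below enumerates every position of a small made-up stream (not the 11-element stream from class), computes the AMS output $m(2r - 1)$ for each, and compares the average with the exact second moment; `ams_outputs_by_position` is a hypothetical helper.

```python
def ams_outputs_by_position(stream):
    """AMS output m * (2r - 1) for each possible sampled position."""
    m = len(stream)
    outputs = []
    for j in range(m):
        r = stream[j:].count(stream[j])  # copies at or after position j
        outputs.append(m * (2 * r - 1))
    return outputs


stream = [1, 2, 1, 3, 1, 2]  # hypothetical stream with frequencies 3, 2, 1
outputs = ams_outputs_by_position(stream)
average = sum(outputs) / len(outputs)
f2 = sum(stream.count(x) ** 2 for x in set(stream))
print(outputs)      # [30, 18, 18, 6, 6, 6]
print(average, f2)  # 14.0 14
```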
Expected Value of AMS Sampling
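Define $X$ to be the output of one run of AMS sampling:
$$X = m\,(r^2 - (r-1)^2) = m\,(2r - 1).$$
The claim proved in class is
$$E[X] = \sum_i f_i^2 = F_2.$$
Let $f_i$ be the frequency of item $i$, and let $m = \sum_i f_i$ be the stream length.
Then
$$E[X] = \sum_i \Pr[\text{sampled value is } i] \cdot E[X \mid \text{sampled value is } i].$$
Since the stream is sampled uniformly over positions,
$$\Pr[\text{sampled value is } i] = \frac{f_i}{m}.$$
Now, conditioned on sampling value $i$, each of its $f_i$ occurrences is equally likely to be chosen. If the sampled occurrence is the $j$-th occurrence of item $i$, then the number of copies of $i$ from that point onward is
$$r = f_i - j + 1.$$
So
$$E[X \mid \text{sampled value is } i] = \frac{1}{f_i} \sum_{j=1}^{f_i} m\left((f_i - j + 1)^2 - (f_i - j)^2\right).$$
Substituting into the expectation gives
$$E[X] = \sum_i \frac{f_i}{m} \cdot \frac{1}{f_i} \sum_{j=1}^{f_i} m\left((f_i - j + 1)^2 - (f_i - j)^2\right).$$
The factors $f_i$ and $1/f_i$ cancel, and $m$ also cancels. Let
$$\ell = f_i - j + 1.$$
Then
$$E[X] = \sum_i \sum_{\ell=1}^{f_i} \left(\ell^2 - (\ell - 1)^2\right).$$
Now the inner sum telescopes. Written out, it is $(1^2 - 0^2) + (2^2 - 1^2) + (3^2 - 2^2) + \dots + (f_i^2 - (f_i - 1)^2)$. Each consecutive pair of terms cancels: the $1^2$ in the first term is cancelled by the $-1^2$ in the second term; similarly the $2^2$ from the second is cancelled by the $-2^2$ from the third; and so on. Everything cancels except the very last positive term (the leftover $-0^2$ is zero):
$$\sum_{\ell=1}^{f_i} \left(\ell^2 - (\ell - 1)^2\right) = f_i^2.$$
Therefore,
$$E[X] = \sum_i f_i^2 = F_2.$$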
General Form
The same proof works for
$$X = m\,(g(r) - g(r-1)), \qquad g(0) = 0,$$
and gives
$$E[X] = \sum_i g(f_i).$$
From Expectation to an $(\varepsilon, \delta)$ Guarantee
One run of AMS sampling gives a single random estimate that is correct in expectation but could be far off on any individual run. To get an $(\varepsilon, \delta)$ guarantee, meaning the estimate is within a $(1 \pm \varepsilon)$ factor of the truth with probability at least $1 - \delta$, we use the standard boosting strategy: run the algorithm $t$ independent times, obtaining
$$X_1, X_2, \dots, X_t,$$
and report the average
$$\bar{X} = \frac{1}{t} \sum_{i=1}^{t} X_i.$$
The question is: how large does $t$ need to be? This is where the Chernoff bound comes in: it lets us solve for the minimum $t$ that achieves the desired guarantee. This is the same approach used in Exercise 4.9 of the textbook.
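The repetition step can be sketched as follows. For brevity this version assumes the stream is held in a list so that a position can be sampled directly (an offline simplification, not a one-pass algorithm); `ams_once`, `ams_average`, and the seed parameter are names introduced here for illustration.

```python
import random


def ams_once(stream, rng):
    """One AMS estimate of the second moment from a uniformly random position."""
    j = rng.randrange(len(stream))
    r = stream[j:].count(stream[j])  # copies at or after the sampled position
    return len(stream) * (2 * r - 1)


def ams_average(stream, t, seed=0):
    """Average of t independent AMS estimates."""
    rng = random.Random(seed)
    return sum(ams_once(stream, rng) for _ in range(t)) / t


print(ams_average([1, 2, 3, 4], t=10))          # 4.0: every single run returns 4
print(ams_average([1, 2, 1, 3, 1, 2], t=2000))  # concentrates around F_2 = 14 as t grows
```

Each single estimate on the second stream lies between 6 and 30, so the average always does too; how fast it concentrates around 14 is what the concentration bound quantifies.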
Chernoff Bound for Non-Bernoulli Variables
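A version of Chernoff’s bound for non-Bernoulli variables was written in class.
Let $X_1, \dots, X_t$ be i.i.d. random variables taking values in $[0, a]$, and let
$$X = \sum_{i=1}^{t} X_i, \qquad \mu = E[X].$$
Then, for $0 < \varepsilon < 1$:
$$\Pr\big[\,|X - \mu| \ge \varepsilon \mu\,\big] \le 2 \exp\!\left(-\frac{\varepsilon^2 \mu}{3a}\right).$$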
Applying This to AMS Sampling
For AMS sampling, a single run outputs
$$X = m\,(r^2 - (r-1)^2).$$
Expanding $(r-1)^2 = r^2 - 2r + 1$, the $r^2$ terms cancel and this simplifies to
$$X = m\,(2r - 1).$$
Let
$$f_{\max} = \max_i f_i.$$
Since $r$ counts how many times the sampled element appears from its position onward, $r$ can never exceed the frequency of the most frequent element in the stream. Even in the best case, sampling the most frequent item at its very first occurrence, you get at most $f_{\max}$ copies remaining. So
$$r \le f_{\max}.$$
Hence (dropping the $-1$, which only makes $X$ smaller)
$$X = m\,(2r - 1) \le 2\,m\,f_{\max}.$$
So, for the non-Bernoulli Chernoff bound, one may take the upper end of the range to be
$$a = 2\,m\,f_{\max}.$$
With $a$ in hand, the Chernoff bound gives a concrete tail bound on the sum, and hence the average, of the $t$ independent estimates. Setting that tail bound equal to $\delta$ and solving for $t$ gives the minimum number of AMS repetitions needed to achieve the desired $(\varepsilon, \delta)$ guarantee.
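As a worked version of that last step, assume the tail bound has the common form $\Pr[|X - \mu| \ge \varepsilon\mu] \le 2\exp(-\varepsilon^2\mu/(3a))$ for a sum $X$ of $t$ i.i.d. variables in $[0, a]$; the exact constants written in class may differ. Here $m$ is the stream length, $f_{\max}$ the maximum frequency, and $F_2$ the second moment, so $\mu = t\,F_2$ and $a = 2\,m\,f_{\max}$:

$$
2\exp\!\left(-\frac{\varepsilon^2\, t\, F_2}{6\, m\, f_{\max}}\right) \le \delta
\quad\Longleftrightarrow\quad
t \;\ge\; \frac{6\, m\, f_{\max}}{\varepsilon^2\, F_2}\,\ln\frac{2}{\delta}.
$$

Under this assumed form, the required number of repetitions depends on $f_{\max}$, which is not known in advance; this is the issue postponed to the next class.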