Lecture 20 (04/22/2026) - Finish Counting Distinct Elements; Uniform RV; Introduce Flajolet-Martin

Scribes: Malaika Khan and Saartaj Alam

  • The idealized algorithm for the counting distinct elements problem
  • Characteristics of uniform random variables and how they can be used in the distinct elements problem
  • Analysis of the Flajolet-Martin algorithm

Problem: Given a stream $\{x_1, \ldots, x_m\}$ with $x_i \in \{1, \ldots, n\}$, output $F_0$, the number of distinct elements that have appeared in the stream so far.

  • Real World Application: Imagine a cell tower with users connecting and disconnecting - how many people have connected to the cell tower?
  • We will first look at an idealized algorithm and then a practical algorithm!

Use a hash function $h : \{1, \ldots, n\} \to [0,1]$, where $[0,1]$ is the continuous interval.

When we say the hash function is “random,” we mean that for any given input, its output value is equally likely to be any point in $[0,1]$, like throwing a dart at the interval at random. But the hash function is also consistent: if element 5 hashes to 0.31, it will always hash to 0.31 no matter how many times it appears in the stream. The randomness is in how the function was chosen at the start, not in how it evaluates each time.

  • We hash each $x_i$ to get $0 \leq h(x_i) \leq 1$
  • We only maintain the smallest hash value seen so far, i.e., $\min h(x_i)$
  • To get the number of distinct elements so far, we output $\dfrac{1}{\min h(x_i)} - 1$
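The three steps above can be sketched in Python as follows. This is a simulation under stated assumptions, not a space-efficient implementation: the class name `IdealDistinctCounter` is hypothetical, floating-point numbers stand in for real numbers in $[0,1]$, and the "random but consistent" hash function is imitated with a lookup table, which itself uses memory proportional to the number of distinct elements.

```python
import random


class IdealDistinctCounter:
    """Simulation of the idealized min-hash estimator (illustrative only)."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._table = {}      # assumption: a table imitates the random hash function
        self.min_hash = 1.0   # smallest hash value seen so far

    def _h(self, x):
        # Draw a uniform value the first time x appears, then always reuse it.
        if x not in self._table:
            self._table[x] = self._rng.random()
        return self._table[x]

    def add(self, x):
        self.min_hash = min(self.min_hash, self._h(x))

    def estimate(self):
        if self.min_hash == 0.0:
            return float("inf")  # guard against division by zero
        return 1.0 / self.min_hash - 1.0


counter = IdealDistinctCounter(seed=0)
for x in [1, 8, 1, 8, 8, 1]:
    counter.add(x)
print(counter.min_hash, counter.estimate())
```

Because the estimate depends on a single random minimum, individual runs can land far from the true count; only the minimum itself is tracked between elements.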

To see why the output makes sense: suppose the stream contains only 2 distinct elements (say, 1 and 8 appearing repeatedly). Since repeated elements always hash to the same value, you only ever compute 2 distinct hash values throughout the entire stream. The expected minimum of 2 uniform $[0,1]$ variables is $\frac{1}{3}$, so plugging this value into the output formula gives $\frac{1}{1/3} - 1 = 3 - 1 = 2$. More generally, for $t$ distinct elements, the expected minimum hash value is $\frac{1}{t+1}$, and plugging it into the output formula gives $\frac{1}{1/(t+1)} - 1 = t$. The proof below establishes the value of the expected minimum.

To analyze this algorithm, we need to understand how the minimum of several uniform random variables behaves, since each hash value $h(x_i)$ acts like a uniform $[0,1]$ random variable. The following sections build up that understanding.

A continuous random variable is a random variable that takes values in a continuum.

  • It can take on any value in a specified interval, but because there are uncountably many such values and the total probability must still be 1, each individual value has probability 0.
  • We can define a continuous random variable through its density or its distribution. The density $f(x)$ describes the probability that $X$ lies in a small interval: $\Pr(x \leq X \leq x + dx) \approx f(x)\,dx$.

A uniform random variable is a continuous random variable where $X$ can take any value in $[0,1]$, with every part of the interval equally likely. This equal likelihood is what the density $f(x) = 1$ for all $x \in [0,1]$ encodes: no region is favored over any other of equal length.

  • Density: $f(x) = 1 \;\forall\; x \in [0,1]$
  • Distribution:
$$F(x) = \Pr(X \leq x) = \begin{cases} 0 & x < 0 \\ x & 0 \leq x \leq 1 \\ 1 & x > 1 \end{cases}$$

The general formula for the expectation of a continuous random variable with density $f$ is $\mathbb{E}(X) = \int x f(x)\,dx$.

  • For the uniform random variable discussed above, $\mathbb{E}(X) = \dfrac{1}{2}$
  • If there are two uniform random variables, i.e., you throw two darts at the interval $[0,1]$, then the expected position of the smaller dart is $1/3$ and the expected position of the larger dart is $2/3$. These can be written as $\mathbb{E}(\min(X_1, X_2)) = 1/3$ and $\mathbb{E}(\max(X_1, X_2)) = 2/3$.
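As a check on the two-dart values, here is a short derivation that uses only the density, distribution, and expectation definitions above. It assumes the two darts $X_1, X_2$ are independent, and it uses the identity $\min(X_1, X_2) + \max(X_1, X_2) = X_1 + X_2$.

```latex
\begin{align*}
\Pr(\max(X_1, X_2) \leq x) &= \Pr(X_1 \leq x)\,\Pr(X_2 \leq x) = x^2, \qquad 0 \leq x \leq 1 \\
f_{\max}(x) &= \frac{d}{dx}\,x^2 = 2x \\
\mathbb{E}(\max(X_1, X_2)) &= \int_0^1 x \cdot 2x \,dx = \frac{2}{3} \\
\mathbb{E}(\min(X_1, X_2)) &= \mathbb{E}(X_1) + \mathbb{E}(X_2) - \mathbb{E}(\max(X_1, X_2)) = 1 - \frac{2}{3} = \frac{1}{3}
\end{align*}
```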

The algorithm's output is $\dfrac{1}{\min h(x_i)} - 1$. We claim that if $X = \min h(x_i)$, then

$$\mathbb{E}(X) = \frac{1}{\text{number of distinct elements} + 1}$$

Proof: Let $t$ be the number of distinct elements.

A key observation: repeated elements always hash to the same value, so they don’t produce new hash values. Even if an element appears hundreds of times in the stream, it contributes exactly one hash value to the pool. This means that across the entire stream, you compute exactly $t$ distinct hash values, one for each distinct element, which is why the product below runs to $t$ rather than $m$.

We want to compute $\mathbb{E}(X)$. One useful formula for the expectation of a non-negative random variable is $\mathbb{E}(X) = \int_0^\infty \Pr(X > x)\,dx$, which is the continuous analog of the discrete formula $\mathbb{E}(X) = \sum_{k \geq 0} \Pr(X > k)$ for non-negative integer-valued $X$.
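As a quick sanity check of this tail formula, applying it to a single uniform $[0,1]$ random variable recovers the value $\frac{1}{2}$ stated earlier:

```latex
\begin{align*}
\Pr(X > x) &= 1 - x \quad (0 \leq x \leq 1), \qquad \Pr(X > x) = 0 \quad (x > 1) \\
\mathbb{E}(X) &= \int_0^1 (1 - x)\,dx = \left[ x - \frac{x^2}{2} \right]_0^1 = \frac{1}{2}
\end{align*}
```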

So it suffices to compute $\Pr(X > x) = \Pr(\min h(x_i) > x)$.

If the minimum of all hash values is greater than $x$, that means every hash value is greater than $x$:

$$\Pr(X > x) = \Pr(\min h(x_i) > x) = \prod_{i=1}^{t} \Pr(h(x_i) > x)$$

The hash values are independent (each element’s hash is chosen independently), so the probability that all are above $x$ is the product of the individual probabilities. For any one uniform $[0,1]$ random variable, $\Pr(h(x_i) > x) = 1 - x$ for $0 \leq x \leq 1$ (since the length of the interval to the right of $x$ is $1 - x$). Multiplying $(1-x)$ by itself $t$ times gives $\Pr(X > x) = (1-x)^t$. Substituting into the tail formula for $\mathbb{E}(X)$, and noting that $\Pr(X > x) = 0$ for $x \geq 1$:

$$\mathbb{E}(X) = \int_0^1 (1-x)^t\,dx = -\frac{(1-x)^{t+1}}{t+1}\Bigg|_0^1 = \frac{1}{t+1}$$

Therefore $\mathbb{E}(X) = \mathbb{E}(\min h(x_i)) = \dfrac{1}{t+1}$.
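The result $\mathbb{E}(\min h(x_i)) = \frac{1}{t+1}$ can also be checked empirically. The following is a minimal Monte Carlo sketch, assuming Python's `random.Random` is an adequate stand-in for independent uniform $[0,1]$ hash values; the function name `mean_min_of_uniforms` and the trial count are illustrative choices.

```python
import random


def mean_min_of_uniforms(t, trials=20000, seed=0):
    """Average, over many trials, of the minimum of t independent uniforms."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += min(rng.random() for _ in range(t))
    return total / trials


# Compare the simulated mean with the predicted value 1 / (t + 1).
for t in (1, 2, 5, 10):
    print(t, round(mean_min_of_uniforms(t), 4), round(1 / (t + 1), 4))
```

The two printed columns should agree up to sampling noise, which shrinks as `trials` grows.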

We can also substitute this expected value for $\min h(x_i)$ in the output formula, which recovers the number of distinct elements:

$$\frac{1}{x} - 1 \;\to\; \frac{1}{1/(t+1)} - 1 = t$$

The algorithm is called “ideal” because the hash function it requires, one that maps to the continuous interval $[0,1]$, cannot actually be implemented. Storing a real number with full precision would require infinitely many bits, which defeats the purpose of a space-efficient streaming algorithm. The Flajolet-Martin algorithm described next resolves this by hashing to integers and working with their bit representations instead.

Here is an overview of the algorithm:

  • Choose a hash function $h$ that maps elements to integers from $0$ to $n - 1$.
  • For each element $x_i$ in the stream:
    • Compute $h(x_i)$
    • Represent $h(x_i)$ in binary notation
    • Find the position of the least significant 1-bit in this bit vector
    • Keep track of the maximum such position $P$ encountered so far
  • After all elements are processed, output $2^{P+1}$
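The steps above can be sketched in Python as follows. Several details are assumptions made for illustration rather than part of the lecture: the class name `FlajoletMartin`, the use of keyed BLAKE2b from `hashlib` as a stand-in for a randomly chosen hash function, a 32-bit hash range instead of $0$ to $n - 1$, and the convention for an all-zero hash value, which has no 1-bit.

```python
import hashlib


def trailing_zeros(v):
    """Position of the rightmost 1-bit of a positive integer v."""
    return (v & -v).bit_length() - 1


class FlajoletMartin:
    """Single-counter Flajolet-Martin sketch (illustrative only)."""

    def __init__(self, seed=0, bits=32):
        self._key = seed.to_bytes(8, "big")  # the seed selects the hash function
        self._bits = bits
        self.max_pos = -1                    # -1 means no element seen yet

    def _h(self, x):
        digest = hashlib.blake2b(str(x).encode(), digest_size=8, key=self._key).digest()
        return int.from_bytes(digest, "big") % (1 << self._bits)

    def add(self, x):
        v = self._h(x)
        # Assumed convention: an all-zero hash counts as position `bits`.
        pos = self._bits if v == 0 else trailing_zeros(v)
        self.max_pos = max(self.max_pos, pos)

    def estimate(self):
        if self.max_pos < 0:
            return 0  # empty stream
        return 2 ** (self.max_pos + 1)


fm = FlajoletMartin(seed=42)
for x in [1, 8, 1, 8, 3, 3, 7]:
    fm.add(x)
print(fm.estimate())
```

Only the single integer `max_pos` is stored between elements, which is what makes the approach space-efficient.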

The Least Significant Bit (LSB) position, as used here, refers to the position of the rightmost 1-bit — equivalently, the number of trailing zeros in the binary representation. (In standard terminology this is sometimes called the “lowest set bit” position.)

Example. Consider the bit vector $1\ 0\ 0\ 0\ 1\ 0\ 0\ 0$: reading positions from the right starting at 0, the rightmost 1 is at position 3, so the LSB position is 3.

The counter $P$ tracks the largest LSB position seen across all elements so far in the stream.

Since a hash function is chosen randomly, we can think of each hash value as a random bit vector. It follows that each bit can be a 0 or 1 with equal probability, like a coin flip.

If the stream has $t$ distinct elements, then we compute $t$ hash values in total (repeated elements hash to the same value).

We want to find a pattern in the position of the LSB for these $t$ values. Since each bit in a random hash value is independently 0 or 1 with equal probability, we can count how many elements land at each LSB position:

  • About $t/2$ of them end in $1$ (last bit is 1): LSB exactly at position 0.
  • About $t/4$ of them end in $1\ 0$ (second-to-last bit is 1, last bit is 0): LSB exactly at position 1.
  • About $t/8$ of them end in $1\ 0\ 0$: LSB exactly at position 2.
  • In general, about $t/2^{P+1}$ elements have their LSB exactly at position $P$.

Each position is half as common as the one before it: the probability of ending in a 1 followed by exactly $P$ zeros (that is, having exactly $P$ trailing zeros) is $(1/2)^{P+1}$.
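The halving pattern can be observed directly by drawing random bit vectors and tallying their LSB positions. This is a minimal sketch, assuming `random.Random.getrandbits` is an acceptable model of a random hash value; the function name `lsb_position_fractions` and the 32-bit width are illustrative choices.

```python
import random
from collections import Counter


def lsb_position_fractions(t, bits=32, seed=0):
    """Fraction of t random bit vectors whose rightmost 1-bit is at each position."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(t):
        v = rng.getrandbits(bits)
        if v == 0:
            continue  # an all-zero vector has no 1-bit; skipped here
        counts[(v & -v).bit_length() - 1] += 1
    return {p: counts[p] / t for p in sorted(counts)}


observed = lsb_position_fractions(100_000)
# Columns: position, observed fraction, predicted (1/2)^(P+1).
for p in range(5):
    print(p, round(observed.get(p, 0.0), 4), 0.5 ** (p + 1))
```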

As $P$ increases, the expected number of elements with LSB exactly at position $P$ gets smaller and smaller. There is a transition point where this expected count crosses 1:

  • For positions well below the transition, many elements land there, so the running maximum will certainly rise at least that high.
  • For positions well above the transition, essentially no elements land there, so the maximum won’t reach that far.

The maximum therefore concentrates right near the transition, the largest $P$ where we still expect at least one element. Setting $t / 2^{P+1} \approx 1$ and solving gives $t \approx 2^{P+1}$, which is exactly the output of the algorithm.
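As an illustrative worked example (the value $t = 1024$ is chosen here only because it makes the arithmetic clean), the expected counts around the transition are:

```latex
\begin{align*}
P = 8 &: \quad t / 2^{P+1} = 1024 / 512 = 2 \\
P = 9 &: \quad t / 2^{P+1} = 1024 / 1024 = 1 \\
P = 10 &: \quad t / 2^{P+1} = 1024 / 2048 = 0.5 \\
\text{transition at } P = 9 &: \quad 2^{P+1} = 2^{10} = 1024 = t
\end{align*}
```

In an actual run the maximum is random, so it may land a few positions above or below this transition.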

Note that the algorithm has high variance: different random choices of the hash function can produce quite different outputs on the same stream. The output is also always a power of 2, so if the true answer is 6, the algorithm can never output exactly that.

The output of the algorithm is within a factor of 32 of $t$:

$$\frac{t}{32} \leq \text{Output} \leq 32 \cdot t$$

This is a factor-32 approximation. Because the output depends on the random choice of hash function, the guarantee is probabilistic rather than absolute. The proof of this guarantee is deferred.

We want to get from a factor-32 approximation down to a $(1 \pm \varepsilon)$ relative error approximation. The idea is to run many copies of the algorithm at different granularity levels, and then ask the appropriate copy for the answer.