Lecture 20 (04/22/2026) - Finish Counting Distinct Elements; Uniform RV; Introduce Flajolet-Martin | CSCI 328

Scribes: Malaika Khan and Saartaj Alam

Summary of the Lecture

The idealized algorithm for the counting distinct elements problem
Properties of uniform random variables and how they are used to analyze the distinct elements problem
Analysis of the Flajolet-Martin algorithm

Counting Distinct Elements

Problem: Given a stream $\{x_1, \ldots, x_m\}$ with $x_i \in \{1, \ldots, n\}$, output $F_0$, the number of distinct elements that have appeared in the stream so far.
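For reference, the trivial exact solution remembers every element it has seen, so its memory grows with the number of distinct elements (up to $n$). The streaming algorithms below exist to avoid this. A minimal sketch (the function name is hypothetical):

```python
def exact_distinct_count(stream):
    """Exact F_0: store every distinct element seen so far in a set."""
    seen = set()
    for x in stream:
        seen.add(x)  # repeated elements do not grow the set
    return len(seen)


print(exact_distinct_count([1, 8, 8, 1, 3, 8, 1]))  # 3
```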

Real World Application: Imagine a cell tower with users connecting and disconnecting. How many distinct people have connected to the cell tower?
We will first look at an idealized algorithm and then a practical algorithm!

Ideal Algorithm

Use a hash function $h : \{1, \ldots, n\} \to [0,1]$ where $[0,1]$ is the continuous interval.

When we say the hash function is “random,” we mean that for any given input, its output value is equally likely to be any point in $[0,1]$ — like throwing a dart at the interval at random. But the hash function is also consistent: if element 5 hashes to 0.31, it will always hash to 0.31 no matter how many times it appears in the stream. The randomness is in how the function was chosen at the start, not in how it evaluates each time.

We hash each $x_i$ to get $0 \leq h(x_i) \leq 1$
We only maintain the smallest hash value we got so far, i.e., $\min h(x_i)$
To get the number of distinct elements so far, we output $\dfrac{1}{\min h(x_i)} - 1$
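The three steps above can be simulated as follows. This is a sketch under a loud assumption: the "random" hash $h$ is modeled by drawing a uniform value the first time an element is seen and caching it, so repeated elements always get the same value. That cache uses memory proportional to the number of distinct elements, so it only simulates the ideal hash and is not itself space-efficient. The class and method names are hypothetical.

```python
import random


class IdealDistinctCounter:
    """Simulation of the idealized min-hash estimator for F_0."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._hash_table = {}  # stands in for the fixed random function h
        self.min_hash = 1.0    # smallest hash value seen so far

    def h(self, x):
        # Draw h(x) once and reuse it: the randomness is in choosing h,
        # not in evaluating it. 1 - random() lies in (0, 1], so the
        # minimum is never exactly 0 and the division below is safe.
        if x not in self._hash_table:
            self._hash_table[x] = 1.0 - self._rng.random()
        return self._hash_table[x]

    def add(self, x):
        # Only the running minimum of the hash values is maintained.
        self.min_hash = min(self.min_hash, self.h(x))

    def estimate(self):
        # Output 1 / min h(x_i) - 1 (this gives 0 before any element arrives).
        return 1.0 / self.min_hash - 1.0


counter = IdealDistinctCounter(seed=0)
for x in [1, 8, 8, 1, 8]:
    counter.add(x)
print(counter.estimate())  # a noisy estimate of 2
```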

To see why the output makes sense, suppose the stream contains only 2 distinct elements (say, 1 and 8 appearing repeatedly). Since repeated elements always hash to the same value, only 2 distinct hash values are ever computed throughout the entire stream. The expected minimum of 2 independent uniform $[0,1]$ variables is $\frac{1}{3}$, so plugging the expected minimum into the output formula gives $\frac{1}{1/3} - 1 = 3 - 1 = 2$. More generally, for $t$ distinct elements, the expected minimum hash value is $\frac{1}{t+1}$, and plugging it in gives $\frac{1}{1/(t+1)} - 1 = t$. The proof below establishes the claim about the expected minimum.

To analyze this algorithm, we need to understand how the minimum of several uniform random variables behaves, since each hash value $h(x_i)$ acts like a uniform $[0,1]$ random variable. The following sections build up that understanding.

Continuous Random Variable

A continuous random variable is a random variable that takes values in a continuum.

It can take on any value in a specified interval. Because there are so many possible values and the total probability must still be 1, any single value has probability 0; only intervals of values have positive probability.
We can describe a continuous random variable by its density $f$ or by its distribution $F$. The density is probability per unit length: for a small interval of width $dx$, $f(x)\,dx \approx \Pr(x \leq X \leq x + dx)$. The distribution is $F(x) = \Pr(X \leq x)$.

Uniform Random Variable

A uniform random variable is a continuous random variable where $X$ can take any value in $[0,1]$ , with every part of the interval equally likely. This equal-likelihood is what the density $f(x) = 1$ for all $x \in [0,1]$ encodes: no region is favored over any other of equal length.

Density: $f(x) = 1 \;\forall\; x \in [0,1]$
Distribution:

F(x) = \Pr(X \leq x) = \begin{cases} 0 & x < 0 \\ x & 0 \leq x \leq 1 \\ 1 & x > 1 \end{cases}
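The distribution is the accumulated density. For the uniform case this gives the "probability equals length" rule that the analysis below relies on:

F(x) = \int_{-\infty}^{x} f(u)\,du = \int_0^x 1\,du = x \quad (0 \leq x \leq 1)

\Pr(a \leq X \leq b) = F(b) - F(a) = b - a \quad (0 \leq a \leq b \leq 1)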

Expectation of Uniform Random Variables

The general formula is $\mathbb{E}(X) = \int x f(x)\,dx$ .

For our discussed uniform random variable, $\mathbb{E}(X) = \dfrac{1}{2}$
If there are two independent uniform random variables, as if you threw two darts at the interval $[0,1]$, then the expected position of the left dart (the smaller value) is $1/3$ and the expected position of the right dart (the larger value) is $2/3$. In symbols, $\mathbb{E}(\min(X_1, X_2)) = 1/3$ and $\mathbb{E}(\max(X_1, X_2)) = 2/3$.
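These values follow from the general formula $\mathbb{E}(X) = \int x f(x)\,dx$, assuming $X_1$ and $X_2$ are independent. The maximum is at most $x$ exactly when both values are at most $x$, and the minimum exceeds $x$ exactly when both values exceed $x$; differentiating each distribution gives its density:

\mathbb{E}(X) = \int_0^1 x \cdot 1\,dx = \frac{1}{2}

\Pr(\max(X_1, X_2) \leq x) = x^2 \;\Rightarrow\; f_{\max}(x) = 2x, \quad \mathbb{E}(\max(X_1, X_2)) = \int_0^1 x \cdot 2x\,dx = \frac{2}{3}

\Pr(\min(X_1, X_2) \leq x) = 1 - (1-x)^2 \;\Rightarrow\; f_{\min}(x) = 2(1-x), \quad \mathbb{E}(\min(X_1, X_2)) = \int_0^1 2x(1-x)\,dx = 2\left(\frac{1}{2} - \frac{1}{3}\right) = \frac{1}{3}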

Proof that the Output is Correct

Recall that the output is $\dfrac{1}{\min h(x_i)} - 1$. We claim that if $X = \min h(x_i)$, then

\mathbb{E}(X) = \frac{1}{\text{number of distinct elements} + 1}

Proof: Let $t$ = number of distinct elements.

A key observation: repeated elements always hash to the same value, so they don’t produce new hash values. Even if an element appears hundreds of times in the stream, it contributes exactly one hash value to the pool. This means across the entire stream, you compute exactly $t$ distinct hash values — one for each distinct element — which is why the product below runs to $t$ rather than $m$ .

We want to compute $\mathbb{E}(X)$. One useful formula for the expectation of a non-negative random variable is $\mathbb{E}(X) = \int_0^\infty \Pr(X > x)\,dx$, which is the continuous analog of the discrete formula $\mathbb{E}(X) = \sum_{k \geq 0} \Pr(X > k)$ for non-negative integer-valued $X$.
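As a sanity check, apply this formula to a single uniform $[0,1]$ random variable, for which $\Pr(X > x) = 1 - x$ when $0 \leq x \leq 1$ and $\Pr(X > x) = 0$ when $x > 1$:

\mathbb{E}(X) = \int_0^\infty \Pr(X > x)\,dx = \int_0^1 (1 - x)\,dx = 1 - \frac{1}{2} = \frac{1}{2}

This matches the value $\mathbb{E}(X) = \frac{1}{2}$ obtained from the density.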

So it suffices to compute $\Pr(X > x) = \Pr(\min h(x_i) > x)$ .

If the minimum of all hash values is greater than $x$ , that means every hash value is greater than $x$ :

\Pr(X > x) = \Pr(\min h(x_i) > x) = \prod_{i=1}^{t} \Pr(h(x_i) > x)

The hash values are independent (each element’s hash is chosen independently), so the probability that all are above $x$ is the product of the individual probabilities. For any one uniform $[0,1]$ random variable, $\Pr(h(x_i) > x) = 1 - x$ (since the length of the interval to the right of $x$ is $1 - x$ ). Multiplying $(1-x)$ by itself $t$ times gives $\Pr(X > x) = (1-x)^t$ . Substituting into $\mathbb{E}(X) = \int \Pr(X > x)\,dx$ :

\mathbb{E}(X) = \int_0^1 (1-x)^t\,dx = -\frac{(1-x)^{t+1}}{t+1}\Bigg|_0^1 = 0 - \left(-\frac{1}{t+1}\right) = \frac{1}{t+1}

Therefore $\mathbb{E}(X) = \mathbb{E}(\min h(x_i)) = \dfrac{1}{t+1}$.
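A quick Monte Carlo check of $\mathbb{E}(\min h(x_i)) = \frac{1}{t+1}$, modeling the $t$ distinct hash values as independent uniform $[0,1]$ draws from Python's `random` module (an assumption standing in for the random hash; the function name is hypothetical):

```python
import random


def average_min_of_uniforms(t, trials=100_000, seed=0):
    """Monte Carlo estimate of E[min(U_1, ..., U_t)] for i.i.d. uniform [0,1] U_i."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # One trial: the t distinct hash values, of which we keep the minimum.
        total += min(rng.random() for _ in range(t))
    return total / trials


for t in [1, 2, 5, 10]:
    # Compare the simulated average against the exact value 1 / (t + 1).
    print(t, round(average_min_of_uniforms(t), 3), round(1 / (t + 1), 3))
```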

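Substituting this expected value for $\min h(x_i)$ in the output formula recovers the number of distinct elements:

\frac{1}{\min h(x_i)} - 1 \;\to\; \frac{1}{1/(t+1)} - 1 = t

Strictly speaking, this plugs the expected minimum into the output formula. Since $\mathbb{E}\left(\frac{1}{X}\right)$ is not the same as $\frac{1}{\mathbb{E}(X)}$, this is a heuristic justification of the output rather than a proof that its expectation is exactly $t$.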

Why “Ideal”?

The algorithm is called “ideal” because the hash function it requires — one that maps to the continuous interval $[0,1]$ — cannot actually be implemented. Storing a real number with full precision would require infinitely many bits, which defeats the purpose of a space-efficient streaming algorithm. The Flajolet-Martin algorithm described next resolves this by hashing to integers and working with their bit representations instead.

Flajolet-Martin Algorithm

The Algorithm

Here is an overview of the algorithm:

Choose a hash function $h$ that maps elements to integers from $0$ to $n - 1$ .
For each element $x_i$ in the stream:
- Compute $h(x_i)$
- Represent $h(x_i)$ in binary notation
- Find the position of the least significant 1-bit in this bit vector (its number of trailing zeros)
- Keep track of the maximum such position $P$ encountered so far
After all elements are processed, output $2^{P+1}$
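The steps above can be sketched in Python as follows. The lecture does not fix a hash family, so as an assumption we substitute a keyed BLAKE2b digest from `hashlib`, where the key derived from `seed` plays the role of the random choice made once at the start, and we take $n = 2^{32}$. The names `make_hash`, `lsb_position`, and `flajolet_martin` are hypothetical. Apart from the hash key, the only state kept across the stream is the integer $P$.

```python
import hashlib

NUM_BITS = 32  # assumed: hash values lie in [0, 2**NUM_BITS), i.e. n = 2**32


def make_hash(seed):
    """Hypothetical stand-in for a randomly chosen hash function h."""
    key = seed.to_bytes(8, "big")  # the 'random choice' of h, fixed up front

    def h(x):
        digest = hashlib.blake2b(repr(x).encode(), digest_size=8, key=key).digest()
        return int.from_bytes(digest, "big") % (2 ** NUM_BITS)

    return h


def lsb_position(v):
    """Position of the rightmost 1-bit of v, i.e. its number of trailing zeros."""
    if v == 0:
        return NUM_BITS  # no 1-bit at all; treat as the largest possible position
    return (v & -v).bit_length() - 1  # v & -v keeps only the rightmost 1-bit


def flajolet_martin(stream, seed=0):
    h = make_hash(seed)
    P = -1  # maximum LSB position seen so far; -1 means "no elements yet"
    for x in stream:
        P = max(P, lsb_position(h(x)))
    # Output 2^(P+1); returning 0 for an empty stream is our own convention.
    return 2 ** (P + 1) if P >= 0 else 0
```

For example, `flajolet_martin(range(1000), seed=1)` returns some power of 2; which one depends on the seed, that is, on which hash function was "chosen".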

The Least Significant Bit

The Least Significant Bit (LSB) position, as used here, refers to the position of the rightmost 1-bit — equivalently, the number of trailing zeros in the binary representation. (In standard terminology this is sometimes called the “lowest set bit” position.)

Example. Consider the bit vector $1\ 0\ 0\ 0\ 1\ 0\ 0\ 0$ : reading positions from the right starting at 0, the rightmost 1 is at position 3, so the LSB position is 3.
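The same example in code, computed two ways (function names are hypothetical): by reading a bit string from the right, and by the integer trick `v & -v`, which keeps only the rightmost 1-bit of a positive integer.

```python
def lsb_position_from_string(bits):
    """Index (from the right, starting at 0) of the rightmost '1' in a bit string."""
    if "1" not in bits:
        raise ValueError("bit vector contains no 1-bit")
    # Stripping the trailing zeros shortens the string by exactly the LSB position.
    return len(bits) - len(bits.rstrip("0"))


def lsb_position_from_int(v):
    """Same quantity for a positive integer v."""
    if v <= 0:
        raise ValueError("expected a positive integer")
    return (v & -v).bit_length() - 1


print(lsb_position_from_string("10001000"))  # 3
print(lsb_position_from_int(0b10001000))     # 3
```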

The counter $P$ tracks the largest LSB position seen across all elements so far in the stream.

The Output

Because the hash function is chosen at random, we can treat each hash value as a random bit vector (taking $n$ to be a power of 2, so that every bit pattern is possible): each bit is independently 0 or 1 with equal probability, like a coin flip.

If the stream has $t$ distinct elements, then we compute $t$ hash values in total (repeated elements hash to the same value).

We want to find a pattern in the position of the LSB for these $t$ values. Since each bit in a random hash value is independently 0 or 1 with equal probability, we can count how many elements land at each LSB position:

About $t/2$ of them end in $1$ (last bit is 1) — LSB exactly at position 0.
About $t/4$ of them end in $1\ 0$ (second-to-last bit is 1, last bit is 0) — LSB exactly at position 1.
About $t/8$ of them end in $1\ 0\ 0$ — LSB exactly at position 2.
In general, about $t/2^{P+1}$ elements have their LSB exactly at position $P$ .

Each position is half as common as the one before it: the probability of ending in exactly $P$ zeros followed by a 1 is $(1/2)^{P+1}$ .
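A small simulation of this pattern, modeling the $t$ distinct hash values as independent random 32-bit integers (an assumption standing in for the random hash; the function name is hypothetical). The printed counts can be compared against the predicted $t/2^{P+1}$:

```python
import random
from collections import Counter


def lsb_histogram(t, num_bits=32, seed=0):
    """Count how many of t random num_bits-bit values have their
    rightmost 1-bit at each position."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(t):
        v = rng.getrandbits(num_bits)
        if v == 0:
            continue  # no 1-bit; vanishingly rare for 32-bit values
        counts[(v & -v).bit_length() - 1] += 1
    return counts


t = 10_000
hist = lsb_histogram(t)
for P in range(6):
    print(P, hist[P], t / 2 ** (P + 1))  # observed count vs. predicted count
```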

As $P$ increases, the expected number of elements with LSB exactly at position $P$ gets smaller and smaller. There is a transition point where this expected count crosses 1:

For positions well below the transition, many elements land there, so the running maximum will almost certainly rise at least that high.
For positions well above the transition, essentially no elements land there, so the maximum won’t reach that far.

The maximum therefore concentrates right near the transition — the largest $P$ where we still expect at least one element. Setting $t / 2^{P+1} \approx 1$ and solving gives $t \approx 2^{P+1}$ , which is exactly the output of the algorithm.
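As a worked instance of this heuristic (the number is chosen only for illustration), take $t = 1000$. The expected count at a position falls below 1 between $P = 8$ and $P = 9$:

\frac{1000}{2^{8+1}} = \frac{1000}{512} \approx 1.95, \qquad \frac{1000}{2^{9+1}} = \frac{1000}{1024} \approx 0.98

So the heuristic places the maximum near $P = 9$ and predicts an output near $2^{10} = 1024$, close to the true count of 1000. An individual run can land a position or two away from this, which changes the output by factors of 2.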

The algorithm has high variance: different random choices of the hash function can give quite different outputs on the same stream. Its output is also coarse: it is always a power of 2, so if the true answer is 6, the algorithm can never output it exactly.
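Both effects show up if we run the algorithm many times on the same stream with independently chosen random hash functions. In this sketch a random hash is simulated by giving each distinct element fresh random bits on first sight and caching them, which models the idealized random hash but is not space-efficient; `fm_estimate` is a hypothetical name.

```python
import random
from collections import Counter


def fm_estimate(stream, seed, num_bits=32):
    """One Flajolet-Martin run with a freshly simulated random hash."""
    rng = random.Random(seed)
    h = {}  # cached hash values, so repeated elements hash identically
    P = -1
    for x in stream:
        if x not in h:
            h[x] = rng.getrandbits(num_bits)
        v = h[x]
        pos = (v & -v).bit_length() - 1 if v else num_bits
        P = max(P, pos)
    return 2 ** (P + 1) if P >= 0 else 0


stream = list(range(1000)) * 3  # t = 1000 distinct elements, each repeated
outputs = Counter(fm_estimate(stream, seed) for seed in range(200))
for value, freq in sorted(outputs.items()):
    print(value, freq)  # several different powers of 2 appear
```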

Guarantees

With constant probability, the output of the algorithm is between $t/32$ and $32 \cdot t$:

\frac{t}{32} \leq \text{Output} \leq 32 \cdot t

This is a factor-32 approximation: with constant probability, the output is within a factor of 32 of the true answer. Because the hash function is random, no deterministic guarantee is possible. The proof of this guarantee is deferred.
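An empirical check is no substitute for the deferred proof, but it shows how often a run actually lands in this range. The sketch below uses the fact that only the $t$ distinct hash values matter, so it draws $t$ random 32-bit values per run directly (an idealized random hash, which is an assumption) and reports the fraction of runs within a factor of 32. The function names are hypothetical.

```python
import random


def fm_output(t, rng, num_bits=32):
    """Output of one run on a stream with t >= 1 distinct elements."""
    P = -1
    for _ in range(t):
        v = rng.getrandbits(num_bits)  # one distinct element's hash value
        pos = (v & -v).bit_length() - 1 if v else num_bits
        P = max(P, pos)
    return 2 ** (P + 1)


def fraction_within_factor(t, factor=32, trials=500, seed=0):
    """Fraction of independent runs whose output lies in [t/factor, factor*t]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        out = fm_output(t, rng)
        if t / factor <= out <= factor * t:
            hits += 1
    return hits / trials


print(fraction_within_factor(1000))
```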

Improvements

We want to get from a factor-32 approximation down to a $(1 \pm \varepsilon)$ relative error approximation. The idea is to run many copies of the algorithm at different granularity levels, and then ask the appropriate copy for the answer.