Welcome everyone, Algorithms for Big Data.

Today should hopefully be the last lecture
on Streaming Algorithms.

This is the overview of the course so far.

Okay,

so I'm trying to find the pointer but I
can't get the pointer.

All right, so this is the course so far.

I will maybe use red.

All right, so when I do this, that's my
pointer, and then I will undo it.

I don't have the laser pointer for some
reason; because I'm using the cable,

it won't let me have the
laser pointer.

So overview of the course so far.

We saw the dictionary problem as on the
left you see, we saw four algorithms for it.

Then we saw the approximate membership
problem for which we saw the bloom filter.

On the right side here are
all the probability related stuff

that we that we learned while
understanding these things.

Then we moved on to streaming algorithms
where we saw the problems of sampling,

counting, approximate median, heavy hitters,
for which we used the count-min sketch.

In the class before last
and in last class we finished

algorithms for the
frequency moment estimation.

That was the AMS sampling: Alon, Matias,
Szegedy.

And for this we also used Chernoff for
non-Bernoulli random variables, that is, when the random

variables take not just zero-one values
but values in a larger range.

And I had briefly started the topic of
counting distinct elements last time.

And the hope is to finish this problem
today, counting distinct elements.

And that will be the end of this streaming
algorithm section of this course.

Right.

So let's move on to where we were about
distinct elements.

Here.

Alright, so what is the problem?

Hopefully everyone remembers.

The problem is you'll be, there'll be a
stream.

And the stream will have repetitions
typically, right.

So there'll be numbers in the stream,
they might be repeated.

Any number that appears in the stream will
be some number between one and n.

But they could be repeated and not all
numbers from one through n will appear.

And your job is to count how many distinct
elements have appeared in the stream.

Is the problem statement clear?

Right, just how many distinct elements
have appeared in the stream?

If you could store the stream,
this is a trivial problem, right?

It's just like removing
duplicates.

But here you just have to count.

I'm not saying store the set of all
distinct elements.

I'm just asking you to count how many
distinct elements, uh, there have been.

Right?

Uh, practical applications of this problem,
uh, we actually just submitted a paper.

Uh, imagine you are working
for Verizon or something,

you have control of, you
know, cell phone towers.

And people keep moving in and out of your
cell phone tower range.

And someone may live there, so they'll be
there all the time.

And you just want to count
how many different distinct

people have ever come in the
range of this cell phone tower.

Right?

Just to find the load.

So you just want to count how many different,
uh, elements there are in the stream.

And this is a measure of how many
different users have either been in an

area at some point of time, or have used
some service at some point of time.

Right?

To just find out how popular something is.

Basically the same user going again and
again will not be able to fool,

uh, this algorithm.

Right?

Any questions about the problem statement
or motivation?

If not, uh, first here is an ideal
algorithm for, uh, this problem.

And remember, when I say we have a hash
function, the hash function takes in an,

uh, input item and outputs a number
somewhere.

And it's a random hash function in the
sense that the output for any input value

will be chosen randomly among all possible
outputs.

But if you ask the same value
again, the same input, if you

apply it to the hash function,
you'll get the same output.

Does this make sense?

Like if I apply the same input to a hash
function H, it will give me the same value.

The hash values won't change for the same
input.

But when I say the hash function is
random, what it means is that for any,

any given input, its
output value is equally

likely to be any of the
possible output values.

But that doesn't mean
that every time I'm feeding

this input, you're choosing
a random value, right?

That random value is fixed for the input
once and for all.

Like H of Xi does not change during the
algorithm.

Is this point clear or should I repeat it?

Okay, so hopefully it's clear.

So here's the ideal algorithm.

Imagine that you had a hash function
that hashed to the unit interval 0, 1.

So this is the interval, the red interval
of all numbers between 0 and 1.

So in other words, what is this hash
function doing?

This hash function H, you can feed it any
number between 1 and N, that is its domain.

And where will it map this number to?

It will map this number to some random
point between 0 and 1.

Right?

So think of it as throwing a dart randomly
between the interval 0 and 1.

And now the algorithm is the following.

Once you have such a hash function,
when your stream appears, you hash each Xi.

So you hash each element of the stream.

And you'll get some value H of Xi between
0 and 1.

But you don't remember all the hashes of
all the elements in the stream.

All you maintain is the smallest hash
value that you have seen so far.

Right?

So one by one, the items appear in your
stream, go away.

All you have to do is hash each item.

And if its hash value is smaller than the
smallest hash value you have so far,

then you replace the smallest hash value
with the hash value of this item.

Otherwise, you keep it the same.

And then at any point in the stream,
if someone asks you, hey, what is the

number of distinct elements in the stream
so far?

You output this number.

What is this number?

You first invert, what?

You do one over the smallest hash value.

And then you subtract one from the answer.

And that's your output.

Does this algorithm make sense first of
all?

If I gave you such a
hash function, would

you be able to implement
such an algorithm?

You just hash every element in the stream
and remember the smallest hash.

When asked for the answer, invert the
smallest hash value, right?

All these hash values are between zero and
one.

So when you do one divided by that, you
will get some number larger than one, right?

So then subtracting one makes sense,
right?

Because the number you're subtracting one
from is always larger than one.

Are these facts clear?

Yes.

Okay, good.
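The ideal algorithm can be sketched in code as follows. A true random hash into the real interval does not exist in practice, so this illustration makes an assumption: it uses SHA-256 and keeps 64 bits as a stand-in for a random point between 0 and 1. The names `unit_hash` and `IdealDistinctCounter` are invented for this example.

```python
import hashlib


def unit_hash(item, seed=0):
    """Deterministically map an item to a pseudo-random point in [0, 1].

    Stand-in for the idealized random hash: the same input always gives
    the same output. Using SHA-256 here is an assumption of this sketch.
    """
    digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


class IdealDistinctCounter:
    """Keeps only the smallest hash value seen so far."""

    def __init__(self, seed=0):
        self.seed = seed
        self.min_hash = 1.0  # nothing seen yet

    def add(self, item):
        self.min_hash = min(self.min_hash, unit_hash(item, self.seed))

    def estimate(self):
        # Invert the smallest hash value, then subtract one.
        if self.min_hash == 0.0:
            return float("inf")
        return 1.0 / self.min_hash - 1.0


counter = IdealDistinctCounter()
for x in [1, 8, 1, 8, 1, 8]:
    counter.add(x)
print(counter.estimate())
```

A single run gives a noisy estimate, since it depends on where the smallest hash happens to land; what the sketch does show is that the state is one number and that repeated items leave it unchanged.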

So now why is this ideal algorithm
correct?

Okay, so for that, last
class we had quickly started

on this
uniform random variable.

So that's the first and only continuous
random variable that we will need.

What is a continuous random variable?

Instead of taking a discrete set of values,
it takes values in a continuum, right?

It can take any possible value.

But because the total probability has to
be one for a continuous random variable,

if you allow it to take
a continuum of values,

well, there are infinitely
many such values.

So then for every value, the probability
will be zero, because there is no way you

can have uncountably many positive numbers
adding up to one.

All of them, or at least most of them, will have
to be zero.

So how do we define
what is a continuous,

how is a continuous
random variable defined?

You can define it in two ways,
either in terms of its density,

or in terms of its distribution.

And the density of the uniform random
variable is just this: f(x) is equal to one

for all x between zero and one.

So the density is defined by a function of
one variable.

And for the uniform random variable,
this is the density, this function,

which is the constant function one between
zero and one, and is zero everywhere else.

Right?

That's the density, small f.

And the distribution capital
F is... So before that, think

of the density as the probability
that your random variable

capital X lies between the small value x and
x plus dx.

Does this make sense what I've written?

This is the probability that
your random variable capital X

lies in a small segment of
length dx at the value small x.

That's basically what fx, small fx is,
the density.

And here, because this is a uniform random
variable, this is basically one.

Right?

For every value x, the
probability that capital

X is basically at small x
is the same for all values.

Does this make sense?

I mean, ideally, you should divide it by
dx, right?

Because this will be dx, and then you
divide by dx.

But division by dx is stretching our
intuitive notation about dx a bit too far.

Right?

If someone who knows calculus seriously
sees anyone dividing by dx, they will faint.

So we don't do these things.

This is just for your intuition,
what small f is for.

And then what is the distribution capital
F?

That is defined this way.

It is the probability that your random
variable is, at most, small x.

Right?

So this is the distribution of your random
variable, capital X.

And this is a function of small x.

And in this case, this function is pretty
simple.

This function is 0, if x is negative.

It is small x, if x is between 0 and 1.

And it is 1, if x is greater than 1.

All right?

Any questions about how
we define the density or

the distribution of a
uniform random variable?

Okay.
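For reference, the density and distribution just described can be written out as follows. This is only a restatement of the spoken definitions in symbols, with capital X the random variable and small x the point of evaluation.

```latex
f(x) =
\begin{cases}
1 & 0 \le x \le 1 \\
0 & \text{otherwise}
\end{cases}
\qquad
F(x) = \Pr[X \le x] =
\begin{cases}
0 & x < 0 \\
x & 0 \le x \le 1 \\
1 & x > 1
\end{cases}
```

Informally, f(x) dx is approximately the probability that X lands in a small segment of length dx starting at x.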

If not, then in last class,
we had also looked at how

to compute the expectation
of a uniform random variable.

And the expectation of a continuous random
variable.

It has two formulas.

This is the most well used formula.

You take the density small f,
you multiply it by small x, and

you integrate over all possible
values of the random variable.

In our case, for the uniform random
variable, small f is 1.

When I multiply 1 by x, I get x.

And my domain is ranging from 0 to 1.

So I integrate small x from 0 to 1,
I get x squared over 2, put in the limits,

and I get half.

Which is not surprising, because what is
the uniform random variable between 0 and 1?

Between 0 and 1, you're selecting one
point uniformly at random.

So what is the expected value of your
point?

It is half just by symmetry.

Right?

But in general, if you have a more
complicated small f, you use the same

formula to compute the expectation of a
random variable, if there are other

continuous random variables you are
interested in.

Good.
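Written out, the computation described above is the following one-line integral (a restatement of the spoken steps, nothing more).

```latex
\mathbb{E}[X] = \int_{-\infty}^{\infty} x \, f(x) \, dx
             = \int_{0}^{1} x \, dx
             = \left[ \frac{x^{2}}{2} \right]_{0}^{1}
             = \frac{1}{2}
```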

Now, this was about one uniform random
variable.

Let us say you have
two independent uniform

random variables X1 and
X2, both uniform on 0, 1.

In other words, you are throwing two darts
in the interval 0, 1.

What we can show is that the expected
value of the smaller of the two darts is

one third and the expected value of the
larger of the two darts is two thirds.

Does that make sense or no?

Yes, because we divided it into three parts.

Yes, right?

So that's the rough intuition.

And now, okay, so just
based on this fact, do people

see why this algorithm in
the previous page is correct?

Why it would give the number of distinct
elements?

If, if the stream, let's say the stream
only had two distinct elements, right?

So the stream was just like 1, 8,
1, 8, 1, 8, 1, 8, 1, 8, forever, right?

So there's only two distinct elements,
one and eight, and they're just repeated.

Then how many hash values will you calculate
throughout the course of the stream?

Two hash values.

Two hash values.

And you will keep the minimum of this,
the minimum of the two hash values, right?

And what is the expected minimum hash
value?

One third.

One third.

And so what will you output?

You will do one divided by one third,
which is three, and you will subtract one,

and you will output two, which is the
number of distinct elements.

Right?

If you had 10 distinct elements, then your
expected minimum would be one over 11.

You would invert the one over 11.

So you do one divided by one over 11.

That will be 11.

And you subtract one to give the 10.

Is the reason behind this minus one
becoming clear now?

Or is it still mysterious?

Why, why I'm subtracting one after
inverting the minimum hash value?

Okay.

So hopefully it is clear why we would
subtract one.

Can you explain quickly why it is
one third and two thirds?

Is it always like that
if there are two numbers?

Or is it like, for four variables,

it will be one fifth, two
fifths, three fifths, something like that?

So the minimum would be one fifth,
and the maximum would be four fifths.

Four fifths?

Yes.

The second smallest
would be two fifths, the

third smallest would be
three fifths, and so on.

Yeah.

So, so I, I will hopefully prove it for
you.

Uh, but, uh, intuitively if you throw two
darts, right?

You, like in this case, you divide your
interval into three pieces.

If you threw five darts, you would divide
the interval into six pieces, and the

expected minimum would be
one over six, and the

largest expected value
would be five over six.

But we'll see why.

I think we can prove this pretty easily.

But assuming this fact is true,
that if you throw eight darts,

then the expected minimum is one over
nine, uh, this algorithm becomes obvious,

right?

Why this algorithm is doing the right
thing.
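The dart-throwing fact can be checked empirically before it is proved. The sketch below is a small simulation, not part of the lecture: the function name `mean_order_statistics`, the trial count, and the seed are all choices made for this illustration. It throws t uniform darts many times and averages the k-th smallest position, which the claim predicts to be k over (t plus 1).

```python
import random


def mean_order_statistics(t, trials=100_000, seed=0):
    """Average the sorted positions of t uniform darts over many trials."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    sums = [0.0] * t
    for _ in range(trials):
        darts = sorted(rng.random() for _ in range(t))
        for k, value in enumerate(darts):
            sums[k] += value
    return [s / trials for s in sums]


t = 2
means = mean_order_statistics(t)
# The claim: the k-th smallest of t darts averages k / (t + 1).
for k, m in enumerate(means, start=1):
    print(f"k={k}: simulated {m:.3f}, predicted {k / (t + 1):.3f}")
```

With t equal to 2, the two averages should come out close to one third and two thirds; changing t to 5 should give values close to one sixth through five sixths.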

So here's what we have,
right?

We are outputting
one over the minimum of h of xi, minus one.

So, we are inverting the smallest hash
value that we see,

subtracting one,
and that's our output.

And now here is the claim.

It is about the smallest
hash value, in expectation.

So let us say I denote this
denominator by capital X.

So, let's say capital X is the smallest
hash value.

Okay.

Our algorithm is keeping track of
this capital X, right?

It's the smallest hash value we've seen so
far.

So, this claim is about
the expected value of capital

X, because capital X is a random variable,
right?

It's a hash.

The claim says that the expectation of
capital X is exactly one

over (the number of
distinct elements plus one).

So, like, if you threw three
points, then this would be one over four.

Is this fact, is the statement of the
claim clear?

Yes.

And then, if the statement
of the claim is clear, then it

is clear why this algorithm
is doing the right thing, right?

Because it's inverting X, thereby arriving
at the denominator, and then subtracting

one to give the number of distinct
elements, right?

All right.

So, now, let's go into the proof of this.

Why is it what I said,

if you throw that many points?

So, let's say T is the number of distinct
elements.

And we will, we will calculate.

So, the idea is to calculate the
expectation.

And now, first
of all, I just gave you a

formula for the expectation,
which was this, right?

For a continuous random variable,
I said the expectation can be

computed as this: multiply the density by
small x and integrate.

But there is another formula, which is
this.

So, the expectation of a nonnegative random variable
can also be computed by integrating, over small x, the

probability that capital X is greater than
small x.

Is this clear?

This actually was there also for the...
So, if some of you tried your exercises in

the textbook, this was also the case for
discrete random variables.

So, remember, for
discrete random variables,

the definition of
expectation was this, right?

That you multiply small
x with the probability

with which capital X
takes the value small x.

But you could also have done this.

That's also another
formula for computing the

expectation of a
discrete random variable.

And if you believe these two formulae for
the discrete random variable, then their

continuous analogs simply
replace the summation by an

integral, which is what
is on the left-hand side.

Okay?

Is this formula clear for expectation?

Okay?
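For reference, the two pairs of formulas being compared can be written side by side. One assumption is made explicit here that the spoken version leaves implicit: the tail formulas hold for nonnegative random variables (and, in the discrete case as written, for values 0, 1, 2, and so on), which is the situation for a hash value in the interval 0, 1.

```latex
% Discrete, X taking values in {0, 1, 2, ...}
\mathbb{E}[X] = \sum_{x \ge 0} x \cdot \Pr[X = x]
             = \sum_{x \ge 0} \Pr[X > x]

% Continuous, X >= 0 with density f
\mathbb{E}[X] = \int_{0}^{\infty} x \, f(x) \, dx
             = \int_{0}^{\infty} \Pr[X > x] \, dx
```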

So, if it is clear, then,
in order to compute the...

So, the claim is about the
expectation of something, right?

The expectation of capital X.

And the plan will be to
use this newly learned

formula for expectation
in order to compute it.

And so, what we really need to do is find
this probability.

Right?

So, now, that's happening here.

So, you have some value small x.

So, think of small x as some value in the
interval between 0 and 1.

And I'm asking you, what
is the probability that your

random variable capital
X is greater than small x?

And remember, what is capital X?

It is the smallest hash value you have
seen so far.

So, the smallest hash value that I have
seen so far is greater than small x.

And if I know that the
smallest is greater than

small x, what does it
tell me about the others?

Oh, Hanan, if you're speaking,
you're muted.

Do you want to look at it?

Because this is the smallest, you can do 1
minus the max?

Like, if you want to do that?

Because, technically, if you want to get
the max, isn't it?

Because the only... I'm keeping
track of the smallest hash value.

Okay.

Right?

And all I'm asking is, if I tell you that
the smallest hash value is greater than x,

meaning the smallest hash value is to the
right of this line...

It means that every value has to be also
to the right of that line.

It means that all of them have to be to
the right of that line.

Right?

And now, notice that all the hash values
are independent.

Right?

My h of xi is independent
of... I mean,

the hash of x1 and the
hash of x2 are independent.

The hash function is
choosing

independently a random value
between 0 and 1 for every input.

So the probability that all of them are to
the right of that line is actually,

by independence, the product
of the probability that each

one of them individually
is to the right of this line.

That's just by independence.

Does this make sense?

Okay.

And now, what is the probability that any
one of them is to the right of that line?

What's the probability that
when you choose a random

number between 0 and 1,
it is to the right of this line?

1 minus x.

1 minus x.

Right?

The whole interval is 1.

Now, what is the length of that interval?

It's 1 minus x.

So each of these things...

is 1 minus x.

And I'm multiplying them...

t times.

And that's what I
will plug into this

formula for expectation.

So I get that.

Is the first line clear?

Right?

In the formula for expectation, I had to
integrate

the probability that capital
X is greater than small x.

This is the probability that the minimum
hash value is greater than small x.

But if t is the number of distinct hash
values, the number of distinct elements,

then I will multiply this thing t times.

Is it clear why I'm
multiplying it t times and

not m times or the length
of stream many times?

Is this fact clear?

Even though I'm computing the hash every
time the stream has a new

element, if the element is repeated,
then nothing changes, right?

Because its hash value I have seen
before.

So a repeated element will not give me a
new hash value.

It's only a new element that will give me
a new hash value.

That's why this product
is from 1 to t, where

t is the number of
distinct elements, right?

Hopefully that fact is clear.

That's how t is coming into the picture,
right?

The number of
distinct elements, okay?

And then after this, it is basic calculus,
right?

If you believe the first line,
then I'm integrating 1 minus x to the t.

Again, t is a fixed number; the integral is
over x.

And when you integrate
1 minus x to the t, you

get minus (1 minus x) to the
(t plus 1), over (t plus 1).

And it's a definite integral evaluated between
0 and 1.

At 1 the value is
0, because of the 1 minus x, right?

In the numerator.

So when you put x equal to 1, the 1 minus
x will become a 0.

And at 0, you will get minus 1 over (t plus 1),
which you subtract, leaving 1 over (t plus 1).

And just take a look at this and convince
yourself that this is correct.

So is that the calculus part clear?

So we calculate that the expectation of
capital X is indeed 1 over t plus 1.

And that was a claim, right?

It's 1 over the number of distinct
elements plus 1.

So when you do the 1 over
x, and you subtract 1, you

are indeed returning the
number of distinct elements.
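The whole proof fits in two lines when written in symbols. This is a restatement of the steps above, with X the smallest hash value and t the number of distinct elements; the antiderivative carries a minus sign that cancels when the limits are put in.

```latex
\Pr[X > x] = \prod_{i=1}^{t} \Pr[h(x_i) > x] = (1 - x)^{t},
\qquad 0 \le x \le 1

\mathbb{E}[X] = \int_{0}^{1} (1 - x)^{t} \, dx
             = \left[ -\frac{(1 - x)^{t+1}}{t + 1} \right]_{0}^{1}
             = 0 - \left( -\frac{1}{t + 1} \right)
             = \frac{1}{t + 1}
```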

So, Yator, does that answer your question?

We did end up proving essentially,
right?

If t is 7, then the expected value is 1
over 8, right?

If you throw 7 darts, then the expected minimum is
that.

So here

is a, if you have been
following it so far, here is a

midterm question for you,
here is a final question for you.

You were implementing this algorithm,
but your friend made a mistake.

And instead of remembering
the minimum hash value,

they are remembering
the maximum hash value.

Is the question clear?

In their code,

instead of a min, they put a max by
mistake.

So they're actually keeping track of the maximum hash value; call it X prime.

Okay, is your friend doomed?

Or can you change the output to something
that gives you the right answer?

Not doomed.

Not doomed.

Good, not doomed.

And you can guess what the right answer is
from this fact that I told you.

So what should be the output if they're
keeping track of X prime?

1 minus max?

Now your output should just be in terms of
X prime.

Right, so previously the output was 1 over
capital X minus 1, right?

That's what the algorithm was outputting,
right?

1 over the minimum hash value minus 1.

Now, instead of the minimum,
if your friend remembers

the maximum, that's X
prime, what do you output?

1 over 1 minus X prime minus 1?

Yeah, I think that should work.

Good.

Because the expected value of X prime is T
over T plus 1.

So what Yertrude said was, if you do 1
minus X prime, you get 1 over (T plus 1) in expectation.

So output 1 over (1 minus X prime), minus 1,

because that would be T, right?

1 over (1 over (T plus 1)), minus 1.

And, how would you prove something like
this?

How would you prove that? I mean,
I gave you this example for two:

if I throw two darts,

then the expectation of
the maximum is two thirds.

But now in this proof, how would this
proof change?

So, I would still try to use this formula
for expectation, right?

And now in this line, this is not the min
now, right?

This is the max.

So, now how do I
compute the probability that

the maximum hash
value is greater than X?

Is the same formula true?

Is it true that if the maximum
hash is to the right of this

line, then all the hashes
are to the right of this line?

No.

No.

So, we are kind of stuck.

How do we go from this line to the,
to this line?

That's no longer true.

But what can we do?

So, Hanan mentioned something when I asked
last time.

So, the probability that a random
variable is greater than small x is one

minus the probability that it is less than
small x, right?

Does that make sense?

And so, if I do one minus the probability,
I mean, I'm, I'm oscillating between

leaving it as an exercise for you guys or
doing it.

Do people see, okay, is this hint clear?

That instead of this
probability, if I replace

this by less than and
do one minus, right?

So, this probability is one
minus the probability that

the maximum hash value
is less than small x, right?

And now, if I have the same picture,
small x, I have a line, and I tell you

that the maximum hash
value is to the left of this line,

what does that mean about
all the other hash values?

That they're also to the left of this
line, right?

Because the largest is to the left of this
line.

And now you can do this independence
business.

Is this making sense?

So, you would again get a 1 minus,

because you did 1 minus,
and now the probability

that something is to
the left of this line is x.

So, you would again get something like 1 minus x to
the t.

Except something should change.

And let's see what the change is.

1 minus (x to the t).

Only x itself is raised to the t, right?

Yeah.

Good, good.

So, now you guys can finish the exercise,
right?

So, yeah.

This would be a...

This would be what you
do if instead of the minimum

hash value, you somehow
use the maximum hash value.

And maybe in some paper that you read,
people might remember the maximum hash value.

That's totally possible.

But then their output would be changed
accordingly.

Okay.
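One way the exercise could be finished is sketched below, following the hint given in the discussion. Here X' denotes the maximum hash value and t the number of distinct elements; the steps mirror the proof for the minimum, with "all hashes to the left of the line" replacing "all hashes to the right".

```latex
\Pr[X' > x] = 1 - \Pr[X' \le x]
            = 1 - \prod_{i=1}^{t} \Pr[h(x_i) \le x]
            = 1 - x^{t},
\qquad 0 \le x \le 1

\mathbb{E}[X'] = \int_{0}^{1} \left( 1 - x^{t} \right) dx
              = 1 - \frac{1}{t + 1}
              = \frac{t}{t + 1}
```

So 1 minus the expected maximum is 1 over (t plus 1), which is why the output 1 over (1 minus X'), minus 1, plays the same role as before.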

So, hopefully this algorithm is clear to
everyone.

What is the problem with this algorithm?

Why do I call it the ideal algorithm?

I mean, ideal
doesn't mean... It's

ideal in the sense that
it's too good to be true.

Why is that?

It's not too good to be true.

For each x i in between 0 and 1,
we have infinite number of x i.

You mean the hash value
is... We need... We need

infinite precision for
this algorithm, right?

Because we have to remember exactly where
the dart landed.

To compute the hash, we may need
infinite precision.

Basically, this hash function doesn't
exist.

Right?

Because either there's no
such hash function, or it takes

infinite precision if
you want to make it.

And... But the whole point of streaming
algorithm is to save space, right?

If we had infinite space, we could have
stored the stream.

Right?

So, we don't want to
use a hash function that

uses infinite space
in order to save space.

Right?

But this gives us the main idea of

the algorithm by Flajolet and Martin,
the idea that HyperLogLog later built on.

So...

So, imagine now, instead of the hash
function that we had from 1 through n

to the interval 0, 1,

let's say now you hash

to, like, n buckets.

Bucket number 0, bucket number 1,
and so on, up to bucket number n minus 1.

Right?

So, that's where you hash your keys to.

And... So, everyone knows how
to write a number into bits, right?

What is the bit expansion of the number 7?

Do I know this?

1, 1, 1.

Yeah, I guess so.

Yeah, I was
thinking... Okay, yeah.

Yes, 1, 1, 1.

And then, for 8, it will be 1,
0, 0, 0, I guess.

100, yeah.

Yes, okay.

Ah, okay, okay.

You start from 0.

This always throws me off.

See, computer scientists always start from
0.

Right?

So, that's always a...

So...

So, I'm trying to decide
whether to give you the

algorithm first or
the intuition.

Let me give you

the algorithm, and then I'll give you a
minute to see

why this is correct, right?

So, the algorithm, I will say,
is the following.

You pick a hash function that takes your
numbers, whatever set

these Xi's are coming from,

and it hashes them to

these n buckets, right?

And when Xi appears, we compute

the hash of Xi.

And this is some number; by number,
I mean an integer.

It's not a real value anymore,
right?

It's an integer from 0 to n minus 1.

We write

the bit representation

of this H of Xi.

Right?

So, we write it as bits.

So, I don't know, 1, 0, 0, 0, 1,
0.

For example, this.

Right?

And now, do people know what the least
significant bit of this bit vector is?

When I say the least significant bit,
I mean this one.

The rightmost bit that is a 1.

Does this make sense?

So, I will count the position.

So, let's see.

So, since we start from 0, from now on,
this will be position 0 for me.

This is position 1 for me.

This is position 2.

And this is position 3.

Right?

So, I will say that for this bit vector,
its least significant bit is at position 3.

Does this make sense?

Right?

This is just a definition
of what I mean when I

say the least significant
bit of a bit vector.

So, for this bit vector, the least
significant bit is at position 3.

So, we write the bit
representation, and then we find

the least significant bit
of this bit vector, right?

Right?

H of Xi.
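The definition used here, the position of the rightmost 1 bit counting from 0 at the right, can be sketched in a few lines. The function name `lowest_set_bit_position`, the example bit vectors, and the choice to return -1 for the all-zero vector are assumptions made for this illustration, not something stated in the lecture.

```python
def lowest_set_bit_position(h):
    """Position of the rightmost 1 bit of h, counting from 0 at the right.

    Returns -1 when h == 0, since zero has no 1 bit (a convention chosen
    for this sketch).
    """
    if h == 0:
        return -1
    # h & -h keeps only the rightmost 1 bit; bit_length() - 1 is its index.
    return (h & -h).bit_length() - 1


# Hypothetical bit vectors, written most significant bit first.
for bits in ["101000", "100010", "110000"]:
    print(bits, "->", lowest_set_bit_position(int(bits, 2)))
```

For these three made-up vectors the positions are 3, 1, and 4 respectively.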

And we keep track of the largest value of the least significant bit position.

So, do you understand what I mean by the
largest value of least significant bit?

So, in other words,

let's say this bit vector had appeared,
right?

And its least significant bit was at
position 3.

Then another element
comes, I hash it, I write the

bit vector, and I get this
bit vector. Will I change anything?

Will my counter change?

The counter that is keeping track of the
largest value of the least significant bit?

What's the least significant bit of this?

What's the position of the least
significant bit of this bit vector?

One.

One.

Is it larger than the previous position?

No.

So, this doesn't do anything, right?

And then later on, after some time,
maybe this vector appears.

And now my least significant bit would
change, right?

The position would change, right?

So, now I would update the 3 to a 4.

Is it making sense to everyone?

Yes.

Good.

So, let me call this value p.

So, p is the largest least significant bit
position.

So far.

Okay.

And now, what should be my output?

2 to the least significant bit?

Close.

Plus one.

Because you guys start, computer
scientists start from zero.

Or, or maybe not.

Let me see.

So, I claim you output this.

Yeah, I think this is right.

I claim you output 2 to the power (p plus one).
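Putting the steps together, the algorithm as described can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: SHA-256 stands in for the random hash function, the number of buckets is taken to be a power of two (2 to the `n_bits`), and the names `int_hash` and `BitPositionCounter` are invented for this example. A hash of exactly zero has no 1 bit and is simply ignored here.

```python
import hashlib


def int_hash(item, n_bits=32, seed=0):
    """Deterministic pseudo-random integer in [0, 2**n_bits - 1] (assumed hash)."""
    digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % (1 << n_bits)


def lowest_set_bit_position(h):
    """Position of the rightmost 1 bit, counting from 0; -1 if h == 0."""
    return (h & -h).bit_length() - 1 if h else -1


class BitPositionCounter:
    """Tracks p, the largest least-significant-bit position seen so far."""

    def __init__(self, n_bits=32, seed=0):
        self.n_bits = n_bits
        self.seed = seed
        self.p = -1  # -1 means no usable hash has been seen yet

    def add(self, item):
        h = int_hash(item, self.n_bits, self.seed)
        self.p = max(self.p, lowest_set_bit_position(h))

    def estimate(self):
        # Output 2^(p+1); report 0 for an empty stream.
        return 0 if self.p < 0 else 2 ** (self.p + 1)


counter = BitPositionCounter()
for x in [1, 8, 1, 8, 1, 8, 3, 3, 5]:
    counter.add(x)
print("p =", counter.p, "estimate =", counter.estimate())
```

The only state kept is the single small integer `p`, and a repeated element can never change it, because it hashes to the same value as before.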

So, why did whoever said 2 to the p,
why did you say that?

Like, every time, you're
basically doing like a

bunch of coin flips until
you get heads, kind of.

Like, the first digit on the right is like
coin flip, heads or tails.

And there's a zero, so it's tails.

I kind of felt like we were back to this
binary, like, yes-or-no thing.

And like, the odds that like the least
significant bit was in the fifth position

is really one half times one half,
times one half, times one half.

Like, I don't know.

It just felt like we're back to the binary
thing.

And the only way to get there to be some
number that would represent the actual

amount of coin flips felt like we had to,
you had to make it to the, to the,

raise it to the power.

Right.

So the intuition is, is not too off.

So imagine you had t distinct elements as
before, right?

So your stream
is of length m,

but really there are
only t distinct elements.

So you will compute t hashes only, right,
throughout the course of this algorithm.

And the hash values are random between
zero and n minus one, right?

So think of their bit representations also
as random.

So if I ask you out of
these t values that you

calculated, how many
of them will end in a one?

And how many of them will end in a zero?

What would you say?

Out of the t values that I calculate,
what's the expected number of values that

when I write the bit vector, they will end
in a one?

You mean one zero?

No, no.

And the last bit is a one.

How many out of these t values do you
expect?

T over two.

T over two.

T over two.

T over two.

And last bit is a zero.

Also t over two, right?

It's a random thing, right?

Half of them should end in a one,
half of them should end in a zero.

Does this make sense?

Because the bit representation is sort of
a random bit representation.

So if I, if I have t hashes, and all those
t hashes are basically random bit vectors,

then half of them should end in a zero,
half of them should end in a one.

Okay, how many of them should end in two
zeros?

How many of them do I expect will end in
zero zero?

T over four.

T over four.

Good.

How many of them will end in three zeros?

T over eight, right?

How many of them will end in four zeros?

T over 16.

You guys get the idea.
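
The halving pattern above can be checked with a small simulation. This is only a sketch: it assumes the hash values behave like uniformly random 32-bit strings, and the names `count_ending_in_zeros` and `simulate` are made up for illustration.

```python
import random

def count_ending_in_zeros(values, k):
    """Count how many values have their k lowest bits all equal to zero."""
    mask = (1 << k) - 1
    return sum(1 for v in values if v & mask == 0)

def simulate(t=1 << 14, bits=32, seed=0):
    """Draw t random bit strings (stand-ins for hash values) and count
    how many end in 1, 2, 3 and 4 zeros."""
    rng = random.Random(seed)
    values = [rng.getrandbits(bits) for _ in range(t)]
    return {k: count_ending_in_zeros(values, k) for k in range(1, 5)}

if __name__ == "__main__":
    t = 1 << 14
    for k, observed in simulate(t).items():
        print(f"ends in {k} zero(s): observed {observed}, "
              f"expected about {t / 2**k:.0f}")
```

The observed counts should land near t over 2, t over 4, t over 8 and t over 16, though never exactly on them, which is the variance issue that comes up later.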

So if this position is the jth position,
or actually the pth position, right?

Then, and again, now position,
because I'm counting from zero,

not counting from one, okay?

So if this is the pth position,
from the right, counting from zero,

then what is the expected number of things
you would see?

T divided by what?

P plus one.

P plus one, or?

This is p.

Uh, no.

So when p is zero, the answer was t over
two, right?

So p is the index of the position.

T over two to the p?

Two to the p? If this were true, then
when p is zero, when I'm at

index zero, you would get t. But you said the
answer was t over two, right?

T over two to the p plus one?

Correct.

Right?

This is the thing about starting from zero
or starting from one, right?

Because when
p is equal to zero, I'm at the

first position, and you said
the answer was this, right?

Okay.
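
Written out as a formula, under the idealized assumption that each hash value is a uniformly random bit string. The symbols $h$ for the hash and $X_p$ for the count are my own notation, not the lecture's.

```latex
\Pr\bigl[\text{lowest 1 bit of } h(x) \text{ is at position } p\bigr]
  = \underbrace{\tfrac12 \cdots \tfrac12}_{p \text{ zeros}} \cdot
    \underbrace{\tfrac12}_{\text{then a one}}
  = \frac{1}{2^{p+1}},
\qquad
\mathbb{E}[X_p] = \frac{t}{2^{p+1}},
```

where $X_p$ is the number of the $t$ distinct hash values whose lowest 1 bit sits at position $p$, counting from zero. At $p = 0$ this gives $t/2$, matching the answer above.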

So now in the course of this algorithm,
right?

The leftmost place where I will see a
least significant bit is a position

where only very few
elements appear, right?

So this count, t over 2 to the p plus 1, should be just about one.

Does it make sense?

That there should be at least one element
that gives me this position p,

and there shouldn't be too many elements,
right?

Because if there were too many of them,
then half of them would actually have

given me a larger least significant bit,
or something like this.

And because this is the...

I expect the value of p to satisfy this,
you just multiply by 2 to the p plus 1,

and that's why I'm outputting that.

Right?

This means that t is roughly 2 to the p
plus 1.

So, that's the rough idea.
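
Here is a minimal sketch of the algorithm as described so far: hash each element, find the position of its lowest 1 bit, keep the largest position p seen, and output 2 to the p plus 1. The helper names are hypothetical. In particular, `make_hash` is only a stand-in that memoizes random values, which uses far too much memory for a real streaming algorithm. Returning 1 for an empty stream and skipping a hash value of zero are my assumptions as well.

```python
import random

def lsb_position(value):
    """Index, from the right and starting at 0, of the lowest 1 bit.
    Returns None when value is 0, since there is no 1 bit."""
    if value == 0:
        return None
    return (value & -value).bit_length() - 1

def make_hash(n, seed=0):
    """Hypothetical stand-in for the hash function: assigns each element
    a random value in [0, n-1] and remembers it (illustration only)."""
    rng = random.Random(seed)
    table = {}
    def h(x):
        if x not in table:
            table[x] = rng.randrange(n)
        return table[x]
    return h

def estimate_distinct(stream, h):
    """Track p, the largest lowest-1-bit position seen, and output 2^(p+1)."""
    p = -1  # nothing seen yet; an empty stream then outputs 2^0 = 1
    for x in stream:
        pos = lsb_position(h(x))
        if pos is not None and pos > p:
            p = pos  # e.g. update the 3 to a 4
    return 2 ** (p + 1)

if __name__ == "__main__":
    h = make_hash(n=1 << 20, seed=1)
    stream = [random.randrange(1000) for _ in range(100_000)]
    print("estimate:", estimate_distinct(stream, h),
          "true:", len(set(stream)))
```

Repeated elements hash to the same value, so they never change p. That is why only the t distinct elements matter.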

Does this make sense?

So, I gave you the algorithm and...

at least some of you saw the...

saw the idea.

So, I...

I don't think I will have time to give you
all the details.

But this is basically...

So, what you will show is that this
algorithm that I just gave you,

it doesn't give exactly the right answer,
because it will have a lot of variance.

I mean, we're talking about how many
elements, right?

Have this least significant bit at this
position.

So, if this algorithm is
clear, what we will first prove

is that this algorithm,
which we call A1, gives a...

it gives a pretty lame guarantee.

And the lame guarantee is that its output
will be sandwiched between 32 times the

number of distinct elements and the number
of distinct elements divided by 32.

So, this is what we call in algorithms a
factor-32 approximation algorithm.

So, its output will be between 32 times
the true answer and the true answer divided by 32.
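
As a formula, with $\widehat{t}$ as my own notation for the output of A1 and $t$ the number of distinct elements. The lecture does not say here with what probability this holds, so none is shown.

```latex
\frac{t}{32} \;\le\; \widehat{t} \;\le\; 32\,t,
\qquad \text{where } \widehat{t} = 2^{p+1}.
```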

Does this guarantee make sense,
first of all?

Just the statement of the guarantee,
not how we prove it.

Is the statement making sense?

Or no?

Yes, no.

So again, the algorithm that I just told
you, it makes sense.

But now we talk about what is the
accuracy?

Will it actually always output the value T,
which is the number of distinct elements?

You can see that that's too much to hope
for, right?

I mean, you're randomly hashing and
anything can happen.

In fact, T may not even be a power of 2,
right?

This algorithm, its output is always a
power of 2.

What if my number of distinct elements was
6?

Then this algorithm will never output 6,
right?

Because it outputs powers of 2.

So clearly, this algorithm cannot give me
the right answer all the time.
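
To make the power-of-2 point concrete, here is a small sketch that lists the possible outputs 2 to the p plus 1 and finds the best multiplicative error any of them can achieve for a given t. The function names and the cap `max_p` are illustrative assumptions.

```python
def possible_outputs(max_p):
    """All values 2^(p+1) the algorithm can output for p = 0..max_p."""
    return [2 ** (p + 1) for p in range(max_p + 1)]

def best_ratio(t, max_p=31):
    """Smallest factor by which a power-of-two output can be off from t."""
    return min(max(o / t, t / o) for o in possible_outputs(max_p))

if __name__ == "__main__":
    print(6 in possible_outputs(31))  # 6 is never an output
    print(best_ratio(6))              # best case is outputting 8
```

So for 6 distinct elements, even the luckiest run is off by a factor of 8 over 6.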

How good an answer is it giving?

First, we will prove
that its answer cannot be

more than a factor 32
away from the true answer.

Okay?

And now you will say, well, this factor 32
sucks, right?

Factor 32 is a pretty big factor to be
away from the true answer by.

Then, what we will do is,
we will run many, we will run

different copies of this
algorithm, but in a different way.

So this was the proof of the factor 32,
which I will try to get to next time.

But then we will run
many different copies of the

algorithm, at different
sort of granularity levels.

And then we will ask an appropriate copy
for the answer.

So the big picture idea is first,
we will prove that the algorithm that I

showed you gives a 32 factor
approximation.

And then we will try
to get that 32 down to

the usual epsilon
relative error that we do.

All right?

So, looks like this may still take half a
lecture more.

And then, we should be done with streaming
algorithms.