Welcome, everyone, to Algorithms for Big Data. Today should hopefully be the last lecture on streaming algorithms. Here is the overview of the course so far. I'm trying to find the pointer, but I can't get it. All right, maybe I'll use red: when I draw like this, that's my pointer, and then I'll undo it. For some reason I don't have the laser pointer, because I'm using the cable and it won't let me have it. So, the overview of the course so far. On the left you see that we started with the dictionary problem, for which we saw four algorithms. Then we saw the approximate membership problem, for which we saw the Bloom filter. On the right side are all the probability tools we learned while understanding these things. Then we moved on to streaming algorithms, where we saw the problems of sampling, counting, approximate median, and heavy hitters, for which we used the Count-Min sketch. In the class before last and in the last class, we finished algorithms for frequency moment estimation. That was the AMS algorithm, by Alon, Matias, and Szegedy. For this we also used Chernoff bounds for non-Bernoulli random variables, that is, when the random variables take not just 0/1 values but values in a larger range. I briefly started the topic of counting distinct elements last time, and the hope is to finish this problem today. That will be the end of the streaming algorithms section of this course. So let's move on to where we were with distinct elements. All right, what is the problem? Hopefully everyone remembers. There will be a stream, and the stream will typically have repetitions. So there will be numbers in the stream, and they might be repeated. Any number that appears in the stream is some number between 1 and n. But numbers can be repeated, and not all numbers from 1 through n will appear.
Your job is to count how many distinct elements have appeared in the stream. Is the problem statement clear? Just how many distinct elements have appeared in the stream. If you could store the stream, this would be a trivial problem: it's like removing duplicates. But here you just have to count. I'm not asking you to store the set of all distinct elements; I'm just asking you to count how many distinct elements there have been. As for practical applications of this problem, we actually just submitted a paper. Imagine you are working for Verizon or something, and you control cell phone towers. People keep moving in and out of your tower's range, and someone may live there, so they'll be there all the time. You just want to count how many distinct people have ever come within range of this tower, to find the load. So you want to count how many different elements there are in the stream. This is a measure of how many different users have been in an area at some point in time, or have used some service at some point in time, which tells you how popular something is. The same user coming again and again will not be able to fool this algorithm. Any questions about the problem statement or the motivation? If not, here first is an ideal algorithm for this problem. Remember, when I say we have a hash function, the hash function takes an input item and outputs a number somewhere. It's a random hash function in the sense that the output for any input value is chosen randomly among all possible outputs. But if you apply the hash function to the same input again, you'll get the same output. Does this make sense? If I apply the hash function H to the same input, it gives me the same value. The hash values don't change for the same input.
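For contrast, here is a minimal Python sketch of the trivial exact approach mentioned above, assuming we are allowed to store every distinct element; the function name `count_distinct_exact` is just an illustrative choice. Its memory grows with the number of distinct elements t, which is exactly what a streaming algorithm tries to avoid.

```python
from typing import Hashable, Iterable


def count_distinct_exact(stream: Iterable[Hashable]) -> int:
    """Exact distinct count: remember every element seen (O(t) space)."""
    seen = set()
    for x in stream:
        seen.add(x)  # a repeated element leaves the set unchanged
    return len(seen)


# The cell-tower example: the same user appearing repeatedly counts once.
print(count_distinct_exact([1, 8, 1, 8, 3, 1]))  # prints 3
```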
But when I say the hash function is random, what I mean is that for any given input, its output value is equally likely to be any of the possible output values. That doesn't mean that every time I feed in this input, you choose a new random value. That random value is fixed for the input once and for all: H(x_i) does not change during the algorithm. Is this point clear, or should I repeat it? Okay, hopefully it's clear. So here's the ideal algorithm. Imagine that you had a hash function that hashed to the unit interval [0, 1], the red interval of all numbers between 0 and 1. In other words, what is this hash function doing? You can feed this hash function H any number between 1 and n; that is its domain. And where does it map this number? To some random point between 0 and 1. Think of it as throwing a dart randomly at the interval [0, 1]. Now the algorithm is the following. Once you have such a hash function, as your stream arrives, you hash each x_i, that is, each element of the stream, and you get some value H(x_i) between 0 and 1. But you don't remember the hashes of all the elements in the stream. All you maintain is the smallest hash value you have seen so far. So the items appear in your stream one by one and go away. All you have to do is hash each item, and if its hash value is smaller than the smallest hash value you have so far, you replace the smallest hash value with the hash value of this item. Otherwise, you keep it the same. Then at any point in the stream, if someone asks you for the number of distinct elements seen so far, you output this number: you compute one over the smallest hash value, and then you subtract one from the result. That's your output. First of all, does this algorithm make sense?
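Here is a rough Python sketch of the ideal algorithm, for intuition only. The big assumption: since the ideal hash H into [0, 1] does not really exist, it is simulated here by drawing a uniform value from `random.Random` the first time an item is seen and memoizing it, so the same input always gets the same hash. That memo table takes O(t) space, so this is not a real streaming implementation; the class and method names are hypothetical.

```python
import random
from typing import Dict, Hashable


class IdealDistinctCounter:
    """Illustrative sketch of the 'ideal' algorithm: keep only the minimum hash.

    Assumption: the ideal random hash H: items -> (0, 1] is simulated by drawing
    a uniform value the first time an item is seen and memoizing it. The memo
    table uses O(t) space, so this is for intuition only.
    """

    def __init__(self, seed=None) -> None:
        self._rng = random.Random(seed)
        self._memo: Dict[Hashable, float] = {}  # hypothetical stand-in for H
        self.min_hash = 1.0                     # smallest hash value seen so far

    def _h(self, x: Hashable) -> float:
        # The same input always gets the same hash value.
        if x not in self._memo:
            # 1.0 - random() lies in (0, 1], which avoids dividing by zero later.
            self._memo[x] = 1.0 - self._rng.random()
        return self._memo[x]

    def process(self, x: Hashable) -> None:
        self.min_hash = min(self.min_hash, self._h(x))

    def estimate(self) -> float:
        # Invert the smallest hash value and subtract one.
        return 1.0 / self.min_hash - 1.0


counter = IdealDistinctCounter(seed=0)
for item in [1, 8] * 1000:  # only two distinct elements
    counter.process(item)
print(counter.estimate())   # a single (noisy) estimate of t = 2
```

A single run is noisy; what the argument in this lecture controls is the expected value of `min_hash`, which should come out to 1/(t + 1) when averaged over many independent runs.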
If I gave you such a hash function, would you be able to implement this algorithm? You just hash every element in the stream and remember the smallest hash. When asked for the answer, you invert the smallest hash value. All these hash values are between 0 and 1, so when you divide one by that, you get a number that is at least 1. So subtracting one makes sense: the number you're subtracting one from is never smaller than one. Are these facts clear? Yes. Okay, good. So now, why is this ideal algorithm correct? Last class we had quickly started on the uniform random variable. That's the first and only continuous random variable we will need. What is a continuous random variable? Instead of taking a discrete set of values, it takes values in a continuum. But the total probability has to be one, and if you allow a random variable to take a continuum of values, there are infinitely many (in fact uncountably many) such values. So every single value has probability zero, because there is no way uncountably many values can each have positive probability and still add up to one. So how is a continuous random variable defined? You can define it in two ways: either in terms of its density or in terms of its distribution. The density of the uniform random variable is just this: f(x) = 1 for all x between 0 and 1. So the density is a function of one variable, and for the uniform random variable it is the constant function 1 between 0 and 1 and zero everywhere else. That's the density, small f. Before getting to the distribution, capital F, think of the density this way: f(x) dx is the probability that your random variable X lies between the value x and x + dx. Does what I've written make sense?
This is the probability that your random variable X lies in a small segment of length dx at the value x. That's basically what f(x), the density, is. And here, because this is a uniform random variable, f(x) is 1: the probability that X is basically at x is the same for all values of x in [0, 1]. Does this make sense? Ideally you should divide by dx, because this probability will be about dx, and then you divide by dx. But dividing by dx stretches our intuitive notation a bit too far. If someone who knows calculus seriously sees anyone dividing by dx, they will faint, so we don't do these things. This is just for your intuition about what small f is. Then what is the distribution, capital F? It is defined this way: F(x) is the probability that your random variable is at most x. So this is the distribution of your random variable X, and it is a function of x. In this case, this function is pretty simple: it is 0 if x is negative, it is x if x is between 0 and 1, and it is 1 if x is greater than 1. Any questions about how we define the density or the distribution of a uniform random variable? Okay. If not: last class we had also looked at how to compute the expectation of a uniform random variable. The expectation of a continuous random variable has two formulas. This is the most widely used one: you take the density f, multiply it by x, and integrate over all possible values of the random variable. For the uniform random variable, f is 1. When I multiply 1 by x, I get x, and my domain ranges from 0 to 1. So I integrate x from 0 to 1, get x^2/2, put in the limits, and get one half. That's not surprising: for the uniform random variable on [0, 1], you're selecting one point uniformly at random between 0 and 1.
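The definitions just described can be collected in one place for a uniform random variable X on [0, 1]. This is only a LaTeX restatement of what is said above, with f the density and F the distribution function:

```latex
f(x) = \begin{cases} 1, & 0 \le x \le 1 \\ 0, & \text{otherwise} \end{cases}
\qquad
F(x) = \Pr[X \le x] = \begin{cases} 0, & x < 0 \\ x, & 0 \le x \le 1 \\ 1, & x > 1 \end{cases}

\mathbb{E}[X] = \int_{-\infty}^{\infty} x\, f(x)\, dx = \int_0^1 x\, dx = \left[\frac{x^2}{2}\right]_0^1 = \frac{1}{2}
```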
So what is the expected value of your point? It is one half, just by symmetry. But in general, if you have a more complicated f, you use the same formula to compute the expectation of other continuous random variables you are interested in. Good. Now, this was about one uniform random variable. Say you have two independent uniform random variables X1 and X2, both uniform on [0, 1]. In other words, you are throwing two darts at the interval [0, 1]. What we can show is that the expected value of the smaller of the two darts is one third and the expected value of the larger is two thirds. Does that make sense? Yes, because they divide it into three parts. Yes, right? That's the rough intuition. Now, just based on this fact, do people see why the algorithm on the previous page is correct, that is, why it gives the number of distinct elements? Let's say the stream only had two distinct elements. So the stream was just 1, 8, 1, 8, 1, 8, 1, 8, forever. There are only two distinct elements, 1 and 8, and they're just repeated. How many distinct hash values will you calculate over the course of the stream? Two. Two hash values. And you keep the minimum of the two hash values. What is the expected minimum hash value? One third. One third. So what will you output? You compute one divided by one third, which is three, subtract one, and output two, which is the number of distinct elements. If you had 10 distinct elements, your expected minimum would be 1/11. You invert 1/11, which gives 11, and subtract one to get 10. Is the reason behind this minus one becoming clear now, or is it still mysterious why I subtract one after inverting the minimum hash value? Okay.
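One way to check the one-third/two-thirds claim numerically is a Monte Carlo simulation. This is an illustrative Python sketch (the function name `mean_order_statistics` and the trial counts are arbitrary choices, not from the lecture) that throws k independent uniform darts many times and averages the sorted positions. The averages should come out near j/(k + 1) for the j-th smallest dart, which is the standard formula behind the "k darts cut the interval into k + 1 pieces" intuition.

```python
import random
from typing import List


def mean_order_statistics(k: int, trials: int = 100_000, seed: int = 0) -> List[float]:
    """Monte Carlo estimate of E[j-th smallest of k independent Uniform[0, 1] darts].

    Entry j-1 of the returned list estimates the expected j-th smallest dart,
    for j = 1..k.
    """
    rng = random.Random(seed)
    sums = [0.0] * k
    for _ in range(trials):
        darts = sorted(rng.random() for _ in range(k))
        for j, d in enumerate(darts):
            sums[j] += d
    return [s / trials for s in sums]


print(mean_order_statistics(2))  # should be close to [1/3, 2/3]
print(mean_order_statistics(5))  # should be close to [1/6, 2/6, 3/6, 4/6, 5/6]
```

The minimum is the case the algorithm uses: with t darts, its expected value is about 1/(t + 1).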
So hopefully it's clear why we subtract one. Can you explain quickly why it is one third and two thirds? Is it always like that with two numbers? And with four variables, would it be like one fifth, two fifths, three fifths, something like that? So the minimum would be one fifth, and the maximum would be four fifths? Four fifths? Yes. The second smallest would be two fifths, the third would be three fifths, and so on. Yeah. So I will hopefully prove it for you. But intuitively, if you throw two darts, you divide your interval into three pieces. If you threw five darts, you would divide the interval into six pieces, and the expected minimum would be one sixth and the expected maximum five sixths. We'll see why; I think we can prove this pretty easily. Assuming this fact is true, that if you throw eight darts the expected minimum is one ninth, it becomes obvious why this algorithm is doing the right thing. So here's what we have. We output one over the minimum of the H(x_i), that is, we invert the smallest hash value we have seen, subtract one, and that's our output. Now here is the claim. Let me denote this denominator by X: X is the smallest hash value. Our algorithm is keeping track of X, the smallest hash value we've seen so far. X is a random variable, because it's a hash value. The claim says that the expected value of X is one over (the number of distinct elements plus one). So if you threw three points, this would be one fourth.
Is the statement of the claim clear? Yes. If the statement of the claim is clear, then it is clear why this algorithm does the right thing: it inverts X, thereby recovering the denominator, and then subtracts one to give the number of distinct elements. All right. So now let's go into the proof: why does throwing that many points give this? Let t be the number of distinct elements. The idea is to calculate the expectation. I already gave you one formula for the expectation of a continuous random variable: multiply the density by x and integrate. But there is another formula: the expectation of a nonnegative random variable can also be computed by integrating the probability that X is greater than x. Is this clear? The same was true for discrete random variables; some of you may have seen it if you tried the exercises in the textbook. Remember, for discrete random variables the definition of expectation was to multiply x by the probability that X takes the value x, and sum. But you could also sum the probabilities that X exceeds each value; that is another formula for the expectation of a nonnegative integer-valued random variable. If you believe these two formulas for discrete random variables, then their continuous analogs simply replace the summation by an integral, which is what's on the left-hand side. Is this formula for the expectation clear? Okay. So the claim is about the expectation of X, and the plan is to use this newly learned formula to compute it. So what we really need to find is this probability.
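For reference, here are both pairs of formulas in LaTeX. The tail-sum versions require the random variable to be nonnegative, which holds here since X is a hash value in [0, 1] (so the integral effectively runs from 0 to 1). The last line is a quick sanity check on the uniform case, where the tail formula again gives one half:

```latex
% Nonnegative continuous random variable X:
\mathbb{E}[X] = \int_0^{\infty} x\, f(x)\, dx = \int_0^{\infty} \Pr[X > x]\, dx

% Nonnegative integer-valued random variable X:
\mathbb{E}[X] = \sum_{k \ge 0} k \Pr[X = k] = \sum_{k \ge 1} \Pr[X \ge k]

% Check for U uniform on [0, 1]:
\int_0^1 \Pr[U > x]\, dx = \int_0^1 (1 - x)\, dx = \frac{1}{2}
```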
That's what's happening here. You have some value x; think of x as a value in the interval between 0 and 1. I'm asking: what is the probability that the random variable X is greater than x? And remember what X is: the smallest hash value seen so far. So the event is that the smallest hash value I have seen so far is greater than x. If I know that the smallest is greater than x, what does that tell me about the others? Oh, Hanan, if you're speaking, you're muted. Do you want to look at it? Because this is the smallest, could you do one minus the max? Technically, if you wanted the max... I'm keeping track of the smallest hash value. Okay. And all I'm asking is: if I tell you that the smallest hash value is greater than x, meaning it is to the right of this line, what follows? It means that every value also has to be to the right of that line. Right: all of them have to be to the right of that line. Now notice that all the hash values are independent: the hash of x_1 and the hash of x_2 are independent, because the hash function chooses an independent random value between 0 and 1 for every input. So by independence, the probability that all of them are to the right of that line is the product of the probabilities that each one individually is to the right of it. Does this make sense? Okay. Now what is the probability that any one of them is to the right of that line, that is, that a random number between 0 and 1 lands to the right of x? One minus x. One minus x, right? The whole interval has length 1, and the part to the right of x has length 1 - x. So each of these factors is 1 - x, and I'm multiplying them
t times. That is what I plug into the formula for the expectation. Is the first line clear? In the formula for the expectation, I have to integrate the probability that X is greater than x, and this is the probability that the minimum hash value is greater than x. If t is the number of distinct hash values, that is, the number of distinct elements, then I multiply this factor t times. Is it clear why I'm multiplying it t times and not m times, the length of the stream? Even though I compute a hash every time the stream has a new item, if the element is a repeat, nothing changes, because it's a hash value I have seen before. A repeated element does not give me a new hash value; only a new element does. That's why this product runs from 1 to t, where t is the number of distinct elements. Hopefully that's clear; that's how t, the number of distinct elements, comes into the picture. After this, it is basic calculus. If you believe the first line, then I'm integrating (1 - x)^t over x from 0 to 1, where t is a fixed number. An antiderivative of (1 - x)^t is -(1 - x)^(t+1)/(t + 1), and we evaluate this definite integral between 0 and 1. At 1 the value is 0, because of the 1 - x in the numerator: when you put x = 1, it becomes 0. At 0 the value is -1/(t + 1), so the integral is 0 - (-1/(t + 1)) = 1/(t + 1). Take a look at this and convince yourself that it is correct. Is the calculus part clear? So we have calculated that the expectation of X is indeed 1/(t + 1), which was the claim: one over (the number of distinct elements plus one).
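Written out in LaTeX, the whole argument fits in two lines. Here y_1, ..., y_t denote the t distinct elements of the stream (a name introduced only for convenience) and H is the ideal hash into [0, 1]:

```latex
\Pr[X > x] = \Pr\Big[\bigcap_{i=1}^{t} \{H(y_i) > x\}\Big]
           = \prod_{i=1}^{t} \Pr[H(y_i) > x] = (1 - x)^t, \qquad 0 \le x \le 1

\mathbb{E}[X] = \int_0^1 (1 - x)^t\, dx
             = \left[ -\frac{(1 - x)^{t+1}}{t + 1} \right]_0^1
             = 0 - \left( -\frac{1}{t + 1} \right)
             = \frac{1}{t + 1}
```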
So when you compute 1/X and subtract 1, you are inverting a quantity whose expectation is 1/(t + 1), which is how the algorithm recovers the number of distinct elements. So, Yator, does that answer your question? We did essentially end up proving it: if t is 7, the expected minimum is 1/8, so if you throw 7 darts, that's the expected minimum. Now, if you have been following so far, here is a midterm question for you, or a final question. You were implementing this algorithm, but your friend made a mistake: instead of remembering the minimum hash value, they remember the maximum hash value. Is the question clear? In their code, instead of a min, they put a max by mistake, so they're actually keeping track of the maximum. Okay, is your friend doomed? Or can you change the output to something that gives the right answer? Not doomed. Not doomed. Good, not doomed. And you can guess the right answer from the fact I told you. So what should the output be if they're keeping track of X'? One minus the max? Now your output should be in terms of X' only. Previously, the output was 1/X - 1: one over the minimum hash value, minus one. Now, if instead of the minimum your friend remembers the maximum, X', what do you output? 1/(1 - X') - 1? Yeah, I think that should work. Good. Because the expected value of X' is t/(t + 1), as Yertrude said, 1 - X' has expectation 1/(t + 1). So Yertrude's answer is to output 1/(1 - X') - 1, because that gives t: one over 1/(t + 1), minus one. And how would you prove something like this? I gave you the example that if I throw two darts, the expectation of the maximum is two thirds.
But now, how would the proof change? I would still use this formula for the expectation. But now, in this line, it's not the min anymore, it's the max. So how do I compute the probability that the maximum hash value is greater than x? Does the same reasoning work? Is it true that if the maximum hash is to the right of this line, then all the hashes are to the right of this line? No. No. So we are kind of stuck: we can't go from this line to the next one as before, because that step is no longer true. But what can we do? Hanan mentioned something when I asked last time. The probability that a random variable is greater than x is one minus the probability that it is at most x. Does that make sense? I'm oscillating between leaving this as an exercise for you and doing it. Is this hint clear? Instead of this probability, replace "greater than" by "at most" and take one minus. So this probability is one minus the probability that the maximum hash value is at most x. Now, with the same picture, a line at x, if I tell you that the maximum hash value is to the left of this line, what does that say about all the other hash values? That they're also to the left of this line, because the largest one is. And now you can use independence again. Is this making sense? So you would again get a "one minus" in front, because you took one minus, and now the probability that a single hash value is to the left of this line is x. So you would again get something like one minus x to the t, except something should change. Let's see what changes. It's 1 - x^t: only x itself is raised to the t, right? Yeah. Good, good. So now you can finish the exercise. This is what you would do if, instead of the minimum hash value, you somehow used the maximum hash value.
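For anyone who wants to check their answer to the exercise, here is one way the computation can be finished, with X' the maximum hash value and y_1, ..., y_t the distinct elements (notation introduced here for convenience). It confirms that 1 - X' plays the role that X played before, which is why outputting 1/(1 - X') - 1 works:

```latex
\Pr[X' > x] = 1 - \Pr[X' \le x] = 1 - \prod_{i=1}^{t} \Pr[H(y_i) \le x] = 1 - x^t, \qquad 0 \le x \le 1

\mathbb{E}[X'] = \int_0^1 (1 - x^t)\, dx = 1 - \frac{1}{t + 1} = \frac{t}{t + 1},
\qquad
\mathbb{E}[1 - X'] = \frac{1}{t + 1}
```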
And maybe in some paper you read, people remember the maximum hash value. That's totally possible, but then their output changes accordingly. Okay, so hopefully this algorithm is clear to everyone. What is the problem with this algorithm? Why do I call it the ideal algorithm? It's ideal in the sense that it's too good to be true. Why is that? It's not too good to be true? For each x_i, the hash is between 0 and 1, so there are infinitely many possible values. You mean the hash value? Right: we need infinite precision for this algorithm, because we have to remember exactly where the dart landed. To compute the hash, we may need infinite precision. Basically, this hash function doesn't exist, or it takes infinite precision if you want to build it. But the whole point of a streaming algorithm is to save space. If we had infinite space, we could have stored the stream. So we don't want to use a hash function that needs infinite space in order to save space. But this gives us the main idea of the algorithm by Flajolet and Martin, the precursor of what later became HyperLogLog. So imagine now that instead of a hash function from {1, ..., n} to the interval [0, 1], you hash to n buckets: bucket number 0, bucket number 1, and so on up to bucket number n - 1. That's where you hash your keys. Everyone knows how to write a number in bits, right? What is the binary expansion of the number 7? 1, 1, 1. Yeah, I guess so. Yes, 1, 1, 1. And then 8 will be 1, 0, 0, 0, I guess. Yes, 1000. Okay. Ah, okay, okay, you start from 0. This always throws me off. Computer scientists always start from 0. So I'm trying to decide whether to give you the algorithm first or the intuition.
Let me give you the algorithm, and then I'll give you a minute to see why it is correct. The algorithm is the following. You pick a hash function that takes your numbers, wherever these x_i come from, and hashes them to the buckets 0 through n - 1. When x_i appears, we compute the hash of x_i. This is an integer from 0 to n - 1; it's not a real value anymore. We write out the bit representation of H(x_i): for example, a bit vector ending in 1, 0, 0, 0. Now, do people know what I mean by the least significant bit of this bit vector? I mean the rightmost 1. Does this make sense? I will count positions from the right, and since we start from 0, the rightmost bit is position 0 for me, the next one is position 1, then position 2, and this 1 is at position 3. So I will say that for this bit vector, the least significant bit is at position 3. This is just the definition of what I mean by the least significant bit of a bit vector: the position of its rightmost 1. So we write the bit representation of H(x_i), find its least significant bit, and keep track of the largest value of the least significant bit seen so far. Do you understand what I mean by the largest value of the least significant bit? In other words, say this bit vector had appeared, with its least significant bit at position 3. Then another element comes, I hash it, write out its bit vector, and get this bit vector. Will my counter, the one keeping track of the largest value of the least significant bit, change?
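Computing the position of the least significant 1 bit is a one-liner with the classic `v & -v` trick, which isolates the lowest set bit. This Python helper (the name `lsb_position` is hypothetical) follows the convention above of counting positions from 0 at the right; the example calls use a few sample bit vectors. A hash value of 0 has no 1 bit at all, so this sketch simply rejects it.

```python
def lsb_position(v: int) -> int:
    """Position of the least significant 1 bit of v, counting from 0 at the right.

    This equals the number of trailing zeros of v. A value of 0 has no 1 bit,
    so it is rejected here.
    """
    if v <= 0:
        raise ValueError("v must be a positive integer")
    # v & -v keeps only the lowest set bit (works on Python's unbounded ints).
    return (v & -v).bit_length() - 1


print(lsb_position(0b1000))   # prints 3
print(lsb_position(0b10))     # prints 1
print(lsb_position(0b10000))  # prints 4
```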
What's the position of the least significant bit of this bit vector? One. One. Is it larger than the previous position? No. So this doesn't do anything. Then later on, maybe this vector appears, and now the position of my least significant bit would change: I would update the 3 to a 4. Is this making sense to everyone? Yes. Good. Let me call this value p: p is the largest least-significant-bit position seen so far. Okay. Now, what should my output be? 2 to the least significant bit? Close. Plus one, because computer scientists start from zero. Or maybe not; let me see. I claim: output this. Yeah, I think this is right: output 2^(p+1). So, whoever said 2 to the p, why did you say that? Every time, you're basically doing a bunch of coin flips until you get heads, kind of. The first digit on the right is like a coin flip, heads or tails; if it's a zero, it's tails. It felt like we were back to this binary, yes-or-no thing. The odds that the least significant bit is in the fifth position are one half times one half times one half times one half. It just felt like we were back to the binary thing, and the only way to get a number that represents the actual number of coin flips was to raise 2 to that power. Right, so the intuition is not too far off. Imagine you had t distinct elements as before. Your stream has length m, but really there are only t distinct elements, so you compute only t distinct hash values over the course of this algorithm. The hash values are random between 0 and n - 1, so think of their bit representations as random too.
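Putting the pieces together, here is a rough Python sketch of the single-copy algorithm (the one called A1 later on). Several details are assumptions not taken from the lecture and are labeled in the code: the hash range is n = 2^L buckets, the random hash is simulated by memoizing one uniform random integer per distinct item (again O(t) space, so illustration only), and a hash value of 0, which has no 1 bit, is treated as having its least significant bit at position L. The class name is hypothetical.

```python
import random
from typing import Dict, Hashable


class SingleCopyEstimator:
    """Illustrative sketch of the single-copy algorithm A1: output 2^(p+1).

    Assumptions (not from the lecture): hash range n = 2**L; the random hash
    is simulated by memoizing a uniform random integer per distinct item
    (O(t) space, for illustration only); a hash value of 0 counts as position L.
    """

    def __init__(self, L: int = 32, seed=None) -> None:
        self.L = L
        self._rng = random.Random(seed)
        self._memo: Dict[Hashable, int] = {}  # hypothetical stand-in for H
        self.p = -1  # largest least-significant-bit position seen (-1: none yet)

    def _h(self, x: Hashable) -> int:
        # The same input always gets the same hash value.
        if x not in self._memo:
            self._memo[x] = self._rng.randrange(2 ** self.L)
        return self._memo[x]

    def process(self, x: Hashable) -> None:
        v = self._h(x)
        pos = (v & -v).bit_length() - 1 if v else self.L
        self.p = max(self.p, pos)

    def estimate(self) -> int:
        # Output 2^(p+1), always a power of two (0 for an empty stream).
        return 0 if self.p < 0 else 2 ** (self.p + 1)


est = SingleCopyEstimator(seed=0)
for item in list(range(1000)) * 3:  # t = 1000 distinct elements, each repeated
    est.process(item)
print(est.p, est.estimate())
```

Only the integer p is state of the algorithm proper, which needs about log log n bits; the memo table is purely a stand-in for a real hash function.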
So if I ask you, out of these t values you calculated, how many will end in a 1, and how many will end in a 0? What would you say? Out of the t values, what's the expected number whose bit vector ends in a 1? You mean 1, 0? No, no: the last bit is a 1. How many out of these t values do you expect? t over 2. t over 2. And the last bit is a 0? Also t over 2, right? It's random: half of them should end in a 1 and half in a 0. Does this make sense? The bit representation is essentially a random bit vector, so if I have t hashes, all of them basically random bit vectors, half should end in a 0 and half in a 1. Okay, how many do I expect to end in two zeros, 0, 0? t over 4. t over 4. Good. How many will end in three zeros? t over 8. How many in four zeros? t over 16. You get the idea. Now, how many have their least significant bit exactly at position p, counting from the right and starting from zero, that is, end in a 1 followed by p zeros? What is the expected number of such hash values? t divided by what? p plus one? p plus one, or p? Uh, no. When p is zero, the answer was t over 2, right? p is the index of the position. t over 2 to the p? If that were true, then at p = 0 we'd get t, but you said the answer at index 0 was t over 2. t over 2 to the p plus one? Correct. This is the thing about starting from zero versus starting from one: when p = 0, I'm at the first position, and you said the answer was t over 2. Okay. So now, in the course of this algorithm, right?
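The t/2^(p+1) count can be sanity-checked exactly, under the simplifying assumption that the hash range is all L-bit values (n = 2^L). This Python sketch (function name hypothetical) enumerates every nonzero L-bit value and counts where its least significant 1 bit falls. Out of the 2^L values, exactly 2^(L-p-1) have it at position p, a fraction of 1/2^(p+1), so t uniformly random hash values put about t/2^(p+1) of them at position p.

```python
from typing import List


def lsb_histogram(L: int) -> List[int]:
    """For every nonzero L-bit value v, record the position (from the right,
    starting at 0) of its least significant 1 bit; return counts per position."""
    counts = [0] * L
    for v in range(1, 2 ** L):
        counts[(v & -v).bit_length() - 1] += 1
    return counts


print(lsb_histogram(4))  # prints [8, 4, 2, 1]
```

Each count is half the previous one, matching the t/2, t/4, t/8, ... pattern from the discussion.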
The leftmost position at which I see a least significant bit, which is p, should be a position where very few elements land: the expected count there should be just about one. Does that make sense? There should be at least one element that gives me this position p, and there shouldn't be too many, because if there were many of them, then about half of them would have given me a larger least significant bit. Because I expect p to satisfy t/2^(p+1) ≈ 1, you just multiply by 2^(p+1), and that's why I'm outputting that: t is roughly 2^(p+1). That's the rough idea. Does this make sense? So I gave you the algorithm, and at least some of you saw the idea. I don't think I will have time to give you all the details today. But what we will show is that this algorithm by itself does not give the exact answer, because it has a lot of variance: we're talking about how many elements have their least significant bit at one particular position. So, if this algorithm is clear, what we will first prove is that this algorithm, which we call A1, gives a pretty lame guarantee. The lame guarantee is that, with good probability, its output is sandwiched between t/32 and 32t, where t is the number of distinct elements. In algorithms, we call this an approximation algorithm with factor 32: its output is between the true answer divided by 32 and 32 times the true answer. First of all, does this guarantee make sense? Just the statement of the guarantee, not how we prove it. Is the statement making sense, or no? Yes. So again, the algorithm I just told you makes sense. But now we talk about its accuracy.
Will it always output exactly t, the number of distinct elements? You can see that's too much to hope for: you're hashing randomly, and anything can happen. In fact, t may not even be a power of 2, while this algorithm's output is always a power of 2. What if my number of distinct elements was 6? Then this algorithm will never output 6, because it only outputs powers of 2. So clearly this algorithm cannot give the right answer all the time. How good an answer is it giving? First, we will prove that, with good probability, its answer is not more than a factor 32 away from the true answer. Okay? Now you will say: well, this factor 32 sucks. Factor 32 is a pretty big factor to be off by. The proof of the factor 32 is what I will try to get to next time. After that, we will run many different copies of this algorithm, in a different way: at different granularity levels. Then we will ask an appropriate copy for the answer. So the big-picture plan is: first, prove that the algorithm I showed you gives a factor-32 approximation; then, get that 32 down to the usual epsilon relative error that we aim for. All right? It looks like this may take half a lecture more, and then we should be done with streaming algorithms.