Mayank Goswami: Alright, welcome everyone, Algorithms for Big Data. Mayank Goswami: So… Mayank Goswami: In the last class, we talked about online algorithms. These are algorithms where you have to make a decision on the fly, right? And… Mayank Goswami: You're trying to compare the performance of your algorithm with the performance of the optimal algorithm. Mayank Goswami: And typically, we say the optimal algorithm is the one that knows what Mayank Goswami: Sequences of updates were going to occur well in advance. Mayank Goswami: Right, so that's obviously a very strong… Mayank Goswami: a strong algorithm to compare yourself to, and if we can compare ourselves to it, then we say that we have a certain competitive ratio. Mayank Goswami: For example, if you saw the recording, what was the competitive ratio for the ski rental problem, where we go to a ski resort. Mayank Goswami: And we have to decide whether to rent or buy, every day. Mayank Goswami: 2. Good. And what was the competitive ratio of the pizza finding problem? Mayank Goswami: The best competitive ratio. Samuel Sokol: 2J? Mayank Goswami: So that's, where J is the position of the room. Mayank Goswami: But that's bad, right? Samuel Sokol: 9, sorry, 9. Mayank Goswami: Yes. Mayank Goswami: So 2J is what you get if you zigzag, and you turn at every next room. Mayank Goswami: But if you turn at powers of 2, then you get a competitive ratio of 9. Mayank Goswami: And this was homework. I mean, I gave the algorithm… it's a geometric series. Mayank Goswami: Right? So was that part clear? Any questions about the pizza finding problem or the ski rental problem? Mayank Goswami: Okay, so then these were all toy problems. Mayank Goswami: Today, we'll consider three problems. Mayank Goswami: Which are serious problems, not toy problems. Mayank Goswami: I will… define the problems, then I will tell you their algorithms. Mayank Goswami: And what competitive ratio they have. So we will see 3 problems. 
Mayank Goswami: 3 algorithms and 3 competitive ratios. Mayank Goswami: But I will prove the competitive ratio guarantee only for one out of the three algorithms. Mayank Goswami: Okay? But these problems are important enough for me to tell you, you know, what they are, and because you will… Mayank Goswami: See them in database design all the time. Mayank Goswami: Right? Mayank Goswami: So, the first problem is called the list update problem. Mayank Goswami: You guys can see my… yeah, if I can see it, you guys can see it too, right? This pointer? Okay. Mayank Goswami: So, what is the list update problem? So, you have to update some list, so… You have N keys. Mayank Goswami: Let's just number the keys 1 through N. Mayank Goswami: And they are stored in a linked list for you. Mayank Goswami: Okay, everyone knows what a linked list is, it's just pointers to the next one. Mayank Goswami: And now what you will get, is you will get an online sequence. Mayank Goswami: of M requests from these keys, right? So this is… think of these N keys as your data. Mayank Goswami: And then you will get requests for your data. Mayank Goswami: And so the requests I'm calling R1, R2, R3, up to RM. Mayank Goswami: So, the picture looks like, here's your linked list. Mayank Goswami: And then one by one, people are sending requests, and each RI is some key from your linked list. Mayank Goswami: Now, what does the algorithm do? So, the algorithm has to maintain the linked list, right? Mayank Goswami: Meaning, when the request for R1 comes. Mayank Goswami: I have to start from the top of my linked list. Mayank Goswami: And I have to walk until I find R1. Mayank Goswami: Right, so when the request RI arrives, the algorithm pays the cost. Mayank Goswami: equal to the current position of RI in the linked list. Mayank Goswami: Is this clear? Whenever a request arrives, I have to start from the front of the linked list, and go up until wherever that requested key is. And I pay the cost. 
Mayank Goswami: equal to how much I travel. Mayank Goswami: Which is equal to the current position of this requested element in the linked list. Mayank Goswami: Is everything okay so far? Mayank Goswami: Alright? If there's any questions, feel free to ask, because… Mayank Goswami: You know, understanding the problem statement is important. Mayank Goswami: Now, since you have already walked and found RI, Mayank Goswami: Here is what you're allowed to do. You are allowed Mayank Goswami: to move RI to any previous position for free. Mayank Goswami: Okay, so… Here's what you're allowed to do. Keep a linked list. Mayank Goswami: One by one, the request will arrive. When the request arrives, you have to walk from the front of the linked list to wherever that request is. Mayank Goswami: And now, if you want, for some reason, you are allowed to take that requested element and place it anywhere. Mayank Goswami: Before its current position, for free. Mayank Goswami: This I am allowing you to do for free. Mayank Goswami: And now your goal? Mayank Goswami: Is to minimize the total cost over a sequence. Mayank Goswami: Does this problem make sense yet, or no? Samuel Sokol: Yes. Mayank Goswami: Okay, so let's look at an example. Let's say this was your linked list. Mayank Goswami: 2, 5, 1, 4, 3, 6. This was your linked list at time 0. Mayank Goswami: And these were your requests. So the first requested key was 4, then the requested key was 5, Mayank Goswami: Then the requested key was 6. Mayank Goswami: So this is your linked list. So you first search for 4 in your linked list. Mayank Goswami: So you start from here, 1, 2, 3, 4… Mayank Goswami: you paid four. So that's the cost. So the cost to access the first key was 4, because it was at the fourth position in the linked list. Mayank Goswami: Now, you are allowed to move 4, if you want to. Mayank Goswami: Anywhere before its current position for free. 
Mayank Goswami: So let's say for some reason, you decided to move 4 here, between 2 and 5. Mayank Goswami: So then your new linked list looks like that. Mayank Goswami: Right? Mayank Goswami: And now when the request comes, it's for 5. Mayank Goswami: You will walk. 1, 2, 3… Mayank Goswami: And so for 5, you will pay a cost of 3. Mayank Goswami: And now that you have searched for 5, you're allowed to place 5 anywhere before its current position for free. Mayank Goswami: So maybe you decide to place it at the top of the list. Mayank Goswami: And now your linked list becomes that. Mayank Goswami: Only when you do this is the next request revealed to you, and now the next request is a 6. Mayank Goswami: For which you will have to pay. Mayank Goswami: 6. Mayank Goswami: So your total cost in this case is… 13. Mayank Goswami: Does this make sense? Mayank Goswami: And the question is… Mayank Goswami: what is the right way to do… how? I mean, how do I decide where to move the current Mayank Goswami: requested key to. I could not touch it, I could place it here, I could place it anywhere before it. Mayank Goswami: How do I decide where to place it? Mayank Goswami: So that my total cost is minimized. Mayank Goswami: And now you might think, well, if you don't know what requests are gonna come next. Mayank Goswami: How does it help, right? That's the… intuition. Mayank Goswami: But we'll see what… that still, you can do something. Mayank Goswami: But before we go into that, is the problem statement clear? Mayank Goswami: Is the list update problem clear, or no? Mayank Goswami: Just stare at it for a minute, and then let me know if there's any questions. Mayank Goswami: Okay, so I assume there are none. Mayank Goswami: So now, this is an… is it clear that this is an online problem, right? You have to… the requests are only revealed one by one. Mayank Goswami: You don't get to see all the requests in advance. Mayank Goswami: Right? So you will see R1, you will look for it. 
Mayank Goswami: Do whatever you want to do with the list, and then you will see R2, and so on. Mayank Goswami: So let's look at the first approach, approach 0, which is do nothing. Mayank Goswami: I mean, we are given the power of placing this element that we just searched for anywhere for free, but let's say we don't make use of that power, right? Mayank Goswami: Then, let's say this was your linked list. Mayank Goswami: 1, 2, 3, up to N, right? Mayank Goswami: What is a bad sequence for this linked list? Mayank Goswami: for this algorithm. So if you look at the sequence, that was just… N, N, N, N. Mayank Goswami: And let's say this sequence has length M, right? All my sequences have length M. Mayank Goswami: Then is it… then what's the cost of accessing this sequence? Mayank Goswami: Starting with this linked list, using this algorithm that does nothing. Mayank Goswami: Is it clear that the cost is M times N? Mayank Goswami: Wait, have I lost people already? Samuel Sokol: It is clear. Mayank Goswami: Right? Because each… You're always requesting the last element in the list. Mayank Goswami: So you're gonna pay N every time, and you're not moving it anywhere, because you're doing nothing. Mayank Goswami: And there's M requests, so you're gonna pay M times N. Mayank Goswami: Now, for this sequence, is there a better algorithm for this sequence? Mayank Goswami: And… we claim yes. Mayank Goswami: Imagine that there was an algorithm that, once it accessed something, like N, Mayank Goswami: It just moved it to the front of the list. Mayank Goswami: Because you have the power, right? Once you access a key, you can move it anywhere before for free. Mayank Goswami: So let's say you did that. Mayank Goswami: Then what would be the cost of this sequence? Mayank Goswami: For the first time N is requested, how much will I pay? Mayank Goswami: For the first time when N is requested, how much will I pay? Mayank Goswami: Correct, I'll pay N. 
Mayank Goswami: But then, from the next time when it is requested, how much am I gonna pay? Joshua Sin: 1. Olivia Xu: 1. Mayank Goswami: 1. Mayank Goswami: So is it clear that my total cost is that? Mayank Goswami: N for the first access. Mayank Goswami: And there are M-1 remaining accesses, each costing 1. Mayank Goswami: And we're gonna… we're going to assume that the sequence is long enough. Mayank Goswami: that every key is requested at least once, or something like that, so just assume that M is much more than N. Mayank Goswami: And if M is more than N, then this whole thing, I mean, it's less than 2M. Mayank Goswami: It's like M minus 1 plus something that's much smaller than M. Mayank Goswami: So here we have that this optimal algorithm… there is an algorithm. Mayank Goswami: with cost at most 2M on the sequence. Mayank Goswami: But this algorithm that does nothing. Mayank Goswami: Its cost is M times N. Mayank Goswami: So what is the competitive ratio of this algorithm? It is its cost divided by the cost of optimum. Mayank Goswami: The M cancels, and you get at least N over 2. Mayank Goswami: Which is Omega(N). Mayank Goswami: Which is a bad competitive ratio, right? It depends on N. Mayank Goswami: Is it clear what the competitive ratio of this algorithm is? How we just analyzed it? Mayank Goswami: So, in the exam, you will get a couple of questions like this. Mayank Goswami: You'll get a problem, I'll give you an algorithm, and then you'll have to ask… you'll have to analyze its competitive ratio. Mayank Goswami: But this is, like, level zero, right? This is a trivial algorithm. Mayank Goswami: I mean, trivial because it's doing nothing. Mayank Goswami: But any questions about this? Mayank Goswami: Is anything in this unclear? Mayank Goswami: So what was this algorithm doing that was wrong? Mayank Goswami: Like, what would you change about this algorithm, if you had to? Mayank Goswami: Why is it… why does it have such a large cost compared to… This better algorithm. 
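The cost model and the two calculations so far can be checked with a small simulation. This is only a sketch: the names `serve`, `do_nothing`, `to_front`, and `scripted` are made up for illustration, and a Python list stands in for the linked list. It replays the worked example (list 2, 5, 1, 4, 3, 6 with requests 4, 5, 6, total cost 13) and compares doing nothing against moving the accessed key to the front on the sequence N, N, …, N.

```python
def serve(initial, requests, reposition):
    """Total cost of serving `requests`, starting from the list `initial`.

    Accessing a key costs its current 1-based position. After the access,
    reposition(lst, idx, key) returns the 0-based index (at most idx) that
    the key is moved to; moving it toward the front is free.
    """
    lst = list(initial)
    total = 0
    for key in requests:
        idx = lst.index(key)          # walk from the front until we find key
        total += idx + 1              # pay the 1-based position
        new_idx = reposition(lst, idx, key)
        assert 0 <= new_idx <= idx    # only moves toward the front are allowed
        lst.insert(new_idx, lst.pop(idx))
    return total


def do_nothing(lst, idx, key):
    """Approach 0: leave the accessed key where it is."""
    return idx


def to_front(lst, idx, key):
    """Move the accessed key to the front of the list."""
    return 0


def scripted(targets):
    """Policy that replays a fixed list of target indices, one per request."""
    it = iter(targets)
    return lambda lst, idx, key: next(it)


# Worked example: move 4 between 2 and 5, move 5 to the front, leave 6 alone.
worked_example = serve([2, 5, 1, 4, 3, 6], [4, 5, 6], scripted([1, 0, 5]))

# Bad sequence for do-nothing: request the last key N, M times.
n, m = 10, 100
cost_nothing = serve(range(1, n + 1), [n] * m, do_nothing)   # M * N
cost_front = serve(range(1, n + 1), [n] * m, to_front)       # N + (M - 1)
print(worked_example, cost_nothing, cost_front)
```

With N = 10 and M = 100 the two totals are M·N = 1000 and N + (M − 1) = 109, which matches the ratio of roughly N/2 derived above once M is much larger than N.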
Mayank Goswami: What is a better algorithm doing that this one is not? Samuel Sokol: If, if an element is being, Samuel Sokol: being requested most, it gets moved to the front, which makes it a… Samuel Sokol: Which makes the most used elements the cheapest to get. Mayank Goswami: Right. Mayank Goswami: Right, so this algorithm is sort of… Making the most used elements. Mayank Goswami: The cheapest to get. Mayank Goswami: And that's why it's winning on this algorithm. Mayank Goswami: So, here is another approach now, right? Because we know now that Mayank Goswami: Somehow, these request frequencies should come into the picture. Mayank Goswami: So let's consider this idea. Mayank Goswami: We will order the list, based on the current access frequency of the keys. Mayank Goswami: What does that mean? That means at any time. Mayank Goswami: The more frequently requested keys will be at the front of the list. Mayank Goswami: So, in other words, the first element of my list will be the most frequently requested element so far. Mayank Goswami: The second will be the second most frequently requested, and so on. Mayank Goswami: Okay? Mayank Goswami: And now you may ask, well, how can we maintain such a list, right? How do I maintain the invariant that my list contains elements in decreasing order of their current frequencies? Mayank Goswami: But that's actually very simple, because when a request RI appears. Mayank Goswami: Remember, I have to walk from the front of the list until I find RI, right? Mayank Goswami: So if I walk from the front of the list, until I find RI, Mayank Goswami: And let's say RI is at position J. Mayank Goswami: Well, now I will update RI's frequency by 1, right? I will increment it by 1. Mayank Goswami: And now I will compare RI's frequency to the frequency of this element just before RI. Mayank Goswami: If I was maintaining the invariant so far, then I have two cases. 
Mayank Goswami: Either RI's frequency has now become larger than the element before it. Mayank Goswami: Or it is still smaller. If it is still smaller, I do nothing. I keep RI where it is. Mayank Goswami: And if RI's frequency has overtaken this element's frequency, then I check with the previous element, right? And I can find out Mayank Goswami: where RI should fit, and I'll move RI to that position. Mayank Goswami: Does this algorithm make sense? Mayank Goswami: Should I repeat it? XUEXIONG WU: Yes, please. Mayank Goswami: Okay, so this is an algorithm that is trying to maintain a linked list. Mayank Goswami: And it's trying to use the idea that we want the more frequently used elements in the beginning of the list. Mayank Goswami: So, in the beginning, your list is whatever, doesn't matter, because currently, no element has been requested. Mayank Goswami: When the first element is requested. Mayank Goswami: It's gonna move it to the front of the list. Mayank Goswami: Because currently, that is the most frequently requested element. Mayank Goswami: And it will keep on going. Mayank Goswami: And now I'm trying to show you how this algorithm will maintain this property. Mayank Goswami: That the items in the linked list. Mayank Goswami: Are stored in decreasing order of their current frequencies at this time. Mayank Goswami: So let us say this first element is the most frequent element. Mayank Goswami: Second element is the next most frequent element, and so on. Mayank Goswami: Right? Mayank Goswami: If I requested the element RI, Mayank Goswami: I will walk from the front of the list till I find RI. Mayank Goswami: I know what RI's frequency is, and I know it was just asked for. Mayank Goswami: So I can update its frequency by 1. Mayank Goswami: And I have the frequencies of all of these guys, right? Their frequencies are decreasing from left to right. Mayank Goswami: And now with RI's new frequency, I can check. 
Mayank Goswami: Most likely, it will only update… only overtake the previous one, right? Mayank Goswami: Basically, I will find now… Which position should RI, with its new frequency, be inserted into? Mayank Goswami: And I will just move RI to that position. Remember, in the problem, I'm allowed to move an element. Mayank Goswami: to move the element that I just accessed. Mayank Goswami: Anywhere before its current position for free. Mayank Goswami: So I can move RI to wherever it should be, now with its new frequency. Mayank Goswami: And now again, my list has all the elements ordered in decreasing order of frequency. Mayank Goswami: So I'm saying this is an algorithm that does maintain all the elements, in decreasing order of frequencies. Mayank Goswami: Does the algorithm make sense now, or no? XUEXIONG WU: Yes. Mayank Goswami: Okay, so this is what we wanted to do, though, right? This is what we learned from this example, that… Mayank Goswami: You know, we should try to keep the more frequently used keys in the beginning. Mayank Goswami: So then, looks like this algorithm should be good somehow, right? Because it is doing… Mayank Goswami: It is always keeping the most frequent guy in the beginning, second most frequent second, third, third, and so on. Mayank Goswami: By the way, people understand what the frequency of a key is, I'm hoping, right? Might be too late after all this explanation, but frequency is just how many times the key has been requested so far. Mayank Goswami: Right? Mayank Goswami: So, but it'll turn out that this algorithm is also bad. Mayank Goswami: And… Mayank Goswami: So, in the… I mean, generally, if I give you an algorithm, to show it is bad, you just have to show one sequence on which it is bad. Mayank Goswami: Right? Because then we cannot claim Mayank Goswami: A good worst-case competitive ratio for that algorithm. Mayank Goswami: So what is a bad sequence for this algorithm? Let's say I have this sequence. 
Mayank Goswami: So, first I access 1, n times. Mayank Goswami: Then I ask your algorithm to access 2, n times. Mayank Goswami: Then I ask your algorithm to access 3, n times. Mayank Goswami: And then I ask your algorithm to access N, n times. Mayank Goswami: Okay, is the access sequence clear? Mayank Goswami: Okay, if it is. Mayank Goswami: Then, start with any linked list that you want, doesn't matter, because originally all elements have frequency 0. Mayank Goswami: what will happen after the first access? Mayank Goswami: What can you say happens after the first access? Hanan Latiff: Yeah, 1 becomes first, with N, because we access it n times, that's the frequency of 1. Mayank Goswami: Correct. So the first position will be occupied by 1, right? Mayank Goswami: And then the rest of your linked list will be whatever you had in the beginning, because nothing has changed so far. Mayank Goswami: when… what happens after I have accessed 2? Hanan Latiff: Stuck. Hanan Latiff: Yeah, 2 becomes the second as well, so yeah. Mayank Goswami: So 2 becomes the second. Okay. Now, imagine the halfway point of this access sequence here. Mayank Goswami: Okay Mayank Goswami: So what element… what number would I be accessing at the middle of the sequence? What number would I be accessing at that time? Hanan Latiff: N over 2. Mayank Goswami: N over 2. Mayank Goswami: So let's say I have just finished accessing N over 2. Mayank Goswami: Then what does your linked list look like? Joshua Sin: 1 to N over 2 first. Mayank Goswami: 1, 2, 3, up to N over 2, right? Mayank Goswami: And then whatever you started with. Mayank Goswami: Is this good so far? Mayank Goswami: Also, what are the frequencies of the fir- of 1, 2, through N over 2? What are the frequencies so far? Joshua Sin: N. Olivia Xu: N? Hanan Latiff: Good. Mayank Goswami: N. And what are the frequencies of the guys who are in the second half of the linked list at this time? Olivia Xu: Zero? Mayank Goswami: Zero. Mayank Goswami: Now… Now let us just count. 
Mayank Goswami: Is it clear that… Okay, maybe. Mayank Goswami: Okay, so you guys have… Mayank Goswami: Seen until this half, right? So after half of this sequence has gone by, meaning I have just accessed n over 2, Mayank Goswami: We agreed that the first half contains the keys 1 through n over 2. Mayank Goswami: And the second half contains the keys from N over 2 plus 1 to N. Mayank Goswami: Okay. Mayank Goswami: But now I claim that my cost is high. Mayank Goswami: So let's count our cost of this algorithm. Mayank Goswami: And for now, I'll ignore even the cost of this first half. Free. Let's say I give it to the algorithm for free. It can only be better. Mayank Goswami: So in the remaining sequence, I will access N over 2 plus 1, n times. Mayank Goswami: Right? Mayank Goswami: But I claim that N over 2 plus 1 will always cost me at least n over 2. Mayank Goswami: See, right now, it is in the second half of the list, right? Mayank Goswami: So that means I have to pay N over 2, to get to this guy. Mayank Goswami: But even if I have accessed it the first time, I cannot move it to the front of the list, because all of them have high frequencies right now. Mayank Goswami: Right, so I will not touch this element, even though I accessed it once. Mayank Goswami: Even when I access it twice, I will not touch it, because… Mayank Goswami: All of them have a frequency of N, right? So it does not… it cannot… it has not overtaken their frequency yet. Mayank Goswami: So, for all of its accesses. Mayank Goswami: I will pay at least n half, because it's in the second half of the list. Mayank Goswami: Right, the list is length n, half of it is n half, and if an element is in the second half of the list, you have to pay n half. Mayank Goswami: Does this make sense? That for all of these accesses of n over 2 plus 1, I'll pay n half? Olivia Xu: Yes. 
Mayank Goswami: And now, similarly, all of these accesses of n over 2 plus 2 also. The point is, these guys… Mayank Goswami: Will not be moved to the first half of the list. Mayank Goswami: Until they're fully accessed, basically, and then it's too late, even if you move them, right? You don't gain anything. Mayank Goswami: So, for each access here I'm gonna pay at least n half. Mayank Goswami: Okay? What was the total length of my sequence? Mayank Goswami: What was the total length of my access sequence? Olivia Xu: N squared? Mayank Goswami: Very good. N squared, right? Because each of the N keys was being accessed n times, right? Mayank Goswami: n plus n plus n plus n, so n squared. Mayank Goswami: So, how much is… what is the length of this sequence? Mayank Goswami: This was half of the original sequence, right? Mayank Goswami: So if the original sequence was length N squared… Olivia Xu: N squared over 2? Mayank Goswami: Alright, so that's N squared over 2. Mayank Goswami: And we agreed that each of them costs me at least n over 2. Mayank Goswami: So my cost of the algorithm on the whole sequence has… Mayank Goswami: Is at least equal to its cost on just the second half of the sequence. Mayank Goswami: And the cost on the second half of the sequence is at least n squared over 2, which is the number of requests in the second half of the sequence, times N over 2, because all of them are in the second half. Mayank Goswami: Of the list. Mayank Goswami: And this gives me an n cubed over 4. Mayank Goswami: So the cost of this algorithm is roughly an N cube, right, on the sequence. It's at least an N cube over 4. Mayank Goswami: Is it clear that the cost of this algorithm is at least on the order of n cube on this sequence? Mayank Goswami: Just look at this analysis again, and make sure you understand Mayank Goswami: How this algorithm pays this much on the sequence? Mayank Goswami: Okay. Mayank Goswami: So the order by frequency pays N cube. Mayank Goswami: on the sequence. 
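The lower bound can be sanity-checked by simulating the order-by-frequency rule on the sequence 1 (n times), 2 (n times), …, n (n times). The sketch below makes a few assumptions that are not in the lecture: the function name `frequency_count_cost` is made up, the starting list is 1, 2, …, n (the lecture says the start order does not matter), and a key overtakes a neighbor only when its count is strictly larger.

```python
def frequency_count_cost(n):
    """Cost of order-by-frequency on 1^n 2^n ... n^n, starting from 1..n."""
    lst = list(range(1, n + 1))       # assumed start order; all counts are 0
    freq = {key: 0 for key in lst}
    total = 0
    for key in range(1, n + 1):       # keys are requested in blocks
        for _ in range(n):            # each key is requested n times in a row
            idx = lst.index(key)
            total += idx + 1          # pay the current 1-based position
            freq[key] += 1
            # Slide toward the front past neighbors with strictly smaller count.
            new_idx = idx
            while new_idx > 0 and freq[lst[new_idx - 1]] < freq[key]:
                new_idx -= 1
            lst.insert(new_idx, lst.pop(idx))
    return total


n = 10
freq_total = frequency_count_cost(n)
print(freq_total, n**3 / 4)
```

Under these assumptions key k never overtakes the keys in front of it (they all have count n), so it costs k on each of its n accesses and the total is n · n(n+1)/2, which is 550 for n = 10 and comfortably above n³/4 = 250.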
Mayank Goswami: But okay, maybe this sequence is a bad sequence for all algorithms, right? Is there a better algorithm for this sequence? Mayank Goswami: So remember, our sequence is that. Mayank Goswami: So, consider this algorithm now. It's different from this ordering by frequency, and this algorithm is very simple. Mayank Goswami: It says, when a new element appears, move it to the front. Mayank Goswami: So you have your sequence, access sequence, right? Mayank Goswami: When you access 1, now it's the first element, move it to the front of the list. Mayank Goswami: Then you keep accessing 1. Mayank Goswami: Now, 2 has appeared. This is a new element. Mayank Goswami: So you search for 2, pay the cost. Mayank Goswami: But then move it to the front of the list. Mayank Goswami: Right? This algorithm makes sense. Mayank Goswami: Whenever you see a new key being requested, a key that has not been requested before. Mayank Goswami: Move it to the front of the list. Mayank Goswami: What is the cost of this algorithm? On the sequence, now it should be clear. Mayank Goswami: Okay, so maybe we can try to see. Mayank Goswami: the first occurrence of any element, I will pay at most an N, right? Like, maybe in the beginning, the linked list I started with has 1 at the end of the list, right? Whatever. Mayank Goswami: So I pay N, for the first access of this key. Mayank Goswami: But after the first access, I'm going to move it to the front. Mayank Goswami: So, all of the remaining accesses here. Mayank Goswami: How much am I going to pay? Hanan Latiff: N minus 1. Mayank Goswami: Total, right? One for each. Mayank Goswami: Does it make sense? Mayank Goswami: Because it's at the beginning of the list, right? Mayank Goswami: So I'll pay N for the first access, but 1… Mayank Goswami: for each of the remaining n-1 accesses of that same key. Mayank Goswami: And this is the cost for 1. 
Mayank Goswami: And then again, when 2 appears, maybe for the first appearance of 2, I pay an N, Mayank Goswami: But then I'll move it to the front. Mayank Goswami: And then I will pay 1 for the remaining n-1 occurrences of 2. Mayank Goswami: So that's my cost for every key. Mayank Goswami: So my total cost is that, multiplied by N. Mayank Goswami: And now this is roughly a 2N, right? N plus N minus 1 is, like, a 2N. Mayank Goswami: And a 2N times n is like a 2N squared. Mayank Goswami: So this cost is at most 2N squared. Mayank Goswami: Does this make sense? That the cost of this algorithm is 2N squared. Mayank Goswami: And if I have an algorithm whose cost is at most 2N squared. Mayank Goswami: then the optimal algorithm's cost is also at most 2N squared, right? This algorithm did not, like. Mayank Goswami: Know the sequence, it's just doing this thing. Mayank Goswami: But if I look at now this algorithm that was ordering them by frequencies. Mayank Goswami: This one looks bad, right? Because it had an N cube cost. Mayank Goswami: Whereas this one has an N squared cost, roughly. Mayank Goswami: So the competitive ratio of this order by frequency algorithm Mayank Goswami: Is its cost divided by the optimum's cost. Mayank Goswami: Which is Omega(n). Mayank Goswami: That's actually as bad as the first algorithm where we did nothing. Mayank Goswami: Remember this algorithm where we did nothing? Mayank Goswami: Its competitive ratio was Omega(n). Mayank Goswami: And even the smart algorithm, that was ordering things by frequencies, turns out there's a bad… Sequence for that. Mayank Goswami: That also gives an Omega(n) competitive ratio. Mayank Goswami: Is this… Is this page clear? Samuel Sokol: Professor? Mayank Goswami: Yes? Samuel Sokol: Is the point of this to find… like, I feel like for every… Samuel Sokol: at least the ones you've looked at so far for this kind of problem. Every algorithm you make, you could create some kind of, like. 
Samuel Sokol: bad sequence for it, right? Like, even the one we just did, the move to front, what if you just, like, scanned every element across once and then had the same stream? Samuel Sokol: Right, like, is there, like, an optimal algorithm for this problem? Mayank Goswami: Yes, yes, yes. So it turns out, no. So, the question in online algorithms is exactly… Mayank Goswami: Given an algorithm, so like what you just said, you can always create a bad sequence for that algorithm. Mayank Goswami: And the question is, how bad? Mayank Goswami: So, how bad is characterized by this competitive ratio, right? That's how we capture how bad is that algorithm. Mayank Goswami: So these two that we have so far… seen so far. Mayank Goswami: their competitive ratio was bad in the sense it could be N, right? Mayank Goswami: And now what you said about move to front. Mayank Goswami: So, what would be the bad example… what would be the bad sequence for move to front? That is the next thing. Mayank Goswami: But let's talk about it. So what is a bad… Sequence for move to front. Hanan Latiff: I'm sorry, can you repeat your question? Mayank Goswami: This was Sam's question. Mayank Goswami: So, Sam, do you want to repeat. Samuel Sokol: So, for the algorithm we just saw, the move to front, which is the fir… whenever you meet a new element, you just move it to the front of the linked list. What would be a bad… what would be a bad sequence for that? And my guess was that you show… first, you show every element once. Samuel Sokol: And then I've shown everyone… Mayank Goswami: Just 1, 2, 3, 4, up to N over 2, or up to N, and then… Samuel Sokol: Yes. And then you have the same… Stream that we had before. Mayank Goswami: You mean this one? Samuel Sokol: Yes. Mayank Goswami: Alright, for this one, when a new element appears, right? Mayank Goswami: The new being the keyword here. Samuel Sokol: Yes. Mayank Goswami: So, yeah, so the next algorithm we will see is pretty close to this one. 
Mayank Goswami: Except, it will not have this word new in it. Mayank Goswami: So the next algorithm is going to be… Very simple. Mayank Goswami: Which is… it's called the move to front. Mayank Goswami: Which is, after you walk to your currently requested element. Mayank Goswami: Just quietly move it to the front of the list. Mayank Goswami: This algorithm is clear, right? Mayank Goswami: Search for your element, you're allowed to place it anywhere. Mayank Goswami: In the list before it. Mayank Goswami: Just move it to the front. Every time you search for any element, move it to the front. Mayank Goswami: No new business here. Mayank Goswami: So is the algorithm clear? Mayank Goswami: Hopefully it is. Mayank Goswami: And turns out, this is pretty good in the sense it is 2-competitive. Mayank Goswami: So there is a bad example for it, but… Mayank Goswami: This cannot deviate from OPT by more than a factor of 2. Mayank Goswami: Which is pretty good, considering this is an online algorithm. Mayank Goswami: That doesn't know the sequence of requests you're gonna get. Mayank Goswami: And OPT is not an online algorithm. OPT is like God's algorithm, right? Who knows the entire sequence of updates that you're gonna get. Mayank Goswami: So, Sam, does that answer your question? Samuel Sokol: Yeah. Mayank Goswami: So, what we are essentially claiming is that for any sequence S, Mayank Goswami: the cost of this algorithm, what I'm calling the move-to-front algorithm on that sequence. Mayank Goswami: is at most 2 times the cost of OPT on that sequence. Mayank Goswami: Right? That's what we… I mean, competitive ratio is the ratio of this to OPT, right? So that's another way of saying that the ratio is at most 2. Mayank Goswami: Right? That the cost of move to front. Mayank Goswami: is at most 2 times the cost of OPT on that sequence. Samuel Sokol: Yes, that answered my question, thank you. 
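Move to front is short enough to write out in full. The sketch below uses a Python list in place of the linked list, and `move_to_front_cost` is an illustrative name, not something from the lecture. It runs the rule on the block sequence 1 (n times), 2 (n times), …, n (n times), starting from the list 1, 2, …, n, which is an assumed start order.

```python
def move_to_front_cost(initial, requests):
    """Total access cost of move-to-front on `requests`, starting from `initial`."""
    lst = list(initial)
    total = 0
    for key in requests:
        idx = lst.index(key)            # walk from the front to the key
        total += idx + 1                # pay its 1-based position
        lst.insert(0, lst.pop(idx))     # then move it to the front, for free
    return total


n = 10
# The block sequence: key 1 requested n times, then key 2, ..., then key n.
blocks = [key for key in range(1, n + 1) for _ in range(n)]
mtf_total = move_to_front_cost(range(1, n + 1), blocks)
print(mtf_total, 2 * n * n)
```

On this input the first access to key k costs k and the remaining n − 1 accesses cost 1 each, so the total is n(n+1)/2 + n(n−1), which is 145 for n = 10 and within the 2n² bound discussed for this sequence. This checks one sequence only; it says nothing about the general factor-2 guarantee, which needs the proof.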
Mayank Goswami: And actually, this theorem is slightly wrong, in the sense, I mean, it's morally correct. Mayank Goswami: But technically, the theorem is… This… Mayank Goswami: It's slightly ugly, but what it says is the cost of move to front on the sequence S. Mayank Goswami: is at most 2 times the cost of OPT on that sequence. Mayank Goswami: minus… M, which is the length of the sequence, plus N choose 2. Mayank Goswami: Forget about this, borrow from bank for now. Mayank Goswami: Okay, that's actually the theorem that… We can prove. Mayank Goswami: And I normally would, but not today. Mayank Goswami: So that's technically the correct theorem. Mayank Goswami: But notice that if the length of your sequence, which is M, is more than n choose 2, Mayank Goswami: N choose 2 is roughly n squared, right? Mayank Goswami: Meaning, if the sequence is longer than n choose 2, so if, you know, you have a long enough sequence, which typically… I mean, you're running this algorithm all day, all night, right? You have a linked list, people are coming with accesses. Mayank Goswami: So, you want to maintain this algorithm for days or months, right? And eventually, your length of the requested sequence Mayank Goswami: will become larger than n squared. Mayank Goswami: So, if that is the case. Mayank Goswami: If M is greater than n choose 2, Mayank Goswami: Then, the right… this… these two terms together are negative, right? Mayank Goswami: I mean, then the right-hand side can be written like this, right? Mayank Goswami: 2 times OPT minus… Mayank Goswami: M minus n choose 2, and if M is greater than n choose 2, then this term is positive. Mayank Goswami: But that means this theorem has proven that the cost of move to front is, at most. Mayank Goswami: 2 times OPT minus something positive. Mayank Goswami: And if you're smaller than… blah minus something positive, then you're smaller than blah. Mayank Goswami: Like, if you're smaller than blah minus 5, then you're smaller than blah. 
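Written out, the statement and the rearrangement described above look as follows. The notation is an assumption made for readability: cost_MTF(S) and cost_OPT(S) for the two costs on a sequence S, m for the length of S, and n for the number of keys.

```latex
\[
  \mathrm{cost}_{\mathrm{MTF}}(S)
  \;\le\; 2\,\mathrm{cost}_{\mathrm{OPT}}(S) - m + \binom{n}{2}
  \;=\; 2\,\mathrm{cost}_{\mathrm{OPT}}(S) - \Bigl(m - \binom{n}{2}\Bigr).
\]
\[
  \text{If } m > \binom{n}{2} = \frac{n(n-1)}{2},
  \text{ then } m - \binom{n}{2} > 0,
  \text{ and so } \mathrm{cost}_{\mathrm{MTF}}(S) \le 2\,\mathrm{cost}_{\mathrm{OPT}}(S).
\]
```

For example, with n = 100 keys the threshold is 4950 requests; any longer sequence falls under the plain factor-2 bound.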
Mayank Goswami: Right? Mayank Goswami: So that's why I said that this theorem is morally correct. Mayank Goswami: Because what we are saying is, for long enough sequences, Mayank Goswami: the competitive ratio is at most 2. Mayank Goswami: So this is the famous theorem that the move-to-front heuristic is 2-competitive. Mayank Goswami: Any questions about this algorithm? Mayank Goswami: Or the guarantee? Mayank Goswami: Okay. Mayank Goswami: So, I will not prove this theorem. Mayank Goswami: But I will tell you why this proof is Mayank Goswami: not as trivial as it may sound. Mayank Goswami: So… what is the first… Mayank Goswami: way you would think of proving something like this? Mayank Goswami: That the cost of an algorithm on the sequence is at most 2 times the cost of OPT on that sequence. Mayank Goswami: Like, what is the first, simplest way to… to try to prove such a statement? Samuel Sokol: Try to find its worst-case sequence. Mayank Goswami: Right, but I want to prove this for all sequences, right? Hanan Latiff: Contradiction? Mayank Goswami: Yes, let's say… so, yeah, so first of all, trying to find the worst-case sequence is good, as a first thing when we are trying to understand how bad this algorithm is, right? And if you find a really bad worst-case sequence, then you give up on the algorithm, right? Mayank Goswami: In research, we reach this theorem phase only when we cannot find a very bad sequence for the algorithm. Mayank Goswami: Right? So, meaning… we try to find worst-case sequences for this algorithm. Mayank Goswami: But we don't find any that is worse than a factor of 2. Mayank Goswami: Right? Mayank Goswami: But just because we cannot find a sequence that's worse than a factor of 2 Mayank Goswami: doesn't mean there is no sequence that's worse than a factor of 2, right? Mayank Goswami: We cannot exhaust all sequences. There are exponentially many. Mayank Goswami: So then we need a proof.
Mayank Goswami: And you can go by contradiction, but, I mean, this is, Mayank Goswami: So how do I say this? Mayank Goswami: You have a sequence that's coming one by one. Mayank Goswami: There is an optimal algorithm, that's God's algorithm, and there is our algorithm, move to front. Mayank Goswami: And we're trying to say our cost is no more than twice. Mayank Goswami: Our total cost is no more than twice the total cost of this God's algorithm. Mayank Goswami: So one may think of proving this statement on a request-by-request basis, right? Show that for the first request Mayank Goswami: we don't take more than 2 times as much. Mayank Goswami: For the second request, we don't take more than 2 times as much, and so on. Mayank Goswami: Does it make sense? Like, how that would be the obvious approach? Show that for every request, we are within a factor of 2. Mayank Goswami: And then when I add up over the entire sequence, I'm still within a factor of 2. Mayank Goswami: So that's what I mean here by the obvious approach. If you can prove that the cost to access Ri Mayank Goswami: in the move-to-front algorithm Mayank Goswami: is at most 2 times the cost to access Ri in OPT, Mayank Goswami: then you would have proved this statement. Mayank Goswami: Right? Mayank Goswami: Wait, first of all, is this making sense, or no? Mayank Goswami: That such a statement would imply such a statement. Olivia Xu: Yes. Mayank Goswami: Yes. Mayank Goswami: But this is not true. You cannot do things like this. Mayank Goswami: So, a per-request argument does not work. Mayank Goswami: So this thing is only true in the aggregate. The statement… Mayank Goswami: It's only true for the full sequence, in the sense that there will be some requests Mayank Goswami: where we will take much more than 2 times OPT. Mayank Goswami: And there will be other requests where we will be… faster than 2 times OPT. Mayank Goswami: And what we have to show is that eventually, everything cancels out.
Mayank Goswami: And we are still within 2x OPT. Mayank Goswami: Is the difference clear, what this new strategy that I'm trying to describe is? Mayank Goswami: Like, it's a… it's an overall bound. It's not… you cannot say for every request, I'm within a factor of 2. Mayank Goswami: That doesn't work. Mayank Goswami: Sometimes I'll be better, sometimes I'll be worse. Mayank Goswami: And so for this, we do what is called a potential function method. Mayank Goswami: So think of the potential function as, like, I keep a piggy bank. Mayank Goswami: Where whenever I am faster… so, I am this algorithm. When I'm… whenever I am faster than 2 times OPT, Mayank Goswami: I put the remaining money in the piggy bank. Mayank Goswami: And whenever I am slower than 2 times OPT, Mayank Goswami: I take out money from the piggy bank to pay for the fact that I'm slow. Mayank Goswami: And at the end, I want to say that my bank is never in the negative. Mayank Goswami: Is this making sense, or too vague? Mayank Goswami: So we will see a potential function proof soon, but not for this problem. Mayank Goswami: So with this problem, our story will end here. Mayank Goswami: Where you have two algorithms that failed. Mayank Goswami: But one algorithm that actually turns out to be pretty good. Mayank Goswami: Factor 2. Mayank Goswami: Okay, so any questions about the… list update problem? Mayank Goswami: Okay, then let's move to the next problem. Mayank Goswami: So these were the toy problems, ski rental and pizza. Mayank Goswami: Okay, so the next problem is going to be… Mayank Goswami: Or not. Hanan Latiff: So, Professor, question. You said we can expect a question for the final, only, like, you're gonna give us an algorithm, and after that… Hanan Latiff: how would be the format for the final for the material that we just went through? I'm, like, just wondering. Because we don't have… you know, for the midterm, we did have questions to practice. Hanan Latiff: Like, from the book.
Mayank Goswami: Yeah, so here is what it would be. I would give you… this much? Mayank Goswami: Yes? Mayank Goswami: I would give you this information. Hanan Latiff: Huh. Mayank Goswami: And then I would give you this algorithm. Mayank Goswami: And I would say, find the competitive ratio of this algorithm. Mayank Goswami: So I will describe the problem. Mayank Goswami: I will give you an algorithm, and I'll say, Mayank Goswami: find the competitive ratio of this algorithm. Hanan Latiff: And it's basically the cost of the original function over the cost of the optimal, right? The ratio, like… Mayank Goswami: But you have to find this bad sequence. Mayank Goswami: The cost of the… cost of… Mayank Goswami: Cost of an algorithm on what, right? Mayank Goswami: Again, in the definition of competitive ratio. Mayank Goswami: So what is the definition of competitive ratio? Who can tell me Mayank Goswami: what's the definition of the competitive ratio for an algorithm? For an online algorithm? Samuel Sokol: The algorithm's worst time cost over the optimal solution's? Samuel Sokol: Cost? Mayank Goswami: Not… not necessarily, because… the algorithm's worst time cost Mayank Goswami: may be on an input where the optimal also takes a long time, right? Mayank Goswami: And the ratio could be fine. Mayank Goswami: Does that make sense? Mayank Goswami: Just because your algorithm is taking long on an input Mayank Goswami: doesn't mean OPT is fast on that input, right? Mayank Goswami: You have to find an input where your algorithm is taking long compared to OPT. Mayank Goswami: So the ratio is really what you want to maximize. Mayank Goswami: So it's the maximum of the ratio, over all inputs, Mayank Goswami: of the algorithm's running time on that input, divided by OPT's time on that input. Mayank Goswami: Do you see the distinction, Sam? Mayank Goswami: Between that and what you said.
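The distinction drawn above, between the input where the algorithm is most expensive and the input where the ratio to OPT is largest, can be made concrete with a toy sketch. The cost numbers below are invented purely for illustration (they do not come from any problem in the lecture), and `worst_ratio` is a hypothetical helper name.

```python
from fractions import Fraction


def worst_ratio(alg_cost, opt_cost):
    """Return (input, ratio) for the input maximizing alg_cost / opt_cost."""
    worst = max(alg_cost, key=lambda x: Fraction(alg_cost[x], opt_cost[x]))
    return worst, Fraction(alg_cost[worst], opt_cost[worst])


# Made-up costs on two inputs, for illustration only.
alg_cost = {"input_a": 100, "input_b": 20}
opt_cost = {"input_a": 90, "input_b": 2}

# The input where the algorithm is slowest in absolute terms...
costliest = max(alg_cost, key=alg_cost.get)

# ...is not the input that determines the competitive ratio.
worst, ratio = worst_ratio(alg_cost, opt_cost)
print(costliest, worst, ratio)  # input_a input_b 10
```

On `input_a` the algorithm is slow but so is OPT (ratio 10/9); on `input_b` the algorithm is cheap in absolute terms but ten times worse than OPT, and that is the ratio that counts.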
Samuel Sokol: Yeah, I keep saying the worst… Samuel Sokol: time for the algorithm, and what I should be saying is the ratio of the cost of the algorithm over the cost of the optimal; it's the ratio that's maximized for worseness, not the specific cost of the algorithm. Mayank Goswami: Exactly, so it's the ratio you want to be maxing. Mayank Goswami: So, coming back to Hanan's question, I'll give you a problem, I'll give you an algorithm, and I will say find the competitive ratio of this algorithm. Mayank Goswami: And to find the competitive ratio of that algorithm, you have to find an input Mayank Goswami: for which this ratio is large, or small, whatever it is. Mayank Goswami: Maybe the algorithm is good, in which case you will find a competitive ratio of 2 or 3. Mayank Goswami: Maybe the algorithm is bad, and you'll find a competitive ratio of N, or something like that. Mayank Goswami: I mean, you all… in the recording, I already gave you a homework for this, right? The pizza problem. I did not prove the competitive ratio of 9, and that was your homework. So that's an example. Samuel Sokol: So, for a problem like that, I mean, it's just kind of trivial that the optimal solution is just going to the right room, but for other problems, it seems harder to prove that that ratio is actually maximized, no? Mayank Goswami: Exactly, I agree. So, so again, you don't, I mean… Mayank Goswami: So, good questions. Let me answer your first good question. One, you're saying, Mayank Goswami: it's hard to know… for the pizza finding problem, it's clear that OPT is just… the one that goes straight to the room. Mayank Goswami: And for the others, it's not clear. Mayank Goswami: But actually, we did see here, so… Mayank Goswami: You see, for this sequence, right? Mayank Goswami: I didn't know what OPT does. Mayank Goswami: But I know that OPT will certainly not be worse than something else. Mayank Goswami: So sometimes you don't have to figure out OPT.
Mayank Goswami: Right? If there is anything better, any other algorithm that's better, Mayank Goswami: that's way better, then OPT can only be, you know, even better. Samuel Sokol: So sometimes we only have to find a bound, it doesn't have to be the exact amount, but it's at least something. Mayank Goswami: Exactly, yes. Mayank Goswami: So you don't have to argue about OPT sometimes, because OPT could be hard to argue about. Mayank Goswami: But if you can say, alright, there's this other thing we could do that's already much better than this proposed algorithm, Mayank Goswami: then the proposed algorithm has a high competitive ratio. Mayank Goswami: Because forget about comparing it to OPT. Mayank Goswami: Even comparing it to this other algorithm Mayank Goswami: makes it clear that this one is terrible. Mayank Goswami: But this trick only works when you want to show a large competitive ratio. Mayank Goswami: If you want to show that it has a competitive ratio of 2, Mayank Goswami: right? Then you kind of need to compare to OPT. Mayank Goswami: But that's a theorem, which I'm not proving for you, so you won't be asked for the proof of that theorem. That's why I'm telling you that this theorem is much more involved, because you have to argue something for all sequences. Mayank Goswami: And for OPT on that sequence. Mayank Goswami: Okay, but to answer Hanan's question, is it clear what type of… examples, exercises you can… expect? Hanan Latiff: Yes, thank you. Mayank Goswami: Okay, good. Mayank Goswami: The second topic is potentially one of the most… important algorithms in, Mayank Goswami: top 2, top 3, in machine learning. Mayank Goswami: Has anyone here heard of… multiplicative weight updates? Mayank Goswami: No. Mayank Goswami: Okay, good. Mayank Goswami: Okay, tell me what you know about a neural network. Can anyone tell me what a neural network does? Mayank Goswami: Or a large language model. What do you know about it?
Mayank Goswami: What do you know about a neural network? Mayank Goswami: Whoa. Mayank Goswami: You guys have heard the term, right? What do you know? Hanan Latiff: It uses… the human brain. Hanan Latiff: Like, based on [inaudible]. Mayank Goswami: Uses… Hanan Latiff: No, it works like the human brain. Hanan Latiff: [inaudible] Mayank Goswami: It's supposed to be modeled like a human brain. Now, we don't know… we don't know that it works like a human brain. Mayank Goswami: It's just supposed to be modeled like a human brain. Anastasiia Tcyrenzhapova: It is trained on large… There you go. Mayank Goswami: Correct? Anastasiia Tcyrenzhapova: And… then it gives tokens based on… probability. Mayank Goswami: Yeah, so then you ask it a question, and based on its training, it will tell you something, right? Mayank Goswami: And you may… right now, there are many large language models, ChatGPT, Claude, blah blah blah blah. Mayank Goswami: But they are never guaranteed to give you the right answer. They cannot be. Mayank Goswami: Because we don't know. Mayank Goswami: Because, first of all, it's all statistical, right? Mayank Goswami: And so there will never be a guarantee that they will always give you the right answer. They hallucinate at times. Mayank Goswami: They have this tendency of… they try to please the user, so they will give you whatever you want to hear. Mayank Goswami: In the early days, if you asked ChatGPT, Mayank Goswami: so you would ask it, prove… Mayank Goswami: that square root 2 is irrational. Mayank Goswami: Everyone knows kind of what an irrational number is, maybe you've seen this before. Mayank Goswami: So if you asked ChatGPT to prove square root 2 is irrational, it would prove it. Mayank Goswami: It would give you a correct proof. Mayank Goswami: And then if, in the next sentence, Mayank Goswami: you asked it to prove Mayank Goswami: that square root 4 is irrational, Mayank Goswami: it would also give you a nice proof.
Mayank Goswami: But is the second statement correct? Anastasiia Tcyrenzhapova: No. Anastasiia Tcyrenzhapova: No. An irrational number is something that you can't express as a fraction. Mayank Goswami: Exactly, yes. And square root 4 is actually 2, right? Mayank Goswami: But because you ask it to do something, Mayank Goswami: and it tries to please you, it will do it. It'll give you something wrong. Mayank Goswami: And we… in machine learning, this… this happens, because these are not… these models don't understand, right? Mayank Goswami: But still, we have to… I mean, people do use them. Mayank Goswami: So here is the problem. Suppose you have to make some decision. Mayank Goswami: Right? So you are, whatever, someone. Mayank Goswami: I don't know what the quizzical look is. Mayank Goswami: But you're trying to decide, Mayank Goswami: I don't know, whether to do one of two things, whether to do A or B. Mayank Goswami: And now, you don't know anything, so you ask… you ask ChatGPT. Mayank Goswami: You ask, I don't know, Claude… Or you ask… blah blah… And then you ask your… your parents. Mayank Goswami: And then you ask your… I don't know, your… neighbor… your neighbors… Mayank Goswami: And you ask… someone. Mayank Goswami: And everyone gives you a suggestion. So maybe ChatGPT says, do A. Mayank Goswami: Claude says, do B. Mayank Goswami: Parents say do A. Neighbors say do B. Mayank Goswami: And this person says do B. Mayank Goswami: Okay? Mayank Goswami: And now you have to decide who to listen to. Mayank Goswami: So this problem that we are going to talk about is exactly the problem of, Mayank Goswami: when I have a collection of… so-called experts, Mayank Goswami: some of which may or may not be true experts, Mayank Goswami: how do I decide who to listen to Mayank Goswami: when I have to make a decision? Mayank Goswami: Okay. Mayank Goswami: And every day, let's say every day I'll have to make a decision. Mayank Goswami: And every day I'm asking these people.
Mayank Goswami: And every day they give me different answers, right? How do I decide who to listen to? Mayank Goswami: Does the question make sense? Mayank Goswami: And you can replace all of these Mayank Goswami: by different algorithms. All of them statistical algorithms. Mayank Goswami: So, effectively, this is a question of… Mayank Goswami: If you have a bunch of algorithms, Mayank Goswami: right? How do you decide which one to listen to? How do you choose the best out of them Mayank Goswami: when they give different advice? Mayank Goswami: And these algorithms are very different Mayank Goswami: from the algorithms that you will learn in, like, 323 or something, right? Because there, I mean, if I give you an input Mayank Goswami: and I ask you to sort the input, Mayank Goswami: your algorithm always sorts the input. There is no error there. Mayank Goswami: But because of the advent of machine learning and statistics-based problems, Mayank Goswami: all the algorithms, they are randomized, right? So sometimes they'll give you an error. Mayank Goswami: And the question is, how do you decide, you know, Mayank Goswami: which algorithm to listen to? Mayank Goswami: So this is what we'll… We'll see what's called the experts theorem. Mayank Goswami: And the algorithm here? Mayank Goswami: The best algorithm is called multiplicative weight updates. Mayank Goswami: That's what we're going to see. Mayank Goswami: But hopefully you understand why this is important in today's context. Mayank Goswami: Good. Mayank Goswami: Alright, so what's the setting here? Mayank Goswami: Where did the setting go? Mayank Goswami: No. Mayank Goswami: Alright, multiplicative weight updates, yes. Mayank Goswami: So we have to make decisions every day, and we have N experts to assist us. Mayank Goswami: Okay, so you have some number of experts, N, to assist you. Mayank Goswami: And your decisions are just binary, sell or buy. So every day, you just have to decide A or B.
You can think of it as sell or buy. You know, you're in the stock market, you're asking friends, you're looking at different channels, and you have to decide whether to sell or buy a certain stock every day. Mayank Goswami: And then, what happens with the… so at the beginning of the day, all the experts give you their advice, whether to sell or buy. Mayank Goswami: And at the end of the day, right, Mayank Goswami: you know if expert i's suggestion was correct, or if it was a mistake. Mayank Goswami: Right, so at the end of the day, you can compare the stock price to what it was at the beginning. Mayank Goswami: And, you know, if it's higher, Mayank Goswami: then it was a bad decision to sell it, and if it's lower, it was a bad decision to buy it. Mayank Goswami: So, at the end of the day, you know if a certain expert was right or wrong. Mayank Goswami: And now, you're doing this every day, right? So on the first day, you have a bunch of experts. Mayank Goswami: Maybe you listen to someone, and you find that they were wrong that day. Mayank Goswami: Now, what do you do? You can… Mayank Goswami: You can just never listen to them again, or you can do something else. Mayank Goswami: So that is the question of this Mayank Goswami: algorithm, this problem. Mayank Goswami: Right? Is there a strategy Mayank Goswami: that performs almost as well as the best expert? Mayank Goswami: And when I say the best expert, I mean the best expert… currently. Mayank Goswami: The best expert in hindsight. Mayank Goswami: Meaning the expert who has made the fewest mistakes so far. Mayank Goswami: Does the question make sense, or no? Samuel Sokol: That makes sense. Mayank Goswami: Alright. Mayank Goswami: So what is this multiplicative weight updates algorithm? So again, this is probably one of the most important algorithms you will see. Mayank Goswami: And it's also one of the very simple ones.
Mayank Goswami: So… initially, you give all your experts a weight Mayank Goswami: of 1. So this is how much confidence you have in them, 1 being 100% in every one of them. Mayank Goswami: So all the experts initially have a weight of 1. So w_i is 1, Mayank Goswami: for all i from 1 to N, because you have N experts, right? You're trying to decide who among the N experts to listen to. Mayank Goswami: So all the N experts have a weight of 1. Mayank Goswami: And in general, at every step, meaning at every day, Mayank Goswami: your expert will have a certain weight, right? You will have weights. Mayank Goswami: And so w_i^t will denote the weight of the i-th expert on day t. Mayank Goswami: Okay? Mayank Goswami: And the algorithm, basically, what it does is the following. Mayank Goswami: If an expert is correct on a given day, Mayank Goswami: I will not change its weight. I will keep the expert's weight the same for the next day. Mayank Goswami: But if at the end of the day I find out that an expert made a mistake, Mayank Goswami: I penalize the expert, and I lower its weight slightly. Mayank Goswami: So this is what I have written here. Mayank Goswami: If you don't like "step", you can think of a day. Mayank Goswami: So at step t plus 1, Mayank Goswami: the weight of expert i at time t plus 1 Mayank Goswami: will be one of two things. Mayank Goswami: It will either be the weight of expert i on day t, if… expert i was correct. Mayank Goswami: But if expert i was incorrect, I multiply its weight Mayank Goswami: by something that's slightly smaller than 1. I multiply it by 1 minus epsilon. Mayank Goswami: And I pick what epsilon is. Maybe I pick epsilon to be 0.1, or, I don't know, 0.2. Mayank Goswami: You are free to choose whatever epsilon you want. That's the penalty. Mayank Goswami: And every day, Mayank Goswami: you either keep the weight of the expert the same, Mayank Goswami: or you decrease it by a multiplicative factor of 1 minus epsilon.
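The weight update just described is short enough to write down directly. The following is a minimal sketch; the function name `update_weights`, the three-expert example, and the choice epsilon = 0.5 (picked so the printed numbers are exact) are illustrative assumptions, not something from the slides.

```python
def update_weights(weights, was_correct, eps):
    """Apply one day's update: keep the weight of every expert that was
    correct, and multiply the weight of every expert that was wrong by
    (1 - eps)."""
    return [w if ok else w * (1 - eps) for w, ok in zip(weights, was_correct)]


# Three hypothetical experts, all starting at weight 1.
weights = [1.0, 1.0, 1.0]

# Day 1: only expert 0 was correct.
weights = update_weights(weights, [True, False, False], eps=0.5)
print(weights)  # [1.0, 0.5, 0.5]

# Day 2: experts 0 and 1 were correct.
weights = update_weights(weights, [True, True, False], eps=0.5)
print(weights)  # [1.0, 0.5, 0.25]
```

Weights only ever stay the same or shrink, so the total weight can never exceed its starting value, the number of experts.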
Mayank Goswami: Is it clear how these weights are changing for the experts over time? Mayank Goswami: Any questions about how the weights are changing for the experts over time? Samuel Sokol: Sorry if I missed this, so can the experts ever gain? Mayank Goswami: No, they never do. Samuel Sokol: Yep. Okay. Mayank Goswami: Good. So, originally, what's the sum of the weights? Mayank Goswami: What's the sum of all the weights? It's N, right? And the total is only going to go down from that point on. Mayank Goswami: They never gain. Mayank Goswami: Okay, so I've told you how to weigh the experts. Mayank Goswami: But I haven't told you how to decide what to do, right? I mean, all this weight business is fine. Mayank Goswami: But now, every day, remember, I have to make a decision also, right? Mayank Goswami: How do I decide what to do? Mayank Goswami: Well, so here's what I'll do. Every day, I will add up the weights of all the experts. Mayank Goswami: Right? Whatever the current weights are. Mayank Goswami: And… I will add up the weights of the experts who are saying sell. Mayank Goswami: And I'll add up the weights of the experts who are saying buy. Mayank Goswami: And whichever weight is more, I do that. Mayank Goswami: Does that make sense? Hanan Latiff: I'm sorry, can you repeat it? Mayank Goswami: Okay, so this is the total weight of the experts on day t, right? Because w_i^t is the weight of expert i on day t. Mayank Goswami: And I have summed over all i, all the experts. Mayank Goswami: So this is the total weight of all the experts on day t. Mayank Goswami: And… I will divide it by 2. Mayank Goswami: And… If… Okay, maybe the other way is easier. Mayank Goswami: I will add up the weight of all the experts who are telling me to sell on day t. Mayank Goswami: I will add up the weight of all the experts who are asking me to buy on day t, right? Mayank Goswami: The sum of these two numbers is this numerator, right?
Mayank Goswami: Because that's the sum of weights of all the experts. Mayank Goswami: So one of these two numbers has to be at least half of the total. Mayank Goswami: And I'll do whatever Mayank Goswami: that is. Mayank Goswami: Okay, so… Sorry, say that again, Allen? Allen Singleton: What if the sums of the weights for the different parties are equal? Then what do we do? Mayank Goswami: Good, so if they're equal, then you do what we always do, which is you toss a coin and do whatever. Mayank Goswami: Then it doesn't matter what you do. Mayank Goswami: But otherwise, one side will have more weight than the other side. Mayank Goswami: But remember, it's not the number of experts we're looking at. Mayank Goswami: We're looking at which side of experts has more weight. Mayank Goswami: Does this make sense? This is an important distinction. I'm not just taking the majority of experts, I'm taking the weighted majority. Mayank Goswami: Right, so I'm re-weighing the experts. Mayank Goswami: And then, I'm looking at which half of the experts has more weight. Or, I mean, which… Mayank Goswami: which of the two decisions has experts with more weight behind it. Mayank Goswami: And I do that. Hanan Latiff: So the side with the more… more weight, is it going to be the correct one, or the incorrect one? Mayank Goswami: There is no correct… you… it could be wrong. You don't know. You will only find out at the end of the day. Hanan Latiff: [inaudible] Mayank Goswami: I don't know. Mayank Goswami: Right now, I'm not talking about any claim. I'm not… saying, Mayank Goswami: I haven't told you what the guarantee of the algorithm is. Mayank Goswami: But it's not necessarily true. Mayank Goswami: For example, just on day one, right? Mayank Goswami: On day 1, the total weight of the experts is N. Mayank Goswami: Right? Mayank Goswami: But maybe… there's only one good expert, and all the others are fools. Mayank Goswami: Right? Mayank Goswami: So on day one, the one good expert will say sell.
Mayank Goswami: But the N-1 bad experts will say buy. Mayank Goswami: And you're gonna buy. Mayank Goswami: Right? So on day one, you made the wrong decision. Mayank Goswami: That could happen. Mayank Goswami: There is no guarantee that this algorithm will make the correct decision, Mayank Goswami: because it still has to listen to these experts. Mayank Goswami: But is the algorithm clear, what this algorithm is doing? Hanan Latiff: Thank you. Mayank Goswami: So, any questions about the algorithm? Mayank Goswami: Okay. Mayank Goswami: Now, what sort of guarantee do we want to prove, right? Mayank Goswami: We want to say that, let's say, T days have passed, right? Mayank Goswami: This is an algorithm, right? So every day, we not only find out Mayank Goswami: which experts made a mistake, we also find out if we made a mistake, right? If this algorithm made a mistake. Mayank Goswami: Because at the end of the day, it is revealed which decision was the correct decision. Mayank Goswami: So, we want to prove a statement like: the number of mistakes made by the weighted majority algorithm, our algorithm, Mayank Goswami: is at most some competitive ratio alpha Mayank Goswami: times the number of mistakes made by the best expert. Mayank Goswami: Does this thing highlighted in yellow make sense? Like, that that's the statement we want to prove? Mayank Goswami: So we want to say that the number of mistakes made by this weighted majority algorithm Mayank Goswami: is at most some number alpha. This will be the competitive ratio. Mayank Goswami: Times the number of mistakes made by the best expert at that point. Mayank Goswami: And in fact, we will prove a stronger statement. We will actually show Mayank Goswami: that the number of mistakes this algorithm makes… So again, do people understand what I mean by a mistake made by the algorithm? Mayank Goswami: Like, what is a mistake made by the algorithm?
Hanan Latiff: It was, like, it told you to sell and you shouldn't have sold, like, it was incorrect. Mayank Goswami: At the end of the day, you find out, right, whether you did the right thing or the wrong thing. So this is how many errors you made. Mayank Goswami: And we want to claim that it's at most some number times the number of errors made by the best expert. Mayank Goswami: But we'll actually prove something stronger. We will… let me just show you what we'll prove now. Mayank Goswami: Okay, so… here, let's just read this theorem statement. Mayank Goswami: So, after T steps, right? Mayank Goswami: let… Mayank Goswami: M superscript T… by the way, this is not raised to the power T. In this problem, when T is in the superscript, Mayank Goswami: it is just to denote the day or the time. This is not raised to T. Mayank Goswami: Okay, so M^T is the number of mistakes Mayank Goswami: that our algorithm made so far. Mayank Goswami: So, M^T is the number of mistakes our algorithm made Mayank Goswami: up until day T. Mayank Goswami: And let m_i^T… be the number of mistakes that expert i has made so far. Mayank Goswami: Okay, so take any expert, expert number i. Mayank Goswami: That's how many mistakes this expert has made in the T days. Mayank Goswami: And that's how many mistakes we have made in the T days. Mayank Goswami: And now the theorem statement says Mayank Goswami: that for every expert, so in particular, this will be true for the best expert, Mayank Goswami: the number of mistakes we made Mayank Goswami: is, if you ignore this plus for a moment, right? So if you ignore… this part. Mayank Goswami: What this says is that the number of mistakes we made is at most Mayank Goswami: 2 times 1 plus epsilon, and remember, epsilon you chose, right? Mayank Goswami: times the number of mistakes made by the i-th expert. Mayank Goswami: And this is true for every expert. Mayank Goswami: So, in particular, this is also true for the… best expert on day T.
Mayank Goswami: So, basically, what we're saying is the number of errors we make Mayank Goswami: is at most 2 times 1 plus epsilon Mayank Goswami: times the errors made by the best expert Mayank Goswami: on day T. Mayank Goswami: Because this statement is true for every expert. Mayank Goswami: Which is not a bad guarantee, right? Mayank Goswami: Because whoever is the best expert on a current day… Mayank Goswami: When I say on a current day, I mean so far. I don't mean just today. Mayank Goswami: Right? So this… this is the best expert so far. Mayank Goswami: So what I'm trying to say is, this algorithm that we just saw, its guarantee is Mayank Goswami: that the number of mistakes it will make until day T is at most… Mayank Goswami: 2 times 1 plus epsilon. So let's say you chose epsilon to be 0.5, Mayank Goswami: then this is 3, right? 2 times 1.5 is 3. Mayank Goswami: So, 3 times the number of errors made by the… best expert so far. Mayank Goswami: And the best expert so far is the one who has made the fewest errors. Mayank Goswami: It's clear you cannot beat the best expert so far, right? Mayank Goswami: Because you have all of these other experts trying to distract you. Mayank Goswami: So I cannot match the best expert so far. Mayank Goswami: But, with this algorithm, I match that best expert up to a factor pretty close to 2. Mayank Goswami: That's if this… plus term was not there. Mayank Goswami: But you can think of the plus term like this. So what is the plus term? It's 2 times log of N over epsilon. Mayank Goswami: But this is, again, similar to what you saw for the list update problem. Mayank Goswami: What this means is, for long enough sequences, when the number of mistakes has become larger than this, Mayank Goswami: you are essentially within a factor of 2. Mayank Goswami: Right, this is like a fixed number. It doesn't depend on T. Mayank Goswami: So, ignore this for now.
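For reference, the guarantee read out above can be written as a single inequality. Here $M^{(T)}$ is the number of mistakes of the weighted majority algorithm up to day $T$, $m_i^{(T)}$ is the number of mistakes of expert $i$ up to day $T$, and $n$ is the number of experts. The lecture only says "log"; the natural logarithm is assumed here, as is the restriction $\varepsilon \le 1/2$ under which this form is usually stated.

```latex
\[
  M^{(T)} \;\le\; 2(1+\varepsilon)\, m_i^{(T)} \;+\; \frac{2\ln n}{\varepsilon}
  \qquad \text{for every expert } i \text{ and every day } T,
  \quad 0 < \varepsilon \le \tfrac12 .
\]
% Example: with \varepsilon = 0.5 the multiplicative factor is 2(1.5) = 3,
% and the additive term is 4 \ln n, which does not grow with T.
```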
Mayank Goswami: And then make sure you understand what this guarantee is. Mayank Goswami: So is the statement of the theorem clear? The meaning behind it? Mayank Goswami: Or is it a… Naufil Faruqi: Does it say errors minus best? Mayank Goswami: No, errors of the best. It's like an underscore. Mayank Goswami: I tried to put an underscore. Mayank Goswami: Errors of the best expert. Mayank Goswami: But I mean, what I mean is, this statement is true for all experts. Here, i was any expert. Mayank Goswami: So, in particular, it's true for the best expert. Mayank Goswami: And one subtle thing that maybe you guys would have noticed, or not: Mayank Goswami: the best expert can change over time. Mayank Goswami: There may be an expert who was the best expert on day 50, Mayank Goswami: but on… by the time day 100 appears, there's another expert who's the best expert. Mayank Goswami: Does that make sense? Mayank Goswami: So the best expert does not have to stay the same. Mayank Goswami: So this statement… is true for all i and for all T. Samuel Sokol: T is saying, like, within any given day, the expert… We don't… Mayank Goswami: Up until any given day. Samuel Sokol: And T is the number of days? Mayank Goswami: No, T is the day. M^T is how many mistakes we've made so far. Samuel Sokol: And m sub i of T is how many mistakes that specific expert has made that day. Mayank Goswami: Not that day, so far. Samuel Sokol: Just in general. Mayank Goswami: Yeah, so both these numbers are between 0 and T. Mayank Goswami: Does that make sense? Mayank Goswami: They could be as low as zero. Mayank Goswami: No mistakes. Mayank Goswami: And they could be as large as T, if you made a mistake every day. Samuel Sokol: Yeah. Mayank Goswami: So that's how many mistakes we have made in the first T days. Mayank Goswami: That's how many mistakes expert i has made in the first T days. Mayank Goswami: We never make more than roughly twice as many mistakes Mayank Goswami: as any single expert.
Mayank Goswami: In particular, we don't make more than roughly twice as many mistakes Mayank Goswami: as the best expert, up until day T. Mayank Goswami: Right, I'm gonna stop the recording.
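To tie the pieces together, here is a small self-contained simulation of the weighted majority procedure from this lecture: weighted vote each day, coin toss on ties, and a (1 - epsilon) penalty for every expert that was wrong. It is a sketch under stated assumptions: the function name `weighted_majority`, the "sell"/"buy" string encoding, the seeded random generator for ties, and the example with one perfect expert and three always-wrong experts are all chosen here for illustration. The bound it checks uses the natural logarithm, which is an assumption about the "log" in the theorem.

```python
import math
import random


def weighted_majority(advice, outcomes, eps=0.5, rng=None):
    """Run weighted majority over a sequence of days.

    advice[t][i] is expert i's advice on day t ("sell" or "buy");
    outcomes[t] is the action that turned out to be correct on day t.
    Returns (our_mistakes, expert_mistakes).
    """
    rng = rng or random.Random(0)        # used only to break ties
    n = len(advice[0])
    weights = [1.0] * n                  # every expert starts at weight 1
    our_mistakes = 0
    expert_mistakes = [0] * n
    for day_advice, truth in zip(advice, outcomes):
        sell_w = sum(w for w, a in zip(weights, day_advice) if a == "sell")
        buy_w = sum(w for w, a in zip(weights, day_advice) if a == "buy")
        if sell_w > buy_w:
            choice = "sell"
        elif buy_w > sell_w:
            choice = "buy"
        else:
            choice = rng.choice(["sell", "buy"])  # tie: toss a coin
        if choice != truth:
            our_mistakes += 1
        # End of day: penalize every expert whose advice was wrong.
        for i, a in enumerate(day_advice):
            if a != truth:
                expert_mistakes[i] += 1
                weights[i] *= 1 - eps
    return our_mistakes, expert_mistakes


# One good expert (index 0) and three experts who are always wrong.
T, eps = 10, 0.5
outcomes = ["sell"] * T
advice = [["sell", "buy", "buy", "buy"] for _ in range(T)]

ours, experts = weighted_majority(advice, outcomes, eps=eps)
bound = 2 * (1 + eps) * min(experts) + 2 * math.log(len(experts)) / eps
print(ours, experts)   # 2 [0, 10, 10, 10]
print(ours <= bound)   # True
```

In this example the algorithm is outvoted on the first two days, exactly as in the "one good expert, all the others are fools" scenario, and from the third day on the bad experts' combined weight (0.75) is below the good expert's weight (1), so it follows the good expert.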