Final Prep: Basic Understanding Questions
Questions explicitly posed to students across lectures 12-15, 17-24 (March 9 — May 6).
Lecture 12 - Streaming & Uniform Sampling
Why Does the Bloom Filter Have No False Negatives?
Why does the Bloom Filter guarantee no false negatives?
Answer
When a key is inserted, all of its hash positions are set to 1, and bits are never reset from 1 back to 0. So if a key was ever inserted, all positions it mapped to must still be 1. Any query for that key will find all positions set to 1 and correctly return “yes.”
Bloom Filter Hash Function Count for 2% FPR
If asked to build a Bloom Filter with a 2% false positive rate, how many hash functions should you use?
Answer
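A worked calculation, assuming the standard analysis of an optimally tuned Bloom filter, in which $k$ hash functions give a false positive rate of about $(1/2)^k$. The symbols $k$ and $\varepsilon$ are introduced here for illustration:

```latex
k = \log_2 \frac{1}{\varepsilon} = \log_2 \frac{1}{0.02} = \log_2 50 \approx 5.64
```

Rounding up gives 6 hash functions, since $(1/2)^6 \approx 1.6\%$ is below the 2% target while $(1/2)^5 \approx 3.1\%$ is not.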
Bloom Filter Bits Per Key for 2% FPR
For the same 2% false positive rate, how many bits per key does the Bloom Filter use?
Answer
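A minimal sketch of the calculation, assuming the textbook optimum for a Bloom filter: $\log_2(1/\varepsilon)$ hash functions and about $1.44 \log_2(1/\varepsilon)$ bits per key. The function name `bloom_parameters` is hypothetical.

```python
import math


def bloom_parameters(fpr: float) -> tuple[int, float]:
    """Return (hash function count, bits per key) for a target false positive rate.

    Assumes the standard optimum: k = log2(1/fpr) hash functions and
    m/n = log2(1/fpr) / ln 2 (about 1.44 * log2(1/fpr)) bits per key.
    """
    ideal_k = math.log2(1.0 / fpr)
    bits_per_key = ideal_k / math.log(2)
    # A real filter needs a whole number of hash functions, so round up.
    return math.ceil(ideal_k), bits_per_key


k, bits = bloom_parameters(0.02)
print(f"{k} hash functions, {bits:.2f} bits per key")
```

Under these assumptions, a 2% false positive rate costs a little over 8 bits per key.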
Finding a Missing Number in O(log n) Space
You’re shown 4,999 of the numbers 1 through 5,000 one at a time in a random order. One number is missing. You’re only allowed to store $O(\log n)$ bits at any time (here $n = 5{,}000$). How do you find the missing number?
Answer
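Precompute the total expected sum $S = \sum_{i=1}^{n} i = \frac{n(n+1)}{2}$, which is $12{,}502{,}500$ for $n = 5{,}000$. Maintain a single running sum $R$, initialized to 0. As each element $x$ arrives, update $R \leftarrow R + x$. At the end, the missing number is $S - R$.
The only value stored is the running sum, which is at most $\frac{n(n+1)}{2} = O(n^2)$, requiring $O(\log n)$ bits, far fewer than the $O(n \log n)$ bits needed to store the full array.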
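The running-sum idea can be sketched as follows. The function name `find_missing` and the choice of 1234 as the hidden number are illustrative assumptions.

```python
import random


def find_missing(stream, n):
    """Return the one value in 1..n absent from `stream`, keeping only a running sum."""
    expected_total = n * (n + 1) // 2  # sum of 1..n, known in advance
    running_sum = 0
    for value in stream:
        running_sum += value  # the only state carried between arrivals
    return expected_total - running_sum


# Hypothetical input: the numbers 1..5000 with 1234 removed, in random order.
numbers = [v for v in range(1, 5001) if v != 1234]
random.shuffle(numbers)
missing = find_missing(iter(numbers), 5000)
print(missing)
```

The stream is passed as an iterator to emphasize that each element is seen once and never stored.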
Uniform Sampling Probability: What Do s and t Represent?
In the uniform sampling algorithm, when the sample size is $s = 2$ and 9 elements have passed so far, any given element has probability $2/9$ of being in the sample. Where does the 2 come from, and where does the 9 come from?
Answer
- $s = 2$, the required sample size
- $t = 9$, the number of elements seen so far (current time)
For the sample to be uniform at time $t$, all $t$ elements must have an equal chance, $s/t$, of being in the sample.
How Is Streaming Sampling Different from Static Sampling?
How is the streaming sampling problem different from the kind of sampling problem you’d encounter in CSCI 323?
Answer
Two key differences:
- Space constraint: you cannot store the entire stream, so you can’t collect all elements and then randomly choose $s$ of them after the fact.
- Online / at-every-moment guarantee: at every time $t$, the current sample must already satisfy the uniform guarantee that each of the $t$ elements seen so far is in the sample with probability $s/t$. The sample must continuously evolve as new elements arrive, which a one-shot static algorithm doesn’t need to worry about.
What Happens If You Always Include the New Key?
In the streaming sampling algorithm, if you always put every newly arrived key into the sample (kicking out a random existing key when full), does that give a uniform sample?
Answer
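No. With sample size $s$ and $t$ keys seen, the newest key would be in the sample with probability 1 rather than the required $s/t$, and each older key survives every later arrival only with probability $1 - 1/s$, so its chance of remaining decays geometrically. The sample is heavily biased toward the most recent keys, violating the uniformity requirement that all elements have equal probability.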
What Must Be Stored Beyond the Sample Itself?
Can the streaming sampling algorithm be implemented by storing only the sampled keys, or must something else be stored?
Answer
You must also store $t$, the current count of how many keys have been seen so far. Without knowing $t$, you cannot compute the acceptance probability $s/t$ for each newly arriving key. Storing $t$ requires only $O(\log t)$ bits, so the total space is the $s$ sampled keys plus $O(\log t)$ bits.
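A minimal sketch of the streaming sampler described in this section, assuming the usual rule: keep the first $s$ keys, then accept the $t$-th key with probability $s/t$ and evict a uniformly random resident. The name `reservoir_sample` and the `rng` parameter are illustrative.

```python
import random


def reservoir_sample(stream, s, rng=random):
    """Maintain a uniform sample of size s over a stream seen one key at a time."""
    sample = []
    t = 0  # number of keys seen so far; must be stored alongside the sample
    for key in stream:
        t += 1
        if t <= s:
            sample.append(key)  # the first s keys are always kept
        elif rng.random() < s / t:
            # Accept the new key with probability s/t and evict a random resident.
            sample[rng.randrange(s)] = key
    return sample


print(reservoir_sample(range(1, 10), 2))
```

Note that the only state besides the sample is the counter `t`, matching the answer above.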
Inductive Proof Base Case: Why $t = s + 1$ and Not $t = 1$?
In the inductive correctness proof for the uniform sampling algorithm, the base case is $t = s + 1$ rather than $t = 1$. Why does this make sense?
Answer
For the first $s$ arrivals ($t \le s$), the algorithm simply stores every key, and no probabilistic decisions are made. Since $s/t \ge 1$ for $t \le s$, all keys have probability 1 of being in the sample, which trivially satisfies the guarantee. The first genuinely interesting case, where the algorithm must decide whether to accept or reject a new key, is when the $(s+1)$th key arrives, making $t = s + 1$ the natural base case.
Zero Term in the Inductive Step
When proving the inductive step, you use the total probability formula for the event that key $x$ is in the sample at time $t$, conditioning on whether it was in the sample at time $t - 1$. One of the two resulting terms is zero. Why?
Answer
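The term conditioned on the key being absent, $\Pr[x \in S_t \mid x \notin S_{t-1}]$ (where $S_t$ denotes the sample at time $t$ and $x$ is a key that arrived earlier), is zero, because keys can only enter the sample at the moment they first appear in the stream. If $x$ was not accepted when it arrived, it has already passed and there is no mechanism to add it retroactively.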
Lecture 13 - Morris Counter & Variance Reduction
How Many Bits to Exactly Count $T$ Items?
How many bits of working memory do you need to keep an exact count of $T$ items that have passed through a stream?
Answer
About $\log_2 T$ bits (that is, $O(\log T)$), the minimum required to represent the number $T$ in binary.
Midterm Review: Cuckoo Hashing vs. Hashing with Chaining
What is the main advantage of cuckoo hashing over hashing with chaining?
Answer
$O(1)$ worst-case query time. Hashing with chaining guarantees $O(1)$ expected query time, but a single query can take $O(n)$ in the worst case. Cuckoo hashing stores every key in exactly one of two possible positions, so any query checks at most 2 locations.
Midterm Review: Hashing with Chaining vs. FKS
What is the main advantage of hashing with chaining over FKS hashing?
Answer
Hashing with chaining has $O(n)$ worst-case preprocessing (build) time, since each of the $n$ insertions takes $O(1)$ worst-case. FKS hashing only guarantees $O(n)$ expected build time: there is some probability of needing to rebuild, so its build time is not worst-case $O(n)$.
What Does the Morris Counter Reduce to with Increment Probability 1?
What does the Morris counter algorithm become if you change the increment probability from $1/2^X$ (where $X$ is the current counter value) to 1 (always increment)?
Answer
A regular exact counter: it increments by 1 every time a key appears, and at the end its value equals exactly $T$ (the total number of keys seen). This uses $O(\log T)$ bits, which is what the Morris counter is designed to improve upon.
Simulating a Biased Coin with Only a Fair Coin
You have only a single fair coin (probability $1/2$ heads). How do you simulate an event that occurs with probability $1/2^X$?
Answer
Flip the coin $X$ times. Declare “success” if every flip comes up heads. Since each flip is independent, $\Pr[\text{all } X \text{ flips are heads}] = (1/2)^X = 1/2^X$.
This matters in implementing the Morris counter: rather than storing and computing the full probability $1/2^X$ (which would require storing a number as large as $2^X$, defeating the purpose), you just flip a fair coin $X$ times and increment only if all flips are heads.
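The coin-flip trick can be sketched as below. This assumes the standard Morris counter, which stores a small value $X$, increments it with probability $1/2^X$, and reports $2^X - 1$ as its estimate; the estimate formula and the function names are assumptions not stated in the notes above.

```python
import random


def all_heads(flips, rng):
    """Simulate an event of probability 1/2**flips using only fair coin flips."""
    # all() over zero flips is True, so the very first increment always happens.
    return all(rng.random() < 0.5 for _ in range(flips))


def morris_count(num_items, rng):
    """Process num_items arrivals and return the Morris estimate 2**X - 1."""
    x = 0  # the only stored state; roughly log2 of the true count
    for _ in range(num_items):
        if all_heads(x, rng):  # succeeds with probability 1/2**x
            x += 1
    return 2 ** x - 1


rng = random.Random(42)
print(morris_count(1000, rng))
```

A single run can be far from the true count; the estimate is only correct on average, which is why the later sections average many independent copies.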
First Thing to Verify About a New Randomized Algorithm
If you’ve just designed a randomized algorithm that outputs an estimate $\hat{n}$ of a true quantity $n$, what is the first property you’d want to verify?
Answer
That $\mathbb{E}[\hat{n}] = n$, i.e., the estimate is unbiased: correct on average. If the algorithm produces the wrong answer in expectation, there is little hope of making it useful. Establishing this is typically the first step before analyzing variance or applying concentration inequalities.
Variance of an Average of Independent Estimates
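You have $k$ independent random variables $X_1, \dots, X_k$, each with variance $\sigma^2$. If $\bar{X} = \frac{1}{k} \sum_{i=1}^{k} X_i$, what is $\mathrm{Var}[\bar{X}]$?
Answer
$\mathrm{Var}[\bar{X}] = \sigma^2 / k$.
Derivation:
$$\mathrm{Var}[\bar{X}] = \mathrm{Var}\left[\frac{1}{k} \sum_{i=1}^{k} X_i\right] = \frac{1}{k^2} \sum_{i=1}^{k} \mathrm{Var}[X_i] = \frac{1}{k^2} \cdot k \sigma^2 = \frac{\sigma^2}{k}$$
where independence lets the variance of the sum equal the sum of variances, and the scaling rule $\mathrm{Var}[cX] = c^2 \, \mathrm{Var}[X]$ pulls out the $1/k^2$.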
How to Reduce Variance by Repetition
Suppose you have an unbiased estimator (correct in expectation) but with high variance. How can you reduce the variance while keeping the estimate unbiased?
Answer
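Run $k$ independent copies of the algorithm to get estimates $X_1, \dots, X_k$, and output their average $\bar{X} = \frac{1}{k} \sum_{i=1}^{k} X_i$. Writing $\mu$ for the true value and $\sigma^2$ for the variance of a single estimate:
- $\mathbb{E}[\bar{X}] = \mu$: the average is still unbiased.
- $\mathrm{Var}[\bar{X}] = \sigma^2 / k$: variance shrinks by a factor of $k$.
By Chebyshev’s inequality, the probability that $\bar{X}$ deviates from $\mu$ by more than $\varepsilon$ is at most $\frac{\sigma^2}{k \varepsilon^2}$, which can be made as small as desired by increasing $k$.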
Lecture 14 - Practice Problems (TA-led)
Lecture 14 was led by a TA (Gibyo) working through textbook exercises 4.9 and 4.5. The main explicit question posed mid-solution:
Expectation of the Number of Bad Weak Estimates (Exercise 4.9)
In the proof of exercise 4.9, we define $X_i = 1$ if the $i$-th weak estimate is bad (outside the $\varepsilon$-relative error range), and $X_i = 0$ otherwise. We collect $k$ independent weak estimates. What is $\mathbb{E}\left[\sum_{i=1}^{k} X_i\right]$?
Answer
Each $X_i$ is a Bernoulli random variable. A “weak estimate” is defined with a constant failure probability $p$, meaning each individual estimate is bad with probability at most $p$. By linearity of expectation and independence of the estimates:
$$\mathbb{E}\left[\sum_{i=1}^{k} X_i\right] = \sum_{i=1}^{k} \mathbb{E}[X_i] \le kp$$
Lecture 15 - Approximate Median & Morris+/++
Lecture 15 was again led by a TA (Gibyo). The session covered the $\varepsilon$-approximate median algorithm and the Morris+ / Morris++ algorithms. Unlike the professor’s lectures, the TA rarely posed explicit conceptual questions to the class. There are no substantial Q&A exchanges to include here.
Lecture 17 - Heavy Hitters & Count Min Sketch
How to Find the Approximate Median of a Stream
If you are given a stream of numbers, how would you find an approximate median?
Answer
Take $s = O\left(\frac{1}{\varepsilon^2} \log \frac{1}{\delta}\right)$ uniform samples from the stream and return the median of those samples.
This gives an $\varepsilon$-approximate median, meaning the returned value has rank within $\pm \varepsilon n$ of the true median (for a stream of length $n$), with probability at least $1 - \delta$. No repetition or averaging step is needed beyond choosing a large enough sample size $s$.
Using Count Min Sketch for Top-K: The Role of a Min-Heap
Suppose you have a Count Min Sketch that can estimate the frequency of any item. How do you use it to maintain the top-$k$ most frequent items as the stream arrives?
Answer
Maintain a min-heap of size $k$, keyed by estimated frequency. Initialize it with the first $k$ items.
Whenever a new item $x$ appears in the stream, query its estimated frequency $\hat{f}(x)$ from the CMS. If $\hat{f}(x)$ exceeds the frequency of the current minimum in the heap (the least popular of the $k$ tracked items), evict that minimum and insert the new item.
This keeps the heap holding the current top-$k$ candidates at all times, using only $O(k)$ extra space and $O(\log k)$ update time per arrival.
CMS Query Output
When querying the Count Min Sketch for the frequency of item $x$, what is the final output?
Answer
Apply every row’s hash function to $x$: compute $h_1(x), h_2(x), \dots, h_d(x)$ (one hash function per row, $d$ rows in total) to obtain one cell index per row. Read the counter stored in each of those cells and return the minimum value among them. This is why the data structure is called the count min sketch.
Why Return the Minimum?
Why does the Count Min Sketch return the minimum counter value across all rows, rather than, say, the maximum or the average?
Answer
Whenever two items hash to the same cell in a row, their counts collide and that cell’s counter gets inflated by items other than $x$. The minimum across rows is the counter that has been inflated the least, making it the closest estimate to the true frequency. Taking the maximum or average would allow a single heavily-collided row to dominate the result.
Can the Count Min Sketch Underestimate?
Can the Count Min Sketch ever return $\hat{f}(x) < f(x)$, an estimate smaller than the true frequency?
Answer
No. Every time $x$ appeared in the stream, its designated cell in every row was incremented by 1. Counters are never decremented. So each of those cell values is at least $f(x)$, and therefore so is their minimum. The Count Min Sketch can only overestimate; it never underestimates.
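A minimal sketch of the update and query rules described in this section. The class name, the salted use of Python's built-in `hash` as a stand-in for independent hash functions, and the table sizes are all illustrative assumptions, not the construction analyzed in lecture.

```python
import random


class CountMinSketch:
    """A rows-by-cols table of counters with one (stand-in) hash function per row."""

    def __init__(self, rows, cols, seed=0):
        rng = random.Random(seed)
        self.cols = cols
        # One random salt per row; salting hash() only imitates independent hashes.
        self.salts = [rng.getrandbits(64) for _ in range(rows)]
        self.table = [[0] * cols for _ in range(rows)]

    def _cell(self, row, item):
        return hash((self.salts[row], item)) % self.cols

    def add(self, item):
        # Increment the item's designated cell in every row; never decrement.
        for row in range(len(self.table)):
            self.table[row][self._cell(row, item)] += 1

    def estimate(self, item):
        # Return the least-inflated counter among the item's cells.
        return min(
            self.table[row][self._cell(row, item)] for row in range(len(self.table))
        )


stream = [i % 37 for i in range(2000)] + [5] * 300
cms = CountMinSketch(rows=4, cols=16)
for item in stream:
    cms.add(item)
print(cms.estimate(5), stream.count(5))
```

With only 16 columns the estimates are visibly inflated by collisions, but each one is still at least the true count.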
How Is the CMS Guarantee Different from the Standard Relative-Error Guarantee?
The Count Min Sketch guarantees $f(x) \le \hat{f}(x) \le f(x) + \varepsilon n$ with probability at least $1 - \delta$, where $n$ is the stream length. How does this differ from the usual relative-error guarantee?
Answer
In a standard relative-error guarantee, the error is bounded by $\varepsilon f(x)$, a fraction of the true frequency of the queried item. For the CMS, the error bound is $\varepsilon n$, where $n$ is the total stream length, regardless of how large or small $f(x)$ is.
Since $f(x) \le n$ for any item, $\varepsilon f(x) \le \varepsilon n$, so this is a weaker, absolute error bound. A matching lower bound shows that a relative-error guarantee requires storing the full stream, so the absolute bound is the best achievable in sublinear space.
Applying Markov to a Single CMS Row
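The expected error in one row’s counter for item $x$ is at most $n/w$ (where $w$ is the number of columns and $n$ the stream length). Using Markov’s inequality, what is the probability that this row’s error exceeds $\varepsilon n$?
Answer
By Markov’s inequality, $\Pr[X \ge a] \le \mathbb{E}[X] / a$ for a nonnegative random variable $X$. Here:
$$\Pr[\text{error} > \varepsilon n] \le \frac{n / w}{\varepsilon n} = \frac{1}{\varepsilon w}$$
So each row independently has at most a $\frac{1}{\varepsilon w}$ chance of its error exceeding the allowed tolerance; with $w = 2/\varepsilon$ columns, for instance, that is a $1/2$ chance.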
Why Must All Rows Fail for the CMS to Give a Bad Answer?
For the Count Min Sketch to return an estimate larger than $f(x) + \varepsilon n$, what must be true of all rows simultaneously?
Answer
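The CMS output is the minimum counter across all rows. For this minimum to exceed $f(x) + \varepsilon n$, every row’s counter must exceed that threshold: all bad events must occur together.
If even one row has an error within $\varepsilon n$, the minimum is controlled and the guarantee holds. Because the rows use independent hash functions, the probability that all $d$ rows simultaneously have error greater than $\varepsilon n$ is at most $p^d$, where $p < 1$ bounds the failure probability of a single row. With $p = 1/2$ and $d = \log_2(1/\delta)$ rows, for instance, this is $(1/2)^{\log_2(1/\delta)} = \delta$.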
Lecture 18 - Frequency Moments & AMS Sampling
Where Does $\mathbb{E}[X^2]$ Show Up?
If you have a random variable $X$, for what purpose would you ever need to compute $\mathbb{E}[X^2]$?
Answer
In computing the variance:
$$\mathrm{Var}[X] = \mathbb{E}[X^2] - (\mathbb{E}[X])^2$$
The second moment is needed any time you want to measure how spread out a distribution is around its mean.
The 0th Frequency Moment
What does the 0th frequency moment $F_0$ of a stream represent?
Answer
$$F_0 = \sum_{i} f(i)^0$$
where $f(i)$ is the frequency of item $i$. This counts the number of distinct elements in the stream. Each item that has appeared at least once contributes $1$ (since $f(i)^0 = 1$ for $f(i) > 0$), and items that never appeared contribute $0$. Estimating $F_0$ efficiently is the problem solved by HyperLogLog.
The 1st Frequency Moment
What does the 1st frequency moment $F_1$ represent, and what streaming algorithm computes it?
Answer
$$F_1 = \sum_{i} f(i)$$
where $f(i)$ is the frequency of item $i$. This is the total number of elements in the stream (its length), since summing all frequencies counts every arrival. The Morris counter is precisely the algorithm for approximating $F_1$ in sublinear space.
AMS Sampling: Output When Sampling a Rare Item
Consider a stream of length $n$. If the AMS algorithm samples an element that appears only once, what is the output?
Answer
Sampling that element gives $r = 1$ (it appears exactly once at or after the sampled position). The second-moment output formula is $n(2r - 1)$:
$$n(2 \cdot 1 - 1) = n$$
AMS Sampling: Output When Sampling the First Occurrence of a Frequent Item
In the same stream, if the algorithm samples the first occurrence of an element that appears a total of 3 times, what is the output?
Answer
Sampling the first occurrence of that element means $r = 3$ (all three occurrences fall at or after that position). With $n$ denoting the stream length, the output is:
$$n(2 \cdot 3 - 1) = 5n$$
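The two outputs above can be checked with a short sketch. The stream below is a hypothetical example (not the one used in lecture), and `ams_output` is an illustrative name; the output rule $n(2r - 1)$ is the second-moment formula quoted above. Averaging the output over every possible sampled position shows why the estimator is correct in expectation.

```python
def ams_output(stream, position):
    """Output of one AMS copy that sampled `position` (0-based) in `stream`."""
    n = len(stream)
    item = stream[position]
    # r = number of occurrences of the sampled item at or after the sampled position.
    r = stream[position:].count(item)
    return n * (2 * r - 1)


# Hypothetical stream: "a" appears 3 times, "b" twice, "c" once.
stream = ["a", "b", "a", "c", "a", "b"]
n = len(stream)

rare = ams_output(stream, stream.index("c"))      # r = 1, so the output is n
frequent = ams_output(stream, stream.index("a"))  # first "a": r = 3, so the output is 5n

# Each position is sampled with probability 1/n, so the expectation is the plain average.
expected_output = sum(ams_output(stream, p) for p in range(n)) / n
second_moment = sum(stream.count(x) ** 2 for x in set(stream))
print(rare, frequent, expected_output, second_moment)
```

Individual outputs range from $n$ to $5n$ here, which illustrates the high variance that the next question addresses.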
Boosting a Single Unbiased Estimate: General Strategy
AMS sampling produces a single estimate that is correct in expectation but has high variance. What is the general strategy for converting this into an $(\varepsilon, \delta)$-guarantee?
Answer
The standard two-level strategy from exercise 4.9:
- Run $T$ independent copies of AMS and take their average. This produces a “weak estimate” that is within a $(1 \pm \varepsilon)$ factor of the true answer with constant probability greater than $1/2$ (variance is reduced by averaging).
- Run $O(\log(1/\delta))$ independent copies of step 1 and take their median. By the median amplification argument, the final output achieves the full $(\varepsilon, \delta)$-guarantee.
This mirrors the Morris+ / Morris++ construction applied to AMS sampling.
Applying Chernoff to AMS: What Are We Solving For?
When applying the Chernoff bound in the analysis of AMS sampling, what is the goal of the calculation?
Answer
We want to find the minimum number of independent copies of the AMS estimator that must be averaged in order to achieve the desired error guarantee.
We know the estimator is correct in expectation. We apply the generalized Chernoff bound (for random variables bounded in a general range $[a, b]$ rather than just $[0, 1]$) to bound the probability that the average of $T$ copies deviates from the true answer by more than an $\varepsilon$ fraction. Solving for the smallest $T$ such that this probability stays at or below $\delta$ gives the required number of replications.
Lecture 19 - AMS Sampling (cont.) & Counting Distinct Elements
Why Does Averaging T Copies Preserve the Expected Value?
When you run $T$ independent copies of AMS sampling and average their outputs to get $\bar{X} = \frac{1}{T} \sum_{i=1}^{T} X_i$, why is $\mathbb{E}[\bar{X}] = \mathbb{E}[X]$, where $X$ is the output of one copy?
Answer
By linearity of expectation. Since all copies are identically distributed:
$$\mathbb{E}[\bar{X}] = \frac{1}{T} \sum_{i=1}^{T} \mathbb{E}[X_i] = \frac{1}{T} \cdot T \cdot \mathbb{E}[X] = \mathbb{E}[X]$$
Does Running More Copies Make the Error Probability Better or Worse?
In the AMS analysis, as you increase the number of parallel copies $T$ being averaged, does the probability of a large error increase or decrease?
Answer
It decreases. Both from intuition (more samples give a better estimate) and from the Chernoff expression: the probability bound has the form $e^{-cT}$ for some $c > 0$ that does not depend on $T$, which shrinks exponentially as $T$ grows. Running more copies always improves the guarantee.
What Is the Problem with the Exact Expression for $T$?
After applying Chernoff to derive the exact minimum number of copies $T$ needed for an $(\varepsilon, \delta)$ guarantee, what prevents you from using that formula directly?
Answer
The formula depends on two unknown quantities:
- $f_{\max}$: the frequency of the most frequent item in the stream, which you don’t know ahead of time.
- $F$: the second frequency moment, which is precisely the quantity you are trying to compute.
Since neither is known before processing the stream, the formula can’t be evaluated. The fix is to upper-bound the part of the expression that involves $f_{\max}$ and $F$ (a provable lemma), replacing the unknown expression with one that depends only on known parameters, which is a computable quantity.
What Streams Maximize and Minimize $F = \sum f(i)^2$?
The second frequency moment $F$ depends on how stream elements are distributed. Which stream structure maximizes $F$, and which minimizes it?
Answer
- Maximized when all $n$ elements are the same item: that item has frequency $n$ and all others have frequency $0$, giving $F = n^2$.
- Minimized when the stream is spread as evenly as possible across all items: each item has frequency $1$, giving $F = n$.
So $F$ is always in the range $[n, n^2]$, and a high $F$ signals a skewed distribution (few items dominate), while a low $F$ signals a more uniform one.
Counting Distinct Elements Without Space Constraints
If you had no space constraint, how would you count the number of distinct elements in a stream?
Answer
Maintain a running set $S$. Whenever a new element $x$ arrives, check if $x \in S$. If it is not, insert $x$; if it already is, skip it. At any point, output $|S|$ as the count of distinct elements seen so far.
This works but uses $O(d \log n)$ bits, where $d$ is the number of distinct elements and each element is a number between $1$ and $n$ requiring $\log_2 n$ bits, which is far too much for a streaming setting.
Expected Position of the Minimum of Two Random Darts
You throw two darts independently and uniformly at random on the interval $[0, 1]$. What is the expected position of the smaller (leftmost) of the two darts?
Answer
$$\mathbb{E}[\min] = \frac{1}{3}$$
The intuition: two darts divide the interval into three equal pieces on average, so the left dart lands at $1/3$ and the right dart at $2/3$.
More generally, if you throw $k$ darts uniformly on $[0, 1]$, the expected minimum is $\frac{1}{k+1}$. This fact is the key to understanding why the Flajolet-Martin ideal algorithm works.
Lecture 20 - Flajolet-Martin Ideal Algorithm
How Many Hash Values Does the FM Ideal Algorithm Compute?
In the Flajolet-Martin ideal algorithm, the stream may have length $n$ with many repetitions. Over the entire stream, how many distinct hash values are actually computed and kept track of?
Answer
Only $d$ distinct hash values are computed, where $d$ is the number of distinct elements: one per distinct element, since the hash function is fixed and deterministic. If the same element appears ten times, it hashes to the same value every time. So repeated appearances of an item never produce a new hash value; only a genuinely new element does.
Why Does the FM Algorithm Subtract 1?
The FM ideal algorithm outputs $\frac{1}{X} - 1$, where $X$ is the minimum hash value. Why the $-1$?
Answer
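If $d$ is the number of distinct elements and $X$ is the minimum hash value, we proved that:
$$\mathbb{E}[X] = \frac{1}{d + 1}$$
So $\frac{1}{X} \approx d + 1$ in expectation, and subtracting 1 gives approximately $d$, the quantity we want to output.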
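A minimal sketch of the ideal algorithm, under the assumption that a SHA-256 digest scaled into $(0, 1)$ is an acceptable stand-in for the ideal real-valued hash. The names `unit_hash` and `fm_ideal_estimate` are illustrative.

```python
import hashlib


def unit_hash(item):
    """Deterministically map an item to a number in (0, 1); a stand-in for the ideal hash."""
    digest = hashlib.sha256(str(item).encode()).digest()
    return (int.from_bytes(digest[:8], "big") + 1) / (2 ** 64 + 1)


def fm_ideal_estimate(stream):
    """Track only the minimum hash value X and output 1/X - 1."""
    smallest = 1.0
    for item in stream:
        # A repeated item hashes to the same value, so it can never lower the minimum.
        smallest = min(smallest, unit_hash(item))
    return 1.0 / smallest - 1.0


with_repeats = [1, 2, 3, 1, 1, 2, 3, 3]
print(fm_ideal_estimate(with_repeats), fm_ideal_estimate([1, 2, 3]))
```

Both calls print the same value, because only distinct elements affect the minimum. A single run is a noisy estimate of the number of distinct elements.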
Exam Question: Tracking the Maximum Hash Instead of the Minimum
You are implementing the FM ideal algorithm, but your friend accidentally tracks the maximum hash value instead of the minimum. Is the algorithm salvageable? If so, what output formula should be used?
Answer
The algorithm is not doomed. If $d$ is the number of distinct elements and $Y$ is the maximum hash value, then:
$$\mathbb{E}[Y] = \frac{d}{d + 1}$$
So $1 - Y \approx \frac{1}{d + 1}$ in expectation, and therefore $d \approx \frac{1}{1 - Y} - 1$. The corrected output is:
$$\frac{1}{1 - Y} - 1$$
The proof follows the same structure as the minimum case, but uses the complementary probability: the probability that all $d$ hashes land to the left of a line at position $x$ is $x^d$, and integrating gives the expectation of the max.
Why Is the Continuous-Hash Algorithm Called “Ideal”?
The Flajolet-Martin analysis begins with a hash function mapping items to real numbers in $[0, 1]$. Why is this called the “ideal” algorithm?
Answer
Because such a hash function cannot actually be implemented in finite space. A real number in $[0, 1]$ (an irrational one, for example) may require infinite bits to represent exactly. Since the entire goal of the streaming setting is to save space, an algorithm that requires infinite precision to store a single hash value is impractical, hence “ideal.” It serves only to convey the main idea before showing the practical approximation.
HyperLogLog: Why Output $2^{p+1}$?
The HyperLogLog algorithm tracks the position $p$ of the leftmost least-significant-bit across all hash bit vectors. Why is the output $2^{p+1}$ (and not, say, $2^p$ or $p$ itself)?
Answer
For $d$ distinct elements, the expected number of elements whose hash ends in exactly $p$ zeros (i.e., whose least-significant-bit is at position $p$) is:
$$\frac{d}{2^{p+1}}$$
The largest position $p$ we observe is the one where this count drops to roughly $1$. Setting $\frac{d}{2^{p+1}} = 1$ and solving gives $d = 2^{p+1}$, which is why we output that quantity.
Lecture 21/22 - HyperLogLog Analysis & Online Algorithms
Bernoulli Variance vs. Expectation
In the HyperLogLog analysis, why is the variance of each indicator Bernoulli variable guaranteed to be no larger than its expectation?
Answer
For a Bernoulli $X$ with success probability $p$:
$$\mathrm{Var}[X] = p(1 - p) \le p = \mathbb{E}[X]$$
since $1 - p \le 1$. Multiplying $p$ by something at most 1 can only make it smaller or equal. This lets you upper-bound the variance of indicator sums by their expectation, a convenient shortcut throughout the HyperLogLog proof.
Ski Rental: What Is the Competitive Ratio?
In the ski rental problem, skis cost $P$ dollars to buy or 1 dollar per day to rent, and you ski for an unknown number of days $N$. The strategy is to rent for the first $P - 1$ days and buy on day $P$ if you’re still there. What is the competitive ratio?
Answer
At most 2. There are two cases:
- If $N < P$: you only rent, spending $N$ total, exactly what OPT spends. Ratio = 1.
- If $N \ge P$: you spend $(P - 1) + P = 2P - 1$ (rent days 1 through $P - 1$, buy on day $P$). OPT spends $P$ (just buy on day 1). Ratio = $\frac{2P - 1}{P} = 2 - \frac{1}{P}$.
In the worst case, the ratio approaches $2$, so the competitive ratio is at most $2$.
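The two cases can be checked numerically with a small sketch. It assumes renting costs 1 per day and the online strategy rents for `price - 1` days before buying; the function names and the price of 10 are illustrative.

```python
def online_cost(days, price):
    """Rent for the first price - 1 days, then buy on day `price` if still skiing."""
    if days < price:
        return days                 # only rentals were ever paid
    return (price - 1) + price      # price - 1 rentals, then the purchase


def optimal_cost(days, price):
    """Offline optimum: knowing `days` in advance, rent throughout or buy on day 1."""
    return min(days, price)


price = 10
worst_ratio = max(
    online_cost(days, price) / optimal_cost(days, price) for days in range(1, 101)
)
print(worst_ratio)
```

For any trip of at least `price` days the ratio is $(2P - 1)/P$, which stays below 2 for every price.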
Pizza Finding: Why Does “Walk to One End First” Have a Bad Competitive Ratio?
You are in room 0 of a corridor with $n$ rooms on each side. Pizza is in room $x$ (unknown). One algorithm: walk all the way to one end; if not found, walk to the other. Why is this algorithm’s competitive ratio bad?
Answer
This algorithm’s cost is at most $2n + |x|$: at most $n$ to reach one end, $n$ to return, then $|x|$ to reach the pizza on the other side.
OPT costs $|x|$ (walk straight there).
The competitive ratio is:
$$\frac{2n + |x|}{|x|} = 1 + \frac{2n}{|x|}$$
If $|x|$ is small (e.g., pizza is in room 1 on the side you check second), this ratio grows like $2n$: it depends on $n$, the corridor length. A good online algorithm should have a constant competitive ratio, not one that scales with the input size.
Pizza Finding: Why Does the Zigzag Algorithm Still Fail?
The zigzag algorithm alternates between checking room 1, $-1$, 2, $-2$, 3, $-3$, … until the pizza is found. Why does this also have a poor competitive ratio?
Answer
If the pizza is in room $x$, this algorithm travels roughly $\Theta(x^2)$ total distance before finding it (you visit rooms $1$ through $|x|$ in each direction, multiple times). OPT travels $|x|$.
The competitive ratio is at least:
$$\frac{\Theta(x^2)}{|x|} = \Theta(|x|)$$
This still depends on $|x|$, which could be as large as $n$. A competitive ratio of $\Theta(n)$ is not a constant, so the zigzag algorithm is not competitive in the desired sense either. The fix is to turn at powers of 2 (i.e., visit rooms $1, -2, 4, -8, \dots$), which achieves a constant competitive ratio of at most $9$.
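A sketch of the turn-at-powers-of-2 strategy, assuming the walker starts at room 0, goes right first, returns to room 0 after each failed excursion, and that cost is total distance walked. The function name is illustrative.

```python
def doubling_search_cost(pizza_room):
    """Distance walked by the turn-at-powers-of-2 strategy; pizza_room must be nonzero."""
    cost = 0
    reach = 1        # how far the current excursion goes: 1, 2, 4, 8, ...
    direction = 1    # +1 = right, -1 = left; alternates every excursion
    while True:
        same_side = (pizza_room > 0) == (direction > 0)
        if same_side and abs(pizza_room) <= reach:
            return cost + abs(pizza_room)  # found on the way out
        cost += 2 * reach                  # walk out to the turning point and back
        reach *= 2
        direction = -direction


rooms = [x for x in range(-1000, 1001) if x != 0]
worst_ratio = max(doubling_search_cost(x) / abs(x) for x in rooms)
print(worst_ratio)
```

The worst cases are rooms just beyond a turning point on the same side, where almost two extra full excursions are wasted; even then the ratio stays below 9.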
Lecture 23 - List Update Problem & Multiplicative Weight Updates
What Is a Bad Sequence for the “Do Nothing” List Algorithm?
You maintain a linked list of $n$ keys. One algorithm never moves any accessed key (it just walks to the key and leaves it in place). What is the worst sequence for this algorithm, and what is its cost?
Answer
The worst sequence is one that repeatedly requests the key at the very end of the list, for example the sequence $x_n, x_n, \dots, x_n$ of length $m$, where $x_n$ is the key at position $n$.
Since the algorithm never moves anything, the key stays at position $n$ every time. Each access costs $n$, so the total cost is $mn$.
By contrast, an algorithm that moves the first accessed key to the front of the list would pay $n$ once and then $1$ for each subsequent access of the same key, a total of roughly $n + m$. Its cost on this sequence is at most $n + m$, so the “do nothing” algorithm’s competitive ratio on this sequence is at least $\frac{mn}{n + m}$, which is $\Omega(n)$ once $m \ge n$.
Move-to-Front: Cost After the First Access
In the list update problem, the “move to front” heuristic moves an accessed key to the front of the list every time it is accessed. On the sequence $x_n, x_n, \dots, x_n$ (always requesting the key that starts at the last position $n$), how much does the algorithm pay for the first access of $x_n$, and how much for each subsequent access?
Answer
- First access: $n$. The algorithm must walk from the front to position $n$.
- Each subsequent access: $1$. After the first access, $x_n$ is moved to the front of the list and stays there.
So the total cost over $m$ accesses is $n + (m - 1)$, which is far cheaper than the “do nothing” algorithm’s $mn$.
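The comparison can be reproduced with a short simulation, assuming the cost of an access is the key's 1-based position in the list. The function name and the sizes (100 keys, 1000 requests) are illustrative.

```python
def access_cost(keys, requests, move_to_front):
    """Total cost of serving `requests` on a list, where cost = 1-based position."""
    order = list(keys)
    total = 0
    for key in requests:
        position = order.index(key) + 1
        total += position
        if move_to_front:
            # Move the accessed key to the head of the list.
            order.insert(0, order.pop(position - 1))
    return total


num_keys, num_requests = 100, 1000
keys = list(range(1, num_keys + 1))
requests = [num_keys] * num_requests  # always ask for the key that starts last

do_nothing_cost = access_cost(keys, requests, move_to_front=False)
mtf_cost = access_cost(keys, requests, move_to_front=True)
print(do_nothing_cost, mtf_cost)
```

The "do nothing" total matches the product of the list length and the number of requests, while move-to-front pays the full walk only once.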
What Is a Bad Sequence for the Ordering-by-Frequency Algorithm?
The “order by frequency” algorithm maintains the linked list in decreasing order of how many times each key has been requested so far. What is a bad access sequence for this algorithm?
Answer
Access item $1$ exactly $n$ times, then item $2$ exactly $n$ times, then item $3$ exactly $n$ times, and so on up to item $n$, for a total sequence length of $n^2$.
After the first half of the sequence (items $1$ through $n/2$ have each been accessed $n$ times), the first half of the list is occupied by items $1, \dots, n/2$, all with frequency $n$. Items $n/2 + 1$ through $n$ have frequency $0$ and sit in the second half.
Now accessing item $n/2 + 1$ costs at least $n/2$ each time (it is in the second half of the list), and this is true for all $n^2/2$ remaining accesses. Total cost $\Omega(n^3)$.
The “move to front” algorithm achieves $O(n^2)$ on the same sequence, so the ordering-by-frequency competitive ratio is at least $\Omega(n)$, just as bad as doing nothing.
What Exactly Is the Competitive Ratio?
What is the formal definition of the competitive ratio of an online algorithm?
Answer
The competitive ratio is the maximum over all possible input sequences $\sigma$ of the ratio:
$$\frac{\text{cost of the online algorithm on } \sigma}{\text{cost of OPT on } \sigma}$$
It is not simply the worst-case cost of the algorithm alone — an algorithm can have a high cost on a sequence where OPT also has a high cost, and the ratio might still be small. What matters is how much worse the online algorithm does compared to the offline optimal on the same input.
Can Experts Gain Weight in the Multiplicative Weight Updates Algorithm?
In the MWU algorithm, experts start with weight $1$ and may be penalized by a factor of $(1 - \varepsilon)$ each time they make a mistake. Can an expert ever have its weight increased?
Answer
No. The algorithm only ever decreases weights (multiplying by $(1 - \varepsilon) < 1$ on mistakes) or leaves them unchanged (on correct predictions). There is no mechanism for an expert’s weight to increase. Consequently, the total weight is non-increasing over time, starting at $N$ (the number of experts) and only going down.
Lecture 24 - Experts Theorem Proof & Paging
Simplified Experts Setting: One Expert Is Always Correct
Suppose you have $N$ experts and you are told that one of them is always correct. What strategy guarantees the fewest mistakes, and how many mistakes does it make in the worst case?
Answer
Follow an expert until they make a mistake, then drop them and pick another. Since the one always-correct expert never makes a mistake, you only ever drop the wrong ones. Each expert you drop made at most one mistake (the one that caused you to drop them), and you start with $N$ experts, so you make at most $N - 1$ mistakes total. After that, only the always-correct expert remains, and you make no further mistakes.
This case is simpler than the general MWU setting, where no expert is guaranteed to be always correct.
In the MWU algorithm, expert $i$’s weight on day $t$ is $w_i^t = (1 - \varepsilon)^{m_i^t}$, where $m_i^t$ is the number of mistakes expert $i$ has made up to day $t$. Why?
Answer
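Each expert starts with weight $1$. Every time expert $i$ makes a mistake, its weight is multiplied by $(1 - \varepsilon)$. Since expert $i$ has made $m_i^t$ mistakes over $t$ days:
$$w_i^t = 1 \cdot \underbrace{(1 - \varepsilon) \cdots (1 - \varepsilon)}_{m_i^t \text{ times}} = (1 - \varepsilon)^{m_i^t}$$
The number of mistakes is exactly the exponent because each mistake contributes exactly one factor of $(1 - \varepsilon)$.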
Why Does the Potential Function Start at $N$?
In the MWU potential function proof, $\Phi^t$ denotes the total weight on day $t$. Why is $\Phi^1 = N$?
Answer
On day 1, every expert has weight $1$ (no mistakes have been made yet, so no weights have been penalized). There are $N$ experts, so:
$$\Phi^1 = \sum_{i=1}^{N} 1 = N$$
How Does $\Phi^t$ Change Over Time?
As days pass, what happens to the potential function $\Phi^t$?
Answer
$\Phi^t$ either decreases or stays the same; it never increases. This is because expert weights are only ever penalized (multiplied by $(1 - \varepsilon)$) or left unchanged; they are never increased. Since the total weight is the sum of individual weights that can only go down, the potential can only go down.
More precisely: on any day the algorithm makes a mistake, at least half the total weight was behind the wrong decision, and those experts’ weights get multiplied by $(1 - \varepsilon)$. This forces:
$$\Phi^{t+1} \le \frac{1}{2} \Phi^t + \frac{1}{2} (1 - \varepsilon) \Phi^t = \left(1 - \frac{\varepsilon}{2}\right) \Phi^t$$
so each mistake shrinks the potential by a factor of at least $\left(1 - \frac{\varepsilon}{2}\right)$.
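A minimal sketch of the deterministic weighted-majority form of MWU that this potential argument describes. The penalty `eps = 0.5`, the 0/1 prediction encoding, the tie-breaking rule, and the tiny three-expert example are all illustrative assumptions.

```python
def weighted_majority(predictions, outcomes, eps=0.5):
    """predictions[t][i] is expert i's 0/1 guess on day t.

    Returns (number of algorithm mistakes, final expert weights).
    """
    weights = [1.0] * len(predictions[0])  # every expert starts with weight 1
    mistakes = 0
    for guesses, truth in zip(predictions, outcomes):
        weight_for_one = sum(w for w, g in zip(weights, guesses) if g == 1)
        # Follow the side holding at least half of the total weight (ties go to 1).
        decision = 1 if weight_for_one >= sum(weights) / 2 else 0
        if decision != truth:
            mistakes += 1
        # Penalize every expert that was wrong today; weights never increase.
        weights = [
            w * (1 - eps) if g != truth else w for w, g in zip(weights, guesses)
        ]
    return mistakes, weights


# Hypothetical run: expert 0 is always right, experts 1 and 2 are often wrong.
predictions = [[1, 0, 0], [0, 1, 1], [1, 1, 0]]
outcomes = [1, 0, 1]
mistakes, weights = weighted_majority(predictions, outcomes)
print(mistakes, weights)
```

Each final weight equals $(1 - \varepsilon)$ raised to that expert's mistake count, and the total weight drops from 3 to 1.375 over the three days.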
LRU vs. FIFO: What Is the Difference?
In the paging problem, both LRU (least recently used) and FIFO (first in, first out) are cache eviction policies. How are they different?
Answer
- LRU: when a cache miss forces an eviction, evict the item in cache that was requested least recently — i.e., the item that has gone the longest without being asked for.
- FIFO: evict the item that was brought into cache the earliest — the one that has been sitting in cache the longest, regardless of whether it was requested recently.
They differ when an old item is still actively used. For example: items $A$, $B$, $C$ are cached (item $A$ was brought in first, then $B$, then $C$). Item $A$ is then requested 50 more times. When a cache miss on item $D$ requires an eviction:
- FIFO evicts item $A$ (it was brought in first).
- LRU evicts item $B$ (item $A$ was requested most recently, so $B$ is now the least recently used).
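The difference can be seen in a short simulation. The item names `A` through `D`, the cache size of 3, and the function names are illustrative; the request sequence follows the scenario above, with one early item requested many times before a miss.

```python
from collections import OrderedDict, deque


def lru_evictions(requests, capacity):
    """Return the items evicted by LRU, in order."""
    cache = OrderedDict()  # least recently used item sits at the front
    evicted = []
    for item in requests:
        if item in cache:
            cache.move_to_end(item)  # a hit refreshes the item's recency
            continue
        if len(cache) == capacity:
            victim, _ = cache.popitem(last=False)
            evicted.append(victim)
        cache[item] = True
    return evicted


def fifo_evictions(requests, capacity):
    """Return the items evicted by FIFO, in order."""
    queue = deque()  # items in the order they were brought into cache
    cached = set()
    evicted = []
    for item in requests:
        if item in cached:
            continue  # a hit does not change the eviction order
        if len(queue) == capacity:
            victim = queue.popleft()
            cached.remove(victim)
            evicted.append(victim)
        queue.append(item)
        cached.add(item)
    return evicted


requests = ["A", "B", "C"] + ["A"] * 50 + ["D"]
print(lru_evictions(requests, 3), fifo_evictions(requests, 3))
```

FIFO throws out the heavily used item because it arrived first, while LRU keeps it and evicts the item that has gone longest without a request.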
Optimal Paging with Full Future Knowledge
If you could see the entire future request sequence, what is the optimal algorithm for deciding which cached item to evict on a cache miss?
Answer
Farthest in Future (proved optimal by Bélády): when a cache miss occurs, look at every item currently in cache and find the one that will not be requested again until the farthest point in the future. Evict that item.
Intuitively, keeping items you will need soon and evicting items you won’t need for a long time minimizes future cache misses. Counting frequencies ignores the order of future requests and can perform worse — a very frequent item that is only needed at the very end is not worth keeping now.
Is LRU Being $K$-Competitive a Good Result?
Both LRU and FIFO are proven to be $K$-competitive, where $K$ is the cache size. Is this a good guarantee?
Answer
No. $K$ can be very large (e.g., thousands for a megabyte-sized cache), making the bound practically useless. Furthermore, it is also proved that no deterministic online algorithm can achieve a competitive ratio better than $K$, so $K$-competitiveness is the best possible in the standard online setting. This makes the paging problem essentially hopeless without additional assumptions.
Resource Augmentation: LRU with a Larger Cache
In the resource augmentation model, the online algorithm has a cache of size $K$ while the optimal offline algorithm is restricted to a smaller cache of size $h \le K$. If $K = 2h$, what is LRU’s competitive ratio under this comparison?
Answer
With $K = 2h$, the competitive ratio is:
$$\frac{K}{K - h + 1} = \frac{2h}{h + 1} < 2$$
So LRU is 2-competitive when given a cache twice the size of OPT’s. The intuition: being unable to see the future is a significant handicap, but doubling the cache size roughly compensates for it, bringing the competitive ratio down to a small constant.