Lecture 02/04/2026 - Query Time Analysis and Tail Bounds
Scribes: Ayesha Jamal and Kyoshi Noda
Summary of the Lecture
- Query time analysis ($E[T]$)
- Variance analysis ($\operatorname{Var}[T]$)
- Tail Bounds (Concept)
  - Markov’s Inequality
  - Chebyshev’s Inequality
Query Time Analysis
Setup and Indicator Variables
We analyze the query time of hashing with chaining. Suppose $n$ keys $y_1, \dots, y_n$ are stored in a table with $m$ buckets, and we query a key $x$. For each stored key, define the indicator variable $X_i = 1$ if $h(y_i) = h(x)$, and $X_i = 0$ otherwise.
Since we assume the hash function maps each key to a uniformly random bucket, $\Pr[h(y_i) = h(x)] = \frac{1}{m}$, so $E[X_i] = \frac{1}{m}$.
Computing Expectation
The total query time is dominated by the number of stored keys that collide with $x$:
$T = \sum_{i=1}^{n} X_i.$
Using Linearity of Expectation:
$E[T] = \sum_{i=1}^{n} E[X_i] = \frac{n}{m}.$
Conclusion: To achieve a constant expected query time, choose the number of buckets proportional to the number of keys, i.e., $m = \Theta(n)$. In particular, with $m = n$ we get $E[T] = 1$.
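As a concrete (non-lecture) illustration, here is a minimal sketch of a hash table with chaining; the class and method names are our own, and Python’s built-in `hash` stands in for the random hash function:

```python
# Minimal sketch of hashing with chaining (illustrative; names are our own).
class ChainedHashTable:
    def __init__(self, m):
        self.m = m                                   # number of buckets
        self.buckets = [[] for _ in range(m)]

    def insert(self, key, value):
        self.buckets[hash(key) % self.m].append((key, value))

    def query(self, key):
        # Query time is proportional to the chain length in key's bucket.
        for k, v in self.buckets[hash(key) % self.m]:
            if k == key:
                return v
        return None

table = ChainedHashTable(m=8)
for i in range(8):                                   # n = m keys -> expected chain length ~1
    table.insert(f"key{i}", i)
print(table.query("key3"))                           # prints 3
```

With $m = n$, each bucket holds one key on average, so a query scans a constant-length chain in expectation.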
Variance Analysis
To understand how much the query time fluctuates from the average, we calculate the variance.
Computing $\operatorname{Var}[X_i]$ for a Single Item
For a Bernoulli (indicator) variable $X_i$ with success probability $p = \frac{1}{m}$:
$\operatorname{Var}[X_i] = p(1 - p) = \frac{1}{m}\left(1 - \frac{1}{m}\right) \le \frac{1}{m}.$
Computing $\operatorname{Var}[T]$
Since the keys are hashed independently, the variance of the sum is the sum of the variances:
$\operatorname{Var}[T] = \sum_{i=1}^{n} \operatorname{Var}[X_i] = \frac{n}{m}\left(1 - \frac{1}{m}\right) \le \frac{n}{m}.$
If we assume $m = n$, then $\operatorname{Var}[T] \le 1$.
Knowing the variance is small ($\operatorname{Var}[T] \le 1$) tells us the query time rarely strays far from its expectation; tail bounds make this precise.
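The expectation and variance above can be sanity-checked by simulation. This sketch (our own, with assumed parameters $n = m = 100$) draws random bucket assignments and estimates the mean and variance of the number of collisions at the queried bucket:

```python
import random

def chain_length_at_query(n, m, rng):
    # Hash n keys uniformly at random; count keys landing in the query's bucket.
    q = rng.randrange(m)                       # bucket of the queried key
    return sum(1 for _ in range(n) if rng.randrange(m) == q)

rng = random.Random(0)
n = m = 100                                    # assumed parameters, m = n
samples = [chain_length_at_query(n, m, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(round(mean, 2), round(var, 2))           # theory: E[T] = n/m = 1, Var[T] <= 1
```

The empirical mean and variance should land close to the theoretical values $E[T] = 1$ and $\operatorname{Var}[T] = 1 - \frac{1}{m} \approx 0.99$.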
Tail Bounds
Concept
Although the expected query time is constant, the worst-case query time is still $O(n)$.
In computer science, we typically focus on the upper tail, since we want to bound how long an algorithm can take.
In finance, the lower tail is often more important, as it represents potential losses.
While $E[T]$ is constant, individual runs can deviate badly:
- Worst Case: All $n$ items hash to the same bucket, so a single query takes $O(n)$ time.
- Tail Bounds: The “tail” refers to the region of extreme outcomes in the probability distribution (e.g., $T \ge 50$).
Ideally, we would calculate the exact probability $\Pr[T \ge t]$. Since $T$ is a sum of independent indicators, $T \sim \operatorname{Binomial}\!\left(n, \frac{1}{m}\right)$, so
$\Pr[T \ge t] = \sum_{k=t}^{n} \binom{n}{k} \left(\frac{1}{m}\right)^{k} \left(1 - \frac{1}{m}\right)^{n-k}.$
However, this often has no nice closed form. Instead, we use inequalities to bound this probability.
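Under the uniform-hashing assumption, the exact binomial tail can still be summed numerically even though it has no clean closed form. A sketch (the parameters $n = m = 100$ and $t = 50$ are our own choices):

```python
from math import comb

def binomial_tail(n, p, t):
    # Pr[T >= t] for T ~ Binomial(n, p), summed term by term.
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(t, n + 1))

n = m = 100
print(binomial_tail(n, 1 / m, 50))   # the exact tail is astronomically small
```

The exact value (on the order of $10^{-71}$ for these parameters) is far smaller than anything the inequalities below certify, which is the sense in which those bounds are “loose” but still useful.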
Markov’s Inequality
Markov’s inequality is one of the most fundamental results in probability theory. It provides a simple bound on the tail probability of a non-negative random variable using only its expectation.
Requirement: You only need to know the expectation $E[X]$, and $X$ must be non-negative.
Formula: for any $a > 0$,
$\Pr[X \ge a] \le \frac{E[X]}{a}.$
Example 1: Hashing With Chaining
If we want to know the chance the query takes at least 50 steps ($T \ge 50$), using $E[T] = 1$ (with $m = n$):
$\Pr[T \ge 50] \le \frac{E[T]}{50} = \frac{1}{50} = 2\%.$
This gives a “loose” bound. It guarantees the failure rate is at most 2%.
Example 2: Exam Scores
Setup: Consider a class with 33 students. After a midterm exam, we are told that the average score is 60. No other information about the distribution of scores is provided. At most how many students could have scored at least 90?
Let $S$ be the score of a uniformly random student, so $S \ge 0$ and $E[S] = 60$. By Markov’s inequality,
$\Pr[S \ge 90] \le \frac{E[S]}{90} = \frac{60}{90} = \frac{2}{3}.$
The expected number of students scoring at least 90 is therefore at most:
$33 \cdot \frac{2}{3} = 22.$
Conclusion: At most 22 students can have scored 90 or higher.
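The arithmetic of this example, as a tiny sketch (variable names are our own):

```python
# Markov: Pr[score >= 90] <= E[score]/90 = 60/90, so among 33 students
# the count scoring >= 90 is at most 33 * 60/90.
students, avg, threshold = 33, 60, 90
bound = students * avg / threshold
print(bound)   # prints 22.0
```

Equivalently, the total points in the class are $33 \cdot 60 = 1980$, and each student at 90 or above consumes at least 90 of them, so at most $\lfloor 1980 / 90 \rfloor = 22$ such students are possible.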
In simple words, Markov’s inequality says that if the expectation of a non-negative random variable is fixed, then there is a maximum possible probability that the variable exceeds any threshold $a$, and this cap shrinks linearly as $a$ grows.
Chebyshev’s Inequality
Requirement: You must know both the expectation $E[X]$ and the variance $\operatorname{Var}[X]$.
Formula: for any $a > 0$,
$\Pr\left[\,|X - E[X]| \ge a\,\right] \le \frac{\operatorname{Var}[X]}{a^2}.$
Equivalently, writing $\sigma = \sqrt{\operatorname{Var}[X]}$ and $a = k\sigma$, this can be written as:
$\Pr\left[\,|X - E[X]| \ge k\sigma\,\right] \le \frac{1}{k^2}.$
Interpretation
The left-hand side of Chebyshev’s inequality measures the probability that $X$ deviates from its mean by at least $a$, in either direction.
We can decompose the event $|X - E[X]| \ge a$ into two cases:
Case 1: $X \ge E[X] + a$.
This is the right tail (upper tail).
Case 2: $X \le E[X] - a$.
This is the left tail (lower tail).
Together, these two cases cover all outcomes where $X$ is far from its mean, and Chebyshev’s inequality bounds their combined probability.
Example: Hashing with Chaining
Using our calculated $E[T] = 1$ and $\operatorname{Var}[T] \le 1$ (with $m = n$):
For the same event $T \ge 50$, note that $T \ge 50$ implies $|T - E[T]| \ge 49$, so
$\Pr[T \ge 50] \le \Pr\left[\,|T - 1| \ge 49\,\right] \le \frac{\operatorname{Var}[T]}{49^2} \le \frac{1}{2401} \approx 0.04\%.$
In simple terms, Chebyshev’s inequality says that if the variance of a random variable is fixed, then there is a maximum possible probability that the variable lands more than some amount $a$ away from its mean, and this cap shrinks quadratically as $a$ grows.
Comparison
- Markov: $\Pr[T \ge 50] \le \frac{1}{50} = 2\%$ (linear decay).
- Chebyshev: $\Pr[T \ge 50] \le \frac{1}{49^2} \approx 0.04\%$ (quadratic decay).
By using more information (the variance), we proved that the probability of a slow query is significantly lower than Markov’s inequality alone suggests.
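To make the comparison concrete, this sketch (assumed setup: $n = m = 100$, threshold $t = 50$) evaluates both bounds and estimates the true probability by simulation:

```python
import random

n = m = 100                                    # assumed parameters, m = n
t = 50
exp_T = n / m                                  # E[T] = 1
var_T = (n / m) * (1 - 1 / m)                  # Var[T] = 0.99 <= 1

markov = exp_T / t                             # Markov: Pr[T >= t] <= E[T]/t
chebyshev = var_T / (t - exp_T) ** 2           # Chebyshev: Pr[|T - E[T]| >= t-1] <= Var/(t-1)^2

rng = random.Random(1)
trials = 20000
hits = 0
for _ in range(trials):
    q = rng.randrange(m)                       # bucket of the queried key
    chain = sum(1 for _ in range(n) if rng.randrange(m) == q)
    if chain >= t:
        hits += 1

print(f"Markov bound:    {markov:.4%}")
print(f"Chebyshev bound: {chebyshev:.4%}")
print(f"Empirical:       {hits / trials:.4%}")
```

The simulation should report zero hits: both bounds are valid but very conservative, with Chebyshev roughly 50x tighter than Markov here, and the true probability smaller still.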