Lecture 25 (05/11/2026) - Finish Paging; Dimensionality Reduction & Johnson-Lindenstrauss Lemma; Introduce Nearest Neighbor Search
Scribes: Alisha Adhikari and Saartaj Alam
Summary of the lecture
- Review of paging algorithms and competitive analysis
- Resource augmentation and randomized paging algorithms
- Introduction to high-dimensional geometry
- Curse of dimensionality and motivation for dimension reduction
Paging Problem
Recall the paging problem discussed in the previous lecture. We are given a cache that can only store a limited number of items. Requests for items arrive one at a time, and whenever a requested item is not in the cache and the cache is full, we must decide which item to evict.
Several heuristics for solving the paging problem were discussed previously:
- Least Recently Used (LRU)
- Least Frequently Used (LFU)
- First In First Out (FIFO)
If the future request sequence is known in advance, then the optimal strategy is the Farthest-in-Future algorithm. Whenever an eviction is required, this algorithm removes the item whose next occurrence is farthest in the future.
However, in the online setting, future requests are unknown. Online algorithms only know the requests that have already appeared.
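The eviction rules above can be compared by counting misses on a fixed request sequence. The following is a minimal sketch, not code from the lecture; the function names `lru_misses`, `fifo_misses`, and `farthest_in_future_misses` are hypothetical, and the Farthest-in-Future version scans ahead naively for clarity rather than speed.

```python
from collections import OrderedDict, deque


def lru_misses(requests, k):
    """Count cache misses for Least Recently Used with cache size k."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    misses = 0
    for page in requests:
        if page in cache:
            cache.move_to_end(page)  # mark as most recently used
            continue
        misses += 1
        if len(cache) == k:
            cache.popitem(last=False)  # evict the least recently used page
        cache[page] = True
    return misses


def fifo_misses(requests, k):
    """Count cache misses for First In First Out with cache size k."""
    cache, order = set(), deque()
    misses = 0
    for page in requests:
        if page in cache:
            continue  # a hit does not change the eviction order
        misses += 1
        if len(cache) == k:
            cache.discard(order.popleft())  # evict the oldest arrival
        cache.add(page)
        order.append(page)
    return misses


def farthest_in_future_misses(requests, k):
    """Count cache misses for the offline Farthest-in-Future rule."""
    cache = set()
    misses = 0
    for t, page in enumerate(requests):
        if page in cache:
            continue
        misses += 1
        if len(cache) == k:
            def next_use(p):
                # Position of the next request for p, or infinity if none.
                for s in range(t + 1, len(requests)):
                    if requests[s] == p:
                        return s
                return float("inf")
            cache.remove(max(cache, key=next_use))
        cache.add(page)
    return misses


requests = [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]
print(lru_misses(requests, 3), fifo_misses(requests, 3),
      farthest_in_future_misses(requests, 3))  # prints: 12 12 6
```

On this cyclic sequence the two online heuristics miss on every request, while the offline rule misses only half as often, which previews the gap that competitive analysis measures.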
Competitive Analysis
A paging algorithm is said to be $c$-competitive if the number of cache misses produced by the algorithm is at most $c$ times the number of cache misses produced by the optimal offline algorithm.
For a cache of size $k$, both LRU and FIFO are known to be $k$-competitive.
A cache miss occurs whenever a requested item is not currently stored in the cache.
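A standard worked example (stated here as an illustration, not taken from the lecture) shows that a factor equal to the cache size can actually occur. Take a cache of size $k$ and request $k + 1$ distinct pages cyclically. LRU always evicts exactly the page that is requested next, so it misses on every request. Farthest-in-Future evicts a page that is not needed for the next $k - 1$ requests, so after the first $k$ compulsory misses it misses only once every $k$ requests:

$$
\begin{aligned}
\text{requests:}\quad & 1, 2, \dots, k, k+1, 1, 2, \dots \quad (N \text{ requests in total}) \\
\text{LRU misses} &= N \\
\text{OPT misses} &\le k + \left\lceil \frac{N - k}{k} \right\rceil \\
\frac{\text{LRU misses}}{\text{OPT misses}} &\to k \quad \text{as } N \to \infty
\end{aligned}
$$

The same sequence forces FIFO to miss on every request as well.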
Resource Augmentation
To improve the competitive ratio, the lecture introduced the idea of resource augmentation.
Suppose the online algorithm has cache size $k$, while the optimal offline algorithm has a smaller cache size $h$, where
$$h \le k.$$
In this setting, LRU and FIFO achieve the competitive ratio
$$\frac{k}{k - h + 1}.$$
As an example, suppose
$$h = \frac{k}{2}.$$
Then,
$$\frac{k}{k - h + 1} = \frac{k}{k/2 + 1} < 2.$$
Thus, if the online algorithm is allowed roughly twice the cache size of the optimal offline algorithm, then LRU and FIFO become $2$-competitive.
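The effect of the extra cache space can be tabulated directly. This is a small sketch assuming the standard resource-augmentation bound $k / (k - h + 1)$ for an online cache of size $k$ against an offline cache of size $h \le k$; the helper name `augmented_ratio` is hypothetical.

```python
def augmented_ratio(k, h):
    """Competitive ratio k / (k - h + 1) of LRU/FIFO with cache size k
    against an optimal offline algorithm with cache size h <= k."""
    if not 1 <= h <= k:
        raise ValueError("need 1 <= h <= k")
    return k / (k - h + 1)


# Compare equal cache sizes (h = k) with an offline cache of half the size.
for k in (4, 16, 64, 256):
    print(k, augmented_ratio(k, k), augmented_ratio(k, k // 2))
```

With $h = k$ the ratio equals $k$ and grows without bound, while with $h = k/2$ it stays below 2 for every $k$.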
Randomized Paging Algorithms
Both LRU and FIFO are deterministic algorithms because they do not use randomness.
If randomization is allowed, then significantly better competitive ratios can be achieved. In particular, there exist randomized paging algorithms with competitive ratio
$$O(\log k).$$
This is an exponential improvement over the deterministic $k$-competitive bound.
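The notes do not name a specific randomized algorithm. One well-known example with a logarithmic competitive ratio is the randomized marking algorithm, sketched below as an assumption about what such an algorithm can look like; the function name `marking_misses` and the optional `rng` parameter are hypothetical. Pages are marked when requested, evictions pick a uniformly random unmarked page, and all marks are cleared once every cached page is marked.

```python
import random


def marking_misses(requests, k, rng=None):
    """Count misses of the randomized marking algorithm with cache size k."""
    rng = rng or random.Random()
    cache = set()    # pages currently in the cache
    marked = set()   # pages requested during the current phase
    misses = 0
    for page in requests:
        if page not in cache:
            misses += 1
            if len(cache) == k:
                if len(marked) == k:
                    marked.clear()  # every page is marked: start a new phase
                # Evict an unmarked page chosen uniformly at random.
                victim = rng.choice(sorted(cache - marked))
                cache.remove(victim)
            cache.add(page)
        marked.add(page)
    return misses


# Average over many random runs on a cyclic sequence of 4 pages, cache size 3.
requests = [1, 2, 3, 4] * 25
trials = [marking_misses(requests, 3, random.Random(seed)) for seed in range(200)]
print(sum(trials) / len(trials))
```

On this sequence LRU and FIFO miss on all 100 requests, so the printed average gives a feel for how much randomization helps against the same input.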
Introduction to Dimension Reduction
Dimension reduction is widely used in:
- Machine learning
- High-dimensional statistics
- Big data analysis
The goal is to reduce the dimensionality of data while preserving important geometric properties.
Review of Euclidean Distance
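Consider two points in $\mathbb{R}^2$:
$$x = (x^{(1)}, x^{(2)}), \qquad y = (y^{(1)}, y^{(2)}).$$
The Euclidean ($\ell_2$) distance between them is
$$\|x - y\|_2 = \sqrt{(x^{(1)} - y^{(1)})^2 + (x^{(2)} - y^{(2)})^2}.$$
More generally, for points in $\mathbb{R}^d$,
$$x = (x^{(1)}, \dots, x^{(d)}), \qquad y = (y^{(1)}, \dots, y^{(d)}),$$
the distance is
$$\|x - y\|_2 = \sqrt{\sum_{j=1}^{d} (x^{(j)} - y^{(j)})^2}.$$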
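The general formula translates into a few lines of code. This is a minimal sketch in which points are plain sequences of numbers; the name `euclidean_distance` is chosen here for illustration.

```python
import math


def euclidean_distance(x, y):
    """Euclidean (l2) distance between two points of the same dimension."""
    if len(x) != len(y):
        raise ValueError("points must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


print(euclidean_distance((0, 0), (3, 4)))  # prints: 5.0
```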
High-Dimensional Data
Suppose we are given a dataset consisting of $n$ points in $\mathbb{R}^d$:
$$x_1, x_2, \dots, x_n.$$
Each point contains $d$ coordinates:
$$x_i = (x_i^{(1)}, x_i^{(2)}, \dots, x_i^{(d)}).$$
Subscripts denote different points, while superscripts denote coordinates within a point.
Curse of Dimensionality
Many algorithms are polynomial in the number of points $n$ but exponential in the dimension $d$. For example, an algorithm may require runtime such as
$$O(n \cdot 2^d).$$
When $d$ is small, such runtimes are manageable. However, modern datasets often contain very high-dimensional data.
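For example, a grayscale image of $w \times h$ pixels contains
$$w \cdot h$$
pixels. By storing each pixel intensity as a coordinate, the image can be represented as a point in $\mathbb{R}^{wh}$; for instance, a $100 \times 100$ image already gives a point in $\mathbb{R}^{10000}$.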
Thus, even low-resolution images naturally generate high-dimensional vectors.
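As a small illustration of turning an image into a point, the sketch below assumes the image is stored as a list of rows of pixel intensities. That representation, and the name `image_to_vector`, are hypothetical choices made for the example.

```python
def image_to_vector(image):
    """Flatten a grayscale image, given as a list of rows, into one vector."""
    return [pixel for row in image for pixel in row]


# A tiny 2 x 3 "image" with intensities in [0, 255] becomes a point in R^6.
image = [[0, 128, 255],
         [64, 32, 16]]
vector = image_to_vector(image)
print(len(vector), vector)
```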
Dimension Reduction Mapping
Dimension reduction attempts to map points from a high-dimensional space into a lower-dimensional space.
Suppose we define a mapping
$$f : \mathbb{R}^d \to \mathbb{R}^m,$$
where
$$m \ll d.$$
Applying $f$ to every point produces transformed points
$$f(x_1), f(x_2), \dots, f(x_n)$$
in a lower-dimensional space.
The goal is to preserve pairwise distances approximately. This idea leads to the Johnson-Lindenstrauss Lemma.
The Johnson-Lindenstrauss Lemma
Introduction
For any $0 < \epsilon < 1$ and any integer $n$, let
$$m = O\!\left(\frac{\log n}{\epsilon^2}\right).$$
Then, for any set $X$ of $n$ points in $d$ dimensions, there exists a function
$$f : \mathbb{R}^d \to \mathbb{R}^m$$
such that for any two points $x$ and $y$ in $X$:
$$(1 - \epsilon)\,\|x - y\|_2 \le \|f(x) - f(y)\|_2 \le (1 + \epsilon)\,\|x - y\|_2.$$
This means that the function $f$ maps every point to a lower-dimensional space, and all distances are preserved up to a factor of $1 \pm \epsilon$.
Observations
- The new dimension is independent of the original dimension. Regardless of how large $d$ is, the target dimension is only $m = O(\log n / \epsilon^2)$. Since $\log n$ grows far more slowly than $n$, $m$ is typically much smaller than $d$ for high-dimensional data, so we see a significant reduction.
- The number of points doesn’t change. Only the dimension of each point is reduced.
- Distances are preserved. $\|f(x) - f(y)\|_2$ is the distance between the images of $x$ and $y$ in the lower-dimensional space. It is within a factor of $1 \pm \epsilon$ of the original distance $\|x - y\|_2$.
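To get a feel for the first observation, the target dimension can be computed for concrete values. The constant in the lemma is not given in these notes, so the sketch below assumes one explicit sufficient bound from the literature (Dasgupta and Gupta), $m \ge 4 \ln n / (\epsilon^2/2 - \epsilon^3/3)$. The function name `jl_target_dimension` is hypothetical.

```python
import math


def jl_target_dimension(n, eps):
    """Smallest integer m with m >= 4 ln(n) / (eps^2 / 2 - eps^3 / 3).

    Assumption: this explicit constant is the Dasgupta-Gupta bound, used
    here only for illustration. Note that the original dimension d does
    not appear anywhere in the formula.
    """
    if n < 2 or not 0 < eps < 1:
        raise ValueError("need n >= 2 and 0 < eps < 1")
    return math.ceil(4 * math.log(n) / (eps ** 2 / 2 - eps ** 3 / 3))


for n in (1_000, 1_000_000):
    for eps in (0.5, 0.1):
        print(n, eps, jl_target_dimension(n, eps))
```

Increasing $n$ by a factor of 1000 only doubles the target dimension, while shrinking $\epsilon$ is far more expensive because of the $1/\epsilon^2$ dependence.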
Random Hyperplanes and Projection
Definition
A hyperplane is a subspace whose dimension is exactly one less than that of the space containing it. For example, in 3D space, a hyperplane is a flat plane (2D). In 2D space, a hyperplane is a line (1D).
Constructing a Random Hyperplane
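To construct a random $m$-dimensional hyperplane in $\mathbb{R}^d$ (strictly speaking an $m$-dimensional subspace, since $m$ may be much smaller than $d - 1$), take $m$ random unit vectors $v_1, \dots, v_m$ in $\mathbb{R}^d$. Let $H$ be the vector space spanned by $v_1, \dots, v_m$; it passes through the origin.
Example. In $\mathbb{R}^3$, to reduce from 3 dimensions to 2, we pick 2 random unit vectors, call them $v_1$ and $v_2$. With probability 1 they are linearly independent, so they define a unique 2-dimensional plane $H$.
Given a hyperplane $H$, the map $f$ is the orthogonal projection onto $H$:
$$f(x) = \arg\min_{z \in H} \|x - z\|_2.$$
It follows that if $x$ already lies on $H$, then $f(x) = x$. Otherwise, drop the perpendicular from $x$ to $H$; the point where it meets $H$ is $f(x)$.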
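The construction can be sketched in plain Python. As assumptions not stated in the notes, the random directions are drawn as Gaussian vectors (whose directions are uniformly random) and then orthonormalized with Gram-Schmidt, which keeps the same span and makes the projection a simple sum. The names `random_subspace_basis` and `project` are hypothetical. A plain orthogonal projection onto an $m$-dimensional subspace shrinks lengths by roughly $\sqrt{m/d}$ on average, so the usual statement of the lemma rescales the projected point by $\sqrt{d/m}$; the sketch omits that factor and only demonstrates the projection itself.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def random_subspace_basis(d, m, rng):
    """Orthonormal basis of the span of m random Gaussian vectors in R^d."""
    basis = []
    while len(basis) < m:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for u in basis:  # Gram-Schmidt: remove components along earlier vectors
            c = dot(v, u)
            v = [a - c * b for a, b in zip(v, u)]
        norm = math.sqrt(dot(v, v))
        if norm > 1e-12:  # skip the (probability-zero) degenerate case
            basis.append([a / norm for a in v])
    return basis


def project(x, basis):
    """Orthogonal projection of x onto the subspace spanned by basis.

    The result is returned as a point of R^d lying on the subspace.
    """
    proj = [0.0] * len(x)
    for u in basis:
        c = dot(x, u)
        proj = [p + c * b for p, b in zip(proj, u)]
    return proj


rng = random.Random(0)
basis = random_subspace_basis(d=3, m=2, rng=rng)
x = [1.0, 2.0, 3.0]
fx = project(x, basis)
print(fx, project(fx, basis))  # projecting twice changes nothing (up to rounding)
```

To obtain an actual point of $\mathbb{R}^m$, one would keep the $m$ coefficients `dot(x, u)` instead of recombining them into a vector of $\mathbb{R}^d$.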
Benefits of Randomness
How do we determine a “bad” hyperplane? If the directions that separate the data points are nearly perpendicular to the hyperplane, many points project to nearly the same point, destroying distance information. A randomly chosen hyperplane is very unlikely to align poorly with any fixed dataset, but it is not impossible. If the chosen hyperplane fails the distance-preservation guarantee, we can simply repeat the process with a new random hyperplane. Since the probability of success is at least inverse polynomial in $n$, a “good” hyperplane can be found in expected polynomial time.
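The repeat-until-success idea can be sketched as a loop that samples a random map, checks every pair of points, and resamples on failure. To keep the example short it swaps in a Gaussian random matrix scaled by $1/\sqrt{m}$, a common stand-in for projecting onto a random subspace; this is an assumption of the sketch, not the construction described in the notes. The names `random_map`, `preserves_distances`, and `find_good_map` are hypothetical.

```python
import itertools
import math
import random


def dist(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def random_map(d, m, rng):
    """Return f(x) = A x / sqrt(m) for an m x d matrix A of N(0, 1) entries."""
    rows = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
    scale = 1.0 / math.sqrt(m)
    return lambda x: [scale * sum(a * b for a, b in zip(row, x)) for row in rows]


def preserves_distances(points, images, eps):
    """Check the (1 +/- eps) guarantee for every pair of points."""
    for i, j in itertools.combinations(range(len(points)), 2):
        original = dist(points[i], points[j])
        mapped = dist(images[i], images[j])
        if not (1 - eps) * original <= mapped <= (1 + eps) * original:
            return False
    return True


def find_good_map(points, m, eps, rng, max_trials=100):
    """Resample the random map until every pairwise distance is preserved."""
    d = len(points[0])
    for trial in range(1, max_trials + 1):
        f = random_map(d, m, rng)
        images = [f(x) for x in points]
        if preserves_distances(points, images, eps):
            return f, images, trial
    raise RuntimeError("no distance-preserving map found; try a larger m")


rng = random.Random(0)
points = [[rng.random() for _ in range(200)] for _ in range(15)]
f, images, trials = find_good_map(points, m=150, eps=0.5, rng=rng)
print(len(images[0]), trials)
```

Each check costs time proportional to $n^2$ pairs, so the whole procedure stays polynomial as long as the number of retries is polynomially bounded.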
Nearest Neighbor Search
This applies to machine learning as well, abstract as it may seem.
The problem statement is the following:
Given $n$ points $x_1, \dots, x_n$ in $\mathbb{R}^d$, preprocess them so that, given a query $q \in \mathbb{R}^d$, we can return the closest point in the dataset:
$$\arg\min_{x_i} \|x_i - q\|_2.$$
Nearest neighbor search forms the basis of the nearest neighbor classifier in machine learning. The idea is that, with a labeled training set, we can classify a new image by finding its nearest neighbor in the training set and using that neighbor’s label. The underlying assumption is that similar images lie close together in Euclidean space.
The search can be very expensive in high dimensions, but with JL we can reduce the dimension from $d$ to $O(\log n / \epsilon^2)$, making the search drastically cheaper while approximately preserving the distances that determine the nearest neighbor.
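The baseline that dimension reduction tries to speed up is a linear scan over the dataset. The sketch below is a minimal illustration with no preprocessing; the names `nearest_neighbor` and `classify` and the toy data are hypothetical. Each query costs time proportional to $n \cdot d$, which is why shrinking $d$ helps directly.

```python
def nearest_neighbor(points, query):
    """Return the index of the dataset point closest to query (linear scan)."""
    def sq_dist(x):
        # Squared distance has the same minimizer as the distance itself.
        return sum((a - b) ** 2 for a, b in zip(x, query))
    return min(range(len(points)), key=lambda i: sq_dist(points[i]))


def classify(points, labels, query):
    """1-nearest-neighbor classifier: copy the label of the closest point."""
    return labels[nearest_neighbor(points, query)]


points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
labels = ["a", "a", "b"]
print(classify(points, labels, (4.0, 4.5)))  # prints: b
```

Running the same scan on the mapped points and a mapped query would cost time proportional to $n$ times the reduced dimension instead.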