
Lecture 2 on 01/28/2026 - Probability Basics

Scribes: Allen Singleton and Hanan Latiff

  • What is Big Data?
  • Basic Probability Concepts
  • Discrete Random Variables

In modern computing, many problems involve big data: datasets so large that they often cannot be stored entirely in memory or processed with traditional deterministic algorithms.

Big data problems are characterized by:

  • Massive input sizes (millions or billions of elements)
  • Limited memory and time constraints
  • Data arriving in streams rather than all at once

Because of these constraints, classical algorithms that require multiple passes over the data or exact computations may be infeasible.

Randomized algorithms provide a powerful tool for dealing with big data. These algorithms use randomness in their logic, typically by making random choices during execution.

Randomness allows us to:

  • Process only a small portion of the data
  • Approximate answers instead of computing exact results
  • Achieve good performance with high probability

Instead of guaranteeing correctness in every execution, randomized algorithms guarantee correctness with high probability. This tradeoff is acceptable in many big data applications where speed and scalability are more important than absolute precision.

Randomized algorithms are especially useful when:

  • Exact solutions are too slow or memory-intensive
  • Approximate answers are sufficient
  • The data is noisy or inherently uncertain
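As a toy illustration of trading exactness for speed, the sketch below (hypothetical helper name; the data is held in a list only for demonstration) estimates the fraction of even numbers by inspecting a small random sample instead of every element:

```python
import random

def estimate_fraction(data, sample_size, predicate, seed=0):
    """Estimate the fraction of items satisfying `predicate`
    by inspecting a small uniform random sample of the data."""
    rng = random.Random(seed)
    sample = rng.sample(data, sample_size)
    return sum(predicate(x) for x in sample) / sample_size

data = list(range(1_000_000))  # stand-in for a dataset too large to scan repeatedly
est = estimate_fraction(data, 10_000, lambda x: x % 2 == 0)
```

With 10,000 samples, the estimate concentrates near the true fraction 1/2 with high probability, even though only 1% of the data was touched.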

Before analyzing randomized algorithms, it is important to review basic probability concepts that will be used throughout the course. These concepts help us reason about uncertainty and quantify the likelihood of different outcomes.

A probability space consists of:

  • A sample space, which is the set of all possible outcomes
  • Events, which are subsets of the sample space
  • A probability measure that assigns a value between 0 and 1 to each event

The probability of an event represents how likely it is to occur. Probabilities satisfy the following basic properties:

  • The probability of any event is between 0 and 1
  • The probability of the entire sample space is 1
  • The probability of an impossible event is 0
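These three components can be made concrete for a fair die; the snippet below (an illustrative sketch, not course-provided code) represents events as Python sets and the uniform probability measure as a counting ratio:

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}  # sample space: outcomes of one die roll

def pr(event):
    """Uniform probability measure: |event| / |omega|."""
    assert event <= omega  # events must be subsets of the sample space
    return Fraction(len(event), len(omega))

p_even = pr({2, 4, 6})  # probability of the event "roll is even"
```

The three basic properties hold by construction: every ratio lies in [0, 1], the full sample space has probability 1, and the empty (impossible) event has probability 0.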

In the context of randomized algorithms, probabilities are used to analyze:

  • The likelihood that an algorithm produces a correct result
  • The expected behavior of an algorithm over random choices
  • The chance that a rare or undesirable outcome occurs

Rather than guaranteeing deterministic outcomes, randomized algorithms rely on probabilistic guarantees. This means we analyze how often an algorithm succeeds or fails over many possible random executions.

Let A be an event in a probability space. The complement of A, denoted A^{c}, is the event that A does not occur.

The complement rule states that the probability of an event not occurring equals one minus the probability that it occurs:

\Pr(A^{c}) = 1 - \Pr(A)

This rule follows directly from the fact that an event and its complement together cover the entire sample space. Since the probability of the sample space is 1, the probabilities of an event and its complement must add up to 1.

In practice, the complement rule is often useful when computing \Pr(A) directly is difficult, but computing \Pr(A^{c}) is easier. In such cases, we compute the probability of the complement first and subtract it from 1.

In the analysis of randomized algorithms, the complement rule is frequently used to:

  • Bound the probability that an algorithm fails
  • Analyze rare or undesirable events
  • Convert success probabilities into failure probabilities, or vice versa
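For instance, "at least one success in n independent trials" is awkward to compute directly but easy via the complement; a small sketch (hypothetical function name):

```python
def pr_at_least_one_success(n, p):
    """Pr(no successes in n independent trials) is (1 - p)^n,
    so the complement rule gives Pr(at least one success)."""
    return 1 - (1 - p) ** n

p_ten_flips = pr_at_least_one_success(10, 0.5)  # at least one heads in 10 flips
```

Here the direct computation would require summing over every nonzero success count, while the complement is a single term.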

Two events A and B are said to be independent if the occurrence of one event does not affect the probability of the other.

Formally, events A and B are independent if:

\Pr(A \cap B) = \Pr(A)\Pr(B)

This definition captures the idea that knowing whether A occurs gives no information about whether B occurs, and vice versa. If the equality above does not hold, then the events are dependent.

Independence is a fundamental assumption in many randomized algorithms. Random choices made by an algorithm are often designed to be independent so that probabilities can be multiplied and analyzed more easily.

It is important to note that independence is a strong condition. Even if two events seem unrelated, they may still be dependent unless the product rule above holds exactly.

In algorithm analysis, independence allows us to:

  • Compute probabilities of multiple events occurring together
  • Analyze repeated random trials
  • Simplify probability calculations by separating events
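The product rule can be checked empirically for two independent fair coin flips; this simulation (an illustrative sketch with an arbitrary seed) estimates Pr(A), Pr(B), and Pr(A ∩ B):

```python
import random

rng = random.Random(1)
trials = 100_000
count_a = count_b = count_ab = 0
for _ in range(trials):
    a = rng.random() < 0.5  # event A: first coin lands heads
    b = rng.random() < 0.5  # event B: second, independent coin lands heads
    count_a += a
    count_b += b
    count_ab += a and b     # both events occur

pa, pb, pab = count_a / trials, count_b / trials, count_ab / trials
```

The empirical frequencies should satisfy pab ≈ pa · pb ≈ 1/4, since the two draws are generated independently.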

The binomial coefficient is used to count the number of ways to choose a fixed number of elements from a larger set, without regard to order.

For integers n \geq 0 and 0 \leq k \leq n, the binomial coefficient is denoted by:

\binom{n}{k}

and represents the number of ways to choose k elements from a set of n elements.

The binomial coefficient is defined as:

\binom{n}{k} = \frac{n!}{k!(n-k)!}

In probability, binomial coefficients arise naturally when analyzing repeated independent trials, where each trial has two possible outcomes, such as success or failure.

They are especially useful when computing probabilities involving:

  • The number of ways a certain outcome can occur
  • Multiple independent random choices
  • Counting events before assigning probabilities

In the context of randomized algorithms, binomial coefficients help quantify how many different ways an algorithm’s random decisions can lead to the same result. This allows us to combine counting arguments with probability calculations.
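The factorial formula can be checked against Python's built-in math.comb (available since Python 3.8); the direct implementation below is a sketch for illustration:

```python
from math import comb, factorial

def binom(n, k):
    """Number of k-element subsets of an n-element set: n! / (k! (n-k)!)."""
    return factorial(n) // (factorial(k) * factorial(n - k))

ways = binom(5, 2)  # ways to choose 2 elements out of 5
```

Integer division is exact here because k!(n-k)! always divides n!; the row-sum identity, that the counts over all k sum to 2^n, is a handy sanity check.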

A random variable is a function X: \Omega \rightarrow \mathbb{R}, where \Omega is the sample space consisting of all possible outcomes of the experiment that X models.

A random variable can be discrete, meaning that its support is finite or countably infinite, or continuous, meaning that its support is uncountable. In this course, we will primarily focus on discrete random variables, as they arise naturally in the analysis of randomized algorithms.

A discrete random variable takes on a finite or countably infinite set of values, each with an associated probability.

Example (Rolling a fair die): Consider the experiment of rolling a fair six-sided die. The sample space consists of the six possible outcomes. The random variable X maps each outcome to a numerical value in its support:

\mathbb{S}_X = \{1, 2, 3, 4, 5, 6\}

Since the die is fair, each outcome occurs with equal probability. We can describe the random variable X as follows:

X \sim \begin{cases} 1 & \text{w.p. } \frac{1}{6} \\ 2 & \text{w.p. } \frac{1}{6} \\ 3 & \text{w.p. } \frac{1}{6} \\ 4 & \text{w.p. } \frac{1}{6} \\ 5 & \text{w.p. } \frac{1}{6} \\ 6 & \text{w.p. } \frac{1}{6} \end{cases}

This notation makes explicit both the possible values of the random variable and the probability with which each value occurs.

A discrete random variable is fully described by its probability mass function (PMF), which assigns a probability to each value in its support.

The probabilities assigned to a random variable satisfy the following basic properties:

  • For every value x in the support, \Pr(X = x) \geq 0
  • The sum of the probabilities over all possible values is equal to 1

That is,

\sum_{x \in \mathbb{S}_X} \Pr(X = x) = 1

In randomized algorithms, random variables are used to model quantities such as running time, the number of correct outputs, or whether an algorithm succeeds or fails. By defining appropriate random variables, we can analyze the behavior of an algorithm using probability.

This perspective allows us to reason about algorithm performance in terms of likelihood and expectation, rather than exact deterministic outcomes.

The expectation of a discrete random variable, denoted \mathbb{E}[X], is a weighted average of all the possible values that X takes on. The expectation is given by:

\mathbb{E}[X] = \sum_{x \in \mathbb{S}_{X}} x \Pr(X = x)
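For the fair die above, this weighted average can be computed exactly with stdlib fractions (an illustrative sketch):

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}  # fair die PMF
assert sum(pmf.values()) == 1                    # sanity check: PMF sums to 1

expectation = sum(x * p for x, p in pmf.items())  # weighted average of values
```

The result is 7/2 = 3.5, even though the die can never show 3.5; expectation is a long-run average, not a guaranteed outcome.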

Property 1 (Shift and Scale of Expectation): If X is a random variable with finite expectation, then:

\mathbb{E}[aX + b] = a\mathbb{E}[X] + b

where a, b \in \mathbb{R}.

Property 2 (Linearity of Expectation): If X_1, X_2, \ldots, X_n are random variables, each with finite expectation, then:

\mathbb{E}[X_1 + X_2 + \cdots + X_n] = \mathbb{E}[X_1] + \mathbb{E}[X_2] + \cdots + \mathbb{E}[X_n]
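Linearity can be verified on the sum of two fair dice: computing the expectation of the sum over the joint sample space agrees with adding the two single-die expectations, and notably no independence assumption is needed for linearity in general. A sketch:

```python
from fractions import Fraction
from itertools import product

die = {x: Fraction(1, 6) for x in range(1, 7)}  # one fair die

# Direct: average the sum over the joint sample space of two rolls.
e_sum_direct = sum((x + y) * die[x] * die[y] for x, y in product(die, die))

# Via linearity: E[X1 + X2] = E[X1] + E[X2].
e_one = sum(x * p for x, p in die.items())
e_sum_linear = e_one + e_one
```

Both routes give 7, but the linearity route never touches the 36-outcome joint space.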

The variance of a discrete random variable, denoted \mathbb{Var}[X], is the average squared distance by which the values that X takes on deviate from \mathbb{E}[X]. The variance is given by:

\mathbb{Var}[X] = \sum_{x \in \mathbb{S}_{X}} \left( x - \mathbb{E}[X] \right)^2 \Pr(X = x)

Equivalently, the variance can be computed as:

\mathbb{Var}[X] = \mathbb{E}[X^2] - \left( \mathbb{E}[X] \right)^2
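Both formulas give the same answer on the fair die; a quick exact check (illustrative sketch):

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}  # fair die PMF
ex = sum(x * p for x, p in pmf.items())          # E[X] = 7/2

# Definition: average squared deviation from the mean.
var_def = sum((x - ex) ** 2 * p for x, p in pmf.items())

# Equivalent form: E[X^2] - (E[X])^2.
ex2 = sum(x * x * p for x, p in pmf.items())
var_alt = ex2 - ex ** 2
```

Both evaluate to 35/12, since E[X^2] = 91/6 and (E[X])^2 = 49/4.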

Property 1 (Shift and Scale of Variance): If X is a random variable with finite variance, then:

\mathbb{Var}[aX + b] = a^2 \mathbb{Var}[X]

where a, b \in \mathbb{R}.

Property 2 (Variance of a Sum of Independent Random Variables): If X_1, X_2, \ldots, X_n are independent random variables, each with finite variance, then:

\mathbb{Var}[X_1 + X_2 + \cdots + X_n] = \mathbb{Var}[X_1] + \mathbb{Var}[X_2] + \cdots + \mathbb{Var}[X_n]

Note that, unlike expectation, variance is not linear in general; this additivity holds only because the X_i are independent.

A Bernoulli random variable models the number of successes that occur in a single trial with a fixed probability of success, pp.

By convention, we say that the Bernoulli random variable can take on either 0 (indicating a failure) or 1 (indicating a success). Thus, the support of the Bernoulli distribution is \mathbb{S}_{X} = \{ 0, 1 \}.

Example (Flipping a fair coin once): If we define a success as landing heads and a failure as landing tails, then a single flip of a fair coin is an example of a Bernoulli trial with a fixed probability of success, p = \frac{1}{2}. We denote this random variable as follows:

X \sim \text{Bernoulli} \left( \frac{1}{2} \right)

The probability mass function of the Bernoulli distribution is given by:

\Pr(X = x) = \begin{cases} 1 - p, & x = 0 \\ p, & x = 1 \end{cases}

Equivalently, we can express the Bernoulli distribution in closed form as:

\Pr(X = x) = p^{x} (1 - p)^{1 - x}
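A one-line check that the closed form reproduces both cases of the PMF (illustrative sketch):

```python
def bernoulli_pmf(x, p):
    """Closed form p^x (1 - p)^(1 - x), valid for x in {0, 1}."""
    return p ** x * (1 - p) ** (1 - x)

p = 0.3
p_fail, p_success = bernoulli_pmf(0, p), bernoulli_pmf(1, p)
```

Setting x = 1 kills the (1 - p) factor and leaves p; setting x = 0 does the reverse, so the closed form matches the two-case definition exactly.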

If X \sim \text{Bernoulli}(p), then the expectation of X is given by:

\mathbb{E}[X] = p
Proof

Using the closed-form definition of the Bernoulli distribution, we have:

\begin{align} \mathbb{E}[X] &= \sum_{x \in \mathbb{S}_{X}} x \Pr(X = x) && \text{by definition} \\ &= \sum_{x \in \{ 0, 1 \}} x \, p^{x} (1 - p)^{1 - x} && \text{substituting} \\ &= 0 \cdot p^{0} (1 - p)^{1 - 0} + 1 \cdot p^{1} (1 - p)^{1 - 1} && \text{evaluating the sum} \\ &= p \end{align}

If X \sim \text{Bernoulli}(p), then the variance of X is given by:

\mathbb{Var}[X] = p(1 - p)
Proof

Using the closed-form definition of the Bernoulli distribution, we have:

\begin{align} \mathbb{Var}[X] &= \sum_{x \in \mathbb{S}_{X}} \left( x - \mathbb{E}[X] \right)^2 \Pr(X = x) && \text{by definition} \\ &= \sum_{x \in \{ 0, 1 \}} (x - p)^2 p^{x} (1 - p)^{1 - x} && \text{substituting} \\ &= (0 - p)^2 \cdot p^{0} (1 - p)^{1 - 0} + (1 - p)^2 \cdot p^{1} (1 - p)^{1 - 1} && \text{evaluating the sum} \\ &= p^2 (1 - p) + p(1 - p)^2 && \text{simplifying} \\ &= p(1 - p) \left[ p + (1 - p) \right] && \text{factoring} \\ &= p(1 - p) \end{align}

A Binomial random variable models the number of successes that occur in n independent Bernoulli trials with a fixed probability of success, p.

Since it models a count of the number of successes in n trials, the Binomial random variable can take on any whole number between 0 and n. Thus, the support of the Binomial distribution is \mathbb{S}_{X} = \{ 0, 1, 2, \ldots, n \}.

Example (Flipping a fair coin 50 times): If we define a success as landing heads and a failure as landing tails, then 50 flips of a fair coin is an example of a Binomial procedure with a fixed probability of success in each trial, p = \frac{1}{2}. We denote this random variable as follows:

X \sim \text{Binomial} \left( 50, \frac{1}{2} \right)

The probability mass function of the Binomial distribution is given by:

\Pr(X = x) = \binom{n}{x} p^{x} (1 - p)^{n - x}
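The PMF can be evaluated directly with math.comb; the sketch below also confirms that the probabilities over the whole support sum to 1 (up to floating-point error):

```python
from math import comb

def binomial_pmf(x, n, p):
    """C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

total = sum(binomial_pmf(x, 50, 0.5) for x in range(51))  # should be ~1
p_half = binomial_pmf(25, 50, 0.5)  # probability of exactly 25 heads in 50 flips
```

The normalization check is exactly the binomial theorem: summing C(n, x) p^x (1-p)^(n-x) over x gives (p + (1-p))^n = 1.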

Alternative Definition of the Binomial Distribution


Very often, it is useful to interpret a Binomial random variable as the sum of n independent Bernoulli random variables, each with probability of success p. In other words:

If X_1, X_2, \ldots, X_n \overset{\text{iid}}{\sim} \text{Bernoulli}(p), then Y = X_1 + X_2 + \cdots + X_n \sim \text{Binomial}(n, p).

Expressing the Binomial random variable in this way allows for certain expressions to be simplified, such as computing the expectation and the variance of the Binomial distribution.
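This alternative view also gives a direct way to sample from the Binomial distribution: draw n Bernoulli values and sum them. The simulation below (arbitrary seed, illustrative sketch) checks that the empirical frequency of one outcome matches the closed-form PMF:

```python
import random
from math import comb

rng = random.Random(42)
n, p, trials = 10, 0.5, 50_000

# Each Binomial(10, 1/2) sample is a sum of 10 Bernoulli(1/2) draws.
samples = [sum(rng.random() < p for _ in range(n)) for _ in range(trials)]

empirical = samples.count(5) / trials             # estimated Pr(Y = 5)
exact = comb(n, 5) * p ** 5 * (1 - p) ** (n - 5)  # 252/1024
```

With 50,000 trials, the empirical frequency should land within a couple of standard errors of the exact value, about 0.246.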

If Y \sim \text{Binomial}(n, p), then the expectation of Y is given by:

\mathbb{E}[Y] = np
Proof

Using the alternative definition of the Binomial distribution, we can express:

\begin{align} \mathbb{E}[Y] &= \mathbb{E}[X_1 + X_2 + \cdots + X_n] && \text{by definition} \\ &= \mathbb{E}[X_1] + \mathbb{E}[X_2] + \cdots + \mathbb{E}[X_n] && \text{by the linearity of expectation} \\ &= p + p + \cdots + p && \text{expectation of the Bernoulli} \\ &= np \end{align}

If Y \sim \text{Binomial}(n, p), then the variance of Y is given by:

\mathbb{Var}[Y] = np(1 - p)
Proof

Using the alternative definition of the Binomial distribution, we can express:

\begin{align} \mathbb{Var}[Y] &= \mathbb{Var}[X_1 + X_2 + \cdots + X_n] && \text{by definition} \\ &= \mathbb{Var}[X_1] + \mathbb{Var}[X_2] + \cdots + \mathbb{Var}[X_n] && \text{by the independence of } X_1, X_2, \ldots, X_n \\ &= p(1 - p) + p(1 - p) + \cdots + p(1 - p) && \text{variance of the Bernoulli} \\ &= np(1 - p) \end{align}
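As a final sanity check, a simulation (arbitrary seed, illustrative sketch) of Binomial(50, 1/2) samples should show an empirical mean near np = 25 and an empirical variance near np(1 - p) = 12.5:

```python
import random

rng = random.Random(7)
n, p, trials = 50, 0.5, 40_000

# Binomial samples built as sums of independent Bernoulli draws.
samples = [sum(rng.random() < p for _ in range(n)) for _ in range(trials)]

mean = sum(samples) / trials
var = sum((s - mean) ** 2 for s in samples) / trials
```

The empirical moments converge to np and np(1 - p) as the number of trials grows, matching the two results proved above.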