
Lecture 2 on 01/28/2026 - Probability Basics

Scribes: Allen Singleton and Hanan Latiff

  • What is Big Data?
  • Basic Probability Concepts
  • Discrete Random Variables

In modern computing, many problems involve big data: datasets so large that they often cannot be stored entirely in memory or processed with traditional deterministic algorithms.

Big data problems are characterized by:

  • Massive input sizes (millions or billions of elements)
  • Limited memory and time constraints
  • Data arriving in streams rather than all at once

Because of these constraints, classical algorithms that require multiple passes over the data or exact computations may be infeasible.

Randomized algorithms provide a powerful tool for dealing with big data. These algorithms use randomness in their logic, typically by making random choices during execution.

Randomness allows us to:

  • Process only a small portion of the data
  • Approximate answers instead of computing exact results
  • Achieve good performance with high probability

Instead of guaranteeing correctness in every execution, randomized algorithms guarantee correctness with high probability. This tradeoff is acceptable in many big data applications where speed and scalability are more important than absolute precision.

Randomized algorithms are especially useful when:

  • Exact solutions are too slow or memory-intensive
  • Approximate answers are sufficient
  • The data is noisy or inherently uncertain
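As a toy illustration of trading exactness for speed, the sketch below (hypothetical helper name; the data is held in a list only for demonstration) estimates the fraction of even numbers by inspecting a small random sample instead of every element:

```python
import random

def estimate_fraction(data, sample_size, predicate, seed=0):
    """Estimate the fraction of items satisfying `predicate`
    by inspecting a small uniform random sample of the data."""
    rng = random.Random(seed)
    sample = rng.sample(data, sample_size)
    return sum(predicate(x) for x in sample) / sample_size

data = list(range(1_000_000))  # stand-in for a dataset too large to scan repeatedly
est = estimate_fraction(data, 10_000, lambda x: x % 2 == 0)
```

With 10,000 samples, the estimate concentrates near the true fraction 1/2 with high probability, even though only 1% of the data was touched.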

Before analyzing randomized algorithms, it is important to review basic probability concepts that will be used throughout the course. These concepts help us reason about uncertainty and quantify the likelihood of different outcomes.

A probability space consists of:

  • A sample space, which is the set of all possible outcomes
  • Events, which are subsets of the sample space
  • A probability measure that assigns a value between 0 and 1 to each event

The probability of an event represents how likely it is to occur. Probabilities satisfy the following basic properties:

  • The probability of any event is between 0 and 1
  • The probability of the entire sample space is 1
  • The probability of an impossible event is 0
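These three components can be made concrete for a fair die; the snippet below (an illustrative sketch, not course-provided code) represents events as Python sets and the uniform probability measure as a counting ratio:

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}  # sample space: outcomes of one die roll

def pr(event):
    """Uniform probability measure: |event| / |omega|."""
    assert event <= omega  # events must be subsets of the sample space
    return Fraction(len(event), len(omega))

p_even = pr({2, 4, 6})  # probability of the event "roll is even"
```

The three basic properties hold by construction: every ratio lies in [0, 1], the full sample space has probability 1, and the empty (impossible) event has probability 0.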

In the context of randomized algorithms, probabilities are used to analyze:

  • The likelihood that an algorithm produces a correct result
  • The expected behavior of an algorithm over random choices
  • The chance that a rare or undesirable outcome occurs

Rather than guaranteeing deterministic outcomes, randomized algorithms rely on probabilistic guarantees. This means we analyze how often an algorithm succeeds or fails over many possible random executions.

Let A be an event in a probability space. The complement of A, denoted A^{c}, is the event that A does not occur.

The complement rule states that the probability of an event not occurring equals one minus the probability that it occurs:

\Pr(A^{c}) = 1 - \Pr(A)

This rule follows directly from the fact that an event and its complement together cover the entire sample space. Since the probability of the sample space is 1, the probabilities of an event and its complement must add up to 1.

In practice, the complement rule is often useful when computing \Pr(A) directly is difficult, but computing \Pr(A^{c}) is easier. In such cases, we compute the probability of the complement first and subtract it from 1.

In the analysis of randomized algorithms, the complement rule is frequently used to:

  • Bound the probability that an algorithm fails
  • Analyze rare or undesirable events
  • Convert success probabilities into failure probabilities, or vice versa
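For instance, "at least one success in n independent trials" is awkward to compute directly but easy via the complement; a small sketch (hypothetical function name):

```python
def pr_at_least_one_success(n, p):
    """Pr(no successes in n independent trials) is (1 - p)^n,
    so the complement rule gives Pr(at least one success)."""
    return 1 - (1 - p) ** n

p_ten_flips = pr_at_least_one_success(10, 0.5)  # at least one heads in 10 flips
```

Here the direct computation would require summing over every nonzero success count, while the complement is a single term.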

Two events A and B are said to be independent if the occurrence of one event does not affect the probability of the other.

Formally, events A and B are independent if:

\Pr(A \cap B) = \Pr(A)\Pr(B)

This definition captures the idea that knowing whether A occurs gives no information about whether B occurs, and vice versa. If the equality above does not hold, then the events are dependent.

Independence is a fundamental assumption in many randomized algorithms. Random choices made by an algorithm are often designed to be independent so that probabilities can be multiplied and analyzed more easily.

It is important to note that independence is a strong condition. Even if two events seem unrelated, they may still be dependent unless the product rule above holds exactly.

In algorithm analysis, independence allows us to:

  • Compute probabilities of multiple events occurring together
  • Analyze repeated random trials
  • Simplify probability calculations by separating events
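The product rule can be checked empirically for two independent fair coin flips; this simulation (an illustrative sketch with an arbitrary seed) estimates Pr(A), Pr(B), and Pr(A ∩ B):

```python
import random

rng = random.Random(1)
trials = 100_000
count_a = count_b = count_ab = 0
for _ in range(trials):
    a = rng.random() < 0.5  # event A: first coin lands heads
    b = rng.random() < 0.5  # event B: second, independent coin lands heads
    count_a += a
    count_b += b
    count_ab += a and b     # both events occur

pa, pb, pab = count_a / trials, count_b / trials, count_ab / trials
```

The empirical frequencies should satisfy pab ≈ pa · pb ≈ 1/4, since the two draws are generated independently.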

The binomial coefficient is used to count the number of ways to choose a fixed number of elements from a larger set, without regard to order.

For integers n \geq 0 and 0 \leq k \leq n, the binomial coefficient is denoted by:

\binom{n}{k}

and represents the number of ways to choose k elements from a set of n elements.

The binomial coefficient is defined as:

\binom{n}{k} = \frac{n!}{k!(n-k)!}

In probability, binomial coefficients arise naturally when analyzing repeated independent trials, where each trial has two possible outcomes, such as success or failure.

They are especially useful when computing probabilities involving:

  • The number of ways a certain outcome can occur
  • Multiple independent random choices
  • Counting events before assigning probabilities

In the context of randomized algorithms, binomial coefficients help quantify how many different ways an algorithm’s random decisions can lead to the same result. This allows us to combine counting arguments with probability calculations.
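The factorial formula can be checked against Python's built-in math.comb (available since Python 3.8); the direct implementation below is a sketch for illustration:

```python
from math import comb, factorial

def binom(n, k):
    """Number of k-element subsets of an n-element set: n! / (k! (n-k)!)."""
    return factorial(n) // (factorial(k) * factorial(n - k))

ways = binom(5, 2)  # ways to choose 2 elements out of 5
```

Integer division is exact here because k!(n-k)! always divides n!; the row-sum identity, that the counts over all k sum to 2^n, is a handy sanity check.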

A random variable is a function X: \Omega \rightarrow \mathbb{R}, where \Omega is the sample space consisting of all possible outcomes of the experiment that X models.

A random variable can be discrete, meaning that its support is finite or countably infinite, or continuous, meaning that its support is uncountable. In this course, we will primarily focus on discrete random variables, as they arise naturally in the analysis of randomized algorithms.

A discrete random variable takes on a finite or countably infinite set of values, each with an associated probability.

Example (Rolling a fair die): Consider the experiment of rolling a fair six-sided die. The sample space consists of the six possible outcomes. The random variable X maps each outcome to a numerical value in its support:

\mathbb{S}_X = \{1, 2, 3, 4, 5, 6\}

Since the die is fair, each outcome occurs with equal probability. We can describe the random variable X as follows:

X \sim \begin{cases} 1 & \text{w.p. } \frac{1}{6} \\ 2 & \text{w.p. } \frac{1}{6} \\ 3 & \text{w.p. } \frac{1}{6} \\ 4 & \text{w.p. } \frac{1}{6} \\ 5 & \text{w.p. } \frac{1}{6} \\ 6 & \text{w.p. } \frac{1}{6} \end{cases}

This notation makes explicit both the possible values of the random variable and the probability with which each value occurs.

A discrete random variable is fully described by its probability mass function (PMF), which assigns a probability to each value in its support.

The probabilities assigned to a random variable satisfy the following basic properties:

  • For every value x in the support, \Pr(X = x) \geq 0
  • The sum of the probabilities over all possible values is equal to 1

That is,

\sum_{x \in \mathbb{S}_X} \Pr(X = x) = 1

In randomized algorithms, random variables are used to model quantities such as running time, the number of correct outputs, or whether an algorithm succeeds or fails. By defining appropriate random variables, we can analyze the behavior of an algorithm using probability.

This perspective allows us to reason about algorithm performance in terms of likelihood and expectation, rather than exact deterministic outcomes.

The expectation of a discrete random variable, denoted \mathbb{E}[X], is a weighted average of all the possible values that X takes on. The expectation is given by:

\mathbb{E}[X] = \sum_{x \in \mathbb{S}_{X}} x \Pr(X = x)
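For the fair die above, this weighted average can be computed exactly with stdlib fractions (an illustrative sketch):

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}  # fair die PMF
assert sum(pmf.values()) == 1                    # sanity check: PMF sums to 1

expectation = sum(x * p for x, p in pmf.items())  # weighted average of values
```

The result is 7/2 = 3.5, even though the die can never show 3.5; expectation is a long-run average, not a guaranteed outcome.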

Property 1 (Shift and Scale of Expectation): If X is a random variable with finite expectation, then:

\mathbb{E}[aX + b] = a\mathbb{E}[X] + b

where a, b \in \mathbb{R}.

Property 2 (Linearity of Expectation): If X_1, X_2, \ldots, X_n are random variables, each with finite expectation, then:

\mathbb{E}[X_1 + X_2 + \cdots + X_n] = \mathbb{E}[X_1] + \mathbb{E}[X_2] + \cdots + \mathbb{E}[X_n]
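Linearity can be verified on the sum of two fair dice: computing the expectation of the sum over the joint sample space agrees with adding the two single-die expectations, and notably no independence assumption is needed for linearity in general. A sketch:

```python
from fractions import Fraction
from itertools import product

die = {x: Fraction(1, 6) for x in range(1, 7)}  # one fair die

# Direct: average the sum over the joint sample space of two rolls.
e_sum_direct = sum((x + y) * die[x] * die[y] for x, y in product(die, die))

# Via linearity: E[X1 + X2] = E[X1] + E[X2].
e_one = sum(x * p for x, p in die.items())
e_sum_linear = e_one + e_one
```

Both routes give 7, but the linearity route never touches the 36-outcome joint space.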

The variance of a discrete random variable, denoted \mathbb{Var}[X], is the average squared distance by which the values that X takes on deviate from \mathbb{E}[X]. The variance is given by:

\mathbb{Var}[X] = \sum_{x \in \mathbb{S}_{X}} \left( x - \mathbb{E}[X] \right)^2 \Pr(X = x)

Equivalently, the variance can be computed as:

\mathbb{Var}[X] = \mathbb{E}[X^2] - \left( \mathbb{E}[X] \right)^2
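Both formulas give the same answer on the fair die; a quick exact check (illustrative sketch):

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}  # fair die PMF
ex = sum(x * p for x, p in pmf.items())          # E[X] = 7/2

# Definition: average squared deviation from the mean.
var_def = sum((x - ex) ** 2 * p for x, p in pmf.items())

# Equivalent form: E[X^2] - (E[X])^2.
ex2 = sum(x * x * p for x, p in pmf.items())
var_alt = ex2 - ex ** 2
```

Both evaluate to 35/12, since E[X^2] = 91/6 and (E[X])^2 = 49/4.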

Property 1 (Shift and Scale of Variance): If X is a random variable with finite variance, then:

\mathbb{Var}[aX + b] = a^2 \mathbb{Var}[X]

where a, b \in \mathbb{R}.

Property 2 (Variance of a Sum of Independent Random Variables): If X_1, X_2, \ldots, X_n are independent random variables, each with finite variance, then:

\mathbb{Var}[X_1 + X_2 + \cdots + X_n] = \mathbb{Var}[X_1] + \mathbb{Var}[X_2] + \cdots + \mathbb{Var}[X_n]

Note that, unlike expectation, variance is not linear in general; this additivity holds only because the X_i are independent.

A Bernoulli random variable models the number of successes that occur in a single trial with a fixed probability of success, pp.

By convention, we say that the Bernoulli random variable can take on either 0 (indicating a failure) or 1 (indicating a success). Thus, the support of the Bernoulli distribution is \mathbb{S}_{X} = \{ 0, 1 \}.

Example (Flipping a fair coin once): If we define a success as landing heads and a failure as landing tails, then a single flip of a fair coin is an example of a Bernoulli trial with a fixed probability of success, p = \frac{1}{2}. We denote this random variable as follows:

X \sim \text{Bernoulli} \left( \frac{1}{2} \right)

The probability mass function of the Bernoulli distribution is given by:

\Pr(X = x) = \begin{cases} 1 - p, & x = 0 \\ p, & x = 1 \end{cases}

Equivalently, we can express the Bernoulli distribution in closed form as:

\Pr(X = x) = p^{x} (1 - p)^{1 - x}
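A one-line check that the closed form reproduces both cases of the PMF (illustrative sketch):

```python
def bernoulli_pmf(x, p):
    """Closed form p^x (1 - p)^(1 - x), valid for x in {0, 1}."""
    return p ** x * (1 - p) ** (1 - x)

p = 0.3
p_fail, p_success = bernoulli_pmf(0, p), bernoulli_pmf(1, p)
```

Setting x = 1 kills the (1 - p) factor and leaves p; setting x = 0 does the reverse, so the closed form matches the two-case definition exactly.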

If X \sim \text{Bernoulli}(p), then the expectation of X is given by:

\mathbb{E}[X] = p
Proof

Using the closed-form definition of the Bernoulli distribution, we have:

\begin{align} \mathbb{E}[X] &= \sum_{x \in \mathbb{S}_{X}} x \Pr(X = x) && \text{by definition} \\ &= \sum_{x \in \{ 0, 1 \}} x \, p^{x} (1 - p)^{1 - x} && \text{substituting} \\ &= 0 \cdot p^{0} (1 - p)^{1 - 0} + 1 \cdot p^{1} (1 - p)^{1 - 1} && \text{evaluating the sum} \\ &= p \end{align}

If X \sim \text{Bernoulli}(p), then the variance of X is given by:

\mathbb{Var}[X] = p(1 - p)
Proof

Using the closed-form definition of the Bernoulli distribution, we have:

\begin{align} \mathbb{Var}[X] &= \sum_{x \in \mathbb{S}_{X}} \left( x - \mathbb{E}[X] \right)^2 \Pr(X = x) && \text{by definition} \\ &= \sum_{x \in \{ 0, 1 \}} (x - p)^2 p^{x} (1 - p)^{1 - x} && \text{substituting} \\ &= (0 - p)^2 \cdot p^{0} (1 - p)^{1 - 0} + (1 - p)^2 \cdot p^{1} (1 - p)^{1 - 1} && \text{evaluating the sum} \\ &= p^2 (1 - p) + p(1 - p)^2 && \text{simplifying} \\ &= p(1 - p) \left[ p + (1 - p) \right] && \text{factoring} \\ &= p(1 - p) \end{align}

A Binomial random variable models the number of successes that occur in n independent Bernoulli trials with a fixed probability of success, p.

Since it models a count of the number of successes in n trials, the Binomial random variable can take on any whole number between 0 and n. Thus, the support of the Binomial distribution is \mathbb{S}_{X} = \{ 0, 1, 2, \ldots, n \}.

Example (Flipping a fair coin 50 times): If we define a success as landing heads and a failure as landing tails, then 50 flips of a fair coin is an example of a Binomial procedure with a fixed probability of success in each trial, p = \frac{1}{2}. We denote this random variable as follows:

X \sim \text{Binomial} \left( 50, \frac{1}{2} \right)

The probability mass function of the Binomial distribution is given by:

\Pr(X = x) = \binom{n}{x} p^{x} (1 - p)^{n - x}
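The PMF can be evaluated directly with math.comb; the sketch below also confirms that the probabilities over the whole support sum to 1 (up to floating-point error):

```python
from math import comb

def binomial_pmf(x, n, p):
    """C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

total = sum(binomial_pmf(x, 50, 0.5) for x in range(51))  # should be ~1
p_half = binomial_pmf(25, 50, 0.5)  # probability of exactly 25 heads in 50 flips
```

The normalization check is exactly the binomial theorem: summing C(n, x) p^x (1-p)^(n-x) over x gives (p + (1-p))^n = 1.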

Alternative Definition of the Binomial Distribution


Very often, it is useful to interpret a Binomial random variable as the sum of n independent Bernoulli random variables, each with probability of success p. In other words:

If X_1, X_2, \ldots, X_n \overset{\text{iid}}{\sim} \text{Bernoulli}(p), then Y = X_1 + X_2 + \cdots + X_n \sim \text{Binomial}(n, p).

Expressing the Binomial random variable in this way allows for certain expressions to be simplified, such as computing the expectation and the variance of the Binomial distribution.
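This alternative view also gives a direct way to sample from the Binomial distribution: draw n Bernoulli values and sum them. The simulation below (arbitrary seed, illustrative sketch) checks that the empirical frequency of one outcome matches the closed-form PMF:

```python
import random
from math import comb

rng = random.Random(42)
n, p, trials = 10, 0.5, 50_000

# Each Binomial(10, 1/2) sample is a sum of 10 Bernoulli(1/2) draws.
samples = [sum(rng.random() < p for _ in range(n)) for _ in range(trials)]

empirical = samples.count(5) / trials             # estimated Pr(Y = 5)
exact = comb(n, 5) * p ** 5 * (1 - p) ** (n - 5)  # 252/1024
```

With 50,000 trials, the empirical frequency should land within a couple of standard errors of the exact value, about 0.246.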

If Y \sim \text{Binomial}(n, p), then the expectation of Y is given by:

\mathbb{E}[Y] = np
Proof

Using the alternative definition of the Binomial distribution, we can express:

\begin{align} \mathbb{E}[Y] &= \mathbb{E}[X_1 + X_2 + \cdots + X_n] && \text{by definition} \\ &= \mathbb{E}[X_1] + \mathbb{E}[X_2] + \cdots + \mathbb{E}[X_n] && \text{by the linearity of expectation} \\ &= p + p + \cdots + p && \text{expectation of the Bernoulli} \\ &= np \end{align}

If Y \sim \text{Binomial}(n, p), then the variance of Y is given by:

\mathbb{Var}[Y] = np(1 - p)
Proof

Using the alternative definition of the Binomial distribution, we can express:

\begin{align} \mathbb{Var}[Y] &= \mathbb{Var}[X_1 + X_2 + \cdots + X_n] && \text{by definition} \\ &= \mathbb{Var}[X_1] + \mathbb{Var}[X_2] + \cdots + \mathbb{Var}[X_n] && \text{by the independence of } X_1, X_2, \ldots, X_n \\ &= p(1 - p) + p(1 - p) + \cdots + p(1 - p) && \text{variance of the Bernoulli} \\ &= np(1 - p) \end{align}
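As a final sanity check, a simulation (arbitrary seed, illustrative sketch) of Binomial(50, 1/2) samples should show an empirical mean near np = 25 and an empirical variance near np(1 - p) = 12.5:

```python
import random

rng = random.Random(7)
n, p, trials = 50, 0.5, 40_000

# Binomial samples built as sums of independent Bernoulli draws.
samples = [sum(rng.random() < p for _ in range(n)) for _ in range(trials)]

mean = sum(samples) / trials
var = sum((s - mean) ** 2 for s in samples) / trials
```

The empirical moments converge to np and np(1 - p) as the number of trials grows, matching the two results proved above.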