Select one of the keywords on the left…

ProbabilityProbability Models

Reading time: ~35 min

In this section we will learn how to mathematically represent and reason about randomness. One benefit of having an explicit mathematical model, as opposed to simply applying some set list of rules to probability situations, is that the intuitive approach to probability has serious limitations when analyzing tricky or sophisticated phenomena. Consider the following example.

Example (Exchange paradox)
Two envelopes are placed on the table in front of you, containing X and 2X dollars for some unknown positive number X (you don't know which envelope is which). You choose one of the envelopes and discover $10 inside. You have a choice to switch envelopes; should you?

On one hand, your chance of getting the better envelope was 50% to begin with, and opening the envelope did not provide any information on whether you succeeded. From this perspective, you should be indifferent to switching.

On the other hand, you might reason that the unopened envelope contains either $20 or $5, with a 50% chance of each. So on average the other envelope contains $12.50. from this perspective, you should switch.

How can we adjudicate between these contradictory analyses? It would be very helpful to have model for the situation—that is, a mathematical object together with a way to translate questions about the situation to unambiguous questions about the object. This provides separation of concerns: questions about the model will be math questions and can be answered with mathematical certainty. Any remaining uncertainty about the applicability of conclusions will pertain to whether the model suitably reflects reality.

Our first probability model

Let's set the exchange paradox aside and develop a model for the following simple experiment: two flips of a fair coin. We begin by observing that we can write down all possible outcomes:

\begin{align*}\{(\texttt{H},\texttt{H}), (\texttt{H},\texttt{T}), (\texttt{T},\texttt{H}), (\texttt{T},\texttt{T})\}.\end{align*}

This is clearly an important set; let's call it the sample space and denote it as \Omega.

Furthermore, we need a way to specify how likely each outcome is to occur. It seems reasonable in this scenario to believe that each of the four outcomes is equally likely, in which case we should assign a probability value of \frac{1}{4} to each outcome. The general mathematical object which assigns a particular value to each element in a set is a , so we will call this assignment of probability values the probability mass function and denote it as m.

So all together, we have

  • the sample space \Omega, which contains the possible outcomes of the experiment, and
  • the probability mass function m from \Omega to [0,1] which indicates the probability of each outcome in \Omega.

The pair (\Omega,m) is already enough to specify the experiment, but we need a few more translations for the model to be useful.


In the context of the experiment, an event is a predicate whose occurrence can be determined based on the outcome. For example, "the first flip turns up heads" is an event.


Identify a mathematical object in our model (\Omega, m) which can be said to correspond to the phrase "the first flip turns up heads". Which of the following is true of this object?

It is one of the values of the function m
It is the set \Omega
It is a subset of \Omega
It is one of the elements of \Omega

Solution. The outcomes (\texttt{H},\texttt{H}) and (\texttt{H},\texttt{T}) are the ones which satisfy the condition "the first flip turns up heads". Therefore, the event corresponds to a subset of \Omega, namely the subset \{(\texttt{H},\texttt{H}), (\texttt{H},\texttt{T})\}.

Explain how to obtain the probability of an event from the probability mass function.

For concreteness, consider \Omega = \{(\texttt{H},\texttt{H}), (\texttt{H},\texttt{T}), (\texttt{T},\texttt{H}), (\texttt{T},\texttt{T})\}, a probability mass function which assigns mass \frac{1}{4} to each outcome, and the event \{(\texttt{H},\texttt{H}), (\texttt{H},\texttt{T})\}.

Solution. The probability of the event \{(\texttt{H},\texttt{H}), (\texttt{H},\texttt{T})\} is the sum of the probabilities of the two outcomes in the event, namely \frac{1}{4} + \frac{1}{4} = \frac{1}{2}.

In general, we sum all of the probability masses of the outcomes in the event to find the probability of the event.

Some common terms for combining and modifying predicates include and, or, and not. For example, we might be interested in the event "the first flip comes up heads and the second does not come up heads, or the first flip comes tails". Each of these corresponds to one of the set-theoretic operations we have learned:


Match each term to its corresponding set-theoretic operation by appropriately sorting the items in the second list. Assume that E and F are events.

For concreteness, you can think about the events "first flip comes up heads" and "second flip comes up heads" for the two-flip probability space we've been considering.

  • the event that E and F both occur

  • the event that E does not occur

  • the event that either E occurs or F occurs

    x-sortable"0") the intersection E \cap F"2") the union E \cup F"1") the complement E^{\mathsf{c}}

Solution. The event that both E and F occur is E \cap F, since E \cap F is the set of outcomes in both E and F.

The event that E does not occur is E^{\mathsf{c}}, since the complement of E includes all the outcomes that are not in E.

The event that either E or F occurs is E \cup F, since E \cup F is the set of outcomes which are in either E or F.

Suppose a group of n friends enter the lottery. For i \in \{1, \dots , n\} let E_i be the event that the $i$th friend wins. Express the following events using set notation.

  • At least one friend loses.
  • All friends win.
  • At least one friend wins.


  • The event that at least one friend loses is \bigcup_{i = 1}^n E_i^c.
  • The event that all friends win is \bigcap_{i=1}^n E_i.
  • The event that at least one friend wins is \bigcup_{i=1}^n E_i.

Since events play a more prominent role than individual outcomes in discussions of probability, we will demote the probability mass function to auxiliary status and instead focus on the function \mathbb{P} from the set of events to [0,1] which assigns to each event the total probability mass therein. For example, for our two-flip experiment, the function \mathbb{P} satisfies

\begin{align*}\mathbb{P}(\{(\texttt{H},\texttt{T})\}) &= \tfrac{1}{4} \\ \mathbb{P}(\{\}) &= 0 \\ \mathbb{P}(\{(\texttt{H},\texttt{H}), (\texttt{H},\texttt{T}), (\texttt{T},\texttt{T})\}) &= \tfrac{3}{4} \\ \mathbb{P}(\Omega) &= 1,\end{align*}

and so on.


\begin{align*}\Omega = \{(\texttt{H},\texttt{H}), (\texttt{H},\texttt{T}), (\texttt{T},\texttt{H}), (\texttt{T},\texttt{T})\},\end{align*}

then the number of elements in the domain of \mathbb{P} is .

Solution. The domain of \mathbb{P} is the set of subsets of \Omega. Since \Omega has 4 elements, there are 2 \times 2 \times 2 \times 2 = \boxed{16} elements in the domain of \mathbb{P}.

We call \mathbb{P}(E) = \sum_{\omega \in E} m(\omega) the probability measure associated with the probability mass function m. The pair (\Omega, \mathbb{P}) is called a probability space. Probability measures satisfy the following properties.

Theorem (Properties of a probability measure)

If (\Omega,\mathbb{P}) is a probability space, then

  • \mathbb{P}(\Omega) = 1—"something has to happen"
  • \mathbb{P}(E) \geq 0 for all E \subset \Omega—"probabilities are non-negative"
  • \mathbb{P}(E \cup F) = \mathbb{P}(E) + \mathbb{P}(F) if E and F are mutually exclusive events—"probability is additive"

These are the fundamental properties of a probability measure on a finite sample space \Omega, in the sense that functions from the set of events to [0,1] satisfying the above properties are in one-to-one correspondence with probability mass functions.

One further important property is a consequence of the fundamental ones. It says that if B's occurrence implies A's occurrence, then \mathbb{P}(B) \leq \mathbb{P}(A).

Exercise (Monotonicity)
Use the additivity property and the fact that A = (A \cap B) \cup (A \cap B^{\mathsf{c}}) to show that if B \subset A \subset \Omega, then \mathbb{P}(B) \leq \mathbb{P}(A).

Solution. We have \mathbb{P}(A) = \mathbb{P}(A \cap B) + \mathbb{P}(A \cap B^c) by additivity. Since A \cap B = B and probabilities are non-negative, it follows that

\begin{align*}\mathbb{P}(A) = \mathbb{P}(B) + \mathbb{P}(A \cap B^c) \geq \mathbb{P}(B)\end{align*}

as required.

Exercise (Subadditivity)
Show that \mathbb{P}(A \cup B) \leq \mathbb{P}(A) + \mathbb{P}(B) for all events A and B.

Use this property to show that if A occurs with probability zero and B occurs with probability zero, then the probability that A or B occurs is also zero.

Solution. Define \tilde{A} to be the set of \omega's which are in A but not B, and let \tilde{B} be the set of \omega's which are in B but not A. Then

\begin{align*}\mathbb{P}(A \cup B) = \mathbb{P}(\tilde{A} \cup \tilde{B} \cup (A \cap B)) = \mathbb{P}(\tilde{A}) + \mathbb{P}(\tilde{B}) + \mathbb{P}( A \cap B),\end{align*}

since \tilde{A}, \tilde{B}, and A \cap B are disjoint and together make up A \cup B. Furthermore, since \mathbb{P}(A) = \mathbb{P}(\tilde{A}) + \mathbb{P}(A \cap B) and similarly for B, we have

\begin{align*}\mathbb{P}(A \cup B) &= \mathbb{P}(A) - \mathbb{P}(A \cap B) + \mathbb{P}(B) - \mathbb{P}(A\cap B) + \mathbb{P}(A \cap B) \\\ &= \mathbb{P}(A) + \mathbb{P}(B) - \mathbb{P}(A \cap B) \leq \mathbb{P}(A) + \mathbb{P}(B),\end{align*}

as desired.

We have \mathbb{P}(A \cup B) \leq \mathbb{P}(A) + \mathbb{P}(B) \leq 0 + 0 = 0 if both A and B have probability zero, so \mathbb{P}(A \cup B) = 0 in that case.

Countable additivity

If \Omega is countably infinite, then the additivity property extends to countable additivity: If E_1, E_2, \ldots is a pairwise disjoint sequence of events, then \mathbb{P}(E_1 \cup E_2 \cup \cdots) = \mathbb{P}(E_1) + \mathbb{P}(E_2) + \cdots.

Exercise (Countable additivity)
Suppose that \Omega is the set of ordered pairs of positive integers, with probability mass m((i,j)) = 2^{-i-j} at each pair (i,j). Show that the probability of the event \{(i,j) \in \Omega : i > 2\} is equal to the sum of the probabilities of the events \{(i,j) \in \Omega : i = t\} as t ranges over \{3, 4, 5, \ldots\}

Solution. The probability of the event \{(i,j) \in \Omega : i > 2\} is the sum of the probability masses of all the points in \Omega which lie to the right of the line x = 2. The probability of the event \{(i,j) \in \Omega : i = t\} is the sum of the probability masses of all of the points on the vertical line x = t. So summing these probabilities over each t value in \{3, 4, 5, \ldots\} amounts to totalling the probability mass right of the line x = 2 in columns. Since positive quantities may be summed in any order, this column-wise sum will indeed yield the total mass right of the line x = 2.

Show that the function m((i,j)) = 2^{-i-j} sums to 1 as (i,j) ranges over the set of ordered pairs of positive integers.

Solution. The sum along the first column is 1/4 + 1/8 + 1/16 + \cdots = 1/2, and the sum along the second column is 1/8 + 1/16 + 1/32 + \ldots = 1/4, and so on. Summing these column sums, we get 1/2 + 1/4 + 1/8 + \cdots = 1, as desired.

Bruno Bruno