Probability Distributions

Given a probability space (\Omega, \mathbb{P}) and a random variable X, the distribution of X tells us how X distributes probability mass on the real number line. Loosely speaking, the distribution tells us where we can expect to find X and with what probabilities.

Definition (Distribution of a random variable)
The distribution (or law) of a random variable X is the probability measure on \mathbb{R} which maps a set A \subset \mathbb{R} to \mathbb{P}(X \in A).

Exercise
Suppose that X represents the amount of money you're going to win with the lottery ticket you just bought. Suppose that \nu is the law of X. Then \nu((-\infty,0)) = ___, \nu(\{0\}) = ___, and \nu([10000,\infty)) = ___.

We can think of X as pushing forward the probability mass from \Omega to \mathbb{R} by sending the probability mass at \omega to X(\omega) for each \omega \in \Omega. The probability masses at multiple \omega's can stack up at the same point on the real line if X maps the \omega's to the same value.

The distribution of a discrete random variable is the measure on \mathbb{R} obtained by pushing forward the probability masses at elements of the sample space to the locations of their images on the real line.
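The push-forward description can be sketched in a few lines of Python. This is only an illustration under assumed inputs: the fair die, the random variable X(\omega) = |\omega - 3|, and the helper name `pushforward` are hypothetical choices, not part of the text above.

```python
from collections import defaultdict
from fractions import Fraction

def pushforward(mass, X):
    """Return the distribution of X as a dict mapping value -> probability.

    `mass` maps each outcome omega to its probability; `X` is a function
    on outcomes. (Hypothetical helper, for illustration only.)
    """
    dist = defaultdict(Fraction)
    for omega, p in mass.items():
        # Masses stack up when X sends several omegas to the same value.
        dist[X(omega)] += p
    return dict(dist)

# Assumed example: a fair six-sided die and X(omega) = |omega - 3|.
mass = {omega: Fraction(1, 6) for omega in range(1, 7)}
law = pushforward(mass, lambda omega: abs(omega - 3))

for value in sorted(law):
    print(value, law[value])
```

Here the outcomes 2 and 4 both land on the value 1, so the mass at 1 is the sum of their two masses.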

Exercise
A problem on a test requires students to match molecule diagrams to their appropriate labels. Suppose there are three labels and three diagrams and that a student guesses a matching uniformly at random. Let X denote the number of diagrams the student correctly labels. What is the probability mass function of the distribution of X?

Solution. The number of correctly labeled diagrams is an integer between 0 and 3 inclusive. Suppose the labels are \mathrm{A},\mathrm{B},\mathrm{C}, and suppose the correct labeling sequence is ABC (the final result would be the same regardless of the correct labeling sequence). The sample space consists of all six possible labeling sequences, and each of them is equally likely since the student applies the labels uniformly at random. So we have

\begin{align*}\Omega &= \{\mathrm{ABC}, \mathrm{ACB}, \mathrm{BAC}, \mathrm{BCA}, \mathrm{CAB}, \mathrm{CBA}\}, \\ \{X = 0\} &= \{\mathrm{BCA},\mathrm{CAB}\}, \\ \{X = 1\} &= \{\mathrm{ACB},\mathrm{CBA},\mathrm{BAC}\}, \\ \{X = 2\} &= \{\}, \text{ and} \\ \{X = 3\} &= \{\mathrm{ABC}\}.\end{align*}

The probability mass function of the distribution of X is therefore

\begin{align*}m_X(0) = \frac{1}{3}\end{align*}

\begin{align*}m_X(1) = \frac{1}{2}\end{align*}

\begin{align*}m_X(2) = 0\end{align*}

\begin{align*}m_X(3) = \frac{1}{6}\end{align*}

All together, we have

\begin{align*}m_X(x) = \begin{cases} \frac{1}{3} & \text{if }x = 0 \\ \frac{1}{2} & \text{if }x = 1 \\ \frac{1}{6} & \text{if }x = 3 \\ 0 & \text{otherwise}. \end{cases}\end{align*}
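The same probability mass function can be recovered by brute-force enumeration. The following is a minimal sketch, assuming (as in the solution) that the correct labeling is ABC; the variable names are our own.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

correct = "ABC"
# The six equally likely labeling sequences.
outcomes = list(permutations(correct))

# For each outcome, count the positions where the guess matches the truth.
counts = Counter(
    sum(guess == truth for guess, truth in zip(outcome, correct))
    for outcome in outcomes
)

# Divide by the number of outcomes to get the probability mass function.
pmf = {k: Fraction(counts[k], len(outcomes)) for k in range(4)}

for k, p in pmf.items():
    print(k, p)
```

The value 2 receives no mass because labeling two diagrams correctly forces the third to be correct as well.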

Cumulative distribution function

The distribution of a random variable X may be specified by its probability mass function or by its cumulative distribution function F_X:

Definition (Cumulative distribution function)
If X is a random variable, then its cumulative distribution function F_X is the function from \mathbb{R} to [0,1] defined by

\begin{align*}F_X(x) = \mathbb{P}(X \leq x).\end{align*}

A probability mass function m_X and its corresponding CDF F_X.
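For a discrete random variable, the CDF at x is the sum of the masses at values less than or equal to x. As a rough sketch, here is that computation for the diagram-matching distribution from the earlier exercise; the function name `cdf` and the dictionary representation are illustrative assumptions.

```python
from fractions import Fraction

# PMF of the number of correctly labeled diagrams (from the exercise above).
pmf = {0: Fraction(1, 3), 1: Fraction(1, 2), 3: Fraction(1, 6)}

def cdf(pmf, x):
    """Return P(X <= x) by summing the mass at every value <= x."""
    return sum((p for value, p in pmf.items() if value <= x), Fraction(0))

for x in (-1, 0, 1, 2.5, 3):
    print(x, cdf(pmf, x))
```

The CDF is flat between the values that carry mass and jumps by m_X(x) at each such value x.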

Exercise
Consider a random variable X whose distribution is as shown in the figure above. Select the true statements.

XEQUATIONX1765XEQUATIONX is greater than XEQUATIONX1766XEQUATIONX
XEQUATIONX1767XEQUATIONX
XEQUATIONX1768XEQUATIONX is greater than XEQUATIONX1769XEQUATIONX
\mathbb{P}(100X < 1) is greater than \frac{1}{2}

Solution. The first one is true, since the CDF goes from about 0.1 at -1 to about 0.9 at +1. The difference, about 0.8, is larger than 0.6.

The second one is also true, since there is no probability mass past 2.

The third one is false: there is no probability mass in the interval from -\frac{1}{2} to 0.

\mathbb{P}(100X < 1) is the probability that X is less than \frac{1}{100}, which (reading the graph of the CDF) we see is between 0.25 and 0.5. Therefore, the last one is false.

Exercise
Suppose that X is a random variable with CDF F_X and that Y = X^2. Express \mathbb{P}(Y > 9) in terms of the function F_X. For simplicity, assume that \mathbb{P}(X = -3) = 0.

Solution. By definition of Y, we have Y > 9 if and only if X < -3 or X > 3. Since these events are mutually exclusive, we have

\begin{align*}\mathbb{P}(Y > 9) &= \mathbb{P}(X < -3) + \mathbb{P}(X > 3) \\ &= \mathbb{P}(X < -3) + 1 - \mathbb{P}(X \leq 3) \\ &= F_X(-3) + 1 - F_X(3),\end{align*}

where the last step follows since \mathbb{P}(X < -3) = \mathbb{P}(X \leq -3) = F_X(-3) for this random variable X, by the assumption that \mathbb{P}(X = -3) = 0.
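We can sanity-check the formula \mathbb{P}(Y > 9) = F_X(-3) + 1 - F_X(3) on a concrete example. The distribution below is an assumption made purely for illustration: X is uniform on a hand-picked set of integers that excludes -3, so that \mathbb{P}(X = -3) = 0.

```python
from fractions import Fraction

# Assumed example: X uniform on these values (note that -3 is absent).
values = [-5, -4, -2, 0, 1, 3, 4, 6]
pmf = {v: Fraction(1, len(values)) for v in values}

def F(x):
    """CDF of X: total mass at values <= x."""
    return sum((p for v, p in pmf.items() if v <= x), Fraction(0))

# Direct computation of P(Y > 9), where Y = X^2.
direct = sum((p for v, p in pmf.items() if v ** 2 > 9), Fraction(0))

# The same probability via the CDF formula from the solution.
via_cdf = F(-3) + 1 - F(3)

print(direct, via_cdf)
```

Both quantities agree here. If X did place mass at -3, the term F_X(-3) would overcount by \mathbb{P}(X = -3), which is why the exercise includes that assumption.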

Exercise
Random variables with the same cumulative distribution function are not necessarily equal as random variables, because the probability mass sitting at each point on the real line can come from different \omega's.

For example, consider the two-fair-coin-flip experiment and let X be the number of heads. Find another random variable Y which is not equal to X but which has the same distribution as X.

Solution. If we define Y to be the number of tails, then it's clear from symmetry that it has the same distribution as X. Furthermore, X and Y are unequal as random variables because if X = 0, then Y = 2 (and vice versa).

(In fact, we can express Y in terms of X as Y = 2-X.)
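The distinction between "same distribution" and "equal as random variables" can be checked directly by enumeration. This is a small sketch; representing the random variables as dictionaries on the sample space, and the helper name `law`, are our own choices.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space of two fair coin flips: HH, HT, TH, TT (equally likely).
omega = ["".join(flips) for flips in product("HT", repeat=2)]

X = {w: w.count("H") for w in omega}  # number of heads
Y = {w: w.count("T") for w in omega}  # number of tails

def law(rv):
    """Distribution of a random variable on a uniform sample space."""
    counts = Counter(rv.values())
    return {value: Fraction(c, len(rv)) for value, c in counts.items()}

same_law = law(X) == law(Y)
equal_everywhere = all(X[w] == Y[w] for w in omega)

print(same_law, equal_everywhere)
```

The two laws coincide, yet X and Y disagree at the outcomes HH and TT, so they are different functions on \Omega.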
