Continuous Distributions

Not every random phenomenon is ideally modeled using a discrete probability space. For example, we will see that the study of discrete distributions leads us to the Gaussian distribution, which smooths its probability mass out across the whole real number line, with most of the mass near the origin and less as you move out toward -\infty or +\infty.

The Gaussian distribution spreads its probability mass out across the real number line. There is no single point where a positive amount of probability mass is concentrated.

We won't be able to work with such distributions using probability mass functions, since the function which maps each point to the amount of probability mass at that point is the zero function. However, calculus provides us with a smooth way of specifying where stuff is on the number line and how to total it up: integration. We can define a function f which is larger where there's more probability mass and smaller where there's less, and we can calculate probabilities by integrating f.

The probability measure \nu associated with a density f assigns the measure \int_a^b f(x) \, \mathrm{d} x to each interval [a,b].
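As a rough numerical illustration of calculating probabilities by integrating a density, the following sketch approximates the mass of an interval with a midpoint Riemann sum. The standard Gaussian density and the helper name `interval_probability` are assumptions chosen for this example; they do not come from the text above.

```python
import math

def gaussian_density(x):
    """Standard Gaussian density (mean 0, variance 1), used here as an example f."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def interval_probability(f, a, b, n=100_000):
    """Approximate the integral of f over [a, b] with a midpoint Riemann sum."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width

# Much of the mass sits near the origin.
print(interval_probability(gaussian_density, -1, 1))
# A very short interval carries very little mass, so no single point has positive mass.
print(interval_probability(gaussian_density, 0, 1e-6))
```

Shrinking the interval further drives the second number toward zero, which is why a probability mass function is of no use here.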

The simplest possible choice for f is the function which is 1 on [0,1] and 0 elsewhere. In this case, the probability mass associated with a set E \subset [0,1] is the total length of E. In higher dimensions, \Omega = [0,1]^2 with the probability measure \mathbb{P}(E) = \text{area}(E) gives us a probability space, as does \Omega = [0,1]^3 with the probability measure \mathbb{P}(E) = \text{volume}(E).

Exercise
Consider the probability space \Omega = [0,1]^2 with the area probability measure. Show that if X((\omega_1, \omega_2)) = \omega_1 and Y((\omega_1, \omega_2)) = \omega_2, then the events \{X \in I\} and \{Y \in J\} are independent for any intervals I\subset [0,1] and J\subset [0,1].

Solution. We have

\begin{align*}\mathbb{P}(\{X \in I\} \cap \{Y \in J\}) = \text{area}(I \times J) = \text{length}(I) \text{length}(J),\end{align*}

by the area formula for a rectangle. Since \text{length}(I) = \mathbb{P}(\{X \in I\} \cap \{Y \in [0,1]\}) = \mathbb{P}(X \in I) and \text{length}(J) = \mathbb{P}(\{Y \in J\} \cap \{X \in [0,1]\}) = \mathbb{P}(Y \in J), we conclude that \{X \in I\} and \{Y \in J\} are independent.
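The product formula can also be checked by simulation. The sketch below is only a sanity check, not a proof: it draws points from the unit square by sampling the two coordinates separately, and the function name `estimate_independence` and the particular intervals are assumptions made for illustration.

```python
import random

def estimate_independence(I, J, trials=200_000, seed=0):
    """Estimate P(X in I and Y in J), P(X in I), and P(Y in J) for a uniform point in [0,1]^2."""
    rng = random.Random(seed)
    both = in_I = in_J = 0
    for _ in range(trials):
        x, y = rng.random(), rng.random()  # the point (omega_1, omega_2)
        x_hit = I[0] <= x <= I[1]
        y_hit = J[0] <= y <= J[1]
        in_I += x_hit
        in_J += y_hit
        both += x_hit and y_hit
    return both / trials, in_I / trials, in_J / trials

joint, p_x, p_y = estimate_independence((0.2, 0.5), (0.1, 0.9))
# Both printed values should be close to length(I) * length(J) = 0.3 * 0.8.
print(joint, p_x * p_y)
```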

The probability density function

Just as a function we integrate to find total mass is called a mass density function, the function we integrate to find total probability is called a probability density function. We refer to f as a density because its value at a point \omega may be interpreted as the limit as \epsilon \to 0 of the probability mass in the ball of radius \epsilon around \omega divided by the volume (or area/length) of that ball.

Definition
Suppose that \Omega \subset \mathbb{R}^n for some n \geq 1, and suppose that f:\Omega \to [0,\infty) has the property that \int_\Omega f \mathrm{d} V = 1. We call f a probability density function, abbreviated PDF, and we define

\begin{align*}\mathbb{P}(E) = \int_E f \, \mathrm{d} V\end{align*}

for events E \subset \Omega. We call (\Omega, \mathbb{P}) a continuous probability space.

Exercise
Consider the probability space with \Omega = [0,1] and probability measure given by the density f(x) = 2x for x \in [0,1]. Find \mathbb{P}([\frac{1}{2},1]).

Solution. We calculate \mathbb{P}([\frac{1}{2},1]) = \displaystyle{\int_{\frac{1}{2}}^1 2x \,\mathrm{d} x = \frac{3}{4}}.
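Spelling out the antiderivative step in this calculation:

\begin{align*}\int_{\frac{1}{2}}^1 2x \, \mathrm{d} x = \Big[x^2\Big]_{\frac{1}{2}}^{1} = 1 - \frac{1}{4} = \frac{3}{4}.\end{align*}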

If f is constant on \Omega, then we call the associated probability measure \mathbb{P} the uniform measure on \Omega. Note that this requires that \Omega have finite volume.

All of the tools we developed for discrete probability spaces have analogues for continuous probability spaces. The main idea is to replace sums with integrals, and many of the definitions transfer over with no change. Let's briefly summarize and follow up with some exercises.

  • The distribution of a continuous random variable X is the measure A\mapsto \mathbb{P}(X \in A) on \mathbb{R}.
  • The cumulative distribution function F_X of a continuous random variable X is defined by F_X(x) = \mathbb{P}(X \leq x) for all x \in \mathbb{R}.
  • The joint distribution of two continuous random variables X and Y is the measure A \mapsto \mathbb{P}((X,Y) \in A) on \mathbb{R}^2.
  • If (X,Y) is a continuous pair of random variables with joint density f_{X,Y}: \mathbb{R}^2 \to \mathbb{R}, then the conditional distribution of Y given the event \{X=x\} has density f_{Y| X=x} defined by f_{Y| X=x}(y) = \frac{f_{X,Y}(x,y)}{f_X(x)}, where \displaystyle{f_X(x) = \int_{-\infty}^\infty f_{X,Y}(x,y) \, \mathrm{d} y} is the PDF of X.
  • Two continuous random variables X and Y are independent if \mathbb{P}((X,Y) \in A \times B) = \mathbb{P}(X \in A) \mathbb{P}(Y \in B) for all A\subset \mathbb{R} and B \subset \mathbb{R}. This is true if and only if (X,Y) has density (x,y) \mapsto f_X(x)f_Y(y), where f_X and f_Y are the densities of X and Y, respectively.
  • The expectation of a continuous random variable X defined on a probability space (\Omega, \mathbb{P}) is \mathbb{E}[X] = \int_\Omega X(\omega) f(\omega) \, \mathrm{d} \omega, where f is \mathbb{P}'s density. The expectation is also given by \mathbb{E}[X] = \int_{\mathbb{R}} x f_X(x) \, \mathrm{d} x, where f_X is the density of the distribution of X.
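As a small worked illustration of the two expectation formulas in the last bullet, take \Omega = [0,1] with the uniform density and X(\omega) = 2\omega (both choices are assumptions made for this example). Then X is uniformly distributed on [0,2], so f_X = \frac{1}{2} on [0,2], and the two integrals agree:

\begin{align*}\mathbb{E}[X] = \int_0^1 2\omega \cdot 1 \, \mathrm{d} \omega = 1 \quad \text{and} \quad \mathbb{E}[X] = \int_0^2 x \cdot \tfrac{1}{2} \, \mathrm{d} x = 1.\end{align*}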

Example
Suppose that f is the function which returns 2 for any point in the triangle \Omega with vertices (0,0), (1,0), and (1,1) and otherwise returns 0. Suppose that (X,Y) has density f. Find the conditional density of X given \{Y = y\}, where y is a number between 0 and 1.

Solution. The conditional density of X given \{Y = y\} is the density of the uniform distribution on the segment [y,1], since that interval is the intersection of the triangle and the horizontal line at height y, and f is constant on the triangle.
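The same conclusion can be reached by computing with the conditional density formula. For 0 \leq y < 1, the marginal density of Y and the resulting conditional density are

\begin{align*}f_Y(y) = \int_y^1 2 \, \mathrm{d} x = 2(1-y), \qquad f_{X|Y=y}(x) = \frac{f(x,y)}{f_Y(y)} = \frac{2}{2(1-y)} = \frac{1}{1-y} \quad \text{for } x \in [y,1],\end{align*}

which is constant in x and integrates to 1 over [y,1].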

Exercise
Find the expectation of a random variable whose density is f(x) = \mathrm{e}^{-x}\boldsymbol{1}_{x \in [0,\infty)}.

Solution. We calculate

\begin{align*}\int_{-\infty}^\infty x \mathrm{e}^{-x} \boldsymbol{1}_{x \in [0,\infty)} \, \mathrm{d} x = \int_{0}^\infty x \mathrm{e}^{-x} \, \mathrm{d} x = 1.\end{align*}
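The value 1 can be sanity-checked by simulation. The sketch below averages samples drawn with Python's `random.expovariate(1.0)`, which samples from the density \mathrm{e}^{-x} on [0,\infty); the function name and sample size are assumptions chosen for the example.

```python
import random

def sample_mean_exponential(trials=200_000, seed=4):
    """Average of `trials` samples from the density e^{-x} on [0, infinity)."""
    rng = random.Random(seed)
    return sum(rng.expovariate(1.0) for _ in range(trials)) / trials

# The sample mean should be close to the expectation computed above.
print(sample_mean_exponential())
```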

Exercise
Show that the cumulative distribution function of a continuous random variable is increasing and continuous.

(Note: if f is a nonnegative-valued function on \mathbb{R} satisfying \int_\mathbb{R} f = 1, then \lim_{\epsilon \to 0}\int_{x}^{x+\epsilon}f(t) \, \mathrm{d} t = 0 for all x \in \mathbb{R}.)

Solution. Since f is nonnegative, the CDF is increasing: F(s) = \int_{-\infty}^s f(x) \, \mathrm{d} x \leq \int_{-\infty}^t f(x) \, \mathrm{d} x = F(t) whenever s < t.

To see that F is continuous, we note that the difference between F(s) and F(s+\epsilon) is the integral of the density f over a width-\epsilon interval. Thus we can use the supplied note to conclude that F(s + \epsilon) \to F(s) as \epsilon \to 0 for all s \in \mathbb{R}.

Exercise
Suppose that f is a density function on \mathbb{R} and that F is the cumulative distribution function of the associated probability measure on \mathbb{R}. Show that F is differentiable and that F' = f wherever f is continuous.

Use this result to show that if U is uniformly distributed on [0,1], then U^2 has density function f(x) = \frac{1}{2\sqrt{x}} on (0,1].

Solution. The equation F'(x) = f(x) follows immediately from the fundamental theorem of calculus. We have

\begin{align*}F'(x) = \frac{\mathrm{d} }{\mathrm{d} x} \int_{-\infty}^x f(t) \, \mathrm{d} t = f(x)\end{align*}

at any point x where f is continuous.

Let F be the CDF of U^2. Since \mathbb{P}(U \le t) = t for t \in [0,1], we have F(x) = \mathbb{P}(U^2 \leq x) = \mathbb{P}(U \leq \sqrt{x}) = \sqrt{x} for x \in [0,1]. Differentiating, we find that the density is \frac{1}{2\sqrt{x}} on (0,1).
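The formula F(x) = \sqrt{x} for the CDF of U^2 can be checked empirically. The following sketch compares the fraction of simulated values of U^2 that fall at or below x with \sqrt{x}; the function name and the test points are assumptions made for illustration.

```python
import math
import random

def empirical_cdf_of_square(x, trials=200_000, seed=1):
    """Fraction of simulated values of U^2 that are at most x, with U uniform on [0, 1]."""
    rng = random.Random(seed)
    return sum(rng.random() ** 2 <= x for _ in range(trials)) / trials

for x in (0.04, 0.25, 0.81):
    # The second and third columns should nearly agree.
    print(x, empirical_cdf_of_square(x), math.sqrt(x))
```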

Exercise
Given a cumulative distribution function F, let us define the generalized inverse F^{-1}: [0,1] \to [-\infty,\infty] so that F^{-1}(u) is the left endpoint of the interval of points which are mapped by F to a value which is greater than or equal to u.

The generalized inverse is like the inverse function of F, except that if the graph of F has a vertical jump somewhere, then all of the y values spanned by the jump get mapped by F^{-1} to the x-value of the jump, and if the graph of F is flat over a stretch of x-values, then the corresponding y-value gets mapped by F^{-1} back to the left endpoint of the interval of x values.

The remarkably useful inverse CDF trick gives us a way of sampling from any distribution whose CDF we can compute a generalized inverse for: it says that if U is uniformly distributed on [0,1], then the cumulative distribution of X = F^{-1}(U) is F.

  • Confirm that if the graph of F has a jump from (x,y_1) to (x,y_2), then the probability of the event \{X = x\} is indeed y_2 - y_1.
  • Show that the event \{X \leq t\} has the same probability as the event \{U \leq F(t)\}. Conclude that F is in fact the CDF of X. Hint: draw a figure showing the graph of F together with U somewhere on the y-axis and X in the corresponding location on the x-axis.
  • Write a Python function which samples from the distribution whose density function is 2x\boldsymbol{1}_{0 \leq x \leq 1}.

Solution.

It can be shown that, as a result of monotonicity and countable additivity,

\begin{align*}\mathbb{P}(X < x) = \sup\{F(t) : t < x \}.\end{align*}

Now, because a CDF is monotonic and right-continuous, if F has a jump from y_1 to y_2 at x it must be the case that F(x) = y_2 and \sup\{F(t) : t < x \} = y_1. Therefore, \mathbb{P}(X < x) = y_1. Since

\begin{align*}\mathbb{P}(X = x) = F(x) - \mathbb{P}(X < x)\end{align*}

by additivity, it follows that \mathbb{P}(X = x) = y_2 - y_1 as required.

The CDF F_U of a uniform [0, 1] random variable is

\begin{align*}F_U(t) = \begin{cases} 0 & \text{if} \; t < 0 \\ t & \text{if} \; 0 \leq t \leq 1 \\ 1 & \text{otherwise.} \end{cases}\end{align*}

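By the definition of the generalized inverse, F^{-1}(u) \leq t if and only if u \leq F(t), so the events \{X \leq t\} and \{U \leq F(t)\} coincide. Therefore we have

\begin{align*}\mathbb{P}(X \leq t) = \mathbb{P}(U \leq F(t)) = F_U(F(t)) = F(t),\end{align*}

so F is the CDF of X, as required.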

A random variable with this density function has a CDF defined by

\begin{align*}F(t) = \begin{cases} 0 & \text{if} \; t < 0 \\ t^2 & \text{if} \; 0 \leq t \leq 1 \\ 1 & \text{otherwise}. \end{cases}\end{align*}

Therefore the generalized inverse of F is F^{-1}(u) = \sqrt{u} for all 0 \leq u \leq 1. This leads to the following code for sampling from this distribution, shown first in Python (using NumPy) and then in Julia.

# Python
import numpy as np
np.sqrt(np.random.random_sample())

# Julia
using Distributions
sqrt(rand(Uniform(0, 1)))
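When no closed form for F^{-1} is available, the generalized inverse can be approximated numerically. The sketch below is one possible approach, assuming F^{-1}(u) lies in a known bounded interval: it finds the left endpoint of \{x : F(x) \geq u\} by bisection. The helper `generalized_inverse` and the example CDF `F_mixed` (a jump of height 1/2 at 0 followed by a linear rise on [0,1]) are hypothetical and chosen to show how a jump in F becomes a point mass.

```python
import random

def generalized_inverse(F, u, lo=-10.0, hi=10.0, tol=1e-9):
    """Approximate the left endpoint of {x : F(x) >= u} by bisection, assuming it lies in [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if F(mid) >= u:
            hi = mid  # mid belongs to the set, so the left endpoint is at or below mid
        else:
            lo = mid  # mid is outside the set, so the left endpoint is above mid
    return hi

def F_mixed(t):
    """Hypothetical CDF: mass 1/2 at the point 0 plus mass 1/2 spread uniformly over [0, 1]."""
    if t < 0:
        return 0.0
    if t <= 1:
        return 0.5 + 0.5 * t
    return 1.0

rng = random.Random(2)
samples = [generalized_inverse(F_mixed, rng.random()) for _ in range(20_000)]
atom_fraction = sum(abs(s) < 1e-6 for s in samples) / len(samples)
# The jump of height 1/2 at 0 should appear as roughly half of the samples.
print(atom_fraction)
```

Bisection relies only on F being increasing, so the same helper works whether F has jumps, flat stretches, or neither.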

General probability spaces

So far we have discussed probability spaces which are specified with the help of either a probability mass function or a probability density function. These are not the only possibilities. For example, if we produce an infinite sequence of independent bits B_1, B_2, \ldots, then the distribution of B_1/3 + B_2 / 3^2 + B_3 / 3^3 + \cdots has CDF as shown in the figure below. This function doesn't have jumps, so it does not arise from cumulatively summing a mass function. But it does all of its increasing on a set of total length zero (in other words, there is a set of total length 1 on which the derivative of this function is zero), so it also does not arise from cumulatively integrating a density function.
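A simulation gives some feel for this distribution. The sketch below assumes the bits are fair and truncates the infinite sum after a fixed number of terms; both choices, and the function name `sample_ternary_sum`, are assumptions made for illustration.

```python
import random

def sample_ternary_sum(rng, depth=40):
    """Sample B_1/3 + B_2/3^2 + ... truncated after `depth` independent fair bits."""
    return sum(rng.getrandbits(1) / 3 ** i for i in range(1, depth + 1))

rng = random.Random(3)
samples = [sample_ternary_sum(rng) for _ in range(50_000)]

# With B_1 = 0 the sum is at most 1/6, and with B_1 = 1 it is at least 1/3,
# so no sample should land in the open gap (1/6, 1/3).
in_gap = sum(1 / 6 < s < 1 / 3 for s in samples)
print(in_gap)

# The CDF is flat across that gap: about half of the samples lie below any point in it.
fraction_below = sum(s <= 0.25 for s in samples) / len(samples)
print(fraction_below)
```

Repeating the argument on B_2, B_3, and so on produces ever smaller gaps, which is how the CDF ends up flat on a set of total length equal to the full length of the interval it climbs over.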

The CDF of the uniform measure on the Cantor set.

In general, a person may propose a probability space by specifying any set \Omega, a collection of subsets of \Omega which supports taking countable unions, intersections, and complements, and a function \mathbb{P} defined on that collection of subsets. We require that certain properties are satisfied:

Definition (Probability space: the general definition)
Suppose that \Omega is a set and \mathbb{P} is a function defined on a collection of subsets of \Omega (called events). If

  • \mathbb{P}(\Omega) = 1,
  • \mathbb{P}(E) \geq 0 for all events E, and
  • \mathbb{P}(E_1 \cup E_2 \cup \cdots) = \mathbb{P}(E_1) + \mathbb{P}(E_2) + \cdots for all sequences of pairwise disjoint events E_1, E_2, \ldots,

then we say that \mathbb{P} is a probability measure on \Omega, and that \Omega together with the given collection of events and the measure \mathbb{P} is a probability space.
