Probability: Central Limit Theorem

Convergence in distribution

The central limit theorem, one of the most important results in applied probability, is a statement about the convergence of a sequence of probability measures. So, we begin this section by exploring what it should mean for a sequence of probability measures to converge to a given probability measure.

Roughly speaking, we will consider two probability measures close if they put approximately the same amount of probability mass in approximately the same places on the number line. For example, a sequence of continuous probability measures with densities $f_1, f_2, \ldots$ converges to a continuous probability measure with density $f$ if $\lim_{n\to\infty} f_n(x) = f(x)$ for all $x \in \mathbb{R}$.

The sequence of densities $f_n$ converges to the density $f$ as $n \to \infty$.

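If the limiting probability measure is not continuous, then the situation is slightly more complicated. For example, we would like to say that a probability measure $\nu_n$ which puts a mass close to $\tfrac{1}{2}$ at a point near (but not equal to) 0 and the remaining mass at a point near (but not equal to) 1 converges to the fair coin flip distribution as $n \to \infty$, provided that the masses tend to $\tfrac{1}{2}$ and the points tend to 0 and 1. This does not correspond to pointwise convergence of the probability mass functions, since we don't have convergence of probability mass function values at 0 or at 1 in this example: those values are zero for every $n$, while the limiting values are $\tfrac{1}{2}$.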

The probability measures in this example (shown in sea green) converge to the Bernoulli distribution with success probability $\tfrac{1}{2}$ (shown in red).
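To make this example concrete, here is a minimal sketch that assumes one particular choice of masses and locations: mass $\tfrac{1}{2} + \tfrac{1}{n}$ at $\tfrac{1}{n}$ and mass $\tfrac{1}{2} - \tfrac{1}{n}$ at $1 + \tfrac{1}{n}$, for $n \geq 2$. This choice, and the helper names `nu_n` and `mass`, are illustrative assumptions. The sketch shows that the mass at the point 0 never approaches $\tfrac{1}{2}$, while the mass in intervals around 0 and around 1 does.

```python
from fractions import Fraction

def nu_n(n):
    """Assumed example measure: a dict mapping each atom to its mass (n >= 2)."""
    return {
        Fraction(1, n): Fraction(1, 2) + Fraction(1, n),
        1 + Fraction(1, n): Fraction(1, 2) - Fraction(1, n),
    }

def mass(measure, a, b):
    """Mass that a discrete measure assigns to the open interval (a, b)."""
    return sum(m for x, m in measure.items() if a < x < b)

half = Fraction(1, 2)
for n in (2, 10, 100, 1000):
    nu = nu_n(n)
    at_zero = nu.get(0, 0)                       # pmf value at the point 0
    near_zero = mass(nu, -half, half)            # mass in (-1/2, 1/2)
    near_one = mass(nu, half, 3 * half)          # mass in (1/2, 3/2)
    print(n, float(at_zero), float(near_zero), float(near_one))
```

The intervals $(-\tfrac{1}{2}, \tfrac{1}{2})$ and $(\tfrac{1}{2}, \tfrac{3}{2})$ are used because the fair coin flip distribution puts no mass at their endpoints.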

We can get around this problem by giving ourselves a little space to the left and right of any point where the limiting measure has a positive probability mass. In other words, suppose that $\nu$ is a probability measure on $\mathbb{R}$ with probability mass function $m$, and consider an interval $(a, b)$. Let's call such an interval a continuity interval of $\nu$ if $m(a)$ and $m(b)$ are both zero.

We will say that a sequence of probability measures $\nu_1, \nu_2, \ldots$ converges to $\nu$ if $\nu_n(I)$ converges to $\nu(I)$ for every continuity interval $I$ of $\nu$.

We can combine the discrete and continuous definitions into a single definition:

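Definition (Convergence of probability measures on $\mathbb{R}$)
A sequence $\nu_1, \nu_2, \ldots$ of probability measures on $\mathbb{R}$ converges to a probability measure $\nu$ on $\mathbb{R}$ if $\nu_n(I) \to \nu(I)$ whenever $I$ is an interval satisfying $\nu(\{a, b\}) = 0$, where $a$ and $b$ are the endpoints of $I$.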

Exercise
Define $f_n(x)$ to be $n$ when $0 \leq x \leq \frac{1}{n}$ and 0 otherwise, and let $\nu_n$ be the probability measure with density $f_n$. Show that $\nu_n$ converges to the probability measure $\nu$ which puts all of its mass at the origin.

Solution. Suppose $I = (a, b)$ is a continuity interval of $\nu$.

If $I$ contains the origin, then the terms of the sequence $\nu_n(I)$ are equal to 1 for large enough $n$, since all of the probability mass of $\nu_n$ is in the interval $[0, \frac{1}{n}]$ and eventually $[0, \frac{1}{n}] \subset I$.

If $I$ does not contain the origin, then the terms of the sequence $\nu_n(I)$ are eventually equal to 0, for the same reason.

In either case, $\nu_n(I)$ converges to $\nu(I)$. Therefore, $\nu_n$ converges to $\nu$.
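The argument can be checked numerically. The sketch below assumes that $\nu_n$ has density $n$ on $[0, \frac{1}{n}]$ and 0 elsewhere, so that $\nu_n((a,b))$ is $n$ times the length of the overlap between $(a, b)$ and $[0, \frac{1}{n}]$. The function names are hypothetical helpers introduced only for this illustration.

```python
def nu_n_interval(n, a, b):
    """Mass of (a, b) under the measure with density n on [0, 1/n] (assumed form)."""
    overlap = min(b, 1 / n) - max(a, 0.0)
    return n * max(overlap, 0.0)

def nu_interval(a, b):
    """Mass of (a, b) under the unit point mass at the origin."""
    return 1.0 if a < 0 < b else 0.0

# Three continuity intervals of the point mass: none has 0 as an endpoint.
for a, b in [(-1.0, 0.5), (0.01, 2.0), (-3.0, -0.2)]:
    values = [nu_n_interval(n, a, b) for n in (1, 10, 100, 1000)]
    print((a, b), values, "limit:", nu_interval(a, b))
```

An interval such as $(0, 1)$ is deliberately excluded: it has the origin as an endpoint, so it is not a continuity interval, and indeed $\nu_n((0,1)) = 1$ for all $n$ while $\nu((0,1)) = 0$.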

The central limit theorem

The law of large numbers tells us that the distribution of a mean of many independent, identically distributed, finite-variance, mean-$\mu$ random variables is concentrated around $\mu$. This is a mathematical formalization of the well-known fact that flipping a coin many times results in a heads proportion close to 1/2 with high probability, or that the average of many die rolls is very close to 3.5 with high probability.

The central limit theorem gives us precise information about how the probability mass of a sum $X_1 + \cdots + X_n$ of independent, identically distributed random variables is concentrated around its mean. Consider a sequence of independent fair coin flips $X_1, X_2, \ldots$, and define the sums

$$S_n = X_1 + \cdots + X_n$$

for $n \geq 1$. The probability mass functions of the $S_n$'s can be calculated exactly, and they are graphed in the figure below, for several values of $n$. We see that the graph is becoming increasingly bell-shaped as $n$ increases.

Probability mass functions of sums of Bernoulli(1/2) random variables.
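As a rough sketch of how these probability mass functions can be calculated exactly, the following snippet builds the pmf of a sum of independent fair coin flips by repeated convolution. The helper names `convolve` and `sum_pmf` are illustrative and do not come from the text.

```python
def convolve(p, q):
    """pmf of X + Y for independent X, Y with pmfs p, q on 0, 1, 2, ... (as lists)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            out[i + j] += p_i * q_j
    return out

def sum_pmf(step, n):
    """pmf of the sum of n independent copies of a variable with pmf `step`."""
    pmf = [1.0]  # pmf of the empty sum, which is always 0
    for _ in range(n):
        pmf = convolve(pmf, step)
    return pmf

coin = [0.5, 0.5]  # fair coin flip: mass 1/2 at 0 and at 1
for n in (1, 2, 4, 8):
    print(n, [round(v, 4) for v in sum_pmf(coin, n)])
```

The same `sum_pmf` works for any distribution on finitely many nonnegative integers, which makes it easy to try other step distributions.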

If we repeat this exercise with other distributions in place of the independent coin flips, we obtain similar results. For example, the Poisson distribution with parameter $\lambda$ is a discrete distribution which assigns mass $\frac{\lambda^k e^{-\lambda}}{k!}$ to each nonnegative integer $k$. The probability mass functions for sums of independent Poisson(3) random variables are shown in the figure below. Not only is the shape of the graph stabilizing as $n$ increases, but we're apparently getting the same shape as in the Bernoulli example.

Probability mass functions of sums of Poisson(3) random variables.

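To account for the shifting and spreading of the distribution of $S_n$, we normalize it: we subtract its mean and then divide by its standard deviation to obtain a random variable with mean zero and variance 1. If each $X_i$ has mean $\mu$ and standard deviation $\sigma$, then $S_n$ has mean $n\mu$ and standard deviation $\sigma\sqrt{n}$, so the normalized sum is

$$\frac{S_n - n\mu}{\sigma\sqrt{n}}.$$

So, we define $S_n^* = \frac{S_n - n\mu}{\sigma\sqrt{n}}$, which has mean 0 and variance 1. Based on the figures above, we conjecture that the distribution of $S_n^*$ converges as $n \to \infty$ to some distribution with a bell-shaped probability density function.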
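This conjecture can be probed numerically for the Poisson example. The sketch below relies on the standard fact that a sum of $n$ independent Poisson(3) random variables is Poisson($3n$), and computes the probability that the normalized sum lands in $(-1, 1)$. The function names and the choice of interval are assumptions made for illustration. The comparison value is the mass that the standard normal distribution assigns to $(-1, 1)$.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist

def poisson_pmf(lam, k):
    """Poisson(lam) mass at the nonnegative integer k, computed via logarithms."""
    return exp(k * log(lam) - lam - lgamma(k + 1))

def normalized_poisson_prob(n, a, b, rate=3.0):
    """P(a < S_n* < b), where S_n is a sum of n independent Poisson(rate) variables."""
    lam = n * rate                   # S_n is Poisson with parameter n * rate
    sd = sqrt(lam)                   # the mean and the variance of S_n both equal lam
    upper = int(lam + b * sd) + 1    # no larger integer can satisfy the upper bound
    return sum(poisson_pmf(lam, k) for k in range(upper + 1)
               if a < (k - lam) / sd < b)

bell = NormalDist().cdf(1) - NormalDist().cdf(-1)
for n in (1, 10, 100, 1000):
    print(n, round(normalized_poisson_prob(n, -1, 1), 4), round(bell, 4))
```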

This conjecture turns out to be correct, with a Gaussian as the limiting distribution. The standard Gaussian distribution is denoted $\mathcal{N}(0,1)$ and has probability density function $\frac{1}{\sqrt{2\pi}} e^{-t^2/2}$.

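Theorem (Central limit theorem)
Suppose that $X_1, X_2, \ldots$ are independent, identically distributed random variables with mean $\mu$ and finite standard deviation $\sigma > 0$, and define the normalized sums $S_n^* = \frac{X_1 + \cdots + X_n - n\mu}{\sigma\sqrt{n}}$ for $n \geq 1$.

For all $-\infty \leq a < b \leq \infty$, we have

$$\lim_{n \to \infty} P(a < S_n^* < b) = P(a < Z < b),$$

where $Z \sim \mathcal{N}(0,1)$. In other words, the sequence $S_1^*, S_2^*, \ldots$ converges in distribution to $Z$.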

The normal approximation is the technique of approximating the distribution of $S_n^*$ as $\mathcal{N}(0,1)$.
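To get a feel for the quality of the normal approximation, the sketch below computes the exact probability that the normalized sum of $n$ fair coin flips falls in an interval and compares it with the corresponding standard normal probability. The interval $(-1, 2)$ and the function name `normalized_sum_prob` are arbitrary illustrative choices.

```python
from math import comb, sqrt
from statistics import NormalDist

def normalized_sum_prob(n, a, b):
    """Exact P(a < S_n* < b), where S_n is a sum of n independent fair coin flips."""
    mean, sd = n / 2, sqrt(n) / 2    # mean and standard deviation of S_n
    favorable = sum(comb(n, k) for k in range(n + 1) if a < (k - mean) / sd < b)
    return favorable / 2**n          # each outcome sequence has probability 2^(-n)

Z = NormalDist()                     # standard normal: mean 0, standard deviation 1
a, b = -1.0, 2.0
limit = Z.cdf(b) - Z.cdf(a)
for n in (10, 100, 1000):
    print(n, round(normalized_sum_prob(n, a, b), 4), round(limit, 4))
```

The exact values are computed with integer arithmetic before the final division, so the only approximation being tested is the normal one.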

Example
Suppose we flip a coin which has probability 60% of turning up heads $n$ times. Use the normal approximation to estimate the value of $n$ such that the proportion of heads is between 59% and 61% with probability approximately 99%.

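Solution. We calculate the standard deviation $\sigma = \sqrt{(0.4)(0.6)}$ and the mean $\mu = 0.6$ of each flip, and we use these values to rewrite the desired probability in terms of $S_n^*$. We find

$$P\left(0.59 < \frac{S_n}{n} < 0.61\right) = P\left(-0.01 < \frac{S_n - \mu n}{n} < 0.01\right) = P\left(-\frac{0.01\sqrt{n}}{\sigma} < S_n^* < \frac{0.01\sqrt{n}}{\sigma}\right),$$

where the last step was obtained by multiplying all three expressions in the compound inequality by $\sqrt{n}/\sigma$. Since $S_n^*$ is distributed approximately like a standard normal random variable, the normal approximation tells us to look for the least $n$ so that

$$P(-a_n < Z < a_n) = 0.99,$$

where $a_n = 0.01\sqrt{n}/\sigma$ and $Z \sim \mathcal{N}(0,1)$. By the symmetry of the Gaussian density, we may rewrite this equation as

$$P(Z < a_n) = 0.995.$$

Defining the normal CDF $\Phi(x) = P(Z < x)$, we want to find the least integer $n$ such that $a_n$ exceeds $\Phi^{-1}(0.995)$. The following code tells us that $\Phi^{-1}(0.995) \approx 2.576$.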

# Python
from scipy.stats import norm
norm.ppf(0.995)  # inverse of the standard normal CDF at 0.995

# Julia
using Distributions
quantile(Normal(0,1), 0.995)

Setting this equal to $0.01\sqrt{n}/\sigma$ and solving for $n$ gives 15,924. The exact value of $n$ for which the probability is closest to 99% is 15,861, so we can see that the normal approximation worked pretty well in this case.
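The last step can be reproduced with a few lines of standard-library Python. This is only a sketch of the arithmetic: it uses `statistics.NormalDist` in place of the SciPy and Distributions calls above, and the variable names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

p = 0.6                # probability of heads on each flip
half_width = 0.01      # proportion must be within 0.01 of p
target = 0.99          # desired probability

sigma = sqrt(p * (1 - p))                     # standard deviation of one flip
z = NormalDist().inv_cdf((1 + target) / 2)    # inverse normal CDF at 0.995
# Solve half_width * sqrt(n) / sigma = z for n, then round up to an integer.
n = ceil((z * sigma / half_width) ** 2)
print(round(z, 4), n)
```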

Example
Consider a random variable $S_n$ which is defined to be the sum of $n$ independent fair coin flips. The law of such a random variable is called a binomial distribution. Let $m_n$ be the pmf of $S_n^*$. Use the code block below to observe that $m_n(x)$ appears to converge to 0 for all $x \in \mathbb{R}$, and explain why this does not contradict the central limit theorem.

For simplicity, you may assume that $n$ is even.

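# Python
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats
def binom_stickplot(n):
    """
    Return a stick plot representing the pmf of the
    normalized sum of n independent fair coin flips
    """
    ν = scipy.stats.binom(n, 0.5)
    # x contains the possible values of the normalized sum:
    # subtract the mean n/2, divide by the standard deviation sqrt(n)/2
    x = (np.arange(n+1) - n/2)/(np.sqrt(n)/2)
    # y contains the probabilities:
    y = ν.pmf(np.arange(n+1))
    plt.ylim(0, 1)
    # vlines takes the stick positions, then the bottom and top of each stick
    return plt.vlines(x, 0, y)

binom_stickplot(10)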

# Julia
using Plots, Distributions
function binom_stickplot(n)
    ν = Binomial(n, 0.5)
    # normalized values: subtract the mean n/2, divide by the standard deviation sqrt(n)/2
    sticks((-n÷2 : n÷2)/(sqrt(n)/2), [pdf(ν, k) for k in 0:n],
           label = "Binomial($n,1/2)", ylims = (0, 1))
end
binom_stickplot(1000)

Solution. Executing the cells, we see that the height of the tallest stick indeed goes to zero as the argument to binom_stickplot is increased.

This finding does not contradict the central limit theorem, since convergence in distribution is not based on convergence of the amount of probability mass at individual points but rather on the amount of probability mass assigned to intervals. In any positive-width interval, the distribution of $S_n^*$ has many points with nonzero probability mass. Since there are many of them, they can be small individually while nevertheless totaling up to a non-small mass.
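The same point can be seen without plotting. The sketch below, whose names and whose choice of the interval $(-1, 1)$ are illustrative, prints the height of the tallest stick next to the total mass that the normalized sum places in $(-1, 1)$: the first shrinks as $n$ grows, while the second settles near the standard normal value of about 0.68.

```python
from math import comb, sqrt

def normalized_pmf(n):
    """List of (x, mass) pairs for S_n*, where S_n is a sum of n fair coin flips."""
    mean, sd = n / 2, sqrt(n) / 2
    return [((k - mean) / sd, comb(n, k) / 2**n) for k in range(n + 1)]

def tallest_stick(n):
    """Largest probability mass at any single point."""
    return max(m for _, m in normalized_pmf(n))

def interval_mass(n, a, b):
    """Total probability mass at points strictly between a and b."""
    return sum(m for x, m in normalized_pmf(n) if a < x < b)

for n in (10, 100, 1000):
    print(n, round(tallest_stick(n), 4), round(interval_mass(n, -1, 1), 4))
```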

Exercise
Suppose that the percentage of residents in favor of a particular policy is 64%. We sample $n$ individuals uniformly at random from the population.

In terms of $n$, find an interval $I$ centered at 0.64 such that the proportion of residents polled who are in favor of the policy is in $I$ with probability about 95%.
How many residents must be polled for the proportion of poll participants who are in favor of the policy to be between 62% and 66% with probability at least 95%?

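Solution. Let $X_i$ be the $i$th sample from the population (1 if the resident is in favor, and 0 otherwise). Then the proportion of the polled residents in favor of the policy is $\frac{S_n}{n}$, where $S_n = X_1 + \cdots + X_n$. Each $X_i$ is a Bernoulli random variable with $\mu = 0.64$ and $\sigma = \sqrt{(0.64)(0.36)} = 0.48$.

We need to find $\epsilon > 0$ such that $P\left(0.64 - \epsilon < \frac{S_n}{n} < 0.64 + \epsilon\right) \approx 0.95$. Equivalently, writing $S_n^* = \frac{S_n - n\mu}{\sigma\sqrt{n}}$, we need to find $\epsilon$ such that $P\left(-\frac{\epsilon\sqrt{n}}{\sigma} < S_n^* < \frac{\epsilon\sqrt{n}}{\sigma}\right) \approx 0.95$. Now by the central limit theorem, $S_n^*$ is approximately distributed as $\mathcal{N}(0,1)$ for $n$ large. Since $P(-1.96 < Z < 1.96) \approx 0.95$ for $Z \sim \mathcal{N}(0,1)$, we look to solve

$$\frac{\epsilon\sqrt{n}}{\sigma} = 1.96.$$

Therefore,

$$\epsilon = \frac{1.96\,\sigma}{\sqrt{n}} = \frac{0.9408}{\sqrt{n}},$$

and with probability about 95%, the proportion of polled residents in favor of the policy will be in

$$I = \left(0.64 - \frac{0.9408}{\sqrt{n}},\; 0.64 + \frac{0.9408}{\sqrt{n}}\right).$$

For the second part, we want to find $n$ such that $\epsilon \leq 0.02$. From above, we find that $\frac{0.9408}{\sqrt{n}} \leq 0.02$ and thus $n \geq 47.04^2 \approx 2212.8$. Therefore at least 2,213 residents must be polled, according to the normal approximation.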
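Both parts of this solution reduce to short calculations, sketched below with the standard library. The function names are hypothetical, and the code uses the exact normal quantile rather than the rounded value 1.96, which does not change the final answer here.

```python
from math import ceil, sqrt
from statistics import NormalDist

p = 0.64       # true proportion in favor
level = 0.95   # desired probability

sigma = sqrt(p * (1 - p))                     # standard deviation of one sample
z = NormalDist().inv_cdf((1 + level) / 2)     # approximately 1.96

def half_width(n):
    """Half-width of the interval around p for a poll of n residents."""
    return z * sigma / sqrt(n)

def required_sample_size(eps):
    """Least n for which the half-width is at most eps."""
    return ceil((z * sigma / eps) ** 2)

print(round(half_width(1000), 4), required_sample_size(0.02))
```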

Exercise
Suppose that $X_1, X_2, \ldots$ is a sequence of independent, identically distributed random variables with variance 2 and mean 7. Find the limit of each of the following probabilities as $n \to \infty$.

Solution. Let

For each non-negative integer we have By the Central Limit Theorem (CLT),

We have

by the CLT. Since for all we find that

by the CLT. We have

by the CLT.
