ProbabilityExpectation and Variance

We often want to distill a random variable's distribution down to a single number. For example, consider the height of an individual selected uniformly at random from a given population. This is a random variable, and communicating its distribution would involve communicating the heights of every person in the population. However, we can summarize the distribution by reporting an average height: we add up the heights of the people in the population and by the number of people.

If the random individual is selected according to some non-uniform probability distribution on the population, then it makes sense to calculate a average rather than a uniform average. The probability-weighted average of the values of a random variable is called its expectation.

Definition
The expectation (or mean ) of a random variable is the probability-weighted average of :

For example, the expected number of heads in two fair coin flips is

There are two common ways of interpreting expected value.

The expectation may be thought of as the value of a random game with payout . According to this interpretation, you should be willing to pay anything less than $1 to play the game where you get a dollar for each head in two fair coin flips. For more than $1 you should be unwilling to play the game, and at $1 you should be indifferent.
The second way of thinking about expected value is as a long-run average. If you play the dollar-per-head two-coin-flip game a very large number of times, then your average payout per play is very likely to be close to $1.

We can test this second interpretation out:

Exercise
Use the expression sum(randint(0,2) + randint(0,2) for _ in range(10**6))/10**6 mean(rand(0:1) + rand(0:1) for k=1:10^6) to play the dollar-per-head two-coin-flip game a million times and calculate the average payout in those million runs.

How close to 1 is the result typically? Choose the best answer.

Around 0.1

Around 0.01

Around 0.0001

Around 0.0000001

from numpy.random import randint
sum(randint(0,2) + randint(0,2) for _ in range(10**6))/10**6

mean(rand(0:1) + rand(0:1) for k=1:10^6)

Solution. Running the code several times, we see that the error is seldom as large as 0.01 or as small as 0.0000001. So the correct answer choice is the third one.

We will see that this second interpretation is actually a theorem in probability, called the law of large numbers. In the meantime, however, this interpretation gives us a useful tool for investigation: if a random variable is easy to simulate, then we can sample from it many times and calculate the average of the resulting samples. This will not give us the expected value exactly, but we can get as close as desired by using sufficiently many samples. This is called the Monte Carlo method of approximating the expectation of a random variable.

Exercise
Use a Monte Carlo simulation to estimate the expectation of , where and are independent die rolls.

import numpy as np

Solution. sum(randint(1,7)/randint(1,7) for i in range(10_000_000))/10_000_000 returns approximately 1.43. The actual mean is sum(x/y for x in range(1,7) for y in range(1,7))/36, which is . So we can say that the Monte Carlo result with 10 million trials is quite close to the correct value.

Solution. mean(rand(1:6)/rand(1:6) for i=1:10^8) returns approximately 1.43. The actual mean is mean(x/y for x=1:6, y=1:6), which is , so we can say that the Monte Carlo result with 100 million trials is very close to the correct value.

The following exercise confirms an intuitive fact about expectation: a random variable which is always larger than another has a larger mean. We will state this idea with "larger" replaced by its weak version "at least as large as".

Exercise
Explain why if for all .

Solution. If for all then we have

Expectation and distribution

Although the definition involves the probability space , we can also write a formula for expectation in terms of the probability mass function of the distribution of :

Theorem
The expectation of a discrete random variable is equal to

The idea is that the given formula is just a rearrangement of the terms in the definition of expectation. Let's begin by considering an example. Suppose with probability mass function satisfying , , and . Suppose and . Then

We can group the first two terms together to get

This expression is the one we would get if we wrote out

Therefore, we can see that the two sides are the same.

Let's write this idea down in general form. We group terms on the right-hand side in the formula according to the value of :

Then we can replace with and pull it out of the inside sum to get

Since is equal to , we get

as desired.

Exercise
The expectation of a random variable need not be finite or even well-defined. Show that the expectation of the random variable which assigns a probability mass of to the point (for all ) is not finite.

Consider a random variable whose distribution assigns a probability mass of to each point for and a probability mass of to for each . Show that is not well-defined. (Note: a sum is not defined if and are equal to and , respectively.)

Solution. We multiply the probability mass at each point by the location and sum to get

For the second distribution, the positive and negative parts of the are both infinite for the same reason. Therefore, the sum does not make sense and the mean is therefore not well-defined.

We can also work out the expectation of a function of two discrete random variables in terms of their joint distribution.

Theorem
If , and and are discrete random variables defined on the same probability space, then

Proof. We use the same idea we used in the proof of the expectation formula: group terms in the definition of expectation according the value of the pair . We get

We can use this theorem to show that expectation distributes across multiplication for independent random variables:

Exercise (independence product formula)
Show that if and are independent random variables.

Solution. Using the definition of independence, we have

as desired.

Variance

The expectation of a random variable gives us some coarse information about where on the number line the random variable's probability mass is located. The variance gives us some information about how widely the probability mass is spread around its mean. A random variable whose distribution is highly concentrated about its mean will have a small variance, and a random variable which is likely to be very far from its mean will have a large variance. We define the variance of a random variable to be the average squared distance from to its mean:

Definition (Variance)
The variance of a random variable is defined to be

The standard deviation of is the square root of the variance:

Exercise
Consider a random variable which is obtained by making a selection from the list

[0.245, 0.874, 0.998, 0.567, 0.482]

uniformly at random. Make a rough estimate of the mean and variance of this random variable just from looking at the number line. Then use Python to calculate the mean and variance exactly to see how close your estimates were.

import numpy as np

Solution. My estimate of the mean and variance are and , respectively. The value 0.6 appears to be around the balance point of the points on the number line, and the squared deviation from 0.6 is very small for a couple of the points and as large as about 0.15 for others.

Calculating the mean exactly using m = mean([0.245, 0.874, 0.998, 0.567, 0.482]), we get a value of 0.6332. Calculating the variance exactly using mean([(a-m)^2 for a in A]) (where is the array above), we get a value of 0.074. Therefore, my estimate was a little high.

Exercise
Consider the following game. We begin by picking a number in with uniform probability. If that number is less than , we pick another number from the same distribution and add it to the first. We repeat this procedure until the running sum exceeds . Let be the random variable whose value is the number of draws needed to end the game. Use a simulation to approximate the expected value and variance of . Include your code in your answer as well as some discussion of your results.

Tips: rand(0:1000)/1000 returns a sample from the desired distribution. Also, it's a good idea to wrap a single run of the game into a zero-argument function.

import numpy as np

Solution. We define a function run which plays the game once, and we record the result of the game over a million runs. We estimate the mean as the mean of the resulting list, and we estimate the variance using

import numpy as np
def runs_till_over():
    s = 0
    ctr = 0
    while s < 1.0:
        s += np.random.randint(0,1001)/1000
    return ctr

A = [runs_till_over() for _ in range(1_000_000)]
μ = np.mean(A)
var = np.mean((a-μ)**2 for a in A)
μ,var

function runs_till_over()
    s = 0
    ctr = 0
    while s < 1.0
        s += rand(1:1000)/1000
    end
    ctr
end

A = [runs_till_over() for _ in 1:1_000_000]
μ = mean(A)
var = mean((a-μ)^2 for a in A)
μ,var

We get a mean of about , and a variance of about .

We can use linearity of expectation to rewrite the formula for variance in a simpler form:

We can use this formula to show how variance interacts with linear operations:

Exercise
Show that variance satisfies the properties

if is a real number and is a random variable, and if and are independent random variables, respectively.

Proof. The first part of the statement follows easily from linearity of expectation

Since by linearity, we have

Rearranging and using linearity of expectation, we get

The desired result follows because if and are independent, by the independence product formula.

Exercise
Consider the distribution which assigns a probability mass of to each integer point , where is equal to the reciprocal of .

Show that this distribution has a finite mean but not a finite variance.

Solution. Let be a random variable with this distribution. Then

Since the sum on the right converges by the -test, it follows that is finite. On the other hand,

does not converge because of the harmonic series term. Therefore is infinite.

Change Language

Sign in to Mathigon

Share

Reset Progress

Glossary

ProbabilityExpectation and Variance

Expectation and distribution

Variance