


Just as mean and variance are summary statistics for the distribution of a single random variable, covariance is a useful summary of how two random variables X and Y are jointly distributed.

The covariance of two random variables X and Y is defined to be the expected product of their deviations from their respective means:

\begin{align*}\operatorname{Cov}(X,Y) = \mathbb{E}[ (X - \mathbb{E}[X]) (Y - \mathbb{E}[Y])].\end{align*}

The covariance of two independent random variables is zero. In that case, the expectation of the product on the right-hand side equals the product of the expectations \mathbb{E}[X - \mathbb{E}[X]]\,\mathbb{E}[Y - \mathbb{E}[Y]], and each factor is zero. Roughly speaking, if X and Y tend to deviate from their means in the same direction (both above or both below), then their covariance is positive. If they tend to deviate in opposite directions (that is, X is above its mean when Y is below its mean, or vice versa), then their covariance is negative.
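The definition can be evaluated directly for a finite joint distribution. The sketch below assumes the joint distribution is stored as a Python dict mapping each point (x, y) to its probability; this representation and the name `covariance` are illustrative choices, not part of the lesson. Exact `Fraction` arithmetic avoids rounding noise, and the three small examples show positive, negative, and zero covariance.

```python
from fractions import Fraction

def covariance(pmf):
    """Cov(X, Y) for a finite joint pmf given as {(x, y): probability}."""
    ex = sum(p * x for (x, y), p in pmf.items())
    ey = sum(p * y for (x, y), p in pmf.items())
    # Expected product of the deviations of X and Y from their means.
    return sum(p * (x - ex) * (y - ey) for (x, y), p in pmf.items())

# Mass at (0, 0) and (1, 1): X and Y deviate from their means together.
together = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}
# Mass at (0, 1) and (1, 0): X and Y deviate in opposite directions.
opposite = {(0, 1): Fraction(1, 2), (1, 0): Fraction(1, 2)}
# Independent fair coins: the joint pmf is the product of the marginals.
independent = {(x, y): Fraction(1, 4) for x in (0, 1) for y in (0, 1)}

print(covariance(together), covariance(opposite), covariance(independent))  # 1/4 -1/4 0
```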

Identify each of the following joint distributions as representing positive covariance, zero covariance, or negative covariance. The size of a dot at (x,y) represents the probability that X = x and Y = y.

Solution. The first graph shows negative covariance, since X-\mathbb{E}[X] and Y - \mathbb{E}[Y] have opposite signs for the top-left mass and for the bottom-right mass, and the other two points contribute less since they are close to the mean (\mathbb{E}[X],\mathbb{E}[Y]).

The second graph shows positive covariance, since the top right and bottom left points contribute positively, and the middle point contributes much less.

The third graph shows zero covariance, since the points contribute to the sum defining \mathbb{E}[(X - \mathbb{E}[X])(Y-\mathbb{E}[Y])] in two cancelling pairs.

Does \operatorname{Cov}(X,Y) = 0 imply that X and Y are independent?

Hint: consider the previous exercise. Alternatively, consider a random variable X which is uniformly distributed on \{1,2,3\} and an independent random variable Z which is uniformly distributed on \{-1,1\}. Set Y = ZX. Consider the pair (X,Y).

Solution. No. The third example in the previous exercise shows a pair of random variables that are not independent but have zero covariance.

Alternatively, the suggested random variables X and Y have zero covariance, but they are not independent since, for example, \mathbb{P}(X = 2 \text{ and } Y = 1) = 0 even though \mathbb{P}(X = 2) and \mathbb{P}(Y = 1) are both positive.
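The hinted example can be checked by exact enumeration. This sketch builds the joint pmf of (X, Y) with Y = ZX from the six equally likely pairs (x, z), then compares the covariance with the independence test at the point (2, 1). The dict-based pmf is an illustrative representation, not something the lesson prescribes.

```python
from fractions import Fraction
from itertools import product

# Joint pmf of (X, Y): X uniform on {1, 2, 3}, Z uniform on {-1, 1}
# independent of X, and Y = Z * X. Each pair (x, z) has probability 1/6.
pmf = {}
for x, z in product((1, 2, 3), (-1, 1)):
    pmf[(x, z * x)] = pmf.get((x, z * x), 0) + Fraction(1, 6)

ex = sum(p * x for (x, y), p in pmf.items())
ey = sum(p * y for (x, y), p in pmf.items())
cov = sum(p * (x - ex) * (y - ey) for (x, y), p in pmf.items())

# Independence would require P(X = 2 and Y = 1) = P(X = 2) P(Y = 1).
p_x2 = sum(p for (x, y), p in pmf.items() if x == 2)
p_y1 = sum(p for (x, y), p in pmf.items() if y == 1)
p_joint = pmf.get((2, 1), 0)

print(cov)                   # 0
print(p_joint, p_x2 * p_y1)  # 0 1/18
```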

The correlation of two random variables X and Y is defined to be their covariance normalized by the product of their standard deviations:

\begin{align*}\operatorname{Corr}(X,Y) = \frac{\operatorname{Cov}(X,Y)}{\sigma(X)\sigma(Y)}\end{align*}
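As a small illustration, the sketch below computes Corr(X, Y) for a finite joint pmf, again assuming a dict {(x, y): probability} representation and a hypothetical helper name `correlation`. For Y an increasing linear function of X the result should be 1, and for a decreasing one it should be -1, up to floating-point rounding from the square root.

```python
import math
from fractions import Fraction

def correlation(pmf):
    """Corr(X, Y) = Cov(X, Y) / (sigma(X) sigma(Y)) for a finite joint pmf {(x, y): p}."""
    ex = sum(p * x for (x, y), p in pmf.items())
    ey = sum(p * y for (x, y), p in pmf.items())
    cov = sum(p * (x - ex) * (y - ey) for (x, y), p in pmf.items())
    var_x = sum(p * (x - ex) ** 2 for (x, y), p in pmf.items())
    var_y = sum(p * (y - ey) ** 2 for (x, y), p in pmf.items())
    # sigma(X) sigma(Y) = sqrt(Var X * Var Y); the result is a float.
    return cov / math.sqrt(var_x * var_y)

# X uniform on {1, 2, 3}; Y = 2X + 1 increases with X, Y = -X decreases with X.
increasing = {(x, 2 * x + 1): Fraction(1, 3) for x in (1, 2, 3)}
decreasing = {(x, -x): Fraction(1, 3) for x in (1, 2, 3)}

print(correlation(increasing), correlation(decreasing))  # approximately 1 and -1
```

The covariance of `increasing` is 4/3, which depends on the scale of Y; dividing by the standard deviations removes that dependence.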

In this problem, we will show that the correlation of two random variables is always between -1 and 1. Let \mu_X = \mathbb{E}[X], and let \mu_Y = \mathbb{E}[Y].

Consider the following quadratic polynomial in t:

\begin{align*}\mathbb{E}[((X - \mu_X) + (Y - \mu_Y) t)^2] &= \mathbb{E}[(X-\mu_X)^2] + 2t\,\mathbb{E}[(X-\mu_X)(Y-\mu_Y)] + t^2\, \mathbb{E}[(Y-\mu_Y)^2],\end{align*}

where t is a variable. Explain why this polynomial is nonnegative for all t \in \mathbb{R}.

Recall that a quadratic polynomial at^2 + bt + c with a > 0 is nonnegative for all t if and only if its discriminant b^2 - 4ac is nonpositive (this follows from the quadratic formula). Use this fact to show that

\begin{align*}\mathbb{E}[(X-\mu_X)(Y-\mu_Y)]^2 \leq \operatorname{Var} X \operatorname{Var} Y.\end{align*}

Conclude that -1 \leq \operatorname{Corr}(X,Y) \leq 1.


Solution. The polynomial is nonnegative for every t because the left-hand side of the given equation is the expectation of the square ((X - \mu_X) + (Y - \mu_Y) t)^2, which is a nonnegative random variable.

Setting a = \mathbb{E}[(Y-\mu_Y)^2], b = 2\mathbb{E}[(X-\mu_X)(Y-\mu_Y)], and c = \mathbb{E}[(X-\mu_X)^2], the inequality b^2 - 4ac \leq 0 becomes

\begin{align*}4\mathbb{E}[(X-\mu_X)(Y-\mu_Y)]^2 - 4 \mathbb{E}[(X-\mu_X)^2]\mathbb{E}[(Y-\mu_Y)^2] \leq 0,\end{align*}

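which, after dividing by 4 and using \mathbb{E}[(X-\mu_X)^2] = \operatorname{Var} X and \mathbb{E}[(Y-\mu_Y)^2] = \operatorname{Var} Y, is the desired inequality.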

Since \mathbb{E}[(X-\mu_X)(Y-\mu_Y)] = \operatorname{Cov}(X,Y), dividing both sides of the preceding inequality by \operatorname{Var} X \operatorname{Var} Y (which is positive whenever the correlation is defined) and taking square roots gives |\operatorname{Corr}(X,Y)| \leq 1, that is, -1 \leq \operatorname{Corr}(X,Y) \leq 1.
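A numerical sanity check of this argument, not a proof: the sketch draws many random finite joint distributions (a hypothetical test setup with 5 random integer points and random weights, seeded for reproducibility), computes the coefficients a, b, c of the quadratic in t, and confirms that b^2 - 4ac \leq 0 and |\operatorname{Corr}(X,Y)| \leq 1 up to floating-point rounding.

```python
import math
import random

def quadratic_coefficients(pmf):
    """Coefficients of E[((X - mu_X) + (Y - mu_Y) t)^2] = a t^2 + b t + c
    for a finite joint pmf {(x, y): p}: a = Var Y, b = 2 Cov(X, Y), c = Var X."""
    mx = sum(p * x for (x, y), p in pmf.items())
    my = sum(p * y for (x, y), p in pmf.items())
    a = sum(p * (y - my) ** 2 for (x, y), p in pmf.items())
    b = 2 * sum(p * (x - mx) * (y - my) for (x, y), p in pmf.items())
    c = sum(p * (x - mx) ** 2 for (x, y), p in pmf.items())
    return a, b, c

rng = random.Random(0)  # fixed seed so the check is reproducible
worst = 0.0
for _ in range(5000):
    # A random joint pmf: 5 random integer points with normalized random weights.
    points = [(rng.randint(-5, 5), rng.randint(-5, 5)) for _ in range(5)]
    weights = [rng.random() for _ in points]
    total = sum(weights)
    pmf = {}
    for pt, w in zip(points, weights):
        pmf[pt] = pmf.get(pt, 0.0) + w / total
    a, b, c = quadratic_coefficients(pmf)
    if a < 1e-12 or c < 1e-12:
        continue  # Corr(X, Y) is undefined when a variance is zero
    # Nonnegative quadratic implies discriminant <= 0 (allowing for float rounding).
    assert b * b - 4 * a * c <= 1e-9 * a * c
    # |Corr(X, Y)| = |Cov(X, Y)| / sqrt(Var X * Var Y) = |b / 2| / sqrt(a * c).
    worst = max(worst, abs(b / 2) / math.sqrt(a * c))

print(worst)  # largest |Corr(X, Y)| observed
```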

Show that

\begin{align*}\operatorname{Var}(X_{1}+X_{2}+\cdots+X_{n}) = \sum_{k=1}^n \operatorname{Var} X_k\end{align*}

if X_1, \ldots, X_n are independent random variables.

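Solution. Since \operatorname{Var}(X_1+\cdots+X_n) = \mathbb{E}[(X_1+\cdots+X_n)^2] - \mathbb{E}[X_1+\cdots+X_n]^2, we compute the two terms separately. Expanding the square and using linearity of expectation, the expectation of (X_1+\cdots+X_n)^2 is the sum of the values in this table:

\begin{align*}\begin{array}{c|cccc} & X_1 & X_2 & \cdots & X_n \\ \hline X_1 & \mathbb{E}[X_1^2] & \mathbb{E}[X_1X_2] & \cdots & \mathbb{E}[X_1X_n] \\ X_2 & \mathbb{E}[X_2X_1] & \mathbb{E}[X_2^2] & \cdots & \mathbb{E}[X_2X_n] \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ X_n & \mathbb{E}[X_nX_1] & \mathbb{E}[X_nX_2] & \cdots & \mathbb{E}[X_n^2] \end{array}\end{align*}
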
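By linearity of expectation, the square of the expectation of X_1+\cdots+X_n is the sum of the values in this table:

\begin{align*}\begin{array}{c|cccc} & \mathbb{E}[X_1] & \mathbb{E}[X_2] & \cdots & \mathbb{E}[X_n] \\ \hline \mathbb{E}[X_1] & \mathbb{E}[X_1]^2 & \mathbb{E}[X_1]\mathbb{E}[X_2] & \cdots & \mathbb{E}[X_1]\mathbb{E}[X_n] \\ \mathbb{E}[X_2] & \mathbb{E}[X_2]\mathbb{E}[X_1] & \mathbb{E}[X_2]^2 & \cdots & \mathbb{E}[X_2]\mathbb{E}[X_n] \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \mathbb{E}[X_n] & \mathbb{E}[X_n]\mathbb{E}[X_1] & \mathbb{E}[X_n]\mathbb{E}[X_2] & \cdots & \mathbb{E}[X_n]^2 \end{array}\end{align*}
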
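Subtracting the second table from the first entry by entry, each diagonal entry becomes \mathbb{E}[X_k^2] - \mathbb{E}[X_k]^2 = \operatorname{Var} X_k, and each off-diagonal entry becomes \mathbb{E}[X_iX_j] - \mathbb{E}[X_i]\mathbb{E}[X_j] = 0 by the product formula for independent random variables. Summing the remaining entries gives \sum_{k=1}^n \operatorname{Var} X_k.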
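The identity can be checked exactly on a small example. The three distributions below (a fair coin, a fair die, and a two-point variable) are arbitrary illustrative choices; under independence the probability of each combined outcome is the product of the marginal probabilities, which is how the distribution of the sum is built.

```python
from fractions import Fraction
from itertools import product

def mean(dist):
    """Mean of a finite distribution given as {value: probability}."""
    return sum(p * x for x, p in dist.items())

def variance(dist):
    m = mean(dist)
    return sum(p * (x - m) ** 2 for x, p in dist.items())

# Three independent random variables with (arbitrarily chosen) finite distributions.
dists = [
    {0: Fraction(1, 2), 1: Fraction(1, 2)},        # fair coin
    {k: Fraction(1, 6) for k in range(1, 7)},      # fair die
    {-1: Fraction(1, 4), 2: Fraction(3, 4)},       # two-point variable
]

# Distribution of X_1 + X_2 + X_3: under independence, multiply marginal probabilities.
sum_dist = {}
for outcome in product(*(d.items() for d in dists)):
    total = sum(x for x, _ in outcome)
    prob = 1
    for _, p in outcome:
        prob *= p
    sum_dist[total] = sum_dist.get(total, 0) + prob

print(variance(sum_dist), sum(variance(d) for d in dists))  # 233/48 233/48
```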

Exercise (Mean and variance of the sample mean)
Suppose that X_1, \ldots, X_n are independent random variables with the same distribution. Find the mean and variance of

\begin{align*}\frac{X_1 + \cdots + X_n}{n}\end{align*}

Solution. By linearity of expectation, and since the X_k all have the same distribution (hence the same mean), we have

\begin{align*}\mathbb{E}\left[\frac{X_1 + X_2 + \cdots + X_n}{n}\right] &= \frac{\mathbb{E}[X_1] + \mathbb{E}[X_2]+ \cdots + \mathbb{E}[X_n]}{n} \\ &= \mathbb{E}[X_1].\end{align*}

Because X_1, \ldots, X_n are independent, so are X_1/n, \ldots, X_n/n. Applying the previous exercise and the identity \operatorname{Var}(X/n) = \operatorname{Var}(X)/n^2, and using the fact that the X_k all have the same variance, we get

\begin{align*}\operatorname{Var}\left(\frac{X_1+X_2 + \cdots + X_n}{n}\right) &= \sum_{k = 1}^{n}\operatorname{Var}\left(\frac{X_k}{n}\right) \\ &= \sum_{k = 1}^{n} \frac{1}{n^2} \operatorname{Var}(X_k) \\ &= \frac{\operatorname{Var}(X_1)}{n}.\end{align*}
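To see both formulas on a concrete case, the sketch below assumes the X_k are rolls of a fair six-sided die (an illustrative choice) with n = 3, enumerates all 6^3 equally likely outcomes, and computes the exact distribution of the sample mean. Its mean should be \mathbb{E}[X_1] = 7/2 and its variance \operatorname{Var}(X_1)/3 = (35/12)/3 = 35/36.

```python
from fractions import Fraction
from itertools import product

die = {k: Fraction(1, 6) for k in range(1, 7)}  # fair six-sided die
n = 3                                           # number of i.i.d. rolls

def mean(dist):
    """Mean of a finite distribution given as {value: probability}."""
    return sum(p * x for x, p in dist.items())

def variance(dist):
    m = mean(dist)
    return sum(p * (x - m) ** 2 for x, p in dist.items())

# Exact distribution of (X_1 + ... + X_n) / n for independent rolls.
sample_mean = {}
for rolls in product(die, repeat=n):
    prob = Fraction(1)
    for r in rolls:
        prob *= die[r]
    value = Fraction(sum(rolls), n)
    sample_mean[value] = sample_mean.get(value, 0) + prob

print(mean(sample_mean), variance(sample_mean))  # 7/2 35/36
```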

The covariance matrix of a vector \mathbf{X} = [X_1, \ldots, X_n] of random variables defined on the same probability space is defined to be the matrix \Sigma whose (i,j)th entry is \operatorname{Cov}(X_i,X_j).

Show that \Sigma = \mathbb{E}[\mathbf{X} \mathbf{X}'] if all of the random variables X_1, \ldots, X_n have mean zero. (Note: expectation operates on a matrix or vector of random variables entry-by-entry.)

Solution. Treating \mathbf{X} as a column vector, the definition of matrix multiplication implies that the (i,j)th entry of \mathbf{X} \mathbf{X}' is X_i X_j. Therefore, the (i,j)th entry of \mathbb{E}[\mathbf{X} \mathbf{X}'] is \mathbb{E}[X_iX_j], which equals \operatorname{Cov}(X_i,X_j) = \mathbb{E}[X_iX_j] - \mathbb{E}[X_i]\mathbb{E}[X_j] since the random variables have mean zero.
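A small exact check of this fact, under illustrative assumptions: the hypothetical random vector below takes four values with probability 1/4 each, and every outcome appears together with its negative, so each component has mean zero. The helpers `cov_matrix` and `second_moment_matrix` are names chosen here for the sketch. Shifting X_1 by 1 leaves \Sigma unchanged but changes \mathbb{E}[\mathbf{X}\mathbf{X}'], showing that the mean-zero hypothesis is needed.

```python
from fractions import Fraction

def cov_matrix(outcomes):
    """Sigma[i][j] = Cov(X_i, X_j), computed from the definition."""
    n = len(outcomes[0][0])
    means = [sum(p * v[i] for v, p in outcomes) for i in range(n)]
    return [[sum(p * (v[i] - means[i]) * (v[j] - means[j]) for v, p in outcomes)
             for j in range(n)] for i in range(n)]

def second_moment_matrix(outcomes):
    """E[X X'], computed entry by entry: the (i, j) entry of X X' is X_i X_j."""
    n = len(outcomes[0][0])
    return [[sum(p * v[i] * v[j] for v, p in outcomes) for j in range(n)]
            for i in range(n)]

# Hypothetical random vector X = [X_1, X_2, X_3] as (outcome, probability) pairs.
# Each outcome v appears together with -v, so every component has mean zero.
outcomes = [
    ((1, 2, 0), Fraction(1, 4)),
    ((-1, -2, 0), Fraction(1, 4)),
    ((1, -1, 3), Fraction(1, 4)),
    ((-1, 1, -3), Fraction(1, 4)),
]
print(cov_matrix(outcomes) == second_moment_matrix(outcomes))  # True

# Shifting X_1 by 1 keeps Sigma the same but changes E[X X'].
shifted = [((v[0] + 1, v[1], v[2]), p) for v, p in outcomes]
print(cov_matrix(shifted) == second_moment_matrix(shifted))  # False
```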
