Select one of the keywords on the left…

Linear AlgebraEigenanalysis

Reading time: ~50 min

In this section we will see how we can better understand a linear transformation by breaking it down into linear transformations.

Let T be a linear transformation from \mathbb{R}^n to \mathbb{R}^n. Suppose that \mathcal{B} is a basis of \mathbb{R}^n, that V is the span of some of the vectors in \mathcal{B}, and that W is the span of the remaining vectors in \mathcal{B}. Then any vector in \mathbb{R}^n can be written as the sum of a vector \mathbf{v} in V and a vector \mathbf{w} in W. Since T(\mathbf{v} + \mathbf{w}) = T(\mathbf{v}) + T(\mathbf{w}), we can see how T behaves on all of \mathbb{R}^n if we know how it behaves on V and on W. This decomposition is particularly helpful if V and W are chosen so that T behaves in a simple way on V and on W.

Given such a decomposition of \mathbb{R}^n into the vector spaces V and W, we can apply the same idea to split V and W into lower-dimensional vector spaces and repeat until no more splits are possible. The most optimistic outcome of this procedure would be that we get all the way down to n one-dimensional subspaces and that T acts on each of these subspaces by simply scaling each vector in that subspace by some factor. In other words, we would like to find n vectors \mathbf{v} for which T(\mathbf{v}) is a scalar multiple of \mathbf{v}. This leads us to the following definition.

An eigenvector \mathbf{v} of an n\times n matrix A is a nonzero vector with the property that A\mathbf{v} = \lambda \mathbf{v} for some \lambda \in \mathbb{R} (in other words, A maps \mathbf{v} to a vector which is either zero or parallel to \mathbf{v}). We call \lambda an eigenvalue of A, and we call the eigenvector together with its eigenvalue an eigenpair.

Every nonzero vector is an eigenvector (with eigenvalue ) of the identity matrix.

Find a matrix with eigenpairs ([1,0],2) and ([1,1],3). Sketch the images of some gridlines under multiplication by this matrix to show how it scales space along the lines through its eigenvectors. Verbally describe your results below.

Solution. Writing out the equations implied by the given eigenpair relations, we see that the first implies that the first column of the matrix is [2,0], and the second (together with the first) implies that the second column of the matrix is [1,3].

The following gridline images show how the transformation distorts space. Equally spaced points which are separated in the east-west direction get spread out by a factor of 2, while the diagonal line gets stretched out by a factor of 3. Since 3 > 2, this introduces a bottom-left-to-top-right tilt for the images of the vertical gridlines.


If \mathbf{v}_1, \dots, \mathbf{v}_n are eigenvectors of A with the same eigenvalue \lambda and \mathbf{v} = c_1\mathbf{v}_1 + \cdots + c_n\mathbf{v}_n for some weights c_1, \dots, c_n such that c_i \neq 0 for at least one i \in \{1, \dots, n\}, then \mathbf{v} is also an eigenvector of A with eigenvalue \lambda because

\begin{align*}A\mathbf{v} &= A(c_1\mathbf{v}_1 + \cdots + c_n\mathbf{v}_n) \\ &= c_1A\mathbf{v}_1 + \cdots + c_nA\mathbf{v}_n \\ &= c_1 \lambda \mathbf{v}_1 + \cdots c_n \lambda \mathbf{v}_n \\ &= \lambda (c_1\mathbf{v}_1 + \cdots c_n\mathbf{v}_n) \\ &= \lambda \mathbf{v}.\end{align*}

Therefore, the set of all eigenvectors corresponding to a particular eigenvalue form a . Such a space is called an eigenspace.

Let A be a 4 \times 4 matrix, with eigenvectors \begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \end{bmatrix} and \begin{bmatrix} 0 \\ 0 \\ 2 \\\ -3\end{bmatrix}, both with eigenvalue 3. Find A\left(\begin{bmatrix} 5 \\ 5 \\ 8 \\ -12 \end{bmatrix}\right).

Solution. Since \begin{bmatrix} 5 \\\ 5 \\\ 8 \\\ -12 \end{bmatrix} = 5 \begin{bmatrix} 1 \\\ 1 \\\ 0 \\\ 0 \end{bmatrix} + 4 \begin{bmatrix} 0 \\\ 0 \\\ 2 \\\ -3 \end{bmatrix}, we find that A\left( \begin{bmatrix} 5 \\\ 5 \\\ 8 \\\ -12 \end{bmatrix}\right) = 3 \begin{bmatrix} 5 \\\ 5 \\\ 8 \\\ -12 \end{bmatrix} = \begin{bmatrix} 15 \\\ 15 \\\ 24 \\\ -36 \end{bmatrix}.


Let V \subset \mathbb{R}^n be a subspace spanned by the eigenvectors of a matrix A. If \mathbf{v} \in V, which of the following are necessarily true?

A\mathbf{v} \in V.
A\mathbf{v} is orthogonal to every vector in V.
A\mathbf{v} and \mathbf{v} are always linearly dependent.

Solution. Let \mathbf{a}_1, \dots, \mathbf{a}_k be the eigenvectors of A that span V and let \lambda_1, \dots, \lambda_k be the corresponding eigenvalues. Then \mathbf{v} \in V admits a representation \mathbf{v} = v_1\mathbf{a}_1 + \cdots + v_k\mathbf{a}_k. Since

\begin{align*}A\mathbf{v} &= v_1 A\mathbf{a}_1 + \cdots + v_k A \mathbf{a}_k \\ &= v_1 \lambda_1 \mathbf{a}_1 + \cdots + v_k \lambda_k \mathbf{a}_k,\end{align*}

we see that A\mathbf{v} is also in V. This means (2) is not true in general. Option (3) need not always hold. For instance, it fails to hold if \mathbf{v} = \mathbf{a}_1 + \mathbf{a}_2 and \lambda_1 and \lambda_2 are both non-zero and not equal. Therefore the only true statement is (1) A\mathbf{v} \in V.

Suppose A is a matrix with a eigenvector \mathbf{v} whose eigenvalue is 2 and an eigenvector \mathbf{w} whose eigenvalue is 2. Let \mathbf{u = v+w}. Explain why

\begin{align*}\lim_{n \to \infty}\frac{|A^{n} \mathbf{u}|^2}{|A^{n}\mathbf{v}|^2} = 1\end{align*}

Solution. Let n \geq 1 be an integer. Then by the eigenvalue equation, we have

\begin{align*}A^n\mathbf{u} &= A^n (\mathbf{v} + \mathbf{w}) \\ &= A^n \mathbf{v} + A^n \mathbf{w} \\ &= 3^n\mathbf{u} + 2^n \mathbf{w}.\end{align*}

When n is large, the first term is much larger than the second. Writing the squared norm as a dot product and distributing, we get

\begin{align*}|3^n\mathbf{u} + 2^n \mathbf{w}|^2 = | 3^n\mathbf{v} |^2 + 2 (3^n\mathbf{v}) \cdot (2^n\mathbf{w}) + |2^n\mathbf{w}|^2.\end{align*}


\begin{align*}\frac{|A^n\mathbf{u}|^2}{| A^n \mathbf{v}|^2} &= \frac{|3^n\mathbf{u} + 2^n \mathbf{w}|^2}{| A^n \mathbf{v}|^2} \\ &= \frac{| 3^n\mathbf{v} |^2 + 2 (3^n\mathbf{v}) \cdot (2^n\mathbf{w}) + |2^n\mathbf{w}|^2}{|3^n\mathbf{v} |^2} \\ & = \frac{3^{2n} |\mathbf{v}|^2 + 2 \left(3 \cdot 2\right)^n \mathbf{v} \cdot \mathbf{w} + 2^{2n} |\mathbf{w}|^2}{3^{2n}| \mathbf{v}|^2} \\ &= 1 + 2 \left(\frac{3 \cdot 2}{3^2}\right)^n \cdot \left(\frac{\mathbf{v} \cdot \mathbf{w}}{|\mathbf{v}|^2}\right) + \left(\frac{2}{3}\right)^{2n} \cdot \left(\frac{| \mathbf{w} |^2}{|\mathbf{v}|^2}\right).\end{align*}

Since \lim\limits_{n \to \infty} \left(\frac{3 \cdot 2}{3^2}\right)^n = \lim\limits_{n \to \infty} \left(\frac{2}{3}\right)^{2n} = 0, we find that \lim\limits_{n \to \infty} \frac{|A^n\mathbf{u}|^2}{| A^n \mathbf{v}|^2} = 1.


If an n\times n matrix A has n linearly independent eigenvectors, then we can think of the one-dimensional subspaces spanned by each of these vectors as (not necessarily orthogonal) axes along which A acts by scaling.

In matrix terms, we can define V to be the matrix with the eigenvectors of A as columns. Then from the definition of an eigenpair, we have

\begin{align*}AV = V \Lambda,\end{align*}

where \Lambda is a matrix whose diagonal entries are the eigenvalues (in order corresponding to the columns of V) and whose other entries are zero. By the invertible matrix theorem, the assumption about V's columns being linearly independent implies that V is invertible, so we find that A = V \Lambda V^{-1}, where \Lambda is a diagonal matrix, and we say that A is diagonalizable.

Some matrices are not diagonalizable, because they correspond to geometric transformations that cannot be viewed as scaling along any set of axes. Use this geometric intuition to come up with a 2 \times 2 matrix which is not diagonalizable.

Solution. Rotation matrices in \mathbb{R}^2(except for 0 degree rotations and 180-degree rotations) are not diagonalizable. For example, the 90-degree rotation matrix

\begin{align*}A = \begin{bmatrix} 0 & -1 \\\ 1 & 0 \end{bmatrix}\end{align*}

does not send any nonzero vector \vec{v} \in \mathbb{R}^2 to a multiple of itself.

Suppose that we have diagonalized A as A = VDV^{-1}. Then A^{3} is equal to

Let B be another matrix, with eigenpairs (\mathbf{v}_1,3) and (\mathbf{v}_{2},-2). Let \mathbf{u} = 2\mathbf{v}_{1} + \mathbf{v}_{2}. Which of the following is equal to B^{n}(\mathbf{u})?

2(3)^{n}\mathbf{v}_1 + (-2)^{n}\mathbf{v}_2.
(2(3) - 1)^{n}\mathbf{u}.
(2(3)^{n} - 1)\mathbf{u}.
None of the above.

Solution. We have

\begin{align*}A^2 = VDV^{-1} VDV^{-1} = V D^2 V^{-1}\end{align*}

because V^{-1} V = I is the identity matrix. Similarly, A^3 = V D^3 V^{-1}.

By linearity B^n(\mathbf{u}) = 2B^n\mathbf{v}_1 + B^n \mathbf{v}_2. But B^n(\mathbf{v}_1) = 3^n\mathbf{v}_1 and B^n(\mathbf{v}_2) = (-2)^n\mathbf{v}_2 because \mathbf{v}_1 and \mathbf{v}_2 are eigenvectors of B. Therefore B^n(\mathbf{u}) = 2(3)^n\mathbf{v}_1 + (-2)^n\mathbf{v}_2.

Positive definite matrices

A positive definite matrix A is a symmetric matrix whose eigenvalues are all positive. A positive semidefinite matrix A is a symmetric matrix whose eigenvalues are all nonnegative. Equivalently, a symmetric matrix A is positive semidefinite if \mathbf{x}' A \mathbf{x} \ge 0 for all \mathbf{x}.

Negative definite and negative semidefinite matrices are defined analogously.

(i) Is the sum of two positive definite matrices necessarily positive definite?

Solution. (i) If A and B are n \times n positive definite matrices, then A + B is also positive definite because

\begin{align*}\mathbf{x}' (A + B) \mathbf{x} &= \mathbf{x}' (A\mathbf{x} + B\mathbf{x}) \\ &= \mathbf{x}' A\mathbf{x} + \mathbf{x}' B\mathbf{x}\end{align*}

for any vector \mathbf{x} \in \mathbb{R}^n.

The Gram matrix

If A is an m\times n matrix, then A' A is its Gram matrix. The Gram matrix of A is always positive semidefinite:

Let X = A' A be a Gram matrix, and let \mathbf{v} be a vector. Which of the following is equal to \mathbf{v}' X\mathbf{v}?

\mathbf{v}' A^2\mathbf{v}.

Using your answer above, explain why a Gram matrix is always positive semidefinite, but not necessarily positive definite.

Solution. The correct answer is |A\mathbf{v}|^2 because

\begin{align*}|A\mathbf{v} |^2 &= (A\mathbf{v}) \cdot (A\mathbf{v}) \\\ &= (A\mathbf{v})' A\mathbf{v} \\\ &= \mathbf{v}' A' A\mathbf{v}.\end{align*}

From this we see that the Gram matrix is positive semidefinite because |A\mathbf{v}|^2 \geq 0. Since it is possible to have A\mathbf{v} = \mathbf{0} even if \mathbf{v} \neq \mathbf{0} (for example when A has linearly dependent columns), we see that the Gram matrix is not necessarily positive definite.

Explain why the rank of A is equal to the rank of A'A. (Hint: consider the null spaces of A and A'A.)

Solution. If A\mathbf{x} = \boldsymbol{0}, then multiplying both sides by A' gives A' A \mathbf{x} = \boldsymbol{0}. Therefore, the null space of A is a subset of the null space of A' A.

Conversely, if A' A \mathbf{x} = \boldsymbol{0}, then we can multiply this equation on the left by \mathbf{x}' to get

\begin{align*}\mathbf{x}' A' A \mathbf{x} = \boldsymbol{0},\end{align*}

which in turn implies that |A\mathbf{x}|^2 = \boldsymbol{0}. A vector has zero norm only if it's the zero vector, so we conclude that A\mathbf{x} = \boldsymbol{0}.

Since A and A' A have the same null space dimension and have the same domain (\mathbb{R}^n), they also have the same rank, by the rank-nullity theorem.

The Spectral Theorem

The eigenspace decomposition of a diagonalizable matrix is even easier to understand if the eigenvectors happen to be orthogonal. It turns out that this happens exactly when the matrix is symmetric:

Theorem (Spectral Theorem)
If A is an n\times n symmetric matrix, then A is orthogonally diagonalizable, meaning that A has n eigenvectors which are pairwise orthogonal.

Conversely, every orthogonally diagonalizable matrix is symmetric.

In other words, if A is symmetric, then the one-dimensional subspaces along which A is decomposed form a set of axes for \mathbb{R}^n which are orthogonal. In matrix terms, we have

\begin{align*}A = V \Lambda V',\end{align*}

for some orthogonal matrix V.

Although it seems that the spectral theorem may be of limited use since so many matrices are not symmetric, we will see that we can associate any rectangular matrix with a symmetric square matrix that we can apply the spectral theorem to and use to extract insight about the original matrix. This beautiful idea is called the singular value decomposition and is the subject of the next section.

Given an invertible matrix A, we are often interested in solving a system of the form A \mathbf{x} = \mathbf{b}. Our knowledge of \mathbf{b} is seldom perfect however, so it is important to consider what happens to the solution if we replace \mathbf{b} with a slightly different vector \widehat{\mathbf{b}}.

It is possible that a small change in \mathbf{b} leads to a substantial change in the vector \mathbf{x}=A^{-1}\mathbf{b}.

  • Find an invertible 2\times 2 matrix A all of whose entries are between -2 and 2 and a vector \mathbf{b} with entries between -2 and 2 and another vector \widehat{\mathbf{b}} whose components are nearly equal to those of \mathbf{b} for which A^{-1}\mathbf{b} and A^{-1}\widehat{\mathbf{b}} are not very close.

To be concrete, let's say "nearly equal" means "having ratio between 0.99 and 1.01", and let's say that "not very close" means "having a difference whose norm is greater than the norm of either". Find the eigenvalues of your matrix A.

Solution. One simple way to do this is make \mathbf{b} and \widehat{\mathbf{b}} the columns of the matrix. For example, solve(array([[1,1],[1, 1.01]]),[1,1]) returns [1,0] while solve(array([[1,1],[1, 1.01]]),[1,1.01]) returns [0,1].

Solution. One simple way to do this is make \mathbf{b} and \widehat{\mathbf{b}} the columns of the matrix. For example, [1 1; 1 1.01]  [1, 1] returns [1, 0], while [1 1; 1 1.01]  [1, 1.01] returns [0, 1].

from numpy.linalg import solve
from numpy import array
[1 1; 1 1.01] \ [1, 1]

The eigenvalues of the matrix [1 1; 1 1.01] are approximately 0.005 and 2.005. In particular, the ratio of the eigenvalues is very large. You will find that the ratio of eigenvalues for your matrix is also large, because a matrix A with a modest maximum eigenvalue ratio is backwards stable, meaning that small changes in \mathbf{b} do not lead to large changes in A^{-1}\mathbf{b},

Bruno Bruno