Singular Value Decomposition
In this section we will develop one of the most powerful ideas in linear algebra: the singular value decomposition. The first step on this journey is the polar decomposition.
Polar decomposition
The Gram matrix A' A of a square matrix A is a useful tool for understanding the behavior of A. Let's define the matrix \sqrt{A' A} to be V\sqrt{\Lambda}V', where V \Lambda V' is the diagonalization of A' A and \sqrt{\Lambda} is the matrix obtained by taking the square root of the diagonal entries of \Lambda. Then \sqrt{A' A} is symmetric and positive semidefinite, and it satisfies

\begin{align*}\sqrt{A' A}\,\sqrt{A' A} = V\sqrt{\Lambda}V'V\sqrt{\Lambda}V' = V\Lambda V' = A' A,\end{align*}

as befits the notation \sqrt{A' A}.
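As a quick numerical illustration (a minimal NumPy sketch, not part of the original text, using the 3 \times 3 matrix that appears in the exercise below), we can build \sqrt{A' A} from the eigendecomposition of the Gram matrix and check that its square recovers A' A:

import numpy as np

A = np.array([[1.0, 3.0, 4.0],
              [7.0, -2.0, 5.0],
              [-3.0, 4.0, 11.0]])
gram = A.T @ A                                   # the Gram matrix A'A
lam, V = np.linalg.eigh(gram)                    # diagonalization: gram = V diag(lam) V'
sqrt_gram = V @ np.diag(np.sqrt(lam)) @ V.T      # sqrt(A'A) = V sqrt(Lambda) V'
print(np.allclose(sqrt_gram @ sqrt_gram, gram))  # True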
The matrix \sqrt{A' A} is simpler to understand than A because it is symmetric and positive semidefinite, yet it transforms space very similarly to A: if \mathbf{x} \in \mathbb{R}^n, then

\begin{align*}|A\mathbf{x}|^2 = (A\mathbf{x})'(A\mathbf{x}) = \mathbf{x}' A' A \mathbf{x} = \mathbf{x}'\sqrt{A' A}\,\sqrt{A' A}\,\mathbf{x} = |\sqrt{A' A}\,\mathbf{x}|^2.\end{align*}

In other words, for all \mathbf{x} \in \mathbb{R}^n, the images of \mathbf{x} under A and under \sqrt{A' A} have equal norm. If two points are the same distance from the origin, then one may be mapped to the other by a rotation about the origin.
Therefore, for each \mathbf{x} \in \mathbb{R}^n, there is an orthogonal transformation from the range of \sqrt{A' A} to the range of A which sends \sqrt{A' A}\,\mathbf{x} to A\mathbf{x}.
It can be shown that the orthogonal transformation mapping \sqrt{A' A}\,\mathbf{x} to A\mathbf{x} is the same transformation for every \mathbf{x}. Furthermore, even if the range of \sqrt{A' A} is not all of \mathbb{R}^n, we can extend this orthogonal transformation to an orthogonal transformation on \mathbb{R}^n by arbitrarily mapping vectors in a basis of the orthogonal complement of the range of \sqrt{A' A} to the orthogonal complement of the range of A. In this way, we obtain the polar decomposition:
Theorem (Polar Decomposition) For any square matrix A, there exists an orthogonal matrix R such that

\begin{align*}A = R\sqrt{A' A}.\end{align*}

This representation is useful because it represents an arbitrary square matrix as a product of matrices whose properties are easier to understand (the orthogonal matrix R because it is distance- and angle-preserving, and the positive semidefinite matrix \sqrt{A' A} because it is orthogonally diagonalizable, by the Spectral Theorem).
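Here is a minimal sketch (not part of the original text, assuming A is invertible and reusing the matrix above) that recovers the orthogonal factor directly as R = A\left(\sqrt{A' A}\right)^{-1} and checks the decomposition:

import numpy as np

A = np.array([[1.0, 3.0, 4.0],
              [7.0, -2.0, 5.0],
              [-3.0, 4.0, 11.0]])
lam, V = np.linalg.eigh(A.T @ A)
P = V @ np.diag(np.sqrt(lam)) @ V.T      # P = sqrt(A'A)
R = A @ np.linalg.inv(P)                 # orthogonal part (requires A invertible)
print(np.allclose(R.T @ R, np.eye(3)))   # R is orthogonal
print(np.allclose(A, R @ P))             # A = R sqrt(A'A)

The exercise below develops a faster iterative method that avoids forming the matrix square root explicitly.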
Exercise Let's explore a fast method of computing a polar decomposition A = R\sqrt{A' A}. This method actually works by calculating R and then recovering \sqrt{A' A} as R^{-1}A (since this is computationally faster than calculating the matrix square root). We call R the orthogonal part of A and \sqrt{A' A} the symmetric part of A.
We set A_0 = A and define the iteration

\begin{align*}A_{k+1} = \frac{A_k + (A_k')^{-1}}{2}.\end{align*}

Let's see why this converges to R.
Defining P = \sqrt{A' A} and using the equation A = RP, show that

\begin{align*}A_1 = \frac{A_0 + (A_0')^{-1}}{2} = R\,\frac{P + P^{-1}}{2}.\end{align*}

Use the prior step to explain why the A_k's all have the same orthogonal parts and have symmetric parts converging to the identity matrix. Hint: consider the eigendecompositions of the symmetric parts. You may assume that the sequence defined by x_{k+1} = \frac{1}{2}\left(x_k + \frac{1}{x_k}\right) converges to 1 regardless of the positive starting value x_0.
Write some code to apply this algorithm to the matrix

\begin{align*}A = \begin{bmatrix} 1 & 3 & 4 \\ 7 & -2 & 5 \\ -3 & 4 & 11 \end{bmatrix}\end{align*}
import numpy as np
A = np.array([[1,3,4],[7,-2,5],[-3,4,11]])
A = [1 3 4; 7 -2 5; -3 4 11]
and confirm that the resulting matrices R and P satisfy R' R = I and P^2 = A' A.
Solution. Since both transposition and inversion reverse the order of matrix products, we get

\begin{align*}(A_0')^{-1} = \left((RP)'\right)^{-1} = (P' R')^{-1} = (R')^{-1}(P')^{-1}.\end{align*}

Since R is orthogonal, (R')^{-1} = R, as R' R = I. Since P is symmetric, (P')^{-1} = P^{-1}. So this is equal to RP^{-1}, and therefore

\begin{align*}A_1 = \frac{A_0 + (A_0')^{-1}}{2} = \frac{RP + RP^{-1}}{2} = R\,\frac{P + P^{-1}}{2}.\end{align*}

We see that A_1 and A_0 have the same orthogonal part, and repeating the calculation shows that all the A_k have the same orthogonal part R. As for the symmetric parts P_k (where P_0 = P), we see that

\begin{align*}P_{k+1} = \frac{P_k + P_k^{-1}}{2}.\end{align*}
Let's see why this averaging process converges to the identity matrix. Symmetric matrices are diagonalizable by the Spectral Theorem, so suppose P_0 diagonalizes as P_0 = V \Lambda_0 V'. Then

\begin{align*}P_1 = \frac{V \Lambda_0 V' + V \Lambda_0^{-1} V'}{2} = V\,\frac{\Lambda_0 + \Lambda_0^{-1}}{2}\,V',\end{align*}

and likewise each P_{k+1} is equal to V \Lambda_{k+1} V' with \Lambda_{k+1} = \frac{\Lambda_k + \Lambda_k^{-1}}{2}.
Thus the P_k's converge to the matrix V \Lambda_\infty V', where \Lambda_\infty is the diagonal matrix whose i th diagonal entry is the limit obtained when you start with the i th diagonal entry of \Lambda_0 and repeatedly apply the function x \mapsto \frac{1}{2}\left(x + \frac{1}{x}\right). By the fact about this iteration given in the problem statement, we conclude that \Lambda_\infty is the identity matrix. Therefore, the limit of P_k as k \to \infty is equal to V V' = I, and so the limit of A_k = R P_k as k \to \infty is equal to R.
For example:
import numpy as np

def polar(A, n):
    # run n steps of the averaging iteration; return R and P = R^{-1} A
    R = A
    for i in range(n):
        R = (R + np.linalg.inv(R.T)) / 2
    return R, np.linalg.solve(R, A)

I = np.eye(3)
A = np.array([[1, 3, 4], [7, -2, 5], [-3, 4, 11]])
R, P = polar(A, 100)
R.T @ R - I, P @ P - A.T @ A   # both should be (nearly) zero matrices
using LinearAlgebra

function polar(A, n)
    # run n steps of the averaging iteration; return R and P = R^{-1} A
    R = A
    for i = 1:n
        R = (R + inv(R')) / 2
    end
    R, R \ A
end

A = [1 3 4; 7 -2 5; -3 4 11]
R, P = polar(A, 100)
R'*R - I, P^2 - A'*A   # both should be (nearly) zero matrices
Both of the matrices returned on the last line have entries which are within roundoff error of zero.
Exercise Show that the product of two matrices with orthonormal columns has orthonormal columns.
Solution. If U' U = I and V' V = I, then (UV)'(UV) = V' U' U V = V' V = I.
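A quick numerical sanity check (a sketch with hypothetical random matrices, not from the original text):

import numpy as np

rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((6, 4)))    # 6x4 matrix with orthonormal columns
V, _ = np.linalg.qr(rng.standard_normal((4, 3)))    # 4x3 matrix with orthonormal columns
print(np.allclose((U @ V).T @ (U @ V), np.eye(3)))  # the product also has orthonormal columns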
The singular value decomposition
The polar decomposition tells us that any square matrix is almost the same as some symmetric matrix, and the spectral theorem tells us that a symmetric matrix is almost the same as a simple scaling along the coordinate axes. (In both cases, the phrase "almost the same" disguises a composition with an orthogonal transformation.) We should be able to combine these ideas to conclude that any square matrix is basically the same as a simple scaling along the coordinate axes!
Let's be more precise. Suppose that A is a square matrix. The polar decomposition tells us that

\begin{align*}A = R\sqrt{A' A}\end{align*}

for some orthogonal matrix R. The spectral theorem tells us that \sqrt{A' A} = V \Sigma V' for some orthogonal matrix V and a diagonal matrix \Sigma with nonnegative diagonal entries.
Combining these equations, we get

\begin{align*}A = R V \Sigma V'.\end{align*}

Since a product of orthogonal matrices is orthogonal, we can define U = RV and obtain the singular value decomposition (SVD) of A:

\begin{align*}A = U \Sigma V',\end{align*}

where U and V are orthogonal matrices.
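To make this derivation concrete, here is a short sketch (not part of the original text, reusing the invertible 3 \times 3 matrix from the earlier exercise) that obtains V and \Sigma by diagonalizing A' A, sets U = A V \Sigma^{-1} (which equals RV), and checks that A = U \Sigma V':

import numpy as np

A = np.array([[1.0, 3.0, 4.0],
              [7.0, -2.0, 5.0],
              [-3.0, 4.0, 11.0]])
lam, V = np.linalg.eigh(A.T @ A)        # A'A = V diag(lam) V'
Sigma = np.diag(np.sqrt(lam))           # nonnegative diagonal matrix
U = A @ V @ np.linalg.inv(Sigma)        # U = R V (A is invertible here)
print(np.allclose(U.T @ U, np.eye(3)))  # U is orthogonal
print(np.allclose(A, U @ Sigma @ V.T))  # A = U Sigma V'

Note that this construction lists the diagonal entries of \Sigma in the order produced by np.linalg.eigh (increasing), whereas np.linalg.svd returns the singular values in decreasing order; the factorization is valid either way.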
We can visualize the decomposition geometrically by making a figure like the one shown below, which illustrates the successive effects of each map in the composition U \Sigma V'. If we draw grid lines on the second plot (just before \Sigma is applied) and propagate those grid lines to the other plots by applying the indicated maps, then we endow the domain and range of A with orthogonal sets of gridlines, with A mapping one to the other.
We can extend the singular value decomposition to rectangular matrices (that is, matrices which are not necessarily square) by adding rows or columns of zeros to a rectangular matrix to get a square matrix, applying the SVD to that square matrix, and then trimming the resulting matrix \Sigma as well as either U or V (depending on which dimension of A is smaller). We get a decomposition of the form A = U \Sigma V', where U is an m \times m orthogonal matrix, V is an n \times n orthogonal matrix, and \Sigma is an m \times n rectangular diagonal matrix.
Theorem (Singular value decomposition) Suppose that A is an m \times n matrix. Then there exist orthogonal matrices U and V and an m \times n rectangular diagonal matrix \Sigma such that

\begin{align*}A = U \Sigma V'.\end{align*}
We call A = U \Sigma V' a singular value decomposition (or SVD) of A. The diagonal entries of \Sigma, which are the square roots of the eigenvalues of A' A, are called the singular values of A. The columns of U are called left singular vectors, and the columns of V are called right singular vectors.
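As a small check of this relationship (a sketch with a hypothetical random matrix, not from the original text):

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))
singular_values = np.linalg.svd(A, compute_uv=False)  # returned in decreasing order
eigenvalues = np.linalg.eigvalsh(A.T @ A)             # returned in increasing order
print(np.allclose(singular_values, np.sqrt(eigenvalues[::-1])))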
Looking at the bottom half of the SVD figure, we see that the singular values of A are the lengths of the semi-axes of the ellipsoid in \mathbb{R}^m obtained as the image under A of the unit ball in \mathbb{R}^n. Moreover, the directions of these axes are the columns of U, since they are the images under U of the standard basis vectors. We will see an important application of this feature of the SVD in the probability chapter when we discuss principal component analysis.
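We can check this numerically with a small sketch (hypothetical example, not from the original text): sampling many unit vectors, the largest and smallest norms of their images under A should approximate the largest and smallest singular values.

import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 2))
sigma = np.linalg.svd(A, compute_uv=False)

xs = rng.standard_normal((2, 100000))
xs /= np.linalg.norm(xs, axis=0)          # random unit vectors in R^2
norms = np.linalg.norm(A @ xs, axis=0)    # lengths of their images in R^3
print(norms.max(), sigma[0])              # approximately equal
print(norms.min(), sigma[1])              # approximately equal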
As an example of how the singular value decomposition can be used to understand the structure of a linear transformation, we introduce the Moore-Penrose pseudoinverse A^+ of an m \times n matrix A. We define A^+ to be V \Sigma^+ U', where \Sigma^+ is the matrix obtained by inverting each nonzero element of \Sigma and transposing the result. The pseudoinverse is a Swiss Army knife for solving the linear system A\mathbf{x} = \mathbf{b} (see the sketch after this list):
If A is square and invertible, then A^+ = A^{-1}.
If A\mathbf{x} = \mathbf{b} has no solution, then A^+\mathbf{b} is the value of \mathbf{x} which minimizes |A\mathbf{x} - \mathbf{b}|^2 (in other words, the closest thing to a solution you can get).
If A\mathbf{x} = \mathbf{b} has multiple solutions, then A^+\mathbf{b} is the solution with minimal norm.
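Here is a minimal sketch of these properties (not from the original text, with a hypothetical 3 \times 2 example matrix), building A^+ from the SVD and comparing it with NumPy's built-in routines:

import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
U, s, Vt = np.linalg.svd(A)
Sigma_plus = np.zeros((A.shape[1], A.shape[0]))    # n x m
Sigma_plus[:len(s), :len(s)] = np.diag(1 / s)      # invert the nonzero singular values
A_plus = Vt.T @ Sigma_plus @ U.T                   # A+ = V Sigma+ U'

print(np.allclose(A_plus, np.linalg.pinv(A)))      # matches NumPy's pseudoinverse

# A x = b has no exact solution here; A+ b is the least-squares solution
b = np.array([1.0, 0.0, 1.0])
print(np.allclose(A_plus @ b, np.linalg.lstsq(A, b, rcond=None)[0]))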
Solution. Let A be the matrix given in the problem statement, and let U, \Sigma, and V' be the three matrices given there.
We need to check that U and V are orthogonal, and that A = U \Sigma V'. We can verify that U and V are orthogonal by showing that their columns are orthogonal unit vectors. Equivalently, we may compute the products U' U and V' V and observe that they are identity matrices. Similarly, A = U \Sigma V' can be verified by hand or on a computer.
The formula for the Moore-Penrose pseudoinverse is
\begin{align*}A^{+} = V \Sigma^{+}U'\end{align*}
The matrix \Sigma^{+} is obtained by inverting the nonzero elements on the diagonal of \Sigma and then transposing the resulting matrix.
Linear dependence is numerically fragile: if the columns of a matrix (with more rows than columns) are linearly dependent, then perturbing the entries slightly by adding tiny independent random numbers is almost certain to result in a matrix with linearly independent columns. However, intuition suggests that subverting the principles of linear algebra in this way is not going to solve any real-world problems that emerge from linear dependence relationships among the columns of a matrix.
This intuition is accurate, and it highlights the utility of having a generalization of the idea of linear independence which can quantify how close a list of vectors is to having linear dependence relationships, rather than remaining within the confines of the binary labels "linearly dependent" or "linearly independent". The singular value decomposition provides such a tool.
Exercise Define a matrix with 100 rows and 5 columns, and do it in such a way that two of the five columns are nearly equal to some linear combination of the other three. Calculate the singular values of the matrix, and make a conjecture about how the number of approximate linear dependencies could have been detected from the list of singular values.
Solution. We see that two of the singular values are much smaller than the other three.
import numpy as np
import matplotlib.pyplot as plt

A = np.random.standard_normal((100, 5))
# make columns 3 and 4 nearly equal to linear combinations of the other columns
A[:,3] = A[:,2] + A[:,1] + 1e-2 * np.random.standard_normal(100)
A[:,4] = A[:,1] - A[:,0] + 1e-2 * np.random.standard_normal(100)
plt.bar(range(5), np.linalg.svd(A)[1])   # bar chart of the singular values
We conjecture that k very small singular values indicate that k columns would need to be removed to obtain a matrix which does not have approximate linear dependence relationships among its columns.
In fact, the idea developed in this exercise is used by the NumPy function np.linalg.matrix_rank to calculate the rank of a matrix. Because of the roundoff errors associated with representing real numbers in memory on a computer, most matrices with float entries technically have full rank. Thus np.linalg.matrix_rank computes the singular value decomposition and returns the number of singular values of the matrix which are larger than a given threshold. The threshold is adjustable, but one common setting is 10^{-15} times the largest entry of the matrix times the largest dimension of the matrix.
In fact, the idea developed in this exercise is used by the Julia function rank to calculate the rank of a matrix. Because of the roundoff errors associated with representing real numbers in memory on a computer, most matrices with float entries technically have full rank. Thus rank computes the singular value decomposition and returns the number of singular values of the matrix which are larger than a given threshold. The threshold is adjustable, but one common setting is 10^{-15} times the largest entry of the matrix times the largest dimension of the matrix.
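For instance (a small sketch with a hypothetical matrix, not from the original text):

import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((100, 5))
A[:, 4] = A[:, 0] + A[:, 1]                 # exact linear dependence among the columns

print(np.linalg.svd(A, compute_uv=False))   # one singular value sits at roundoff level
print(np.linalg.matrix_rank(A))             # the default threshold discards it: rank 4
print(np.linalg.matrix_rank(A, tol=1e-20))  # a tiny threshold counts it, typically giving 5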
SVD and image compression
We close this section with a computational exercise illustrating another widely applicable feature of the singular value decomposition.
Exercise
Show that if \mathbf{u}_1, \ldots, \mathbf{u}_n are the columns of U, \mathbf{v}_1, \ldots, \mathbf{v}_n are the columns of V, and \sigma_1, \ldots, \sigma_n are the diagonal entries of \Sigma, then A = \sigma_{1} \mathbf{u}_{1}\mathbf{v}_{1}'+\sigma_{2}\mathbf{u}_{2}\mathbf{v}_{2}'+\cdots+ \sigma_{n}\mathbf{u}_{n}\mathbf{v}_{n}'.
This equation is useful for compression, because terms with sufficiently small singular value factors can be dropped and the remaining vectors and singular values can be stored using less space. Suppose that A is a 256 \times 128 matrix: how many entries does A have, and how many entries do \mathbf{u}_1,\mathbf{u}_2,\mathbf{u}_3,\mathbf{v}_1,\mathbf{v}_2,\mathbf{v}_3 have in total?
The Python code below creates a matrix A with pixel values for the image shown. How many nonzero singular values does A have? Explain how you can tell just from looking at the picture.
import numpy as np
import matplotlib.pyplot as plt

m = 80
n = 100
a = m // 8
b = m // 4
A = np.ones((m, n))

def pixel(i, j):
    # 0 on the frame-shaped figure, 1 elsewhere
    if (a <= i <= b or m-b <= i <= m-a) and a <= j <= n-a:
        return 0
    elif (a <= j <= b or n-b <= j <= n-a) and a <= i <= m-a:
        return 0
    return 1

# build the m x n image (rows indexed by i, columns by j, matching the Julia version)
A = np.array([[pixel(i, j) for j in range(1, n+1)] for i in range(1, m+1)])
U, Σ, Vt = np.linalg.svd(A)   # note: np.linalg.svd returns V' as its third output
plt.matshow(A)
using LinearAlgebra, Plots

m = 80
n = 100
a = m ÷ 8
b = m ÷ 4
A = ones(m, n)

function pixel(i, j)
    # 0 on the frame-shaped figure, 1 elsewhere
    if (a ≤ i ≤ b || m-b ≤ i ≤ m-a) && a ≤ j ≤ n - a
        0
    elseif (a ≤ j ≤ b || n-b ≤ j ≤ n-a) && a ≤ i ≤ m - a
        0
    else
        1
    end
end

A = [pixel(i,j) for i=1:m, j=1:n]
U, Σ, V = svd(A)
heatmap(A)
Now add some noise to the image:
B = A + 0.05 * np.random.standard_normal((m, n))
B = A + 0.05 * randn(m, n)
Display this new matrix B, and also find the matrix obtained by keeping only the first three terms of \sigma_{1}\mathbf{u}_{1}\mathbf{v}_{1}' +\sigma_{2}\mathbf{u}_{2}\mathbf{v}_{2}' +\cdots+\sigma_{n}\mathbf{u}_{n}\mathbf{v}_{n}' for this matrix B. Which looks more like the original image A: (i) B, or (ii) the three-term approximation of B?
Hint: you can achieve this computationally either by setting some singular values to 0 or by indexing the matrices U, \Sigma, and V' appropriately. Also, you will need the function np.diag (in Python) or diagm (in Julia) to generate a diagonal matrix from the vector of \Sigma values returned by svd.
Solution.
Let M = \sigma_1\mathbf{u}_1\mathbf{v}_1' + \cdots + \sigma_n\mathbf{u}_n\mathbf{v}_n'. We need to show that A = M. We will do this by first showing that A\mathbf{x} = M\mathbf{x} for all \mathbf{x} \in \mathbb{R}^n.
Now, A\mathbf{x} = U \Sigma V' \mathbf{x} for all \mathbf{x} \in \mathbb{R}^n. By the definition of the matrix-vector product, U \Sigma V' \mathbf{x} is a linear combination of the columns of U \Sigma with weights given by V' \mathbf{x}. Since \Sigma is diagonal, it is not hard to see that the i th column of U\Sigma is \Sigma_{ii}\mathbf{u}_i = \sigma_i \mathbf{u}_i. Using the definition of the matrix-vector product again, we find that the i th weight \left(V' \mathbf{x}\right)_i is the dot product of the i th row of V' and \mathbf{x}. But the i th row of V' is \mathbf{v}_i by definition, and thus \left(V' \mathbf{x}\right)_i = \mathbf{v}_i \cdot \mathbf{x}. Therefore,

\begin{align*}A\mathbf{x} = U \Sigma V' \mathbf{x} = \sigma_1(\mathbf{v}_1 \cdot \mathbf{x})\,\mathbf{u}_1 + \cdots + \sigma_n(\mathbf{v}_n \cdot \mathbf{x})\,\mathbf{u}_n = \left(\sigma_1 \mathbf{u}_1 \mathbf{v}_1' + \cdots + \sigma_n \mathbf{u}_n \mathbf{v}_n'\right)\mathbf{x} = M\mathbf{x},\end{align*}
and thus A\mathbf{x} = M \mathbf{x} for all \mathbf{x} \in \mathbb{R}^n. Since A\mathbf{x} = M\mathbf{x} for all \mathbf{x} \in \mathbb{R}^n, it follows that AB = MB for any n \times p matrix B. In particular, if B is the identity matrix in \mathbb{R}^n, we have
\begin{align*}A = AB = MB = M\end{align*}
as required.
A has 256 \times 128 = 32768 entries and \mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3 combined have 3(256 + 128) = 1152 entries.
It can be seen from the picture that A has 3 kinds of columns: one whose components are all dark, another whose components are light in the middle, and the other whose components are dark on the outside and in the middle with strips of light in between. These columns are clearly linearly independent, and thus A has rank 3. Therefore A has 3 nonzero singular values.
We can select only the first three terms by suitably indexing the vectors, as follows:
U, Σ, Vt = np.linalg.svd(B)                         # np.linalg.svd returns V' as its third output
plt.matshow(U[:,:3] @ np.diag(Σ[:3]) @ Vt[:3,:])    # keep only the first three terms
U, Σ, V = svd(B)
heatmap(U[:,1:3] * diagm(Σ[1:3]) * V'[1:3,:])