Matrix differentiation

Just as elementary differentiation rules are helpful for optimizing single-variable functions, matrix differentiation rules are helpful for optimizing expressions written in matrix form. This technique is often used in statistics.

Suppose \mathbf{f} is a function from \mathbb{R}^n to \mathbb{R}^m. Writing \mathbf{f}(\mathbf{x}) = \mathbf{f}(x_1, \ldots, x_n), we define the Jacobian matrix (or derivative matrix) to be

\begin{align*}\frac{\partial \mathbf{f}}{\partial \mathbf{x}} = \left[ \begin{array}{cccc} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} & \cdots & \frac{\partial f_2}{\partial x_n} \\ \vdots & & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1} & \frac{\partial f_m}{\partial x_2} & \cdots & \frac{\partial f_m}{\partial x_n} \end{array}\right]\end{align*}

Note that if m=1, then \frac{\partial f}{\partial \mathbf{x}} is a 1 \times n row vector whose transpose is the gradient of f.
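To make this row/column convention concrete, here is a minimal numerical sketch in NumPy; the example map from \mathbb{R}^2 to \mathbb{R}^3 and the helper name `jacobian_fd` are our own illustrative choices, not from the text. It approximates the Jacobian by central differences and compares it with the hand-computed matrix:

```python
import numpy as np

def jacobian_fd(f, x, h=1e-6):
    # Approximate the Jacobian of f at x by central differences;
    # row i, column j holds d f_i / d x_j, matching the definition above.
    m, n = len(f(x)), len(x)
    J = np.zeros((m, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        J[:, j] = (f(x + e) - f(x - e)) / (2 * h)
    return J

# Example map from R^2 to R^3
f = lambda x: np.array([x[0]**2, x[0]*x[1], np.sin(x[1])])
x = np.array([1.0, 2.0])

print(jacobian_fd(f, x))
# Hand-computed Jacobian: [[2 x1, 0], [x2, x1], [0, cos x2]]
print(np.array([[2*x[0], 0.0], [x[1], x[0]], [0.0, np.cos(x[1])]]))
```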

With this definition, we obtain the following analogues to some basic single-variable differentiation results: if A is a constant matrix, then

\begin{align*}\frac{\partial}{\partial \mathbf{x}} (A \mathbf{x}) &= A \\ \frac{\partial}{\partial \mathbf{x}} (\mathbf{x}' A) &= A' \\ \frac{\partial}{\partial \mathbf{x}} (\mathbf{u}' \mathbf{v}) &= \mathbf{u}'\frac{\partial \mathbf{v}}{\partial \mathbf{x}} + \mathbf{v}'\frac{\partial \mathbf{u}}{\partial \mathbf{x}}\end{align*}

The third of these equations is the product rule.
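These identities are easy to sanity-check numerically. The sketch below is a rough check under an arbitrary NumPy setup (the matrices and the product-rule test function are our own choices); it verifies the first identity and the product rule:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 4))
B = rng.normal(size=(3, 4))
x = rng.normal(size=4)
h = 1e-6

# First identity: since A x is linear, A(x + dx) - A x equals A dx exactly
dx = h * rng.normal(size=4)
print(np.allclose(A @ (x + dx) - A @ x, A @ dx))  # True up to roundoff

# Product rule: for u = A x and v = B x, the derivative of u'v
# should be u' B + v' A = x'A'B + x'B'A
g = lambda v: (A @ v) @ (B @ v)  # the scalar u'v as a function of x
num = np.array([(g(x + h*e) - g(x - h*e)) / (2*h) for e in np.eye(4)])
print(np.allclose(num, x @ A.T @ B + x @ B.T @ A, atol=1e-5))  # True
```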

The Hessian of a function f:\mathbb{R}^n \to \mathbb{R} may be written in terms of the matrix differentiation operator as follows:

\begin{align*}\mathbf{H}(\mathbf{x}) = \frac{\partial}{\partial \mathbf{x}} \left(\frac{\partial f}{\partial \mathbf{x}}\right)'.\end{align*}

Some authors define \frac{\partial f}{\partial \mathbf{x}'} to be \left(\frac{\partial f}{\partial \mathbf{x}}\right)', in which case the Hessian operator can be written as \frac{\partial^2}{\partial \mathbf{x} \partial \mathbf{x}'}.
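As a quick check of this formula, the following sketch builds the Hessian by differentiating the transposed gradient, exactly as the operator above prescribes. The helpers `grad_fd` and `hessian_fd` and the test function are illustrative choices, not part of the text:

```python
import numpy as np

def grad_fd(f, x, h=1e-5):
    # Gradient of a scalar function, i.e. (df/dx)' as a vector
    return np.array([(f(x + h*e) - f(x - h*e)) / (2*h) for e in np.eye(len(x))])

def hessian_fd(f, x, h=1e-4):
    # H = d/dx (df/dx)': differentiate the transposed-gradient map column by column
    return np.column_stack([(grad_fd(f, x + h*e) - grad_fd(f, x - h*e)) / (2*h)
                            for e in np.eye(len(x))])

f = lambda x: x[0]**2 * x[1] + np.exp(x[1])
x = np.array([1.0, 0.5])

print(hessian_fd(f, x))
# Hand-computed Hessian: [[2 x2, 2 x1], [2 x1, exp(x2)]]
print(np.array([[2*x[1], 2*x[0]], [2*x[0], np.exp(x[1])]]))
```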

Exercise
Let f: \mathbb{R}^n \to \mathbb{R} be defined by f(\mathbf{x}) = \mathbf{x}' A \mathbf{x} where A is a symmetric matrix. Find \frac{\partial f}{\partial \mathbf{x}}.

Solution. We can apply the product rule to find that

\begin{align*}\frac{\partial f}{\partial \mathbf{x}} = \mathbf{x}' \frac{\partial}{\partial \mathbf{x}}(A\mathbf{x}) + (A\mathbf{x})' \frac{\partial \mathbf{x}}{\partial \mathbf{x}} = \mathbf{x}' A + \mathbf{x}' A' = 2\mathbf{x}' A,\end{align*}

where the last step uses the assumption that A is symmetric.
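A numerical spot-check of this result, with a random symmetric A (a sketch, not part of the original solution):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(4, 4))
A = (M + M.T) / 2           # symmetrize, since the exercise assumes A' = A
x = rng.normal(size=4)

f = lambda v: v @ A @ v     # f(x) = x' A x
h = 1e-6
grad = np.array([(f(x + h*e) - f(x - h*e)) / (2*h) for e in np.eye(4)])

# The derivative 2 x' A, transposed into a (column) gradient, is 2 A x
print(np.allclose(grad, 2 * A @ x, atol=1e-5))  # True
```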

Exercise
Suppose A is an m\times n matrix and \mathbf{b} \in \mathbb{R}^m. Use matrix differentiation to find the vector \mathbf{x} which minimizes |A \mathbf{x} - \mathbf{b}|^2. Hint: begin by writing |A \mathbf{x} - \mathbf{b}|^2 as (A \mathbf{x} - \mathbf{b})' (A \mathbf{x} - \mathbf{b}). You may assume that the rank of A is n.

Solution. We write

\begin{align*}|A \mathbf{x} - \mathbf{b}|^2 &= (A \mathbf{x} - \mathbf{b})' (A \mathbf{x} - \mathbf{b}) \\ &= \mathbf{x}' A' A \mathbf{x} - \mathbf{b}' A \mathbf{x} - \mathbf{x}' A' \mathbf{b} + |\mathbf{b}|^2.\end{align*}

To minimize this function, we compute its derivative, applying the result of the previous exercise to the term \mathbf{x}' A' A \mathbf{x} (valid since A'A is symmetric):

\begin{align*}\frac{\partial}{\partial \mathbf{x}}|A \mathbf{x} - \mathbf{b}|^2 = 2\,\mathbf{x}' A' A - \mathbf{b}' A - (A'\mathbf{b})' = 2\mathbf{x}' A' A - 2\mathbf{b}' A\end{align*}

and set it equal to \boldsymbol{0} to get

\begin{align*}\mathbf{x}' = \mathbf{b}' A(A' A)^{-1} \implies \mathbf{x} = (A' A)^{-1} A' \mathbf{b}.\end{align*}

(We know that A' A has an inverse matrix because its rank is equal to that of A, which we assumed was n.)
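As a sanity check, the normal-equations formula derived here can be compared against NumPy's built-in least-squares solver (np.linalg.lstsq); the random A and b below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(6, 3))   # m = 6, n = 3; full column rank with probability 1
b = rng.normal(size=6)

# Normal-equations solution derived above: x = (A'A)^{-1} A' b
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# NumPy's least-squares routine should agree
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_normal, x_lstsq))  # True
```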
