Multivariable Calculus: The chain rule

Reading time: ~15 min

One way of describing the chain rule is to say that derivatives of compositions of differentiable functions may be obtained by linearizing. If linear functions (functions of the form x\mapsto mx + b) are composed, then the slope of the composition is the product of the slopes of the functions being composed. Since differentiable functions are practically linear if you zoom in far enough, they behave the same way under composition.
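The claim about slopes of composed linear functions is easy to check directly. The following is a minimal sketch; the helper names make_linear and slope, and the particular slopes and intercepts, are illustrative choices that do not come from the text.

```python
def make_linear(m, b):
    """Return the linear function x -> m*x + b."""
    return lambda x: m * x + b

# Two linear functions chosen only for illustration
p = make_linear(3.0, 1.0)   # slope 3
q = make_linear(-2.0, 5.0)  # slope -2

def slope(h, x0=0.0, x1=1.0):
    """Slope of h between x0 and x1 (exact when h is linear)."""
    return (h(x1) - h(x0)) / (x1 - x0)

def composition(x):
    """The composition q(p(x)) = -2(3x + 1) + 5 = -6x + 3."""
    return q(p(x))

composed_slope = slope(composition)
print(composed_slope, slope(p) * slope(q))  # both equal -6.0
```

The intercepts affect where the composed line sits but not its slope, which is why only the slopes appear in the product.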

The chain rule in multivariable calculus works similarly. If we compose a differentiable function \mathbf{r}:[a,b]\to \mathbb{R}^2 with a differentiable function f: \mathbb{R}^2 \to \mathbb{R}^1, we get a function whose derivative is

\begin{align*}(f \circ \mathbf{r})'(t) = \frac{\partial f}{\partial \mathbf{x}}(\mathbf{r}(t)) \mathbf{r}'(t)\end{align*}

Note that the right-hand side can also be written as (\nabla f)(\mathbf{r}(t)) \cdot \mathbf{r}'(t), since \frac{\partial f}{\partial \mathbf{x}}(\mathbf{r}(t)) is a row vector, and the product of a row vector and a column vector is the same as the dot product of the row vector and the column vector. We can explain this formula geometrically: the change in f that results from making a small move from \mathbf{r}(t) to \mathbf{r}(t) + \mathbf{r}'(t) \Delta t is approximately the dot product of the gradient of f and the small step \mathbf{r}'(t) \Delta t.

We visualize \mathbf{r} by drawing the points \mathbf{r}(t), which trace out a curve in the plane. We visualize f only by showing the direction of its gradient at the point \mathbf{r}(t). The change in f from one point on the curve to a nearby one is approximately the dot product of the change in position and the gradient.
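The chain rule formula above can be checked numerically for a concrete curve and function. The sketch below assumes a hypothetical curve \mathbf{r}(t) = (\cos t, t^2) and a hypothetical function f(x,y) = x^2 y + \sin y, neither of which appears in the text, and compares the dot product of the gradient and the velocity with a centered finite-difference estimate of (f \circ \mathbf{r})'(t).

```python
import numpy as np

def r(t):
    """Hypothetical curve in the plane: r(t) = (cos t, t^2)."""
    return np.array([np.cos(t), t**2])

def r_prime(t):
    """Derivative of r, computed by hand."""
    return np.array([-np.sin(t), 2 * t])

def f(p):
    """Hypothetical scalar function f(x, y) = x^2 y + sin y."""
    x, y = p
    return x**2 * y + np.sin(y)

def grad_f(p):
    """Gradient of f, computed by hand."""
    x, y = p
    return np.array([2 * x * y, x**2 + np.cos(y)])

t = 0.7   # arbitrary evaluation point
h = 1e-6  # step size for the finite difference

# Chain rule: gradient of f at r(t), dotted with r'(t)
chain_rule_value = grad_f(r(t)) @ r_prime(t)

# Centered difference approximation of (f ∘ r)'(t)
finite_difference = (f(r(t + h)) - f(r(t - h))) / (2 * h)

print(np.isclose(chain_rule_value, finite_difference))
```

A centered difference is used here because its error shrinks with the square of the step size, so the two values should agree to many digits.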

Exercise
Suppose that \frac{\partial f}{\partial x}(3,2) = 4, that \frac{\partial f}{\partial y}(3,2) = -2, and that x(t) = 1 + 2t and y(t) = 4 - 2t^2. Find the derivative of the function f(x(t),y(t)) at the point t = 1.

Solution. At t = 1 we have (x(1), y(1)) = (3, 2), x'(1) = 2, and y'(1) = -4. The chain rule implies that the derivative of f(x(t),y(t)) at t = 1 is

\begin{align*}[f_x(x(1),y(1)), f_y(x(1),y(1))]\cdot [x'(1), y'(1)] = (4)(2) + (-2)(-4) = \boxed{16}.\end{align*}
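The exercise specifies f only through its partial derivatives at (3, 2), so any numerical check needs an assumed f. The sketch below uses the hypothetical choice f(x, y) = 4x - 2y, which is just one of many functions with those partial derivatives, and estimates the derivative of the composition at t = 1 with a centered difference.

```python
def x(t):
    return 1 + 2 * t

def y(t):
    return 4 - 2 * t**2

def f(x_val, y_val):
    """Hypothetical f with f_x = 4 and f_y = -2 everywhere (an assumption)."""
    return 4 * x_val - 2 * y_val

def composite(t):
    """The function t -> f(x(t), y(t))."""
    return f(x(t), y(t))

h = 1e-6  # step size for the finite difference
estimate = (composite(1 + h) - composite(1 - h)) / (2 * h)

# Chain rule value from the solution: f_x * x'(1) + f_y * y'(1)
chain_rule_value = 4 * 2 + (-2) * (-4)

print(abs(estimate - chain_rule_value) < 1e-6)
```

Any other differentiable f with the same partial derivatives at (3, 2) would give the same derivative at t = 1, since the chain rule uses only those two numbers.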

Exercise
Find the derivative with respect to t of the function g(t) = t^t by writing the function as f(x(t),y(t)) where f(x,y) = x^y and x(t) = t and y(t)=t.

Solution. Let f(x,y) = x^y, where x(t) = t and y(t) = t. We have \frac{\partial f}{\partial x} = yx^{y-1} and \frac{\partial f}{\partial y} = x^{y} \ln{x}. Since the derivatives of x and y with respect to t are both 1, evaluating the partial derivatives at x = y = t and applying the chain rule gives

\begin{align*}g'(t) = t\cdot t^{t-1} + t^t\ln{t} = t^t(1 + \ln{t}).\end{align*}
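As a sanity check on this result, the formula t^t(1 + \ln t) can be compared with a centered finite-difference estimate of the derivative of t^t. This is a minimal sketch; the evaluation point t = 2 and the step size are arbitrary choices.

```python
import math

def g(t):
    """The function g(t) = t^t, for t > 0."""
    return t**t

def g_prime(t):
    """Derivative from the chain rule: t^t * (1 + ln t)."""
    return t**t * (1 + math.log(t))

t = 2.0   # arbitrary positive evaluation point
h = 1e-6  # step size for the finite difference

estimate = (g(t + h) - g(t - h)) / (2 * h)

print(math.isclose(estimate, g_prime(t), rel_tol=1e-6))
```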

Exercise
Suppose that g(\mathbf{x}) = A\mathbf{x} for some matrix A, and suppose that f is the componentwise squaring function (in other words, f(\mathbf{y}) = [y_1^2, y_2^2, \ldots, y_n^2]). Find the derivative of f \circ g.

Note: you might find it convenient to express your answer using the function diag which maps a vector to a matrix with that vector along the diagonal.

Solution. The derivative matrix of f is diagonal, since the derivative of y_j^2 with respect to y_i is zero unless i = j. The diagonal entries are 2y_1, 2y_2, \ldots, 2y_n. The derivative of g is A, as we saw in the section on matrix differentiation. Evaluating the derivative of f at \mathbf{y} = g(\mathbf{x}) = A\mathbf{x} and multiplying by the derivative of g, we find that the derivative of the composition is

\begin{align*}\left[ \begin{array}{cccc} 2(A\mathbf{x})_1 & 0 & \cdots & 0 \\ 0 & 2(A\mathbf{x})_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 2(A\mathbf{x})_n \end{array} \right]A = 2\operatorname{diag}(A\mathbf{x})A.\end{align*}

We can check this exercise numerically:

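import numpy as np

A = np.random.random_sample((5, 5))
x = np.random.random_sample(5)
Δx = 1e-6 * np.random.random_sample(5)  # small perturbation of x

def f(y):
    "Componentwise square of y"
    return y**2

def g(x):
    "Multiply A by x"
    return A @ x

# Derivative of f ∘ g at x, from the formula above
derivative = 2 * np.diag(A @ x) @ A

# The change in f ∘ g should match the linear approximation to first order
print(np.allclose(f(g(x + Δx)) - f(g(x)), derivative @ Δx))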
