
The chain rule

Many functions we encounter in practice are compositions of other functions. For example, the function

is the composition of two other functions

This may be a little hard to see, so we introduce some additional variable names to make the structure clearer.

So that computing h(x) amounts to computing the sequence

What happens when we have to compute the derivative of h(x)? One option is to expand the expression and then differentiate

So that

This approach is impractical for many problems. For example, imagine differentiating

this way.

Fortunately, there is a handy theorem for computing the derivative of a function that can be written as a composition of two or more functions.

Theorem (The chain rule). If the function g(x) is differentiable at x and f(y) is differentiable at y = g(x), then the composite function

h(x) = f(g(x))

is differentiable at x and has derivative

h'(x) = f'(g(x)) g'(x)

The good news about this theorem is that it is easier to apply than it looks. All you need to do is look at an expression and identify the inner and outer functions. Applied to the example above, this gives something like

u = h(x) = f(g(x)) = f(y), where y = g(x),

so that

du/dx = (du/dy)(dy/dx) = f'(y) g'(x) = f'(g(x)) g'(x)
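As a quick sanity check of the chain rule, the Python sketch below compares f'(g(x)) g'(x) with a finite-difference estimate of h'(x). The particular inner and outer functions, g(x) = x^2 + 1 and f(y) = sin(y), are hypothetical choices made only for illustration (they are not the example from these notes), as are the helper names h_prime_chain and central_difference.

```python
import math

# Hypothetical example pair (not from the notes): inner g(x) = x**2 + 1,
# outer f(y) = sin(y), so h(x) = f(g(x)) = sin(x**2 + 1).
def g(x):
    return x**2 + 1

def g_prime(x):
    return 2 * x

def f(y):
    return math.sin(y)

def f_prime(y):
    return math.cos(y)

def h(x):
    # Compute the sequence y = g(x), then u = f(y).
    y = g(x)
    return f(y)

def h_prime_chain(x):
    # Chain rule: du/dx = (du/dy)(dy/dx) = f'(g(x)) * g'(x).
    y = g(x)
    return f_prime(y) * g_prime(x)

def central_difference(func, x, step=1e-6):
    # Numerical approximation of func'(x), used only for comparison.
    return (func(x + step) - func(x - step)) / (2 * step)

if __name__ == "__main__":
    for x in [-1.0, 0.0, 0.5, 2.0]:
        exact = h_prime_chain(x)
        approx = central_difference(h, x)
        print(f"x={x:5.2f}  chain rule={exact:+.6f}  finite difference={approx:+.6f}")
```

The two columns should agree closely at every sample point. The step size of 1e-6 is an arbitrary choice that keeps both truncation and rounding error small for these well-behaved functions.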

In time, and with further experience, you will be able to go directly from

An easier and less confusing way to apply the chain rule is to imagine peeling off layers of functions and differentiating one layer at a time. For example, consider the function

sin((1 - x)^2)

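The function is built up in three layers. The outer layer is the sine function, the next layer in is the squaring function, and the innermost layer is the function 1 - x. To differentiate, we work from the outside in, one layer at a time, multiplying the derivatives together:

d/dx sin((1 - x)^2) = cos((1 - x)^2) · d/dx (1 - x)^2
                    = cos((1 - x)^2) · 2(1 - x) · d/dx (1 - x)
                    = cos((1 - x)^2) · 2(1 - x) · (-1)
                    = -2(1 - x) cos((1 - x)^2)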
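The layer-by-layer procedure can be automated in a short Python sketch. Each layer of sin((1 - x)^2) is stored as a (function, derivative) pair; the names LAYERS, evaluate_with_derivative and closed_form_derivative are hypothetical, chosen for illustration. The text peels layers from the outside in, whereas the sketch walks from the innermost layer outward, because that is the order in which each layer's input becomes known; the product of the factors is the same either way. The result is compared against the hand-derived derivative -2(1 - x) cos((1 - x)^2).

```python
import math

# Each layer is a (function, derivative) pair, listed from innermost to outermost.
# These three layers rebuild sin((1 - x)**2) from the notes.
LAYERS = [
    (lambda x: 1 - x, lambda x: -1.0),   # innermost: 1 - x, derivative -1
    (lambda v: v**2, lambda v: 2 * v),   # middle: squaring, derivative 2v
    (math.sin, math.cos),                # outermost: sine, derivative cosine
]

def evaluate_with_derivative(layers, x):
    """Return (value, derivative) of the composed layers at x.

    Works inside-out: each layer's derivative is evaluated at that layer's
    own input and multiplied into the running product (the chain rule).
    """
    value = x
    derivative = 1.0
    for func, func_prime in layers:
        derivative *= func_prime(value)
        value = func(value)
    return value, derivative

def closed_form_derivative(x):
    # Hand-derived result: -2(1 - x) cos((1 - x)^2)
    return -2 * (1 - x) * math.cos((1 - x) ** 2)

if __name__ == "__main__":
    for x in [0.0, 0.5, 1.0, 3.0]:
        value, slope = evaluate_with_derivative(LAYERS, x)
        print(f"x={x}: value={value:.6f}, derivative={slope:.6f}, "
              f"closed form={closed_form_derivative(x):.6f}")
```

Adding a fourth layer only means appending another (function, derivative) pair; the loop itself does not change, which mirrors how the peeling approach scales to deeper compositions.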