Differentiation
Let \(U\subset\real^n\) and let \(F:U\rightarrow\real^m\) be a function. How should we define differentiability of \(F\) at some point \(a\in U\)? Recall that for a function \(f:I\rightarrow\real\), where \(I\subset\real\), we say that \(f\) is differentiable at \(a\in I\) if \[ \lim_{x\rightarrow a} \frac{f(x)-f(a)}{x-a} \] exists. In this case, we denote \(f'(a)=\lim_{x\rightarrow a} \frac{f(x)-f(a)}{x-a}\) and we call \(f'(a)\) the derivative of \(f\) at \(a\). As written, this definition does not make sense for \(F\) since division by vectors is not well-defined (or at least we have not defined it). An equivalent definition of differentiability of \(f\) at \(a\) is that there exists a number \(m\in\real\) such that \[ \lim_{x\rightarrow a} \frac{f(x)-f(a)-m(x-a)}{x-a} = 0, \] which is equivalent to asking that \[ \lim_{x\rightarrow a} \frac{|f(x)-f(a)-m(x-a)|}{|x-a|} = 0. \] The number \(m\) is then denoted by \(m=f'(a)\) as before. Another way to think about the derivative \(m\) is that the affine function \(g(x) = f(a) + m(x-a)\) is a good approximation to \(f(x)\) for points \(x\) near \(a\). The linear part of the affine function \(g\) is \(\ell(x) = mx\). Thought of in this way, the derivative of \(f\) at \(a\) is a linear function.
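For instance, for \(f(x)=x^2\) and \(a\in\real\), the choice \(m=2a\) satisfies the limit above since
\[
\lim_{x\rightarrow a} \frac{|x^2 - a^2 - 2a(x-a)|}{|x-a|} = \lim_{x\rightarrow a} \frac{|x-a|^2}{|x-a|} = \lim_{x\rightarrow a} |x-a| = 0.
\]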
Let \(U\) be a subset of \(\real^n\). A mapping \(F:U\rightarrow\real^m\) is said to be differentiable at \(a\in U\) if there exists a linear mapping \(L:\real^n\rightarrow\real^m\) such that
\[
\lim_{x\rightarrow a} \frac{\norm{F(x)-F(a) - L(x-a)}}{\norm{x-a}} = 0.
\]
In the definition of differentiability, the expression \(L(x-a)\) denotes the linear mapping \(L\) applied to the vector \((x-a)\in \real^n\). An equivalent definition of differentiability is that
\[
\lim_{h\rightarrow 0} \frac{\norm{F(a+h)-F(a) - L(h)}}{\norm{h}} = 0
\]
where again \(L(h)\) denotes \(L\) evaluated at \(h\in\real^n\). It is not hard to show that the linear mapping \(L\) in the above definition is unique when \(U\subset\real^n\) is an open set. For this reason, we will deal almost exclusively with the case that \(U\) is open without further mention. We therefore call \(L\) the derivative of \(F\) at \(a\) and denote it instead by \(L=DF(a)\). Hence, by definition, the derivative of \(F\) at \(a\) is the unique linear mapping \(DF(a):\real^n\rightarrow\real^m\) satisfying
\[
\lim_{x\rightarrow a} \frac{\norm{F(x)-F(a) - DF(a)(x-a)}}{\norm{x-a}} = 0.
\]
Applying the definition of the limit, given arbitrary \(\eps \gt 0\) there exists \(\delta \gt 0\) such that if \(\norm{x-a} \lt \delta\) then
\[
\frac{\norm{F(x)-F(a) - DF(a)(x-a)}}{\norm{x-a}} \lt \eps
\]
or equivalently
\[
\norm{F(x)-F(a) - DF(a)(x-a)} \lt \eps \norm{x-a}.
\]
If \(F:U\rightarrow\real^m\) is differentiable at each \(x\in U\) then \(x\mapsto DF(x)\) is a mapping from \(U\) to the space of linear maps from \(\real^n\) to \(\real^m\). In other words, if we denote by \(\mathcal{L}(\real^n;\real^m)\) the space of linear maps from \(\real^n\) to \(\real^m\) then we have a well-defined mapping \(DF:U\rightarrow \mathcal{L}(\real^n;\real^m)\), called the derivative of \(F\) on \(U\), which assigns to each \(x\in U\) the derivative of \(F\) at \(x\).
We now relate the derivative of \(F\) with the derivatives of its component functions. To that end, we need to recall some basic facts from linear algebra and the definition of the partial derivative. For the latter, recall that a function \(f:U\subset\real^n\rightarrow\real\) has a partial derivative at \(a\in U\) with respect to \(x_i\) if the following limit exists
\[
\lim_{t\rightarrow 0} \frac{f(a_1,\ldots,a_{i-1}, a_i + t, a_{i+1}, \ldots, a_n) - f(a)}{t}
\]
or equivalently, if there exists a number \(m_i\in\real\) such that
\[
0 = \lim_{t\rightarrow 0} \frac{f(a+e_i t) - f(a) - m_i t}{t}
\]
where \(e_i = (0,\ldots,0,1,0,\ldots,0)\) denotes the \(i\)th standard basis vector in \(\real^n\). We then denote \(m_i = \pfrac{f}{x_i}(a)\). Now, given any linear map \(L:\real^n\rightarrow\real^m\), the action of \(L\) on vectors in \(\real^n\) can be represented as matrix-vector multiplication once we choose a basis for \(\real^n\) and \(\real^m\). Specifically, if we choose the most convenient bases in \(\real^n\) and \(\real^m\), namely the standard bases, then
\[
L(x) = Ax
\]
where \(A\in\real^{m\times n}\) and the \((j,i)\) entry of the matrix \(A\) is the \(j\)th component of the vector \(L(e_i)\in\real^m\). We can now prove the following.
Let \(U\subset\real^n\) be open and suppose that \(F:U\rightarrow\real^m\) is differentiable at \(a\in U\), and write \(F=(f_1,f_2,\ldots,f_m)\). Then the partial derivatives \(\pfrac{f_j}{x_i}(a)\) exist, and the matrix representation of \(DF(a)\) in the standard bases in \(\real^n\) and \(\real^m\) is
\[
\begin{bmatrix}
\pfrac{f_1}{x_1} & \pfrac{f_1}{x_2} & \cdots & \pfrac{f_1}{x_n} \\[2ex]
\pfrac{f_2}{x_1} & \pfrac{f_2}{x_2} & \cdots & \pfrac{f_2}{x_n} \\[2ex]
\vdots & \vdots & \ddots & \vdots \\[2ex]
\pfrac{f_m}{x_1} & \pfrac{f_m}{x_2} & \cdots & \pfrac{f_m}{x_n}
\end{bmatrix}
\]
where all partial derivatives are evaluated at \(a\). The matrix above is called the Jacobian matrix of \(F\) at \(a\).
Let \(m_{j,i}\) denote the \((j,i)\) entry of the matrix representation of \(DF(a)\) in the standard bases in \(\real^n\) and \(\real^m\), that is, \(m_{j,i}\) is the \(j\)th component of \(DF(a)e_i\). By definition of differentiability, it holds that
\[
0 = \lim_{x\rightarrow a} \frac{\norm{F(x)-F(a) - DF(a)(x-a)}}{\norm{x-a}}.
\]
Let \(x = a + te_i\) where \(e_i\in\real^n\) is the \(i\)th standard basis vector. Since \(U\) is open, \(x\in U\) provided \(t\) is sufficiently small. Since \(\norm{x-a}=\norm{t e_i} = |t|\), we have \(x\rightarrow a\) if and only if \(t\rightarrow 0\), and therefore
\begin{align*}
0 &= \lim_{t\rightarrow 0} \frac{\norm{F(a+te_i)-F(a) - DF(a)e_i t}}{|t|} \\
&= \lim_{t\rightarrow 0} \norm{\frac{1}{t} \big[ F(a+te_i) - F(a) - DF(a) e_i t \big] }.
\end{align*}
It follows that each component of the vector \(\frac{1}{t} \big[ F(a+te_i) - F(a) - DF(a) e_i t \big]\) tends to \(0\) as \(t\rightarrow 0\). Hence, for each \(j\in \{1,2,\ldots,m\}\) we have
\[
0 = \lim_{t\rightarrow 0} \frac{1}{t} (f_j(a+te_i) - f_j(a) - m_{j,i} t).
\]
Hence, \(\pfrac{f_j}{x_i}(a)\) exists and \(m_{j,i} = \pfrac{f_j}{x_i}(a)\) as claimed.
It is customary to write
\[
DF(a) = \begin{bmatrix}
\pfrac{f_1}{x_1} & \pfrac{f_1}{x_2} & \cdots & \pfrac{f_1}{x_n} \\[2ex]
\pfrac{f_2}{x_1} & \pfrac{f_2}{x_2} & \cdots & \pfrac{f_2}{x_n} \\[2ex]
\vdots & \vdots & \ddots & \vdots \\[2ex]
\pfrac{f_m}{x_1} & \pfrac{f_m}{x_2} & \cdots & \pfrac{f_m}{x_n}
\end{bmatrix}
\]
since for any \(x\in \real^n\) the vector \(DF(a)x\) is the Jacobian matrix of \(F\) at \(a\) multiplied by \(x\) (all partials are evaluated at \(a\)). When not explicitly stated, the matrix representation of \(DF(a)\) will always mean the Jacobian matrix representation.
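Because the Jacobian matrix is built from partial derivatives, a computed Jacobian can be sanity-checked numerically by forward differences. The following Python sketch (the helper `jacobian_fd` and the step size `h` are our own choices, not part of the text) approximates column \(i\) of \(DF(a)\) by \((F(a+he_i)-F(a))/h\):

```python
import numpy as np

def jacobian_fd(F, a, h=1e-6):
    """Approximate the Jacobian of F: R^n -> R^m at a by forward differences.

    Column i is (F(a + h*e_i) - F(a)) / h, a difference quotient for the
    partial derivatives appearing in Theorem 10.1.2.
    """
    a = np.asarray(a, dtype=float)
    Fa = np.asarray(F(a), dtype=float)
    J = np.zeros((Fa.size, a.size))
    for i in range(a.size):
        step = np.zeros(a.size)
        step[i] = h
        J[:, i] = (np.asarray(F(a + step), dtype=float) - Fa) / h
    return J
```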
We now prove that differentiability implies continuity. To that end, we first recall that if \(A\in\real^{m\times n}\) and \(B\in\real^{n\times p}\) then
\[
\norm{AB}_2 \leq \norm{A}_2 \norm{B}_2.
\]
The proof of this fact is identical to the one in Example 9.4.16. In particular, if \(x\in\real^n\) then \(\norm{Ax}_2 \leq \norm{A}_2\norm{x}_2\).
Let \(U\subset\real^n\) be an open set. If \(F:U\rightarrow\real^m\) is differentiable at \(a\in U\) then \(F\) is continuous at \(a\).
Let \(\eps_1=1\). Then there exists \(\delta_1 \gt 0\) such that if \(\norm{x-a} \lt \delta_1\) then
\[
\norm{F(x) - F(a) - DF(a)(x-a)} \lt 1\cdot \norm{x-a}.
\]
Hence, if \(\norm{x-a} \lt \delta_1\) then
\begin{align*}
\norm{F(x) - F(a)} &= \norm{F(x) - F(a) - DF(a)(x-a) + DF(a)(x-a)} \\[2ex]
&\leq \norm{F(x)-F(a)-DF(a)(x-a)} + \norm{DF(a) (x-a)} \\[2ex]
&\leq \norm{x-a} + \norm{DF(a)}_2 \norm{x-a}
\end{align*}
and thus, given arbitrary \(\eps \gt 0\), we have \(\norm{F(x)-F(a)} \lt \eps\) provided
\[
\norm{x-a} \lt \min\{\delta_1, \eps/(1+\norm{DF(a)}_2)\}.
\]
Hence, \(F\) is continuous at \(a\).
Notice that Theorem 10.1.2 says that if \(DF(a)\) exists then all the relevant partials exist. However, it does not generally hold that if all the relevant partials exist then \(DF(a)\) exists. The reason is that partial derivatives are derivatives along the coordinate axes whereas, as seen from the definition, the limit used to define \(DF(a)\) allows \(x\) to approach \(a\) along any direction.
Consider the function \(f:\real^2\rightarrow\real\) defined as
\[
f(x,y) =
\begin{cases}
\frac{2xy}{x^2+y^2}, & (x,y)\neq (0,0)\\
0, & (x,y)=(0,0)
\end{cases}
\]
We determine whether \(\frac{\partial f}{\partial x}(0,0)\) and \(\frac{\partial f}{\partial y}(0,0)\) exist. To that end, we compute
\begin{align*}
\lim_{t\rightarrow 0} \frac{f(0+t, 0) - f(0,0)}{t} &= \lim_{t\rightarrow 0} \frac{0}{t} = 0 \\[2ex]
\lim_{t\rightarrow 0} \frac{f(0, 0+t) - f(0,0)}{t} &= \lim_{t\rightarrow 0} \frac{0}{t} = 0
\end{align*}
Therefore, \(\frac{\partial f}{\partial x}(0,0)\) and \(\frac{\partial f}{\partial y}(0,0)\) exist and are both equal to zero. On the other hand, \(f\) is not continuous at \((0,0)\): along the line \(y=x\) we have \(f(x,x) = \frac{2x^2}{2x^2} = 1\) for all \(x\neq 0\), so \(f(x,y)\) does not tend to \(f(0,0)=0\) as \((x,y)\rightarrow (0,0)\). Therefore \(f\) is not differentiable at \((0,0)\).
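As a quick numerical illustration (our own script, not part of the text), evaluating \(f\) along the axes and along the line \(y=x\) exhibits this behavior:

```python
import numpy as np

def f(x, y):
    # The function of the example; f(0,0) = 0 by definition.
    if x == 0.0 and y == 0.0:
        return 0.0
    return 2 * x * y / (x**2 + y**2)

for t in [0.1, 0.01, 0.001]:
    # Along the axes f vanishes, so both partials at (0,0) equal 0,
    # but along y = x the value is identically 1, so f is discontinuous.
    print(f(t, 0.0), f(0.0, t), f(t, t))
```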
The previous example shows that existence of partial derivatives is a fairly weak assumption with regards to differentiability, and in fact even with regards to continuity. The following theorem gives a sufficient condition for \(DF\) to exist in terms of the partial derivatives.
Let \(U\subset\real^n\) be an open set and consider \(F:U\rightarrow\real^m\) with \(F=(f_1,f_2,\ldots,f_m)\). If each partial derivative function \(\pfrac{f_j}{x_i}\) exists and is continuous on \(U\) then \(F\) is differentiable on \(U\).
We will omit the proof of Theorem 10.1.5.
Let \(F:\real^2\rightarrow\real^3\) be defined by
\[
F(x) = (x_1\sin(x_2), x_1x_2^2, \ln(x_1^2+1)+2x_2).
\]
Explain why \(DF(x)\) exists for each \(x\in\real^2\) and find \(DF(x)\).
It is clear that the component functions of \(F\) that are given by \(f_1(x) = x_1\sin(x_2)\), \(f_2(x)=x_1x_2^2\), and \(f_3(x) = \ln(x_1^2+1)+2x_2\) have partial derivatives that are continuous on all of \(\real^2\). Hence, \(F\) is differentiable on \(\real^2\). Then
\[
DF(x) =
\begin{bmatrix}
\sin(x_2) & x_1\cos(x_2)\\[2ex]
x_2^2 & 2x_1x_2 \\[2ex]
\frac{2x_1}{x_1^2+1} & 2
\end{bmatrix}
\]
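As a sanity check (a sketch in the spirit of the finite-difference code above; the test point is an arbitrary choice), one can compare this Jacobian against difference quotients:

```python
import numpy as np

def F(x):
    return np.array([x[0] * np.sin(x[1]),
                     x[0] * x[1]**2,
                     np.log(x[0]**2 + 1) + 2 * x[1]])

def DF(x):
    # The Jacobian matrix computed in the example.
    return np.array([[np.sin(x[1]), x[0] * np.cos(x[1])],
                     [x[1]**2, 2 * x[0] * x[1]],
                     [2 * x[0] / (x[0]**2 + 1), 2.0]])

x, h = np.array([0.7, -0.3]), 1e-6
J = np.column_stack([(F(x + h * e) - F(x)) / h for e in np.eye(2)])
print(np.allclose(J, DF(x), atol=1e-4))  # expect True
```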
Prove that the given function is differentiable on \(\real^2\).
\[
f(x,y) =
\begin{cases}
\frac{x^2y^2}{\sqrt{x^2+y^2}}, & (x,y)\neq (0,0)\\[2ex]
0, & (x,y)=(0,0)
\end{cases}
\]
We compute
\[
\lim_{t\rightarrow 0} \frac{f(0+t,0) - f(0,0)}{t} = \lim_{t\rightarrow 0} \frac{ \frac{0}{\sqrt{t^2}}}{t} = 0
\]
and thus \(\pfrac{f}{x}(0,0)=0\). A similar computation shows that \(\pfrac{f}{y}(0,0)=0\). On the other hand, if \((x,y)\neq (0,0)\) then
\begin{align*}
\pfrac{f}{x}(x,y) &= \frac{xy^2(x^2+2y^2)}{(x^2+y^2)^{3/2}}\\
\pfrac{f}{y}(x,y) &= \frac{x^2y(2x^2+y^2)}{(x^2+y^2)^{3/2}}.
\end{align*}
To prove that \(Df(x,y)\) exists for any \((x,y)\in\real^2\), it is enough to show that \(\pfrac{f}{x}\) and \(\pfrac{f}{y}\) are continuous on \(\real^2\) (Theorem 10.1.5). It is clear that \(\pfrac{f}{x}\) and \(\pfrac{f}{y}\) are continuous on the open set \(U=\real^2\backslash\hspace{-0.3em}\{(0,0)\}\) and thus \(Df\) exists on \(U\). Now consider the continuity of \(\pfrac{f}{x}\) at \((0,0)\). Using polar coordinates \(x=r\cos(\theta)\) and \(y=r\sin(\theta)\), we can write
\begin{align*}
\pfrac{f}{x}(x,y)&= \frac{xy^2(x^2+2y^2)}{(x^2+y^2)^{3/2}} \\[2ex]
&= \frac{ r^3\cos(\theta)\sin^2(\theta)(r^2\cos^2(\theta) + 2r^2\sin^2(\theta)) }{ r^3} \\[2ex]
&= r^2 \cos(\theta)\sin^2(\theta)(\cos^2(\theta) + 2\sin^2(\theta) )
\end{align*}
Now \((x,y)\rightarrow (0,0)\) if and only if \(r\rightarrow 0\). Since \(|\cos(\theta)\sin^2(\theta)(\cos^2(\theta) + 2\sin^2(\theta))| \leq 3\) for all \(\theta\), we have \(\big|\pfrac{f}{x}(x,y)\big| \leq 3r^2\) and thus
\begin{align*}
\lim_{(x,y)\rightarrow (0,0)} \pfrac{f}{x}(x,y) &= \lim_{r\rightarrow 0}\big[ r^2 \cos(\theta)\sin^2(\theta)(\cos^2(\theta) + 2\sin^2(\theta) )\big]\\
& = 0.
\end{align*}
In other words, \(\lim_{(x,y)\rightarrow (0,0)} \pfrac{f}{x}(x,y) = \pfrac{f}{x}(0,0)\) and thus \(\pfrac{f}{x}\) is continuous at \((0,0)\). A similar computation shows that \(\pfrac{f}{y}\) is continuous at \((0,0)\). Hence, by Theorem 10.1.5, \(Df\) exists on \(\real^2\).
If \(F:U\subset\real^n\rightarrow\real^m\) is differentiable on \(U\) and \(m=1\), then \(DF\) is called the gradient of \(F\) and we write \(\nabla F\) instead of \(DF\). Hence, in this case,
\[
\nabla F(x) = \begin{bmatrix} \pfrac{F}{x_1} & \pfrac{F}{x_2} & \cdots & \pfrac{F}{x_n}\end{bmatrix}
\]
On the other hand, if \(n=1\) and \(m\geq 2\) then \(F:U\subset\real \rightarrow\real^m\) is a curve in \(\real^m\). In this case, it is customary to use lower-case letters such as \(c\), \(\alpha\), or \(\gamma\) instead of \(F\), and use \(I\) for the domain instead of \(U\). In any case, since \(c:I\subset\real\rightarrow\real^m\) is a function of one variable we use the notation \(c(t)=(c_1(t),c_2(t), \ldots, c_m(t))\) and the derivative of \(c\) is denoted by
\[
\frac{dc}{dt} = c'(t) = (c_1'(t), c_2'(t), \ldots, c_m'(t))
\]
where all derivatives are derivatives of single-variable, real-valued functions.
Exercises
Let \(f, g: U\subset\real^n\rightarrow\real^m\) be differentiable functions at \(a\in U\). Prove by definition that \(h=f+g\) is differentiable at \(a\) and that \(Dh(a) = Df(a) + Dg(a)\).
Recall that a mapping \(F:\real^n\rightarrow\real^m\) is said to be linear if \(F(x+y) = F(x) + F(y)\) and \(F(\alpha x) = \alpha F(x)\), for all \(x,y\in \real^n\) and \(\alpha \in \real\). Prove that if \(F\) is linear then \(DF(a)=F\) for all \(a\in\real^n\).
Let \(F:\real^n\rightarrow \real^m\) and suppose that there exists \(M \gt 0\) such that \(\norm{F(x)} \leq M \norm{x}^2\) for all \(x\in\real^n\). Prove that \(F\) is differentiable at \(a=0\in\real^n\) and that \(DF(a) = 0\).
Determine if the given function is differentiable at \((x,y)=(0,0)\).
\[
f(x,y) =
\begin{cases}
\frac{xy}{\sqrt{x^2+y^2}}, & (x,y)\neq (0,0)\\[2ex] 0, & (x,y)=(0,0)
\end{cases}
\]
Compute \(DF(x,y,z)\) if \(F(x,y,z) = (z^{xy}, x^2, \tan(xyz))\).
Differentiation Rules and the MVT
Let \(U\subset\real^n\) and \(W\subset\real^m\) be open sets. Suppose that \(F:U\rightarrow\real^m\) is differentiable at \(a\), \(F(U)\subset W\), and \(G:W\rightarrow\real^p\) is differentiable at \(F(a)\). Then \((G\circ F):U\rightarrow\real^p\) is differentiable at \(a\) and
\[
D(G\circ F)(a) = DG(F(a)) \circ DF(a)
\]
Verify the chain rule for the composite function \(H=G\circ F\) where \(F:\real^3\rightarrow\real^2\) and \(G:\real^2\rightarrow\real^2\) are
\begin{align*}
F(x_1,x_2,x_3) &= \begin{bmatrix} x_1 - 3x_2 \\[2ex] x_1x_2x_3 \end{bmatrix}\\
G(y_1,y_2) &= \begin{bmatrix} 2y_1 + y_2 \\[2ex] \sin(y_2) \end{bmatrix}.
\end{align*}
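A numerical check of this computation (our own sketch; the Jacobians below are computed by hand from \(F\) and \(G\), and the test point is arbitrary):

```python
import numpy as np

def F(x):
    return np.array([x[0] - 3 * x[1], x[0] * x[1] * x[2]])

def G(y):
    return np.array([2 * y[0] + y[1], np.sin(y[1])])

def DF(x):
    return np.array([[1.0, -3.0, 0.0],
                     [x[1] * x[2], x[0] * x[2], x[0] * x[1]]])

def DG(y):
    return np.array([[2.0, 1.0],
                     [0.0, np.cos(y[1])]])

# Chain rule: D(G o F)(a) = DG(F(a)) DF(a); compare with difference quotients.
a, h = np.array([0.5, -1.0, 2.0]), 1e-6
H = lambda x: G(F(x))
J_fd = np.column_stack([(H(a + h * e) - H(a)) / h for e in np.eye(3)])
print(np.allclose(J_fd, DG(F(a)) @ DF(a), atol=1e-4))  # expect True
```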
An important special case of the chain rule is the composition of a curve \(\gamma:I\subset\real\rightarrow\real^n\) with a function \(f:U\subset\real^n\rightarrow\real\). The composite function \(f\circ\gamma:I\rightarrow\real\) is a single-variable and single-valued function. In this case, if \(\gamma'(t)\) exists for all \(t\in I\) and \(f\) is differentiable on \(U\) then
\begin{align*}
D(f\circ\gamma)(t) &= \nabla f (\gamma(t)) \cdot \gamma'(t)\\
& = \begin{bmatrix} \pfrac{f}{x_1}&\pfrac{f}{x_2}&\cdots&\pfrac{f}{x_n}\end{bmatrix}
\begin{bmatrix}\gamma_1'(t)\\ \gamma_2'(t) \\ \vdots \\ \gamma_n'(t)\end{bmatrix}\\
& = \sum_{i=1}^n \pfrac{f}{x_i}(\gamma(t)) \gamma'_i(t).
\end{align*}
In the case that \(\gamma(t) = a + te\) and \(e\in\real^n\) is a unit vector, that is, \(\norm{e}=1\), then
\begin{align*}
\lim_{t\rightarrow 0} \frac{f(a+te) - f(a)}{t} &= D(f\circ\gamma)(0)\\
& = \nabla f(\gamma(0)) \cdot \gamma'(0)\\
& = \nabla f (a) \cdot e
\end{align*}
is called the directional derivative of \(f\) at \(a\) in the direction \(e\in\real^n\).
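The formula \(\nabla f(a)\cdot e\) can be checked against the difference quotient directly; a minimal sketch with a test function of our own choosing:

```python
import numpy as np

def f(x):
    return x[0]**2 * x[1] + np.sin(x[1])

def grad_f(x):
    # Gradient computed by hand: (2xy, x^2 + cos y).
    return np.array([2 * x[0] * x[1], x[0]**2 + np.cos(x[1])])

a = np.array([1.0, 0.5])
e = np.array([3.0, 4.0]) / 5.0       # a unit vector
t = 1e-6
print((f(a + t * e) - f(a)) / t)     # difference quotient along e
print(grad_f(a) @ e)                 # directional derivative; agrees
```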
Let \(f:\real\rightarrow\real\) and \(F:\real^2\rightarrow\real\) be differentiable and suppose that \(F(x,f(x)) = 0\). Prove that if \(\pfrac{F}{y}\neq 0\) then \(f'(x) = - \frac{\partial F/\partial x}{\partial F/\partial y}\) where \(y=f(x)\).
Below is a version of the product rule for multi-variable functions.
Let \(U\subset\real^n\) be open and suppose that \(F:U\rightarrow\real^m\) and \(g:U\rightarrow \real\) are differentiable at \(a\in U\). Then the function \(G= gF:U\rightarrow\real^m\) is differentiable at \(a\in U\) and
\[
D(gF)(a) = F(a)\, \nabla g(a) + g(a)\, DF(a)
\]
where \(F(a)\,\nabla g(a)\) denotes the \(m\times n\) matrix obtained by multiplying the column vector \(F(a)\in\real^m\) by the row vector \(\nabla g(a)\).
Verify the product rule for \(G= gF\) if \(g:\real^3\rightarrow\real\) and \(F:\real^3\rightarrow\real^3\) are
\begin{align*}
g(x_1,x_2,x_3) &= x_1^2x_3 - e^{x_2}\\
F(x_1,x_2,x_3) &= \begin{bmatrix} x_1x_2 \\ \ln(x_3^2+1) \\ 3x_1-x_2-x_3\end{bmatrix}
\end{align*}
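A numerical check (our own sketch; \(\nabla g\) and \(DF\) below are computed by hand, and the outer product \(F(a)\nabla g(a)\) realizes the first term of the product rule):

```python
import numpy as np

def g(x):
    return x[0]**2 * x[2] - np.exp(x[1])

def grad_g(x):
    return np.array([2 * x[0] * x[2], -np.exp(x[1]), x[0]**2])

def F(x):
    return np.array([x[0] * x[1], np.log(x[2]**2 + 1), 3 * x[0] - x[1] - x[2]])

def DF(x):
    return np.array([[x[1], x[0], 0.0],
                     [0.0, 0.0, 2 * x[2] / (x[2]**2 + 1)],
                     [3.0, -1.0, -1.0]])

a, h = np.array([1.0, 0.2, -0.7]), 1e-6
gF = lambda x: g(x) * F(x)
J_fd = np.column_stack([(gF(a + h * e) - gF(a)) / h for e in np.eye(3)])
J = np.outer(F(a), grad_g(a)) + g(a) * DF(a)   # product rule
print(np.allclose(J_fd, J, atol=1e-4))          # expect True
```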
Let \(f, g: \real^n\rightarrow\real\) be differentiable functions. Find an expression of \(\nabla (fg)\) in terms of \(f,g,\nabla f\), and \(\nabla g\).
Let \(f:U\subset\real^n\rightarrow\real\) be a differentiable function. Suppose that \(\gamma:[a,b]\rightarrow\real^n\) is differentiable. Prove that \(f(\gamma(t)) = f(\gamma(a))\) for all \(t\in [a,b]\) if and only if \(\nabla f(\gamma(t))\cdot \gamma'(t) = 0\) for all \(t\in [a,b]\).
Recall the mean value theorem (MVT) on \(\real\). If \(f:[a,b]\rightarrow\real\) is continuous on \([a,b]\) and differentiable on \((a,b)\) then there exists \(c\in (a,b)\) such that \(f(b)-f(a) = f'(c) (b-a)\). The MVT does not generally hold for a function \(F:U\subset\real^n\rightarrow\real^m\) without some restrictions on \(U\) and, more importantly, on \(m\). For instance, consider \(f:[0,1]\rightarrow\real^2\) defined by \(f(x) = (x^2, x^3)\). Then \(f(1)-f(0) = (1,1)-(0,0) = (1,1)\) while \(f'(c) (1-0) = (2c, 3c^2)\) and there is no \(c\in\real\) such that \((1,1) = (2c, 3c^2)\). With regards to the domain \(U\), we will be able to generalize the MVT for points \(a, b\in U\) provided all points on the line segment joining \(a\) and \(b\) are contained in \(U\). Specifically, the line segment joining \(x, y\in U\) is the set of points
\[
\{ z \in\real^n\;|\; z = (1-t)x + t y,\; t\in [0,1]\}.
\]
Hence, the image of the curve \(\gamma:[0,1]\rightarrow\real^n\) given by \(\gamma(t) = (1-t)x + ty\) is the line segment joining \(x\) and \(y\). Even if \(U\subset\real^n\) is open, the line segment joining \(x,y\in U\) may not be contained in \(U\) (see Figure 10.1).
Let \(U\subset\real^n\) be open and assume that \(f:U\rightarrow\real\) is differentiable on \(U\). Let \(x,y\in U\) and suppose that the line segment joining \(x, y \in U\) is contained entirely in \(U\). Then there exists \(c\) on the line segment joining \(x\) and \(y\) such that \(f(y) - f(x) = Df(c) (y-x)\).
Let \(\gamma(t) = (1-t)x + ty\) for \(t\in [0,1]\). By assumption, \(\gamma(t) \in U\) for all \(0\leq t\leq 1\). Consider the function \(h(t) = f(\gamma(t))\) on \([0,1]\). Then \(h\) is continuous on \([0,1]\) and by the chain rule is differentiable on \((0,1)\). Hence, applying the MVT on \(\real\) to \(h\) there exists \(t^*\in (0,1)\) such that \(h(1) - h(0) = h'(t^*) (1-0)\). Now \(h(0) = f(\gamma(0)) = f(x)\) and \(h(1) = f(\gamma(1)) = f(y)\), and by the chain rule,
\begin{align*}
h'(t^*) &= Df(\gamma(t^*)) \gamma'(t^*)\\
&= Df(\gamma(t^*)) (y-x).
\end{align*}
Hence,
\[
f(y) - f(x) = Df(\gamma(t^*)) (y-x)
\]
and the proof is complete.
Let \(U\subset\real^n\) be open and assume that \(F=(f_1,f_2,\ldots,f_m):U\rightarrow\real^m\) is differentiable on \(U\). Let \(x,y\in U\) and suppose that the line segment joining \(x, y \in U\) is contained entirely in \(U\). Then there exists \(c_1, c_2, \ldots, c_m \in U\) on the line segment joining \(x\) and \(y\) such that \(f_i(y) - f_i(x) = Df_i (c_i) (y-x)\) for \(i=1,2,\ldots,m\).
Apply the MVT to each component function \(f_i:U\rightarrow\real\); note that the point \(c_i\) on the segment may be different for each component.
A set \(U\subset\real^n\) is said to be convex if for any \(x,y\in U\) the line segment joining \(x\) and \(y\) is contained in \(U\). Let \(F:U\rightarrow\real^m\) be differentiable. Prove that if \(U\) is an open convex set and \(DF=0\) on \(U\) then \(F\) is constant on \(U\).
Exercises
Let \(U\subset\real^n\) be an open set satisfying the following property: for any \(x,y\in U\) there is a continuous curve \(\gamma:[0,1]\rightarrow\real^n\) such that \(\gamma\) is differentiable on \((0,1)\) and \(\gamma(0) = x\) and \(\gamma(1) = y\).
- Give an example of a non-convex set \(U\subset\real^2\) satisfying the above property.
- Prove that if \(U\) satisfies the above property and \(f:U\rightarrow\real\) is differentiable on \(U\) with \(Df=0\) then \(f\) is constant on \(U\).
The Space of Linear Maps
Let \(U\) be an open subset of \(\real^n\). Recall that if \(F:U\rightarrow\real^n\) is differentiable at each \(x\in U\) then \(DF:U\rightarrow\mathcal{L}(\real^n;\real^n)\) denotes the derivative of \(F\) on \(U\). The space of linear maps \(\mathcal{L}(\real^n;\real^n)\) is a vector space which, after a choice of bases, can be identified with the space \(\real^{n\times n}\) of \(n\times n\) matrices.
Solutions to Differential Equations
A differential equation on \(\real^n\) is an equation of the form \begin{equation}\label{eqn:ode} x'(t) = F(x(t)) \end{equation} where \(F:\real^n\rightarrow\real^n\) is a given function and \(x:\real\rightarrow\real^n\) is the unknown in \eqref{eqn:ode}. A solution to \eqref{eqn:ode} is a curve \(\gamma:I\rightarrow\real^n\) such that \[ \gamma'(t) = F(\gamma(t)) \] where \(I\subset\real\) is an interval, possibly infinite. If \(F\) is defined only on an open subset \(U\subset\real^n\), a solution is in addition required to satisfy \(\gamma(t)\in U\) for all \(t\in I\).
Let \(U\subset\real^n\) be an open set and let \(F:U\rightarrow\real^n\) be a differentiable function with a continuous derivative. Then for each \(x_0\in U\) there exist an interval \(I\) containing \(0\) and a unique solution \(\gamma:I\rightarrow U\) of \eqref{eqn:ode} satisfying \(\gamma(0)=x_0\).
High-Order Derivatives
In this section, we consider high-order derivatives of a differentiable mapping \(F:U\subset\real^n\rightarrow\real^m\). To do this, we will need to make an excursion into the world of multilinear algebra. Even though we will discuss high-order derivatives for functions on Euclidean spaces, it will be convenient to first work with general vector spaces.
Let \(V_1,V_2,\ldots,V_k\) and \(W\) be vector spaces. A mapping \( T :V_1\times V_2\times\cdots\times V_k\rightarrow W\) is said to be a \(k\)-multilinear map if \( T \) is linear in each variable separately. Specifically, for any \(i\in\{1,2,\ldots,k\}\), and any \(v_j \in V_j\) for \(j\neq i\), the mapping \( T _i : V_i \rightarrow W\) defined by
\[
T _i (x) = T (v_1,v_2,\ldots,v_{i-1}, x, v_{i+1}, \ldots, v_k)
\]
is a linear mapping.
A \(1\)-multilinear mapping is just a linear mapping. A \(2\)-multilinear mapping is called a bilinear mapping. Hence, \( T :V_1\times V_2\rightarrow W\) is bilinear if
\begin{align*}
T (\alpha u+\beta v,w) &= T (\alpha u, w) + T (\beta v,w)\\
& = \alpha T (u,w) + \beta T (v,w)
\end{align*}
and
\begin{align*}
T (u,\alpha w+\beta y) &= T (u,\alpha w) + T (u,\beta y) \\
&= \alpha T (u,w) + \beta T (u,y)
\end{align*}
for all \(u,v\in V_1\), \(w\in V_2\), and \(\alpha,\beta\in\real\). Roughly speaking, a multilinear mapping is a special type of multivariable polynomial function. We will make this precise after presenting a few examples.
Consider \( T :\real\times\real\rightarrow\real\) defined as \( T (x,y) = 2xy\). As can be easily verified, \( T \) is bilinear. On the other hand, if \( T (x,y) = x^2 + y^2\) then \( T \) is not bilinear since for example \( T (\alpha x, y) = \alpha^2 x^2 + y^2 \neq \alpha T (x,y)\) in general, or \( T (a+b, y) = (a+b)^2 + y^2 \neq T (a,y) + T (b,y)\) in general. What about \(T(x,y) = 2xy + y^3\)?
Let \(\{v_1,v_2,\ldots,v_p\}\) be a set of vectors in \(\real^n\) and suppose that \(x=\sum_{i=1}^p x_i v_i\) and \(y=\sum_{i=1}^p y_i v_i\). If \(T:\real^n\times\real^n\rightarrow\real^m\) is bilinear then expand \(T(x,y)\) so that it depends only on \(x_i, y_j\) and \(T(v_i, v_j)\) for \(1\leq i,j\leq p\).
Let \(M\) be an \(n\times n\) matrix and define \( T :\real^n\times\real^n\rightarrow\real\) as \( T (u,v) = u^T M v\). Show that \(T\) is bilinear. For instance, if say \(M=\left[\begin{smallmatrix}1&-3\\0&1\end{smallmatrix}\right]\) then
\begin{align*}
T (u,v) &= [u_1\;u_2] \begin{bmatrix}1&-3\\0&1\end{bmatrix} \begin{bmatrix}v_1\\v_2\end{bmatrix} \\
&= u_1v_1 - 3u_1v_2 +u_2v_2.
\end{align*}
Notice that \(T(u,v)\) is a polynomial in the components of \(u\) and \(v\).
The function that returns the determinant of a matrix is multilinear in the columns of the matrix. Specifically, if say \(A=[a_1+b_1\; a_2\; \cdots \; a_n]\in\real^{n\times n}\) then
\[
\det(A) = \det([a_1\; a_2\; \cdots \; a_n]) + \det([b_1\; a_2\; \cdots \; a_n])
\]
and if \(A=[\alpha a_1 \; a_2 \; \cdots \; a_n]\) then
\[
\det(A) = \alpha \det([a_1\; a_2\; \cdots \; a_n]).
\]
These facts are proved by expanding the determinant along the first column. The same is true if we perform the same computation with a different column of \(A\). In the case of a \(2\times 2\) matrix \(A=\left[\begin{smallmatrix} x_1 & y_1\\ x_2 & y_2\end{smallmatrix}\right]\) we have
\[
\det(A) = x_1 y_2 - y_1 x_2
\]
and if \(A\) is a \(3\times 3\) matrix with columns \(x=(x_1,x_2,x_3)\), \(y=(y_1,y_2,y_3)\), and \(z=(z_1,z_2,z_3)\) then
\begin{align*}
\det(A) &= \det([x\; y\; z])\\
& = x_1y_2z_3 - x_1y_3z_2 - x_2y_1z_3 + x_2y_3z_1 + x_3y_1z_2 - x_3y_2z_1.
\end{align*}
We now make precise the statement that a multilinear mapping is a (special type of) multivariable polynomial function. For simplicity, and since this will be the case when we consider high-order derivatives, we consider \(k\)-multilinear mappings \( T :\real^n\times \real^n\times\cdots \times \real^n\rightarrow \real^m\). For a positive integer \(k\geq 1\) let \((\real^n)^k = \real^n\times\real^n\times\cdots\times\real^n\) where on the right-hand-side \(\real^n\) appears \(k\)-times. Let \(\mathcal{L}^k(\real^n,\real^m)\) denote the space of \(k\)-multilinear maps from \((\real^n)^k\) to \(\real^m\). It is easy to see that \(\mathcal{L}^k(\real^n,\real^m)\) is a vector space under the natural notion of addition and scalar \(\real\)-multiplication. In what follows we consider the case \(k=3\); the general case is similar but requires more notation. Hence, suppose that \(T:(\real^n)^3\rightarrow\real^m\) is a multilinear mapping and let \(x=(x_1,x_2,\ldots,x_n)\), \(y=(y_1,y_2,\ldots,y_n)\), and \(z=(z_1,z_2,\ldots,z_n)\). Then \(x=\sum_{i=1}^n x_i e_i\) where \(e_i\) is the \(i\)th standard basis vector of \(\real^n\), and similarly for \(y\) and \(z\). Therefore, by multilinearity of \(T\) we have
\begin{align*}
T (x,y,z) &= T \left( \sum_{i=1}^n x_i e_i, \sum_{j=1}^n y_j e_j, \sum_{k=1}^n z_k e_k \right) \\[2ex]
&= \sum_{i=1}^n \sum_{j=1}^n \sum_{k=1}^n x_{i}y_{j} z_k \cdot T (e_{i},e_{j},e_{k}).
\end{align*}
Thus, to compute \( T (x,y,z)\) for any \(x,y,z\in\real^n\), we need only know the values \( T (e_i,e_j,e_k)\in\real^m\) for all triples \((i,j,k)\) with \(1\leq i,j,k\leq n\). If we set
\[
T (e_i,e_j,e_k) = (A^1_{i,j,k}, A^2_{i,j,k}, \ldots, A^m_{i,j,k})
\]
where the superscripts are not exponents but indices, then from our computation above
\[
T (x,y,z) =
\begin{bmatrix}
\displaystyle\sum_{i,j,k=1}^n A^1_{i,j,k}\cdot x_{i}y_{j} z_k\\[2ex]
\displaystyle\sum_{i,j,k=1}^n A^2_{i,j,k}\cdot x_{i}y_{j} z_k\\[2ex]
\vdots \\[2ex]
\displaystyle\sum_{i,j,k=1}^n A^m_{i,j,k} \cdot x_{i}y_{j} z_k
\end{bmatrix}.
\]
Notice that the component functions of \( T \) are multilinear, specifically, the mapping
\[
(x,y,z)\mapsto T_r(x,y,z) = \sum_{i,j,k=1}^n A^r_{i,j,k} \cdot x_{i}y_{j} z_k
\]
is multilinear for each \(r=1,2,\ldots,m\). The \(n^3 m\) numbers \(A^r_{i,j,k}\in\real\) for \(1\leq i,j,k\leq n\) and \(1\leq r\leq m\) completely determine the multilinear mapping \(T\), and we call these the coefficients of the multilinear mapping \(T\) in the standard bases.
The general case \(k\geq 1\) is just more notation. If \(T:(\real^n)^k \rightarrow \real^m\) is \(k\)-multilinear then there exist \(n^k m\) unique coefficients \(A^r_{i_1,i_2,\ldots,i_k}\), where \(1\leq i_1,i_2,\ldots,i_k\leq n\) and \(1\leq r\leq m\), such that for any vectors \(u_1, u_2, \ldots, u_k \in \real^n\) it holds that
\[
T(u_1, u_2, \ldots, u_k) = \sum_{r=1}^m \left( \sum_{i_1=1}^n \sum_{i_2=1}^n\cdots\sum_{i_k=1}^n A^r_{i_1,i_2,\ldots,i_k}\cdot u_{1,i_1}u_{2,i_2}\cdots u_{k,i_k}\right) e_r
\]
where \(e_1,e_2,\ldots,e_m\) are the standard basis vectors in \(\real^m\).
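In coordinates, evaluating a \(k\)-multilinear map is thus a tensor contraction against its coefficient array. A minimal Python sketch for \(k=3\) (the random coefficients are purely illustrative):

```python
import numpy as np

n, m = 3, 2
rng = np.random.default_rng(0)
# A[r, i, j, k] plays the role of the coefficient A^r_{i,j,k}.
A = rng.standard_normal((m, n, n, n))

def T(x, y, z):
    # T(x,y,z)_r = sum_{i,j,k} A^r_{i,j,k} x_i y_j z_k
    return np.einsum('rijk,i,j,k->r', A, x, y, z)

# Multilinearity in the first slot: T(2x + 3u, y, z) = 2 T(x,y,z) + 3 T(u,y,z).
x, u, y, z = rng.standard_normal((4, n))
print(np.allclose(T(2 * x + 3 * u, y, z), 2 * T(x, y, z) + 3 * T(u, y, z)))
```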
A multilinear mapping \(T\in \mathcal{L}^k(\real^n,\real^m)\) is said to be symmetric if the value of \(T\) is unchanged after an arbitrary permutation of the inputs to \(T\). In other words, \(T\) is symmetric if for any \(v_1,v_2,\ldots,v_k\in\real^n\) it holds that
\[
T(v_1,v_2,\ldots,v_k) = T(v_{\sigma(1)}, v_{\sigma(2)}, \ldots, v_{\sigma(k)})
\]
for any permutation \(\sigma:\{1,2,\ldots,k\}\rightarrow\{1,2,\ldots,k\}\). For instance, if \(T:(\real^n)^3\rightarrow\real^m\) is symmetric then for any \(u_1,u_2,u_3\in\real^n\) it holds that
\begin{align*}
T(u_1,u_2,u_3) &=T(u_1,u_3,u_2)\\
&=T(u_2,u_1,u_3)\\
&=T(u_2,u_3,u_1)\\
&=T(u_3,u_1,u_2)\\
&=T(u_3,u_2,u_1).
\end{align*}
Consider \(T:\real^2\times\real^2\rightarrow\real\) defined by
\[
T(x,y) = 2x_1y_1+3x_1y_2+3y_1x_2-x_2y_2.
\]
Then
\begin{align*}
T(y, x) &= 2y_1x_1 + 3y_1x_2 + 3x_1y_2 - y_2x_2\\ &= T(x,y)
\end{align*}
and therefore \(T\) is symmetric. Notice that
\begin{align*}
T(x,y) &= [x_1\; x_2] \begin{bmatrix}2&3\\3&-1\end{bmatrix} \begin{bmatrix}y_1\\y_2\end{bmatrix}\\
&=x^T M y
\end{align*}
and the matrix \(M=\left[\begin{smallmatrix}2&3\\3&-1\end{smallmatrix}\right]\) is symmetric.
Having introduced the very basics of multilinear mappings, we can proceed with discussing high-order derivatives of vector-valued multivariable functions. Suppose then that \(F:U\rightarrow\real^m\) is differentiable on the open set \(U\subset\real^n\) and as usual let \(DF:U\rightarrow \mathcal{L}(\real^n,\real^m)\) denote the derivative. Now \(\mathcal{L}(\real^n,\real^m)\) is a finite dimensional vector space and can be equipped with a norm (all norms on a given finite dimensional vector space are equivalent). Thus, we can speak of differentiability of \(DF\), namely, \(DF\) is differentiable at \(a\in U\) if there exists a linear mapping \(L:\real^n \rightarrow \mathcal{L}(\real^n,\real^m)\) such that
\[
\lim_{x\rightarrow a} \frac{\norm{DF(x) - DF(a) - L (x-a)}}{\norm{x-a}} = 0.
\]
If such an \(L\) exists then we denote it by \(L=D(DF)(a)\). To simplify the notation, we write instead \(D(DF)(a) = D^2F(a)\). Hence, \(DF\) is differentiable at \(a\in U\) if there exists a linear mapping \(D^2 F(a):\real^n\rightarrow\mathcal{L}(\real^n,\real^m)\) such that
\[
\lim_{x\rightarrow a} \frac{\norm{DF(x) - DF(a) - D^2F(a) (x-a)}}{\norm{x-a}} = 0.
\]
To say that \(D^2F(a)\) is a linear mapping from \(\real^n\) to \(\mathcal{L}(\real^n,\real^m)\) is to say that
\[
D^2F(a) \in \mathcal{L}(\real^n,\mathcal{L}(\real^n,\real^m)).
\]
Let us focus our attention on the space \(\mathcal{L}(\real^n,\mathcal{L}(\real^n,\real^m))\). If \(L \in \mathcal{L}(\real^n,\mathcal{L}(\real^n,\real^m))\) then \(L(v) \in \mathcal{L}(\real^n,\real^m)\) for each \(v\in \real^n\), and moreover the assignment \(v\mapsto L(v)\) is linear, i.e., \(L(\alpha v + \beta u) = \alpha L(v) + \beta L(u)\). Now, since \(L(v)\in \mathcal{L}(\real^n,\real^m)\), we have that
\[
L(v)(\alpha u + \beta w) = \alpha L(v)(u) + \beta L(v)(w).
\]
In other words, the mapping
\[
(u,v) \mapsto L(u)(v)
\]
is bilinear! Hence, \(L\) defines (uniquely) a bilinear map \(T:\real^n\times \real^n\rightarrow\real^m\) by
\[
T(u,v) = L(u)(v)
\]
and the assignment \(L\mapsto T\) is linear. Conversely, to any bilinear map \(T:\real^n\times\real^n\rightarrow\real^m\) we associate an element \(L\in \mathcal{L}(\real^n,\mathcal{L}(\real^n,\real^m))\) defined as
\[
L(u)(v) = T(u,v)
\]
and the assignment \(T\mapsto L\) is linear. We have therefore proved the following.
Let \(V\) and \(W\) be vector spaces. The vector space \(\mathcal{L}(V,\mathcal{L}(V,W))\) is isomorphic to the vector space \(\mathcal{L}^2(V,W)\) of multilinear maps from \(V\times V\) to \(W\).
The punchline is that \(D^2F(a)\in\mathcal{L}(\real^n,\mathcal{L}(\real^n,\real^m))\) can be viewed in a natural way as a bilinear mapping \(D^2F(a):\real^n\times\real^n\rightarrow\real^m\) and thus from now on we write \(D^2F(a)(u,v)\) instead of the more cumbersome \(D^2F(a)(u)(v)\).
We now determine a coordinate expression for \(D^2F(a)(u,v)\). First of all, if \(F=(f_1,f_2,\ldots,f_m)\) then \(F(x) = \sum_{j=1}^m f_j(x) e_j\) where \(\{e_1,e_2,\) \(\ldots,e_m\}\) is the standard basis of \(\real^m\). Since each \(e_j\) is a constant vector, linearity of the derivative gives \(DF(x)=\sum_{j=1}^m Df_j(x) e_j\) and also \(D^2F(x) = \sum_{j=1}^m D^2f_j(x) e_j\). Therefore,
\[
D^2F(a)(u,v) = \sum_{j=1}^m D^2f_j(a)(u,v) e_j.
\]
This shows that we need only consider \(D^2f\) for \(\real\)-valued functions \(f:U\subset\real^n\rightarrow\real\). Now,
\[
Df = \begin{bmatrix}\pfrac{f}{x_1} & \pfrac{f}{x_2} & \cdots & \pfrac{f}{x_n}\end{bmatrix}
\]
and thus the Jacobian of \(Df:U\rightarrow\real^n\) is (Theorem 10.1.2)
\[
D^2f = \begin{bmatrix}
\pfrac{^2f}{x_1x_1} & \pfrac{^2f}{x_2x_1} & \cdots & \pfrac{^2f}{x_nx_1} \\[2ex]
\pfrac{^2f}{x_1x_2} & \pfrac{^2f}{x_2x_2} & \cdots & \pfrac{^2f}{x_nx_2} \\[2ex]
\vdots & \vdots & \ddots & \vdots \\[2ex]
\pfrac{^2f}{x_1x_n} & \pfrac{^2f}{x_2x_n} & \cdots & \pfrac{^2f}{x_nx_n}
\end{bmatrix}.
\]
Therefore,
\[
D^2f(a)(e_i, e_j) = \pfrac{^2f}{x_i x_j}(a).
\]
Therefore, for any \(u=(u_1,u_2,\ldots,u_n)\) and \(v=(v_1,v_2,\ldots,v_n)\), by multilinearity we have
\[
D^2f(a)(u,v) = \sum_{i=1}^n\sum_{j=1}^n \pfrac{^2f}{x_ix_j}(a) u_i v_j.
\]
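In matrix terms this says \(D^2f(a)(u,v) = u^T H v\), where \(H\) is the matrix of second partials above. A small sketch with a test function of our own choosing:

```python
import numpy as np

def hessian(x):
    # For f(x,y) = x^3 y + x y^2 (our test function), computed by hand:
    # f_xx = 6xy, f_xy = f_yx = 3x^2 + 2y, f_yy = 2x.
    return np.array([[6 * x[0] * x[1], 3 * x[0]**2 + 2 * x[1]],
                     [3 * x[0]**2 + 2 * x[1], 2 * x[0]]])

a = np.array([0.5, -1.0])
u = np.array([1.0, 2.0])
v = np.array([-1.0, 3.0])
# D^2 f(a)(u,v) = sum_{i,j} f_{x_i x_j}(a) u_i v_j = u^T H v
print(u @ hessian(a) @ v)
```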
Now, if all second order partials of \(f\) are defined and continuous on \(U\) we can say more. Let us first introduce some terminology. We say that \(f:U\subset\real^n\rightarrow\real\) is of class \(C^k\) if all partial derivatives up to and including order \(k\) of \(f\) are continuous functions on \(U\).
Let \(U\subset\real^n\) be an open set and suppose that \(f:U\rightarrow\real\) is of class \(C^2\). Then
\[
\pfrac{^2f}{x_i x_j} = \pfrac{^2f}{x_jx_i}
\]
on \(U\) for all \(1\leq i,j\leq n\). Consequently, \(D^2f(a)\) is a symmetric bilinear map on \(\real^n\times\real^n\).
If we now go back to a vector-valued function \(F:U\rightarrow\real^m\) with components \(F=(f_1,f_2,\ldots,f_m)\), and if \(D^2F(a)\) exists at \(a\in U\), then
\[
D^2F(a)(u,v) = \begin{bmatrix} \sum_{i,j=1}^n \pfrac{^2f_1}{x_ix_j}(a) u_i v_j \\[2ex]
\sum_{i,j=1}^n \pfrac{^2f_2}{x_ix_j}(a) u_i v_j \\[2ex]
\vdots \\[2ex]
\sum_{i,j=1}^n \pfrac{^2f_m}{x_ix_j}(a) u_i v_j\end{bmatrix}
\]
Higher-order derivatives of \(F:U\rightarrow\real^m\) can be treated similarly. If \(D^{k-1}F:U\rightarrow\mathcal{L}^{k-1}(\real^n,\real^m)\) is differentiable at \(a\in U\) then we denote the derivative at \(a\) by \(D(D^{k-1}F)(a)=D^{k}F(a)\). Then \(D^kF(a):\real^n\rightarrow \mathcal{L}^{k-1}(\real^n,\real^m)\) is a linear map, that is,
\[
D^{k}F(a) \in \mathcal{L}(\real^n, \mathcal{L}^{k-1}(\real^n,\real^m)).
\]
The vector space \(\mathcal{L}(\real^n, \mathcal{L}^{k-1}(\real^n,\real^m))\) is isomorphic to the space of \(k\)-multilinear maps \(\mathcal{L}^{k}(\real^n, \real^m)\). The value of \(D^{k}F(a)\) at \(u_1,u_2,\ldots,u_{k}\in\real^n\) is denoted by \(D^{k}F(a)(u_1,u_2,\ldots,u_{k})\). Moreover, \(D^kF(a)\) is a symmetric \(k\)-multilinear map at each \(a\in U\) if \(F\) is of class \(C^k\). If \(f:U\subset\real^n\rightarrow\real\) is of class \(C^k\) then for vectors \(u_1,u_2,\ldots,u_k\in \real^n\) we have
\[
D^k f(a) (u_1, u_2, \ldots, u_k) = \sum_{1\leq i_1,i_2,\ldots,i_k\leq n} \pfrac{^k f}{x_{i_1}\partial x_{i_2}\cdots\partial x_{i_k}} (a)u_{1,i_1}u_{2,i_2}\cdots u_{k,i_k}
\]
where the summation is over all \(k\)-tuples \((i_1,i_2,\ldots,i_k)\) where \(i_j \in \{1,2,\ldots,n\}\). Hence, there are \(n^k\) terms in the above summation. In the case that \(u_1=u_2=\cdots=u_k=x\), the above expression takes the form
\[
D^k f(a)(x,x,\ldots,x) = \sum_{1\leq i_1,i_2,\ldots,i_k\leq n} \pfrac{^k f}{x_{i_1}\partial x_{i_2}\cdots\partial x_{i_k}} (a)x_{i_1}x_{i_2}\cdots x_{i_k}
\]
Compute \(D^3f(a)(u,v,w)\) if \(f(x,y) = \sin(x-2y)\), \(a=(0,0)\), and \(u,v,w\in\real^2\). Also compute \(D^3f(a)(u,u,u)\).
We compute that \(f(0,0) = 0\) and
\begin{align*}
f_x &= \cos(x-2y)\\
f_y &= -2\cos(x-2y)
\end{align*}
and then
\begin{align*}
f_{xx} &= -\sin(x-2y) \\
f_{xy} &= f_{yx} = 2\sin(x-2y) \\
f_{yy} &= -4\sin(x-2y)
\end{align*}
and then
\begin{align*}
f_{xxx} &= -\cos(x-2y)\\
f_{yyy} &= 8\cos(x-2y) \\
f_{xxy}&=f_{xyx}=f_{yxx} = 2\cos(x-2y)\\
f_{xyy}&=f_{yxy}=f_{yyx} = -4\cos(x-2y)
\end{align*}
Then,
\begin{align*}
D^3f(a)(u,v,w) &= f_{xxx}(a) u_1v_1w_1 + f_{xxy}(a) u_1v_1w_2 + f_{xyx}(a) u_1v_2w_1\\
& + f_{xyy}(a)u_1v_2w_2 + f_{yxx}(a)u_2v_1w_1 + f_{yxy}(a)u_2v_1w_2\\
& + f_{yyx}(a) u_2v_2w_1 + f_{yyy}(a) u_2v_2w_2\\[2ex]
&= -u_1v_1w_1 + 2u_1v_1w_2 + 2u_1v_2w_1-4u_1v_2w_2 \\
& + 2u_2v_1w_1 -4u_2v_1w_2-4u_2v_2w_1 + 8 u_2v_2w_2\\[2ex]
&= -u_1v_1w_1 + 2(u_1v_1w_2+u_1v_2w_1+u_2v_1w_1) \\
& - 4(u_1v_2w_2+u_2v_1w_2+u_2v_2w_1)+8u_2v_2w_2
\end{align*}
If \(u=v=w\) then
\[
D^3f(a)(u,u,u) = -u_1^3 + 6u_1^2u_2 - 12 u_1u_2^2 + 8u_2^3
\]
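The third-order partials above can be double-checked symbolically; a sketch using sympy (our own script):

```python
import sympy as sp

x, y, u1, u2 = sp.symbols('x y u1 u2')
f = sp.sin(x - 2 * y)
var, u = [x, y], [u1, u2]

# D^3 f(0,0)(u,u,u): sum the third partials over all index triples (i,j,k).
expr = sum(sp.diff(f, var[i], var[j], var[k]) * u[i] * u[j] * u[k]
           for i in range(2) for j in range(2) for k in range(2))
print(sp.factor(sp.expand(expr.subs({x: 0, y: 0}))))
# -> -(u1 - 2*u2)**3, i.e. -u1**3 + 6*u1**2*u2 - 12*u1*u2**2 + 8*u2**3
```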
Taylor's Theorem
Taylor's theorem for a function \(f:\real^n\rightarrow\real\) is as follows.
Let \(U\subset\real^n\) be an open set and suppose that \(f:U\rightarrow\real\) is of class \(C^{r+1}\) on \(U\). Let \(a\in U\) and suppose that the line segment between \(a\) and \(x\in U\) lies entirely in \(U\). Then there exists \(c\in U\) on the line segment such that
\[
f(x) = f(a) + \sum_{k=1}^r \frac{1}{k!} D^k f(a) (x-a, x-a, \ldots, x-a) + R_r(x)
\]
where
\[
R_r(x) = \frac{1}{(r+1)!} D^{r+1}f(c)(x-a,x-a,\ldots,x-a).
\]
Furthermore,
\[
\lim_{x\rightarrow a} \frac{R_r(x)}{\norm{x-a}^r} = 0
\]
If \(x=a+h\) in Taylor's theorem then
\begin{align*}
f(a+h) &= f(a) + \sum_{k=1}^r \frac{1}{k!} D^k f(a) (h, h, \ldots, h) \\
& + \frac{1}{(r+1)!} D^{r+1}f(c)(h,h,\ldots,h)
\end{align*}
and
\[
\lim_{h\rightarrow 0} \frac{R_r(h)}{\norm{h}^r} = 0.
\]
We call
\[
T_r(x) = f(a) + \sum_{k=1}^r\frac{1}{k!} D^kf(a) (x-a, x-a, \ldots, x-a)
\]
the \(r\)th order Taylor polynomial of \(f\) centered at \(a\) and
\[
R_r(x) = \frac{1}{(r+1)!} D^{r+1}f(c)(x-a,x-a,\ldots,x-a)
\]
the \(r\)th order remainder term. Hence, Taylor's theorem says that
\[
f(x) = T_r(x) + R_r(x)
\]
Since \(\lim_{x\rightarrow a}R_r(x) = 0\), for \(x\) close to \(a\) we get an approximation
\[
f(x) \approx T_r(x).
\]
Moreover, since \(D^{r+1}f\) is continuous, there is a constant \(M \gt 0\) such that if \(x\) is sufficiently close to \(a\) then the remainder term satisfies the bound
\[
|R_{r}(x)| \leq M \norm{x-a}^{r+1}.
\]
From this it follows that
\[
\lim_{x\rightarrow a} \frac{R_r(x)}{\norm{x-a}^r} = 0
\]
Compute the third-order Taylor polynomial of \(f(x,y)=\sin(x-2y)\) centered at \(a=(0,0)\).
Most of the work has been done in Example 10.5.10. Evaluating all derivatives at \(a\) we find that
\begin{align*}
Df(a)(u) &= f_x(a) u_1 + f_y(a) u_2 = u_1 - 2u_2\\[2ex]
D^2f(a)(u,u) &= 0\\[2ex]
D^3f(a)(u,u,u) &= -u_1^3 + 6u_1^2u_2 - 12 u_1u_2^2 + 8u_2^3
\end{align*}
Therefore, including the factors \(\frac{1}{k!}\),
\[
T_3(u) = u_1 - 2u_2 + \tfrac{1}{6}\left(-u_1^3 + 6u_1^2u_2 - 12 u_1u_2^2 + 8u_2^3\right) = (u_1 - 2u_2) - \tfrac{1}{6}(u_1-2u_2)^3,
\]
which is consistent with the single-variable expansion \(\sin(t) = t - \frac{t^3}{6} + \cdots\) evaluated at \(t = u_1 - 2u_2\).
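A quick numeric check of the approximation \(f(x)\approx T_3(x)\) (our own script): by Taylor's theorem the error should shrink at least like \(\norm{x-a}^4\) as \(x\rightarrow a\).

```python
import numpy as np

def f(x, y):
    return np.sin(x - 2 * y)

def T3(x, y):
    # Third-order Taylor polynomial at (0,0): t - t^3/6 with t = x - 2y.
    t = x - 2 * y
    return t - t**3 / 6

for r in [0.5, 0.05, 0.005]:
    # Shrinking the point by 10 shrinks the error by ~10^5 here, since the
    # first nonvanishing omitted term is of order five.
    print(r, abs(f(r, -r) - T3(r, -r)))
```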
Exercises
Find the 2nd order Taylor polynomial of the function \(f(x,y,z) = \cos(x+2y)e^z\) centered at \(a=(0,0,0)\).
A function \(L:\real^n\rightarrow\real\) is called a homogeneous function of degree \(k\in\mathbb{N}\) if for all \(\alpha\in\real\) and \(x\in \real^n\) it holds that \(L(\alpha x) = \alpha^k L(x)\). Prove that if \(f:\real^n\rightarrow\real\) is \(k\)-times differentiable at \(a\in\real^n\) then the mapping
\[
L(x) = D^kf(a)(x,x,\ldots,x)
\]
is a homogeneous function of degree \(k\in\mathbb{N}\).
The Inverse Function Theorem
A square linear system
\begin{align*}
a_{1,1}x_1 + a_{1,2}x_2 + \cdots + a_{1,n} x_n &= y_1\\
a_{2,1}x_1 + a_{2,2}x_2 + \cdots + a_{2,n} x_n &= y_2\\
\vdots\hspace{3cm}\vdots\hspace{1cm} &= \;\vdots \\
a_{n,1}x_1 + a_{n,2}x_2 + \cdots + a_{n,n} x_n &= y_n
\end{align*}
or in vector form
\[
Ax = y,
\]
where the unknown is \(x=(x_1,x_2,\ldots,x_n)\in\real^n\), has a unique solution if and only if \(A^{-1}\) exists, if and only if \(\det(A)\neq 0\). In this case, the solution is \(x=A^{-1}y\). Another way to say this is that the mapping \(F(x) = Ax\) has a global inverse given by \(F^{-1}(y) = A^{-1}y\). Hence, invertibility of \(DF = A\) completely determines whether \(F\) is invertible.

Consider now a system of equations
\[
F(x) = y
\]
where \(F:\real^n\rightarrow\real^n\) is nonlinear. When is it possible to solve for \(x\) in terms of \(y\), that is, when does \(F^{-1}\) exist? In general, this is a difficult problem and we cannot expect global invertibility even when assuming the most desirable conditions on \(F\). Even in the 1D case, we cannot expect global invertibility. For instance, \(f(x) = \cos(x)\) is not globally invertible but is so on any interval where \(f'(x)\neq 0\). For instance, on the interval \(I=(0,\pi)\), we have that \(f'(x)=-\sin(x)\neq 0\) and \(f^{-1}(x) = \arccos(x)\). In a neighborhood of a point where \(f'(x) = 0\), for instance \(x=0\), \(f(x)=\cos(x)\) is not invertible. However, having a non-zero derivative is not necessary for invertibility. For instance, the function \(f(x)=x^3\) has \(f'(0) = 0\) but \(f(x)\) has an inverse locally around \(x=0\); in fact it has a global inverse \(f^{-1}(x) = x^{1/3}\).

Let's go back to the 1D case and see if we can say something about the invertibility of \(f:\real\rightarrow\real\) locally about a point \(a\) such that \(f'(a)\neq 0\). Assume that \(f'\) is continuous on \(\real\) (or on an open set containing \(a\)). Then there is an interval \(I=[a-\delta,a+\delta]\) such that \(f'(x) \neq 0\) for all \(x\in I\). Now if \(x,y\in I\) and \(x\neq y\), then by the Mean Value Theorem, there exists \(c\) in between \(x\) and \(y\) such that
\[
f(y) - f(x) = f'(c) (y-x).
\]
Since \(f'(c)\neq 0\) and \((y-x)\neq 0\) then \(f(y) \neq f(x)\). Hence, if \(x\neq y\) then \(f(y)\neq f(x)\) and this proves that \(f\) is injective on \(I=[a-\delta,a+\delta]\). Therefore, the function \(f:I\rightarrow\real\) has an inverse \(f^{-1}: J \rightarrow\real\) where \(J=f(I)\). Hence, if \(f'(a)\neq 0\), then \(f\) has a local inverse at \(a\). In fact, we can say even more, namely, one can show that \(f^{-1}\) is also differentiable. Then, since \(f^{-1}(f(x)) = x\) for \(x\in I\), by the chain rule we have
\[
(f^{-1})'(f(x))\cdot f'(x) = 1
\]
and therefore since \(f'(x)\neq 0\) for all \(x\in I\) we have
\[
(f^{-1})'(f(x)) = \frac{1}{f'(x)}.
\]
The following theorem is a generalization of this idea.
Let \(V\subset\real^n\) be an open set and let \(F:V\rightarrow\real^n\) be of class \(C^1\). Suppose that \(\det(DF(a))\neq 0\) for some \(a\in V\). Then there exists an open set \(U\subset V\) containing \(a\) such that \(W=F(U)\) is open and \(F:U\rightarrow W\) is invertible. Moreover, the inverse function \(F^{-1}:W\rightarrow U\) is also \(C^1\) and for \(y\in W\) and \(x=F^{-1}(y)\) we have
\[
DF^{-1}(y) = \left[ DF(x) \right]^{-1}.
\]
Prove that \(F(x,y) = (f_1(x,y), f_2(x,y)) = (x^2-y^2, 2xy)\) is locally invertible at all points \(a\neq (0,0)\).
Clearly, \(DF(x,y)\) exists for all \((x,y)\) since all partials of the components of \(F\) are continuous on \(\real^2\). A direct computation gives
\[
DF(x,y) = \begin{bmatrix} 2x & -2y \\[2ex] 2y & 2x \end{bmatrix}
\]
and thus \(\det(DF(x,y)) = 2x^2 + 2y^2\). Clearly, \(\det(DF(x,y)) = 0\) if and only if \((x,y)=(0,0)\). Therefore, by the Inverse Function theorem, for each non-zero \(a\in\real^2\) there exists an open set \(U\subset\real^2\) containing \(a\) such that \(F:U\rightarrow F(U)\) is invertible. In this very special case, we can find the local inverse of \(F\) about some \(a\in \real^2\). Let \((u,v) = F(x,y)\), that is,
\begin{align*}
x^2-y^2 &= u\\
2xy &= v
\end{align*}
If \(x\neq 0\) then \(y = \frac{v}{2x}\) and therefore \(x^2 - \frac{v^2}{4x^2} = u\) and therefore \(4x^4 - v^2 = 4ux^2\) or
\[
4x^4 - 4ux^2 - v^2=0.
\]
By the quadratic formula,
\[
x^2 = \frac{4u \pm \sqrt{16u^2 + 16v^2}}{8}
\]
Since \(x^2 \geq 0\), we must take the \(+\) sign; choosing the branch \(x \gt 0\), we obtain
\begin{align*}
x &= \sqrt{\frac{4u + \sqrt{16u^2 + 16v^2}}{8} }\\
& = \sqrt{\frac{u + \sqrt{u^2+v^2}}{2}}
\end{align*}
and therefore
\[
y = \frac{v}{2x} = \frac{\sqrt{2} v}{2 \sqrt{u+\sqrt{u^2+v^2}}}
\]
Hence, provided \(v\neq 0\) (so that \(u+\sqrt{u^2+v^2} \gt 0\) and the branch \(x \gt 0\) is well-defined), this local inverse is given by
\[
F^{-1}(u,v) = \begin{bmatrix} \sqrt{\frac{u + \sqrt{u^2+v^2}}{2}} \\[2ex] \frac{\sqrt{2} v}{2 \sqrt{u+\sqrt{u^2+v^2}}} \end{bmatrix}.
\]
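We can verify numerically both the inverse formula and the derivative identity \(DF^{-1}(y) = [DF(x)]^{-1}\) from the Inverse Function Theorem (a sketch; the test point is arbitrary and `F_inv` below is the branch derived above):

```python
import numpy as np

def F(x, y):
    return np.array([x**2 - y**2, 2 * x * y])

def F_inv(u, v):
    # The local inverse derived in the example (branch with x > 0).
    s = np.sqrt(u**2 + v**2)
    x = np.sqrt((u + s) / 2)
    return np.array([x, v / (2 * x)])

u, v = 0.3, 1.7
x, y = F_inv(u, v)
print(F(x, y))                      # recovers (0.3, 1.7)

DF = np.array([[2 * x, -2 * y], [2 * y, 2 * x]])
h = 1e-7
J = np.column_stack([(F_inv(u + h, v) - F_inv(u, v)) / h,
                     (F_inv(u, v + h) - F_inv(u, v)) / h])
print(np.allclose(J, np.linalg.inv(DF), atol=1e-4))  # expect True
```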
Exercises
Let \(F:\real^2\rightarrow\real^2\) be defined by
\[
F(x,y) = (f_1(x,y), f_2(x,y)) = (e^x\cos(y), e^x\sin(y))
\]
for \((x,y)\in\real^2\).
- Prove that the range of \(F\) is \(\real^2\backslash\hspace{-0.3em}\{0\}\). Hint: Think polar coordinates.
- Prove that \(F\) is not injective.
- Prove that \(F\) is locally invertible at every \(a\in \real^2\).
Can the system of equations
\begin{align*}
x+xyz &= u\\
y+xy &= v \\
z+2x+3z^2 &= w
\end{align*}
be solved for \(x,y,z\) in terms of \(u,v,w\) near \((0,0,0)\)?