9. Multivariable Calculus Refresher
Our next topic in lecture is going to be multivariable (Riemann) integration, which we’ll set up as iterated Riemann integrals over a rectangular domain. Hopefully, by the end of the quarter, we’ll be able to prove some useful results such as the change of variables formula, and we might discuss finding the surface area of a \(C^1\) hypersurface (whatever that means).
Today’s discussion will be a refresher on some things we know about \(C^1\) functions (continuously differentiable functions), particularly when the domain and codomain are \(\mathbb{R}^n\) instead of \(\mathbb{R}\). A handful of tools from single-variable calculus will be available to us (perhaps in a more general setting), but some things that we take for granted will be tossed out the window.
If there’s one thing you take away from today’s discussion, it should be that differentiable functions ought to look linear near a point. Hence, a lot of behaviours we’d expect from a differentiable function should be “inherited” from the properties of linear functions that we know and love.
Definitions
If you were with us last quarter, we saw the following definition of the derivative:
Definition 1. Fréchet Derivatives
Let \(f: \mathbb{R}^n\to \mathbb{R}^m\) be a function. \(f\) is Fréchet differentiable at \(\vec x_0\in \mathbb{R}^n\) if there exists some linear function \(T: \mathbb{R}^n\to \mathbb{R}^m\) such that \[\lim _{\vec h\to \vec 0} \frac{\left\lVert f\left( \vec x_0+\vec h \right) - f\left( \vec x_0 \right) - T\vec h \right\rVert}{ \left\lVert \vec h \right\rVert} = 0. \] \(T\) is called the full derivative or Fréchet derivative of \(f\) at \(\vec x_0\), and it is denoted \(\left. Df\right\rvert_{\vec x_0}\).
This is a scary definition, but in essence what it’s saying is \[f\left( \vec x_0+\vec h \right)\approx f\left( \vec x_0 \right)+ \left. Df\right\rvert_{\vec x_0}\vec h\] whenever \(\vec h\) is quite small, and this is in a way the “best possible” linear approximation one could make. This directly generalises the “tangent line” notion of a derivative to higher dimensions — rather than a line, one gets a generalised linear function.
In practice, it can be hard to compute what this derivative really is, and I’m sure you are all far more familiar with the idea of a partial derivative or a directional derivative.
Definition 2. Partial Derivatives
Let \(f:\mathbb{R}^n\to \mathbb{R}\) be a function. The \(j\)-th partial derivative of \(f\) at \(\vec x_0\in \mathbb{R}^n\) is the quantity \[\frac{\partial f}{\partial x_j} \left( \vec x_0 \right) = \lim _{h\to 0} \frac{f\left( \vec x_0 + h \vec e_j \right) - f\left( \vec x_0 \right)}{h}.\] Here, \(\vec e_j\) is the \(j\)-th standard basis vector.
This is perhaps the more sensible definition to work with, and partial derivatives are very easy to compute. It’s also easy to generalise this to \(f:\mathbb{R}^n\to \mathbb{R}^m\), where one can take partial derivatives of the \(m\) different components of \(f\).
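To make this concrete, here’s a small sketch using sympy and numpy (the map \(f(x,y) = (x^2 y,\ x+\sin y)\) is made up purely for illustration): we compute the matrix of all first-order partial derivatives, then check numerically that it satisfies the limit from Definition 1.

```python
import numpy as np
import sympy as sp

x, y = sp.symbols('x y')
# A concrete example map f : R^2 -> R^2 (chosen arbitrarily for illustration).
f = sp.Matrix([x**2 * y, x + sp.sin(y)])

# The Jacobian collects all first-order partial derivatives; per Exercise 3
# below, it is the matrix of Df with respect to the standard basis.
J = f.jacobian([x, y])
print(J)  # Matrix([[2*x*y, x**2], [1, cos(y)]])

# Numerically check the Frechet limit at x0 = (1, 2):
# ||f(x0 + h) - f(x0) - J(x0) h|| / ||h|| should shrink as h -> 0.
f_num = sp.lambdify((x, y), f, 'numpy')
J0 = np.array(J.subs({x: 1, y: 2}).tolist(), dtype=float)
x0 = np.array([1.0, 2.0])

for t in [1e-1, 1e-2, 1e-3]:
    h = t * np.array([0.6, -0.8])  # shrink h along a fixed direction
    err = f_num(*(x0 + h)).flatten() - f_num(*x0).flatten() - J0 @ h
    print(t, np.linalg.norm(err) / np.linalg.norm(h))  # roughly O(t)
```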
That said, these two definitions are not equivalent: Fréchet differentiability is strictly stronger than having partial derivatives, or even than having directional derivatives in every direction. While the former says that \(f\) must resemble a linear function near a point, partial differentiability only says that \(f\) must resemble a linear function along the coordinate directions, i.e. on axis-parallel slices of its graph.
Exercise 3.
Show that if \(f:\mathbb{R}^n\to \mathbb{R}^m\) is Fréchet differentiable at a point, then all of its partial derivatives at that point exist too.
If one writes \(\left. Df\right\rvert_{\vec x_0}\) as a matrix with respect to the standard basis, how are the entries of this matrix related to the partial derivatives of \(f\)? If one writes \(\left. Df\right\rvert_{\vec x_0}\) as a matrix with respect to arbitrary bases, what do the entries represent?
Exercise 4.
Show that the function \(f:\mathbb{R}^2\to \mathbb{R}\) given by \[f\left( x, y \right) = \frac{2x^2y+y^3}{x^2+y^2}\] for \(\left( x, y \right)\neq \left( 0,0 \right)\), with \(f\left( 0,0 \right)=0\), is continuous and has all first-order partial derivatives, but is not Fréchet differentiable at the origin.
What this second exercise demonstrates is that having partial derivatives does not guarantee the stronger mode of differentiability. In fact, this function is even more pathological in the sense that all directional derivatives exist at the origin!
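This is worth seeing numerically (a sanity check, not a proof): along the diagonal direction, the difference quotient defining the directional derivative converges just fine, yet the Fréchet error ratio against the only possible candidate derivative refuses to vanish.

```python
import numpy as np

def f(x, y):
    # The function from Exercise 4, extended by f(0, 0) = 0.
    if x == 0 and y == 0:
        return 0.0
    return (2 * x**2 * y + y**3) / (x**2 + y**2)

# Directional derivative at the origin along v: the limit of f(t v)/t.
v = np.array([1.0, 1.0]) / np.sqrt(2)
for t in [1e-2, 1e-4, 1e-6]:
    print(f(*(t * v)) / t)  # converges (to about 1.06), so the limit exists

# The partials at the origin are f_x = 0 and f_y = 1, so the only candidate
# for Df|_0 is T(h) = h_2.  The Frechet error ratio along the diagonal:
for t in [1e-2, 1e-4, 1e-6]:
    h = t * v
    print(abs(f(*h) - h[1]) / np.linalg.norm(h))  # stays near 0.35, not -> 0
```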
As the relationship between the full derivative’s entries (as a matrix) and the partial derivatives suggests, there is an additional constraint on the partial derivatives’ regularity that makes a converse true.
Theorem 5.
Suppose \(f:\mathbb{R}^n\to \mathbb{R}^m\) and \(\vec x_0\in \mathbb{R}^n\). Suppose there is an open neighbourhood \(U\) around \(\vec x_0\) such that all first-order partial derivatives of \(f\) exist on \(U\) and are continuous at \(\vec x_0\). Then \(f\) is Fréchet differentiable at \(\vec x_0\).
The idea here is to handle the components of \(f\) one at a time and treat only the case where \(m=1\). Then, the full derivative is just the gradient of \(f\). From there, you can either spam the mean value theorem or use the triangle inequality several times. I’ll leave these details as an exercise (this is not the focal point of today…).
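To see the idea in the case \(n=2\), \(m=1\): writing \(\vec h = \left( h_1, h_2 \right)\), one telescopes \[f\left( x_0+h_1, y_0+h_2 \right) - f\left( x_0, y_0 \right) = \Big( f\left( x_0+h_1, y_0+h_2 \right) - f\left( x_0, y_0+h_2 \right) \Big) + \Big( f\left( x_0, y_0+h_2 \right) - f\left( x_0, y_0 \right) \Big),\] applies the single-variable mean value theorem to each bracket to produce a partial derivative at a nearby point times \(h_1\) or \(h_2\), and then uses continuity of the partials at \(\left( x_0, y_0 \right)\) to control the error against \(\left\lVert \vec h \right\rVert\).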
The point is, as long as we assume that a multivariable function has continuous partial derivatives, then we can forget about all of these pathologies outlined above and get the best of both worlds. On one hand, we’ll have the robust full derivative (which came in very handy last quarter when we discussed the inverse function theorem, for instance); on the other hand, we’ll also be able to switch back to using partial derivatives, which will be far easier to work with (especially in the setting of iterated integrals).
This motivates the following definition that you have probably seen in lecture:
Definition 6.
Let \(U\subseteq \mathbb{R}^n\) be open. We define the set \(C^1\left( U \right)\) to be the set of functions \(f:U\to \mathbb{R}\) whose first-order partial derivatives all exist and are continuous on \(U\).
Remark 7.
One advantage of this definition is that it removes any ambiguity about what “continuously differentiable” means for multivariable functions. One could formulate continuity of the full derivative by thinking of the derivative of \(f:\mathbb{R}^n\to \mathbb{R}^m\) as a function assigning to each point \(x_0\in \mathbb{R}^n\) a linear operator \(\mathbb{R}^n\to \mathbb{R}^m\). One can turn the set of linear functions \(\mathbb{R}^n\to \mathbb{R}^m\) into a metric space with the operator norm, from which continuity springs forth!
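To make that concrete, here’s a small numerical sketch (with an arbitrarily chosen map): for matrices, numpy’s 2-norm is exactly the operator norm induced by the Euclidean norms, so one can watch nearby Jacobians converge to each other in that metric.

```python
import numpy as np

def jacobian(x, y):
    # Jacobian of the (arbitrarily chosen) map f(x, y) = (x*y, x + y**2).
    return np.array([[y, x],
                     [1.0, 2.0 * y]])

A = jacobian(1.0, 2.0)
# For matrices, ord=2 in np.linalg.norm is the largest singular value,
# i.e. exactly the operator norm.
for t in [1e-1, 1e-2, 1e-3]:
    B = jacobian(1.0 + t, 2.0 - t)
    print(np.linalg.norm(B - A, 2))  # -> 0 as the base points merge
```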
Some Familiar Theorems, Some Less Familiar
One of the nice things about working with full derivatives is that they lend themselves very well to the chain rule. You can definitely prove this using only partial derivatives and the single-variable chain rule if you really want to, but you’ll essentially be doing matrix multiplication by hand.
Proposition 8.
Let \(f:\mathbb{R}^n\to \mathbb{R}^m\) be a function that’s Fréchet differentiable at \(x_0\in \mathbb{R}^n\). Let \(g:\mathbb{R}^m\to \mathbb{R}^l\) be another function that’s Fréchet differentiable at \(f\left( x_0 \right)\). Then their composition \(g\circ f:\mathbb{R}^n\to \mathbb{R}^l\) is Fréchet differentiable at \(x_0\), and \[\left. D(g\circ f)\right\rvert_{x_0} = \left. Dg\right\rvert_{f\left( x_0 \right)}\circ \left. Df\right\rvert_{x_0}.\]
I think I left this as an exercise last quarter. You can actually prove this directly from the definitions! When \(n=m=l=1\), this exactly recovers the single-variable chain rule that we’re all so familiar with.
This notation can be very confusing (and I am very sorry about that). Something that helps me is the knowledge that \(Df\) is a linear map from \(\mathbb{R}^n\to \mathbb{R}^m\), \(Dg\) is a linear map from \(\mathbb{R}^m\to \mathbb{R}^l\), and \(D(g\circ f)\) is a linear map from \(\mathbb{R}^n\to \mathbb{R}^l\). There’s only one way for all three of these guys to fit together in one picture! Something else that helps me is the following side-by-side diagram: \[\begin{align*} \mathbb{R}^n \xrightarrow{f} \mathbb{R}^m \xrightarrow{g} \mathbb{R}^l && \quad && \mathbb{R}^n \xrightarrow{Df} \mathbb{R}^m \xrightarrow{Dg} \mathbb{R}^l. \end{align*}\]
This is something that’s somewhat difficult for me to picture in my head. Perhaps you have a great multi-dimensional image for linear maps, but unfortunately I don’t have enough space in my head to conjure such imagery.
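If you’d rather let a computer do the bookkeeping, here’s a small sympy sketch (the maps \(f\) and \(g\) are made up for illustration): it computes the Jacobian of \(g\circ f\) directly and checks that it equals the matrix product \(\left. Dg\right\rvert_{f(x)} \left. Df\right\rvert_{x}\).

```python
import sympy as sp

x, y, u, v = sp.symbols('x y u v')

# Made-up maps for illustration: f : R^2 -> R^2 and g : R^2 -> R^2.
f = sp.Matrix([x * y, x + y])
g = sp.Matrix([sp.sin(u), u * v])

# Jacobian of the composition, computed directly...
g_of_f = g.subs({u: f[0], v: f[1]})
D_comp = g_of_f.jacobian([x, y])

# ...and assembled as the matrix product Dg|_{f(x)} * Df|_{x}.
Df = f.jacobian([x, y])
Dg_at_f = g.jacobian([u, v]).subs({u: f[0], v: f[1]})

print(sp.simplify(D_comp - Dg_at_f * Df))  # the zero matrix
```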
Let’s go back to that remark we started with: differentiable functions ought to behave like linear maps. Now this is true in some senses and completely false in other senses, so be careful with this intuition. However, there is at least one case in which it is amazing:
Theorem 9. Inverse Function Theorem
Let \(f:\mathbb{R}^n\to \mathbb{R}^n\) be a function. Let \(x_0\in \mathbb{R}^n\) and \(U\) an open neighbourhood of \(x_0\) such that
- \(f\) is differentiable on \(U\),
- the full derivative \(Df\) is continuous at \(x_0\) (in the operator-norm sense of Remark 7), and
- \(\left. Df\right\rvert_{x_0}\) is nonsingular (invertible).
Then, there is an open neighbourhood \(V\) of \(x_0\) and an open neighbourhood \(W\) of \(f\left( x_0 \right)\) such that \(f:V\to W\) is a bijection and \(f^{-1}: W\to V\) is differentiable at \(f\left( x_0 \right)\), with \(\left. Df^{-1}\right\rvert_{f\left( x_0 \right)} = \left( \left. Df\right\rvert_{x_0} \right)^{-1}\).
This is a dream come true. The third condition says (when combined with our heuristic) that \(f\) looks like an invertible linear function near \(x_0\). The conclusion of the theorem is that \(f\) itself is an invertible function near \(x_0\) — it “inherited” the invertibility from its derivative!
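As a sanity check, here’s the familiar polar-coordinates map in sympy: the Jacobian determinant is \(r\), so the theorem applies away from the origin, and one can verify symbolically that the local inverse’s derivative really is the matrix inverse of \(Df\).

```python
import sympy as sp

r = sp.symbols('r', positive=True)
theta, x, y = sp.symbols('theta x y')

# The polar-coordinates map f(r, theta) = (r cos theta, r sin theta).
f = sp.Matrix([r * sp.cos(theta), r * sp.sin(theta)])
Df = f.jacobian([r, theta])

# det Df = r, so Df is invertible whenever r != 0 and the theorem applies.
print(sp.simplify(Df.det()))  # r

# Near a point with x > 0 the local inverse is explicitly
# g(x, y) = (sqrt(x^2 + y^2), atan2(y, x)); its Jacobian, evaluated back
# at (x, y) = f(r, theta), should be the matrix inverse of Df.
g = sp.Matrix([sp.sqrt(x**2 + y**2), sp.atan2(y, x)])
Dg = g.jacobian([x, y]).subs({x: r * sp.cos(theta), y: r * sp.sin(theta)})

print(sp.simplify(Dg - Df.inv()))  # the zero matrix: D(f^{-1}) = (Df)^{-1}
```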
There are some final things to remark about multivariable calculus that might be a bit disappointing.
- The fundamental theorem of calculus does not have a simple analogue: we are not well-equipped to connect iterated integrals and full derivatives in this class. However, if you study differential geometry, you will find that Stokes’ Theorem gives a very nice multivariable generalisation.
- The product rule only gets a direct analogue when you take dot products. Integration by parts is far trickier, but it does get a multivariable version as a consequence of Stokes’ theorem.
- The mean value theorem is reduced to a mean value inequality: if \(f:\mathbb{R}^n\to \mathbb{R}^m\) is Fréchet differentiable and its derivative is bounded (say, in operator norm) by a constant \(M\) everywhere, then \[\left\lVert f\left( x \right)-f\left( y \right) \right\rVert \leq M \left\lVert x-y \right\rVert.\] This becomes very delicate when the domain of \(f\) is not all of \(\mathbb{R}^n\), particularly if there are holes or obstacles!
- Taylor’s formula does have a multivariable form (the second-order version is written out below), but it’s far uglier and generally sees less use than its single-variable sibling.
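For reference, the second-order version for \(f:\mathbb{R}^n\to \mathbb{R}\) of class \(C^2\) reads \[f\left( \vec x_0 + \vec h \right) = f\left( \vec x_0 \right) + \nabla f\left( \vec x_0 \right)\cdot \vec h + \frac{1}{2} \vec h^{\,\mathsf{T}} \left( Hf \right)\left( \vec x_0 \right) \vec h + o\left( \left\lVert \vec h \right\rVert^2 \right) \qquad \text{as } \vec h\to \vec 0,\] where \(Hf\) is the Hessian, the \(n\times n\) matrix of second-order partial derivatives.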