10. The Inverse Function Theorem and More
≪ 9. Multivariable Calculus Refresher | Table of Contents | 11. A Generalisation of the Implicit Function Theorem ≫

Last week, we looked at two different generalisations of the derivative to multiple dimensions. Although partial derivatives are convenient and familiar both conceptually and computationally, we ultimately decided that the Fréchet derivative was the better way to go. Again, the intuition is that a differentiable function is a “locally linear” function in a quantitative sense.
More specifically, let \(F: \mathbb{R}^n\to \mathbb{R}^m\) be a function that’s differentiable at some \(x_0\in \mathbb{R}^n\). Then, its derivative \(\left. DF\right\rvert_{x_0}\) is a linear function from \(\mathbb{R}^n\to \mathbb{R}^m\), and for \(h\in \mathbb{R}^n\) sufficiently small, one has \[F\left( x_0 + h \right) \approx F\left( x_0 \right)+\left. DF\right\rvert_{x_0}(h). \] The difference between the two is negligible: it is \(o\left( \left\lVert h \right\rVert \right)\) as \(h\to 0\).
This numerical estimate can be extended in a more significant way. Recall the \(1\)-dimensional change of variables formula. It says if \(u:\left[ a,b \right]\to \left[ c,d \right]\) is sufficiently nice (\(C^1\) with nonvanishing derivative, \(u(a)=c\), and \(u(b)=d\) is enough), then for any Riemann integrable \(f:\left[ c,d \right]\to \mathbb{R}\), one has \[\int _{c}^{d}f(t)\ dt =\int _{a}^{b}f\left( u(x) \right)u’(x) \ dx.\] Morally speaking, this is because \[\sum f(t) \Delta t \approx \sum f\left( u(x) \right) u’(x) \Delta x\] when you partition the intervals in the right way. A box of width \(\Delta t\) is distorted to a box of width approximately \(u’(x) \Delta x\). Draw a picture!
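To see the formula in action, here is a quick numerical sanity check. The specific choices \(f(t)=t^2\) and \(u(x)=x^2\), mapping \(\left[ 1,2 \right]\) onto \(\left[ 1,4 \right]\), are mine, purely for illustration; both sides are approximated by a crude midpoint Riemann sum:

```python
import math

def riemann(f, a, b, n=200_000):
    """Midpoint Riemann sum approximating the integral of f over [a, b]."""
    h = (b - a) / n
    return h * math.fsum(f(a + (i + 0.5) * h) for i in range(n))

# f(t) = t^2 on [1, 4]; the substitution u(x) = x^2 maps [1, 2] onto [1, 4].
f = lambda t: t * t
u = lambda x: x * x
du = lambda x: 2 * x          # u'(x)

lhs = riemann(f, 1, 4)                          # integral of f(t) dt over [1, 4]
rhs = riemann(lambda x: f(u(x)) * du(x), 1, 2)  # integral of f(u(x)) u'(x) dx over [1, 2]

print(lhs, rhs)  # both ≈ (4³ - 1³)/3 = 21
```

Both sums land on the exact value \(\frac{4^3-1^3}{3} = 21\), up to the error of the Riemann sum.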
More broadly, you will prove in lecture that given two reasonable subsets \(U_1, U_2\subseteq \mathbb{R}^n\) and a diffeomorphism \(\varphi : U_1\to U_2\), one has for any integrable function \(f: U_2\to \mathbb{R}\) that \[\int _{U_2} f(y)\ dV = \int _{U_1} f\left( \varphi(x) \right) \cdot \left\lvert \det \left(\left. D\varphi\right\rvert_{x}\right) \right\rvert \ dV.\] Intuitively, \(\varphi\) scales the volume of a small box containing \(x\) by a factor of \(\left\lvert \det \left( \left. D\varphi\right\rvert_{x} \right) \right\rvert\), in much the same way \(u\) scaled the length of a small interval near \(x\) by a factor of \(\left\lvert u’(x) \right\rvert\). One can say that \(u\) and \(\varphi\) inherit the volume-scaling properties of their derivatives.
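The same sanity check works in higher dimensions. Here is a sketch using polar coordinates \(\varphi(r,\theta) = \left( r\cos\theta, r\sin\theta \right)\), whose Jacobian determinant is \(r\); the test integrand \(f(x,y) = x^2+y^2\) over the unit disk is an arbitrary choice of mine, with exact integral \(\frac{\pi}{2}\):

```python
import math

def riemann2(g, ax, bx, ay, by, n=400):
    """Midpoint Riemann sum for a double integral over the rectangle [ax,bx] x [ay,by]."""
    hx, hy = (bx - ax) / n, (by - ay) / n
    return hx * hy * math.fsum(
        g(ax + (i + 0.5) * hx, ay + (j + 0.5) * hy)
        for i in range(n) for j in range(n)
    )

# phi(r, theta) = (r cos theta, r sin theta) maps (0,1) x (0,2*pi) onto the
# unit disk, and |det D(phi)| = r.  Integrate f(x, y) = x^2 + y^2 over the disk;
# the exact value is the integral of r^2 * r * 2*pi dr over [0, 1], i.e. pi/2.
f = lambda x, y: x * x + y * y
integral = riemann2(lambda r, t: f(r * math.cos(t), r * math.sin(t)) * r,
                    0, 1, 0, 2 * math.pi)

print(integral)  # ≈ π/2
```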
The next question to ask is, in what other ways does \(F\) resemble its derivative? There are some qualitative questions that make sense, such as:
- If \(\left. DF\right\rvert_{x_0}\) is injective, is \(F\) injective?
- If \(\left. DF\right\rvert_{x_0}\) is surjective, is \(F\) surjective?
- If \(\ker \left. DF\right\rvert_{x_0}\) is \(d\)-dimensional, is \(F ^{-1}\left( F\left(x_0\right) \right)\) also \(d\)-dimensional?
For the third question, it’s not immediately obvious what it means for the level sets of \(F\) to be \(d\)-dimensional, but there is some intuition for what \(d\)-dimensional subsets (as opposed to subspaces) are. For instance, the circle \(\left\lbrace (x, y)\in \mathbb{R}^2 : x^2+y^2=1 \right\rbrace\) appears to be a \(1\)-dimensional object, while the sphere \(\left\lbrace (x,y,z)\in \mathbb{R}^3 : x^2+y^2+z^2=1 \right\rbrace\) appears to be \(2\)-dimensional. We will revisit this question later.
The Inverse Function Theorem
Those of you who were here last quarter saw the inverse function theorem several times; these two problems provide a good look at which parts of the inverse function theorem hold when one drops the assumption of continuous derivatives. Let us state the theorem as most people know it:
Theorem 1. Inverse Function Theorem
Let \(U\subseteq \mathbb{R}^n\) be open, and let \(F:U\to \mathbb{R}^n\) be continuously differentiable on \(U\). Let \(x_0\in U\) be such that \(\left. DF\right\rvert_{x_0}\) is nonsingular. Then, there exists an open neighbourhood \(V\) of \(x_0\) and an open neighbourhood \(W\) of \(F\left( x_0 \right)\) such that \(F\) is a bijection \(V\to W\), and its inverse is continuously differentiable.
This is a somewhat complicated statement, but ultimately, this theorem boils down to: if a function locally looks invertible, it’s locally invertible.
I must point out that nonsingular means invertible. When \(n>1\), this is strictly different from having a nonzero derivative. In fact, there are many, many maps with singular but nonzero derivatives that are not locally invertible, such as \(F:\mathbb{R}^2\to \mathbb{R}^2\) via \(\left( x, y \right)\mapsto \left( x,0 \right)\).
Besides this all-too-common mistake, let’s highlight a few pitfalls of this theorem:
- When \(F\) is not continuously differentiable, the inverse function theorem fails. In the proof of the inverse function theorem, the continuity of the derivative is critical to ensuring local injectivity. However, \(F\) will continue to be locally surjective (see the aforementioned problems).
- The converse statement is only partially true. If \(\left. DF\right\rvert_{x_0}\) is singular, then it’s entirely possible for \(F\) to still have a local inverse near \(x_0\). However, this local inverse will not be differentiable: by the chain rule, a differentiable inverse would force \(\left. DF\right\rvert_{x_0}\) to be invertible. The standard example is \(x\mapsto x^3\) near \(x_0 = 0\).
- \(F\) generally will not be invertible on its entire domain (i.e., it need not be globally injective), even if its derivative is nonsingular everywhere. There are plenty of examples of this; one is \(F:\mathbb{R}\setminus \left\lbrace 0 \right\rbrace\to \mathbb{R}\) via \(x\mapsto \left\lvert x \right\rvert\). Draw a picture of what’s going on. (\(F\) can take multiple “sheets” of its domain to the same image.)
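A two-dimensional example of the same failure (my choice, not from the discussion above) is \(F(x,y) = \left( e^x\cos y,\ e^x\sin y \right)\): its Jacobian determinant is \(e^{2x}\neq 0\) everywhere, yet shifting \(y\) by \(2\pi\) never changes the output, so each horizontal strip of height \(2\pi\) is a “sheet” with the same image. A quick check:

```python
import math

# F(x, y) = (e^x cos y, e^x sin y) has det DF = e^(2x) != 0 everywhere,
# yet it is not globally injective: y and y + 2*pi give the same output.
def F(x, y):
    return (math.exp(x) * math.cos(y), math.exp(x) * math.sin(y))

p = F(0.0, 0.0)
q = F(0.0, 2 * math.pi)
print(p, q)  # the same point, up to rounding
```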
Exercise 2.
Show that the function \(f(x)=\frac{x}{2}+x^2\sin\left( \frac{1}{x} \right)\) for \(x\neq 0\) and \(f(0)=0\) is differentiable at \(x=0\) and that \(f’(0)=\frac{1}{2}\). Show that \(f\) is not injective on any neighbourhood of \(x=0\).
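Not a solution, but some numerical evidence for the second claim: for \(x\neq 0\) one computes \(f’(x) = \frac{1}{2} + 2x\sin\left( \frac{1}{x} \right) - \cos\left( \frac{1}{x} \right)\), and sampling it along two sequences tending to \(0\) shows that \(f’\) changes sign on every neighbourhood of \(0\):

```python
import math

# For x != 0, the derivative of f(x) = x/2 + x^2 sin(1/x) is
# f'(x) = 1/2 + 2x sin(1/x) - cos(1/x).
def fprime(x):
    return 0.5 + 2 * x * math.sin(1 / x) - math.cos(1 / x)

# At x_k = 1/(2*pi*k), cos(1/x) = 1 and sin(1/x) = 0, so f'(x_k) = -1/2;
# at y_k = 1/((2k+1)*pi), cos(1/x) = -1, so f'(y_k) = 3/2.
for k in range(1, 5):
    xk = 1 / (2 * math.pi * k)
    yk = 1 / ((2 * k + 1) * math.pi)
    print(f"f'({xk:.5f}) = {fprime(xk):+.3f}   f'({yk:.5f}) = {fprime(yk):+.3f}")
```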
Exercise 3.
Let \(F:\mathbb{R}^n\to \mathbb{R}^n\) be a continuously differentiable function such that \(\left. DF\right\rvert_{x}\) is nonsingular for all \(x\in \mathbb{R}^n\). Show that \(F\) is an open map: that is, if \(U\subseteq \mathbb{R}^n\) is open, then \(F(U)\) is open.
Challenge: show that this is still true if \(F\) is differentiable but not necessarily continuously differentiable.
The Implicit Function Theorem
The inverse function theorem addresses all three of the questions that were posed, in a very special scenario and assuming continuous derivatives. Let’s return to the more general question of whether or not injectivity/surjectivity are “inherited” from a function’s derivatives.
One glaring issue with the inverse function theorem that I purposefully chose not to mention earlier is that the inverse function theorem only applies to functions whose domain and codomain have equal dimensions. Yet one might expect similar ideas to apply to the more general case. Suppose \(F:\mathbb{R}^n\to \mathbb{R}^m\) is differentiable at \(x_0\).
- If \(n < m\) and \(\left. DF\right\rvert_{x_0}\) is injective, should \(F\) be injective?
- If \(n > m\) and \(\left. DF\right\rvert_{x_0}\) is surjective, should \(F\) be surjective? Should the level sets of \(F\) be \((n-m)\)-dimensional?
These seem entirely plausible, especially if one assumes a continuous derivative. Although the inverse function theorem does not apply out-of-the-box, one can actually adapt the proof of the theorem to these two scenarios! I’ll leave this as an exercise for the committed student.
By the rank-nullity theorem, if \(n< m\), it’s impossible for \(\left. DF\right\rvert_{x_0}\) to be surjective. Likewise, if \(n> m\), it’s impossible for \(\left. DF\right\rvert_{x_0}\) to be injective.
Remark 4.
It’s hard to believe that there could even be a surjective function \(\mathbb{R}\to \mathbb{R}^2\) or an injective function \(\mathbb{R}^2\to \mathbb{R}\). However, the two sets have the same cardinality, and it is possible to construct set-theoretic bijections between the two.
Okay sure, you say, it’s harder yet to believe that there could be a continuous surjection \(\mathbb{R}\to \mathbb{R}^2\) or a continuous injection \(\mathbb{R}^2\to \mathbb{R}\). Continuous maps have to retain some idea of dimensionality, right? It so turns out that there are ways to continuously and surjectively map \(\left[ 0, 1 \right]\to \left[ 0, 1 \right]\times \left[ 0,1 \right]\) via space-filling curves. That is, one can continuously and surjectively map low-dimensional spaces onto high-dimensional spaces. One cannot, however, continuously and injectively go the other way: given a continuous injection \(\mathbb{R}^2\to \mathbb{R}\), two different paths between the same pair of points would be forced, by the intermediate value theorem, to share a value, contradicting injectivity.
The answer to both of the questions posed prior is yes, at least locally. We are not equipped to prove this in general, but we can look at a very specific scenario that we are well-equipped to handle.
Theorem 5. Implicit Function Theorem
Let \(F:\mathbb{R}^n\to \mathbb{R}\) be a continuously differentiable function. Suppose \(\vec v = \left( v_1,\ldots, v_n \right)\in \mathbb{R}^n\) such that \(F\left( \vec v \right) = 0\) and \(\frac{\partial F}{\partial x_n}\left( \vec v \right)\neq 0\). Then, there is an open subset \(U\subseteq \mathbb{R} ^{n-1}\) containing \(\left( v_1,\ldots, v _{n-1} \right)\) and a continuously differentiable function \(f: U\to \mathbb{R}\) such that \(f\left( v_1,\ldots, v _{n-1} \right) = v_n\) and \[F\left( x_1,\ldots, x _{n-1}, f\left( x_1,\ldots, x _{n-1} \right) \right) = 0\] for all \(\left(x_1,\ldots, x _{n-1}\right)\in U\).
In words, what this theorem says is that \(F ^{-1}(0)\) looks like the graph of a function \(f : \mathbb{R} ^{n-1}\to \mathbb{R}\) near \(\vec v\). This graph should be an \((n-1)\)-dimensional object!
Based on the title of today’s discussion, it seems as though this is a consequence of the inverse function theorem. Indeed it is, but we need to circumvent the dimensional mismatch problem from earlier.
Proof
Define the function \(G: \mathbb{R} ^{n}\to \mathbb{R} ^{n}\) as \[G \left( \vec x \right) = \left( x_1,\ldots, x _{n-1}, F\left( \vec x \right) \right).\] \(G\) is continuously differentiable — it has continuous first-order partial derivatives in every component. Thus, with \(\vec v\) as above, the derivative \[\left. DG\right\rvert_{\vec v} = \begin{pmatrix} 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \\ \frac{\partial F}{\partial x_1} & \frac{\partial F}{\partial x_2} & \cdots & \frac{\partial F}{ \partial x _{n-1}} & \frac{\partial F}{\partial x_n} \end{pmatrix}\] is nonsingular (it is lower triangular, and all of its diagonal entries are nonzero). So, by the inverse function theorem, \(G\) must have a continuously differentiable inverse, say \(H\), defined on a neighbourhood of \(G \left( \vec v \right)\).
Write \(H = \left( h_1,\ldots, h_n \right)\), where each \(h_i\) is defined on a neighbourhood of \(G\left(\vec v\right)\). By using the fact that \(G\circ H\) is the identity, we actually get that \(h_1\left( \vec x \right) = x_1\), and likewise for \(h_2,\ldots, h _{n-1}\). In other words, \(H\left( \vec x \right) = \left( x_1,\ldots, x _{n-1}, h_n \left( \vec x \right) \right)\).
Since \(\left( v_1,\ldots, v _{n-1}, 0 \right) = G \left( \vec v \right)\), as long as \(\left( x_1,\ldots, x _{n-1} \right)\) is close enough to \(\left(v_1,\ldots, v _{n-1}\right)\), \(h_n \left( x_1,\ldots, x _{n-1}, 0 \right)\) will be defined. Moreover, one has by construction that \[G\circ H\left( x_1,\ldots, x _{n-1}, 0 \right) = \left( x_1,\ldots, x _{n-1}, 0 \right).\] Unpacking the last coordinate of \(G\circ H\) yields \[F\left( x_1,\ldots, x _{n-1}, h_n \left( x_1,\ldots, x _{n-1}, 0 \right) \right) = 0.\] So \(f\left( x_1,\ldots, x _{n-1} \right) = h_n\left( x_1,\ldots, x _{n-1}, 0 \right)\) is the desired function: it is continuously differentiable, and since \(H\left( G\left( \vec v \right) \right) = \vec v\), one has \(f\left( v_1,\ldots, v _{n-1} \right) = v_n\). \(\square\)
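To see the theorem (and the proof’s idea of solving for the last coordinate) concretely, here is a numerical sketch with the circle \(F(x,y) = x^2+y^2-1\) near \(\vec v = (0,1)\), where \(\frac{\partial F}{\partial y}(0,1) = 2\neq 0\). The use of Newton’s method as the solver is my own choice, not part of the theorem; the recovered \(f\) should match the explicit graph \(y = \sqrt{1-x^2}\):

```python
import math

# F(x, y) = x^2 + y^2 - 1 vanishes at v = (0, 1), and dF/dy(0, 1) = 2 != 0,
# so the theorem promises a C^1 function f with F(x, f(x)) = 0 near x = 0.
F = lambda x, y: x * x + y * y - 1
dFdy = lambda x, y: 2 * y

def f(x, y0=1.0, steps=50):
    """Solve F(x, y) = 0 for y by Newton's method, starting near y0 = 1."""
    y = y0
    for _ in range(steps):
        y -= F(x, y) / dFdy(x, y)
    return y

for x in (0.0, 0.3, 0.6):
    print(x, f(x), math.sqrt(1 - x * x))  # f(x) matches sqrt(1 - x^2)
```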
It should be added that one cannot weaken any of these conditions, especially the condition that the \(n\)-th partial is nonzero. Draw a picture to see what could happen when the \(n\)-th partial is zero!
Bonus: The Rank Theorem
I don’t think I’ll have time to talk about this during discussion, but the rank theorem is a big part of the picture that gives a unified answer to the question: to what extent does a function resemble its derivative?
The answer is, given continuity of the derivative, the resemblance is extremely strong. If \(F:\mathbb{R}^n\to \mathbb{R}^m\) is continuously differentiable in a neighbourhood \(U\) around \(\vec x_0\) and \(DF\) has rank \(r\) on this neighbourhood, then there is a change of coordinates such that \(F\) looks like \(DF\).
Specifically, there is an open neighbourhood \(V\) of \(x_0\), \(W\) of \(F\left( x_0 \right)\), open neighbourhoods \(V’\subseteq \mathbb{R}^n\) and \(W’\subseteq \mathbb{R}^m\) both containing the origin, and \(C^1\) functions \(\phi : V\to V’\), \(\psi : W\to W’\) with \(C^1\) inverses such that \[\psi \circ F\circ \phi ^{-1}: V’ \to V \to W \to W’\] is given by \(\left( x_1,\ldots, x_n \right) \mapsto \left( x_1,\ldots, x_r, 0,\ldots, 0 \right)\) (where the \(m-r\) zeroes, possibly none, pad the output to dimension \(m\)).
In words, what this is saying is that you can move the origin in \(\mathbb{R}^n\) to \(\vec x_0\) and the origin in \(\mathbb{R}^m\) to \(F\left( \vec x_0 \right)\), then wiggle the standard coordinate axes so that \(F\) does look like a rank \(r\) linear map.
This strictly generalises both the inverse function theorem and the implicit function theorem; in fact, it relies on the technique of padding \(F\) in such a way that the problem gets reduced to analysing a map \(\mathbb{R}^k\to \mathbb{R}^k\) with nonsingular derivative.
Indeed, the moral of this story is: continuously differentiable functions behave locally just like their derivatives!