Hunter Liu's Website

11. A Generalisation of the Implicit Function Theorem


Last week, we discussed the inverse function theorem, and we described a way to interpret the implicit function theorem as a generalisation of the inverse function theorem. More broadly, these theorems fit in a bigger picture of how \(C^1\) functions are related to their derivatives. We had posed the question:

Question 1.

Let \(F:\mathbb{R}^n\to \mathbb{R}^m\) be a \(C^1\) function. Suppose \(\left. DF\right\rvert_{x_0}\) is a rank \(r\) linear map and that \(F\left( x_0 \right)=0\); then, \(\ker \left( \left. DF\right\rvert_{x_0} \right)\) is an \(n-r\)-dimensional subspace of \(\mathbb{R}^n\). Heuristically, \(F ^{-1}(0)\) should “look like” \(\ker \left( \left. DF\right\rvert_{x_0} \right)\). Is \(F ^{-1}(0)\) an \(n-r\)-dimensional object, in some suitable sense?

The inverse function theorem answers this question when \(n=m=r\). It says that near \(x_0\), the level sets of \(F\) are isolated points, which are \(0\)-dimensional objects.

The implicit function theorem answers this question when \(m=r=1\), and it says that yes, the level sets of \(F\) will look like the graph of a function \(f: \mathbb{R} ^{n-1}\to \mathbb{R}\). In this sense, the level sets of \(F\) do look \(n-1\)-dimensional, as anticipated.
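To make this concrete, here is a small numerical check (the example is mine, not from the lecture): take \(F(x,y) = x^2+y^2-1\), whose zero set is the unit circle. Near \((0,1)\), where \(\partial F/\partial y \neq 0\), the level set \(F^{-1}(0)\) is the graph of \(f(x)=\sqrt{1-x^2}\), a \(1\)-dimensional object inside \(\mathbb{R}^2\).

```python
import numpy as np

# Illustrative example: F(x, y) = x^2 + y^2 - 1, so F^{-1}(0) is the
# unit circle.  Near (0, 1), where dF/dy = 2y != 0, the level set is the
# graph of f(x) = sqrt(1 - x^2): an (n - 1 = 1)-dimensional object.
def F(x, y):
    return x**2 + y**2 - 1

def f(x):
    return np.sqrt(1 - x**2)

xs = np.linspace(-0.5, 0.5, 11)
# Every point (x, f(x)) on the graph lies on the level set F^{-1}(0).
assert np.allclose(F(xs, f(xs)), 0)
```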

On this week’s homework, you’ll be tasked with addressing the cases when \(r=m\) and when \(r=n\). The case where \(r=m\) (i.e. the differential is surjective) is a generalisation of the implicit function theorem, and we’ll look at this in more depth today. The case where \(r=n\) can be proven with similar methods.

Some Linear Algebra

You might have thought that you were done doing linear algebra for the rest of your life, but sadly this is not the case. Derivatives are linear maps, and we should remember a few facts about linear transformations in preparation for what’s to come. We’ll specifically be working with matrices, so we won’t be worrying about too much abstract nonsense.

To save myself from thinking too hard (and from potential embarrassment), we’ll only consider Euclidean spaces for today.

Theorem 2. The Rank-Nullity Theorem

Let \(T:\mathbb{R}^n\to \mathbb{R}^m\) be a linear transformation. Then, \[\operatorname{rank} T + \operatorname{nullity}T = n.\] In words, the rank plus the nullity is the dimension of the domain.

Corollary 3.

Let \(A\) be an \(m\times n\) matrix (with real entries). Then, the dimension of the column space of \(A\) is equal to the dimension of the row space of \(A\).

Hopefully these are familiar statements, but don’t worry if they’re not. Try working through some examples and drawing plenty of pictures to see why this is true.
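If pictures aren’t enough, one can also sanity-check both statements numerically. The following sketch (my own, with an arbitrary random matrix and tolerance) extracts a basis of the null space from the singular value decomposition and verifies Theorem 2 and Corollary 3:

```python
import numpy as np

# Numerical sanity check of Theorem 2 and Corollary 3 on a random
# 3 x 5 matrix, i.e. a linear map T : R^5 -> R^3.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))

rank = np.linalg.matrix_rank(A)          # dimension of the column space

# Extract a basis of ker(A) from the SVD: the rows of Vt past the rank
# span the null space (tolerance 1e-10 is arbitrary).
_, s, Vt = np.linalg.svd(A)
null_basis = Vt[int((s > 1e-10).sum()):]
nullity = null_basis.shape[0]
assert np.allclose(A @ null_basis.T, 0)  # these really are null vectors

# Theorem 2: rank + nullity = n = 5, the dimension of the domain.
assert rank + nullity == A.shape[1]
# Corollary 3: column rank of A equals its row rank (= rank of A^T).
assert np.linalg.matrix_rank(A.T) == rank
```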

Coordinate Changes

Before we keep going, let’s introduce the idea of a “change of coordinates”. If you’ve taken linear algebra before, you should be familiar with the “change of basis” operation. One can graphically interpret this operation as rotating and scaling the coordinate axes of a vector space; in two dimensions, the square coordinate grid then transforms into a skew parallelogram coordinate grid. In a sense, these are transforming the perspective with which one sees space.

This is a linear change of coordinates. Some great theorems of linear algebra say that given a linear transformation, there is a “canonical perspective” to take: rational canonical form, Jordan canonical form, and singular value decomposition come to mind, but algorithms such as Gram-Schmidt and Gaussian elimination also fit into this idea.

Such canonical forms are useful because they encapsulate qualitative data about the algebraic behaviour of a linear transformation. For instance, the Jordan canonical form of a matrix describes both the set of eigenvalues of the transformation and the generalised eigenspaces attached to them. The output of Gaussian elimination describes the dimension of the image (and also the null space) of the linear transformation.

One might wonder, is there a way to meaningfully adapt the idea of the linear change of coordinates so that \(C^1\) maps \(\mathbb{R}^n\to \mathbb{R}^m\) have some sort of “canonical form”? What would even constitute a “change of coordinates”?

You are already very familiar with some changes of coordinates, most likely against your will: cylindrical, spherical, and polar coordinates are all (smooth!) changes of coordinates from the standard rectangular coordinate system we know and love. The utility of these is primarily computational; you’ve most likely used them to solve certain integrals. Draw a picture!

The drawback of these coordinate systems is that they are not valid coordinate systems globally: polar representations of a point in \(\mathbb{R}^2\) are never unique, and the coordinates degenerate near the origin. We will get around this with the following definition:

Definition 4.

Let \(U\) and \(V\) be two open subsets of \(\mathbb{R}^n\). A \(C^1\) change of coordinates from \(U\) to \(V\) is a \(C^1\) bijection \(\psi : U\to V\) with a \(C^1\) inverse.

This looks like a garbage definition that has nothing to do with anything. Let’s return to the polar change of coordinates: take \(U = (1, 2)\times \left( 0, 2\pi \right)\) and \[V = \left\lbrace \left( x, y \right)\in \mathbb{R} ^2 : 1 < \sqrt{x^2+y^2} < 2 \right\rbrace \setminus \left\lbrace \left( x, 0 \right)\in \mathbb{R}^2 : x > 0\right\rbrace .\] Here, \(U\) is an open rectangle, and \(V\) is a slit annulus. Then, \(\psi : U\to V\) given by \(\left( r, \theta \right)\mapsto \left( r\cos \theta , r \sin \theta \right)\) is a \(C^1\) change of coordinates! Draw a picture to see what I mean.
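A quick way to convince yourself that this \(\psi\) is a bijection is to write down its inverse explicitly and test the two compositions on sample points. This is a sketch (the sample grid and tolerances are my own choices, and the angle convention is handled with `arctan2`):

```python
import numpy as np

# psi(r, theta) = (r cos theta, r sin theta) maps the open rectangle
# U = (1, 2) x (0, 2*pi) onto the slit annulus V.
def psi(r, theta):
    return (r * np.cos(theta), r * np.sin(theta))

# Its inverse recovers r as the distance to the origin and theta as the
# angle, shifted to land in (0, 2*pi) away from the slit {x > 0, y = 0}.
def psi_inv(x, y):
    r = np.hypot(x, y)
    theta = np.arctan2(y, x) % (2 * np.pi)
    return (r, theta)

# Check psi_inv(psi(r, theta)) = (r, theta) on a grid of sample points.
for r in np.linspace(1.1, 1.9, 5):
    for t in np.linspace(0.3, 6.0, 7):
        r2, t2 = psi_inv(*psi(r, t))
        assert np.isclose(r, r2) and np.isclose(t, t2)
```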

Let’s now return to the inverse function theorem and implicit function theorem. We can restate them in a new perspective using this idea of changing coordinates:

Theorem 5. Inverse Function Theorem

Let \(U\subseteq \mathbb{R}^n\) open, \(F:U\to \mathbb{R}^n\) a \(C^1\) function. Suppose \(x_0\in U\) such that \(\left. DF\right\rvert_{x_0}\) is nonsingular.

Then, there exist:

  • an open neighbourhood \(V\) of \(x_0\),
  • an open neighbourhood \(W\) of \(F\left( x_0 \right)\),
  • and a \(C^1\) change of coordinates \(\psi : W\to V\)

such that \(F\circ \psi\) is the identity on \(W\) and \(\psi \circ F\) is the identity on \(V\).

In particular, \(F\) itself is a \(C^1\) change of coordinates near \(x_0\). This is moderately whelming and seemingly pointless — we have stated a once-familiar theorem in profoundly redundant and convoluted language. Let us do this again with the implicit function theorem.

Theorem 6. Implicit Function Theorem

Let \(U\subseteq \mathbb{R}^n\) open and \(f:U\to \mathbb{R}\) a \(C^1\) map. Suppose \(\frac{\partial f}{\partial x_n}\left( y_0 \right)\neq 0\) at some point \(y_0\in U\). Then, there exist:

  • an open neighbourhood \(V\subseteq U\) of \(y_0\),
  • an open subset \(V’\subseteq \mathbb{R}^n\),
  • and a \(C^1\) change of coordinates \(\psi : V’\to V\)

such that \(f\circ \psi \left( y_1,\ldots, y_n \right) = y_n\) for all \(\left( y_1,\ldots, y_n \right)\in V’\).

This is a much more significant reinterpretation of the implicit function theorem: it’s saying that under the right circumstances, \(f\) looks like the projection onto a coordinate axis. This is wildly different from the interpretation of the implicit function theorem presented last week; however, now the level sets of \(f\) are (up to \(C^1\) changes of coordinates) just \(n-1\)-dimensional hyperplanes near \(y_0\)!
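Here is a toy case (my own example, not from the notes) where the straightening map \(\psi\) can be written down globally: take \(f(x_1, x_2) = x_1^2 + x_2\), so \(\partial f/\partial x_2 = 1 \neq 0\) everywhere. Then \(\psi(y_1, y_2) = (y_1, y_2 - y_1^2)\) is a \(C^1\) change of coordinates with \(f\circ\psi(y_1, y_2) = y_2\):

```python
import numpy as np

# f(x1, x2) = x1^2 + x2 has df/dx2 = 1 != 0, so Theorem 6 applies at
# every point.  The explicit straightening psi(y1, y2) = (y1, y2 - y1^2)
# satisfies f(psi(y1, y2)) = y1^2 + (y2 - y1^2) = y2: in the new
# coordinates, f is projection onto the second coordinate, and each
# level set f^{-1}(c) becomes the horizontal line {y2 = c}.
def f(x1, x2):
    return x1**2 + x2

def psi(y1, y2):
    return (y1, y2 - y1**2)

ys = np.random.default_rng(1).standard_normal((100, 2))
assert np.allclose(f(*psi(ys[:, 0], ys[:, 1])), ys[:, 1])
```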

A Generalisation

One might expect that (under suitable conditions) a \(C^1\) map \(\mathbb{R}^n\to \mathbb{R}^m\) should look like the projection onto the first \(m\) coordinates, perhaps after a \(C^1\) change of coordinates. This turns out to be true, and it’s a natural generalisation of the implicit function theorem as stated above!

This is a problem on the last homework assignment. The big idea is that you’re putting a \(C^1\) map \(\mathbb{R}^n\to \mathbb{R}^m\) into a “canonical form”, in much the same way that we’ve put linear transformations into canonical forms after a suitable change of basis. Of course, I can’t (and shouldn’t) give a full solution to this, but we’ll outline the main ideas.

Theorem 7. Homework 7, Problem 3

Let \(U\subseteq \mathbb{R}^n\) open, \(F:U\to \mathbb{R}^m\) a \(C^1\) function. Suppose \(y_0\in U\) such that \(\left. DF\right\rvert_{y_0}\) is surjective. Then \(n\geq m\), and there exist:

  • an open neighbourhood \(V\subseteq U\) of \(y_0\),
  • an open subset \(V’\subseteq \mathbb{R}^n\),
  • and a \(C^1\) change of coordinates \(\psi : V’\to V\)

such that \(F\circ \psi \left( y_1,\ldots, y_n \right) = \left( y_1,\ldots, y_m \right)\) for all \(\left(y_1,\ldots, y_n\right)\in V’\).

Let’s outline the proof in moderate detail.

  1. Using Theorem 2, show that \(n\geq m\).
  2. The rank of \(\left. DF\right\rvert_{y_0}\) is the dimension of the column space of \(\left. DF\right\rvert_{y_0}\). Deduce that there are \(m\) linearly independent columns of \(\left. DF\right\rvert_{y_0}\).
  3. Consider the map \(G:U\to \mathbb{R}^n\) given by \[G\left( x_1,\ldots, x_n \right) = \left( F\left( x_1,\ldots, x_n \right), x _{m+1},\ldots, x_n \right). \] Compute that the differential is \[\left. DG\right\rvert_{y_0} = \begin{pmatrix} &&&&&&\\ &\cdots&& \left. DF\right\rvert_{y_0} &&\cdots & \\ &&&&&&\\ 0 & \cdots & 1 & 0 & \cdots & 0 & 0 \\ 0 & & 0 & 1 & & 0 & 0 \\ \vdots & & \vdots & \vdots & & \vdots & \vdots \\ 0 & & 0 & 0 & & 1 & 0 \\ 0 & \cdots & 0 & 0 & \cdots & 0 & 1 \end{pmatrix}. \] Gosh, that’s an ugly matrix. The upper \(m\times n\) block is just the derivative of \(F\). The bottom \((n-m)\times n\) block begins with \(m\) columns of pure zeroes, followed by an \((n-m)\times(n-m)\) copy of the identity matrix. Figure out how I got this!
  4. Show that all \(n\) columns of \(\left. DG\right\rvert_{y_0}\) are linearly independent. Deduce that \(\left. DG\right\rvert_{y_0}\) is nonsingular.
  5. Apply the inverse function theorem to \(G\). In fact, \(G ^{-1}\) is the change of coordinates we need; it just remains to show that the composition \(F\circ G ^{-1}\) has the right form! You need to write down \(G=\left( g_1,\ldots, g_n \right)\), where each \(g_i:\mathbb{R}^n\to \mathbb{R}\) is \(C^1\) for \(1\leq i\leq n\). These are the coordinate functions of \(G\). Then, use the fact that \(G\circ G ^{-1}\) is the identity near \(y_0\) to obtain some relations on the component functions of \(G\). Look back at our proof of the implicit function theorem!
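The steps above can be run on a concrete, hypothetical \(F\) (my own choice, not the homework’s): take \(F(x_1,x_2,x_3) = (x_1 + x_3^2,\ x_2 + \sin x_3)\), whose differential is surjective everywhere. Here \(G\) happens to invert explicitly, so the whole construction can be checked numerically:

```python
import numpy as np

# Worked instance of the proof sketch.  F : R^3 -> R^2 below has
# DF = [[1, 0, 2*x3], [0, 1, cos(x3)]], which is surjective everywhere.
def F(x):
    return np.array([x[0] + x[2]**2, x[1] + np.sin(x[2])])

# Step 3: G(x) = (F(x), x_3), a map R^3 -> R^3.
def G(x):
    return np.array([*F(x), x[2]])

# Step 4: DG at y0, approximated by central differences, is nonsingular.
y0, h = np.array([0.5, -0.3, 1.0]), 1e-6
DG = np.column_stack([(G(y0 + h * e) - G(y0 - h * e)) / (2 * h)
                      for e in np.eye(3)])
assert abs(np.linalg.det(DG)) > 1e-8

# Step 5: for this F, G inverts in closed form, and F o G^{-1} is
# exactly projection onto the first two coordinates.
def G_inv(y):
    return np.array([y[0] - y[2]**2, y[1] - np.sin(y[2]), y[2]])

y = np.array([0.7, 0.2, -0.4])
assert np.allclose(G(G_inv(y)), y)
assert np.allclose(F(G_inv(y)), y[:2])
```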

One can replicate this proof to show that if \(F:U\to \mathbb{R}^m\) has an injective differential at the point \(y_0\) (so \(n\leq m\)), then there is a \(C^1\) change of coordinates \(\psi \) on the target such that \[\psi \circ F \left( x_1,\ldots, x_n\right) = ( x_1,\ldots, x_n, \overbrace{0,\ldots, 0}^{m-n \textrm{\ zeroes}} ).\] These are the most standard “canonical forms” of these functions.
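For a minimal sketch of the injective case (example mine): \(F(t) = (t, t^2)\) has \(DF = (1, 2t)^T\), injective for every \(t\), and the target-side change of coordinates \(\psi(x, y) = (x, y - x^2)\) flattens the parabola onto the \(x\)-axis:

```python
import numpy as np

# F : R -> R^2, F(t) = (t, t^2), has injective differential DF = (1, 2t)^T.
# The change of coordinates psi(x, y) = (x, y - x^2) on the target
# straightens the image: psi(F(t)) = (t, t^2 - t^2) = (t, 0), i.e.
# (x_1, 0) with m - n = 1 trailing zero.
def F(t):
    return np.array([t, t**2])

def psi(x, y):
    return np.array([x, y - x**2])

for t in np.linspace(-2, 2, 9):
    assert np.allclose(psi(*F(t)), [t, 0.0])
```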

Note that these “canonical forms” only work locally. Generally speaking, these changes of coordinates need not exist globally, in much the same way that the inverse function theorem cannot guarantee the global invertibility of a function.

As a concluding remark, these statements are all shadows of the rank theorem from differential geometry. This theorem says that if the rank of \(\left. DF\right\rvert_{y}\) is constant for all \(y\) near a point \(y_0\), then such a canonical form exists near \(y_0\). As a fun exercise, try replicating the above proofs in this “constant rank” setting. Does the proof translate over one-to-one? Why do we only need to assume surjectivity or injectivity at a single point in the above argument, yet we need to assume constant rank in a neighbourhood for the rank theorem?