Hunter Liu's Website

12. Derivatives


Now that we’ve developed a lot of theory for continuous functions, we can look at two other closely related classes of functions — differentiable and (Riemann) integrable functions. We have the following tower of inclusions: \[\left\lbrace \substack{f:[0, 1]\to \mathbb{R} \\ \textrm{Riemann integrable}}\right\rbrace\supset \left\lbrace \substack{f:[0, 1]\to \mathbb{R} \\ \textrm{continuous}}\right\rbrace\supset \left\lbrace \substack{f:[0, 1]\to \mathbb{R} \\ \textrm{differentiable}}\right\rbrace\supset \left\lbrace \substack{f:[0, 1]\to \mathbb{R} \\ \textrm{continuously differentiable}}\right\rbrace \] We should point out that these inclusions are all strict. This is extremely important to keep in mind, as many theorems hold for only one of these classes of functions and cannot be “lifted” to a larger class.
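Standard witnesses (assumed examples, shifted so they live on \([0,1]\)) show each inclusion is strict:

```latex
% Riemann integrable but not continuous: a jump
f_1(x) = \begin{cases} 0, & x < \tfrac{1}{2} \\ 1, & x \geq \tfrac{1}{2} \end{cases}
% continuous but not differentiable: a corner
f_2(x) = \left\lvert x - \tfrac{1}{2} \right\rvert
% differentiable but not continuously differentiable: a damped oscillation
f_3(x) = \begin{cases} \left( x - \tfrac{1}{2} \right)^2 \sin \frac{1}{x - 1/2}, & x \neq \tfrac{1}{2} \\ 0, & x = \tfrac{1}{2} \end{cases}
```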

Derivatives as Linear Functions

Recall the definition of a derivative: a function \(f:\mathbb{R}^n\to \mathbb{R}^m\) is differentiable at a point \(x_0\) if there exists a linear function \(L:\mathbb{R}^n\to \mathbb{R}^m\) such that the limit \[\lim _{h\to 0} \frac{\left\lVert f\left( x_0+h \right)-f\left( x_0 \right) - L(h) \right\rVert}{\left\lVert h\right\rVert} \] exists and equals \(0\). The linear function \(L\) is called the derivative of \(f\) at \(x_0\), denoted \(\left. Df \right\rvert _{x_0}\).

When \(n=m=1\), you know from standard linear algebra that linear functions \(L:\mathbb{R}\to \mathbb{R}\) are given by multiplication by a real number! So this definition of the derivative coincides with the usual one-dimensional definition.

You should always think of derivatives as linear functions rather than as numbers or as limits. The point is that \(f\left( x_0+h \right)\approx f\left( x_0 \right)+L(h)\) when \(h\) is very small — that is, \(f\left( x \right)\) looks “very close” to the (affine) linear function \(f\left( x_0 \right)+L\left( x-x_0 \right)\) when \(x\) is close to \(x_0\)!
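This approximation point of view can be checked numerically. Below is a sketch (the map and base point are assumed examples, not from the notes): the error in \(f(x_0+h)\approx f(x_0)+L(h)\) should vanish faster than \(\lVert h\rVert\).

```python
import math

# Example map f(x, y) = (x^2, x*y); L is its Jacobian at x0 = (1, 2),
# which sends (hx, hy) to (2*hx, 2*hx + hy).  (Assumed example.)
def f(x, y):
    return (x * x, x * y)

x0, y0 = 1.0, 2.0

for t in (1e-1, 1e-2, 1e-3, 1e-4):
    hx, hy = t, -t                          # shrink h along a fixed direction
    fx, fy = f(x0 + hx, y0 + hy)
    ex = fx - f(x0, y0)[0] - 2 * hx         # error in the first component
    ey = fy - f(x0, y0)[1] - (2 * hx + hy)  # error in the second component
    ratio = math.hypot(ex, ey) / math.hypot(hx, hy)
    print(f"|h| = {math.hypot(hx, hy):.1e}, error/|h| = {ratio:.1e}")
    # the ratio shrinks like |h| itself, witnessing the limit being 0
```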

Exercise 1.

Suppose that \(f:\mathbb{R}^n\to \mathbb{R}^m\) and \(x_0\in \mathbb{R}^n\) is fixed. Suppose that there exist two linear functions \(L_1,L_2:\mathbb{R}^n\to \mathbb{R}^m\) such that \[ \lim _{h\to 0} \frac{\left\lVert f\left( x_0+h \right)-f\left( x_0 \right)-L_1(h) \right\rVert}{\left\lVert h \right\rVert}= \lim _{h\to 0} \frac{\left\lVert f\left( x_0+h \right)-f\left( x_0 \right)-L_2(h) \right\rVert}{\left\lVert h \right\rVert} =0. \] Show that \(f\) is differentiable at \(x_0\) and that \(L_1=L_2\) (i.e., the derivative of \(f\) is unique when it exists).

Beyond being a somewhat natural generalisation of the notion of a derivative to other dimensions, this characterisation of derivatives as linear functions happens to be much easier to work with than the usual difference quotient in certain situations.

Exercise 2.

  1. Suppose \(f:\mathbb{R}^n\to \mathbb{R}^m\) is differentiable at \(x_0\). Show that \(f\) is continuous at \(x_0\).
  2. Suppose \(f: \mathbb{R}^l\to \mathbb{R}^m\) is differentiable at \(x_0\) and that \(g:\mathbb{R}^m\to \mathbb{R}^n\) is differentiable at \(f\left( x_0 \right)\). Show that \[\left. D\left( g\circ f \right) \right\rvert _{x_0}= \left. Dg \right\rvert _{f\left( x_0 \right)} \circ \left. Df \right\rvert _{x_0}.\] This is the chain rule!
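A finite-difference sanity check of the chain rule (the maps \(f\) and \(g\) here are assumed examples, not from the exercise):

```python
import math

# f(t) = (cos t, sin t) maps R -> R^2; g(x, y) = x*y maps R^2 -> R.
# The chain rule predicts D(g∘f)|_t = Dg|_{f(t)} ∘ Df|_t.
t0 = 0.7

# Jacobians computed by hand (assumed example):
Df = (-math.sin(t0), math.cos(t0))   # 2x1 column: f'(t0)
Dg = (math.sin(t0), math.cos(t0))    # 1x2 row: (∂g/∂x, ∂g/∂y) at f(t0)

# Composing the linear maps is a 1x2 times 2x1 product, i.e. a number:
chain = Dg[0] * Df[0] + Dg[1] * Df[1]

# Compare with a central finite difference of g∘f:
def gof(t):
    return math.cos(t) * math.sin(t)

h = 1e-6
fd = (gof(t0 + h) - gof(t0 - h)) / (2 * h)
print(chain, fd)   # both approximate (g∘f)'(t0) = cos(2*t0)
```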

Exercise 3.

Let \(f:\mathbb{R}\to \mathbb{R}\) be a function. Let \(x_0\in \mathbb{R}\) be a point such that \(f\) is continuous at \(x_0\), \(f\) is differentiable on \(\left( x_0-\epsilon, x_0+\epsilon \right)\setminus \left\lbrace x_0 \right\rbrace\) for some \(\epsilon>0\), and \(\lim _{x\to x_0}f’\left( x \right)\) exists and equals some real number \(L\). Prove that \(f\) is differentiable at \(x_0\) and that \(f’\left( x_0 \right)=L\).

Derivatives and Limits

Following up on that last exercise, perhaps a natural question to ask is, when is the derivative of the limit of a sequence of functions equal to the limit of the derivatives? In the multi-dimensional setting it’s hard to make sense of the limit of a sequence of linear functions (more on this later), but even in the one-dimensional setting this question is not easy to settle. Even if one assumes the “nicest” possible convergence, one cannot guarantee the differentiability of the limit!

Exercise 4.

Define the sequence of functions \[f_n(x)=\sqrt{x^2+\frac{1}{n} }\] for \(x\in[-1, 1]\). Show that this sequence of functions has the following properties:

  1. \(f_n(x)\to \left\lvert x \right\rvert\) uniformly.
  2. \(\left\lvert f_n’(x) \right\rvert\leq 1\) for all \(x\) and \(n\).
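The two claims can be seen numerically; a sketch (grid parameters assumed, not part of the exercise):

```python
import math

# f_n(x) = sqrt(x^2 + 1/n).  The sup-norm error against |x| is attained
# at x = 0 and equals 1/sqrt(n), so the convergence is uniform; meanwhile
# |f_n'(x)| = |x| / sqrt(x^2 + 1/n) < 1 for every x and n.
def f(n, x):
    return math.sqrt(x * x + 1.0 / n)

def fprime(n, x):
    return x / math.sqrt(x * x + 1.0 / n)

xs = [i / 1000.0 for i in range(-1000, 1001)]  # grid on [-1, 1]
for n in (10, 100, 1000):
    sup_err = max(abs(f(n, x) - abs(x)) for x in xs)
    sup_der = max(abs(fprime(n, x)) for x in xs)
    print(n, sup_err, sup_der)  # sup_err matches 1/sqrt(n); sup_der stays below 1
```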

The point of this is that there are sequences of functions that converge very, very nicely, yet have a badly behaved limit: in the example above the convergence is uniform and the derivatives are uniformly bounded, but the limit \(\left\lvert x \right\rvert\) is not differentiable at \(0\). These are disgustingly strong assumptions on the convergence! The derivative works in mysterious ways…

One way to interpret this fact is that taking a derivative can amplify areas of “roughness” (e.g. areas of high oscillation). There are many functions that are continuous on \([0, 1]\) and differentiable on \((0, 1)\) whose derivative fails to be continuous even on \((0, 1)\).

Exercise 5.

Show that the function \(g(x)=x^2\sin \left( \frac{1}{x^2} \right)\) (with \(g(0)=0\)) is differentiable on \((-1, 1)\), but \(g’(x)\) is not Riemann integrable on any interval containing \(0\).
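A quick numerical sketch (the sample points are assumed, not part of the exercise) of why \(g’\) is unbounded near \(0\), which is the obstruction to Riemann integrability:

```python
import math

# Away from 0,
#   g'(x) = 2x sin(1/x^2) - (2/x) cos(1/x^2),
# and the (2/x) cos(1/x^2) term blows up along a sequence x_k -> 0.
def gprime(x):
    return 2 * x * math.sin(1 / x**2) - (2 / x) * math.cos(1 / x**2)

# At x_k = 1/sqrt(2*pi*k) we have 1/x_k^2 = 2*pi*k, so sin vanishes,
# cos equals 1, and |g'(x_k)| = 2*sqrt(2*pi*k) grows without bound:
for k in (1, 100, 10000):
    x = 1 / math.sqrt(2 * math.pi * k)
    print(k, gprime(x))
```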

The moral of the story is: differentiation often makes functions worse to work with. The nicer properties of differentiability, continuity, and even integrability may be lost. These irregularities give at least a heuristic explanation for why one generally cannot swap limits with derivatives.

On the other hand, integration often makes functions a lot nicer to work with.

Exercise 6.

  1. Suppose \(f:[0, 1]\to \mathbb{R}\) is Riemann integrable. Show that the function \[F(x)=\int _{0}^{x}f(t)\, dt\] is continuous.

  2. Suppose \(f:[0, 1]\to \mathbb{R}\) is continuous. Show that the function \[F(x)=\int _{0}^{x}f(t)\, dt\] is differentiable on \((0, 1)\) and that \(F’(x)=f(x)\) for all \(x\). Use this to show that if \(f:[0, 1]\to \mathbb{R}\) is continuously differentiable \(k\) times, then \(F(x)\) is continuously differentiable \(k+1\) times.

In fact, one can show that if \(\left\lbrace f_n \right\rbrace\) are continuous functions such that \(f_n\to f\) uniformly on \([0, 1]\), then \(\int _{0}^{1}f_n(x) dx \to \int _{0}^{1}f(x) dx\). In other words, you are allowed to interchange the limit with the integral: \[\lim _{n\to\infty}\int _{0}^{1} f_n(x) dx = \int _{0}^{1}\lim _{n\to\infty}f_n(x) dx.\] This makes a lot of intuitive sense if you think about areas between curves and the definition of uniform convergence. However, proving this fact rigorously is somewhat involved, and one needs to develop a lot of intuitive but nonobvious facts about the Riemann integral. We may look into this later in the quarter.
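A small numerical sketch of this interchange (the sequence \(f_n\) below is an assumed example, not from the text), with a Riemann sum standing in for the integral:

```python
import math

# f_n(x) = x^2 + sin(x)/n converges to x^2 uniformly on [0, 1], so the
# integrals should converge to ∫ x^2 dx = 1/3.
def riemann_sum(f, a, b, steps=100000):
    """Midpoint Riemann sum of f over [a, b]."""
    dx = (b - a) / steps
    return sum(f(a + (i + 0.5) * dx) for i in range(steps)) * dx

for n in (1, 10, 100):
    val = riemann_sum(lambda x: x * x + math.sin(x) / n, 0.0, 1.0)
    print(n, val)   # approaches 1/3 as n grows
```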

The point of this remark is to provide intuition on the following problem:

Exercise 7.

Suppose \(\left\lbrace f_n \right\rbrace\) is a sequence of differentiable functions on \([0, 1]\) and \(f_\infty\) is a differentiable function on \([0, 1]\). Suppose \(f_n’\to f_\infty’\) uniformly on \([0, 1]\) and that \(f_n(0)\to f_\infty(0)\). Show that \(f_n\to f_\infty\) uniformly on \([0, 1]\).

You cannot use the fundamental theorem of calculus since these functions are not assumed to be continuously differentiable!

Bonus Content: More General “Convergence of Derivatives”

I’m not sure if we’ll have time to discuss this in class, but perhaps this will give a way to generalise some of the things that we’ve talked about to higher dimensions. We’ve been dodging the idea of partial derivatives this whole time, and though they are certainly relevant to the discussion of multi-dimensional derivatives, we do not have nearly enough time to include them this week.

We mentioned earlier that there were examples of continuously differentiable functions that converge uniformly with bounded derivatives to a non-differentiable function. One can even show that there are uniform limits of continuous functions that are nowhere differentiable! On the flip side, we showed that if a sequence of functions’ derivatives converges uniformly to some limit (and the functions converge at a single point), then the functions themselves also converge uniformly. We would like to generalise this to higher dimensions, but how does one make sense of the convergence of derivatives when they’re linear functions \(\mathbb{R}^n\to \mathbb{R}^m\)?

Perhaps you have an idea of this — the linear transformations \(\mathbb{R}^n\to \mathbb{R}^m\) are just \(m\times n\) matrices. This means that the \(\mathbb{R}\)-vector space of linear transformations \(\mathbb{R}^n\to \mathbb{R}^m\) is isomorphic (as \(\mathbb{R}\)-vector spaces) to \(\mathbb{R} ^{nm}\), and one can look at the Euclidean metric on the latter. This gets “pulled back” to a metric (in fact, a norm) on the set of linear transformations. But this description is quite opaque; the “Euclidean norm of a matrix” tells you very little about its properties a priori.
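A numerical sketch contrasting the two norms (the matrix \(T\) is an assumed example): the Euclidean norm of \(T\) as a vector in \(\mathbb{R}^4\) differs from the largest factor by which \(T\) stretches a unit vector, which is the metric defined in the next exercise.

```python
import math

# T = [[3, 0], [0, 1]] as a linear map R^2 -> R^2.
T = [[3.0, 0.0], [0.0, 1.0]]

# Norm of T viewed as a vector in R^4 (the Frobenius norm): sqrt(10).
euclidean = math.sqrt(sum(T[i][j] ** 2 for i in range(2) for j in range(2)))

def stretch(theta):
    """Length of T*v for the unit vector v = (cos theta, sin theta)."""
    v = (math.cos(theta), math.sin(theta))
    Tv = (T[0][0] * v[0] + T[0][1] * v[1],
          T[1][0] * v[0] + T[1][1] * v[1])
    return math.hypot(*Tv)

# Approximate the supremum over the unit circle by sampling:
op_norm = max(stretch(2 * math.pi * k / 10000) for k in range(10000))
print(euclidean, op_norm)   # sqrt(10) ≈ 3.16 versus 3.0
```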

Exercise 8.

Define \(L(n, m)\) to be the set of linear transformations \(\mathbb{R}^n\to \mathbb{R}^m\). Define the metric \(d\) on \(L(n, m)\) as follows: given \(T, U\in L(n, m)\), \[d(T,U) = \sup \left\lbrace \left\lVert Tv-Uv \right\rVert : v\in \mathbb{R}^n, \left\lVert v \right\rVert=1 \right\rbrace.\] Here, \(\left\lVert \cdot \right\rVert\) refers to the Euclidean norm on \(\mathbb{R}^n\) and \(\mathbb{R}^m\). Show that \(d\) is a well-defined metric and that \(\left( L(n,m), d \right)\) is a complete metric space.

In addition, if \(T:\mathbb{R}^n\to \mathbb{R}^m\) and \(U:\mathbb{R}^m\to \mathbb{R}^l\), show that \(\left\lVert U\circ T \right\rVert\leq \left\lVert U \right\rVert\cdot \left\lVert T \right\rVert\), where \(\left\lVert T \right\rVert := d(T, 0)\). Is it true that \(\left\lVert U\circ T \right\rVert=\left\lVert T \right\rVert\cdot \left\lVert U \right\rVert\)?
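A sketch suggesting the answer to the last question (the pair of diagonal maps is an assumed example; for a diagonal matrix the supremum over unit vectors is the largest absolute diagonal entry):

```python
# U = diag(2, 1) and T = diag(1, 3) as linear maps R^2 -> R^2.
# Their composition is diag(2, 3), so the operator norms are 3 and 6:
# the inequality ||U∘T|| <= ||U||*||T|| can be strict.
def op_norm_diag(a, b):
    """Operator norm of the diagonal map diag(a, b)."""
    return max(abs(a), abs(b))

U, T = (2.0, 1.0), (1.0, 3.0)
UT = (U[0] * T[0], U[1] * T[1])   # composition of diagonal maps
print(op_norm_diag(*UT), op_norm_diag(*U) * op_norm_diag(*T))
```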

I’ll remark that this can be generalised a lot; this in fact puts a metric structure on the set of linear functions between any pair of normed real vector spaces, though one cannot guarantee that it will be complete without some additional requirements. This norm is often referred to as the operator norm.

Now if \(f:\mathbb{R}^n\to \mathbb{R}^m\) is differentiable, one may think of its derivative as a map \(Df: \mathbb{R}^n\to L(n, m)\) as defined above. When \(Df\) is continuous, i.e. when \(Df\in C\left( \mathbb{R}^n, L(n, m) \right)\), one can think about putting the supremum norm on \(C\left( \mathbb{R}^n , L(n, m)\right)\) and wondering if there’s anything to say about convergence in this setting.

Of course, one cannot say that \(f_j\to f\) uniformly implies \(Df_j\to Df\) uniformly, even when one assumes that the \(f_j\)’s and \(f\) are all continuously differentiable with bounded derivatives! However, there is a way to show that if \(Df_j\to Df\) uniformly (and the \(f_j\)’s converge at a point), then \(f_j\to f\) uniformly. There are technical limitations to this, however, which depend on the geometry of the set on which the \(f_j\)’s and \(f\) are differentiable.

Exercise 9.

Suppose \(U\subseteq \mathbb{R}^n\) is an open set. Suppose additionally that \(U\) is convex — if \(x, y\in U\), then for all \(t\in [0, 1]\), \(tx+(1-t)y\in U\) (i.e. \(U\) contains the line segment connecting \(x\) and \(y\)). Let \(f:U\to \mathbb{R}^m\) be differentiable, and suppose \[M=\sup \left\lbrace \left\lVert \left. Df \right\rvert _{x} \right\rVert : x\in U \right\rbrace < \infty.\] Show that for all \(x, y\in U\), \[\left\lVert f(x)-f(y) \right\rVert\leq M \left\lVert x-y \right\rVert.\] Give an explicit counterexample to show that this inequality fails when \(U\) is not assumed to be convex.
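A numerical sanity check of the estimate in a one-dimensional special case (the function and interval below are assumed examples, not part of the exercise):

```python
import math
import random

# On the convex set U = (-2, 2), f(x) = sin(x) has |Df| = |cos(x)| <= 1,
# so M = 1 works and the estimate |f(x) - f(y)| <= M|x - y| should hold
# for every pair of points.
random.seed(0)
M = 1.0
for _ in range(10000):
    x, y = random.uniform(-2, 2), random.uniform(-2, 2)
    assert abs(math.sin(x) - math.sin(y)) <= M * abs(x - y) + 1e-12
print("estimate verified on 10000 random pairs")
```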

Exercise 10.

Suppose \(U\subseteq \mathbb{R}^n\) is a bounded convex open set, \(\left\lbrace f_j \right\rbrace\) a sequence of differentiable functions \(U\to \mathbb{R}^m\), \(f:U\to \mathbb{R}^m\) differentiable, and \(Df_j\to Df\) uniformly on \(U\). Suppose that \(f_j\left( x_0 \right)\to f\left( x_0 \right)\) for some \(x_0\in U\). Show that \(f_j\to f\) uniformly on \(U\).

Does this conclusion still hold if \(U\) is unbounded? What about if \(U\) is bounded but not convex?