Hunter Liu's Website

5. Week 4: Differentiable Functions


Assuming you’ve taken calculus at one point in your life or another, you’ve probably seen differentiable functions and have probably had to compute a great many derivatives as well. And you almost certainly have some intuition about what kind of information derivatives encode about a function: for instance, if \(f’(x) > 0\), then \(f\) is increasing near \(x\); if \(f’(x) = 0\), then \(f\) has a local extremum or saddle point at \(x\), etc.

Unfortunately, much of our intuition only applies to the case when \(f\) is continuously differentiable and not merely differentiable. Today, we’re going to focus on the latter class of functions: what are all the things that go wrong with our intuition there?

Functions Generally Do Not Resemble Their Derivatives

One is often taught that if a function \(f\) is differentiable at \(x\), then the linear function \(t \mapsto f(x) + f’(x)(t - x)\) is a “good approximation” to \(f\) near \(x\). But what “good approximation” really means is not what our intuition says; in particular, one cannot in general expect qualitative properties of the derivative to carry over to the original function.

Let’s warm up by showing that, when a function is continuously differentiable, our intuition usually checks out.

Problem 1.

Let \(f : (a, b)\to \mathbb{R}\) be continuously differentiable, and suppose \(x_0 \in (a, b)\) is such that \(f’\left( x_0 \right) > 0\). Show that there is a \(\delta > 0\) such that for all \(x, y \in N_ \delta \left( x_0 \right)\) with \(x < y\), one has \(f(x) < f(y)\).

The idea here is that if \(x < y\) and \(f(x) \geq f(y)\), there must be some point \(\xi \) between \(x\) and \(y\) such that \(f’( \xi ) \leq 0\) (by the mean value theorem). But if both \(x\) and \(y\) are close to \(x_0\), then so must \(\xi \), and by continuity of \(f’\) this is a contradiction. More formally,

Proof

Since \(f’\) is continuous, there exists a \(\delta > 0\) such that \(f’(x) > 0\) for all \(x\in N_ \delta \left( x_0 \right)\).

Suppose, for contradiction, that there exist \(x, y\in N_ \delta \left( x_0 \right)\) with \(x < y\) and \(f(x) \geq f(y)\). By the mean value theorem, there exists \(\xi \in (x, y)\) such that \[f’(\xi ) = \frac{f(x) - f(y)}{x-y} \leq 0.\] But \(\xi \in (x, y) \subseteq N_ \delta \left( x_0 \right)\), so \(f’( \xi ) > 0\) by construction of \(\delta \). This is a contradiction. \(\square\)
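If you’d like to see Problem 1 in action numerically, here’s a quick Python sketch. The test function \(g\) (a \(C^1\) cousin of the example below, with \(x^3\) in place of \(x^2\)) and the grid resolution are my own choices for illustration, not part of the problem.

```python
import math

def g(x):
    # a C^1 cousin of Example 2 below: the x^3 factor tames g'
    return 0.0 if x == 0 else x / 2 + x**3 * math.cos(1 / x)

# g'(x) = 1/2 + 3x^2 cos(1/x) + x sin(1/x) for x != 0, g'(0) = 1/2,
# and g' is continuous at 0, so Problem 1 applies: g must be strictly
# increasing on some N_delta(0). Check delta = 0.1 on a grid:
pts = [i * 1e-4 for i in range(-1000, 1001)]
vals = [g(x) for x in pts]
assert all(a < b for a, b in zip(vals, vals[1:]))
print("g is increasing on (-0.1, 0.1) at this resolution")
```

Indeed, \(g’(x) \geq \frac{1}{2} - 3x^2 - \lvert x \rvert > 0\) on \((-0.1, 0.1)\), so Problem 1’s conclusion holds here with room to spare.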

Example 2.

Define the function \[f(x) = \begin{cases} \frac{x}{2} + x^2 \cos\left( \frac{1}{x}\right) & x\neq 0, \\ 0 & x = 0.\end{cases}\] Then \(f\) is differentiable everywhere, \(f’(0) = \frac{1}{2}\), and there is no open neighbourhood of \(0\) on which \(f\) is monotone.

For differentiability away from \(0\), one can simply apply the usual product rules and chain rules, and one finds quickly that \[f’(x) = \frac{1}{2} + 2x \cos \left( \frac{1}{x} \right) + \sin \left( \frac{1}{x} \right).\] This formula only works when \(x\neq 0\), and the limit \(\lim _{x\to 0} f’(x)\) does not exist. However, this does not preclude the existence of a derivative. One has \[f’(0) = \lim _{h\to 0}\frac{f(h)}{h} = \frac{1}{2} + \lim _{h\to 0} h\cos \left( \frac{1}{h} \right) = \frac{1}{2}.\] The last limit exists because \(\left\lvert h \cos \left( \frac{1}{h} \right) \right\rvert \leq \left\lvert h \right\rvert\) for all \(h\neq 0\), and crunching out some \(\epsilon \)’s and \(\delta \)’s gets you the rest of the way there.

However, \(f\) is not strictly monotone increasing in any neighbourhood of \(0\)! For any \(\delta > 0\), there exists \(x_0 \in N_ \delta (0)\) such that \(f’\left( x_0 \right) < 0\): one can make the \(\sin\left( \frac{1}{x} \right)\) term equal to \(-1\) while, for \(x\) sufficiently small, the \(2x \cos \left( \frac{1}{x} \right)\) term cannot cancel out the remaining \(-\frac{1}{2}\). But since \(f\) is continuously differentiable away from \(0\), applying Problem 1 to \(-f\) shows that \(f\) is strictly monotone decreasing near \(x_0\)!
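Here’s a quick numerical sanity check of both claims, sketched in Python; the sample points \(x_n = \left( 2\pi n + \frac{3\pi}{2} \right)^{-1}\), at which \(\sin\left( \frac{1}{x_n} \right) = -1\) and \(\cos\left( \frac{1}{x_n} \right) = 0\), are my own choice.

```python
import math

def f(x):
    # Example 2: f(x) = x/2 + x^2 cos(1/x), extended by f(0) = 0
    return 0.0 if x == 0 else x / 2 + x**2 * math.cos(1 / x)

def fprime(x):
    # closed form for x != 0: f'(x) = 1/2 + 2x cos(1/x) + sin(1/x)
    return 0.5 + 2 * x * math.cos(1 / x) + math.sin(1 / x)

# difference quotients at 0 approach f'(0) = 1/2 ...
for h in [1e-3, 1e-6, 1e-9]:
    print(h, f(h) / h)  # within |h| of 0.5, since |h cos(1/h)| <= |h|

# ... yet every neighbourhood of 0 contains points where f' < 0:
# at x_n = 1/(2*pi*n + 3*pi/2), sin(1/x_n) = -1 and cos(1/x_n) = 0,
# so f'(x_n) = -1/2 exactly
for n in [1, 10, 1000]:
    x = 1 / (2 * math.pi * n + 3 * math.pi / 2)
    print(x, fprime(x))
```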

Remark 3.

There are some great theorems in multivariable differential calculus that say that when a function \(f: \mathbb{R}^n\to \mathbb{R}^m\) is continuously differentiable, one can indeed steal qualitative features from the derivative of \(f\)! For instance, if \(Df\) is injective, so is \(f\); if \(Df\) is surjective, so is \(f\); if \(\ker Df\) has dimension \(k\), so do the level sets of \(f\); etc. These are more commonly known as the inverse function theorem and the implicit function theorem, and both are special cases of the rank theorem.

(These statements, of course, are only true locally.)

Continuous but Nowhere Differentiable

One might believe that “most” continuous functions are “mostly” differentiable; most examples of nondifferentiable functions only exhibit this behaviour at isolated points. But, as you’ll see on your homework and in a few sentences, some functions are nowhere differentiable despite being continuous! Here’s an example.

Define \(s(x) = \operatorname{dist} \left( x, \mathbb{Z} \right)\), the distance to the nearest integer, also known as the triangular wave function. Define the function \[f(x) = \sum _{n=0}^{\infty} 2 ^{-n} s \left( 2^nx \right).\] \(f(x)\) is well-defined: \(\left\lvert s(x) \right\rvert \leq \frac{1}{2}\) for all \(x\), so the infinite series converges at every point \(x\). Moreover, \(f\) is continuous! I’ll leave the specifics of the argument as an exercise, but I’ll loosely outline the steps: define \(f_N(x) = \sum _{n=0}^{N}2 ^{-n} s \left( 2^nx \right)\), the partial sums of \(f\). Then, do a three-point estimate \[\left\lvert f(x) - f(y) \right\rvert \leq \left\lvert f(x) - f_N(x) \right\rvert + \left\lvert f_N(x) - f_N(y) \right\rvert + \left\lvert f_N(y) - f(y) \right\rvert.\] The first and third terms can be made small uniformly in \(x\) and \(y\), since \(\left\lvert f(x) - f_N(x) \right\rvert \leq \sum _{n > N} 2 ^{-n} \cdot \frac{1}{2} = 2 ^{-N-1}\); the middle term can be made small by the continuity of \(f_N\).
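If you’d like to experiment, here’s a short Python sketch of the partial sums \(f_N\). It checks the uniform tail bound \(\left\lvert f(x) - f_N(x) \right\rvert \leq \sum _{n>N} 2 ^{-n} \cdot \frac{1}{2} = 2 ^{-N-1}\), which is the key estimate in the continuity argument; the sample points and the truncation at \(n = 50\) (standing in for the infinite sum) are arbitrary choices of mine.

```python
import math

def s(x):
    # distance from x to the nearest integer: the triangular wave
    return abs(x - round(x))

def f_partial(x, N):
    # partial sum f_N(x) = sum_{n=0}^{N} 2^{-n} s(2^n x)
    return sum(2.0**-n * s(2.0**n * x) for n in range(N + 1))

# the tail is bounded by sum_{n>N} 2^{-n} * (1/2) = 2^{-N-1},
# uniformly in x -- this is what makes the limit continuous
for x in [0.3, 1 / math.pi, 0.123456]:
    for N in [5, 10, 20]:
        tail = abs(f_partial(x, 50) - f_partial(x, N))
        assert tail <= 2.0**(-N - 1)
        print(x, N, tail, 2.0**(-N - 1))
```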

But \(f\) is nowhere differentiable, and this may be a bit trickier to prove. The idea is that each \(2 ^{-n} s \left( 2^nx \right)\) has derivative \(\pm 1\) away from the dyadic rationals, and if one naïvely adds together all the derivatives, there’s no way a bunch of \(\pm 1\)’s can add up to a meaningful number.

Let’s be more precise here. It’s enough to show \(f\) is nondifferentiable at each point of \([0, 1)\), since \(f\) is \(1\)-periodic. Let \(x\in [0, 1)\); we wish to show \(\lim _{h\to 0} \frac{f(x+h) - f(x)}{h}\) does not exist. We begin by defining the binary expansion of \(x\).

We define by convention \(b_0 = 0\) and \(S_0 = 0\). For \(n \geq 1\), we define inductively \[b_n = \begin{cases} 0 & x - S _{n-1} < 2 ^{-n}, \\ 1 & x - S _{n-1} \geq 2 ^{-n},\end{cases} \qquad S_n = S _{n-1} + b_n 2 ^{-n}.\] One can show inductively that \(S_n \leq x < S_n + 2 ^{-n}\) for all \(n\), and that \[x = \sum _{n=1}^{\infty} b_n 2 ^{-n}.\] (Binary expansions aren’t unique, and we’ve favoured infinitely many zeroes over infinitely many ones.)
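Here’s the inductive definition in code, as a small Python sketch (the sample point \(0.6875 = 0.1011_2\) is my choice); it also checks the invariant \(S_n \leq x < S_n + 2 ^{-n}\) along the way.

```python
def binary_expansion(x, N):
    # b_0 = 0, S_0 = 0; then b_n = 0 if x - S_{n-1} < 2^{-n}, else 1,
    # and S_n = S_{n-1} + b_n 2^{-n}
    bits, S = [0], 0.0
    for n in range(1, N + 1):
        b = 0 if x - S < 2.0**-n else 1
        bits.append(b)
        S += b * 2.0**-n
        # invariant from the text: S_n <= x < S_n + 2^{-n}
        assert S <= x < S + 2.0**-n
    return bits, S

bits, S = binary_expansion(0.6875, 8)  # 0.6875 = 0.1011 in binary
print(bits[1:])   # [1, 0, 1, 1, 0, 0, 0, 0]
print(S)          # 0.6875
```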

We wish to investigate the quantity \[\frac{f(x+h)-f(x)}{h} = \frac{1}{h} \sum _{n=0} ^{\infty} 2 ^{-n} \left[ s \left( 2 ^{n}(x+h) \right) - s \left( 2 ^{n}x \right) \right].\] By choosing \(h\) so that \(x+h\) is a dyadic rational, we can simplify this computation considerably, as most of the \(s \left( 2 ^{n}\left( x+h \right) \right)\) will just be zero. By choosing \(h\) very small, the differences when \(s\) is nonzero will be easier to handle, as \(s\) will be linear between \(2^n(x+h)\) and \(2^nx\). By being careful with the remaining error term, we’ll be able to construct a sequence of difference quotients that cannot converge.

Let \(k\) be any positive integer. Pick \(h_k\) nonzero so that \(x + h_k\) is the number of the form \(\frac{a}{2^k}\), \(a\in \mathbb{Z}\), closest to \(x\). Define \(N(k)\) as the largest integer so that no numbers of the form \(\frac{a}{2^{N(k)}}\) lie between \(x\) and \(x+h_k\), where again \(a\in \mathbb{Z}\). Then, we have \[\frac{f\left( x+h_k \right) - f(x)}{ h_k } = \frac{1}{h_k} \sum _{n< N(k)} 2 ^{-n} \left[ s \left( 2^n\left( x+h_k \right) \right) - s \left( 2^nx \right) \right] - \frac{1}{h_k} \sum _{n \geq N(k)} 2 ^{-n} s \left( 2^nx \right). \] In the first sum, when \(n < N(k)\), the functions \(s \left( 2^nx \right)\) are linear on the dyadic intervals of length \(2 ^{-n-1}\), and \(x\) and \(x + h_k\) lie in a common such interval. In particular, the summand reduces to \(\pm h_k\), specifically \(\left(1 - 2b_n\right)h_k\), where again \(b_n\) comes from the binary expansion of \(x\)! This is because the binary coefficients describe whether \(x\) lies on the “downward slopes” or the “upward slopes” of the rescaled triangular waves.

On the other hand, one necessarily has \(\left\lvert h_k \right\rvert \geq 2 ^{-N(k)}\). Using the fact that \(0 \leq s\left( 2^nx \right) \leq \frac{1}{2}\) everywhere, one finally obtains after summing the geometric series that \[0 \leq \frac{1}{h_k} \sum _{n \geq N(k)} 2 ^{-n} s \left( 2^nx \right) < 1,\] with a strict inequality since \(s\left( 2^nx \right) = \frac{1}{2}\) cannot hold simultaneously for every \(n\). Thus, we arrive at \[\frac{f\left( x+h_k \right) - f(x)}{h_k} = \sum _{n < N(k)}\left( 1 - 2b_n \right) - R_k,\] where \(0 \leq R_k < 1\) for each \(k\). Since \(1 - 2b_n \in \left\lbrace \pm 1 \right\rbrace\) for all \(n\), and since clearly \(N(k)\to\infty\) as \(k\to\infty\), these values cannot possibly converge as \(k\to\infty\).
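To watch this divergence happen, here’s a Python sketch computing the difference quotients along the dyadic increments \(h_k\); the choice \(x = \frac{1}{3}\) and the truncation of the series at \(n = 60\) are mine.

```python
def s(x):
    # triangular wave: distance from x to the nearest integer
    return abs(x - round(x))

def f(x, N=60):
    # partial sum of f; the tail past N = 60 is below double precision
    return sum(2.0**-n * s(2.0**n * x) for n in range(N + 1))

x = 1 / 3.0
quotients = []
for k in range(2, 25):
    # x + h_k is the dyadic rational a / 2^k nearest to x
    h = round(x * 2**k) / 2**k - x
    quotients.append((f(x + h) - f(x)) / h)

# the quotients never settle down: for x = 1/3 they alternate
# between 2 and -1, so the derivative at x cannot exist
print(quotients)
```

For \(x = \frac{1}{3}\) one can check by hand that \(h_k = \pm \frac{2^{-k}}{3}\), and the difference quotients alternate between \(2\) and \(-1\), so they certainly don’t converge.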

Remark 4.

One can actually write down \(N(k)\) explicitly in terms of the binary expansion of \(x\), but I’m not juiced up enough to do so (I’m already struggling with all the off-by-one mistakes I’ve made while writing this up). The conclusion that the difference quotients can’t converge depends on this relationship with the binary expansion.