Hunter Liu's Website

8. Week 7: Metrising the Space of Continuous Functions


When we first learn about metric spaces, we describe the notion of “convergence” of a sequence of points with \(\epsilon \)’s and \(N_ \epsilon \)’s and what have you. What it means for a sequence of points to converge to a limit is dependent on the metric: different metrics change the underlying meaning of the word “converge”.

Remark 1.

When one constructs the real numbers, one first defines a metric \(d_ \mathbb{Q} : \mathbb{Q}\times \mathbb{Q} \to \mathbb{Q}\), then “completes” \(\mathbb{Q} \) via equivalence classes of Cauchy sequences (Dedekind cuts give an equivalent construction, though they use the order structure rather than the metric). This metric, of course, is the familiar absolute value, but there’s nothing special about this metric. One can find other nontrivial metrics that produce other complete metric spaces when performing the same operation — \(p\)-adic numbers can be constructed in this manner.

It’s not uncommon for us to encounter sequences of functions — we’ve already seen several examples — “converging” to a limit in one sense or another. But not all of these come from metrics, and not all of those that do are well-behaved.

Let \(X\) and \(Y\) be metric spaces, and assume \(Y\) is complete. Define the set \[C(X; Y) = \left\lbrace f : X\to Y \mid f \textrm{ is continuous } \right\rbrace.\] When \(X\) is compact, every continuous function \(f :X \to Y\) is bounded, and we may endow \(C(X; Y)\) with the metric \(d_\infty\) via \[d_\infty (f, g) = \sup _{x\in X} d_Y \left( f(x), g(x) \right). \] (When \(X\) is not compact, the above metric may take unbounded values. If you’re okay with the symbol \(\infty\), this is not a problem, but I’m avoiding it for my own sake.)

In this setting, \(\left( C \left( X; Y \right), d_\infty \right)\) is a complete metric space: every Cauchy sequence of functions (with respect to this metric) will converge to a limit. This metric is called the uniform metric; Cauchy sequences of functions with respect to \(d_\infty\) are called uniformly Cauchy; \(f_n \xrightarrow{d_\infty} f_\infty\) is called uniform convergence. The significance here is that a sequence of continuous functions will always converge to a continuous function under this metric.
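To see the definitions in action, here’s a small numerical sketch in Python. (The helper `d_inf` and the sample grid are my own notation, and the supremum is only approximated by a maximum over sample points.) The sequence \(f_n(x) = x^n\) converges at every point of \([0, 1]\), but to a discontinuous limit, and its \(d_\infty\)-distance to the zero function never drops below \(1\).

```python
import numpy as np

def d_inf(f, g, xs):
    """Approximate the uniform metric d_inf(f, g) by a max over sample points xs."""
    return float(np.max(np.abs(f(xs) - g(xs))))

xs = np.linspace(0.0, 1.0, 10_001)
zero = lambda x: 0 * x

# f_n(x) = x**n converges at every point of [0, 1], but the limit is
# discontinuous at 1, and d_inf(f_n, 0) = 1 for every n: no uniform convergence.
for n in (1, 10, 100):
    assert abs(d_inf(lambda x: x**n, zero, xs) - 1.0) < 1e-9

# By contrast, g_n(x) = x / n converges uniformly: d_inf(g_n, 0) = 1/n -> 0.
assert abs(d_inf(lambda x: x / 100, zero, xs) - 0.01) < 1e-12
```

The discontinuity of the pointwise limit of \(x^n\) is exactly the failure mode that uniform convergence rules out.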

More importantly, though, one of the thematic ideas of analysis is the idea of arbitrarily precise approximations. You’ve used this idea on your homework before: rather than proving a statement \(P(x)\) for all real numbers \(x\), you instead prove \(P(q)\) for all \(q\in \mathbb{Q}\), then extend this to \(x\in \mathbb{R}\) via some form of continuity. \(\mathbb{Q}\) can be replaced by any dense subset of \(\mathbb{R}\) and the same idea works.

Let’s specialise to \(C([0, 1]; \mathbb{R})\) and try to establish some groundwork for the following:

Dogma 2.

What is a “simple” family of functions \(D\subseteq C \left( [0, 1]; \mathbb{R} \right)\) such that any continuous function can be approximated by a sequence of functions \(f_n \in D\)?

This means that if you want to prove something tricky about an arbitrary continuous function \(f : [0, 1]\to \mathbb{R}\) (remember, these can be pretty gross), one can instead prove this about functions in \(D\) first, then perform some kind of continuity argument to extend this to \(f\).

You can probably make some reasonable guesses as to what \(D\) could be. I’ll present two answers:

1. Piecewise Linear Functions

Define \(D_L\) as the set of all piecewise linear functions (with finitely many pieces) on \([0, 1]\). To prove that \(D_L\) is dense in \(C([0, 1]; \mathbb{R})\), I need to show that for any \(f\in C \left( [0, 1]; \mathbb{R} \right)\) and any \(\epsilon > 0\), there exists a piecewise linear function \(l\) such that \(d_\infty(l, f) < \epsilon\), i.e. the graphs of \(l\) and \(f\) do not ever grow more than \(\epsilon \) apart.

Since \(f\) is continuous on \([0, 1]\), it’s uniformly continuous. Thus there exists \(\delta > 0\) such that for any \(x, y\in [0, 1]\) with \(\left\lvert x-y \right\rvert < \delta \), we have \(\left\lvert f(x) - f(y) \right\rvert < \epsilon \).

Let \(n\in \mathbb{N}\) be such that \(\frac{1}{n} < \delta \), and let \(l\) be the piecewise linear function such that \[l\left( \frac{i}{n} \right) = f\left( \frac{i}{n} \right) \] for all \(i = 0,\ldots, n\). (We’re connecting \(n+1\) evenly spaced points on the graph of \(f\), each within \(\delta \) of the previous horizontally.) Then, after some triangle inequalities, one can show that \[d_\infty \left( l, f \right) < 2 \epsilon .\] \(\square\)
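If you’d like to watch the mesh argument happen numerically, here is a minimal sketch (using `numpy.interp` for the connect-the-dots step; the test function is an arbitrary continuous choice of mine):

```python
import numpy as np

def piecewise_linear(f, n):
    """The piecewise linear l with l(i/n) = f(i/n) for i = 0, ..., n."""
    nodes = np.linspace(0.0, 1.0, n + 1)
    vals = f(nodes)
    return lambda x: np.interp(x, nodes, vals)

f = lambda x: np.sin(10 * x) + np.sqrt(x)   # an arbitrary continuous test function
xs = np.linspace(0.0, 1.0, 20_001)

# The (sampled) sup-distance d_inf(l, f) shrinks as the mesh 1/n drops below delta.
errs = [np.max(np.abs(piecewise_linear(f, n)(xs) - f(xs))) for n in (4, 16, 64, 256)]
assert all(a > b for a, b in zip(errs, errs[1:]))
```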

2. Polynomials

Let \(D_P\) be the set of all (real) polynomials on \([0, 1]\). Then \(D_P\) is dense in \(C([0, 1]; \mathbb{R})\).

It’s very tempting to mimic the preceding proof: take \(n\) to be very large, then fit a very high degree polynomial to the points \(p\left( \frac{i}{n} \right) = f\left( \frac{i}{n} \right)\). Unfortunately, this is not a uniform approximation, and for some pathological functions, this won’t converge to \(f\) even pointwise. This is called Runge’s phenomenon.

This is a nontrivial statement to prove, and there’s a very fat hammer called the Stone-Weierstrass theorem from which this follows immediately. However, we’ll give a probabilistic proof that assumes nothing but the law of large numbers.

Fix a point \(x\in [0, 1]\), \(f: [0, 1]\to \mathbb{R}\) continuous, and a natural number \(n\). Define the quantity \(f_n(x)\) as follows.

Imagine you have an unfair coin which comes up heads with probability exactly \(x\). Flip this coin \(n\) times, and let \(H\) be the number of heads. Define \(Y = f \left( \frac{H}{n} \right)\). This is a random variable that takes different values depending on your luck. Let \(f_n(x)\) be the expected value of \(Y\). (Note \(H\) and \(Y\) depend on both \(n\) and \(x\), but this was suppressed from our notation.) One can write an explicit formula for \(f_n(x)\); it will be a degree \(n\) polynomial of \(x\).
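For the record, the explicit formula is \[f_n(x) = \sum_{k=0}^{n} f\left( \frac{k}{n} \right) \binom{n}{k} x^k \left( 1-x \right)^{n-k},\] the \(n\)-th Bernstein polynomial of \(f\). A small sketch to watch the convergence (the test function \(f(t) = \left\lvert t - \frac{1}{2} \right\rvert\) is an arbitrary choice of mine, picked because it isn’t differentiable):

```python
import math

def bernstein(f, n, x):
    """f_n(x) = E[f(H/n)], H ~ Binomial(n, x): the n-th Bernstein polynomial of f."""
    return sum(
        f(k / n) * math.comb(n, k) * x**k * (1 - x)**(n - k)
        for k in range(n + 1)
    )

f = lambda t: abs(t - 0.5)              # continuous, not differentiable at 1/2
xs = [i / 500 for i in range(501)]
sup_err = lambda n: max(abs(bernstein(f, n, x) - f(x)) for x in xs)

assert sup_err(200) < sup_err(10)       # the (sampled) uniform error shrinks with n
```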

Let \(\epsilon > 0\); we wish to show that for large enough \(n\) and arbitrary \(x\), \(\left\lvert f_n(x) - f(x) \right\rvert < \epsilon \). Since \(f\) is uniformly continuous, there exists \(\delta > 0\) such that if \(\left\lvert x-y \right\rvert < \delta \), then \(\left\lvert f(x) - f(y) \right\rvert < \epsilon \). Let \(M = \sup _{x\in [0, 1]} \left\lvert f(x) \right\rvert\).

There are two possibilities: either \(\frac{H}{n}\) falls in \(N_ \delta (x)\), or it doesn’t; the former becomes more and more likely as \(n\to\infty\). In fact, for \(n\) sufficiently large, the former happens with probability at least \(1 - \epsilon \).
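This “sufficiently large” can be made quantitative: the only probabilistic input needed is Chebyshev’s inequality (which is also how the weak law of large numbers is proven). Since \(H\) is binomial with mean \(nx\) and variance \(nx(1-x) \leq \frac{n}{4}\), \[\mathbb{P}\left( \left\lvert \frac{H}{n} - x \right\rvert \geq \delta \right) \leq \frac{x(1-x)}{n \delta ^2} \leq \frac{1}{4 n \delta ^2},\] and the right-hand side is less than \(\epsilon \) as soon as \(n > \frac{1}{4 \epsilon \delta ^2}\), a threshold independent of \(x\).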

In the former case, \(f\left( \frac{H}{n} \right)\in N_ \epsilon (f(x))\), and we get that \[\left\lvert f_n(x) - f(x) \right\rvert < (1 - \epsilon ) \cdot \epsilon + \epsilon \cdot 2M.\] The first term says that with probability at least \(1 - \epsilon \), \(\left\lvert Y - f(x) \right\rvert < \epsilon \); on the remaining event, which has probability at most \(\epsilon \), \(\left\lvert Y - f(x) \right\rvert\) can only be as large as \(2M\). This upper bound is independent of \(x\), and we conclude \(f_n\to f\) uniformly. \(\square\)

There are many other choices — differentiable functions and smooth functions, for instance, are also dense in \(C([0, 1]; \mathbb{R})\), and some things are easier to prove for these restricted classes of functions before being extended to all continuous functions.

Space Filling Curves

Let’s give a nontrivial application of the completeness of \(C(X; Y)\) with respect to the uniform metric. On the practice midterm, there was a problem that stated that there is no continuous bijection between \([0, 1]\) and \([0, 1]^2\). However, it is true (surprisingly) that

Theorem 3.

There exists a surjective continuous function \(f: [0, 1]\to [0, 1]^2\).

The idea is simple: construct a sequence of curves \(f_n : [0, 1] \to [0, 1]^2\) that fills up more and more of the square \([0, 1]^2\) as \(n\) increases. If we construct the \(f_n\) wisely, they will form a uniformly Cauchy sequence, hence converge to a continuous function \(f : [0, 1]\to [0, 1]^2\).

There is a large family of such constructions; we’ll use Hilbert’s curve.

We’ll be cutting up \([0, 1]^2\) into four squares, then \(16\) squares, then \(64\) squares, etc., and we’ll trace a path through the squares in a uniformly Cauchy way. However, not every path will work: naïvely zigzagging back and forth will produce more and more rapid oscillation, creating a sequence that isn’t uniformly Cauchy.

We’ll index our squares as follows: let \(S(\varnothing) = [0, 1]^2\). Let \(S(A), S(B), S(C), S(D)\) be the lower-left, upper-left, upper-right, and lower-right quarters of \([0, 1]^2\), respectively.

Let \(S(AA)\) be the lower-left corner of \(S(A)\), \(S(DD)\) the lower-right corner of \(S(D)\). Then, we can define \(16\) squares \(S(AA), S(AB), \ldots, S(DD)\) so that when listed in lexicographical order like this, each square is adjacent to those immediately before and after it. Additionally, we may arrange it so that \(S(AA),\ldots, S(AD)\) are the four quadrants of \(S(A)\), and likewise for the other letters. Repeat this every time you add a letter.

Put another way, given a string of \(n\) letters taken from \(\left\lbrace A, B, C, D \right\rbrace\), denoted \(l_1l_2l_3\ldots l_n\) (each \(l_i\in \left\lbrace A, B, C, D \right\rbrace\)), we’ve selected one of the \(4^n\) sub-squares of \([0, 1]^2\) and denoted it \(S \left( l_1l_2l_3\ldots l_n \right)\). I’ll call these the squares on the “\(n\)-th layer”. This has the crucial property that \[S \left( l_1l_2\ldots l_n \right) \subseteq S \left( l_1l_2\ldots l _{n-1}\right) \subseteq\cdots \subseteq S \left( l_1l_2 \right) \subseteq S \left( l_1 \right) \subseteq S \left( \varnothing \right),\] and that whenever the \(4^n\) squares “at the same level” are listed in lexicographical order, they trace out a contiguous path of squares.

Now define \(f_n : [0, 1]\to [0, 1]^2\) as follows: list out the \(4^n\) squares in lexicographical order, and let \(f_n \left( \frac{i - 1}{4^n - 1} \right)\) be the centre of the \(i\)-th square for any \(i = 1, \ldots, 4^n\). Extend \(f_n\) to \([0, 1]\) as a piecewise linear function — just connect the dots!
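The recursion is concrete enough to implement. Here’s a minimal sketch (the four quadrant maps below are one valid choice realising \(S(A), \ldots, S(D)\) as lower-left, upper-left, upper-right, lower-right); it lists the centres that \(f_n\) connects, and checks the key adjacency property: consecutive centres on the \(n\)-th layer are exactly one side length \(2^{-n}\) apart.

```python
def hilbert_centres(n):
    """Centres of the 4**n subsquares of [0, 1]^2, listed in Hilbert order."""
    pts = [(0.5, 0.5)]
    for _ in range(n):
        nxt = []
        # S(A...): shrink into the lower-left quadrant, reflected across y = x
        nxt += [(y / 2, x / 2) for x, y in pts]
        # S(B...): shrink into the upper-left quadrant, orientation kept
        nxt += [(x / 2, y / 2 + 0.5) for x, y in pts]
        # S(C...): shrink into the upper-right quadrant, orientation kept
        nxt += [(x / 2 + 0.5, y / 2 + 0.5) for x, y in pts]
        # S(D...): shrink into the lower-right quadrant, reflected across y = 1 - x
        nxt += [((1 - y) / 2 + 0.5, (1 - x) / 2) for x, y in pts]
        pts = nxt
    return pts

# Consecutive centres on the n-th layer are exactly one side length 2**-n apart:
# this is the adjacency property the uniform-Cauchy argument relies on.
for n in range(1, 6):
    pts = hilbert_centres(n)
    assert len(pts) == 4**n
    side = 2.0**-n
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        assert abs(((x1 - x0)**2 + (y1 - y0)**2)**0.5 - side) < 1e-12
```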

Clearly each \(f_n\) is continuous; we need to show that \(\left\lbrace f_n \right\rbrace\) is uniformly Cauchy. Let \(\epsilon > 0\). When \(n\) is sufficiently large, the squares of the \(n\)-th layer and beyond will have diagonals shorter than \(\epsilon \). Let \(n> m\) both be this large.

Let \(x\in [0, 1]\). Let \(i = 1,\ldots, 4^m\) be such that \(\frac{i-1}{4^m-1}\) minimises the distance to \(x\), so that \(f_m(x)\) lies in the \(i\)-th square of the \(m\)-th layer. \(f_n(x)\) lies in a smaller square from the \(n\)-th layer, contained in the very same square from the \(m\)-th layer. (This can be proven, but please don’t ask me to. I will cry.) Hence \(d\left( f_n(x), f_m(x) \right) < \epsilon\) whenever \(n, m\) are sufficiently large. Since \(x\) was arbitrary, \(\left\lbrace f_n \right\rbrace\) is uniformly Cauchy and admits a continuous limit \(f: [0, 1]\to [0, 1]^2\).

Finally, we need to verify that \(f\) is surjective. Let \(y\in [0, 1]^2\). There is a sequence of squares, arising from a sequence of letters \(\left\lbrace l_n \right\rbrace\), \[S \left( l_1 \right) \supseteq S \left( l_1l_2 \right) \supseteq S \left( l_1l_2l_3 \right)\supseteq S \left( l_1l_2l_3l_4 \right)\supseteq\cdots\] such that \(\bigcap _{n\in \mathbb{N}} S \left( l_1l_2\ldots l_n \right) = \left\lbrace y \right\rbrace\). The preimages \(f_n ^{-1} \left( S \left( l_1l_2\ldots l_n \right) \right)\) are nonempty and closed, and they satisfy the finite intersection property (another fact I will cry about if asked to prove), so by compactness of \([0, 1]\), there is at least one point \[x\in \bigcap _{n\in \mathbb{N}} f_n ^{-1} \left( S \left( l_1l_2\ldots l_n \right) \right).\] Then \(f_n(x) \in S \left( l_1l_2\ldots l_n \right)\) for every \(n\) by construction, hence \[f(x) = \lim _{n\to\infty} f_n(x) \in \bigcap _{n\in \mathbb{N}} S \left( l_1l_2l_3\ldots l_n \right) = \left\lbrace y \right\rbrace. \] \(\square\)

I promise this proof is more convincing if you create graphics showing how each layer is organised on the interval and on the square.