3. Weierstrass' Approximation Theorem and Convolutions
Oftentimes, when we are handling sequences of functions on the real line, it’s helpful to work with a dense subset of \(\mathbb{R}\) (such as \(\mathbb{Q}\)) and use some delicate estimates to extend statements to all of \(\mathbb{R}\). Arzelà-Ascoli, for instance, uses the diagonalisation argument on a countable dense subset to get a uniformly converging subsequence.
Quite frequently, we’ll be studying more complicated spaces than \(\mathbb{R}\), and a very common example is the space of continuous functions on a compact metric space. It can be very, very challenging to describe dense subsets of these spaces.
Consider, for instance, \(C\left( \left[ 0,1 \right] \right)\). This is separable (fortunately), and you have shown in the past that piecewise linear functions are a dense subset of this space (with respect to the uniform metric). But is this really a good dense subset to work with? These functions aren’t even continuously differentiable, so tools from calculus like the mean value theorem and the fundamental theorem of calculus become inaccessible!
We’ll see later today that smooth functions and polynomials are actually dense in this space — these form an extremely nice class of functions, and Weierstrass’ proof gives a very nice picture for why this should be true. However, this proof does not generalise well beyond compact subsets of \(\mathbb{R}^n\); Stone’s proof, on the other hand, allows for a far more general description of the dense subsets of spaces of continuous functions.
Theorem 1. Stone-Weierstrass
Let \(X\) be a compact metric space, and denote by \(C\left( X,\mathbb{R} \right)\) the set of continuous functions \(f:X\to \mathbb{R}\). Let \(A\subseteq C\left( X,\mathbb{R} \right)\) satisfy the following:
- \(A\) is an \(\mathbb{R}\)-subalgebra of \(C\left( X, \mathbb{R} \right)\) (it is an \(\mathbb{R}\)-vector subspace that is closed under pointwise multiplication of functions).
- \(A\) contains a nonzero constant function.
- \(A\) separates points — for every \(x\neq y\) in \(X\), there is some \(f\in A\) such that \(f(x)\neq f(y)\).

Then \(A\) is dense in \(C\left( X, \mathbb{R} \right)\) with respect to the uniform metric.
Today, however, we’ll be aiming for something a lot less ambitious, though the proof is much more down-to-earth:
Theorem 2. Weierstrass Approximation
Polynomials are dense in \(C\left( \left[ 0,1 \right] \right)\) with respect to the uniform metric.
We won’t aim for a complete proof of this theorem; we’ll give an outline of the main steps. If you’re interested in the details, however, you can find Weierstrass’ full proof here.
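Before we dive into the convolution machinery, it can be reassuring to see the theorem in action numerically. Here is a minimal sketch (assuming numpy is available; the helper name bernstein and the test function are my own choices) using the classical Bernstein polynomials \(B_n(f)(x) = \sum_{k=0}^{n} f\left( \frac{k}{n} \right) \binom{n}{k} x^k (1-x)^{n-k}\), which come from Bernstein’s proof of the theorem rather than from the convolution argument below. The sup-norm error should shrink, slowly, as the degree grows.

```python
import numpy as np
from math import comb

def bernstein(f, n, x):
    """Evaluate the degree-n Bernstein polynomial of f at the points x."""
    k = np.arange(n + 1)
    c = np.array([comb(n, j) for j in k], dtype=float)
    # B_n(f)(x) = sum_k f(k/n) * C(n,k) * x^k * (1-x)^(n-k)
    return (c * x[:, None]**k * (1 - x[:, None])**(n - k)) @ f(k / n)

f  = lambda x: np.abs(x - 0.5)   # continuous on [0,1], but not differentiable at 1/2
xs = np.linspace(0.0, 1.0, 1001)
for n in (5, 20, 80, 320):
    err = np.max(np.abs(bernstein(f, n, xs) - f(xs)))
    print(f"degree {n:3d}:  sup |B_n f - f| ~ {err:.4f}")
```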
Convolution
The main tool for the proof will be convolutions.
Definition 3.
Let \(f, g : \mathbb{R} \to \mathbb{R}\) be two Riemann integrable functions. The convolution of \(f\) and \(g\), denoted \(f*g\), is given by \[f*g(t) = \int _{\mathbb{R}} f(x) g(t-x) dx.\]
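To make the definition concrete, here is a minimal numerical sketch (assuming numpy; the truncation of the integral to \(\left[ -10, 10 \right]\) and the function names are my own choices) that approximates the integral by a Riemann sum on a fine grid.

```python
import numpy as np

def convolve_riemann(f, g, ts, lo=-10.0, hi=10.0, n=20001):
    """Approximate (f*g)(t), the integral of f(x) g(t-x) dx, by a Riemann sum on [lo, hi]."""
    xs, dx = np.linspace(lo, hi, n, retstep=True)
    fx = f(xs)
    # one Riemann sum per requested value of t
    return np.array([np.sum(fx * g(t - xs)) * dx for t in ts])

# Example: convolve a step function with a narrow Gaussian bump.
step = lambda x: (x > 0).astype(float)
bump = lambda x: np.exp(-np.pi * x**2)   # integrates to 1 over the real line
ts = np.linspace(-3.0, 3.0, 7)
print(convolve_riemann(step, bump, ts))  # a smoothed-out version of the step
```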
You should think of a convolution as a continuous version of a weighted average. 3Blue1Brown has a great video on the topic if you’d like to develop some intuition. An analogy I heard once was that a convolution is like when two functions get married and have a kid, and the two parents pass on their best genes to their convolved baby. A very important example of this is:
Lemma 4.
Let \(f\) be bounded and uniformly continuous. Let \(g\) be a smooth function on \(\mathbb{R}\) such that \(g\) and all of its derivatives are absolutely integrable. Then \(f*g\) is a smooth function on \(\mathbb{R}\).
We won’t prove this lemma, but it can be demonstrated straight from the definitions. In fact, the derivative just falls on the \(g\): \[\frac{d}{dt} f*g(t) = f*g'(t).\]
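We can at least sanity-check this identity numerically. Below is a rough sketch (assuming numpy; the grids and the choices \(f(x) = \left\lvert \sin x \right\rvert\) and \(g(x) = e^{-\pi x^2}\) are mine): we differentiate a Riemann-sum approximation of \(f*g\) with finite differences and compare it against the same Riemann sum applied to \(f*g'\).

```python
import numpy as np

xs, dx = np.linspace(-15.0, 15.0, 30001, retstep=True)

def conv(f, g, ts):
    """Riemann-sum approximation of (f*g)(t), truncating the integral to the grid xs."""
    return np.array([np.sum(f(xs) * g(t - xs)) * dx for t in ts])

f  = lambda x: np.abs(np.sin(x))                         # bounded and uniformly continuous, not smooth
g  = lambda x: np.exp(-np.pi * x**2)                     # smooth bump
gp = lambda x: -2.0 * np.pi * x * np.exp(-np.pi * x**2)  # g'

ts, dt = np.linspace(-2.0, 2.0, 801, retstep=True)
lhs = np.gradient(conv(f, g, ts), dt)    # finite-difference derivative of f*g
rhs = conv(f, gp, ts)                    # f * g'
print(np.max(np.abs(lhs - rhs)[1:-1]))   # should be small, limited only by the two grids
```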
We should imagine that the \(g\) in the lemma is a smooth “bump function”, i.e. that it looks like a little hump on the real line. Again thinking of \(f*g\) as a weighted average of \(f\), \(f*g(t)\) ends up resembling the “average value of \(f\) near \(t\)”. Under the right conditions, as the support of \(g\) shrinks to a smaller and smaller interval, this weighted average gets closer and closer to the true value of \(f\).
More formally, we have:
Lemma 5.
Let \(\phi (x) =\exp\left( -\pi x^2 \right)\), and for \(h >0\) define \(\phi _ h (x) = \frac{1}{h} \phi \left( \frac{x}{h} \right)\). Let \(f: \mathbb{R} \to \mathbb{R}\) be a bounded and uniformly continuous function. Then \(f*\phi _ h\to f\) uniformly on \(\mathbb{R}\) as \(h\to 0\).
Let’s prove it! First, we’ll remark that for all \(h>0\), \[\int _{\mathbb{R}}\phi _h(x) dx = 1.\] You can do a change of variables \(u=\frac{x}{h}\) to see this!
Fix \(\epsilon > 0\), and let \(M\) be a constant such that \(\left\lvert f(x) \right\rvert \leq M\) for all \(x\in \mathbb{R}\). There is a \(\delta >0\) such that \(\left\lvert x-y \right\rvert < \delta \) implies \(\left\lvert f(x) - f(y) \right\rvert< \epsilon\), by uniform continuity.
Let \(R>0\) be so large that \(\int _{-R}^{R} \phi (x) dx > 1-\epsilon.\) The limit of this quantity as \(R\to\infty\) is \(1\), so this is possible! In particular, the integral \(\int _{\left\lvert x \right\rvert > R} \phi (x) dx < \epsilon .\)
Consider the difference \(f*\phi _h(t) - f(t)\). Since \(\phi _h\) integrates to \(1\), we may write \(f(t) = \int _{\mathbb{R}}f(t) \phi _h(x) dx\). Expanding the definition of the convolution, we get \[f*\phi _h(t) - f(t) = \int _\mathbb{R} \phi _h(x) \left[ f(t-x) - f(t) \right] dx.\] Now perform the substitution \(u=\frac{x}{h}\), or \(x=hu\). This becomes \[f*\phi _h(t) - f(t) = \int _{\mathbb{R}}\phi (u) \left[ f\left( t-hu \right)- f(t) \right] du.\] By the triangle inequality, and after splitting up the integral into two pieces, we get \[\left\lvert f*\phi _h(t) - f(t) \right\rvert\leq \int _{-R}^{R}\phi (u) \left\lvert f\left( t-hu \right)-f(t) \right\rvert du + \int _{\left\lvert u \right\rvert > R} \phi (u) \left\lvert f\left( t-hu \right)-f(t) \right\rvert du.\] When \(h\) is so small that \(Rh < \delta\), we have \(\left\lvert f(t-hu)-f(t) \right\rvert < \epsilon \) for all \(-R \leq u \leq R\), by the uniform continuity from earlier. On the other hand, we always have \(\left\lvert f(t-hu)-f(t) \right\rvert\leq 2M\) for any \(\left\lvert u \right\rvert> R\). Thus, we may bound \[\left\lvert f*\phi _h(t) - f(t) \right\rvert \leq \epsilon \int _{-R}^{R}\phi (u) du + 2M \int _{\left\lvert u \right\rvert > R} \phi (u) du < \epsilon + 2M \epsilon.\] The first integral is bounded by \(1\) since \(\phi\) is nonnegative and integrates to \(1\); the second integral is less than \(\epsilon \) by the choice of \(R\)! This bound holds for every \(t\in \mathbb{R}\) as soon as \(h < \frac{\delta}{R}\), so the convergence is uniform and the claim follows. \(\square\)
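If you would like to see the lemma numerically, here is a sketch in the same spirit as the earlier snippets (again assuming numpy, a truncated grid, and my own choice \(f(x) = \left\lvert \sin x \right\rvert\), which is bounded and uniformly continuous but has corners): the sup-norm distance between \(f*\phi_h\) and \(f\), sampled on a grid of \(t\) values, should shrink as \(h\to 0\).

```python
import numpy as np

xs, dx = np.linspace(-20.0, 20.0, 40001, retstep=True)
f   = lambda x: np.abs(np.sin(x))     # bounded and uniformly continuous
phi = lambda x: np.exp(-np.pi * x**2)

ts = np.linspace(-5.0, 5.0, 501)
for h in (1.0, 0.3, 0.1, 0.03):
    phi_h = lambda x, h=h: phi(x / h) / h
    conv  = np.array([np.sum(f(xs) * phi_h(t - xs)) * dx for t in ts])
    print(f"h = {h:4.2f}:  sup_t |f*phi_h(t) - f(t)| ~ {np.max(np.abs(conv - f(ts))):.4f}")
```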
As a consequence of this somewhat lengthy computation,
Theorem 6.
Polynomials are dense in \(C\left( \left[ 0,1 \right] \right)\) with respect to the uniform metric.
We shall not prove this theorem, but let’s outline the steps.
- Start with some \(f\in C \left( \left[ 0,1 \right] \right)\), and extend it continuously to all of \(\mathbb{R}\) so that the extension is zero outside a bounded interval containing \(\left[ 0,1 \right]\). This extension, if chosen correctly, will be bounded and uniformly continuous.
- Let \(\epsilon >0\), and let \(h\) be so small that \(f*\phi _h\) uniformly approximates \(f\) within \(\epsilon\) on \(\mathbb{R}\).
- Take a Taylor polynomial approximation of \(\phi _h\), say \(P\). Pick the degree of the Taylor polynomial so that \(\left\lvert P(u) - \phi _h(u) \right\rvert < \epsilon\) for all \(u\) in a bounded interval large enough to contain every difference \(t-x\) with \(t \in \left[ 0,1 \right]\) and \(x\) in the support of the extended \(f\).
- Bound \(\left\lvert f*\phi _h(t) - f * P(t) \right\rvert\) by a multiple of \(\epsilon \) that depends only on the maximum of \(\left\lvert f \right\rvert\) and the length of the support of the extension.
- Demonstrate that \(f*P\) is still a polynomial (the identity spelled out below makes this precise) and conclude that \(f*P\) uniformly approximates \(f\) within a constant multiple of \(\epsilon \).
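For the last step, it may help to spell out why \(f*P\) is a polynomial; this identity is implicit in the outline above. Writing \(P(u) = \sum_{k=0}^{d} a_k u^k\) and expanding \(\left( t-x \right)^k\) with the binomial theorem, \[f*P(t) = \int _{\mathbb{R}} f(x) P(t-x) dx = \sum_{k=0}^{d} a_k \sum_{j=0}^{k} \binom{k}{j} (-1)^{k-j} \left( \int _{\mathbb{R}} f(x) x^{k-j} dx \right) t^j,\] which is a polynomial in \(t\) of degree at most \(d\). Its coefficients are built out of finitely many moments \(\int _{\mathbb{R}} f(x) x^{i} dx\) of the extended \(f\), and these integrals are finite because the extension has bounded support.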
What’s most important is the idea that you can (smoothly) approximate functions using convolutions! This is a technique that is ubiquitous in analysis, and I guarantee that you’ll see some form of convolutions show up no matter what kind of analysis you study. Even algebraists have to deal with convolutions sometimes (though they do things a bit differently…).
Let’s look at an example of how we can apply Stone-Weierstrass.
Example 7.
Let \(f : \left[ 0,1 \right] \to \mathbb{R}\) be a continuous function such that \[\int _{0}^{1}f(x) x^n dx = 0\] for all \(n\geq 0\). Show that \(f(x)=0\) for all \(x\).
The idea behind this proof might be somewhat surprising if you’ve never seen it before. Ultimately, the argument we’ll be making is based on the following fact:
Exercise 8.
Let \(X\) and \(Y\) be two metric spaces, and let \(g, h : X\to Y\) be continuous functions. Let \(D\) be a dense subset of \(X\), and suppose \(g(x) = h(x)\) for all \(x\in D\). Then \(g(x)=h(x)\) for all \(x\in X\).
The idea now is to define the function \(I : C\left( \left[ 0,1 \right] \right)\to \mathbb{R}\) via \[I(\phi ) = \int _{0}^{1} f(x) \phi (x) dx\] for any \(\phi \in C \left( \left[ 0,1 \right] \right)\). This is continuous with respect to the uniform metric! Let \(C = \int _{0}^{1} \left\lvert f(x) \right\rvert dx\). If \(C=0\), we’re done, so suppose \(C\neq 0\). Whenever \(d_\infty(\phi, \psi ) < \frac{\epsilon }{C}\), we have \[\left\lvert I(\phi )-I( \psi ) \right\rvert \leq \int _{0}^{1} \left\lvert f(x) \right\rvert\left\lvert \phi (x) - \psi (x) \right\rvert dx < \frac{\epsilon }{C}\int _{0}^{1} \left\lvert f(x) \right\rvert dx = \epsilon,\] so \(I\) is continuous. But notice that \(I(p)=0\) whenever \(p\) is a polynomial, since \(I\) is linear and \(I(x^n) = 0\) for every \(n\geq 0\). By Stone-Weierstrass, \(I\) vanishes on a dense subset of \(C\left( \left[ 0,1 \right] \right)\); since \(I\) and the zero function are both continuous, Exercise 8 tells us that \(I=0\) on all of \(C \left( \left[ 0,1 \right] \right)\). In particular, \[I(f) = \int _{0}^{1}f(x)^2 dx = 0,\] which is impossible unless \(f\) itself is \(0\), since \(f^2\) is continuous and nonnegative. \(\square\)
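To see the continuity step of this argument numerically, here is a rough sketch (assuming numpy; the test function and the Bernstein helper, reused from the sketch near the start of this section, are my own choices, and this \(f\) of course does not have vanishing moments): along any sequence of polynomials converging uniformly to \(f\), the values of \(I\) converge to \(I(f) = \int_{0}^{1} f(x)^2 dx\). In the example above, those values are all \(0\), which is exactly what forces \(\int_{0}^{1} f(x)^2 dx = 0\).

```python
import numpy as np
from math import comb

def bernstein(f, n, x):
    """Degree-n Bernstein polynomial of f, evaluated at the points x."""
    k = np.arange(n + 1)
    c = np.array([comb(n, j) for j in k], dtype=float)
    return (c * x[:, None]**k * (1 - x[:, None])**(n - k)) @ f(k / n)

xs, dx = np.linspace(0.0, 1.0, 2001, retstep=True)
f = lambda x: np.abs(x - 0.5)

I = lambda g_vals: np.sum(f(xs) * g_vals) * dx   # I(g), approximated by a Riemann sum on [0, 1]
print("I(f)        ~", I(f(xs)))                 # the integral of f^2, about 1/12 for this f
for n in (5, 20, 80, 320):
    print(f"I(B_{n:<3d} f) ~ {I(bernstein(f, n, xs)):.6f}")
```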