
Divination via Linear Algebra

Date Written: April 24, 2025; Last Modified: April 24, 2025

When I taught numerical methods last summer, one of the central problems was: consider an unknown function \(f\), and suppose one knows that \(f\left( x_0 \right) = y_0\), \(f\left( x_1 \right) = y_1\), \(\ldots\), \(f\left( x_n \right)= y_n\). Using this data, can you estimate \(f(x)\)? Can you estimate \(f’(x)\)? Can you estimate \(\int _{a}^{b} f(x) dx\)?

Let’s begin with the first problem, often called

Interpolation.

An easy result is: given \(n+1\) points \(\left( x_0, y_0 \right), \ldots, \left( x_n,y_n \right)\), there exists a polynomial \(P\) of degree at most \(n\) such that \(P\left( x_i \right) = y_i\) for all \(i\). The proof is: \[P(x) = \sum _{i=0}^{n} y_i \prod _{k\neq i} \frac{x - x_k}{x_i - x_k}.\] Plugging in \(x = x_i\) zeroes out everything except for the term with \(y_i\), in which the product reduces to \(1\). (Of course one needs to ensure that the data are nondegenerate, e.g. \(x_i\neq x_j\) for \(i\neq j\), but please be reasonable with me.)
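If you’d like to see it in action, here’s a minimal sketch in Python of evaluating the formula exactly as written; the function name and the choice of nodes are mine, not anything canonical.

```python
import numpy as np

def lagrange_eval(xs, ys, x):
    """Evaluate the interpolating polynomial through (xs[i], ys[i]) at x,
    straight from the Lagrange formula."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for k, xk in enumerate(xs):
            if k != i:
                term *= (x - xk) / (xi - xk)   # the product over k != i
        total += term
    return total

# Interpolate sin at four nodes, then evaluate off the nodes.
xs = [0.0, 0.5, 1.0, 1.5]
print(lagrange_eval(xs, np.sin(xs), 0.75), np.sin(0.75))   # these should be close
```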

In practice, this formula is pretty terrible because (a) it’s ugly and (b) it’s a disgrace to its family. So one can squeeze this formula with an extraordinary amount of cleverness, factoring and rearranging it into a more computationally friendly form, one which can be evaluated efficiently by algorithms taught in such a class. These are taught to underlings as, for instance, Neville’s algorithm and divided differences.

In addition to being ugly, the formula above is proven somewhat serendipitously. One can instead formulate the polynomial interpolation problem from the framework of linear algebra and systematically construct the same solution.

Fix \(x_0,\ldots, x_n\). Let \(\mathcal{P}_n\) be the space of real polynomials of degree at most \(n\). Then \(\dim \mathcal{P}_n = n+1\), hence it’s isomorphic to its dual space \(\mathcal{P}_n^*\); in particular, \(\dim \mathcal{P} _{n}^{*} = n+1\).

Let \(\operatorname{ev} _{i}\in \mathcal{P} _{n}^{*}\) denote the linear functional \(\mathcal{P}_n \ni P \mapsto P\left( x_i \right)\). Assuming the \(x_i\)’s are pairwise distinct, these functionals form a basis of \(\mathcal{P} _{n}^{*}\). In particular, for any \(x\), one may define the functional \(\operatorname{ev}_x \in \mathcal{P} _{n}^{*}\) via \(P\mapsto P(x)\) and write the linear combination \[\operatorname{ev}_x = c_0(x) \operatorname{ev}_0 + \cdots + c_n(x) \operatorname{ev}_n.\]

So the interpolation problem reduces to determining these coefficients, and to do so we can test either side against a “test basis” \(b_0, \ldots, b_n\) of \(\mathcal{P}_n\). Specifically, after applying each side of the above to \(b_0,\ldots, b_n\), one has the matrix equation \[ \begin{bmatrix} b_0(x) \\ \vdots\\ b_n(x) \end{bmatrix} = \begin{bmatrix} b_0 \left( x_0 \right) & \cdots & b_0\left( x_n \right) \\ \vdots & & \vdots \\ b_n \left( x_0 \right) & \cdots & b_n \left( x_n \right)\end{bmatrix} \begin{bmatrix} c_0(x) \\ \vdots \\ c_n(x) \end{bmatrix}. \] Inverting the big matrix produces the coefficients we so desire and hence solves the interpolation problem. In particular, selecting the natural basis \(b_i(x) = x^i\) of \(\mathcal{P}_n\) gives us a Vandermonde matrix; inverting it and doing the tedious computations yields exactly \(c_i(x) = \prod _{k\neq i} \frac{x - x_k}{ x_i - x_k }\), consistent with our serendipitous construction at the beginning.

When we’re prescribed the points \(\left( x_0, y_0 \right),\ldots, \left( x_n, y_n \right)\), we have \[P(x) = \operatorname{ev}_x(P) = \sum _{i=0}^{n} c_i(x) \operatorname{ev}_i(P) = \sum _{i=0}^{n} c_i(x) y_i.\] In particular, we’re interested in computing the matrix product \[\begin{bmatrix} y_0 & \cdots & y_n \end{bmatrix} \begin{bmatrix} c_0(x) \\ \vdots \\ c_n(x) \end{bmatrix} = \begin{bmatrix} y_0 & \cdots & y_n \end{bmatrix} \begin{bmatrix} b_0\left( x_0 \right) & \cdots & b_0\left( x_n \right)\\ \vdots && \vdots \\ b_n\left( x_0 \right) & \cdots & b_n\left( x_n \right) \end{bmatrix} ^{-1} \begin{bmatrix} b_0(x) \\ \vdots \\ b_n(x) \end{bmatrix}. \]
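To make the recipe concrete, here’s a minimal sketch in Python, solving the system numerically with the monomial basis rather than inverting anything symbolically. The function name and nodes are mine, and Vandermonde systems can be badly conditioned for many nodes, so this is illustrative rather than production-grade.

```python
import numpy as np

def interp_eval(xs, ys, x):
    """Evaluate the interpolating polynomial at x by solving the test-basis
    system with the monomial basis b_i(t) = t^i (a Vandermonde matrix)."""
    xs = np.asarray(xs, dtype=float)
    n = len(xs)
    B = np.array([xs ** i for i in range(n)])   # B[i, j] = b_i(x_j)
    bx = np.array([x ** i for i in range(n)])   # left-hand side: b_i(x)
    c = np.linalg.solve(B, bx)                  # the coefficients c_i(x)
    return np.dot(ys, c)                        # [y_0 ... y_n] times the c vector

xs = [0.0, 0.5, 1.0, 1.5]
print(interp_eval(xs, np.sin(xs), 0.75), np.sin(0.75))   # these should nearly agree
```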

First, sticking with the standard basis of \(\mathcal{P}_n\) and doing some magic with block matrices allows us to factor the rightmost product and write \[P(x) = \begin{bmatrix} y_0 & \cdots & y_n \end{bmatrix} M_n M_{n-1} \cdots M_1,\] where each \(M_i\) is a bidiagonal \((i+1)\times i\) matrix. This recovers Neville’s algorithm, and the subproducts contain interpolations through contiguous subsets of the input data.
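I won’t reproduce the block-matrix magic here, but for reference, here’s a minimal sketch of Neville’s algorithm in its usual textbook recurrence (just stated, not derived from the factorisation above); after the \(m\)-th pass, the array holds the interpolants through every \(m+1\) consecutive nodes, evaluated at \(x\).

```python
import numpy as np

def neville(xs, ys, x):
    """Neville's algorithm: each pass combines interpolants through consecutive
    nodes pairwise, ending with the full interpolant evaluated at x."""
    p = list(ys)
    n = len(xs)
    for m in range(1, n):
        for i in range(n - m):
            # Before this update, p[i] interpolates nodes i..i+m-1 and
            # p[i+1] interpolates nodes i+1..i+m.
            p[i] = ((x - xs[i]) * p[i + 1] - (x - xs[i + m]) * p[i]) / (xs[i + m] - xs[i])
    return p[0]

xs = [0.0, 0.5, 1.0, 1.5]
print(neville(xs, np.sin(xs), 0.75), np.sin(0.75))
```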

We also have a great degree of freedom in selecting the basis \(b_0,\ldots, b_n\). Selecting the basis \(b_i(x) = \prod _{k < i} \left( x - x_k \right)\) makes the big matrix triangular (upper, lower, does it matter?), in which case forward or backward substitution gives us a relatively straightforward algorithm for computing \[\begin{bmatrix} b_0\left( x_0 \right) & \cdots & b_0\left( x_n \right) \\ \vdots && \vdots \\ b_n \left( x_0 \right) & \cdots & b_n \left( x_n \right) \end{bmatrix} ^{-1} \begin{bmatrix} b_0(x) \\ \vdots \\ b_n(x) \end{bmatrix}.\] This recovers the divided differencing algorithm.
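Likewise, here’s a minimal sketch of divided differencing in its usual form: the coefficients it builds are precisely the coordinates of the interpolant in the basis \(b_i\) above, and the final loop evaluates the result by nesting.

```python
import numpy as np

def newton_eval(xs, ys, x):
    """Divided differences: compute the coefficients of the interpolant in the
    basis b_i(t) = prod_{k<i} (t - x_k), then evaluate by Horner-style nesting."""
    n = len(xs)
    coef = list(ys)
    for j in range(1, n):
        for i in range(n - 1, j - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - j])
    # Evaluate c_0 + (x - x_0)(c_1 + (x - x_1)(c_2 + ...)).
    result = coef[-1]
    for i in range(n - 2, -1, -1):
        result = result * (x - xs[i]) + coef[i]
    return result

xs = [0.0, 0.5, 1.0, 1.5]
print(newton_eval(xs, np.sin(xs), 0.75), np.sin(0.75))
```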

An advantage of this perspective and its machinery is its generalisability to other polynomial interpolation problems. Consider instead the problem of finding a polynomial \(P\) (now of degree at most \(2n+1\)) so that \(P\left( x_i \right) = y_i\) and \(P’ \left( x_i \right) = y_i’\) for all \(i\). I’m not smart enough to observe a formula that just works, but I can apply this linear-algebraic argument and eventually arrive at a solution. (This is known as Hermite interpolation.)
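Here’s a minimal sketch of what that looks like, assuming the standard setup: one works in \(\mathcal{P}_{2n+1}\), and the evaluation and derivative-evaluation functionals at distinct nodes form a basis of its dual. The function name and the test basis \(b_j(t) = t^j\) are my choices.

```python
import numpy as np

def hermite_eval(xs, ys, dys, x):
    """Evaluate the Hermite interpolant (degree <= 2n+1) matching the values ys
    and derivatives dys at the nodes xs, via the same test-basis recipe."""
    xs = np.asarray(xs, dtype=float)
    m = 2 * len(xs)                              # dimension of P_{2n+1}
    A = np.zeros((m, m))
    rhs = np.zeros(m)
    for j in range(m):                           # test basis b_j(t) = t^j
        A[j, :len(xs)] = xs ** j                                # ev_{x_i}(b_j) = x_i^j
        A[j, len(xs):] = j * xs ** (j - 1) if j > 0 else 0.0    # D_{x_i}(b_j) = j x_i^(j-1)
        rhs[j] = x ** j                                         # ev_x(b_j) = x^j
    coeffs = np.linalg.solve(A, rhs)             # (c_0, ..., c_n, e_0, ..., e_n)
    return coeffs[:len(xs)] @ ys + coeffs[len(xs):] @ dys

xs = np.array([0.0, 1.0])
print(hermite_eval(xs, np.sin(xs), np.cos(xs), 0.5), np.sin(0.5))   # reasonably close
```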

So far, I have provided nothing of value to humanity. In fact, Wikipedia’s page on divided differencing explains exactly this idea of cleverly selecting a basis to make this matrix inversion computationally efficient, and the pages on polynomial and Hermite interpolation also describe the linear-algebraic formulation of the problem.

But there’s nothing special about the functional \(\operatorname{ev}_x\), and we can extend this idea to the problem of

Numerical Differentiation.

Given \(n+1\) data points \(\left( x_0, y_0 \right),\ldots, \left( x_n, y_n \right) \) and some unknown function \(f\) passing through these points, what’s the best way to estimate \(f’(x)\)?

Conventionally, one is taught to take the Taylor expansion of \(f\) and play around with it for a while. Specifically, writing \[f(t) = f(x) + f’(x) \left( t-x \right) + \cdots + \frac{f ^{(n)}(x)}{n!}(t-x)^n + \mathcal{O}\left( (t-x) ^{n+1} \right),\] one can make the guess that \(f’(x) \approx \sum _{j=0}^{n}d_j(x) f\left( x_j \right)\) for the right choice of constants \(d_0(x), \ldots, d_n(x)\). Pushing this all through the Taylor expansion and collecting like terms, one gets \[\sum _{j=0}^{n} d_j(x) f\left( x_j \right) \approx \sum _{k=0}^{n} \frac{f ^{(k)}(x)}{k!} \sum _{j=0}^{n}d_j(x) \left( x_j - x \right) ^{k},\] plus some (hopefully) negligible error. Setting the inner sum to \(0\) when \(k\neq 1\) and equal to \(1\) when \(k=1\), we then obtain a system of \(n+1\) equations for \(n+1\) unknowns.

Like the problem of interpolation, this is a serendipitous move that’s quite tedious to perform by hand. And like the problem of interpolation, the machinery of linear algebra presciently laughs at the futile flailing of us humans below.

The setup is very similar: we’re given the information of the \(n+1\) linear functionals \(\operatorname{ev}_0,\ldots, \operatorname{ev}_n \in \mathcal{P} _{n}^{*}\), which form a basis. The functional \(D_x\in \mathcal{P}_n^*\) sending \(P\mapsto P’(x)\) (i.e. differentiation at \(x\)) is thus expressible as \(D_x = d_0(x) \operatorname{ev}_0 + \cdots + d_n(x) \operatorname{ev}_n\), and upon selecting a basis \(\left\lbrace b_0, \ldots, b_n \right\rbrace\) of \(\mathcal{P}_n\), we may set up the exact same matrix equation as before: \[\begin{bmatrix} b_0’(x) \\ \vdots \\ b_n’(x) \end{bmatrix} = \begin{bmatrix} b_0\left( x_0 \right) & \cdots & b_0 \left( x_n \right)\\ \vdots && \vdots \\ b_n \left( x_0 \right) & \cdots & b_n \left( x_n \right) \end{bmatrix} \begin{bmatrix} d_0(x) \\ \vdots \\ d_n(x) \end{bmatrix}.\] Picking the basis \(b_j(t) = \left( t-x \right)^j\) recovers exactly the same system of equations that we derived from the Taylor expansion.
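Concretely, here’s a minimal sketch of computing the weights \(d_j(x)\) this way; with equally spaced nodes it reproduces the familiar central difference formula. The function name and node choices are mine.

```python
import numpy as np

def diff_weights(xs, x):
    """Weights d_j with f'(x) ≈ sum_j d_j f(x_j), found by solving the system
    above with the basis b_j(t) = (t - x)^j."""
    xs = np.asarray(xs, dtype=float)
    n = len(xs)
    A = np.array([(xs - x) ** j for j in range(n)])          # row j: b_j at each node
    rhs = np.array([1.0 if j == 1 else 0.0 for j in range(n)])  # b_j'(x)
    return np.linalg.solve(A, rhs)

# Three equally spaced nodes around x give the central difference weights
# (-1/(2h), 0, 1/(2h)).
h = 0.1
print(diff_weights([-h, 0.0, h], 0.0))   # ≈ [-5., 0., 5.]
```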

Finally, we come to the example of

Numerical Integration.

Given the same data of \(n+1\) points and a function \(f\) passing through them, we instead want to estimate \(\int _{a}^{b} f(x) dx\). The “obvious” thing to do is to fit a degree \(n\) interpolating polynomial through these points (thereby approximating \(f(x)\)), then to integrate the interpolating polynomial and read off the dependence on the data \(\left( x_0,y_0 \right),\ldots, \left( x_n, y_n \right)\). One again ends up with an approximation \[\int _{a}^{b} f(x) dx \approx i_0 f\left( x_0 \right) + \cdots + i_n f\left( x_n \right)\] for some good choice of coefficients \(i_0,\ldots, i_n\) (which, unlike before, don’t depend on any evaluation point \(x\)). The machinery of linear algebra once again gives us exactly the same setup as we had for the previous two examples. Define \(I\in \mathcal{P}_n^*\) as the functional \(P\mapsto \int _{a}^{b}P(x) dx\), pick a basis \(b_0,\ldots, b_n\) of \(\mathcal{P}_n\), then solve the system \[\begin{bmatrix} I\left( b_0 \right) \\ \vdots \\ I \left( b_n \right) \end{bmatrix} = \begin{bmatrix} b_0\left( x_0 \right) & \cdots & b_0 \left( x_n \right)\\ \vdots && \vdots \\ b_n \left( x_0 \right) & \cdots & b_n \left( x_n \right) \end{bmatrix} \begin{bmatrix} i_0 \\ \vdots \\ i_n \end{bmatrix}.\]
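And here’s a minimal sketch of the corresponding computation for the quadrature weights, again with the monomial basis; with three equally spaced nodes it recovers Simpson’s rule. As before, the function name is mine.

```python
import numpy as np

def quad_weights(xs, a, b):
    """Weights i_j with ∫_a^b f ≈ sum_j i_j f(x_j), found by solving the system
    above with b_j(t) = t^j, so that I(b_j) = (b^(j+1) - a^(j+1)) / (j+1)."""
    xs = np.asarray(xs, dtype=float)
    n = len(xs)
    A = np.array([xs ** j for j in range(n)])                           # b_j(x_i)
    rhs = np.array([(b ** (j + 1) - a ** (j + 1)) / (j + 1) for j in range(n)])
    return np.linalg.solve(A, rhs)

# Three equally spaced nodes on [0, 1] give Simpson's rule: weights 1/6, 4/6, 1/6.
print(quad_weights([0.0, 0.5, 1.0], 0.0, 1.0))   # ≈ [0.1667, 0.6667, 0.1667]
```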

I think what’s surprising about these last two examples is that there are no consequential choices or educated guesses to make when using the machinery of linear algebra; all one does is pick some basis, and one will always end up with the same coefficients. Contrast this with the choices one must make in its absence.

When performing numerical differentiation, the Taylor polynomial gives a good local approximation of \(f\), and we’ve found a way to use our \(n+1\) data points to differentiate the Taylor polynomial itself. Likewise, with numerical quadrature, the interpolating polynomial is a good global approximation of \(f\), hence integrating it will approximately integrate \(f\). We’d expect flipping these choices around — using the interpolating polynomial for differentiation and Taylor polynomial for integration (however that works) — to fail spectacularly.

Except it doesn’t.

In the example of numerical differentiation, if \(P\) is the interpolating polynomial passing through \(\left( x_0, y_0 \right),\ldots, \left( x_n,y_n \right)\), we would not expect \(P’(x)\) to match our “best guess” for \(f’(x)\), especially because of pathologies like Runge’s phenomenon. But since \(P\) and \(f\) pass through the same \(n+1\) points, our best guess for \(f’(x)\) and our best guess for \(P’(x)\) ought to match, and due to the magic of linear algebra, we have enough information to determine — not just guess — the value of \(P’(x)\).

Likewise, one can perform the following profoundly inefficient algorithm to estimate \(\int _{a}^{b} f(x) dx\):

  1. Select a point \(x\in \left[ a, b \right]\).
  2. Use interpolation and numerical differentiation to estimate \(f(x), f’(x), \ldots, f ^{(n)}(x)\).
  3. Integrate the Taylor polynomial of \(f\) centred at \(x\).

Anyone who does this should be thrown off a cliff and violently stampeded by wildebeest. But despite being dead, they’ll still have the satisfaction of having arrived at the same answer you did, regardless of which point they centre their Taylor polynomial at, even if it’s far, far away from the interval \(\left[ a, b \right]\).

This is because if \(P\) is now the interpolating polynomial through these points, our best guesses for \(\int _{a}^{b}P(x) dx\) and \(\int _{a}^{b}f(x) dx\) ought to coincide. But we know enough information about \(P(x)\) to determine all of its values and its derivatives exactly at any point, and moreover, \(P\) coincides with its Taylor polynomial centred at any point.
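For anyone who wants to tempt the wildebeest, here’s a minimal sketch of the procedure above in Python; the centre of the Taylor polynomial genuinely does not matter, up to floating-point roundoff. The function name and the example nodes are mine.

```python
import numpy as np
from math import factorial

def taylor_quadrature(xs, ys, a, b, center):
    """Estimate the integral of f over [a, b] by estimating f, f', f'', ... at
    `center` from the data, then integrating the resulting Taylor polynomial."""
    xs = np.asarray(xs, dtype=float)
    n = len(xs)
    A = np.array([(xs - center) ** j for j in range(n)])   # basis b_j(t) = (t - center)^j
    derivs = []
    for k in range(n):
        rhs = np.zeros(n)
        rhs[k] = factorial(k)          # k-th derivative of b_k at center; the other b_j give 0
        derivs.append(np.linalg.solve(A, rhs) @ ys)        # estimate of f^(k)(center)
    # Integrate sum_k derivs[k] / k! * (t - center)^k over [a, b].
    return sum(
        derivs[k] / factorial(k)
        * ((b - center) ** (k + 1) - (a - center) ** (k + 1)) / (k + 1)
        for k in range(n)
    )

xs = np.array([0.0, 0.5, 1.0, 1.5])
ys = np.sin(xs)
print(taylor_quadrature(xs, ys, 0.0, 1.5, 0.75))   # centre inside [a, b]
print(taylor_quadrature(xs, ys, 0.0, 1.5, 10.0))   # centre far outside; same answer, up to roundoff
```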

And so, above all this, linear algebra laughs at all our ridiculous attempts, knowing the whole time that no amount of cleverness or stupidity can overcome the fact that there is one and only one way of divining one dimension of information from finite-dimensional data.