3. Analysing Order of Convergence and Steffensen's Method
Last week, we talked about the order of convergence of a sequence, usually one produced by some iterative algorithm. Let’s revisit this in more detail today, talk about some strategies (i.e. Taylor’s theorem) for approaching related problems, and discuss accelerating convergence.
Before that, I have some comments about the first homework:
Question 3.1 concerned different forms of the quadratic formula and asked which pair of solutions was the most precise. Many people (42%) selected the wrong choice. As a rule of thumb, adding two nearly equal numbers of opposite signs (equivalently, subtracting two nearly equal numbers) incurs a large relative error, and subsequent multiplications and divisions preserve or even amplify that relative error rather than repairing it. One must therefore select the form of the formula that avoids this kind of cancellation.
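As a quick illustration (a made-up example, not the homework problem itself), consider \(x^2 + 10^8 x + 1 = 0\), whose small-magnitude root is very close to \(-10^{-8}\). One form of the quadratic formula cancels catastrophically; the rationalised form does not:

a = 1; b = 1e8; c = 1;
d = sqrt(b ^ 2 - 4 * a * c);
% Standard formula: -b + d adds two nearly equal numbers of opposite signs,
% so most of the significant digits cancel away.
root_naive = (-b + d) / (2 * a)
% Rationalised formula: no cancellation here.
root_stable = (2 * c) / (-b - d)

Running this, root_naive lands noticeably far from \(-10^{-8}\), while root_stable is essentially exact.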
Question 3.2 asked for the smallest \(n\) such that \(\left\lvert 4\sum _{i=1}^{n} \frac{\left( -1 \right) ^{i+1}}{2i-1} - \pi \right\rvert < 10 ^{-3}\), where \(n=100, 200, 1000, 2000\). The answer was \(n=1000\), though this at first glance does not seem right. Here’s some sample code you might whip up:
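A minimal version might look like the following sketch, which accumulates the partial sum in a variable called sum to match the check discussed below (note that this shadows MATLAB’s built-in sum function, which is fine for a quick experiment but not great style):

n = 1000;
sum = 0;  % partial sum of the series; shadows the built-in sum
for i = 1:n
    sum = sum + (-1) ^ (i + 1) / (2 * i - 1);
end
abs(4 * sum - pi)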
The output of this code is 1.0000e-03. However, one must be careful with rounding in the output: the true value of this absolute difference is just barely smaller than 1.0000e-03, and it gets rounded up when displayed in MATLAB. Instead, one should use the line

abs(4 * sum - pi) < 1e-03

This gives an output of 1 (as a logical value). We still gave credit if you selected \(n=2000\), but be aware of this for the future!
Revisiting Order of Convergence
Recall the definition of the order of convergence of a sequence:
Definition 1.
If \(\left\lbrace p_n \right\rbrace\) is a sequence converging to \(p\), we say the sequence has order of convergence \(\alpha \) if the quantity \[\frac{\left\lvert p_n-p \right\rvert}{\left\lvert p _{n-1}-p \right\rvert ^{\alpha }}\] is bounded as \(n\to\infty\). If \(\lambda = \lim _{n\to\infty} \frac{\left\lvert p_n-p \right\rvert}{\left\lvert p _{n-1}-p \right\rvert ^{\alpha }}\) exists, \(\lambda \) is called the asymptotic error constant.
Most often, \(\alpha \) will be a positive integer. Another way of checking that a sequence has order of convergence \(\alpha \) is by checking whether \(\left\lvert p_n-p \right\rvert = O\left( \left\lvert p _{n-1}-p \right\rvert ^{\alpha } \right)\), though this hides the asymptotic error constant.
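For a quick sanity check, the sequence \(p_n = 2^{-2^n}\) converges to \(0\) with order of convergence \(2\): indeed, \(\left\lvert p_n - 0 \right\rvert = 2^{-2^n} = \left( 2^{-2^{n-1}} \right)^2 = \left\lvert p _{n-1} - 0 \right\rvert^2\), so the ratio in the definition is identically \(1\) (and \(\lambda = 1\)).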
Proposition 2.
Suppose \(f\) is an \(m+1\)-times continuously differentiable function with a fixed point \(p_*\). Suppose \(f ^{(i)}\left( p_* \right) = 0\) for each \(i=1,\ldots, m\). Fix any \(p_0\), and define \(p _{n+1} = f\left( p_n \right)\). Suppose \(p_n\to p_*\). Then \(\left\lbrace p_n \right\rbrace\) has order of convergence \(m+1\) and asymptotic error constant \(\frac{\left\lvert f ^{\left( m+1 \right)}\left( p_* \right)\right\rvert}{\left( m+1 \right)!}\).
Try proving this yourself first!
Proof
You should know how this proof works! This general outline applies to a lot of problems of this flavour, wherein you’re given some sort of iteration scheme to analyse together with some assumptions on the input function. Look for vanishing derivatives, as this allows you to spam Taylor’s theorem.
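If you want to check your outline against something concrete, here is a quick sketch of the standard argument. Since \(f\left( p_* \right) = p_*\) and \(f^{(i)}\left( p_* \right) = 0\) for \(i = 1, \ldots, m\), Taylor’s theorem gives, for some \(\xi_n\) between \(p_n\) and \(p_*\), \[p _{n+1} - p_* = f\left( p_n \right) - f\left( p_* \right) = \frac{f^{\left( m+1 \right)}\left( \xi_n \right)}{\left( m+1 \right)!}\left( p_n - p_* \right)^{m+1}.\] Dividing by \(\left\lvert p_n - p_* \right\rvert^{m+1}\), then using \(p_n \to p_*\) and the continuity of \(f^{\left( m+1 \right)}\), yields \[\lim _{n\to\infty} \frac{\left\lvert p _{n+1} - p_* \right\rvert}{\left\lvert p_n - p_* \right\rvert^{m+1}} = \frac{\left\lvert f^{\left( m+1 \right)}\left( p_* \right) \right\rvert}{\left( m+1 \right)!}.\]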
This process is exactly how one shows that Newton’s method has quadratic order of convergence! If \(f\left( p_* \right)=0\) and \(f’\left( p_* \right)\neq 0\), the function \(g(x) = x - \frac{f\left( x \right)}{f’\left( x \right)}\) has a fixed point at \(p_*\) and satisfies \(g’\left( p_* \right)=0\).
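To spell out the last claim: assuming \(f\) is smooth enough for these manipulations, the quotient rule gives \[g'\left( x \right) = 1 - \frac{f'\left( x \right)^2 - f\left( x \right)f''\left( x \right)}{f'\left( x \right)^2} = \frac{f\left( x \right)f''\left( x \right)}{f'\left( x \right)^2},\] which vanishes at \(p_*\) because \(f\left( p_* \right) = 0\) and \(f'\left( p_* \right) \neq 0\). The proposition (with \(m = 1\)) then explains the quadratic convergence.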
Aitken’s \(\Delta ^2\) Method and Steffensen’s Method
Linear convergence can be excruciatingly slow, especially in scenarios where one needs a significant amount of precision. This is noticeable when using fixed point iteration on a pathological function: solving the equation \(1.3\cos x = x\) using fixed point iteration with an initial guess of \(x=1.4\) (chosen more or less randomly) takes \(1104\) iterations to find an \(x\) satisfying \(\left\lvert 1.3\cos x - x \right\rvert < 10 ^{-10}\). In contrast, Newton’s method only takes \(4\) iterations.
Aitken’s \(\Delta ^2\) method is a procedure that transforms a linearly converging sequence into a faster-than-linearly converging sequence. The setup is as follows: suppose one has a sequence \(\left\lbrace p_n \right\rbrace\) converging to a limit \(p_*\) with linear order of convergence and asymptotic error constant \(\lambda < 1\).
Heuristically, when \(n\) is sufficiently large, one has that \(\frac{p _{n+1}-p_*}{p_n-p_*}\) is approximately constant. So, upon setting \[\frac{p _{n+2}-p_*}{p _{n+1}-p_*} \approx \frac{p _{n+1}-p_*}{p _{n}-p_*}\] and pretending the signs don’t flip around, one can solve for \(p_*\) to obtain \[p_* \approx p_n - \frac{\left( p _{n+1}-p_n \right)^2}{p _{n+2}-2p _{n+1}+p_n}.\] This motivates the modified sequence \[\hat p_n = p_n - \frac{\left( p _{n+1}-p_n \right)^2}{p _{n+2}-2p _{n+1}+p_n},\] which can be shown to converge to \(p_*\) as well. In fact, one has \(\lim _{n\to\infty} \frac{\hat p_n -p_*}{p_n-p_*} = 0\) as long as the assumptions above are fulfilled! In the unlikely case that the denominator is zero, we’ll set \(\hat p_n = p_n\).
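For concreteness, here is a minimal MATLAB sketch of this transformation (the function name aitken and its interface are my own choices for illustration): it takes a vector containing \(p_1, \ldots, p_N\) and returns the accelerated values \(\hat p_1, \ldots, \hat p _{N-2}\).

function [p_hat] = aitken(p)
    % p is a row vector of iterates; p_hat(n) is computed from
    % p(n), p(n + 1), and p(n + 2).
    N = length(p);
    p_hat = zeros(1, N - 2);
    for n = 1:(N - 2)
        denominator = (p(n + 2) - p(n + 1)) - (p(n + 1) - p(n));
        if denominator == 0
            p_hat(n) = p(n);  % the fallback described above
        else
            p_hat(n) = p(n) - (p(n + 1) - p(n)) ^ 2 / denominator;
        end
    end
end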
Remark 3.
I find it very difficult to remember the formula directly (and I rather dislike the \(\Delta \) notation that’s somewhat common). I personally find it easier to remember the heuristic, and it helps me avoid small mistakes on exams!
Again, the point is that Aitken’s modified sequence converges quantifiably faster than the original sequence \(\left\lbrace p_n \right\rbrace\), as long as \(\left\lbrace p_n \right\rbrace\) converges linearly and \(\lambda < 1\). When \(\left\lbrace p_n \right\rbrace\) converges faster than linearly, or when \(\lambda \geq 1\), the speed of convergence does not necessarily improve meaningfully.
One simple example is as follows: \(p_n = \frac{1}{n}\) converges to \(0\) linearly with \(\lambda = 1\). After a couple of minutes of grinding teeth, Aitken’s \(\Delta ^2\) method produces \(\hat p_n = \frac{1}{2(n+1)}\). In this case, \(\lim _{n\to\infty} \frac{\hat p_n -p_*}{p_n-p_*} = \frac{1}{2}\). This is okay, but in practice this isn’t really meaningful. In fact, the additional pain and suffering of performing Aitken’s transformation likely outweighs this slight improvement.
As stated, Aitken’s method requires an entire sequence \(\left\lbrace p_n \right\rbrace\) as an input before producing an accelerated sequence. It would not make sense to run a slowly converging algorithm for an hour before accelerating it, and Steffensen’s method addresses this practical issue.
Steffensen’s method is what happens when you apply Aitken’s method to fixed point iteration, and it uses the fact that to compute \(\hat p_1\), one only needs to know \(p_1,p_2,p_3\). So, one can run three iterations of the slowly converging algorithm, then compute \(\hat p_1\) using Aitken’s method and pray that this will be closer to the limit than any of \(p_1,p_2,p_3\) are. Then, one uses \(\hat p_1\) to compute two more terms, and we can repeat this game.
For notation, we often write \(p _{1}^{(0)}, p _{2}^{(0)}, p _{3}^{(0)}\) for the first three terms we ever compute. Then, according to Aitken’s method, we have \[p _{1}^{(1)} = p _{1}^{(0)} - \frac{\left( p _{2}^{(0)} - p _{1}^{(0)} \right)^2}{p _{3}^{(0)}-2 p _{2}^{(0)}+ p _{1}^{(0)}},\] and we use \(p _{1}^{(1)}\) to compute \(p _{2}^{(1)}\) and \(p _{3}^{(1)}\). Afterwards, we use these latter three terms to compute \(p _{1}^{(2)}\), and we repeat. The index in the superscript counts the number of times we apply Aitken’s acceleration. Pictorially, \[ \begin{align*} & \underbrace{p _{1}^{(0)}\to p _{2}^{(0)}\to p _{3}^{(0)}} \\ & \hphantom{p _{0}^{(1)}\to} \underbrace{p _{1}^{(1)} \to p _{2}^{(1) }\to p _{3}^{(1)}} \\ & \hphantom{p _{0}^{(1)}\to} \hphantom{p _{0}^{(1)}\to} \underbrace{p _{1}^{(2)} \to p _{2}^{(2) }\to p _{3}^{(2)}} \\ & \hphantom{p _{0}^{(1)}\to} \hphantom{p _{0}^{(1)}\to} \hphantom{p _{0}^{(1)}\to} \underbrace{p _{1}^{(3)} \to p _{2}^{(3) }\to p _{3}^{(3)}} \\ & \hphantom{p _{0}^{(1)}\to} \hphantom{p _{0}^{(1)}\to} \hphantom{p _{0}^{(1)}\to} \hphantom{p _{0}^{(1)}\to} \cdots \end{align*} \] Conventionally, the sequence produced by Steffensen’s method is just \(p _{1}^{(0)}, p _{1}^{(1)}, p _{1}^{(2)}, \ldots\) (i.e., only the first term of each row, forming the “lower-left diagonal” in the picture above).
Something to note is that even if \(\lambda = 1\), it’s still possible for Steffensen’s method or the Aitken \(\Delta ^2\) method to accelerate convergence in the way we described earlier. For instance, consider using fixed point iteration to solve the equation \(g(x) = x\). (With \(f(x) = g(x) - x\), this is the same thing as solving \(f(x) = 0\). )
One begins with some initial guess \(p_0\), followed by \(p_1 = g\left( p_0 \right)\) and \(p_2 = g\left( p_1 \right)=g\left( g\left( p_0 \right) \right)\). Putting these into Aitken’s method yields the first term \[\hat p_0 = p_0 - \frac{\left( g\left( p_0 \right)-p_0 \right)^2}{g\left( g\left( p_0 \right) \right) - 2 g\left( p_0 \right) + p_0}.\] We can use the approximation \(g\left( g\left( p_0 \right) \right) - g\left( p_0 \right) \approx g’\left( p_0 \right)\left( g\left( p_0 \right)-p_0 \right)\), which comes from a first-order Taylor expansion of \(g\) centred at \(p_0\) and evaluated at \(g\left( p_0 \right)\). Then, we may express
\[\begin{align*} g\left( g\left( p_0 \right) \right) - 2g\left( p_0 \right) + p_0 &= g\left( g\left( p_0 \right) \right) - g\left( p_0 \right) - \left( g\left( p_0 \right) - p_0 \right) \\ & \approx g’\left( p_0 \right) \left( g\left( p_0 \right) - p_0 \right) - \left( g\left( p_0 \right)-p_0 \right) \\ &= \left(g’\left( p_0 \right) - 1\right) \left( g\left( p_0 \right)-p_0 \right). \end{align*}\] Using this in our earlier expression for \(\hat p_0\) yields \[\hat p_0\approx p_0 - \frac{g\left( p_0 \right)-p_0}{g’\left( p_0 \right)-1} = p_0 - \frac{f\left( p_0 \right)}{f’\left( p_0 \right)}.\] Hey, look at that! Steffensen’s method loosely recovers Newton’s method when applied to fixed point iteration, and it usually produces a much faster order of convergence in comparison.
Implementation Details!
So far, we’ve been talking a lot of theory; let’s switch gears for the remainder and look at some implementation details. Our goal will be to create a graphical representation of several different algorithms to compare them in MATLAB; however, to do so, we’ll need a way to plot multiple datasets on the same axes and distinguish between them.
Let’s begin with something much less involved: suppose I want to compare the graphs of \(f_1(x) = \sin(x)\) and \(f_2(x) = \cos(x) + \exp \left( \sin(x) \right)\) as \(x\) ranges over the interval \(\left[ 0, 2\pi \right]\). Let’s start by creating a set of points to plot.
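Something along the following lines works (the number of sample points here is an arbitrary choice):

x = linspace(0, 2 * pi, 1000);  % 1000 evenly spaced points in [0, 2*pi]
f1 = sin(x);
f2 = cos(x) + exp(sin(x));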
If one were to run plot(x, f1); followed by plot(x, f2);, we would only see the second graph: by default, each new call to plot replaces whatever is currently on the axes. To change this behaviour, simply run the command hold on; first. This tells MATLAB to “hold on” to the existing plots instead of discarding them.
To get a legend going, we’ll use the legend command. It takes in a 1-dimensional vector as an input with labels for each dataset, listed in the order in which they were graphed.

legend(["sin(x)", "cos(x) + exp(sin(x))"]);
Warning 4. Single Quotes in Arrays
This is one of the scenarios where using single vs. double quotes makes a difference in MATLAB. When putting strings in arrays, always use double quotes! MATLAB interprets single-quoted text as an array of individual characters, and it combines arrays of character arrays by concatenating them end-to-end into one long character vector rather than keeping them as separate labels.
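As a quick experiment in the command window, compare the following two expressions:

["sin(x)", "cos(x) + exp(sin(x))"]   % a 1x2 string array: two separate labels
['sin(x)', 'cos(x) + exp(sin(x))']   % one long character vector: the labels are glued together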
Note that you can mix and match different types of plots, e.g. line plots and scatter plots. If you no longer wish to plot things on the same axes, simply type hold off;.
Warning 5. Holding in a Script
If you set hold on from within one script, this setting will persist even after the script terminates. In particular, if you run a second script without closing the graph from the first script, any plots created in the latter will be added onto the graphs of the first! I would generally recommend setting hold off at the end of each script when applicable to avoid these headaches.
Sorry about the continuous MATLAB info dumps! I’m learning a lot of new things about MATLAB along with you guys.
Anyways, back to the task at hand: let’s try to numerically solve the equation \(1.3\cos x = x\) using
- fixed point iteration,
- Steffensen’s method, and
- Newton’s method,
each with the starting guess \(p_0 = 1\). Let’s run 100 iterations of each, then create scatter plots of the points \(\left( n, \left\lvert p_n-p_* \right\rvert \right)\) using \(p_* = 0.8540326135361767\).
You might have code from a previous homework assignment (or a current homework assignment) that implements one or more of these algorithms; if that’s the case, feel free to reuse it! Actually, please do reuse it.
For each of these methods, I’ll produce a row vector containing the first 100 elements of the sequences produced by each algorithm; their inputs will vary slightly.
Let’s start with fixed point iteration. g is the function for which we are solving \(g(x) = x\), p0 is our initial guess, and N is the number of iterates to compute.
function [iterates] = fixed_point_iteration(g, p0, N)
    iterates = zeros(1, N);
    iterates(1, 1) = p0;
    for i = 2:N
        iterates(1, i) = g(iterates(1, i - 1));
    end % of for loop
end % of fixed_point_iteration
Steffensen’s method will be a bit more involved. The pseudocode is as follows:
1. Initialise the output iterates to a \(1\times N\) row vector of zeroes. Set its first entry to p0.
2. For each i = 2, ..., N, repeat steps 3-5.
3. Set p0 to the (i - 1)-th entry of iterates.
4. Set p1 = g(p0) and p2 = g(p1).
5. Compute the i-th entry of iterates according to the formula \(p_0 - \frac{\left( p_1-p_0 \right)^2}{p_2 - 2p_1+p_0}\). If the denominator is zero, set the entry to p2 instead.
In MATLAB code, this function would be as follows:
function [iterates] = steffensen(g, p0, N)
    % step 1 - setting things up
    iterates = zeros(1, N);
    iterates(1, 1) = p0;

    % step 2 - for loop
    for i = 2:N
        % steps 3 and 4 - setting up computation
        p0 = iterates(1, i - 1);
        p1 = g(p0);
        p2 = g(p1);

        % step 5 - computing the next iterate
        % We compute the denominator this way to retain accuracy!
        denominator = (p2 - p1) - (p1 - p0);
        if denominator == 0
            iterates(1, i) = p2;
        else
            iterates(1, i) = p0 - (p1 - p0) ^ 2 / denominator;
        end % of if statement
    end % of for loop
end % of steffensen
Last up is Newton’s method. Try to implement this yourself before seeing the code!
Code for Newton's Method
For this function, we need to take the inputs f, the function that we’re solving \(f(x) = 0\) for, and fp, the derivative of f.
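Here is one possible implementation, written as a sketch in the same style as the two functions above and matching the call newton(f, fp, 1, 100) used in the plotting code below; your own homework version works just as well.

function [iterates] = newton(f, fp, p0, N)
    iterates = zeros(1, N);
    iterates(1, 1) = p0;
    for i = 2:N
        p = iterates(1, i - 1);
        % one Newton step: p - f(p) / f'(p)
        iterates(1, i) = p - f(p) / fp(p);
    end % of for loop
end % of newton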
Now let’s put everything together and plot them! Try doing this yourself as well before peeking at the code.
Code for plotting
p_star = 0.8540326135361767;

% fixed point iteration and Steffensen's
g = @(x) 1.3 * cos(x);
fixed_point_iterates = abs(fixed_point_iteration(g, 1, 100) - p_star);
steff_iterates = abs(steffensen(g, 1, 100) - p_star);

% Newton's method
f = @(x) 1.3 * cos(x) - x;
fp = @(x) -1.3 * sin(x) - 1;
newton_iterates = abs(newton(f, fp, 1, 100) - p_star);

% plotting everything as a scatter plot!
x = 1:100;

hold on;
scatter(x, fixed_point_iterates);
scatter(x, steff_iterates);
scatter(x, newton_iterates);
legend(["Fixed Point Iteration", ...
        "Steffensen's Method", ...
        "Newton's Method"]);
hold off;
One last thing to say: interpreting this graph is quite difficult because of the scale. It’s generally better to use a logarithmic scale on the \(y\)-axis to get a better visual sense of how things are changing. To do this, simply use the command yscale('log'). (You can use xscale to adjust the \(x\)-axis as well.) To go back to a linear scale, use yscale('linear').
Exercise 6.
Perform 100 iterations of the secant method using the initial guesses \(p_0 = 0\) and \(p_1 = 1\). How does it compare to the other three methods above?
Exercise 7.
Repeat the above example, instead solving the equation \(e^x-x-1=0\) with the four methods above. In addition, implement a modified Steffensen’s method that uses Newton’s method to compute \(p _{1}^{(i)}\) and \(p _{2}^{(i)}\) instead of fixed point iteration. What’s the difference?