2. Root-Finding Algorithms
In class, you learned about several different methods of solving equations, either using a root-finding algorithm or a fixed point algorithm. These algorithms are essentially equivalent in the sense that they address the same kind of problem (numerically solving an equation), but they differ vastly in their applicability. Specifically, you should have learned about:
- Fixed point iteration.
- The bisection method (rootfinding) and the method of false position.
- Newton’s method and the secant method.
We’ll discuss the first and the third points, both from a theoretical and practical perspective, as well as some important concepts pertaining to rates and orders of convergence.
Inline Functions and Functions as Parameters in MATLAB
Something I neglected to talk about during our discussion last week was the concept of inline functions or anonymous functions. These functions are declared on a single line without all the crazy syntax of creating a function.
The syntax for declaring an anonymous function is:
FUNCTION_NAME = @( INPUTS ) OUTPUT;
You are only allowed to have one output, and it must fit on a single line. For instance, if I wanted to create a function that takes a number $x$ as input and returns the value $e^{3x - \sin(x)}$, I would write
f = @(x) exp(3 * x - sin(x));
You can sort of cheat with the “only one output” thing and return a vector or matrix, but this is an easy way to make mistakes if you’re not careful about the order of the entries!
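For instance, here's a (made-up) anonymous function that returns both roots of a quadratic $ax^2 + bx + c$ as a row vector; you have to remember yourself that the "plus" root comes first:

quadratic_roots = @(a, b, c) [(-b + sqrt(b^2 - 4*a*c)) / (2*a), (-b - sqrt(b^2 - 4*a*c)) / (2*a)];
quadratic_roots(1, -3, 2)   % returns [2, 1]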
You can have multiple inputs, and these would correspond to functions of many variables. For instance, if I wanted to define the function $g(x, y) = x^2 + \sin(xy)$, I could write
g = @(x, y) x^2 + sin(x * y);
Finally, you can pass functions as inputs to other functions (even other anonymous functions)! This is helpful when you want to run the same algorithm (e.g. fixed point iteration) with multiple different functions. For instance, the following function creates a graph of an input function on a given interval using a given number of points.
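Here is a minimal sketch of such a function (the name plot_function and the arguments a, b, and N are my own placeholder choices):

% sketch: plot a function handle g on the interval [a, b] using N sample points
function plot_function(g, a, b, N)
    x = linspace(a, b, N);   % N evenly spaced points between a and b
    y = g(x);                % note: g must be written with element-wise operators!
    plot(x, y);
end % of function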
Note that there are no outputs in this function, which is also something I neglected to mention last week. This is how MATLAB emulates void
functions from other languages!
Warning 1.
Be very careful about using * vs. .* (and likewise ^ vs. .^ and / vs. ./) when you define inline functions. It’s common to apply a function to a vector or matrix as in the above example, and using the wrong operator can cause your code to break.
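For example (a hypothetical pair of definitions, just to illustrate the difference):

h_elementwise = @(x) x.^2 ./ (1 + x);   % element-wise operators: works when x is a vector
h_matrix      = @(x) x^2 / (1 + x);     % matrix operators: fails for vector inputs

xs = linspace(0, 1, 5);
h_elementwise(xs)    % returns a row vector of 5 values
% h_matrix(xs)       % would throw an error, since x^2 means matrix multiplication here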
Fixed Point Iteration
Given an equation of the form $f(x) = 0$, we can equivalently solve $g(x) = x$ for a suitable function $g$ (for instance, $g(x) = x - f(x)$). In words, finding a root of $f$ is the same as finding a fixed point of $g$. One way to do this is to guess a solution $p_0$, then recursively define $p_n = g(p_{n-1})$ for $n \geq 1$.
Theorem 2.
If $g$ is continuously differentiable, $p$ is a fixed point of $g$, and $|g'(p)| < 1$, then $p_n$ converges to $p$ as $n \to \infty$ if $p_0$ is sufficiently close to $p$.
Let’s write some MATLAB code for this algorithm. For this discussion, we’ll be writing a function that takes (1) a function $g$ to iterate, (2) a starting guess $p_0$, and (3) some number $N$ of iterations to perform. We will return a row vector containing the sequence $p_1, p_2, \ldots, p_N$.
function sequence = fixed_point_iteration(g, p0, N)
    sequence = zeros(1, N);

    % compute p_1 and store it in the first index
    sequence(1, 1) = g(p0);

    for i = 2:N
        sequence(1, i) = g(sequence(1, i - 1));
    end % of for loop
end % of function
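As a quick (made-up) usage example, we can iterate $g(x) = \cos(x)$ starting from the guess $0.5$:

g = @(x) cos(x);                           % cos has an attracting fixed point near 0.7391
seq = fixed_point_iteration(g, 0.5, 30);   % row vector [p_1, ..., p_30]
seq(end)                                   % should be very close to 0.7391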
Something to point out is that this method is not guaranteed to work if your function is somewhat nasty.
First, consider the equation , which has the solution . Equivalently, we would be solving the equation . So, we could attempt to solve this by running fixed point iteration on the function .
Exercise 3.
Write MATLAB code that runs 50 iterations of the fixed point iteration method on the function with the following initial guesses:
- .
- .
- .
- .
What happens?
You should find that in all four cases, the sequence blows up to $\infty$. This makes us moderately depressed.
Of course, it’s better to run fixed point iteration on the equation . This does not have the convergence issues that the above example does. Although somewhat contrived, the point is that there are often many choices for which fixed point to look for, and this choice does matter a lot!
A sequence blowing up to $\infty$ is not the only way that fixed point iteration can fail.
Exercise 4.
Consider the equation Run 20 iterations of the fixed point iteration algorithm with and . What happens?
In both of these examples, the primary issue is that the derivative of $g$ at the fixed point is too big in absolute value. Formally, this phenomenon can be described by the following exercise:
Exercise 5. (Section 2.2, Exercise 19)
Let $g$ be continuously differentiable, and let $p$ be a fixed point of $g$. Suppose $|g'(p)| > 1$. Show that for any $p_0 \neq p$ sufficiently close to $p$, one has $|g(p_0) - p| > |p_0 - p|$. How does this show that fixed point iteration will generally fail under these assumptions?
Newton’s Method and the Secant Method
Let’s start with Newton’s method. Again, one wants to solve $f(x) = 0$, and we need a starting guess $p_0$. One can use the first-order Taylor approximation
$$f(x) \approx f(p_0) + f'(p_0)\,(x - p_0).$$
Since we want $f(x) = 0$, we end up with
$$x \approx p_0 - \frac{f(p_0)}{f'(p_0)}.$$
This is not an exact solution, but it will generally get us closer to the solution than where we started. Thus, the algorithm iterates this idea to construct the sequence
$$p_n = p_{n-1} - \frac{f(p_{n-1})}{f'(p_{n-1})}, \qquad n \geq 1.$$
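In the same spirit as fixed_point_iteration above, here is one possible MATLAB sketch of Newton's method (the name newtons_method and its signature are my own choices; the derivative is passed in as a second function handle):

function sequence = newtons_method(f, df, p0, N)
    % f  : function handle for f
    % df : function handle for f'
    % p0 : starting guess
    % N  : number of iterations to perform
    sequence = zeros(1, N);
    sequence(1, 1) = p0 - f(p0) / df(p0);

    for i = 2:N
        p = sequence(1, i - 1);
        sequence(1, i) = p - f(p) / df(p);
    end % of for loop
end % of function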
Unlike fixed point iteration, the barrier to convergence occurs when $f'$ vanishes (or is very small) close to the solution. In fact, this is more or less the only possible way convergence can fail:
Theorem 6.
If $f$ is $C^2$ (twice continuously differentiable), $f(p) = 0$, and $f'(p) \neq 0$, then $p_n \to p$ whenever $p_0$ is sufficiently close to $p$.
The weakness of this method is that one does not always know what $f'$ is in explicit form, and one may not be able to (efficiently) compute it. In this case, we would prefer to use the secant method. This requires two initial guesses $p_0$ and $p_1$, and it instead uses the recursive definition
$$p_n = p_{n-1} - f(p_{n-1})\,\frac{p_{n-1} - p_{n-2}}{f(p_{n-1}) - f(p_{n-2})}, \qquad n \geq 2.$$
It’s rather difficult to pin down the rate of convergence of this method; generally, it’s slightly poorer than Newton’s method.
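Similarly, here is one possible sketch of the secant method, written so that only one new function evaluation is needed per iteration (again, the name and signature are my own choices):

function sequence = secant_method(f, p0, p1, N)
    % two starting guesses p0 and p1; returns the next N iterates p_2, ..., p_{N+1}
    sequence = zeros(1, N);
    p_prev = p0;  f_prev = f(p0);
    p_curr = p1;  f_curr = f(p1);

    for i = 1:N
        p_next = p_curr - f_curr * (p_curr - p_prev) / (f_curr - f_prev);
        p_prev = p_curr;  f_prev = f_curr;     % shift the window forward
        p_curr = p_next;  f_curr = f(p_next);  % the only new evaluation of f
        sequence(1, i) = p_next;
    end % of for loop
end % of function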
Exercise 7. (Section 2.3, Exercise 30)
The iteration equation for the secant method can be expressed as
$$p_n = \frac{f(p_{n-1})\,p_{n-2} - f(p_{n-2})\,p_{n-1}}{f(p_{n-1}) - f(p_{n-2})}.$$
Explain why this iteration equation is likely to be less accurate than the one given above in general.
You may have heard in lecture or read in the textbook that the secant method only needs one function evaluation per iteration while Newton’s method requires one function evaluation and one derivative evaluation. This matters because in some scenarios, the function and/or its derivative can be extremely costly or difficult to compute. For instance:
- If $f$ represents some difficult-to-measure quantity, like the depth of a crater on Mars.
- If you have an expensive and/or limited resource and computing $f$ drains said resource, such as measuring how people’s blood pressure responds to an expensive pharmaceutical treatment.
- If $f$ is computationally difficult to compute, e.g. a function built out of a very large matrix and a row vector of very high dimension. (This may seem contrived, but such scales are not uncommon in many data-driven applications.)
You should keep these issues in mind when selecting which algorithm to use and when implementing these algorithms.
Order of Convergence and Big-O Notation
Our last topic today is order of convergence; closely related is big-O notation, which is a form of mathematical shorthand that more concisely conveys the statement, “$f$ grows no faster than $g$, up to a constant”.
Given two functions $f$ and $g$, we say $f(x)$ is $O(g(x))$ as $x \to \infty$ (pronounced “big-O of $g$”) if there exists some constant $C$ (independent of $x$) such that $|f(x)| \leq C\,|g(x)|$ for all $x$ large enough. Likewise, we say $f(x)$ is $O(g(x))$ as $x \to a$ for some fixed quantity $a$ if there exists some constant $C$ (independent of $x$) such that $|f(x)| \leq C\,|g(x)|$ for all $x$ sufficiently close to $a$.
Equivalently, $f(x)$ is $O(g(x))$ if the quantity $|f(x)| / |g(x)|$ stays bounded. Intuitively, this means that eventually, the graph of $f$ will end up below the graph of $g$ after rescaling.
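As a concrete sanity check (an example of my own choosing): take $f(x) = 3x^2 + 100x + 7$ and $g(x) = x^2$. For all $x \geq 1$,
$$|3x^2 + 100x + 7| \leq 3x^2 + 100x^2 + 7x^2 = 110\,x^2,$$
so $f(x)$ is $O(x^2)$ as $x \to \infty$, with constant $C = 110$.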
Big-O notation is amazing for making mathematical arguments more concise, as you avoid the hassle of carrying around a bunch of constants that you don’t care about. However, there are some common traps to avoid:
Exercise 8. Distributivity of Big-O over Division
Suppose $f_1(x)$ is $O(g_1(x))$ and $f_2(x)$ is $O(g_2(x))$. Assuming $f_2(x), g_2(x) \neq 0$ for any $x$, is it true that $\frac{f_1(x)}{f_2(x)}$ is $O\!\left(\frac{g_1(x)}{g_2(x)}\right)$? Prove your answer, or give a counterexample.
Exercise 9.
Suppose . Is ? Is ?
Exercise 10. Transitivity of Big-O
Suppose $f$, $g$, and $h$ are functions such that $f(x)$ is $O(g(x))$ and $g(x)$ is $O(h(x))$. Is it true that $f(x)$ is $O(h(x))$, or that $h(x)$ is $O(f(x))$? Prove it or give a counterexample.
Here is a chain of commonly seen big-O relationships that may be helpful. In each of the following, the big-O is taken as $x \to \infty$.
- For any real number $p > 0$, $\log(x)$ is $O(x^p)$.
- Whenever $p \leq q$, $x^p$ is $O(x^q)$.
- For any real number $p$ and any real number $a > 1$, $x^p$ is $O(a^x)$.
- Whenever $1 < a \leq b$, $a^x$ is $O(b^x)$.
- For any real number $a > 0$, $a^x$ is $O(x!)$.
One way to describe this hierarchy is as follows:
$$\log(x) \;\ll\; x^p \;\ll\; a^x \;\ll\; x! \qquad (p > 0,\ a > 1),$$
where $f \ll g$ means that $f(x)$ is $O(g(x))$ as $x \to \infty$ (but not the other way around).
Warning: this only applies when $x \to \infty$! These relationships are no longer true if $x$ approaches something else (particularly, $x \to 0$).
Now let $(p_n)$ be a sequence converging to $p$. Recall that the rate of convergence of the sequence is a function $f(n)$ such that $|p_n - p| = O(f(n))$ as $n \to \infty$. Typically $f(n) \to 0$ in order for this to be meaningful. For instance, a sequence satisfying $|p_n - p| \leq \frac{5}{n}$ for all $n$ converges to $p$ at a rate of $O(1/n)$.
It’s often not helpful to analyse the rate of convergence of rootfinding algorithms, as this requires advance knowledge of the limit point/solution that’s often unavailable. Instead, one analyses how quickly the sequence $|p_n - p|$ decays from one term to the next.
Definition 11.
A sequence $(p_n)$ converging to $p$ has order of convergence $\alpha$ if
$$\lim_{n \to \infty} \frac{|p_{n+1} - p|}{|p_n - p|^{\alpha}} = \lambda$$
for some constant $0 < \lambda < \infty$.
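As one concrete example (again of my own choosing): the sequence $p_n = 2^{-2^n}$ converges to $0$ with order $\alpha = 2$, since
$$\frac{|p_{n+1} - 0|}{|p_n - 0|^{2}} = \frac{2^{-2^{n+1}}}{\bigl(2^{-2^n}\bigr)^{2}} = 1$$
for every $n$.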
Typically $\alpha$ will be a whole number. One of the most important tools in analysing the rate or the order of convergence of an algorithm (such as the ones we’ve described today) is Taylor’s formula:
Theorem 12. Taylor's Formula
Suppose $f$ is an $(n+1)$-times continuously differentiable function. Then, for any fixed choice of $x_0$, one has
$$f(x) = \sum_{k=0}^{n} \frac{f^{(k)}(x_0)}{k!}\,(x - x_0)^k + \frac{f^{(n+1)}(\xi)}{(n+1)!}\,(x - x_0)^{n+1}$$
for some $\xi$ between $x_0$ and $x$.
You should think about using this whenever you want to relate some sum or difference of a function’s values to its derivatives.
Exercise 13.
Let . Show that the sequence converges to at a rate of .
Exercise 14.
Determine the order of convergence of the bisection method.
Exercise 15.
Let $f$ be a twice continuously differentiable function, and suppose $p$ is a root of $f$. Show that Newton’s method has a quadratic ($\alpha = 2$) order of convergence if $f'(p) \neq 0$.
Exercise 16. (Section 2.4, Exercise 13)
Let a smooth function, and let be a root of . Let Show that for sufficiently close to , the sequence defined by converges to with order .
You may use without proof that .