
2. Root-Finding Algorithms


In class, you learned about several different methods of solving equations, either using a root-finding algorithm or a fixed point algorithm. These algorithms are essentially equivalent in the sense that they address the same kind of problem (numerically solving an equation), but they differ vastly in their applicability. Specifically, you should have learned about:

  1. Fixed point iteration,
  2. The bisection method, and
  3. Newton's method and the secant method.

We’ll discuss the first and the third points, both from a theoretical and practical perspective, as well as some important concepts pertaining to rates and orders of convergence.

Inline Functions and Functions as Parameters in MATLAB

Something I neglected to talk about during our discussion last week was the concept of inline functions or anonymous functions. These functions are declared on a single line without all the crazy syntax of creating a function.

The syntax for declaring an anonymous function is:

FUNCTION_NAME = @( INPUTS ) OUTPUT; 

You are only allowed to have one output, and it must fit on a single line. For instance, if I wanted to create a function $f$ that takes a number $x$ as input and returns the value $e^{3x - \sin x}$, I would write

f = @(x) exp(3 * x - sin(x)); 

You can sort of cheat with the “only one output” thing and return a vector or matrix, but this is an easy way to make mistakes if you’re not careful about the order of the entries!

You can have multiple inputs, and these would correspond to functions of many variables. For instance, if I wanted to define the function $g(x, y) = x^2 + \sin(xy)$, I could write

g = @(x, y) x^2 + sin(x * y); 

Finally, you can pass functions as inputs to other functions (even other anonymous functions)! This is helpful when you want to run the same algorithm (e.g. fixed point iteration) with multiple different functions. For instance, the following function creates a graph of the input $f$ on the domain $\left[ x_1, x_2 \right]$ using $N$ points.

function plot_graph(f, x1, x2, N)
    x = linspace(x1, x2, N);
    y = f(x);
    plot(x, y);
end

Note that there are no outputs in this function, which is also something I neglected to mention last week. This is how MATLAB emulates void functions from other languages!

Warning 1.

Be very careful around using * vs. .* (likewise, ^ vs. .^ and / vs. ./) when you define inline functions. It’s common to apply a function to a vector or matrix as in the above example, and using the wrong operator can cause your code to break.
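For instance, here's a minimal illustration of the difference (the names f_bad and f_good are just for this sketch):

f_bad  = @(x) x^2;    % matrix power: errors when x is a row vector
f_good = @(x) x.^2;   % elementwise power: works on scalars and vectors alike

x = linspace(0, 1, 5);
% f_bad(x) throws an error, since x^2 tries to compute the matrix product x * x
y = f_good(x);        % returns [0, 0.0625, 0.25, 0.5625, 1]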

Fixed Point Iteration

Given an equation of the form $f(x) = 0$, we can equivalently solve $g(x) = f(x) + x = x$. In words, finding a root of $f(x)$ is the same as finding a fixed point of $f(x) + x$. One way to do this is to guess a solution $p_0$, then recursively define $p_{n+1} = g\left( p_n \right)$ for $n \geq 0$.

Theorem 2.

If $g$ is continuously differentiable, $x_*$ is a fixed point of $g$, and $\lvert g'\left( x_* \right) \rvert < 1$, then $p_n$ converges to $x_*$ as $n \to \infty$ if $p_0$ is sufficiently close to $x_*$.

Let's write some MATLAB code for this algorithm. For this discussion, we'll be writing a function that takes (1) a function $g$ to iterate, (2) a starting guess $p_0$, and (3) some number of iterations $N$ to perform. We will return a $1 \times N$ row vector containing the sequence $p_1, \ldots, p_N$.

function sequence = fixed_point_iteration(g, p0, N)
    sequence = zeros(1, N);

    % compute p_1 and store it in the first index
    sequence(1, 1) = g(p0);

    for i = 2:N
        sequence(1, i) = g(sequence(1, i - 1));
    end % of for loop
end % of function
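
As a quick illustration of how you might call this (the starting guess and iteration count below are arbitrary choices):

g = @(x) cos(x);                        % iterate the cosine function
p = fixed_point_iteration(g, 0.5, 20);  % 20 iterates starting from p_0 = 0.5
disp(p(end))                            % settles near the fixed point of cos, roughly 0.739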

Something to point out is that this method is not guaranteed to work if your function is somewhat nasty.

First, consider the equation $\cos x = -x$, which has the solution $x_* = -0.7390851332\ldots$. Equivalently, we would be solving the equation $\cos x + x = 0$. So, we could attempt to solve this by running fixed point iteration on the function $g(x) = \cos x + 2x$.

Exercise 3.

Write MATLAB code that runs 50 iterations of the fixed point iteration method on the function $g(x) = \cos x + 2x$ with the following initial guesses:

  1. $p_0 = 0$.
  2. $p_0 = 1$.
  3. $p_0 = -0.7$.
  4. $p_0 = -0.73909$.

What happens?

You should find that in all four cases, the sequence $p_n$ blows up to $\pm\infty$. This makes us moderately depressed.

Of course, it's better to run fixed point iteration on the equation $-\cos x = x$. This does not have the convergence issues that the above example does. Although somewhat contrived, the point is that there are often many choices for which fixed point to look for, and this choice does matter a lot!

A sequence blowing up to $\infty$ is not the only way that fixed point iteration can fail.

Exercise 4.

Consider the equation $$-1.47665369898x^2 + 1.66944251811x + 1.41421356237 = 0.$$ Run 20 iterations of the fixed point iteration algorithm with $g(x) = -1.47665369898x^2 + 2.66944251811x + 1.41421356237$ and $p_0 = 0$. What happens?

In both of these examples, the primary issue is that the derivative at the fixed point is too large in absolute value. Formally, this phenomenon can be described by the following exercise:

Exercise 5. (Section 2.2, Exercise 19)

Let $g$ be continuously differentiable, and let $x_*$ be a fixed point of $g$. Suppose $\lvert g'\left( x_* \right) \rvert > 1$. Show that for any $x \neq x_*$ sufficiently close to $x_*$, one has $\lvert x - x_* \rvert < \lvert g(x) - x_* \rvert$. How does this show that fixed point iteration will generally fail under these assumptions?

Newton’s Method and the Secant Method

Let's start with Newton's method. Again, one wants to solve $f(x) = 0$, and we need a starting guess $p_0$. One can use the first-order Taylor approximation $$f(x) \approx f\left( p_0 \right) + f'\left( p_0 \right)\left( x - p_0 \right).$$ Since we want $f(x) = 0$, we end up with $$f\left( p_0 \right) + f'\left( p_0 \right)\left( x - p_0 \right) = 0 \implies x = p_0 - \frac{f\left( p_0 \right)}{f'\left( p_0 \right)}.$$
This is not an exact solution, but it will generally get us closer to the true root than where we started. Thus, the algorithm iterates this idea to construct the sequence $$p_{n+1} = p_n - \frac{f\left( p_n \right)}{f'\left( p_n \right)}.$$ Unlike fixed point iteration, the barrier to convergence occurs when $f' \approx 0$ close to the solution. In fact, this is more or less the only possible way convergence can fail:

Theorem 6.

If $f$ is $C^2$ (twice continuously differentiable), $f\left( x_* \right) = 0$, and $f'\left( x_* \right) \neq 0$, then $p_n \to x_*$ whenever $p_0$ is sufficiently close to $x_*$.
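
Here is a sketch of Newton's method in the same style as fixed_point_iteration above; the function name and the choice to pass the derivative fp as a second anonymous function are just one possible design, not the only one.

function sequence = newtons_method(f, fp, p0, N)
    sequence = zeros(1, N);

    % compute p_1 from the initial guess
    sequence(1, 1) = p0 - f(p0) / fp(p0);

    for i = 2:N
        p = sequence(1, i - 1);
        sequence(1, i) = p - f(p) / fp(p);
    end % of for loop
end % of function

For instance, newtons_method(@(x) cos(x) + x, @(x) 1 - sin(x), 0, 10) should home in on the root $x_* \approx -0.739$ from the earlier example very quickly.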

The weakness of this method is that one does not always know $f$ in explicit form, and one may not be able to (efficiently) compute $f'$. In this case, we would prefer to use the secant method. This requires two initial guesses $p_0$ and $p_1$, and it instead uses the recursive definition $$p_{n+2} = p_{n+1} - \frac{f\left( p_{n+1} \right)\left( p_{n+1} - p_n \right)}{f\left( p_{n+1} \right) - f\left( p_n \right)}.$$ It's rather difficult to pin down the rate of convergence of this method; generally, it's slightly poorer than Newton's method.
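
A matching sketch of the secant method (again, the name and calling convention are just one reasonable choice); note how caching the previous function values means each iteration only needs one new evaluation of $f$:

function sequence = secant_method(f, p0, p1, N)
    sequence = zeros(1, N);
    prev = p0;    f_prev = f(p0);    % p_n and f(p_n)
    curr = p1;    f_curr = f(p1);    % p_{n+1} and f(p_{n+1})

    for i = 1:N
        % p_{n+2} = p_{n+1} - f(p_{n+1}) (p_{n+1} - p_n) / (f(p_{n+1}) - f(p_n))
        next = curr - f_curr * (curr - prev) / (f_curr - f_prev);
        sequence(1, i) = next;

        % slide the window forward; only one new function evaluation here
        prev = curr;    f_prev = f_curr;
        curr = next;    f_curr = f(next);
    end % of for loop
end % of function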

Exercise 7. (Section 2.3, Exercise 30)

The iteration method for the secant method can be expressed as $$p_{n+2} = \frac{f\left( p_{n+1} \right) p_n - f\left( p_n \right) p_{n+1}}{f\left( p_{n+1} \right) - f\left( p_n \right)}.$$ Explain why this iteration equation is likely to be less accurate than the one given above in general.

You may have heard in lecture or read in the textbook that the secant method only needs one function evaluation per iteration while Newton's method requires one function evaluation and one derivative evaluation. This matters because in some scenarios, the function and/or its derivative can be extremely costly or difficult to compute. For instance:

  1. $f$ may only be available as a "black box" (say, the output of a measurement or a long simulation), so each evaluation is expensive and no formula for $f'$ is available.
  2. Even when $f$ has an explicit formula, $f'$ can be far messier or more expensive to evaluate than $f$ itself.

You should keep these issues in mind when selecting which algorithm to use and when implementing these algorithms.

Order of Convergence and Big-O Notation

Our last topic today is order of convergence; closely related is big-O notation, a form of mathematical shorthand that concisely conveys the statement "$f$ grows no faster than $g$".

Given two functions $f$ and $g$, we say $f(x)$ is $O\left( g(x) \right)$ as $x \to \infty$ (pronounced "big-O of $g$") if there exists some constant $C$ (independent of $x$) such that $\lvert f(x) \rvert \leq C \lvert g(x) \rvert$ for all $x$ large enough. Likewise, we say $f(x)$ is $O\left( g(x) \right)$ as $x \to x_0$ for some fixed quantity $x_0$ if there exists some constant $C$ (independent of $x$) such that $\lvert f(x) \rvert \leq C \lvert g(x) \rvert$ for all $x$ sufficiently close to $x_0$.

Equivalently, $f(x) = O\left( g(x) \right)$ if the quantity $\frac{\lvert f(x) \rvert}{\lvert g(x) \rvert}$ stays bounded. Intuitively, this means that eventually, the graph of $f$ will end up below the graph of $g$ after rescaling.
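
For example, $x^2 = O(x)$ as $x \to 0$, since the ratio $\frac{\lvert x^2 \rvert}{\lvert x \rvert} = \lvert x \rvert$ stays bounded near $0$; but $x^2$ is not $O(x)$ as $x \to \infty$, since that same ratio blows up. The direction of the limit matters!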

Big-O notation is amazing for making mathematical arguments more concise, as you avoid the hassle of carrying around a bunch of constants that you don't care about. However, there are some common traps to avoid:

Exercise 8. Distributivity of Big-O over Division

Suppose $f_1(x) = O\left( g_1(x) \right)$ and $f_2(x) = O\left( g_2(x) \right)$. Assuming $f_2(x) \neq 0$ for any $x$, is it true that $$\frac{f_1(x)}{f_2(x)} = O\left( \frac{g_1(x)}{g_2(x)} \right)?$$ Prove your answer, or give a counterexample.

Exercise 9.

Suppose $f(x) = O\left( g(x) \right)$. Is $\frac{1}{f(x)} = O\left( \frac{1}{g(x)} \right)$? Is $\frac{1}{g(x)} = O\left( \frac{1}{f(x)} \right)$?

Exercise 10. Transitivity of Big-O

Suppose $f_1, f_2$ are functions such that $f_1(x) = O\left( 1 \right)$ and $f_2(x) = O\left( 1 \right)$. Is it true that $f_1(x) = O\left( f_2(x) \right)$ or that $f_2(x) = O\left( f_1(x) \right)$? Prove it or give a counterexample.

Here is a chain of commonly seen big-O relationships that may be helpful. In each of the following, the big-O is taken as $x \to \infty$: for any constant $c$, any exponents $0 < a < b$, and any base $r > 1$, $$c = O\left( \log x \right), \quad \log x = O\left( x^a \right), \quad x^a = O\left( x^b \right), \quad x^b = O\left( r^x \right), \quad r^x = O\left( x^x \right).$$

One way to describe this hierarchy is as follows: $$\textrm{constants} < \textrm{logarithms} < \textrm{polynomials} < \textrm{exponentials} < x^x.$$

Warning: this only applies when $x \to \infty$! These relationships are no longer true if $x$ approaches something else (particularly, $x \to 0$).

Now let $p_n$ be a sequence converging to $p$. Recall that the rate of convergence of the sequence is a function $f(x)$ such that $\lvert p_n - p \rvert = O\left( f(n) \right)$. Typically $\lim_{x \to \infty} f(x) = 0$ in order for this to be meaningful. For instance, the sequence $p_n = \frac{1}{n + \sin(n)}$ converges to $p = 0$ at a rate $f(n) = \frac{1}{n}$.
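
To spell out that last example: since $\lvert \sin(n) \rvert \leq 1$, we have $n + \sin(n) \geq n - 1 \geq \frac{n}{2}$ for all $n \geq 2$, so $$\lvert p_n - 0 \rvert = \frac{1}{n + \sin(n)} \leq \frac{2}{n},$$ which is precisely the statement $\lvert p_n - p \rvert = O\left( \frac{1}{n} \right)$ (with $C = 2$).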

It's often not helpful to analyse the rate of convergence of root-finding algorithms, as this requires knowing the limit point/solution in advance, which is often unavailable. Instead, one analyses how quickly the error $\lvert p_n - p \rvert$ decays from one term to the next.

Definition 11.

A sequence $p_n$ converging to $p$ has order of convergence $\alpha$ if $$\lim_{n \to \infty} \frac{\lvert p_{n+1} - p \rvert}{\lvert p_n - p \rvert^\alpha} < \infty.$$
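
If you already have a vector of iterates and happen to know the true limit, here's a rough sketch of how one might estimate $\alpha$ numerically (the variable names p and p_star are placeholders, and this assumes the errors behave reasonably):

% p is a row vector of iterates, p_star is the known limit
e = abs(p - p_star);                      % errors |p_n - p|
% if |p_{n+1} - p| is roughly C * |p_n - p|^alpha, then
% alpha is roughly log(e_{n+1}/e_n) / log(e_n/e_{n-1}) for large n
alpha_estimates = log(e(3:end) ./ e(2:end-1)) ./ log(e(2:end-1) ./ e(1:end-2));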

Typically $\alpha$ will be a whole number. One of the most important tools in analysing the rate or the order of convergence of an algorithm (such as the ones we've described today) is Taylor's formula:

Theorem 12. Taylor's Formula

Suppose $f$ is an $n$-times differentiable function. Then, for any fixed choice of $x_0$, one has $$f(x) = \sum_{i=0}^{n-1} f^{(i)}\left( x_0 \right) \cdot \frac{\left( x - x_0 \right)^i}{i!} + O\left( \left( x - x_0 \right)^n \right)$$ as $x \to x_0$.

You should think about using this whenever you want to relate some sum or difference of a function’s values to its derivatives.

Exercise 13.

Let $f(x) = e^{x^2}$. Show that the sequence $$a_n = n^2 \cdot \left( f\left( 1 - \frac{1}{n} \right) - 2f(1) + f\left( 1 + \frac{1}{n} \right) \right)$$ converges to $6e$ at a rate of $\frac{1}{n}$.

Exercise 14.

Determine the order of convergence of the bisection method.

Exercise 15.

Let $f$ be a twice continuously differentiable function, and suppose $x_*$ is a root of $f$. Show that Newton's method has a quadratic ($\alpha = 2$) order of convergence if $f'\left( x_* \right) \neq 0$.

Exercise 16. (Section 2.4, Exercise 13)

Let $f$ be a smooth function, and let $x_*$ be a root of $f$. Let $$g(x) = x - \frac{f(x)}{f'(x)} - \frac{f''(x)}{2f'(x)} \cdot \left( \frac{f(x)}{f'(x)} \right)^2.$$ Show that for $p_0$ sufficiently close to $x_*$, the sequence defined by $p_{n+1} = g\left( p_n \right)$ converges to $x_*$ with order $\alpha = 3$.

You may use without proof that $g'\left( x_* \right) = g''\left( x_* \right) = 0$.