7. Gaussian Elimination and Matrix Factorisations
The main goal of today is to discuss strategies for solving systems of linear equations of the form \[A \mathbf{x} = \mathbf{b},\] where \(\mathbf{x}\in \mathbb{R}^n\) is unknown and \(\mathbf{b}\in \mathbb{R}^n\) is given. There are several strategies that one can employ for this:
- Determine \(A ^{-1}\), then perform the matrix multiplication \(\mathbf{x} = A ^{-1} \mathbf{b} \).
- Perform Gaussian elimination to determine \(\mathbf{x}\).
- Factor \(A=LU\), then solve the systems \(L \mathbf{y} = \mathbf{b} \) and \(U \mathbf{x} = \mathbf{y} \).
Determining \(A ^{-1}\) is the most direct path of attack on paper. However, using something like Cramer’s Rule is a terrible idea due to its computational complexity. One can compute \(A ^{-1}\) using a variant of Gaussian elimination, but not only is this extremely costly (it takes an additional \(O\left( n^3 \right)\) operations compared to the other two options), one still has \(O \left( n^2 \right)\) operations left to actually compute \(A ^{-1} \mathbf{b} \). In practice, this is almost never the best way forward.
Gaussian elimination still takes \(O\left( n^3 \right)\) operations, but it saves both time and memory compared to directly computing \(A ^{-1}\). This is often the best choice when solving a system of equations exactly once, though it becomes wasteful when solving systems in bulk, since the \(O\left( n^3 \right)\) elimination work is repeated for every right-hand side.
Finally, the \(LU\) factorisation has a larger upfront cost, where one must spend computational energy performing the factorisation (it comes out to \(O\left( n^3 \right)\) operations). However, the two systems of equations are far easier to solve and only take a total of \(O\left( n^2 \right)\) operations.
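If you would like to see this difference on your own machine, here is a rough timing sketch using the built-ins inv and the backslash operator (the size \(n = 2000\) is an arbitrary choice, and the exact timings will vary):

n = 2000;                          % arbitrary size, just for illustration
A = randn(n); b = randn(n, 1);

tic; x1 = inv(A) * b; t_inv = toc; % form the inverse, then multiply
tic; x2 = A \ b;      t_ge  = toc; % Gaussian elimination with pivoting

fprintf('inv: %.3fs, backslash: %.3fs, difference: %.2e\n', t_inv, t_ge, norm(x1 - x2));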
Exercise 1.
Show that it takes \(n\) more multiplications and \(n\) fewer divisions to compute \(A ^{-1} \mathbf{b} \) compared to solving \(LU \mathbf{x} = \mathbf{b}\).
Remark 2.
Divisions are generally speaking more computationally expensive than multiplications, though ultimately this makes the two methods above differ by an \(O(n)\) runtime. However, you can “cache” the computations: rather than dividing by a number \(c\) many many times, one can compute \(\frac{1}{c}\) in advance and replace all these divisions by \(c\) with multiplications by \(\frac{1}{c}\). This gives LU decompositions an edge over directly inverting a matrix in all aspects!
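As a small illustration of the caching idea (a sketch only; the matrix and the pivot index here are arbitrary), this is what one elimination step looks like with the reciprocal of the pivot precomputed:

A = randn(5); k = 1; n = size(A, 1);   % a stand-in matrix and pivot index
piv_inv = 1 / A(k, k);                 % a single division, computed up front
for i = (k+1):n
    m = A(i, k) * piv_inv;             % only multiplications from here on
    A(i, :) = A(i, :) - m * A(k, :);   % zero out the entry below the pivot
end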
In addition to making our lives easier when it comes to solving systems of linear equations, LU decompositions make it far easier to compute the determinant of a matrix.
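To see why, suppose we have factored \(PA = LU\) for some permutation matrix \(P\) (more on this below). Since \(L\) has ones on its diagonal and \(\det P = \pm 1\), \[\det A = \pm \det L \det U = \pm \prod_{i=1}^{n} u _{ii},\] so once the factorisation is in hand, the determinant costs only \(n-1\) additional multiplications.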
Our goal for the remainder of the class is as follows: get a good understanding of the practical limitations of Gaussian elimination and LU decomposition, as well as how to get around them, then produce implementations and compare the runtime costs of each method.
This introductory spiel avoids the topic of pivoting. One benefit of computing \(A ^{-1}\) is that one does not need to worry about permutation matrices (from either partial or complete pivoting); however, pivoting is not a serious complication: implemented well, the additional bookkeeping is \(O(n)\) in both space and time, and this is usually negligible in the grand scheme of things.
Gaussian Elimination
As you should have discussed in class or read in the textbook, solving a system of \(n\) equations in \(n\) unknowns \(A \mathbf{x} = \mathbf{b} \) via Gaussian elimination takes \(O\left( n^3 \right)\) operations. This is extremely expensive, especially since, as we will see shortly, repeated solves with the same matrix can be done in only \(O\left( n^2 \right)\) operations once the elimination work has been cached.
Let’s do an example in obnoxious generality with \(n=3\). Given a system \(A \mathbf{x} = \mathbf{b} \), one might construct the augmented matrix
\[\left( \begin{array}{ccc|c} a _{11} & a _{12} & a _{13} & b_1 \\ a _{21} & a _{22} & a _{23} & b_2 \\ a _{31} & a _{32} & a _{33} & b_3 \end{array} \right)\]
Zeroing out the leading entries in the second and third rows, one obtains
\[\left( \begin{array}{ccc|c} a _{11} & a _{12} & a _{13} & b_1 \\ 0 & a _{22} - \frac{a _{21}}{a _{11}} a _{12} & a _{23} - \frac{a _{21}}{a _{11}} a _{13} & b_2 - \frac{a _{21}}{ a _{11}} b_1 \\ 0 & a _{32}- \frac{a _{31}}{ a _{11}} a _{12} & a _{33} - \frac{a _{31}}{a _{11}} a _{13} & b_3 - \frac{a _{31}}{a _{11}} b_1 \end{array} \right)\]
Ew, that’s disgusting. Because I am such a generous person, I’ve gone ahead and performed the next step for us:
\[\left( \begin{array}{ccc|c} a _{11} & a _{12} & a _{13} & b_1 \\ 0 & a _{22}- \frac{a _{21}}{a _{11}}a _{12} & a _{23} - \frac{a _{21}}{a _{11}} a _{13} & b_2 - \frac{a _{21}}{ a _{11}} b_1 \\ 0 & 0 & a _{33}- \frac{a _{31}}{a _{11}} a _{13} - \frac{a _{11}a _{32} - a _{31}a _{12}}{a _{11}a _{22}- a _{21}a _{12}} \left( a _{23}- \frac{a _{21}}{a _{11}} a _{13} \right) & \cdots \end{array} \right)\]
I didn’t feel like writing down the very bottom-right corner.
The reason I wrote out these first several steps of Gaussian elimination is because these steps are always the same, no matter what \(\mathbf{b}\) is. This means that if we have to solve \(A \mathbf{x} = \mathbf{b}\) for many, many, many different vectors \(\mathbf{b} \), it doesn’t make sense to compute the reduced left portion of the augmented matrix over and over and over again, especially since this is where the \(O\left( n^3 \right)\) computations are coming from. Rather, it makes way more sense to just “remember” this form of the augmented matrix and “remember” the steps we performed for the right portion.
From here, only another \(O\left( n^2 \right)\) operations are required to finish solving the system of equations, as one needs only perform backwards substitution. Thus, what we are proposing is a “shortcut” to skip straight towards the backwards substitution step without repeating \(O\left( n^3 \right)\) operations.
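To make this concrete before we build our own version, here is a minimal sketch of the “factor once, solve many” pattern using MATLAB’s built-in lu (the sizes here are arbitrary). With three outputs, lu returns a permutation matrix \(P\) with \(PA = LU\), which is exactly the kind of factorisation we will construct by hand below.

n = 500; num_rhs = 50;                   % arbitrary sizes
A = randn(n);
B = randn(n, num_rhs);                   % many different right-hand sides
X = zeros(n, num_rhs);

[L, U, P] = lu(A);                       % the O(n^3) work happens exactly once
for k = 1:num_rhs
    X(:, k) = U \ (L \ (P * B(:, k)));   % each solve costs only O(n^2)
end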
Division by Zero, Floating Point Error, and Pivoting
If you were paying close attention, you may notice that we are possibly dividing by zero as we go; from a practical perspective, we could be dividing by a very small entry, which could produce a very large error due to finite precision.
One can circumvent both of these issues (to some extent, at least) by swapping the rows of the augmented matrix so as to maximise the magnitude of the numbers we divide by. This not only avoids division by zero, but it also helps keep the floating point error we introduce under control.
In theory and on paper, we express this by using permutation matrices. We say \(P\) is a permutation matrix if (1) all of its entries are zeroes and ones and (2) every column and every row contains exactly one nonzero entry. \(P\) should be thought of as describing the way in which we are shuffling the rows. Rather than solving \(A \mathbf{x} = \mathbf{b} \), one instead solves \(PA \mathbf{x} = P \mathbf{b}\).
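For example, with \(n = 3\), the permutation matrix \[P = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}\] swaps the first two rows: the first row of \(PA\) is the second row of \(A\), the second row of \(PA\) is the first row of \(A\), and the third row is left untouched.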
We will not go into more detail about permutation matrices. When running the Gaussian elimination algorithm, one typically just swaps around some entries of \(A\) and \(\mathbf{b}\) since it has no bearing on the resulting solution; there is no need to remember what \(P\) is. However, in the setting where we are performing bulk operations and attempting to “cache” some computations, we will have to remember which pivots we’ve performed. In this case, storing a full \(n\times n\) matrix \(P\) and then performing the multiplication \(P \mathbf{b}\) is horrendously inefficient.
We will return to this issue when we actually implement the \(LU\) factorisation with partial pivoting, but think about this as we go!
LU Decompositions as Streamlined Gaussian Elimination
Let’s repeat the above discussion, but from a slightly different perspective. We begin with a general \(3\times 3\) matrix equation \(A \mathbf{x} = \mathbf{b}\). Rather than constructing the augmented matrix and proceeding from there, let’s describe the first step in Gaussian elimination as follows: \[\begin{pmatrix} a _{11} & a _{12} & a _{13} \\ a _{21} & a _{22} & a _{23} \\ a _{31} & a _{32} & a _{33} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ \frac{a _{21}}{ a _{11}} & 1 & 0 \\ \frac{a _{31}}{a _{11}} & 0 & 1 \end{pmatrix} \begin{pmatrix} a _{11} & a _{12} & a _{13} \\ 0 & a _{22} - \frac{a _{21}}{a _{11}} a _{12} & a _{23} - \frac{a _{21}}{a _{11}}a _{13} \\ 0 & a _{32} - \frac{a _{31}}{ a _{11}} a _{12} & a _{33} - \frac{a _{31}}{a _{11}} a _{13} \end{pmatrix}.\] The second matrix in this factorisation is the left portion of the augmented matrix, and the first term is recording the row operations that we performed. From there, you can continue the process, and ultimately you will end up with a matrix decomposition of the form \[A = \begin{pmatrix} 1 & 0 & 0 \\ l _{21} & 1 & 0 \\ l _{31} & l _{32} & 1 \end{pmatrix} \begin{pmatrix} u _{11} & u _{12} & u _{13} \\ 0 & u _{22} & u _{23} \\ 0 & 0 & u _{33} \end{pmatrix}.\] This is called the \(LU\) decomposition, as it factors \(A\) into a lower-triangular matrix \(L\) and an upper-triangular matrix \(U\). This analysis works for any size matrix, and the resulting upper-triangular factor is exactly the left portion of the augmented matrix in row echelon form!
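For instance, with actual numbers (and a matrix chosen so that no pivoting is needed), \[\begin{pmatrix} 2 & 1 & 1 \\ 4 & 5 & 2 \\ 2 & -2 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1 & -1 & 1 \end{pmatrix} \begin{pmatrix} 2 & 1 & 1 \\ 0 & 3 & 0 \\ 0 & 0 & -1 \end{pmatrix},\] where the below-diagonal entries \(2\), \(1\), and \(-1\) of \(L\) are exactly the multipliers used in the elimination steps.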
Using \(A \mathbf{x} = LU \mathbf{x} = \mathbf{b}\), what one is really doing in Gaussian elimination is applying \(L ^{-1}\) to \(\mathbf{b}\) (this is forward substitution) and then using backwards substitution to directly solve the system \(U \mathbf{x} = L ^{-1} \mathbf{b}\).
At this point, we ought to start thinking about practical concerns. While it’s easy to sort of follow our noses while working this out on paper, it’s rather difficult to implement, especially since we haven’t even thought about permutation matrices yet!
To describe the algorithm, we’ll be doing things recursively. Notice that the very first step of the LU decomposition is relatively straightforward and easy to describe, even with permutation matrices in the mix.
First, we select a permutation matrix \(P\) that swaps exactly two rows of \(A\) in a way that maximises the magnitude of the top-left entry. Then, we may express \(PA\) as a block matrix as follows: \[PA = \left( \begin{array}{c|c} a_1 & \mathbf{v}_1 \\ \hline \mathbf{v}_2 & A'\end{array} \right).\] Here, \(a_1\) is the upper-left entry of \(PA\), \(\mathbf{v} _1\) is the remainder of the top row, \(\mathbf{v}_2\) is the remainder of the first column, and \(A'\) is an \((n-1)\times (n-1)\) matrix.
Then, the first step of the LU decomposition (and in Gaussian elimination) is to zero out the entries in the \(\mathbf{v} _2\) block. This can be very concisely written as \[PA = \left( \begin{array}{c|c} a_1 & \mathbf{v}_1 \\ \hline \mathbf{v}_2 & A'\end{array} \right) = \left( \begin{array}{c|c} 1 & \mathbf 0 \\ \hline \frac{1}{a_1} \cdot \mathbf{v}_2 & I _{n-1}\end{array}\right) \left( \begin{array}{c|c} a_1 & \mathbf{v} _1 \\ \hline \mathbf 0 & A' - \frac{1}{a_1} \mathbf{v}_2 \mathbf{v}_1 \end{array}\right) .\] Here, \(I _{n-1}\) is the \((n-1)\)-dimensional identity matrix and the bolded zeroes represent column or row vectors filled with an appropriate number of zeroes. Note here that \(\mathbf{v}_2 \mathbf{v}_1\) is a product of a column vector by a row vector in that order, which produces an \((n-1)\times(n-1)\) matrix!
Now the key trick is as follows: we may factor \(A'-\frac{1}{a_1} \mathbf{v}_2 \mathbf{v}_1\) as \[P'\left( A' - \frac{1}{a_1} \mathbf{v}_2 \mathbf{v}_1 \right) = L' U',\] where \(P'\) is a permutation matrix, \(L'\) is lower-triangular, and \(U'\) is upper-triangular. Then, after moving the \(P'\) to the other side, this whole thing becomes \[PA = \left( \begin{array}{c|c} 1 & \mathbf 0 \\ \hline \frac{1}{a_1} \cdot \mathbf{v}_2 & I _{n-1}\end{array}\right) \left( \begin{array}{c|c} a_1 & \mathbf{v}_1 \\ \hline \mathbf 0 & \left( P' \right) ^{-1}L'U'\end{array} \right) = \left( \begin{array}{c|c} a_1 & \mathbf{v}_1 \\ \hline \mathbf{v}_2 & \left(P'\right) ^{-1}L'U' + \frac{1}{a_1} \mathbf{v}_2 \mathbf{v}_1 \end{array}\right). \] Now you might say, “dude what the heck? This looks worse than before.” And yes, you are almost right.
However, if we set \(P'' = \begin{pmatrix} 1 & \mathbf 0 \\ \mathbf 0 & P' \end{pmatrix}\), we may express \(\mathbf{v} _2 = \left( P' \right) ^{-1}P' \mathbf{v}_2\) and factor out a \(\left(P'\right) ^{-1}\) from this whole thing. In particular, we get \[PA = \begin{pmatrix} 1 & \mathbf 0 \\ \mathbf 0 & \left( P' \right) ^{-1} \end{pmatrix} \begin{pmatrix} a_1 & \mathbf{v}_1\\ P'\mathbf{v}_2 & L'U' + \left( \frac{1}{a_1} P' \mathbf{v}_2 \right) \mathbf{v}_1 \end{pmatrix}. \] Multiplying through by \(P''\) cancels out the additional factor. Moreover, we can factor the second term as \[P'' P A = \begin{pmatrix} 1 & \mathbf 0 \\ \frac{1}{a_1} P' \mathbf{v}_2 & L'\end{pmatrix} \begin{pmatrix} a_1 & \mathbf{v}_1 \\ \mathbf 0 & U'\end{pmatrix}.\] We arrive at the decomposition \(P'' P A = LU\), where \[\begin{align*} L = \begin{pmatrix} 1 & \mathbf 0 \\ \frac{1}{a_1}P' \mathbf{v}_2 & L' \end{pmatrix} && \textrm{and} && U = \begin{pmatrix} a_1 & \mathbf{v}_1 \\ \mathbf 0 & U' \end{pmatrix}. \end{align*}\] Importantly, \(L\) and \(U\) are still lower- and upper-triangular!
This is the algorithm that we will end up implementing. Note that \(P'' P\) is still a permutation matrix; this reflects the fact that everything happening further down the recursion amounts to swapping a few more rows around.
You can even think of this as an inductive proof of the existence of an \(LU\) decomposition, and the proof happens to be very practical.
Implementing LU Decomposition with Pivoting
The implementation of this algorithm is rather difficult, especially if you want to make things efficient with respect to space. For now, we’ll forgo these optimisations, and I’ll leave it as an exercise for the voracious student (with more details to come later).
One of the things we should be very keen on optimising is the presence of these permutation matrices. Instead of representing a permutation matrix as a matrix of ones and zeros (which is costly to both evaluate and store), we’ll represent a permutation as a row vector (or column vector, it doesn’t matter) \(P = \begin{bmatrix} i_1 & i_2 & \cdots & i_n \end{bmatrix}\), where the numbers \(i_1,\ldots, i_n\) take the values \(1,\ldots, n\) exactly once. The first row of \(PA\) will be the \(i_1\)-th row of \(A\); the \(k\)-th row of \(PA\) will be the \(i_k\)-th row of \(A\), etc. This compressed notation saves both space and time.
In addition to these benefits, MATLAB has some really sweet built-in indexing tools to help us out.
MATLAB Trick 3.
If \(A\) is a \(3\times 3\) matrix, then one can type \(A([3 2 1], :)\) to get another \(3\times 3\) matrix. This selects all the elements of \(A\), but in a different order: the third row becomes the first row, the second row stays put, and the first row becomes the third. In general, if \(P\) is a permutation “list” as we described above, \(A(P, :)\) will permute the rows of \(A\).
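For instance (using magic(3) purely as a stand-in matrix):

A = magic(3);       % any 3-by-3 matrix will do
P = [3 2 1];        % the permutation "list"
B = A(P, :);        % B(1,:) is A(3,:), B(2,:) is A(2,:), B(3,:) is A(1,:)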
So, our LU decomposition ought to accept a square matrix \(A\) as an input and return three things: \(P\), \(L\), and \(U\), with \(P\) written in the above format. In pseudocode, we have:
- INPUT: \(A\), an \(n\times n\) matrix.
- If \(n=1\), the LU decomposition is trivial.
- Create two \(n\times n\) matrices \(L\) and \(U\), initialised to all zeroes.
- Swap the first row of \(A\) with the row of \(A\) whose leading entry has the largest magnitude. Record this as a permutation \(P\).
- Copy the first row of \(A\) into the first row of \(U\) and the first column of \(A\) into the first column of \(L\). Then, divide the first column of \(L\) by its upper-left entry.
- Compute \(B= A’ - \frac{1}{a _{11}}\mathbf{v}_2 \mathbf{v}_1\) as described in the previous section.
- Compute \(P’ B = L’ U’\), the LU decomposition of \(B\) with pivoting.
- Copy \(L’\) and \(U’\) into the bottom-right blocks of \(L\) and \(U\). Then, shuffle the entries of the first column of \(L\) below the diagonal according to the permutation \(P’\).
- Update the permutation \(P\) using \(P’\).
That’s…a lot. But it’s okay, I promise it goes much smoother than that in MATLAB. Here’s how the code goes:
1function [P, L, U] = LU_decomposition(A)
2 n = length(A);
3 if n == 1
4 P = [1]; L = [1]; U = A;
5 return;
6 end % of if
7
8 % select largest row of A
9 i_max = 1;
10 for i = 2:n
11 if abs(A(i, 1)) > abs(A(i_max, 1))
12 i_max = i;
13 end % of if
14 end % of for
15
16 % initialise P to swap first and i_max row of A.
17 P = 1:n; P(1) = i_max; P(i_max) = 1;
18 A = A(P, :); % swaps the rows!
19
20 % sets up L and U.
21 L = zeros(n); U = zeros(n);
22 U(1, :) = A(1, :);
23 L(:, 1) = A(:, 1) / A(1, 1);
24
25 % sets up B, the inductive LU guy, then decompose
26 B = A(2:n, 2:n) - L(2:n, 1) * U(1, 2:n);
27 [P_prime, L_prime, U_prime] = LU_decomposition(B);
28
29 % now update L, U, and P.
30 L(2:n, 2:n) = L_prime;
31 U(2:n, 2:n) = U_prime;
32
33 P_prime = [1, P_prime + 1];
34 L(:, 1) = L(P_prime, 1);
35 P_copy = P;
36 for i = 1:n
37 P(i) = P_copy(P_prime(i));
38 end % of for
39end % of LU_decomposition
Think about what lines 33-38 are doing!
One of the problems with this code is that it’s rather inefficient: every time we recursively call LU_decomposition, it creates a copy of the bottom-right chunk of \(A\). For smaller values of \(n\), this is not a big deal, but when \(n\) is moderately large, the amount of memory needed to keep making more and more copies becomes unmanageable. In fact, my computer crashes if I try to run this on a \(1000\times 1000\) matrix with random entries.
Challenge 4. Optimise the above code
Optimise the code to avoid making excessive copies: you should only have one or two copies of \(A\), \(P\), \(L\), and \(U\) at a given time.
Finally, we should apply this LU decomposition technique to solve matrix equations. Let’s write a function that accepts four arguments: \(P\), \(L\), \(U\), and \(\mathbf{b}\). Then, it should return the vector \(\mathbf{x}\) such that \(A \mathbf{x} = \mathbf{b}\), where \(A\) satisfies \(PA=LU\).
In pseudocode, this algorithm is very simple:
- Solve \(L \mathbf{y} = P \mathbf{b}\).
- Solve \(U \mathbf{x} = \mathbf{y}\).
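Spelled out, these are just the two triangular solves in component form: forward substitution computes \[y_i = (P \mathbf{b})_i - \sum_{j < i} l _{ij} y_j, \qquad i = 1, \ldots, n\] (no division is needed since \(l _{ii} = 1\)), and backwards substitution computes \[x_i = \frac{1}{u _{ii}} \Big( y_i - \sum_{j > i} u _{ij} x_j \Big), \qquad i = n, n-1, \ldots, 1.\]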
Try writing this in MATLAB yourself. In particular, avoid using the \ operator.
Code
function x = solve_LU(P, L, U, b)
    % solve Ly = Pb
    b = b(P);
    n = length(b);
    y = zeros(n, 1);

    % forward substitution! (no divisions needed: L has ones on its diagonal)
    for row = 1:n
        y(row) = b(row);
        for col = 1:(row - 1)
            y(row) = y(row) - L(row, col) * y(col);
        end % of for (col)
    end % of for (row)

    % now solve Ux = y with backwards substitution
    x = zeros(n, 1);
    for row = n:-1:1
        x(row) = y(row);
        for col = n:-1:(row+1)
            x(row) = x(row) - U(row, col) * x(col);
        end % of for (col)
        x(row) = x(row) / U(row, row);
    end % of for (row)
end % of solve_LU
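Once both functions are saved (say, as LU_decomposition.m and solve_LU.m somewhere on your MATLAB path), a quick sanity check along these lines is worth running; the size and the use of random matrices are arbitrary choices:

n = 200;                          % arbitrary test size
A = randn(n); b = randn(n, 1);

[P, L, U] = LU_decomposition(A);
x = solve_LU(P, L, U, b);

disp(norm(A(P, :) - L * U))       % should be tiny: checks the factorisation
disp(norm(A * x - b) / norm(b))   % should be near machine precision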