## Lecture 18: Krylov Methods and the Arnoldi Algorithm

### Summary

Gave simple example of power method, which we already learned. This, however, only keeps the most recent vector Anv and throws away the previous ones. Introduced Krylov subspaces, and the idea of Krylov subspace methods: find the best solution in the whole subspace đŠ_{n} spanned by {b,Ab,âŠ,A^{n-1}b}.

Presented theÂ ArnoldiÂ algorithm. Unlike the book, IÂ startÂ the derivation by viewing it as a modified Gram-Schmidt process, and prove that it is equivalent (in exact arithmetic) to GS on {b,b,Ab,AÂČb,âŠ}, so it is an orthonormal basis for đŠ_{n}. Then we showed that this corresponds to partial Hessenberg factorization: AQ_{n} = Q_{n}H_{n} + h_{(n+1)n}q_{n+1}e_{n}á” where H_{n} is upper-Hessenberg.

Discussed what it means to find the “best” solution in the Krylov subspace đŠ_{n}. Discussed the general principle of Rayleigh-Ritz methods for approximately solving the eigenproblem in a subspace: finding the Ritz vectors/values (= eigenvector/value approximations) with a residual perpendicular to the subspace (a special case of a Galerkin method).

For Hermitian matrices A, I showed that the max/min Ritz values are the maximum/minimum of the Rayleigh quotient in the subspace, via the min-max theorem. In fact, in this case H_{n} is Hermitian as well, so H_{n} is tridiagonal and most of the dot products in the Arnoldi process are zero. Hence Arnoldi reduces to a three-term recurrence, and the Ritz matrix is tridiagonal. This is called theÂ LanczosÂ algorithm.

Noted that n steps of Arnoldi requires Î(mnÂČ) operations and Î(mn) storage. If we only care about the eigenvalues and not the eigenvectors, Lanczos requires Î(mn) operations and Î(m+n) storage. However, this is complicated by rounding problems that we will discuss in the next lecture.

### Further Reading

- Read âLectures 31â34â in the textbook
*Numerical Linear Algebra*. - Online bookÂ
*Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods*by Richard Barrett et al. - Online bookÂ
*Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide*by Zhaojun Bai et al.

## Lecture 19: Arnoldi and Lanczos with Restarting

### Summary

Showed some computational examples (notebook above) of Arnoldi convergence. Discussed how rounding problems cause a loss of orthogonality in Lanczos, leading to “ghost” eigenvalues where extremal eigenvalues re-appear. In Arnoldi, we explicitly store and orthogonalize against all qj vectors, but then another problem arises: this becomes more and more expensive as n increases.

A solution to the loss of orthogonality in Lanczos and the growing computational effort in Arnoldi is restarting schemes, where we go for n steps and then restart with the k “best” eigenvectors. If we restart with k=1Â *every*Â step, then we essentially have the power method, so while restarting makes the convergence worse the algorithm still converges, and converges significantly faster than the power method for k>1.

Explained why restarting properly is nontrivial for k>1: we need to restart in such a way that maintains the Arnoldi (or Lanczos) property AQ_{n} = Q_{n}H_{n} + r_{n}e_{n}á” where H_{n} is upper-Hessenberg, r_{n} is orthogonal to Q_{n}, and e_{n}á” is only nonzero in the last column. In particular, showed that the “obvious” naive restarting algorithm using k Ritz vectorsÂ *almost*Â works, but messes up the e_{n}á” property. See the handout and notebook below.

- Lecture 19 handout: Why Restarting Arnoldi/Lanczos is not Trivial (PDF)
- Lecture 19 notebook:Â Experiments with Arnoldi Iterations

### Further Reading

- See the section onÂ Implicitly Restarted Lanczos Method in
*Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide*by Zhaojun Bai et al.

## Lecture 20: The GMRES Algorithm and Convergence of GMRES and Arnoldi

### Summary

Introduced theÂ GMRESÂ algorithm: compute the basis Q_{n} for đŠ_{n} as in Arnoldi, but then minimize the residual âAx-bâ_{2} for xâđŠ_{n} using this basis. This yields a small (n+1)Ăn least-squares problem involving H_{n}.

Discussed the convergence rate of GMRES and Arnoldi in terms ofÂ polynomialÂ approximations. Following the book closely, showed that the relative errors (the residual norm âAx-Îœxâ or âAx-bâ) can be bounded by minimizing the value p(Î») of a polynomial p(z) evaluated at the eigenvalues, where p has degreeÂ *n*Â after theÂ *n*^{th} iteration. In Arnoldi, the Î»^{n} coefficient of p(Î») is 1, whereas in GMRES the constant coefficient p(0)=1. (There is also a factor of the condition number of the eigenvector matrix in GMRES, so it is favorable for the eigenvectors to be near-orthogonal, i.e. for the matrix to be near-normal, seeÂ normal matrix.)

Using this, we can see that the most favorable situation occurs when the eigenvalues are grouped into a small cluster, or perhaps a few small clusters, since we can then make p(Î») small with a low-degree polynomial that concentrates a few roots in each cluster. This means that Arnoldi/GMRES will achieve small errors in only a few iterations. Moreover we can see that aÂ fewÂ outlying eigenvalues aren’t a problem, since p(z) will quickly attain roots there and effectively eliminate those eigenvalues from the errorâthis quantifies the intuition that Krylov methods “improve the condition number” of the matrix as the iteration proceeds, which is why the condition-number bounds on the error are often pessimistic. Conversely, the worst case will be when the eigenvalues are all spread out uniformly in some sense. Following examples 35.1 and 35.2 in the book, you can actually have two matrices with very similar small condition numbers but very different GMRES convergence rates, if in one case the eigenvalues are clustered and in the other case the eigenvalues are spread out in a circle around the origin (i.e. with clustered magnitudes |Î»| but different phase angles).

Contrasted convergence with Arnoldi/Lanczos. As in the book, showed that Arnoldi also minimizes a polynomial on the eigenvalues, except that in this case the coefficient of theÂ *highest*Â degree term is constrained to be 1. (We proceeded somewhat backwards from the book: the book started with polynomial minimization and derived the Rayleigh-Ritz eigenproblem, whereas we went in the reverse direction.) Showed that this set of polynomials is shift-invariant, which explains why (as we saw experimentally) Arnoldi convergence is identical for A and A+ÎŒI. This isÂ *not*Â true for GMRES, and hence GMRES convergence is not shift-invariantâthis is not surprising, since solving Ax=b and (A+ÎŒI)x=b can be very different problems.

Like Arnoldi/Lanczos, if GMRES does not converge quickly we must generallyÂ restartÂ it, usually with a subspace of dimension 1; restarting GMRES repeatedly after k steps is calledÂ GMRES(k). Unfortunately, unlike Arnoldi for the largest |Î»|, restarted GMRES isÂ *not guaranteed to converge*. If it doesn’t converge, we must do something to speed up convergence: preconditioning (next time).

In many practical cases, unfortunately, the eigenvalues of A areÂ *not*Â mostly clustered, so convergence of GMRES may be slow (and restarted GMRES may not converge at all). The solution to this problem isÂ preconditioning: finding an easy-to-invert M such that M^{-1}A has clustered eigenvalues (and is not nearly defective).

### Assignment

### Further Reading

- Read âLectures 34, 35, and 40â in the textbook
*Numerical Linear Algebra*. - The convergence of GMRES for very non-normal matrices is a complicated subject; see e.g.Â The Convergence of Krylov Subspace Methods for Large Unsymmetric Linear Systems by Zhongxiao Jia.
- Iterative methods for solving Ax=b, GMRES/FOM versus QMR/BiCG (by Jane Cullum) reviews the relationship between GMRES and a similar algorithm called FOM that is more Galerkin-like (analogous to Rayleigh-Ritz rather than least-squares).