Quick review of the "gradient": for a real-valued function f(x) of x in **R**^{m}, the gradient ∇f is defined by f(x+δ) = f(x) + δ^{T}∇f + O(|δ|^{2}). For a real-valued function f(x) of *complex* vectors x, the gradient ∇f can be defined by f(x+δ) = f(x) + Re[δ*∇f] + O(|δ|^{2}) = f(x) + [δ*∇f + (∇f)*δ]/2 + O(|δ|^{2}). For the case here of f(x) = x*Ax - x*b - b*x (with A Hermitian positive-definite), expanding f(x+δ) as in the previous lecture gives ∇f/2 = Ax-b = -(b-Ax) = -(residual).
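For reference, the expansion can be written out explicitly. This is a sketch that assumes A is Hermitian (A = A*), which is what lets the two cross terms combine into a single real part:

```latex
\begin{aligned}
f(x+\delta) &= (x+\delta)^* A (x+\delta) - (x+\delta)^* b - b^* (x+\delta) \\
            &= f(x) + \delta^* (Ax - b) + (Ax - b)^* \delta + \delta^* A \delta \\
            &= f(x) + 2\,\mathrm{Re}\!\left[\delta^* (Ax - b)\right] + O(\|\delta\|^2).
\end{aligned}
```

Comparing with f(x+δ) = f(x) + Re[δ*∇f] + O(|δ|^{2}) gives ∇f = 2(Ax-b).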

Discussed successive line minimization of f(x), and in particular the steepest-descent choice of search direction d = b-Ax at each step. Explained, via a simple two-dimensional example, how this leads to "zig-zag" convergence, and that in fact the number of steps needed to reach a given accuracy is proportional to the condition number κ(A). We want to improve on this by deriving a Krylov-subspace method that minimizes f(x) over *all* previous search directions simultaneously.
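The κ(A) scaling can be seen in a small numerical experiment. The following is a minimal sketch, not the lecture's own code: it assumes real symmetric positive-definite A, and the names `steepest_descent` and `steps_needed`, the 2×2 test matrix diag(1, κ), and the tolerance are all illustrative choices.

```python
import numpy as np

def steepest_descent(A, b, tol=1e-8, max_iter=100_000):
    """Successive line minimization of f(x) = x^T A x - 2 b^T x
    (real symmetric positive-definite A), searching along d = b - A x."""
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x                      # residual = steepest-descent direction
    b_norm = np.linalg.norm(b)
    for k in range(max_iter):
        if np.linalg.norm(r) <= tol * b_norm:
            return x, k                # converged after k steps
        Ar = A @ r
        alpha = (r @ r) / (r @ Ar)     # exact minimizer of f(x + alpha * r)
        x = x + alpha * r
        r = r - alpha * Ar             # updated residual b - A x
    return x, max_iter

def steps_needed(kappa):
    """Steps taken on the 2x2 example A = diag(1, kappa), b = (1, 1)."""
    A = np.diag([1.0, float(kappa)])
    b = np.array([1.0, 1.0])
    _, k = steepest_descent(A, b)
    return k

steps_10 = steps_needed(10)
steps_100 = steps_needed(100)
print(steps_10, steps_100)
```

For this starting point the residual alternates between the directions (1, 1) and (1, -1), shrinking by the factor (κ-1)/(κ+1) each step, so increasing κ tenfold increases the step count roughly tenfold.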

Derived the conjugate-gradient method by explicitly requiring that the n-th step minimize f(x) over the whole Krylov subspace (the span of the previous search directions). This gives rise to an orthogonality condition on the search directions ("conjugacy", i.e. orthogonality through A: d_{i}*Ad_{j} = 0 for i≠j), which can be enforced by Gram-Schmidt on the gradient directions. Then showed that a Lanczos-like miracle occurs: most of the Gram-Schmidt inner products are zero, and we only need to keep the previous search direction.
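The resulting iteration is short. Below is a minimal sketch of the standard textbook form of conjugate gradient, restricted to real symmetric positive-definite A for simplicity; the function name, tolerance, iteration cap, and the random test matrix are illustrative assumptions rather than anything specified in the lecture.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for real symmetric positive-definite A, starting from x = 0."""
    n = b.shape[0]
    max_iter = 10 * n if max_iter is None else max_iter
    x = np.zeros(n)
    r = b.astype(float)            # residual b - A x for the initial guess x = 0
    d = r.copy()                   # first search direction = residual
    rr = r @ r
    b_norm = np.linalg.norm(b)
    for k in range(max_iter):
        if np.sqrt(rr) <= tol * b_norm:
            return x, k
        Ad = A @ d
        alpha = rr / (d @ Ad)      # exact line minimization along d
        x = x + alpha * d
        r = r - alpha * Ad
        rr_new = r @ r
        beta = rr_new / rr         # the only Gram-Schmidt coefficient that survives
        d = r + beta * d           # new direction, A-orthogonal to the previous ones
        rr = rr_new
    return x, max_iter

# Illustrative test problem (hypothetical data): a well-conditioned SPD matrix.
rng = np.random.default_rng(0)
M = rng.standard_normal((30, 30))
A = M @ M.T + 30 * np.eye(30)
b = rng.standard_normal(30)
x, steps = conjugate_gradient(A, b)
print(steps, np.linalg.norm(A @ x - b))
```

Only the current residual and the previous search direction are stored, which is the practical payoff of the vanishing Gram-Schmidt inner products.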

Discussed convergence of conjugate gradient: practically useless results like "exact" convergence in m steps (ignoring roundoff), pessimistic bounds saying that the number of steps goes like the square root of the condition number, and the possibility of superlinear convergence for clustered eigenvalues.
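The effect of clustering can be illustrated numerically. The sketch below is an experiment under stated assumptions, not a proof: it runs conjugate gradient on two diagonal matrices of the same size and nearly the same condition number (about 100), one with evenly spread eigenvalues and one with three tight clusters. The helper `cg_steps`, the eigenvalue choices, and the tolerance are hypothetical.

```python
import numpy as np

def cg_steps(eigenvalues, tol=1e-8):
    """Count CG steps needed to reduce the relative residual below tol
    for A = diag(eigenvalues) and b = (1, ..., 1), starting from x = 0."""
    lam = np.asarray(eigenvalues, dtype=float)
    r = np.ones_like(lam)          # residual b - A x for x = 0
    d = r.copy()
    rr = r @ r
    b_norm = np.sqrt(rr)
    max_iter = 10 * lam.size
    for k in range(max_iter):
        if np.sqrt(rr) <= tol * b_norm:
            return k
        Ad = lam * d               # A is diagonal, so A @ d is elementwise
        alpha = rr / (d @ Ad)
        r = r - alpha * Ad
        rr_new = r @ r
        d = r + (rr_new / rr) * d
        rr = rr_new
    return max_iter

# 201 eigenvalues spread evenly over [1, 100].
spread = np.linspace(1.0, 100.0, 201)

# 201 eigenvalues in three tight clusters around 1, 50, and 100.
offsets = np.linspace(-1e-5, 1e-5, 67)
clustered = np.concatenate([c + offsets for c in (1.0, 50.0, 100.0)])

steps_spread = cg_steps(spread)
steps_clustered = cg_steps(clustered)
print(steps_spread, steps_clustered)
```

The clustered case should finish in a handful of steps, since a low-degree polynomial can be small on all three clusters at once, while the evenly spread case needs many more, closer to what the square-root-of-condition-number bound suggests.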