Algorithms: Krylov subspaces, Arnoldi iteration and extreme eigenvalues

The most powerful iterative solvers for linear equations and eigenvalues are based in part on the types of operations performed in the Lanczos algorithm. Once a matrix has been transformed by similarity transformations into a triangular form

    \[O\cdot M\cdot O^{-1}=T\]

the whole game is over; eigenvalues and inverses can be extracted easily (we did so in an earlier post). Unfortunately it is not possible in general to bring an arbitrary M into triangular form T by a finite sequence of similarity transformations, and not every matrix can even be diagonalized; this is discussed in our section on Jordan normal forms. However, any matrix M can be put into Hessenberg form H by an orthogonal similarity transformation.

H is lower-Hessenberg if H_{ij}=0 for all j>i+1, and upper-Hessenberg if H_{ij}=0 for all i>j+1. In another post we will discuss Householder transformations, which provide a way of constructing the orthogonal transformations that put M into Hessenberg form.
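
As a quick illustration, here is a sketch using the stock hessenberg routine in Julia's LinearAlgebra (which performs the reduction with Householder reflectors); the matrix A and its size are arbitrary choices for the demo:

using LinearAlgebra

A = randn(6, 6)
F = hessenberg(A)            # orthogonal similarity reduction of A
H = Matrix(F.H)              # the upper-Hessenberg factor, densified
Q = Matrix(F.Q)              # the accumulated orthogonal transformation
istriu(H, -1)                # true: H vanishes below the first subdiagonal
Q*H*Q' ≈ A                   # true: H is orthogonally similar to A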

Arnoldi iteration uses Krylov subspaces to transform M into an upper-Hessenberg form H. Given an N\times N matrix M, start with a random vector \mathbf{v}_1 and create the vectors \{\mathbf{v}_1, \, M\cdot \mathbf{v}_1, \, M^2\cdot \mathbf{v}_1, \, \cdots, \, M^{m}\cdot \mathbf{v}_1\}. If this set of vectors is subjected to a Gram-Schmidt orthogonalization, the resulting unit vectors form an orthonormal basis for the Krylov subspace K_m(M, \mathbf{v}_1)=\mathrm{span}\{\mathbf{v}_1, \, M\cdot\mathbf{v}_1, \, \cdots, \, M^{m}\cdot\mathbf{v}_1\}, which is in general a proper subspace of \mathbb{R}^N since typically m\ll N.
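
For concreteness, here is a naive way to build such a basis; this is only a sketch, with a random matrix A standing in for M and arbitrary sizes N and m:

using LinearAlgebra

N, m = 100, 5
A  = randn(N, N)             # stand-in for M
v1 = randn(N)                # random starting vector

K = zeros(N, m+1)            # columns: v1, A*v1, A^2*v1, ..., A^m*v1
K[:,1] = v1
for k in 1:m
    K[:,k+1] = A*K[:,k]
end

Q = Matrix(qr(K).Q)          # Gram-Schmidt via QR: orthonormal basis of the Krylov subspace

Forming the powers explicitly like this is numerically fragile because the columns of K rapidly become nearly parallel; the Arnoldi iteration described below avoids this by interleaving the orthogonalization with the matrix-vector products.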

Upper-Hessenberg matrices have properties that make inversion and eigenvalue extraction simple. Given an arbitrary, possibly complex N\times N matrix M with N potentially very large, the direct approach of finding a unitary O such that M=O\cdot H\cdot O^\dagger with H upper-Hessenberg (the full factorization M\cdot O=O\cdot H) might be out of the question, so instead one works with only the leftmost m columns, where m\ll N. Then in the partial factorization

    \[M\cdot O_m=O_{m+1}\cdot H_m\]

the O_m matrix is N\times m, of the form O_m=(\mathbf{v}_1, \, \mathbf{v}_2, \, \cdots, \, \mathbf{v}_m) where each vector \mathbf{v}_i is an N-vector (after all, on the left M acts on N-vectors as a transformation). On the right-hand side H_m is also non-square: it is the (m+1)\times m upper-left corner of the full H matrix. The whole equation looks like this:

    \[M\cdot (\mathbf{v}_1, \, \mathbf{v}_2, \, \cdots, \, \mathbf{v}_m)=(\mathbf{v}_1, \, \mathbf{v}_2, \, \cdots, \, \mathbf{v}_m, \, \mathbf{v}_{m+1})\cdot \left(\begin{array}{cccc} H_{1,1} & H_{1,2} & \cdots & H_{1,m}\\  H_{2,1} & H_{2,2} & \cdots & H_{2,m}\\ 0 & H_{3,2} & & \vdots\\ \vdots & & \ddots & \\ 0 & \cdots & 0 & H_{m+1,m}\end{array}\right)\]

Solve this for the vector \mathbf{v}_{m+1}:

    \[\mathbf{v}_{m+1}=\Big(M\cdot \mathbf{v}_{m}-H_{1,m}\, \mathbf{v}_{1}-\cdots-H_{m,m}\, \mathbf{v}_{m}\Big)/H_{m+1,m}\]

Does this look familiar? It says that \mathbf{v}_{m+1} is M\cdot\mathbf{v}_{m} subjected to Gram-Schmidt: made perpendicular to all \mathbf{v}_{k} for k<m+1 and then normalized, because these are columns of an orthogonal or unitary matrix O. This is the Arnoldi iteration: we simply Gram-Schmidt the Krylov vectors, starting from some initial \mathbf{v}_1 which we can choose arbitrarily. The whole process is inexpensive because we are working with vectors, and a relatively small collection of them since m\ll N. It looks very much like QR decomposition, with Q\rightleftharpoons O and R\rightleftharpoons H, except that the overall cost is lower.

Let’s do this in julia…

M=[1 2 3 4 5;2 3 4 5 6;3 4 5 6 7;4 5 6 7 8;5 6 7 8 9]
5×5 Matrix{Int64}:
 1  2  3  4  5
 2  3  4  5  6
 3  4  5  6  7
 4  5  6  7  8
 5  6  7  8  9


m=3                      # build a 3-dimensional Krylov subspace
N=size(M,1)
O=zeros(N,m+1)           # columns will hold the orthogonalized Krylov vectors
H=zeros(m+1,m)           # the (m+1) x m upper-Hessenberg matrix
O[:,1]=ones(N)           # starting vector v1 (this unnormalized choice reproduces the H shown below)

for j in 1:m
    tmp=M*O[:,j]                 # next Krylov vector
    for k in 1:j
        H[k,j]=(O[:,k])'*tmp     # Gram-Schmidt coefficient
        tmp=tmp-O[:,k]*H[k,j]    # remove the component along O[:,k]
    end
    H[j+1,j]=sqrt(tmp'*tmp)      # length of what is left over
    O[:,j+1]=tmp/H[j+1,j]        # normalize to get the next basis vector
end
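
As a quick sanity check (with the variables from the loop above still in scope), the partial factorization M\cdot O_m=O_{m+1}\cdot H_m can be verified directly; up to roundoff this should print true:

M*O[:,1:m] ≈ O*H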

Arnoldi iteration is the process of computing M^k\cdot\mathbf{v} for some random vector \mathbf{v}, for k=0,1,2,\cdots, m, and orthogonalizing these vectors via Gram-Schmidt (in practice the orthogonalization is interleaved with the matrix-vector products, as in the loop above). So what does this get us?
Note that if we remove the last row of H_m we get a square m\times m Hessenberg matrix, which represents M compressed onto the Krylov subspace.

The eigenvalues of H_m sans row m+1 are excellent approximations to the extremal (largest-magnitude) eigenvalues of M.

Because H_m is quite small compared to M and has the advantage of being Hessenberg, application of the QR algorithm to it converges rapidly. For example, the eigenvalues of the matrix M used in the code above are 26.861, -1.8614, 0, 0, 0. Trim away the last row of H for our m=3 example:

HM=H[1:3,:]
3×3 Matrix{Float64}:
 125.0    -54.6472    -19.715
 224.165  -98.01      -35.2146
   0.0      0.140718   -1.99005

using LinearAlgebra
eigvals(HM)
3-element Vector{Float64}:
 -1.8614066163451404
  1.171336260510704e-13
 26.86140661634499

eigvals(M)
5-element Vector{Float64}:
 -1.8614066163450715
 -1.4886697450382702e-15
  9.132579802232772e-18
  2.1028713993215595e-16
 26.86140661634507

and we have excellent agreement with the two largest-magnitude eigenvalues of M. These estimates are called the Ritz values. The Ritz values converge rather quickly (as m increases) to the extremal eigenvalues of M, provided that these eigenvalues are isolated, in the sense that they are not too close to the rest of the spectrum. Hessenberg matrices, and especially tridiagonal matrices, are very easy to do eigenvalue extractions on with the QR algorithm; only a few cycles are needed. A standard preliminary step in the eigenvalue business is to reduce a matrix to Hessenberg or tridiagonal form and then turn it over to QR. The Arnoldi iteration performs this preliminary step very efficiently if one is only interested in extremal eigenvalues.
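
To see this convergence on a less degenerate example, the loop above can be packaged into a function and applied to a larger matrix. The sketch below is only illustrative: the helper name arnoldi, the size 500, and m=30 are arbitrary choices, and here the starting vector is normalized. The extreme Ritz values should track the extreme eigenvalues of the random symmetric test matrix, with the agreement improving as m grows and being best when the extreme eigenvalues are well separated from the rest of the spectrum.

using LinearAlgebra

# Helper packaging the loop above: m Arnoldi steps starting from v1
# (normalized here), returning O (N x (m+1)) and the (m+1) x m Hessenberg H.
function arnoldi(M, v1, m)
    N = length(v1)
    O = zeros(N, m+1)
    H = zeros(m+1, m)
    O[:,1] = v1/sqrt(v1'*v1)
    for j in 1:m
        tmp = M*O[:,j]
        for k in 1:j
            H[k,j] = O[:,k]'*tmp
            tmp -= O[:,k]*H[k,j]
        end
        H[j+1,j] = sqrt(tmp'*tmp)
        O[:,j+1] = tmp/H[j+1,j]
    end
    return O, H
end

Nbig = 500
A = randn(Nbig, Nbig); A = A + A'          # random symmetric test matrix
_, Ha = arnoldi(A, randn(Nbig), 30)        # m = 30, far smaller than Nbig
ritz = eigvals(Ha[1:30, 1:30])             # Ritz values from the square part of Ha
extrema(real(ritz)), extrema(eigvals(Symmetric(A)))   # compare extreme Ritz values with true extremes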
