
The rate of convergence of the GMRES method

H.A. van der Vorst
Mathematical Institute, University of Utrecht
Budapestlaan 6, Utrecht, the Netherlands

and

C. Vuik
Department of Technical Mathematics and Computer Science
Delft University of Technology
Julianalaan 132, Delft, the Netherlands

April 18, 1991

Abstract

The GMRES method is a well-known iterative method for the solution of sparse linear systems with a non-symmetric matrix A. In this paper we discuss a relation between the GMRES method and the so-called FOM method. Subsequently, we show that the Ritz values are the roots of a polynomial which is implicitly generated by the FOM method. In combination with the relation between FOM and GMRES this gives insight into the convergence of the GMRES method in dependence of the spectrum of A.


UNIVERSITY UTRECHT

THE RATE OF CONVERGENCE OF THE GMRES METHOD

by H.A. van der Vorst and C. Vuik

Budapestlaan 6, 3506 TA Utrecht

DEPARTMENT OF MATHEMATICS
PREPRINT NR. 654
April 1991


Note that if A is symmetric then FOM is equivalent to CG (compare [6]: relation (2.3)). If H_k is nonsingular then it is easy to show that z_k^F = V_k y_k^F, where H_k y_k^F = ||r_0||_2 e_1 and e_1 is the first unit vector in R^k. However, when H_k is singular, it can be proved that a solution x_k^F does not exist. In Section 2 we shall show that this deficiency of the FOM method can be repaired rather easily.

In GMRES the approximate solution x_k^G = x_0 + z_k^G with z_k^G ∈ K^k(A; r_0) is such that

||r_k^G||_2 = ||b − A x_k^G||_2 = min_{z ∈ K^k(A; r_0)} ||r_0 − A z||_2 .   (2)

As a consequence of (2) we have that r_k^G is orthogonal to A K^k(A; r_0), or r_k^G ⊥ K^k(A; A r_0). If A is symmetric then GMRES is equivalent to MINRES [3].

Defining the matrix H̄_k ∈ R^{(k+1)×k} as

H̄_k = (       H_k        )
       ( 0 ··· 0  h_{k+1,k} )

it follows that A V_k = V_{k+1} H̄_k. Using this equation it is shown in ([5]: Section 3.1) that x_k^G = x_0 + V_k y_k^G, where y_k^G solves the following least squares problem

||β e_1 − H̄_k y_k^G||_2 = min_{y ∈ R^k} ||β e_1 − H̄_k y||_2 ,   (3)

with β = ||r_0||_2 and e_1 the first unit vector in R^{k+1}. In contrast with FOM, the approximations x_k^G obtained with GMRES exist for all k ([5]: Section 3.4).

The Ritz values and the Ritz vectors are defined as follows (compare [6]: Section 2.3 and [1]: p. 274):

Definition 1 The Ritz values θ_1^{(k)}, ..., θ_k^{(k)} are the eigenvalues of the matrix H_k = V_k^T A V_k (note that V_k^T V_k = I_k). If y_i^{(k)} ∈ R^k is a normalized eigenvector of H_k corresponding to θ_i^{(k)}, then the vector z_i^{(k)} = V_k y_i^{(k)} is called a Ritz vector.

Definition 2 For all the iterative methods to be discussed, it follows that the residual r_k at the k-th iteration step is a member of K^{k+1}(A; r_0), and hence it can be written as a k-th degree polynomial in A, acting on r_0. In connection with this we will speak about the polynomial for method M as the "M polynomial".

2 A relation between the FOM and the GMRES residuals

In this section we give an important relation between ||r_k^F||_2 and ||r_k^G||_2, which is used in Section 4 to derive a relation between the convergence of these iterative methods and the convergence of the Ritz values. Subsequently, we will show that the singularity problems encountered in FOM need not be serious. The derived relation will also allow for a meaningful comparison between two classes of iterative methods, one equivalent to FOM and the other equivalent to GMRES.


Introduction

In this paper we study the convergence behaviour of the GMRES method. Our main motivation for this study is formed by the results reported in [2], which suggest a relation between the convergence of the GMRES iterates and the convergence of the Ritz values to the eigenvalues of the matrix. Inspiration for the proof of this relation comes from [6].

In Section 1 we define the GMRES method together with the related Arnoldi and Full Orthogonalization Method (FOM) and summarize some relevant properties of these methods. We conclude this section with the definition of Ritz values and Ritz vectors. In Section 2 we prove a, for our purpose, crucial relation between the FOM and the GMRES residuals and state some consequences of this relation. In Section 3 we will show that the Ritz values are the roots of the "FOM polynomial" but not necessarily the roots of the "GMRES polynomial". In Section 4, the results proved in Sections 2 and 3 are shown to lead to a relation between the convergence of GMRES and FOM, and the convergence of the Ritz values. The results of that section nicely agree with the observations reported in [2]. Finally, in Section 5, we describe some numerical experiments, which illustrate our theory.

1 Definitions

In this section we describe the FOM and the GMRES method [5], which are iterative methods for solving linear systems with a non-symmetric matrix. Subsequently, we summarize some relevant properties of the methods. Finally, we define Ritz values and Ritz vectors.

Consider the linear system Ax = b with x, b ∈ R^n and with a nonsingular A ∈ R^{n×n}. The Krylov subspace K^k(A; r_0) is defined by K^k = span{r_0, A r_0, ..., A^{k-1} r_0}. In both methods, FOM and GMRES, Arnoldi's method is used for computing an orthonormal basis {v_1, ..., v_k} for the Krylov subspace K^k(A; r_0). The modified Gram-Schmidt version of Arnoldi's method can be described as follows ([5]: p. 857 or [7]: p. 5):

1. Start: Choose x_0 and compute r_0 = b − A x_0 and v_1 = r_0/||r_0||_2.

2. Iterate: For j = 1, ..., k do:

   v_{j+1} := A v_j ,

   for i = 1, ..., j do:

      h_{ij} := v_{j+1}^T v_i , v_{j+1} := v_{j+1} − h_{ij} v_i ,

   h_{j+1,j} := ||v_{j+1}||_2 , v_{j+1} := v_{j+1}/h_{j+1,j}

(the non-defined h_{ij} are assumed to be zero).

With the n × k matrix V_k = [v_1, ..., v_k] we have that H_k = V_k^T A V_k is the upper k × k Hessenberg matrix whose entries are the scalars h_{ij}.
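The Arnoldi loop above can be sketched in NumPy as follows. This is our own illustration, not code from the paper; the function name is hypothetical and we assume no breakdown occurs (h_{j+1,j} ≠ 0 for all j).

```python
import numpy as np

def arnoldi(A, r0, k):
    """Modified Gram-Schmidt Arnoldi: returns V with k+1 orthonormal
    columns spanning K^{k+1}(A; r0) and the (k+1) x k upper Hessenberg
    matrix Hbar satisfying A V[:, :k] = V Hbar."""
    n = len(r0)
    V = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    V[:, 0] = r0 / np.linalg.norm(r0)
    for j in range(k):
        w = A @ V[:, j]
        for i in range(j + 1):            # orthogonalize against v_1, ..., v_{j+1}
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)   # assumed nonzero (no breakdown)
        V[:, j + 1] = w / H[j + 1, j]
    return V, H
```

The leading k × k part of the returned matrix is H_k = V_k^T A V_k from the text.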

In the FOM method we construct an approximate solution x_k^F of the form x_k^F = x_0 + z_k^F, where z_k^F is an element of K^k(A; r_0) with the following property

r_k^F = b − A x_k^F ⊥ K^k(A; r_0) .   (1)


which (Q_k β e_1)_{k+1} denotes the (k+1)-st component of Q_k β e_1. Likewise, ||r_{k-1}^G||_2 = |(Q_{k-1} β e_1)_k|. With the definition of Q_k and β = ||r_0||_2 we obtain ||r_k^G||_2 = |s_k| ··· |s_1| ||r_0||_2, where the s_i are given by (5). □
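The Givens recurrence behind Lemma 1 is easy to check numerically. The sketch below is our own code, not the paper's: it uses the sign convention c = p/d, s = h/d (differing from (5) in the sign of s, which does not affect the norms |c_j|, |s_j|), and compares the resulting residual norm with a dense least squares solve.

```python
import numpy as np

def gmres_resnorms_givens(Hbar, beta):
    """Residual norms ||r_j^G||_2, j = 1..k, obtained by applying Givens
    rotations to the (k+1) x k Hessenberg matrix Hbar and the right-hand
    side beta*e1; after step j the (j+1)-st entry of the transformed
    right-hand side is +/- the GMRES residual norm."""
    k = Hbar.shape[1]
    R = Hbar.astype(float).copy()
    g = np.zeros(k + 1); g[0] = beta
    norms = []
    for j in range(k):
        p, h = R[j, j], R[j + 1, j]
        d = np.hypot(p, h)
        c, s = p / d, h / d
        # rotate rows j and j+1; this zeroes R[j+1, j]
        Rj, Rj1 = R[j, :].copy(), R[j + 1, :].copy()
        R[j, :], R[j + 1, :] = c * Rj + s * Rj1, -s * Rj + c * Rj1
        g[j], g[j + 1] = c * g[j] + s * g[j + 1], -s * g[j] + c * g[j + 1]
        norms.append(abs(g[j + 1]))   # = |s_j| * previous norm, as in Lemma 1
    return norms
```

The list of norms is nonincreasing, since each step multiplies the previous norm by |s_j| ≤ 1.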

For the FOM method we prove the following result:

Lemma 2 If c_k ≠ 0 then the FOM residual satisfies the following equation

||r_k^F||_2 = |s_k| ··· |s_1| ||r_0||_2 / |c_k| ,   (7)

where s_i and c_i are defined in (5).

Proof We start with the relation ||r_k^F||_2 = h_{k+1,k} |(y_k^F)_k| (see [5]: p. 858, (2)). To obtain (y_k^F)_k we note that H_k y_k^F = β e_1. Multiplication with Q_{k-1} gives

( R_{k-1}   * ) y_k^F = Q_{k-1} β e_1 .
(    0     p_k )

Since R_{k-1} ∈ R^{(k-1)×(k-1)} is upper triangular, the last equation reads p_k (y_k^F)_k = s_{k-1} ··· s_1 ||r_0||_2. It follows from (5) and c_k ≠ 0 that p_k ≠ 0, so ||r_k^F||_2 = |s_{k-1}| ··· |s_1| ||r_0||_2 h_{k+1,k} / |p_k|.

Using (5) again we obtain ||r_k^F||_2 = |s_k| ··· |s_1| ||r_0||_2 / |c_k|, which proves the lemma. □

Corollary 1 If c_k ≠ 0 then the FOM and the GMRES residual satisfy the following equation

||r_k^G||_2 = |c_k| ||r_k^F||_2 .   (8)

Remark Note that the relation in Corollary 1 is similar to the relation between the CG and the MINRES residuals ([3]: p. 626, (7.5)).

Lemma 2 and Corollary 1 have the following consequences:

- It follows from the proof of Lemma 2 that y_k^F exists if and only if c_k ≠ 0. However, if c_j = 0 for a certain j then the method can be continued, because the computation of H_k and V_k can be continued, and as soon as c_k ≠ 0 for k > j, y_k^F and thus x_k^F can be constructed. In other words, not all intermediate x_k^F need to be defined or even to exist, but if one takes care of such situations then an unnecessary breakdown of the method can be avoided. When y_k^F can be formed then the numerical stability of FOM is about the same as the numerical stability of GMRES.

- Comparing the GMRES and the FOM residual we first note that ||r_k^G||_2 ≤ ||r_k^F||_2 for all k. However, if the reduction of ||r_k^G||_2 is considerable in step k of GMRES, which means that |s_k| ≪ 1, then ||r_k^G||_2 and ||r_k^F||_2 are about the same, whereas in the special case that x_k^G = x it follows that s_k = 0, which implies x_k^F = x. Note that, although ||r_k^G||_2 ≤ ||r_k^F||_2 for every k, it might be attractive to compute x_k^F (note that the computational cost for x_k^F from the GMRES results is negligible) because it may be that ||x − x_k^F||_2 < ||x − x_k^G||_2, which is the case when A is symmetric and positive definite.
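Lemma 1, Lemma 2 and Corollary 1 can be verified together on a small example: with c := ||r_k^G||_2 / ||r_k^F||_2 and s := ||r_k^G||_2 / ||r_{k-1}^G||_2 one must find c^2 + s^2 = 1. The sketch below is our own (helper names hypothetical); it computes both residual norms from the projected Hessenberg problems and assumes that H_k is nonsingular and that Arnoldi does not break down.

```python
import numpy as np

def arnoldi(A, r0, k):
    # modified Gram-Schmidt Arnoldi: orthonormal V and (k+1) x k Hbar
    n = len(r0)
    V = np.zeros((n, k + 1)); H = np.zeros((k + 1, k))
    V[:, 0] = r0 / np.linalg.norm(r0)
    for j in range(k):
        w = A @ V[:, j]
        for i in range(j + 1):
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]
    return V, H

def fom_gmres_resnorms(A, b, k):
    # residual norms of the k-th GMRES and FOM iterates (x_0 = 0)
    r0 = b
    beta = np.linalg.norm(r0)
    V, Hbar = arnoldi(A, r0, k)
    e1 = np.zeros(k + 1); e1[0] = beta
    y_g, *_ = np.linalg.lstsq(Hbar, e1, rcond=None)   # GMRES least squares (3)
    rg = np.linalg.norm(e1 - Hbar @ y_g)
    y_f = np.linalg.solve(Hbar[:k, :], e1[:k])        # FOM: H_k y = beta e1
    rf = Hbar[k, k - 1] * abs(y_f[-1])                # ||r_k^F||_2 = h_{k+1,k} |(y_k^F)_k|
    return rg, rf
```

On a random well-conditioned test matrix the identity c^2 + s^2 = 1 holds to machine precision.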


In ([3]: p. 626, relation (7.5)) it is shown, for the residuals in the method MINRES, that ||r_k^M||_2 = |c_k| ||r_k^C||_2, where |s_k| = ||r_k^M||_2 / ||r_{k-1}^M||_2 and s_k^2 + c_k^2 = 1.

This relation between r_k^M and r_k^C, and the equivalence of FOM with CG and GMRES with MINRES when A is symmetric, suggests a similar relation between r_k^F and r_k^G for non-symmetric A.

For the derivation of this relation we will need the following lemma [5]. We include the proof of this lemma because it plays a role in the remainder of this section.

Lemma 1 ([5]: p. 862, Proposition 1) The GMRES residual satisfies the following equation

||r_k^G||_2 = |s_k| ··· |s_1| ||r_0||_2 ,   (4)

where the s_i are defined in (5) below.

Proof To solve the least squares problem (3) we factorize H̄_k into Q_k^T R_k using Givens rotations, where Q_k ∈ R^{(k+1)×(k+1)}, Q_k^T Q_k = I_{k+1}, and R_k ∈ R^{(k+1)×k} is an upper triangular matrix. The matrix Q_k is taken as Q_k = F_k ··· F_1, where the matrix F_j ∈ R^{(k+1)×(k+1)} is the following Givens rotation

F_j = ( I_{j-1}              )
      (         c_j  −s_j    )
      (         s_j   c_j    )
      (               I_{k-j} )

which eliminates the element in the (j+1, j) position of F_{j-1} F_{j-2} ··· F_1 H̄_k.

The product

F_{k-1} ··· F_1 H̄_k = ( *  ···  ···  *         )
                       (     *        ⋮         )
                       (         ⋱    ⋮         )
                       ( 0   ···  0   p_k       )
                       ( 0   ···  0   h_{k+1,k} )

(where an asterisk stands for a non-zero element) implies that c_k and s_k should be chosen as follows:

c_k = p_k / (p_k^2 + h_{k+1,k}^2)^{1/2}  and  s_k = −h_{k+1,k} / (p_k^2 + h_{k+1,k}^2)^{1/2} .   (5)

Using this factorization the least squares problem (3) is equivalent to

min_{y ∈ R^k} ||β e_1 − H̄_k y||_2 = min_{y ∈ R^k} ||Q_k β e_1 − R_k y||_2 .   (6)

Since the last row of R_k is zero, y_k^G is the solution of the linear system with the leading k × k submatrix of R_k as matrix and the first k components of Q_k β e_1 as right-hand side. This combined with the equation ||r_k^G||_2 = ||β e_1 − H̄_k y_k^G||_2 implies that ||r_k^G||_2 = |(Q_k β e_1)_{k+1}|, in


with an induction argument in i. It is easily seen that (11) holds for i = 0 and i = 1. Starting from the right-hand side of (11) it follows by induction that

H_k^{i+1} e_1 = V_k^T A V_k H_k^i e_1 = V_k^T A V_k V_k^T A^i V_k e_1 .

Since V_k V_k^T is a projection onto K^k(A; r_0) and A^i V_k e_1 ∈ K^k(A; r_0) for i ≤ k − 1, we obtain that

V_k V_k^T A^i V_k e_1 = A^i V_k e_1 , which implies H_k^{i+1} e_1 = V_k^T A^{i+1} V_k e_1 .

From V_k^T p_k^F(A) V_k e_1 = 0 and (11) it follows that p_k^F(H_k) e_1 = 0. Multiplication with H_k^i, i = 0, ..., k−1, gives that H_k^i p_k^F(H_k) e_1 = 0 and so p_k^F(H_k) H_k^i e_1 = 0 for i = 0, ..., k−1. Since h_{i+1,i} ≠ 0 for i = 1, ..., k−1, the space span{e_1, H_k e_1, ..., H_k^{k-1} e_1} is equal to R^k. This implies that p_k^F(H_k) = 0. □

The geometric multiplicity of the Ritz values is given in the following lemma:

Lemma 4 If ||r_k^F|| ≠ 0 then the geometric multiplicity of θ_i^{(k)} is 1, for i = 1, ..., k.

Proof: Suppose that there are two independent eigenvectors y_1^{(k)}, y_2^{(k)} ∈ R^k of H_k associated with θ^{(k)}. Then define v ∈ R^k as the linear combination v = α_1 y_1^{(k)} + α_2 y_2^{(k)}, where α_1, α_2 ∈ R are such that the last component of v is 0 and α_1 ≠ 0 or α_2 ≠ 0. With H_k v = θ^{(k)} v and the fact that h_{i+1,i} ≠ 0, i = 1, ..., k−1, it follows that v = 0, which contradicts the independence of y_1^{(k)} and y_2^{(k)}. □

The main result of this section is stated below:

Theorem 1 Suppose that θ_i^{(k)}, i = 1, ..., k, are the Ritz values, i.e., the eigenvalues of H_k. If c_k ≠ 0, where c_k is defined in (5), then θ_i^{(k)} ≠ 0, i = 1, ..., k, and

p_k^F(t) = (θ_1^{(k)} − t) ··· (θ_k^{(k)} − t) / (θ_1^{(k)} ··· θ_k^{(k)}) .   (12)

Proof: Note that if c_k ≠ 0 then H_k is the product of two nonsingular matrices, thus H_k is nonsingular and p_k^F exists. Since the geometric multiplicity of θ_i^{(k)} is equal to one, it is known that the minimal polynomial of H_k equals a scalar times the characteristic polynomial of H_k. This together with p_k^F(H_k) = 0 and p_k^F ∈ Π_k^1 gives the desired result. □

Note that Theorem 1 is similar to ([6]: Property 2.8).
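Theorem 1 can be illustrated numerically: since the Ritz values θ_i^{(k)} are the roots of p_k^F and p_k^F(0) = 1, the FOM residual must satisfy r_k^F = ∏_i (I − A/θ_i^{(k)}) r_0. The sketch below is our own check (x_0 = 0 and H_k nonsingular assumed; complex arithmetic handles conjugate Ritz pairs).

```python
import numpy as np

def arnoldi(A, r0, k):
    # modified Gram-Schmidt Arnoldi basis and (k+1) x k Hessenberg matrix
    n = len(r0)
    V = np.zeros((n, k + 1)); H = np.zeros((k + 1, k))
    V[:, 0] = r0 / np.linalg.norm(r0)
    for j in range(k):
        w = A @ V[:, j]
        for i in range(j + 1):
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]
    return V, H

rng = np.random.default_rng(2)
n, k = 8, 4
A = 2 * np.eye(n) + 0.3 * rng.standard_normal((n, n))
b = rng.standard_normal(n)
r0 = b                                    # x_0 = 0
V, Hbar = arnoldi(A, r0, k)
beta = np.linalg.norm(r0)

# FOM residual r_k^F = b - A x_k^F with H_k y = beta e1
rhs = np.zeros(k); rhs[0] = beta
y = np.linalg.solve(Hbar[:k, :], rhs)
rF = b - A @ (V[:, :k] @ y)

# apply the product polynomial with the Ritz values as roots to r_0
theta = np.linalg.eigvals(Hbar[:k, :])
v = r0.astype(complex)
for th in theta:
    v = v - (A @ v) / th                  # v <- (I - A/theta_i) v
```

The vector v should be real up to rounding and should coincide with the directly computed FOM residual rF.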

4 The convergence of FOM and GMRES

In this section we shall prove some relations between the Ritz values and the convergence of the FOM and the GMRES method. Our main motivation to look for such a relation comes from the observations reported in [2]. For instance, it follows from ([2]: p. 18) that if a Ritz value converges to an eigenvalue then the convergence rate of the GMRES iterates improves. This resembles the behaviour of the CG method with respect to the convergence of the Ritz values, as has been reported in [6].


- The relation in Corollary 1 implies that a comparison between the convergence of the class of methods equivalent to FOM (FOM, ORTHORES) and that equivalent to GMRES (GMRES, ORTHODIR, GCR, Axelsson's method) can be made. These classes are identified in ([5]: p. 858, 859).

- Finally, Corollary 1 will be used for deriving the relation between the Ritz values and the convergence of the FOM and the GMRES iterands.

3 A relation between the Ritz values and the FOM method

For the characterization of polynomials as, e.g., "CG polynomial", see Definition 2.

In ([6]: Theorem 5.1) a relation between the Ritz values and the convergence of the CG method is proved for A symmetric and positive definite. To prove this they use the property that the Ritz values are the roots of the "CG polynomial" ([6]: Property 2.8). We use a similar property to prove a relation between the Ritz values and the convergence of FOM and GMRES in the case that A is unsymmetric. This section starts with a counter example, which shows that the Ritz values are not necessarily the roots of the "GMRES polynomial". Subsequently, we prove that the Ritz values are the roots of the "FOM polynomial".

It is easy to verify that

r_k^F = p_k^F(A) r_0  and  r_k^G = p_k^G(A) r_0 , with p_k^F, p_k^G ∈ Π_k^1 ,   (9)

where Π_k^1 denotes the class of polynomials of degree at most k and constant term 1. Note that p_k^F only exists if c_k ≠ 0, where c_k is defined in (5).

In this paragraph we take A symmetric and positive definite. Since FOM is equivalent to CG it follows from ([6]: Property 2.8) that the roots of p_k^F are the Ritz values θ_1^{(k)}, ..., θ_k^{(k)}, as described in Definition 1. Suppose that the Ritz values are also the roots of p_k^G; then we obtain r_k^F = r_k^G, so by Corollary 1 |c_k| = 1 and, since c_k^2 + s_k^2 = 1, s_k = 0. Lemma 1 then leads to ||r_k^G||_2 = ||r_k^F||_2 = 0. Hence, this counter example shows that the Ritz values cannot be the roots of p_k^G when s_k ≠ 0.

The equivalence between CG and FOM and ([6]: Property 2.8) suggests a relation between the Ritz values and the FOM polynomial. In the remainder of this section we assume that ||r_k^F||_2 ≠ 0, which implies that h_{i+1,i} ≠ 0 for i = 1, ..., k−1 (use (4) and (5)).

The following lemma says that the projected matrix H_k is a root of the FOM polynomial:

Lemma 3 If c_k, defined in (5), is not zero, then the FOM polynomial is such that

p_k^F(H_k) = 0 .   (10)

Proof: Using (1), Lemma 2 and (9) it follows that p_k^F(A) r_0 ⊥ K^k(A; r_0).

This implies that V_k^T p_k^F(A) V_k e_1 = 0, where e_1 is the first unit vector in R^k. Now we show that

H_k^i e_1 = V_k^T A^i V_k e_1 , i = 0, ..., k ,   (11)


Definition 4 Let r_k^G (or r_k^F) denote the residual in the k-th step of GMRES (or FOM) applied to Ax = b, with starting residual r_0 = Xγ, i.e., r_k^G = p_k^G(A) r_0 (or r_k^F = p_k^F(A) r_0).

For k such that H_k is nonsingular we define a comparison GMRES (or FOM) process which starts with r̄_0 = p_k^F(A) X γ̄, where γ̄_j = 0 for j ≤ l and γ̄_j = γ_j for j > l. I.e., the comparison processes start with the k-th residual of FOM, after having deleted the components corresponding to λ_1. The residual at the k_1-th step of the comparison GMRES (or FOM) process is denoted by r̄_{k_1}^G (or r̄_{k_1}^F).

Theorem 2 Let k be such that H_k is nonsingular; then for all k_1 ≥ 0 we have that

||r_{k+k_1}^G||_2 ≤ ||X D^{(1)} X^{-1}||_2 ||r̄_{k_1}^G||_2 .

Proof: For the comparison process we have the relation

r̄_{k_1}^G = q̄_{k_1}(A) p_k^F(A) X γ̄ , with q̄_{k_1} ∈ Π_{k_1}^1 .

We can write r̄_{k_1}^G with respect to X as

r̄_{k_1}^G = X q̄_{k_1}(J) p_k^F(J) γ̄ .

Now we define the auxiliary polynomial h_k ∈ Π_k^1 as in (13)

(note that p_k^F is defined since H_k is nonsingular).

It follows from the optimality property of GMRES that

||r_{k+k_1}^G||_2 ≤ ||q̄_{k_1}(A) h_k(A) r_0||_2
              = ||X D^{(1)} X^{-1} X D^{(2)} p_k^F(J) γ̄||_2
              ≤ ||X D^{(1)} X^{-1}||_2 ||r̄_{k_1}^G||_2 ,   (14)

which completes the proof. □

With Corollary 1 we have immediately:

Corollary 2 If k and k_1 are such that H_k, H_{k+k_1} and H̄_{k_1} are nonsingular, then

||r_{k+k_1}^F||_2 ≤ (|c̄_{k_1}| / |c_{k+k_1}|) ||X D^{(1)} X^{-1}||_2 ||r̄_{k_1}^F||_2 .

We obtain the following bound for ||X D^{(1)} X^{-1}||_2:

||X D^{(1)} X^{-1}||_2 ≤ κ_2(X) ||D^{(1)}||_2 .   (15)
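The bound above on ||X D^{(1)} X^{-1}||_2 follows directly from the submultiplicativity of the spectral norm; in LaTeX form:

```latex
% submultiplicativity of \|\cdot\|_2, with \kappa_2(X) := \|X\|_2 \|X^{-1}\|_2
\| X D^{(1)} X^{-1} \|_2
  \;\le\; \|X\|_2 \, \|D^{(1)}\|_2 \, \|X^{-1}\|_2
  \;=\; \kappa_2(X) \, \|D^{(1)}\|_2 .
```

For symmetric A the matrix X can be taken orthogonal, so that κ_2(X) = 1 and this factor disappears, as in the CG case.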


is symmetric and positive definite. In that proof two important properties of the CG method are exploited: an optimality property and the property that the Ritz values are the roots of the "CG polynomial". However, when A is non-symmetric neither the FOM method nor the GMRES method has both properties. This has led us to base our proofs on the optimality property of GMRES (see (2)) together with the fact that the Ritz values are the roots of the FOM polynomial (see Theorem 1).

For each matrix A there exists a nonsingular n × n matrix X which reduces the matrix A to its Jordan normal form, i.e.

X^{-1} A X = J = ( J_1             )
                 (     J_2         )
                 (         ⋱       )
                 (           J_m   )

where each of the n_j × n_j submatrices J_j has the form

J_j = ( λ_j  1           )
      (      λ_j  ⋱      )
      (           ⋱   1  )
      (              λ_j )

We assume that the J_j have been ordered so that if λ_s = λ_t for s > t then λ_i = λ_t for all t ≤ i ≤ s, i.e., blocks with equal λ have been grouped together.

Furthermore we assume that if λ_t = λ_{t+1} then n_t ≥ n_{t+1}, i.e., blocks with equal λ have been ordered in descending size. Note that these orderings can be realized by simple symmetric permutations to A.

Occasionally we will write a vector with respect to the basis represented by the columns of X, and this representation will be partitioned correspondingly to the partition of J; in particular r_0 = X γ, with γ = (γ_1, ..., γ_m)^T.

Definition 3 If l is such that λ_1 = λ_l and λ_{l+1} ≠ λ_l, then the blockdiagonal matrix D^{(1)}, with n_j × n_j blocks along its diagonal corresponding to the blocksizes of J, is defined by

D_j^{(1)} = 0 , if j ≤ l ,  and  D_j^{(1)} = h_k(J_j) (p_k^F(J_j))^{-1} , if j > l .

Furthermore we define a blockdiagonal matrix D^{(2)} with a blockstructure corresponding to J as D_j^{(2)} = q_{k_1}(J_j), for j = 1, ..., m, with q_{k_1} ∈ Π_{k_1}^1.

It is readily verified that D^{(1)} and D^{(2)} commute.

Theorem 2 relates the residuals in a certain phase of GMRES to the residuals of a comparison process in which the influence of λ_1 has been removed. This comparison process is defined as follows.


For existence and uniqueness of this polynomial we refer to ([4]: p. 115). Since q_l h_k ∈ Π_{k+l}^1, we obtain

||r_{k+l}^G||_2 ≤ ||q_l(A) h_k(A) r_0||_2 = ||X D^{(1)} D^{(2)} X^{-1} X diag(p_k^F(λ_1), ..., p_k^F(λ_n)) γ||_2 ≤ κ_2(X) F_k ε^{(l)} ||r_k^F||_2 ,

which proves the theorem. □

Theorem 3 can be generalized as follows (compare [6]: Theorem 5.1):

Theorem 4 Let A be diagonalizable and let k be such that H_k is nonsingular. Let Λ^{(r)} denote a set of r different eigenvalues of A, and Λ^c = {λ_i | λ_i ≠ λ_j for all λ_j ∈ Λ^{(r)}}. ε^{(r)} is defined as ε^{(r)} = min_{q ∈ Π_l^1} max_{λ_i ∈ Λ^c} |q(λ_i)|, and F_{k,r} is defined analogously to F_k in (16), with the eigenvalues in Λ^{(r)} deleted. Then

||r_{k+l}^G||_2 ≤ F_{k,r} κ_2(X) ε^{(r)} ||r_k^F||_2 .

Proof: The proof follows the lines set forth by the proof of Theorem 3. The only differences are in the choice of h_k and q_l. Here they are chosen as

h_k(t) = ∏_{λ_j ∈ Λ^{(r)}} ( (λ_j − t)/λ_j · θ_j^{(k)}/(θ_j^{(k)} − t) ) p_k^F(t) ,  and q_l ∈ Π_l^1 is such that ε^{(r)} = max_{λ_i ∈ Λ^c} |q_l(λ_i)| .

By Corollary 1 we obtain the following result for FOM:

Corollary 3 Let k and l be such that H_k and H_{k+l} are nonsingular, then

||r_{k+l}^F||_2 ≤ F_k κ_2(X) ε^{(l)} ||r_k^F||_2 / |c_{k+l}| .

5 Numerical experiments

In this section we give some numerical experiments to illustrate the theory. First of all we compare the FOM and the GMRES method. This comparison shows a good agreement with the theory given in Section 2. Subsequently, an example is given where the GMRES method is superlinearly convergent. The theory of Section 4 is used to understand this convergence behaviour. Finally, we state an example where the Ritz values do not converge. In this example the GMRES method is only linearly convergent, which corresponds with the results given in



In the derivation of this bound we have used that (αI − J_j)^{-1} and (βI − J_j) commute, and, furthermore, a bound for ||(αI − J_j)^{-1}||_2 (see [1]).

When A is diagonalizable, i.e., n_i = 1 for all i, then this rather unpleasant expression reduces to

||X D^{(1)} X^{-1}||_2 ≤ κ_2(X) F_k .   (16)

This expression is quite similar to the expression derived in [6] for the CG process. In [6] the factor κ_2(X) vanishes, since for symmetric A we have that X is orthogonal.

When there is a Jordan block with λ_1 of dimension n_1 > 1 then we have to wait for a phase in the GMRES process in which there are n_1 Ritz values close to λ_1. From then on the factor ||X D^{(1)} X^{-1}||_2 is bounded by κ_2(X) times a modest constant, and we may expect that the method will further converge about as fast as for a problem in which λ_1 is absent. This is in quite good agreement with an experiment in ([2]: p. 22 and Fig. 23).

Note that Theorem 2 gives a relation between GMRES in a certain phase, and a comparison process based upon FOM. Therefore the theorem is only of practical value in situations where |c_k| is not too far from 1, since ||r_k^F||_2 = (1/|c_k|) ||r_k^G||_2, and, in fact, we wish to compare the continued GMRES process with a process starting with a slightly modified GMRES residual. The assumption about |c_k| does not put a too severe restriction on the applicability of the theorem, since it holds as soon as there is a noticeable reduction in the norms of the residuals in GMRES (cf. Lemma 1).

Theorem 2 may also be used in combination with convergence estimates for the GMRES method, and then it is a powerful tool to analyse the actual convergence behaviour of GMRES. Such convergence estimates for the GMRES process are given in ([5]: Section 3.4). Note that, with Corollary 1 and Corollary 2, this procedure can also be followed for the FOM process.

However, straightforward application of Theorem 2 with, for instance, ([5]: p. 866, Proposition 4) leads to a bound for the continued GMRES process which contains the factor κ_2(X)^2 instead of κ_2(X). The following theorem does not have this disadvantage and it also relates the residuals of the continued GMRES process with the residuals of a related GMRES process. For the sake of ease this theorem has been formulated for the situation that all n_j = 1, i.e., that A is diagonalizable. The extension to the Jordan form case is rather straightforward.

Theorem 3 Let A ∈ R^{n×n} be diagonalizable. Let k be such that H_k is nonsingular and let ε^{(l)} be defined as follows

ε^{(l)} = min_{q ∈ Π_l^1} max_{λ_i ≠ λ_1} |q(λ_i)|

(compare [5]: p. 866, Proposition 4); then

||r_{k+l}^G||_2 ≤ F_k κ_2(X) ε^{(l)} ||r_k^F||_2 ,

where F_k has been defined in equation (16).

Proof: We use h_k as in (13) and we define another auxiliary polynomial q_l ∈ Π_l^1 as follows: q_l is the polynomial for which

ε^{(l)} = max_{λ_i ≠ λ_1} |q_l(λ_i)| .


k          13    14    15    16    17    18    19    20
θ_1^(k)    3.26  2.77  2.35  1.99  1.71  1.49  1.34  1.23
F_{k,1}    24    24    7     4     2.6   2.0   1.6   1.4

Table 1: The convergence of the first Ritz value for α = 0

some iterations the second Ritz value converges to λ_3 (see Table 2, where Λ^{(2)} has been taken as

k          20    21    22    23    24    25    26    27    28    29    30
θ_2^(k)    4.19  3.98  3.8   3.66  3.54  3.45  3.38  3.3   3.25  3.2   3.17
F_{k,2}    9     85    8     4     2.8   2.2   1.9   1.6   1.5   1.4   1.3

Table 2: The convergence of the second Ritz value for α = 0

Λ^{(2)} = {λ_1, λ_3}). After k = 23 the process converges as if the eigenvalues λ_1 (as well as λ_2 = λ_1) and λ_3 are absent.

This explains quite well the superlinear convergence behaviour of GMRES as observed from Figure 5. The condition numbers, corresponding to the comparison processes for the respective phases of the GMRES iteration process, are λ_100/λ_1 = 100, λ_100/λ_3 = 100/3, and λ_100/λ_4 = 25, respectively. From the reduction factors, displayed in Figure 5, we see that their decrease is, indeed, much larger near k = 16 than near k = 23.

In Figure 6 we have plotted the reduction factors for GMRES applied to the linear system with α = 0.1 and β = 0.9. In this example we take λ_1 = 1, λ_2 = 1.1 and λ_i = i for i = 3, ..., 100. Until k = 24 the convergence behaviour is virtually the same as for GMRES applied to the system with α = 0 and β = 0.9. From k = 24 to k = 30 we observe only linear convergence. We note that θ_1^{(24)} = 1.107 and θ_1^{(25)} = 1.091, thus from k = 25 the smallest Ritz value is in [λ_1, λ_2]. From k = 31 until k = 36 the reduction factor increases. The second Ritz value is 3.096 for k = 30 and 2.951 for k = 31, so in that phase GMRES "discovers" a second eigenvalue less than λ_3 = 3. From k = 37 the reduction factor decreases again, which is in agreement with the results given in Table 3 (where Λ^{(2)} = {λ_1, λ_2}).

k          30    31    32    33    34    35    36    37    38    39    40
θ_2^(k)    3.03  2.95  2.87  2.77  2.63  2.46  2.24  1.98  1.74  1.54  1.39
F_{k,2}    227   112   40    22    13    8     5.4   3.6   2.5   1.9   1.5

Table 3: The convergence of the second Ritz value for α = 0.1

In Figure 7 we show the GMRES residuals for the choices α = 0 and α = 0.1. It appears that the close eigenvalues λ_1 = 1 and λ_2 = 1.1 for α = 0.1 have a decelerating effect. However


Section 4.

The following problems have been taken from ([2]: p. 16, 17). The matrix is of the form A = S B S^{-1} with A, S, B ∈ R^{100×100}. We have selected S to be equal to

S = ( 1  β           )
    (    1  β        )        and    B = diag(1, 1 + α, 3, 4, ..., 100) .
    (       ⋱   β    )
    (            1   )

The system Ax = b is solved for right-hand sides such that x = (1, ..., 1)^T. The iterative methods start with x_0 = (0, ..., 0)^T.
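The test matrices above are easy to reproduce; the sketch below (our own construction, with hypothetical variable names) builds A = S B S^{-1} for α = 0.1 and β = 0.9 and confirms that the spectrum of A is {1, 1 + α, 3, 4, ..., 100}.

```python
import numpy as np

n, alpha, beta_p = 100, 0.1, 0.9
S = np.eye(n) + beta_p * np.diag(np.ones(n - 1), 1)   # unit upper bidiagonal S
d = np.arange(1, n + 1, dtype=float)                  # 1, 2, 3, ..., 100
d[1] = 1.0 + alpha                                    # second eigenvalue 1 + alpha
B = np.diag(d)
A = S @ B @ np.linalg.inv(S)                          # similarity: spectrum of A = diag of B
```

Since A is similar to the diagonal matrix B, its eigenvalues are exactly the entries of d.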

In our first example we choose α = 0.1 and β = 0.9. In Figure 1 we have plotted the reduction factor ||r_{k+1}^F||_2 / ||r_k^F||_2 for FOM and in Figure 2 the reduction factor for GMRES. Note that the graphs in these figures are almost identical. Both figures show bulges near k = 15 and k = 35.

In Figure 3 the residuals of FOM and GMRES are given. Comparison shows that the FOM residual is always larger than the GMRES residual, which is in agreement with Corollary 1. However, when the GMRES method converges fast, for instance for k ≥ 45, then the difference between both residuals is negligible, which is still in nice agreement with Corollary 1.

At the end of Section 2 we have argued the possibility that the FOM error, ||x − x_k^F||_2, is smaller than the GMRES error ||x − x_k^G||_2. In order to check this we have plotted these errors in Figure 4. Here we see that, for this problem, the FOM error is indeed smaller than the GMRES error. Furthermore, note that a small residual can be rather misleading: e.g., near k = 15, where we observe slow convergence, the FOM residual is much larger than the GMRES residual, but the FOM error is much less than the GMRES error. Thus, for obtaining a small error, FOM may be preferable over GMRES.

We compare the convergence behaviour for the linear system with α = 0 and β = 0.9 in view of Theorem 2. In this example we choose the following numbering: λ_1 = 1, λ_2 = 1, and λ_i = i, i = 3, ..., 100, and the Ritz values, which are real, are numbered such that θ_1^{(k)} ≤ θ_2^{(k)} ≤ ··· ≤ θ_k^{(k)}.

In Figure 5 the reduction factors for GMRES are shown. From k = 16 on, GMRES appears to be superlinearly convergent. For this problem the convergence behaviour can be related to the condition number. Since the eigenvalues, except for the first one and the last one, are equidistantly distributed, the condition number changes more if we delete a small eigenvalue instead of a large eigenvalue. For this reason we restrict our attention to the lower part of the spectrum. Note that the eigenvalue λ_2 = 1 does not play a role in this example.

The following table shows the convergence of the smallest Ritz value. It appears that F_{k,1}, with Λ^{(1)} = {λ_1}, has a moderate value from k = 16, and, from the discussion of Theorem 2, this implies that the convergence behaviour is comparable with a process for a system in which the residual has no component in the direction of the eigenvector corresponding to λ_1 = 1. After


the number of steps that GMRES for α = 0.1 lags behind GMRES for α = 0 is rather small (compare [6]: Section 6.7). It is not clear whether this phenomenon also occurs for problems with a more general spectrum.

In our final experiment we take the matrix B as follows:

B = ( a   0  ···  0   1 )
    ( 1   a           0 )
    (     ⋱   ⋱       ⋮ )
    (         1   a   0 )
    ( 0  ···  0   1   a )

We choose a = 1.2, β = 0, the right-hand side b = (1, 0, ..., 0)^T and start vector x_0 = 0. It is easily seen that the Hessenberg matrix H_k, obtained by Arnoldi's process, is equal to the k × k upper part of B. So for k ≤ 99 the Ritz value θ_1^{(k)} = 1.2 does not move to one of the eigenvalues λ_k = a + exp(2kπi/100), k = 0, ..., 99, of A. This is in agreement, of course, with our numerical results.
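This claim can be checked directly. The sketch below is our own code: with β = 0 we have A = B, and k Arnoldi steps from b = e_1 reproduce the Hessenberg matrix as the upper-left part of B; since H_k is then lower triangular with constant diagonal a, its only eigenvalue, and hence the only Ritz value, is a = 1.2, while the eigenvalues of B lie on the circle of radius 1 around a.

```python
import numpy as np

n, k, a = 100, 20, 1.2
B = a * np.eye(n)
B[1:, :-1] += np.eye(n - 1)        # ones on the subdiagonal
B[0, -1] = 1.0                     # corner entry: eigenvalues a + exp(2*pi*i*j/n)
b = np.zeros(n); b[0] = 1.0        # right-hand side e_1; x_0 = 0, so r_0 = e_1

# modified Gram-Schmidt Arnoldi (no breakdown occurs for k < n here)
V = np.zeros((n, k + 1)); H = np.zeros((k + 1, k))
V[:, 0] = b / np.linalg.norm(b)
for j in range(k):
    w = B @ V[:, j]
    for i in range(j + 1):
        H[i, j] = V[:, i] @ w
        w -= H[i, j] * V[:, i]
    H[j + 1, j] = np.linalg.norm(w)
    V[:, j + 1] = w / H[j + 1, j]
```

Here the Arnoldi vectors are simply the unit vectors e_1, e_2, ..., so the computed H is exactly the upper part of B.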

In Figure 8 it can be seen that FOM has a linear convergence behaviour, which means that the reduction factors are constant. Applying GMRES, the reduction factor changes only in the first iterates. Experiments with other values of a and b show more or less the same convergence behaviour.

6 Conclusions

We have analysed in some detail two known methods, namely FOM and GMRES, for the iterative solution of a linear system with an unsymmetric matrix. In practice, GMRES seems to be the more popular method, because with FOM we might encounter singularity problems.

In this paper we give a relation between FOM and GMRES, which enables us to compare both methods. From this relation it follows that the singularity problems in FOM are not necessarily serious and can be repaired (be it at the cost of some extra programming effort). Furthermore, numerical experiments suggest that the iteration error with FOM may be less than the iteration error with the GMRES method.

We have also seen that the Ritz values are the roots of the FOM polynomial. This property is exploited for the proof that when a Ritz value has converged, to some modest degree, to an eigenvalue, then FOM, as well as GMRES, converges in the following phase of the iteration process as if the residual no longer has a component in the corresponding eigenspace.

Our analysis helps us to explain fairly well the results obtained from some numerical experiments.

(17)

[2] Huang, H. and Vorst, H.A. van der, Some observations on the convergence behavior of GMRES, Delft University of Technology, Report 89-09, 1989

[3] Paige, C.C. and Saunders, M.A., Solution of sparse indefinite systems of linear equations, SIAM J. Num. Anal., 12, 617-629, 1975

[4] Saad, Y., Krylov subspace methods for solving large unsymmetric linear systems, Math. Comp., 37, 105-127, 1981

[5] Saad, Y. and Schultz, M.H., GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems, SIAM J. Sci. Stat. Comput., 7, 856-869, 1986

[6] Sluis, A. van der and Vorst, H.A. van der, The rate of convergence of conjugate gradients, Numer. Math., 48, 543-560, 1986

[7] Vorst, H.A. van der, The convergence behavior of some iterative solution methods, Delft University of Technology, Report 89-19, 1989
