
REPORT 09-02

Exploiting BiCGstab(ℓ) strategies to induce dimension reduction

Gerard L.G. Sleijpen and Martin B. van Gijzen

ISSN 1389-6520

Reports of the Department of Applied Mathematical Analysis, Delft, 2009

No part of the Journal may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission from the Department of Applied Mathematical Analysis, Delft University of Technology, The Netherlands.

EXPLOITING BiCGstab(ℓ) STRATEGIES TO INDUCE DIMENSION REDUCTION

GERARD L.G. SLEIJPEN∗ AND MARTIN B. VAN GIJZEN†

∗Mathematical Institute, Utrecht University, P.O. Box 80010, 3508 TA Utrecht, the Netherlands, e-mail: sleijpen@math.uu.nl
†Delft Institute of Applied Mathematics, Delft University of Technology, Mekelweg 4, 2628 CD Delft, The Netherlands, e-mail: M.B.vanGijzen@tudelft.nl

Abstract. IDR(s) [9] and BiCGstab(ℓ) [5] are two of the most efficient short-recurrence iterative methods for solving large nonsymmetric linear systems of equations. Which of the two is best depends on the specific problem class. In this paper we derive a new method, that we call IDRstab, that combines the strengths of IDR(s) and BiCGstab(ℓ). To derive IDRstab we extend the results that we reported in [7], where we considered Bi-CGSTAB as an IDR method. We analyse the structure of IDR in detail and introduce the new concept of the Sonneveld subspace. Through numerical experiments we show that IDRstab can outperform both IDR(s) and BiCGstab(ℓ).

Keywords: Bi-CGSTAB, Bi-CG, iterative linear solvers, Krylov subspace methods, IDR.
AMS(MOS) subject classification: 65F15, 65N25.

1. Introduction. Bi-CGSTAB [10] is the most popular short-recurrence method for solving large nonsymmetric systems of equations

Ax = b

with A a matrix of dimension N × N. Bi-CGSTAB can be seen as a combination of two different techniques: Bi-CG on the one hand and a one-step minimal residual method (or GMRES(1)) on the other. The Bi-CG part of the algorithm ensures, at least in exact arithmetic, termination at the exact solution in a finite number of steps. This property causes super-linear convergence during the iterative process. Bi-CG, however, does not possess an error minimization property: the residual norm may go up during the process. To introduce some kind of error minimization, Bi-CGSTAB performs a minimal residual norm step in every iteration. This residual minimization step is performed without extra cost, thus using the computational work more efficiently than Bi-CG does, resulting in faster convergence. The combination of these two complementary methods explains to a large extent the success of Bi-CGSTAB.
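To make the interplay of the two components concrete, the following is a minimal MATLAB sketch of the standard Bi-CGSTAB loop (a textbook version, without preconditioning or breakdown safeguards; A, b, tol, and maxit are assumed given):

   function [x, r] = bicgstab_sketch(A, b, tol, maxit)
      % minimal Bi-CGSTAB: Bi-CG recurrences plus a GMRES(1) step per iteration
      x = zeros(size(b));  r = b - A*x;
      rt = r;                               % shadow residual (the r~0 below)
      rho = 1;  alpha = 1;  omega = 1;
      v = zeros(size(b));  p = zeros(size(b));
      for it = 1:maxit
         rho_new = rt'*r;
         beta = (rho_new/rho)*(alpha/omega);  rho = rho_new;
         p = r + beta*(p - omega*v);         % Bi-CG search direction
         v = A*p;  alpha = rho/(rt'*v);
         s = r - alpha*v;                    % intermediate residual
         t = A*s;
         omega = (t'*s)/(t'*t);              % one-step minimal residual (GMRES(1))
         x = x + alpha*p + omega*s;
         r = s - omega*t;
         if norm(r) <= tol, break; end
      end
   end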

IDR(s) [9] is a new short-recurrence Krylov subspace method for solving large nonsymmetric linear systems. The IDR method generates residuals that are forced to be in subspaces Gj of decreasing dimension. These nested subspaces are related by Gj = (I − ωjA)(R̃0⊥ ∩ Gj−1), where R̃0⊥ is the orthogonal complement of the range of a fixed N × s matrix R̃0, and the ωj's are non-zero scalars. This working principle is different from that of more conventional Krylov subspace methods like Bi-CGSTAB, which construct residuals in the growing Krylov subspace

Kk(A, r0) = span{ r0, Ar0, A²r0, . . . , A^{k−1}r0 }.

The numerical examples that are described in [9] show that IDR(s), with s > 1 but not too big (say, s = 2, 4, or 6), is quite competitive, and outperforms Bi-CGSTAB for important classes of problems.

Although the idea behind IDR(s) is quite different from that of Bi-CGSTAB, the two methods are mathematically closely related: IDR(1) is mathematically equivalent to Bi-CGSTAB, and IDR(s) with s > 1 is related (but not mathematically equivalent) to the Bi-CGSTAB generalisation ML(k)BiCGSTAB [13] of Yeung and Chan. This latter method is a block version of Bi-CGSTAB for systems with a single right-hand side, which uses multiple shadow residuals. We refer to [9] for the details.

Like Bi-CGSTAB, IDR(s) lacks a global error minimization property. In order to smoothen the convergence, the ωj's are chosen such that the next residual is minimized in norm. As is explained in [9], a new ω can be chosen every (s + 1)-st step (matrix-vector multiplication). Note that for s = 1 this is exactly the same as in Bi-CGSTAB, where a new ω is selected after every second matrix-vector multiplication such that the next residual is minimized in norm.

For problems with a strongly nonsymmetric matrix in which all the data are real, a linear minimal residual step does not work well. It is easy to see that the ω's are zero for skew-symmetric matrices (for real v, the minimizing ω = (Av)∗v/‖Av‖² vanishes, since v∗Av = 0 for skew-symmetric A) and must be small for nearly skew-symmetric matrices, which leads to (near) breakdown of the algorithm. Consequently, Bi-CGSTAB does not work well for this class of problems. In order to overcome this problem, Gutknecht proposed to combine Bi-CG with quadratic stabilization polynomials (i.e., with GMRES(2)), which led to BiCGstab2 [2]. Sleijpen and Fokkema combined Bi-CG with stabilization polynomials of degree ℓ, which yields BiCGstab(ℓ), leading not only to a more effective residual minimization (order ℓ minimization), but also to more stability in the computation of the Bi-CG coefficients of the process [6]. The linear minimal residual step in IDR(s) also makes this method less suited for problems with a strongly nonsymmetric matrix. Example 3 in [9] presents such a problem. The example shows a very poor performance for IDR(1). The performance of the method can be improved significantly by choosing s larger than 1. However, BiCGstab(ℓ) with ℓ > 1 shows a vastly superior performance, and outperforms IDR(s) irrespective of the choice of s.
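A three-line MATLAB check makes this degeneracy explicit (a sketch; the size n is arbitrary):

   n = 100;
   A = randn(n);  A = A - A';       % real skew-symmetric: v'*A*v = 0 for all real v
   v = randn(n,1);  c = A*v;
   omega = (c'*v)/(c'*c)            % the minimizing omega; equals 0 up to rounding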

It is therefore a natural idea to combine IDR(s) and BiCGstab(ℓ) into a method that combines the strong points of both. In order to derive such a method, we extend the results that we reported in [7]. In that paper, we considered the mathematical relation between Bi-CGSTAB and IDR and proposed a block version of Bi-CGSTAB that is mathematically an IDR method. As we will show, this block Bi-CGSTAB variant can be generalised in a straightforward way to include higher order stabilization polynomials. We will call the resulting method IDRstab. The method unifies IDR(s) and BiCGstab(ℓ): by choosing ℓ = 1 it is mathematically equivalent to IDR(s), and by choosing s = 1 to BiCGstab(ℓ).

To answer the question of whether the combination of IDR(s) and BiCGstab(ℓ) yields a method that is more efficient than either IDR(s) or BiCGstab(ℓ), we present numerical experiments on test problems with different characteristics. We will show that IDRstab with s and ℓ larger than 1 can be significantly faster than both IDR(s) and BiCGstab(ℓ).

This paper is organised as follows: The next section reviews the IDR principle, and also provides new insights. In particular, it introduces the concept of the Sonneveld subspace. Section 3 explains how BiCGstab(ℓ) is derived from Bi-CG. Section 4 presents an IDR variant that allows for easy incorporation of ℓ-th degree stabilization polynomials. In order to derive this variant we give insightful schemes of how the different vector quantities in IDR(s) are related. Section 5 brings together the results of the previous sections to derive the new iterative solver IDRstab. The excellent performance of the method is illustrated with numerical experiments in Section 6.


Some remarks on the notation.

Notation 1.1. If R̃ is an n × s matrix and v is a vector, then we put v ⊥ R̃ if v is orthogonal to all column vectors of R̃, and we say that v is orthogonal to R̃. The linear subspace of all vectors v that are orthogonal to R̃ is denoted by R̃⊥.

Notation 1.2. When we mention the number of iterations, or the number of steps, we always refer to the number of matrix-vector multiplications (MVs), where each step or iteration always corresponds to one (one-dimensional) MV. A multiplication AV, with V an n × s matrix, is counted as s MVs.

Notation 1.3. Updates of the form v − C~β will play a crucial role in this paper. Here, v is an n-vector and C is an n × s matrix. When considering such updates, both v and C are available. Often, the s-vector ~β is determined by an orthogonality requirement v − C~β ⊥ R̃, where R̃ is some given n × s matrix. For ease of notation, we will simply put

“v − C~β ⊥ R̃” if we mean “Let ~β be such that v − C~β ⊥ R̃”.

Note that, with σ ≡ R̃∗C, ~β can be computed as ~β = σ⁻¹(R̃∗v) (or with a more stable variant, such as repeated or modified Gram–Schmidt). The operator I − Cσ⁻¹R̃∗ is a skew projection onto the orthogonal complement of R̃.
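As an illustration, the following MATLAB sketch computes ~β and verifies the orthogonality (random data; all names are illustrations only):

   n = 50;  s = 4;
   Rt = randn(n, s);  C = randn(n, s);  v = randn(n, 1);
   sigma = Rt'*C;                 % sigma = R~* C, assumed non-singular
   beta = sigma \ (Rt'*v);        % beta = sigma^{-1} (R~* v)
   w = v - C*beta;                % w = (I - C sigma^{-1} R~*) v
   disp(norm(Rt'*w))              % ~ 0: w is orthogonal to R~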

Notation 1.4. We will follow a number of MATLAB conventions:
— if W = [w1, . . . , ws], then W(:,1:q) ≡ [w1, . . . , wq] and W(:,q) ≡ wq (q ≤ s);
— [U0; . . . ; Uj] ≡ [U0ᵀ, . . . , Ujᵀ]ᵀ.

2. The IDR principle. The IDR method exploits the following result, from which we learn how to construct a sequence of nested linear subspaces of strictly decreasing dimension. For a proof we refer to [9, 7].

Theorem 2.1. Let R̃0 = [r̃1, . . . , r̃s] be an n × s matrix and let (µk) be a sequence in C. With G0 ≡ Cⁿ, define

Gk+1 ≡ (µk+1 I − A)(Gk ∩ R̃0⊥)   (k = 0, 1, . . .).

If R̃0⊥ does not contain an eigenvector of A, then, for all k = 0, 1, . . ., we have that 1) Gk+1 ⊂ Gk and 2) dim Gk+1 < dim Gk unless Gk = {0}.

IDR updates an approximation with residual in Gk to an approximation with residual in Gk+1. In view of the above theorem, we know that the residual eventually will be zero and that the exact solution will be detected. From the statement that the residual is contained in Gk, we cannot conclude anything about the size of its norm unless Gk = {0}. In practice, however, the residual norms tend to converge fast (towards zero). The convergence is (often much) faster for larger s.

Updating the residual in Gk to a residual in Gk+1 requires s + 1 MVs. The following result (Th. 2.3) relates the ‘IDR’ spaces Gk to Krylov subspaces; it also explains why s + 1 MVs are required. For a proof, we refer to [7]. We first introduce some notation and terminology.

Definition 2.2. For a polynomial P and an n × s matrix R̃0, consider the linear subspace

SP(A, R̃0) ≡ P(A)(Kk(A∗, R̃0))⊥ = { P(A)v | v ⊥ Kk(A∗, R̃0) }.

Here, k is the (exact) degree of the polynomial P and Kk(A∗, R̃0) is the kth Krylov subspace (or block Krylov subspace, if one wishes) generated by A∗ and R̃0:

Kk(A∗, R̃0) = { Σ_{j<k} (A∗)^j R̃0 ~γj | ~γj ∈ C^s }.

If there is no ambiguity, we will write S(P) or SP instead of SP(A, R̃0). For reasons explained in Note 2.4, we call SP the (Pth) Sonneveld subspace (generated by A and R̃0); k is the order of the Sonneveld subspace.

Theorem 2.3. With R̃0, (µj), and Gk as in Theorem 2.1, and Pk defined by Pk(λ) ≡ Π_{j=1}^{k} (µj − λ) (λ ∈ C), we have

Gk = S(Pk, A, R̃0).   (2.1)

To increase the order of the Sonneveld subspace, the order of the ‘shadow’ Krylov subspace Kk(A∗, R̃0) has to be increased by one, which requires s MVs. An additional MV is needed to increase the degree of the polynomial Pk.

The theorem also shows that any hybrid Bi-CG method can be viewed as an IDR method: hybrid Bi-CG methods produce residuals that can be expressed as Pk(A)r_k^BiCG, where r_k^BiCG ⊥ Kk(A∗, r̃0) is the kth Bi-CG residual (apply the theorem with s = 1 and R̃0 = [r̃0]), and Pk is some polynomial determined by the specific hybrid variant (for more details and references, see §3). In some sense, the polynomials Pk define the hybrid Bi-CG method. The selection Pk(λ) = Π_{j≤k}(1 − ωjλ) of Bi-CGSTAB and of BiCGstab(ℓ) will be of special interest to us in this paper.

Alg. 2.1 displays a simple, but elegant, implementation of the IDR method. In practice, additional stability steps may be required, for instance, to cope with a nearly singular σ (cf. [9, 7]). Note that this implementation produces the ‘primary’ residuals mentioned above (the ones that arrive first in S(Pk)), but also 2s + 1 ‘secondary’ residuals (denoted by v and r in Alg. 2.1): two residuals are computed at every MV. In this algorithm, ωk = 1/µk has been selected to minimize the norm of one residual update every s + 1 MVs; to be more precise, the primary residual r is computed as r = v − ωAv with ω = argmin_ω ‖v − ωAv‖. This approach of selecting ωk follows the Bi-CGSTAB strategy: in fact, if s = 1, the primary residuals coincide with Bi-CGSTAB residuals (that is, the residuals at the end of a Bi-CGSTAB cycle). Other selections for ωk (and µk) are possible and can be more effective, as we will learn in this paper.

Note 2.4. Given the fact that (sub)spaces are named after mathematicians, we feel that the name ‘Sonneveld subspace’ is appropriate. Peter Sonneveld introduced the IDR concept of Theorem 2.1 in [12]. In this paper from 1980, he also gave an algorithm that is equivalent to Bi-CGSTAB. The relation to hybrid Bi-CG expressed in Theorem 2.3, and the other principle in hybrid Bi-CG (see the summary of hybrid Bi-CG methods in §3), namely that any polynomial of appropriate degree can be used to avoid multiplication by A∗, was introduced in the CGS paper [8] of 1989. Later, more stable and effective variants of Bi-CGSTAB ([10, 2, 5], . . . ) and of CGS ([1, 14], . . . ) were published and made hybrid Bi-CG methods popular solvers for nonsymmetric systems.


Select an x0.
Select n × s matrices R̃0 and U. Compute S = AU.
x = x0, r = b − Ax
i = 1, j = 0
while ‖r‖ > tol
   σ = R̃0∗S, ~ρ = R̃0∗r, ~α = σ⁻¹~ρ
   v = r − S~α, c = Av
   if j = 0, ω ← c∗v/(c∗c)
   U(:,i) ← U~α + ωv, x ← x + U(:,i)
   r1 ← v − ωc, S(:,i) = r − r1, r ← r1
   i ← i + 1; if i > s, i = 1
   j ← j + 1; if j > s, j = 0
end while

Alg. 2.1: IDR(s). The ith columns S(:,i) and U(:,i) of S and U are replaced in the loop by the newly computed vectors. The initial matrices U and R̃0 have to be such that R̃0∗S is non-singular.
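For the interested reader, the following is a direct MATLAB transcription of Alg. 2.1 (a sketch: random orthonormal R̃0 and U, and no safeguards against a nearly singular σ):

   function x = idrs_alg21(A, b, s, tol, maxit)
      % direct transcription of Alg. 2.1
      n = length(b);
      x = zeros(n, 1);  r = b - A*x;
      Rt = orth(randn(n, s));        % shadow matrix R~0
      U  = orth(randn(n, s));        % initial update matrix
      S  = A*U;
      i = 1;  j = 0;  omega = 1;
      for it = 1:maxit
         if norm(r) <= tol, break; end
         sigma = Rt'*S;
         alpha = sigma \ (Rt'*r);
         v = r - S*alpha;  c = A*v;
         if j == 0, omega = (c'*v)/(c'*c); end
         U(:,i) = U*alpha + omega*v;  x = x + U(:,i);
         r1 = v - omega*c;  S(:,i) = r - r1;  r = r1;
         i = i + 1;  if i > s, i = 1; end
         j = j + 1;  if j > s, j = 0; end
      end
   end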

3. From Bi-CG to BiCGstab(ℓ). In this section, we recall the derivation of BiCGstab(ℓ) in order to clarify the derivation of our IDR variants in §§4–5. It also explains how the IDR variants relate to BiCGstab(ℓ): our IDR variant can be viewed as BiCGstab(ℓ) with an s-dimensional initial shadow residual R̃0 instead of a one-dimensional one r̃0.

3.1. Bi-CG. Bi-CG uses coupled two-term recurrences to produce a residual rk at step k that is orthogonal to the kth ‘shadow’ Krylov subspace Kk(A∗, r̃0):

uk = rk − βk−1 uk−1,   rk+1 = rk − αk Auk.   (3.1)

It exploits the fact that, to achieve this orthogonality, it suffices to put the vectors Auk and rk+1 orthogonal to one vector only: with r̃0, . . . , r̃j spanning Kj+1(A∗, r̃0) for all j ≤ k, the coefficients βk−1 and αk are such that Auk ⊥ r̃k−1 and rk+1 ⊥ r̃k. The approximate solutions xk+1 are obtained by updating xk by adding αk uk: here, as elsewhere in this paper (and in hybrid Bi-CG methods), residuals are updated by vectors of the form −Au with u always explicitly available.

The following formulation, equivalent to (3.1), will be of interest for the later discussion:

rk+1 = rk − αk Auk ⊥ r̃k,
Auk+1 = Ark+1 − βk Auk ⊥ r̃k,   uk+1 = rk+1 − βk uk.   (3.2)

Here, the vectors Auk are obtained by recursive update rather than by MV. Since Ark is needed, the number of MVs is not affected: the coupled recurrences in (3.2) require only one vector update more per step (per MV) than the recurrences in (3.1). Both updates require orthogonality to the same vector r̃k, which will turn out to be convenient. The third recurrence relation provides the update for the approximate solution xk+1. Note that the way of computing βk is not affected: if ϑk ∈ C is such that r̃k+1 − ϑk A∗r̃k ∈ Kk+1(A∗, r̃0), then, with σk ≡ r̃k∗Auk and ρj ≡ r̃j∗rj,

αk = ρk/σk   and   βk = (r̃k∗Ark+1)/(r̃k∗Auk) = ρk+1/(ϑk σk).


3.2. Hybrid Bi-CG. For each k, r̃k = Pk(A∗)r̃0 for some polynomial Pk of degree k. With Pk ≡ Pk(A), (3.2) implies

Pk rk+1 = Pk rk − αk APk uk ⊥ r̃0,
APk uk+1 = APk rk+1 − βk APk uk ⊥ r̃0,
Pk uk+1 = Pk rk+1 − βk Pk uk.   (3.3)

There are a number of effective strategies for selecting the polynomials Pk; see, e.g., [8, 10, 2, 5, 1, 14].

3.3. Bi-CGSTAB. In Bi-CGSTAB, the polynomial Pk+1 is of the form Pk+1(λ) = (1 − ωk+1λ)Pk(λ), and a combination of (3.3) and

Pk+1 rk+1 = Pk rk+1 − ωk+1 APk rk+1,   Pk+1 uk+1 = Pk uk+1 − ωk+1 APk uk+1   (3.4)

updates the residual Pk rk from S(Pk, A, r̃0) to the residual Pk+1 rk+1 in S(Pk+1, A, r̃0). The Pk rk are primary residuals, the Pk rk+1 are secondary ones. To complete the loop consisting of (3.3) and (3.4), APk+1 uk+1 has to be computed. This can be done by multiplying Pk+1 uk+1 by A, or by multiplying APk uk+1 by A after the updates in (3.3), followed by the update

APk+1 uk+1 = APk uk+1 − ωk+1 A²Pk uk+1.   (3.5)

This last approach makes each loop one vector update more expensive: the number of MVs (two per loop) is not affected. Though a bit more expensive, we feel that this approach helps to clarify the derivation of our IDR variant in §5.

Bi-CGSTAB selects ωk+1 to minimize the norm of the right-hand side of the first expression in (3.4). Note that this minimizing scalar is real if A is real and the residuals are real. BiCGstab(ℓ) can be described by stating that it selects ωj in C such that, at the end of each cycle of ℓ of these loops, the norm of the expression

Pk rk+ℓ − γ1 APk rk+ℓ − . . . − γℓ A^ℓ Pk rk+ℓ   (3.6)

is minimal and Pk+ℓ rk+ℓ equals this expression. Here, the γi are such that

1 − γ1λ − . . . − γℓλ^ℓ = (1 − ωk+ℓλ) · . . . · (1 − ωk+1λ)   (λ ∈ C).

In actual computations, the γi are computed. Since they can only be computed at the end of the cycle of ℓ loops, the ωj are not available in intermediate loops. Therefore, to make this ℓth degree minimization applicable, the cycle of ℓ loops needs some rearrangement, as is done in [5] in the derivation of a BiCGstab(ℓ) algorithm. For completeness, the resulting algorithm is included; see Alg. 3.2.
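The correspondence between the γi of a cycle and the ωj can be made explicit in MATLAB (a sketch; gamma is a column vector of length ℓ with γℓ ≠ 0):

   p = [-gamma(end:-1:1).', 1];   % coefficients of Q(lambda) = 1 - gamma_1*lambda - ... , highest degree first
   omega = 1 ./ roots(p);         % the omega_{k+j} in the factorization, in some order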

In our IDR variant we follow a degree ℓ minimization. If s = 1, then our variant is equivalent to the original BiCGstab(ℓ) algorithm in [5].

The polynomials Pk in CGS [8], GCGS [1], and GPBiCG [14] are generated by three-term recurrences βk Pk+1(λ) = (λ − αk)Pk(λ) − βk−1 Pk−1(λ) (with βk + αk + βk−1 = 1 to have ‘residual’ polynomials Pk). Therefore, although Pk can be expressed as Pk(λ) = Π_{j≤k}(1 − ωjλ), these hybrid variants cannot be described in the above Bi-CGSTAB fashion: the ωj (j ≤ k) change as k increases (ωj = ωj,k).


Select an x0 and an r̃0. Compute r0 = b − Ax0.
x = x0, r = [r0], u = [r0; Ar0]
while ‖r0‖ > tol
   for j = 1, . . . , ℓ                    % the Bi-CG step
      σ = r̃0∗uj, α = σ⁻¹(r̃0∗rj−1)
      x = x + u0 α, r = r − [u1; . . . ; uj] α, r = [r; Arj−1]
      β = σ⁻¹(r̃0∗rj), u = r − u β, u = [u; Auj]
   end for
   % the polynomial step
   ~γ = (γ1; . . . ; γℓ) = argmin_~γ ‖r0 − [r1, . . . , rℓ]~γ‖
   x = x + [r0, . . . , rℓ−1]~γ, r = r0 − [r1, . . . , rℓ]~γ
   u = [u0 − Σ_{j=1}^{ℓ} γj uj ; u1 − Σ_{j=1}^{ℓ} γj uj+1]
end while

Alg. 3.2: BiCGstab(ℓ). The uj and rj in the computation of σ, α, and β are related to u and r according to u = [u0; . . . ; uj] and r = [r0; . . . ; rj], respectively. Note that the sizes of r and u change during the loop: at the start of the jth step of the ‘for j = . . .’ loop, r = [r0; . . . ; rj−1] and u = [u0; . . . ; uj].

4. IDR. In this section, we formulate a variant of IDR that, in §5, allows easy incorporation of degree ℓ polynomial minimization for ℓ > 1. In this variant, we distinguish two parts in one cycle of the method.

• In the first part (see §4.1), which we call the IDR step, we update a residual r− in Gk (a primary residual) and an n × s matrix AU− with columns also in Gk to a residual r in Gk ∩ R̃0⊥ and an n × s matrix AU with columns in Gk ∩ R̃0⊥. We use the subscript ‘−’ as a reference to quantities that are available at the end of the past cycle, the kth cycle. The IDR step corresponds to the Bi-CG part in one cycle of standard Bi-CGSTAB.

• Then, in the second part (see §4.2), the polynomial part, we select a degree one polynomial P(λ) ≡ 1 − ωk+1λ and update r to r+ ≡ r − ωk+1 Ar in Gk+1 ≡ (I − ωk+1A)(Gk ∩ R̃0⊥) (our next primary residual) and AU to AU+ with columns also in Gk+1. U is updated to U+.

Suppose, at the beginning of the kth cycle, n × s ‘update’ matrices U− = [u1, . . . , us] and AU− ≡ [Au1, . . . , Aus] are available. In addition, we also have an approximate solution x− and its associated residual r− ≡ b − Ax− (assuming exact arithmetic).

4.1. The IDR step. Let Πi be the projections defined by

Πi ≡ I − A^iU− σ⁻¹ R̃0∗ A^{1−i},   with σ ≡ R̃0∗AU−   (i = 0, 1).   (4.1)

Note that

R̃0∗Π1 = 0   and   Π1A = AΠ0.   (4.2)


In particular, we have that Π1w ⊥ R̃0 for all vectors w. With ~ρ ≡ R̃0∗r− and ~α = σ⁻¹~ρ, we generate r and x by

r ≡ Π1r− = r− − AU−~α   and   x ≡ x− + U−~α.   (4.3)

Then, we have that b − Ax = r ⊥ R̃0. Now, we obtain the vector Ar by matrix multiplication, and we can form a new residual by determining the scalar ω that minimizes the norm of r − ωAr. This last expression is the new residual associated with the approximate solution x + ωr. However, before we perform the minimization step, we first update U− to U and AU− to AU such that AU is orthogonal to R̃0, and we compute A²U. We generate AU such that its columns form a basis of the Krylov subspace Ks(Π1A, Π1Ar). As a ‘side product’ we obtain A²U and U. For instance (for variants, see §4.4), if

AU = [Π1Ar, (Π1A)²r, . . . , (Π1A)^s r] ⊥ R̃0,   (4.4)

then, with s ≡ AUe_q for some q < s, the vector AUe_{q+1} can be computed as c ≡ As, which gives the qth column of A²U,

A²Ue_q = c, where c ≡ As,   (4.5)

followed by the projection

AUe_{q+1} = c − AU−~β, where ~β ≡ σ⁻¹~ρ and ~ρ ≡ R̃0∗c.   (4.6)

Now, the (q + 1)th column of U can be computed as

Ue_{q+1} = s − U−~β.   (4.7)

The following scheme summarizes this IDR step:

U−     x− → x       Ue1       Ue2      . . .   Ues
           ↓A   Π0ր  ↓A   Π0ր  ↓A              ↓A
AU−    r− —Π1→ r    AUe1      AUe2     . . .   AUes
           ↓A   Π1ր  ↓A   Π1ր  ↓A              ↓A
       Ar           A²Ue1     A²Ue2    . . .   A²Ues        (4.8)

The vectors in the last row (boxed in the original scheme) have been obtained by explicit multiplication by the matrix A. The other ‘new’ vectors have been obtained by vector updates as indicated in (4.3), (4.6), and (4.7).

The ‘new’ vectors in the second row of Scheme (4.8), the vectors r, AUe1, . . . , AUes, are orthogonal to R̃0. In IDR terminology, if r− and AU− belong to Gk, then these new vectors belong to Gk ∩ R̃0⊥. Multiplication by A leads to the vectors in the last row of (4.8): the multiplication maps the vectors to a space Gk+1. Of course, what Gk+1 will be depends on the choice of µk+1: Gk+1 ≡ (µk+1 I − A)(Gk ∩ R̃0⊥). The third row corresponds to µk+1 = 0, which is not a practical choice. Nevertheless, this observation may clarify how this approach relates to IDR. In the next subsection, we discuss the choice µk+1 = 1/ωk+1.


4.2. The polynomial step. Now we have the ingredients to perform the minimization step that yields our next approximate solution, residual, and update matrices, denoted by x+, r+, U+, and AU+, respectively. Select a complex scalar ωk+1. Then,

x+ ≡ x + ωk+1 r,   r+ ≡ r − ωk+1 Ar,   U+ ≡ U − ωk+1 AU,   AU+ ≡ AU − ωk+1 A²U.

This step can be viewed as a Richardson step with parameter ωk+1.

In practice, ωk+1 is selected to minimize the norm of the new residual r+; the step can then be viewed as one step of GMRES restarted at every step (GMRES(1)). But other choices are possible as well: instead of minimal residuals, one can also aim for accurate IDR updates, similar to the strategy for Bi-CGSTAB as explained in [6].
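In MATLAB, the minimal residual choice and the corresponding updates read as follows (a sketch; x, r, U, AU, and A2U — the latter standing for A²U — are assumed available from the IDR step):

   Ar = A*r;
   omega = (Ar'*r)/(Ar'*Ar);     % minimizes norm(r - omega*Ar) over omega
   xp  = x + omega*r;            % x_+
   rp  = r - omega*Ar;           % r_+
   Up  = U  - omega*AU;          % U_+
   AUp = AU - omega*A2U;         % AU_+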

Note 4.1. If, in Scheme (4.8), the column with x, r, and Ar is replaced by x+ and r+ (r+ at the location of Ar), and the columns with x− and U− are interchanged, then the schemes for subsequent steps can be chained. The horizontal movement (from left to right) can be viewed as representing the update of the order k of the shadow Krylov subspace Kk(A∗, r̃0), while the vertical movement (from top to bottom) represents the increase of the polynomial degree (cf. [4, 3]).

4.2.1. Saving storage. Note that ωk+1 can be determined after the computation of r and Ar, but before the formation of Ue1. If ωk+1 is available, x+ and r+ can be computed before forming Ue2 and AUe2. Similarly, U+e_q and AU+e_q can be computed before forming Ue_{q+2} and AUe_{q+2}. These vectors can be stored at the storage locations of Ue_q and AUe_q. This saves the storage of an n × s matrix (the matrix A²U).

4.3. The relation to IDR. The following proposition expresses the fact that the above approach defines an IDR method. The proof follows from the observations above combined with Theorem 2.3; we leave the details to the reader.

Proposition 4.2. Consider one cycle of the method as described above in §4.1 and §4.2. Suppose that at the beginning of the cycle the residual r− as well as the columns of the matrix AU− belong to a Sonneveld subspace S(Pk, A, R̃0). Here, Pk is a polynomial of exact degree k. Then, with Pk+1(λ) ≡ (1 − ωk+1λ)Pk(λ), the residual r+ and the columns of AU+ at the end of the cycle belong to S(Pk+1, A, R̃0).

4.4. Alternative formulations of the IDR step. In Scheme (4.8) above, we generated the basis vectors of AU ≡ Ks(Π1A, Π1Ar) by multiplication by the operator Π1A. For larger s, this approach will be unstable. More stable approaches, such as Arnoldi, can be exploited. The matrix of basis vectors in such an approach is of the form AU T, where T is some non-singular s × s matrix. For instance, with Arnoldi, we find the n × s orthonormal matrix Q from the QR-decomposition of AU: AU = QR, where the s × s matrix R is upper triangular, and T = R⁻¹. Accordingly, we should compute Ũ ≡ U T and A²Ũ ≡ A²U T instead of U and A²U; then Q = AŨ. In practice, whenever we compute AŨe_q as Σ_{j<q} γj AŨe_j + γq AUe_q, we immediately compute Ũe_q and A²Ũe_q as Σ_{j<q} γj Ũe_j + γq Ue_q and Σ_{j<q} γj A²Ũe_j + γq A²Ue_q, respectively, and we put the new values at the location of the old ones.
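As an illustration, the QR-based choice of T can be sketched in MATLAB as follows (U, AU, and A2U from the IDR step, and the block size s, are assumed given):

   [Q, Rqr] = qr(AU, 0);   % economy-size QR: AU = Q*Rqr with Q'*Q = I
   T = Rqr \ eye(s);       % T = Rqr^{-1}, so that AU*T = Q is orthonormal
   Ut   = U   * T;         % U~     = U T
   A2Ut = A2U * T;         % A^2 U~ = (A^2 U) T; note that A*Ut equals Q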

An orthogonal Krylov basis for AU is generated by s steps of ORTHODIR applied to the equation

Π1A y = r   (4.9)

with initial guess 0 or, equivalently, to the equation Π1A y = b with initial guess x. These basis vectors are the ORTHODIR direction vectors that are used to update the residuals. They are equal to the ‘Arnoldi’ vectors except for scalar multiples. Since Π1A = AΠ0, solving (4.9) is equivalent to solving AΠ0 y = r. Hence, ORTHODIR can be used to update residuals as well as approximate solutions for the system Ax = b (with initial guess x). Except for scalar multiples, and assuming that GCR does not break down, ORTHODIR and GCR produce the same orthogonal basis of direction vectors for AU. The direction vectors are obtained in ORTHODIR by multiplication of the preceding direction vector by the operator (Π1A in this case), whereas in GCR the direction vectors are obtained by multiplication of the current residual by the operator. In both methods, the new direction vectors are then obtained as the orthogonal components by applying Gram–Schmidt.

The IDR(s) variant that is described in [11] follows a different approach. This variant computes basis vectors for AU that satisfy bi-orthogonality relations with the columns of R̃0. This yields a stable and quite efficient IDR implementation, and this approach can be followed here as well.

4.5. Saving vector updates in the IDR step. Note that for the computation of σ (cf. (4.1)) the matrix AU− is not needed: σ can be computed as (A∗R̃0)∗U−. The matrix A∗R̃0 has to be computed once and stored (see the sketch below). This allows the following alternative way of performing the IDR step.

With ~ρ ≡ R̃0∗r− and ~α = σ⁻¹~ρ, we generate r and x by first computing u ≡ U−~α. Then, r ≡ Π1r− = r− − Au and x ≡ x− + u, where Au has been obtained by explicitly multiplying u by A. As before, we obtain the vector Ar by matrix multiplication. Now, we update U− to U and compute AU, with AU as in (4.4), as follows. First, we compute Ue1 and AUe1: Ue1 ≡ r − U−~β, where ~β = σ⁻¹((A∗R̃0)∗r); then AUe1 is obtained by matrix-vector multiplication. Similarly, with s ≡ AUe_q for some q < s, the vector Ue_{q+1} is computed as Ue_{q+1} ≡ s − U−~β, where now ~β = σ⁻¹((A∗R̃0)∗s), and AUe_{q+1} is obtained by matrix-vector multiplication. The scheme below summarizes this approach.

U−     x− → x       Ue1       Ue2      . . .   Ues
       ↓A      Π0ր  ↓A   Π0ր  ↓A               ↓A
r−     →    r       AUe1      AUe2     . . .   AUes
            ↓A
            Ar                                              (4.10)

Note that this approach saves storage. We now have to store the following five n × s matrices: R̃0, A∗R̃0, U−, U, and AU (or the four matrices R̃0, A∗R̃0, U−, and U − ωAU), whereas before we had to store six matrices of this size: R̃0, U−, AU−, U, AU, and A²U (or the five matrices R̃0, U−, AU−, U − ωAU, and AU − ωA²U). We can save a bit on this: when the last columns of U and AU are available, the matrix U− (and AU−) is not needed anymore.

As compared to the previous approach, s² + s fewer vector updates are needed per full recursion step; instead, we need one additional matrix-vector multiplication per full step.

If s = 1, then we can save this additional MV: for the update r = r− − AU−~α we need AU−, which is a single vector if s = 1. This vector can be reused in the computation of AUe1 = A(r − U−~β) = Ar − AU−~β: in the case s = 1, both Ar and AU− are available at this stage.

5. IDRstab. As explained in §4.2, after the computation of x, r, Ar, U, AU, and A²U, a scalar ωk can be determined and the vectors can be updated to x+, r+ = r − ωk Ar, U+ = U − ωk AU, etc. As an alternative, the ‘IDR step’ can be repeated ℓ times before selecting appropriate scalars ωk (see §5.1 below). This leads to a BiCGstab(ℓ) version of IDR (see §5.2). We will refer to this variant as IDRstab.

5.1. The IDR step that allows degree ℓ minimization. We give details on how to ‘repeat’ an IDR step without polynomial updates. Performing j − 1 repetitions makes available a residual r−, say, with associated approximation x−, plus the vectors Ar−, . . . , A^{j−1}r−, and the n × s matrices U−, AU−, A²U−, . . . , A^jU−.

The next ‘repetition step’ can be described by the projections Πi defined by

Πi ≡ I − A^iU− σ⁻¹ R̃0∗ A^{j−i}   (i = 0, 1, . . . , j), with σ ≡ R̃0∗A^jU−.   (5.1)

Note that

R̃0∗Πj = 0   and   Πi+1A = AΠi   (i = 0, 1, . . . , j − 1).   (5.2)

First, we obtain A^ir for i < j by projection: with ~α ≡ σ⁻¹(R̃0∗A^{j−1}r−),

A^ir ≡ Πi+1(A^ir−) = A^ir− − A^{i+1}U−~α   (i < j),   x = x− + U−~α.   (5.3)

Next, we obtain A^jr by multiplying A^{j−1}r by A:

A^jr = A(A^{j−1}r).   (5.4)

Then, we update A^iU− to A^iU (i = 0, . . . , j) such that the columns of A^jU are orthogonal to R̃0 and form a basis of the Krylov subspace Ks(ΠjA, ΠjA^jr). As a side product we obtain A^iU for i = 0, . . . , j − 1 and for i = j + 1. For instance, with sj ≡ A^jUe_q for some q < s, the vector A^jUe_{q+1} can be computed as Asj, which gives the qth column of A^{j+1}U, followed by the projection Πj:

A^jUe_{q+1} ≡ Asj − A^jU−~β, where ~β ≡ σ⁻¹~ρ and ~ρ ≡ R̃0∗Asj.

Then the (q + 1)th column of A^iU can be computed as

A^iUe_{q+1} = si+1 − A^iU−~β   (i = 0, . . . , j − 1),

involving vector updates only (no additional MVs, no additional dot products). Here, si ≡ A^iUe_q. The following scheme summarizes this IDR step in the case j = 2.

U−      x− → x        Ue1       Ue2      . . .   Ues
            ↓A   Π0ր  ↓A   Π0ր  ↓A               ↓A
AU−     r− —Π1→ r     AUe1      AUe2     . . .   AUes
            ↓A   Π1ր  ↓A   Π1ր  ↓A               ↓A
A²U−    Ar− —Π2→ Ar   A²Ue1     A²Ue2    . . .   A²Ues
            ↓A   Π2ր  ↓A   Π2ր  ↓A               ↓A
        A²r           A³Ue1     A³Ue2    . . .   A³Ues           (5.5)


As before, the vectors in the last row (boxed in the original scheme) have been obtained by explicit multiplication by the matrix A, while the other ‘new’ vectors have been obtained by vector updates.

The ‘new’ vectors in the last-but-one row of Scheme (5.5), the vectors Ar, A²Ue1, . . . , A²Ues, are orthogonal to R̃0 and, in IDR terminology, they belong to Gk+j−1 ∩ R̃0⊥ if Ar− and the columns of A²U− belong to Gk+j−1. Multiplication by A leads to the vectors in the last row of (5.5): they are mapped to a space Gk+j (assuming µj = 0).

To prepare for the next repetition of an IDR step, rename x to x−, A^ir to A^ir− (i = 0, . . . , j), and A^iU to A^iU− (i = 0, . . . , j + 1).

The variants of §4.2.1, §4.4, and §4.5 have their analogues in the present situation. In our actual implementation, we followed the Arnoldi variant of §4.4 to orthonormalize A^jU (orthonormalizing the vectors in the last-but-one row of Scheme (5.5)).

5.2. Minimization using degree ℓ polynomials. After ℓ repetitions of the IDR step, we have a residual r available, plus the vectors A^ir for i = 0, . . . , ℓ, the associated approximation x, and ℓ + 2 matrices A^iU (i = 0, . . . , ℓ + 1) of size n × s. Now, to finish one cycle of IDRstab, determine scalars γ1, . . . , γℓ, for instance, such that the norm of

r+ ≡ r − γ1 Ar − . . . − γℓ A^ℓr   (5.6)

is minimal. Compute

x+ ≡ x + γ1 r + . . . + γℓ A^{ℓ−1}r   (5.7)

and

A^iU+ ≡ A^iU − γ1 A^{i+1}U − . . . − γℓ A^{i+ℓ}U   (i = 0, 1).   (5.8)

5.3. The relation to IDR. The following theorem allows one to view IDRstab as an IDR method.

Theorem 5.1. Consider one cycle of IDRstab as described above in §5.1 and §5.2. Suppose none of the roots λ1, . . . , λℓ of the polynomial

Q(λ) ≡ 1 − γ1λ − . . . − γℓλ^ℓ   (λ ∈ C)

is zero. Suppose that at the beginning of the cycle the residual r− as well as the columns of the matrix AU− belong to the Sonneveld subspace S(Pk, A, R̃0), with Pk a polynomial of exact degree k.¹ Then the residual r+ and the columns of AU+ at the end of the cycle belong to S(QPk, A, R̃0); Pk+ℓ ≡ QPk is of exact degree k + ℓ.

Proof. To prove the theorem, factorize the polynomial Q as

Q(λ) = 1 − γ1λ − . . . − γℓλ^ℓ = (1 − ωk+1λ) · . . . · (1 − ωk+ℓλ)

and use induction on j (j = 1, . . . , ℓ), adding one factor per induction step: Qj(λ) ≡ Qj−1(λ)(1 − ωk+jλ), with Q0 ≡ 1. We leave the details to the reader.

The theorem states that the quantities in (5.6)–(5.8) would also have been obtained by performing ℓ steps of the IDR variant of §4, provided the scalars ωk+j had been selected appropriately in each polynomial step (cf. §4.2). Obviously, it is impossible in practice to select the ωk+j ‘appropriately’ at step k + j (j < ℓ): the roots of the minimizing polynomial Q are not known at that stage. In particular, in one cycle of BiCGstab(ℓ), we move from the primary residual rk to the primary residual rk+ℓ and do not explicitly compute the intermediate residuals rk+j (j = 1, . . . , ℓ − 1). Nevertheless, the theorem is of interest, since it shows that IDRstab is an IDR version.

¹The transition from r− and AU− at the beginning of the cycle to r+ at the end of the cycle moves through ℓ stages, where r− is updated to r and AU− to AU as explained in §5.1, and where r is then renamed to r− and AU to AU−. At the end, r+ etc. are formed as explained in §5.2.


Select an x0 and an n × s matrix R̃0. Compute r0 = b − Ax0, x = x0.
% generate U = [U0; U1] = [U0; AU0]
for q = 1, . . . , s
   if q = 1, u0 = r0, else u0 = u1
   ~µ = (U0(:,1:q−1))∗u0,  u = [u0; Au0] − U(:,1:q−1)~µ
   u = u/‖u0‖,  U(:,q) = u
end for
while ‖r0‖ > tol
   for j = 1, . . . , ℓ                        % an IDR step
      σ = R̃0∗Uj,  ~α = σ⁻¹(R̃0∗rj−1)
      x = x + U0~α,  r = r − [U1; . . . ; Uj]~α,  r = [r; Arj−1]
      for q = 1, . . . , s
         if q = 1, u = r, else u = [u1; . . . ; uj+1]
         ~β = σ⁻¹(R̃0∗uj),  u = u − U~β,  u = [u; Auj]
         ~µ = (Vj(:,1:q−1))∗uj,  u = u − V(:,1:q−1)~µ
         u = u/‖uj‖,  V(:,q) = u
      end for
      U = V
   end for
   % the polynomial step
   ~γ = (γ1; . . . ; γℓ) = argmin_~γ ‖r0 − [r1, . . . , rℓ]~γ‖
   x = x + [r0, . . . , rℓ−1]~γ,  r = r0 − [r1, . . . , rℓ]~γ
   U = [U0 − Σ_{j=1}^{ℓ} γj Uj ; U1 − Σ_{j=1}^{ℓ} γj Uj+1]
end while

Alg. 5.3: IDRstab. The Uj, rj−1, uj, and Vj in the computation of σ, ~α, ~β, and ~µ, respectively, are related to U, r, u, and V according to U = [U0; . . . ; Uj], r = [r0; . . . ; rj−1], u = [u0; . . . ; uj], and V = [V0; . . . ; Vj+1], respectively. Note that the sizes of r, U, and u change during the loop: at the beginning of the ‘for j = . . .’ loop, r = [r0; . . . ; rj−1] and U = [U0; . . . ; Uj].

5.4. The IDRstab algorithm. The procedure described in §5.1 and §5.2 leads to the algorithm in Alg. 5.3. The notation in this algorithm follows the MATLAB conventions of Notation 1.4: if W = [w1, . . . , ws], then W(:,1:q) ≡ [w1, . . . , wq] and W(:,q) ≡ wq (q ≤ s); [U0; . . . ; Uj] ≡ [U0ᵀ, . . . , Ujᵀ]ᵀ.

With ℓ = 1, this algorithm is (mathematically) equivalent to IDR(s); with s = 1, it is equivalent to BiCGstab(ℓ).

The lines in the two ‘for q = . . .’ loops involving ~µ form an implementation of Arnoldi’s procedure to obtain an orthonormalized matrix U0 and Vi for i = j, respectively. We use Arnoldi since it is more stable than the ‘power basis’. If the power basis is sufficiently stable (for instance, if s < 3), then these lines can be skipped. However, for larger s, a more stable form of Gram–Schmidt, such as modified Gram–Schmidt or repeated Gram–Schmidt, may be required. We have good results with s ≤ 8, and, in our experience, for this size of s classical Gram–Schmidt is sufficiently stable.

Arnoldi may be applied to obtain an orthonormalized matrix Vi for any i = 0, . . . , j + 1. When applied with i = 1, we are actually performing ORTHODIR without updating the ORTHODIR residuals and approximations. Of course, these ORTHODIR quantities can be computed as well. This can be useful, for instance, to allow for an ‘intermediate break’ when an ORTHODIR residual is already small for some j < ℓ. Of course, updating the ORTHODIR residuals and approximations for each j = 1, . . . , ℓ requires additional vector updates. Hence, this strategy will be attractive only when the matrix-vector multiplication forms the dominant cost per step.

As in BiCGstab(ℓ), the ~γ can be computed as the solution of the normal equations

(R∗R)~γ = R∗r0, where R ≡ [r1, . . . , rℓ],

leading to a minimal residual (which can be viewed as the residual after ℓ steps of GMRES with initial residual r0). As for BiCGstab(ℓ), a suitable combination with the Galerkin residual (that is, the residual after ℓ steps of FOM) might lead to more stability in the IDR step (see [6]).
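In MATLAB, with the residuals r1, . . . , rℓ stored as the columns of a matrix R (hypothetical names), both routes are one-liners:

   gamma = (R'*R) \ (R'*r0);   % via the normal equations, as above
   % gamma = R \ r0;           % equivalent but more stable: QR-based least squares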

Assembling the matrices U0, U1 = AU0, . . . , Uj = A^jU0 into one tall matrix U = [U0; . . . ; Uj], and the vectors r0, r1 = Ar0, . . . , rj−1 = A^{j−1}r0 into one tall vector r = [r0; r1; . . . ; rj−1], allows the compact representation of the vector updates in the IDR step that is used in Alg. 5.3. Implementing the method this way may also lead to more efficiency on certain machines. By solving Ax + b = 0 instead of Ax = b and defining rk ≡ b + Axk, the two updates x = x − U0~α and r = r − [U1; . . . ; Uj]~α can be further combined into one (higher-dimensional) update: [x; r] = [x; r] − U~α.

6. Numerical experiments. This section presents numerical experiments on three test problems that, for different reasons, are difficult to solve by iterative methods. The difficulty of the first test problem is that the system matrix is highly indefinite, whereas the difficulty in the second and third problem comes from the fact that the system matrix is almost skew-symmetric.

As an implementation-independent measure for the amount of work (or solution time) to solve the systems, we take the number of matrix-vector operations. This measure is realistic for applications where the (preconditioned) matrix-vector multiplication is far more expensive than a vector operation, which is typically the case, and as long as the number of vector operations per matrix-vector operation is modest. To limit the overhead of the vector operations we only consider relatively small values for s and ℓ, both ranging between 1 and 8.

The experiments are performed with MATLAB 7.5.

6.1. A test problem from the MATRIX-MARKET collection. For our first test problem we consider the SHERMAN5 matrix from the MATRIX-MARKET collection. The size of this matrix is 3312 × 3312. The right-hand side vector is chosen such that the solution is the vector consisting of ones.

The SHERMAN5 matrix is, in contrast to the well-known SHERMAN4 matrix from the same collection, notoriously difficult to solve by iterative methods. This is due to the fact that the SHERMAN5 matrix is highly indefinite, as is illustrated by the plot of the spectrum given in Figure 6.1.


Fig. 6.1: Spectrum of SHERMAN5

The system is solved with IDRstab, with different combinations of the parameters s and ℓ. For the columns of R̃0 we take the orthogonalization of s real random vectors. No preconditioner is applied. The iterative processes are terminated once the residual norm, divided by the norm of the right-hand side vector, drops below 10⁻⁹. Figure 6.2 shows the convergence behaviour for the following combinations of s and ℓ: s = 4, ℓ = 1 (equivalent to IDR(4)); s = 1, ℓ = 4 (equivalent to BiCGstab(4)); and s = ℓ = 4.
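For reference, the experimental setup can be sketched in MATLAB as follows (assuming sherman5.mtx has been downloaded from Matrix Market and the mmread.m helper from the same site is on the path):

   A = mmread('sherman5.mtx');          % 3312 x 3312 sparse matrix
   n = size(A, 1);
   b = A * ones(n, 1);                  % right-hand side for the all-ones solution
   s = 4;
   Rt = orth(randn(n, s));              % shadow matrix: s orthogonalized real random vectors
   tol = 1e-9 * norm(b);                % stop when norm(r)/norm(b) < 1e-9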

For comparison, we have also included full GMRES in our experimental evaluation. The comparison with GMRES is interesting for theoretical reasons only: it shows how close the required number of IDRstab matrix-vector multiplications is to the optimal number of GMRES matrix-vector multiplications. For practical applications, where the typical problem size is orders of magnitude larger than for our test problem (millions of unknowns are quite common), it is of course impossible to perform hundreds of GMRES iterations, due to the excessive memory use and the large overhead of vector operations. Table 6.1 gives, for different combinations of s and ℓ, the required number of matrix-vector multiplications to achieve the desired accuracy.

s\ℓ      1      2      4      8
 1     n.c.   3617   3033   2529
 2     2969   2546   2174   2066
 4     2734   2154   1944   1884
 8     2303   1970   1700   1808

Table 6.1: SHERMAN5: matrix-vector products for IDRstab

For s = ℓ = 1 (i.e., Bi-CGSTAB), no convergence occurs within 4000 matrix-vector multiplications. For higher values of s or ℓ, IDRstab always converges. Moreover, Table 6.1 shows that the number of matrix-vector multiplications is reduced by increasing either s or ℓ, and, more importantly, that by increasing both s and ℓ the number of matrix-vector multiplications is reduced to a level lower than that achieved by IDR(s) and BiCGstab(ℓ).


Fig. 6.2: SHERMAN5: convergence for IDRstab and for GMRES


6.2. A 2D convection-diffusion problem with a positive shift. The second test problem that we consider is the finite volume discretization of the following partial differential equation on the unit square [0, 1] × [0, 1]:

−uxx − uyy + 1000(x ux + y uy) + 10u = f.

Dirichlet conditions are imposed on the boundaries. The problem is discretized with the finite volume method on a 65 × 65 grid, which yields a matrix of size 4096 × 4096. The right-hand side vector f of the discrete system is chosen such that the solution is the vector consisting of ones. This problem is taken from [6].
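The coefficient matrix can be reproduced approximately in MATLAB with central differences (a sketch: the paper uses a finite volume discretization, so the entries may differ in detail; the ordering lets x vary fastest):

   m = 64;  h = 1/(m + 1);              % 64 interior points per direction: 4096 unknowns
   e = ones(m, 1);
   D2 = spdiags([e -2*e e], -1:1, m, m) / h^2;            % 1D second derivative
   D1 = spdiags([-e zeros(m,1) e], -1:1, m, m) / (2*h);   % 1D central first derivative
   X  = spdiags(h*(1:m)', 0, m, m);                       % diag(x_i) = diag(y_i)
   I  = speye(m);
   A  = -kron(I, D2) - kron(D2, I) ...                    % -u_xx - u_yy
        + 1000*(kron(I, X*D1) + kron(X*D1, I)) ...        % 1000 (x u_x + y u_y)
        + 10*speye(m^2);                                  % + 10 u
   f  = A * ones(m^2, 1);                                 % rhs for the all-ones solution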

As in the previous example, R̃0 is taken to be the orthogonalization of s real random vectors, no preconditioner is applied, and the iterative processes are terminated if the relative residual norm drops below 10⁻⁹. Figure 6.3 shows the convergence of three IDRstab variants and, for comparison, also of full GMRES. This figure shows that also for this problem a significant gain can be achieved by taking both s and ℓ higher. The convergence curve for IDRstab with s = 4 and ℓ = 4 closely follows the optimal convergence curve of GMRES. Table 6.2 gives, for different combinations of s and ℓ, the required number of matrix-vector multiplications to achieve the desired accuracy.

Also for this example, Bi-CGSTAB does not converge within 4000 matrix-vector multiplications. IDR(s) (i.e., IDRstab with ℓ = 1) converges badly for all values of s, which could be expected since the system matrix is almost skew-symmetric. We do remark, however, that the convergence improves significantly by taking s higher. BiCGstab(ℓ) converges after around 600 matrix-vector multiplications for all values of ℓ larger than 1. The most important conclusion is that by combining IDR and BiCGstab(ℓ), i.e., by taking both s and ℓ larger than 1, it is possible to almost close the gap with GMRES and almost obtain optimal convergence.


Fig. 6.3: 2D convection-diffusion problem: convergence for IDRstab and for GMRES

s\ℓ      1      2      4      8
 1     n.c.    625    601    625
 2     1718    464    458    458
 4     1299    434    404    404
 8      917    386    368    440

Table 6.2: 2D convection-diffusion problem: matrix-vector products for IDRstab

6.3. A 3D convection-dominated problem. The third test problem is taken from [5], and is also included in [9]. This problem was used to illustrate that Bi-CGSTAB does not work well, due to the strong nonsymmetry of the system matrix. As was shown in [9], IDR(s) also performs poorly on this problem (if the shadow space is chosen real; the bad convergence can be overcome by choosing the shadow space complex).

The test problem is the finite difference discretization of the following partial differential equation on the unit cube [0, 1] × [0, 1] × [0, 1]:

uxx + uyy + uzz + 1000ux = F.

Homogeneous Dirichlet conditions are imposed on the boundaries. The vector F is defined by the solution u(x, y, z) = exp(xyz) sin(πx) sin(πy) sin(πz). The partial differential equation is discretized using central differences for both the convection and the diffusion terms. We take 52 grid points in each direction (including the boundary points), which yields a system of 125,000 equations.

As in the previous examples, R̃0 is taken to be the orthogonalization of s real random vectors, no preconditioner is applied, and the iterative processes are terminated if the relative residual norm drops below 10⁻⁹. Figure 6.4 shows the convergence behaviour of three IDRstab variants and of GMRES.

Fig. 6.4: 3D convection-diffusion problem: convergence of IDRstab and of GMRES

The figure shows that the convergence curve of IDRstab with ℓ = 4, s = 1 (i.e., BiCGstab(4)) is close to the optimal convergence curve of full GMRES. As a result, the reduction that is obtained by increasing both ℓ and s is modest. This is also apparent from the results tabulated in Table 6.3. The BiCGstab(ℓ) variants are close to optimal; therefore only a modest improvement can be achieved by also choosing s > 1. Still, we remark that the number of matrix-vector multiplications for ℓ = 2, s = 8 is smaller than for the four BiCGstab(ℓ) variants that we investigated. Also for this example, Bi-CGSTAB does not converge within 4000 matrix-vector multiplications.

s\ℓ      1      2      4      8
 1     n.c.    321    265    321
 2     2087    266    266    266
 4     1224    264    264    284
 8      638    242    260    296

Table 6.3: 3D convection-diffusion problem: matrix-vector products for IDRstab

7. Conclusions. The goal of the research described in this paper was to unify two of the most powerful short-recurrence Krylov methods for nonsymmetric systems into one method that inherits the strong points of both. To this end we have derived the method IDRstab. We have illustrated the potential of this method with numerical experiments that show that, for certain problems, IDRstab outperforms both IDR(s) and BiCGstab(ℓ).

The quest for IDRstab has provided us with considerable new insight. In [9], it was assumed that a new IDR theorem was needed to construct an IDR method that uses higher order stabilisation polynomials. In this paper, we have shown that this is not the case: any hybrid Bi-CG method can be considered as an IDR method.

In the algorithm that we have presented in this paper, many specific choices were made. The resulting implementation is, in our experience, efficient and numerically stable. However, we have no doubt that the method can be further optimised. To this end we plan to apply ideas similar to those in [11].

Recently, we received a private communication that a group at the University of Tokyo is also developing a generalisation of IDR(s) and BiCGstab(ℓ).

Acknowledgements: We thank Peter Sonneveld for introducing us to IDR and sharing his deep insight with us.

Part of this research has been funded by the Dutch BSIK/BRICKS project.

REFERENCES

[1] Diederik R. Fokkema, Gerard L. G. Sleijpen, and Henk A. van der Vorst, Generalized conjugate gradient squared, J. Comput. Appl. Math. 71 (1996), no. 1, 125–146.

[2] Martin H. Gutknecht, Variants of BICGSTAB for matrices with complex spectrum, SIAM J. Sci. Comput. 14 (1993), no. 5, 1020–1033.

[3] Martin H. Gutknecht and Klaus J. Ressel, Look-ahead procedures for Lanczos-type product methods based on three-term Lanczos recurrences, SIAM J. Matrix Anal. Appl. 21 (2000), no. 4, 1051–1078.

[4] Klaus J. Ressel and Martin H. Gutknecht, QMR smoothing for Lanczos-type product methods based on three-term recurrences, SIAM J. Sci. Comput. 19 (1998), no. 1, 55–73.

[5] Gerard L. G. Sleijpen and Diederik R. Fokkema, BiCGstab(ℓ) for linear equations involving unsymmetric matrices with complex spectrum, Electron. Trans. Numer. Anal. 1 (1993), 11–32.

[6] Gerard L. G. Sleijpen and Henk A. van der Vorst, Maintaining convergence properties of BiCGstab methods in finite precision arithmetic, Numer. Algorithms 10 (1995), no. 3-4, 203–223.

[7] Gerard L. G. Sleijpen, Peter Sonneveld, and Martin B. van Gijzen, Bi-CGSTAB as an induced dimension reduction method, Preprint 1369, Department of Mathematics, Utrecht University, Utrecht, the Netherlands, 2008.

[8] Peter Sonneveld, CGS, a fast Lanczos-type solver for nonsymmetric linear systems, SIAM J. Sci. Statist. Comput. 10 (1989), 36–52.

[9] Peter Sonneveld and Martin B. van Gijzen, IDR(s): a family of simple and fast algorithms for solving large nonsymmetric systems of linear equations, SIAM J. Sci. Comput. 31 (2008), no. 2, 1035–1062.

[10] Henk A. van der Vorst, Bi-CGSTAB: a fast and smoothly converging variant of Bi-CG for the solution of nonsymmetric linear systems, SIAM J. Sci. Statist. Comput. 13 (1992), no. 2, 631–644.

[11] Martin B. van Gijzen and Peter Sonneveld, An elegant IDR(s) variant that efficiently exploits bi-orthogonality properties, Report 08-21, Department of Applied Mathematical Analysis, Delft University of Technology, 2008.

[12] Piet Wesseling and Peter Sonneveld, Numerical experiments with a multiple grid- and a preconditioned Lanczos type method, Lecture Notes in Mathematics, vol. 771, pp. 543–562, Springer-Verlag, Berlin, Heidelberg, New York, 1980.

[13] Man-Chung Yeung and Tony F. Chan, ML(k)BiCGSTAB: a BiCGSTAB variant based on multiple Lanczos starting vectors, SIAM J. Sci. Comput. 21 (1999), no. 4, 1263–1290.

[14] Shao-Liang Zhang, GPBi-CG: generalized product-type methods based on Bi-CG for solving nonsymmetric linear systems, SIAM J. Sci. Comput. 18 (1997), no. 2, 537–551.