DELFT UNIVERSITY OF TECHNOLOGY

REPORT 08-01

Fast and Robust Solvers for Pressure Correction in Bubbly Flow Problems

S.P. MacLachlan,

J.M. Tang,

C. Vuik

ISSN 1389-6520

Reports of the Department of Applied Mathematical Analysis Delft 2008


Copyright  2008 by Department of Applied Mathematical Analysis, Delft, The Netherlands.

No part of the Journal may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission from Department of Applied Mathematical Analysis, Delft University of Technology, The Netherlands.


Fast and Robust Solvers for Pressure Correction in Bubbly Flow Problems *

S.P. MacLachlan, J.M. Tang, C. Vuik

Delft University of Technology, J.M. Burgerscentrum, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft Institute of Applied Mathematics, Mekelweg 4, 2628 CD Delft, The Netherlands

Abstract

We consider the numerical simulation of two-phase fluid flow, where bubbles or droplets of one phase move against a background of the other phase. Such flows are governed by the Navier-Stokes equations, the solution of which may be approximated using a pressure-correction approach. Within such an approach, the computational cost is often dominated by the solution of a linear system corresponding to a discrete Poisson equation with discontinuous coefficients. In this paper, we explore the efficient solution of these linear systems using robust multilevel solvers, such as deflated variants of the preconditioned conjugate gradient method, or robust multigrid techniques. We consider these families of methods in more detail and compare their performance in the simulation of bubbly flows. Some of these methods turn out to be very effective and reduce the amount of work to solve the pressure-correction system substantially, resulting in efficient calculations for two-phase flows on highly resolved grids.

Key words: deflation, multigrid, bubbly flows, Poisson equation, discontinuous coefficients, conjugate gradient

1991 MSC: 65F10, 65F50, 65M06, 65M55, 65N22, 76T10

1. Introduction

Computation of two-phase flows and, in particular, bubbly flows is a very active research topic in computational fluid dynamics (CFD), see, for instance, [7, 12, 37, 44–47] and, more recently, [17, 23, 35, 38]. Understanding the dynamics and interaction of bubbles and droplets in a large variety of industrial processes is crucial for economically and ecologically optimized design. Two-phase flows are complicated to simulate, because the geometry of the problem typically varies with time and the fluids involved can have very different material properties. A simple example is that of air bubbles in water, where the densities vary by a factor of about 800.

Mathematically, bubbly flows are modelled using the Navier-Stokes equations, which can be approximated numerically using operator-splitting techniques. In these schemes, equations for the velocity and pressure are solved sequentially at each time step. In many popular operator-splitting methods, the pressure correction is formulated implicitly, requiring the solution of a linear system at each time step. This system takes the form of a Poisson equation with discontinuous coefficients. Solving these systems typically consumes the bulk of the computing time, even though the coefficient matrix arises from an elliptic operator.

* This research was supported by the Dutch BSIK/BRICKS project and by the European Community’s Sixth Framework Programme, through a Marie Curie International Incoming Fellowship, MIF1-CT-2006-021927.

Email addresses: S.P.MacLachlan@tudelft.nl (S.P. MacLachlan), J.M.Tang@tudelft.nl (J.M. Tang), C.Vuik@tudelft.nl (C. Vuik).

The solution of the pressure-correction equation within such operator-splitting approaches has long been recognized as a computational bottleneck in fluid flow simulation, see, e.g., [6, 17]. In the case of single-phase fluids, a common approach to overcoming this bottleneck is the use of multigrid solvers for this equation [43]. Standard geometric multigrid techniques (cf. [43]) offer optimal-scaling solution properties for the pressure-correction equations in a single-phase fluid. For multiphase fluids, however, large differences in the fluid densities can lead to dramatic deterioration in the multigrid performance. In this paper, we consider alternate approaches to solving the pressure-correction equation that do not exhibit the same sensitivity to jumps in the material properties.

Krylov subspace iterations, such as the preconditioned conjugate gradient (PCG) algorithm, offer one possibility for the efficient solution of these systems. However, traditional preconditioners, such as the incomplete Cholesky (IC) factorization [24], typically do not lead to optimal-scaling solution algorithms, even for single-phase fluids. An alternative to PCG with single-level preconditioners is a deflated variant of PCG, called DPCG, see, e.g., [29, 51]. The addition of a deflation technique serves to remove components associated with small eigenvalues that cause the slow convergence of PCG. In many applications, DPCG has been shown to be an efficient method, such as in porous-media [50] and ground-water flows [27]. Recently, DPCG has also been applied successfully to bubbly flows [40–42].

Another option for the efficient solution of the pressure-correction equation is the use of multigrid techniques that are applicable in more general settings. The black box multigrid technique, first introduced in [2], uses geometrically structured coarse grids in combination with an interpolation operator designed to account for the effects of jumps in the diffusion coefficients to achieve fast multigrid convergence in many situations [4, 8, 10]. Algebraic multigrid, or AMG, is also known to be effective for elliptic problems with jumps in their coefficients [33, 36], achieving this efficiency by tailoring both the coarse-grid structure and interpolation operator to account for the jumps in the coefficients. While both of these solvers have been applied successfully in many cases, the modelling of bubbly flows provides some unique challenges. In particular, in simulations with bubbles that appear at (or below) the finest resolution of the grid, these techniques may encounter difficulties in treating such small-scale effects through adapting the coarse-scale models. In these cases, we have found that using the above multigrid algorithms as preconditioners within PCG easily restores their optimal convergence behavior at a minimal extra cost.

Algebraically, DPCG and PCG with an MG preconditioner are strongly related to each other, see [27, 39]. In both algorithms, the same choices must be made. The fine-grid smoother or preconditioner is chosen to give effective treatment of certain modes of error. A complementary space is defined, in terms of a set of deflation vectors or the range of the multigrid interpolation operator, and an optimal correction over this space is computed using a variational approach. However, the preferred choices for these components are quite different between the two methods. While deflation techniques are typically based on a strong fine-scale preconditioner (e.g., IC) in combination with a coarse-scale correction over a very small space, multigrid techniques typically make use of a rather weak fine-scale smoother (e.g., a Jacobi or Gauss-Seidel iteration) in combination with a coarse-scale correction over a space that is a large fraction of the fine-scale problem size. Furthermore, the linear systems associated with the coarse scale are handled differently. Deflation techniques typically solve these systems using a direct or iterative method, whereas a recursive procedure is used in the multigrid approach.

In this paper, we make a detailed comparison between deflation and multigrid methods with their own typical parameters for problems of bubbly flow. While several previous papers have detailed the application of deflated PCG methods to bubbly flows [40–42], the use of advanced multigrid methods for these flow simulations has, to our knowledge, not been previously considered in the literature. As well, the systems of equations that arise in three-dimensional multiphase flows offer a good opportunity for comparison of these two families of solvers. In contrast to previous, theoretical comparisons, we focus here on evaluating each solver using its most advantageous selection of options.

The remainder of this paper is organized as follows. In Section 2, we review the solution of the Navier-Stokes equations using the standard operator-splitting approach. Section 3 presents the details of the deflation and multigrid methods considered here. Some implementational details of the two families of solvers are compared in Section 4; Section 5 presents numerical comparisons of the different solvers for three-dimensional simulations of bubbly flows in practical settings. Conclusions are given in Section 6.

2. Problem Setting

Bubbly flows are governed by the Navier-Stokes equations,

\[
\frac{\partial u}{\partial t} + u \cdot \nabla u = -\frac{1}{\rho}\nabla p + \frac{1}{\rho}\nabla \cdot \mu \left( \nabla u + (\nabla u)^{T} \right) + f, \tag{1}
\]

subject to an incompressibility constraint,

\[
\nabla \cdot u = 0, \tag{2}
\]

where u = (u, v, w)^T is the velocity vector, and ρ, p, µ and f are the density, pressure, viscosity and source vector (consisting of, for example, gravity and interface tension forces), respectively. We assume the density and viscosity are constant within each fluid. At the boundaries of the domain, we impose Dirichlet boundary conditions for the velocity.

Equations (1) and (2) are solved on an equidistant Cartesian grid in a rectangular domain using a pressure-correction method [48]. These equations are discretized using finite differences on a uniform staggered grid with n cells in each direction, where the grid points of the pressure variables are located at the cell centers, and the grid points associated with velocity components are located at the cell-face centers.

In the pressure-correction method, a tentative velocity vector, u^*, is first computed from

\[
\frac{u^{*} - u^{l}}{\Delta t} = -\nabla \cdot \left( u^{l} u^{l} \right) + \frac{1}{\rho} \nabla \cdot \mu \left( \nabla u^{*} + (\nabla u^{l})^{T} \right), \tag{3}
\]

where u^l denotes the value of u at time step l. The resulting system of equations for the unknown vector u^* is solved, for example, using the PCG method. The velocities at the new time step, l + 1, are computed from

\[
\frac{u^{l+1} - u^{*}}{\Delta t} = -\frac{1}{\rho} G p + f,
\]

under the constraint of (2). This yields

\[
\begin{cases}
\, u^{l+1} = u^{*} + \Delta t \left( -\dfrac{1}{\rho} G p + f \right), \\[2mm]
\, D u^{l+1} = 0,
\end{cases} \tag{4}
\]

where D represents the discretization of the divergence operator, and G is the discrete gradient operator. Finally, Equation (4) gives

\[
D \frac{1}{\rho} G p = D \left( \frac{1}{\Delta t}\, u^{*} + f \right), \tag{5}
\]

which is known as the pressure-correction equation and also as the Poisson equation with a discontinuous coefficient. Equation (5) can again be solved using, for example, the PCG method. It can, however, be seen that solving (5) requires significantly more computing time than the solution of (3), since the convergence of the iterative process suffers from the highly discontinuous behavior of the coefficient, ρ, but is not ameliorated by the small time step, ∆t. Further details about the pressure-correction method applied to bubbly flows can be found in [46, 47].

Due to the staggered grid, we do not have pressure points at the boundaries of the domain. Explicit pressure boundary conditions are, however, not required in the method, since, in Eq. (5), the velocity boundary conditions are naturally included in the discrete divergence operator, D. It follows implicitly that Neumann boundary conditions hold for the pressure. In this case, the pressure is a relative variable, since only differences in pressure, and not its absolute values, are meaningful in the pressure-correction method.

Eq. (5) can be written as a symmetric and positive semi-definite (SPSD) linear system,

\[
A x = b, \qquad A \in \mathbb{R}^{N \times N}, \tag{6}
\]

for N = n^3, and a singular matrix, A. The solution, x, to (6) is, then, only determined up to a constant; i.e., if x_1 is a solution of (6), then x_1 + c is also a solution for any arbitrary constant vector, c ∈ R^N. This situation presents no real difficulty for the iterative solver, as long as

\[
b \in \operatorname{Col}(A), \tag{7}
\]

see [19]. It appears that Eq. (7) is always satisfied for our linear system (6), see [40] for more details. Here, we consider two-phase bubbly flows with, for instance, air (a low-density phase) and water (a high-density phase). In this case, ρ is piecewise constant with a contrast, ε, which is the ratio of the two densities, i.e.,

\[
\rho = \begin{cases} \rho_0 = 1, & x \in \Lambda_0, \\ \rho_1 = \varepsilon, & x \in \Lambda_1, \end{cases}
\]

where Λ_1 is the volume occupied by the low-density phase, namely the air bubbles in the domain, Ω, and Λ_0 is the volume occupied by the high-density phase, namely the fluid domain around these bubbles.
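To make the structure of A in (6) concrete, the sketch below assembles a one-dimensional analogue of D(1/ρ)G on a cell-centred grid. The function name, the harmonic averaging of ρ at cell faces, and the pure Neumann (no-flux) boundary treatment are illustrative assumptions rather than a description of the exact staggered-grid discretization used in the paper; the resulting matrix is nonetheless symmetric with zero row sums, so it is singular with the constant vector in its null space, as discussed above.

```python
import numpy as np
import scipy.sparse as sp

def pressure_matrix_1d(rho, h):
    """1-D sketch of A in (6): cell-centred discretization of -d/dx((1/rho) dp/dx)
    with Neumann (no-flux) boundaries; rho holds one density value per cell."""
    n = rho.size
    rho_face = 2.0 * rho[:-1] * rho[1:] / (rho[:-1] + rho[1:])  # harmonic average at interior faces
    w = 1.0 / (rho_face * h**2)                                 # face coefficients
    diag = np.zeros(n)
    diag[:-1] += w                                              # right-face contribution to each row
    diag[1:] += w                                               # left-face contribution to each row
    return sp.diags([-w, diag, -w], offsets=[-1, 0, 1], format="csr")

# Example: unit-density fluid with a small low-density "bubble" of contrast eps = 1e-3.
rho = np.ones(64)
rho[28:36] = 1e-3
A = pressure_matrix_1d(rho, h=1.0 / 64)
```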

3. Numerical Methods for the Pressure-Correction Equation

Solving (6) is complicated by the number of points necessary to accurately capture the evolving flow. Whether finite difference, finite element, or finite volume techniques are used for the discrete gradient and divergence operators in (5), the resulting matrix will be sparse, but with a bandwidth of n^2, in lexicographic ordering. For a discretization on a cube with n = 100 grid points in each direction, this means that the matrix is of dimension N = 10^6, with bandwidth n^2 = 10^4. It is well-known that direct solution techniques without reordering require a number of operations that scale as n^7 for a banded Cholesky decomposition, O(10^14) operations in the example above. Thus, here, we consider the solution of (6) using iterative techniques.

The solution of (6) using iterative methods is, itself, complicated both by the matrix size and its condition number. Using a regular, Cartesian-product grid with n grid points in each direction guarantees that the matrix, of dimension N = n^3, also has O(N) non-zero entries. Thus, the cost of a single matrix-vector product is proportional to the number of degrees of freedom in the system and, as a consequence, the cost per iteration of most classical iterative techniques will also be O(N). The number of iterations required by such a technique to produce a converged solution, however, is dependent on the condition number of A. For any of the standard discretizations, κ(A) depends in the usual way on the problem size, κ(A) ∝ n^2.

For multiphase flow, however, this simple bound is compounded by variations in the material parameters between the different phases. For example, given a density contrast, ε ≤ 1, κ(A) ∝ 1/ε. Thus, solution of (5) is more complicated than solution of the corresponding homogeneous Poisson equation.

3.1. Conjugate Gradient and Preconditioning

Because of the symmetry and (semi-)definiteness of the matrix, A, a natural choice for the iterative solution of (6) is the conjugate gradient (CG) algorithm. However, because of the ill-conditioning of A, we expect CG to converge quite slowly. Indeed, using the standard bound [15],

\[
\| x^{(L)} - x \|_A \;\le\; 2 \left( \frac{\sqrt{\kappa(A)} - 1}{\sqrt{\kappa(A)} + 1} \right)^{L} \| x^{(0)} - x \|_A, \tag{8}
\]

where x^{(L)} denotes the approximation to x, the solution of (6), after L steps of CG, x^{(0)} denotes the initial approximation, and κ(A) is the condition number of A, we see that the expected number of iterations required to achieve a fixed tolerance grows with κ(A).
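As a rough illustration of this dependence, the bound (8) can be inverted to estimate the number of CG iterations needed for a given relative reduction δ; the helper below is hypothetical and its numbers are indicative only, since (8) is an upper bound that ignores any clustering of the spectrum.

```python
import numpy as np

def cg_iteration_estimate(kappa, delta=1e-8):
    # Smallest L for which 2 * ((sqrt(kappa) - 1) / (sqrt(kappa) + 1))**L <= delta,
    # i.e., the iteration count suggested by the bound (8).
    rate = (np.sqrt(kappa) - 1.0) / (np.sqrt(kappa) + 1.0)
    return int(np.ceil(np.log(delta / 2.0) / np.log(rate)))

# For kappa = 1e6, roughly 9.6e3 iterations are predicted; increasing kappa by a
# factor of 100 (e.g., by decreasing the contrast) increases this by a factor of 10.
print(cg_iteration_estimate(1e6))
```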

For this reason, CG is rarely applied directly to (6) but, rather, is typically applied in preconditioned form, which implicitly uses the matrix M^{-1}A in place of A, for some SPD matrix, M, such that M^{-1}r is easily computed for an arbitrary residual, r. The incomplete Cholesky factorization of A [24] is one commonly-used preconditioner, which is known to greatly improve the performance of the CG iteration for moderate matrix sizes, at relatively little additional cost. Incomplete Cholesky preconditioned CG, or IC-CG, is not, however, an ideal iteration, as is well-known for the single-phase flow case, where the number of iterations required to solve (6) still grows with the problem size. It has also been observed that the performance of IC-CG diminishes under variation of some of the other problem parameters; see results in Section 5.

3.2. Deflation Approaches

Writing the linear system that is solved in IC-CG as

\[
M^{-1} A x = M^{-1} b,
\]

where M denotes the incomplete Cholesky preconditioner, we could try to improve the performance of the IC-CG method by including a second operation in the preconditioner,

\[
M^{-1} P A \tilde{x} = M^{-1} P b, \tag{9}
\]

for some operator, P. The choice of P should then, as in the original choice of M, be done to complement the spectrum of M^{-1}A. Deflation methods [29] choose P based on a two-grid projection approach that is easy to implement within an existing preconditioned CG code.

In many applications, including the one considered here, it is the subspace associated with the small eigenvalues of A, and of M^{-1}A, that causes slow convergence of the conjugate gradient iteration. If this subspace is small, and if an approximate basis for this subspace is known explicitly, then P could take the form of an inexpensive projection of components in this subspace out of an approximation or a residual. Writing the matrix, Z, as the matrix of basis vectors,

\[
Z := [\, z_1 \; z_2 \; \cdots \; z_K \,], \tag{10}
\]

for vectors z_i, i = 1, ..., K, and choosing P = I − AZE^{-1}Z^T, with E = Z^TAZ, gives P as the orthogonal projection onto the complement of the range of Z in the A-inner product. With this choice of P, we can view the system given in (9) as a preconditioned system for the solution of the singular system, PAx̃ = Pb. Because PA is singular, we must postprocess the solution, x̃, of (9) to get the solution of (6). Given Z, E, and P defined as above, it is easy to see that the solution, x, to (6) can be expressed as

\[
x = Z E^{-1} Z^T b + P^T \tilde{x},
\]

see, e.g., [27].

If the subspace associated with the small eigenvalues is not known exactly, or the subspace is too large to efficiently treat exactly, deflation techniques based on other vectors may be more effective. Approximate eigenvectors [30, 34], vectors based on Krylov subspace recycling processes [21, 31], or vectors based on domain decomposition [22, 29] may all be used to define effective deflation approaches. Here, we consider vectors based on domain decomposition. In particular, we consider the division of each axis of the (discrete) domain into k non-overlapping equi-sized “super-cells”, and define K = k^3 subdomains using the tensor-product grid generated by these intervals. With each subdomain, we associate a single deflation vector, z_i, which takes the value 1 at each grid point included in subdomain i, and value 0 at all other grid points. With this choice for the deflation vectors, the deflation matrix, Z, as defined in (10), is quite sparse, and computations involving Z and P may be performed relatively efficiently. The subdomain vectors give good approximations to a large subspace associated with the small eigenvalues of the preconditioned system, M^{-1}A [41].
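A minimal sketch of this construction is given below, assuming an n × n × n grid with lexicographic ordering and n divisible by k; the function names are illustrative, and the dense direct solve with E is for exposition only (the paper instead solves these coarse systems with an inner IC-CG iteration, and Section 4.3 discusses the treatment of a singular E).

```python
import numpy as np
import scipy.sparse as sp

def subdomain_deflation_matrix(n, k):
    """Piecewise-constant deflation vectors on a k x k x k partition of an n^3 grid:
    Z[i, j] = 1 if unknown i lies in subdomain j, and 0 otherwise (cf. (10))."""
    N = n**3
    ix, iy, iz = np.unravel_index(np.arange(N), (n, n, n))   # grid coordinates of each unknown
    m = n // k                                               # grid cells per subdomain, per direction
    sub = np.ravel_multi_index((ix // m, iy // m, iz // m), (k, k, k))
    return sp.csr_matrix((np.ones(N), (np.arange(N), sub)), shape=(N, k**3))

def apply_deflation(A, Z, r):
    """P r = r - A Z E^{-1} Z^T r with E = Z^T A Z.
    Dense direct solve for illustration; a pseudo-inverse is needed when E is singular."""
    E = (Z.T @ A @ Z).toarray()
    y = np.linalg.solve(E, Z.T @ r)
    return r - A @ (Z @ y)
```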

If K becomes large, the coarse linear systems involving E become more costly to solve and, in particular, use of standard sparse direct solvers may be inefficient. Instead, an iterative solver can be adopted to deal with these coarse linear systems. In this paper, we use a variant of Deflated IC-CG, or DIC-CG, where the coarse linear systems are, themselves, solved using IC-CG [40]. Of course, as K increases, even standard iterative solution of the coarse-level systems can become quite expensive, and deflation-like techniques should also be incorporated into the coarse-level IC-CG algorithm, leading to a recursive multi-level deflation method [11]. This is, however, not considered in the results presented here.


3.3. Multigrid Approaches

Since the equation for the pressure closely resembles a diffusion equation, another class of techniques to consider for the solution of (5) is the family of multigrid methods. In the case of single-phase flow, in particular, the equation for pressure reduces to the case of a simple Laplace equation, for which geometric multigrid methods are known to provide optimal solution techniques [43]. For multiphase fluids, more complicated multigrid techniques are necessary to achieve optimal performance; such techniques are well-known in other fields. Here, we present the details of these methods, and their specialization to solving (5).

Just as deflation methods use a correction over a small subspace to account for the deficiencies of a classical preconditioner, all multigrid methods combine a fine-scale iteration (or relaxation) with a coarse-scale (or coarse-grid) correction process that is aimed at correcting the modes that the fine-scale iteration is slow to resolve. Different multigrid methods differ in the choices made in these two processes, particularly in the details of how the coarse-scale space is chosen and how that correction is computed. Here, we will concentrate on multigrid methods that make use of simple, pointwise smoothers, such as the Jacobi or Gauss-Seidel iterations, and consider four different choices for the coarse-grid correction procedure. A brief introduction to classical multigrid techniques is presented in the following subsection; for more details, consult, e.g., [43].

3.3.1. Geometric Multigrid

For the case of single-phase flows discretized on a regular grid, the use of geometric multigrid techniques has long been studied (cf. [43]). These methods are based on the realization that, while simple iterations, such as weighted-Jacobi and Gauss-Seidel, do not efficiently resolve all modes of the solution, they do quickly and effectively damp a large subspace of errors for a large class of matrices, including those considered here. In particular, for single-phase flow (constant density), the errors that are not quickly damped by these simple iterations can be seen to be dominated by so-called “smooth” error modes; errors that vary slowly between neighboring grid points. It is, thus, the job of the coarse-grid correction process to attenuate exactly these modes.

An important difference between the coarse-grid correction processes in multigrid and deflation techniques is the size of the subspace identified for coarse-grid correction. While deflation aims for a correction over a much smaller subspace than the fine-scale problem size, the size of the coarse-grid problem in a multigrid method is specifically chosen to be a relatively large fraction of the fine-grid size; typical coarsening rates are by a factor of 2 or 3 in each dimension. Such slow coarsening is justified by considering the convergence behavior of the complementary stationary iteration; for any fixed reduction tolerance, the number of error modes which are reduced in magnitude by a factor larger than that tolerance (i.e., the number of slowly converging modes) increases with the size of the fine-grid matrix. Thus, to achieve convergence that is truly independent of problem size using classical stationary iterations requires that the size of the coarse-grid problem always remains a fixed fraction of the fine-grid problem size.

Here, we consider a geometric multigrid approach using Gauss-Seidel relaxation. For maximum efficiency, the algorithm is specialized to grids with the number of grid points in each dimension of the form p·2^q, where p is a small integer (typically, p = 1 or 3), so that the coarse grid may be chosen by reducing the grid size by a factor of 2 in each direction at all levels. Interpolation is trilinear cell-centered interpolation, while restriction is taken as cell-centered piece-wise constant restriction [25]. Relaxation and residuals on the coarse grid are realized using direct discretization of the fine-scale homogeneous problem on the coarse grid. This is possible because of the assumption of constant density for the single-phase (incompressible) flow; for the two-phase flows considered here, more sophisticated techniques, described below, are needed to discretize, on the coarse scale, a fine-scale flow that may have phase boundaries that cannot be represented on the coarse grid.
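For concreteness, the following is a minimal one-dimensional V-cycle sketch in the spirit of this algorithm. It uses a vertex-centred grid with zero Dirichlet boundaries, weighted-Jacobi smoothing, full-weighting restriction, and linear interpolation (simpler components than the cell-centred, three-dimensional, Gauss-Seidel-based scheme described above), and it rediscretizes the constant-coefficient operator on each coarser grid.

```python
import numpy as np

def residual(x, b, h):
    # r = b - A x for the 3-point stencil (-1, 2, -1)/h^2 with zero Dirichlet boundaries;
    # x and b store interior values only.
    xp = np.concatenate(([0.0], x, [0.0]))
    return b - (2.0 * xp[1:-1] - xp[:-2] - xp[2:]) / h**2

def smooth(x, b, h, sweeps, omega=2.0 / 3.0):
    # Weighted-Jacobi smoothing (the diagonal of A is 2/h^2).
    for _ in range(sweeps):
        x = x + omega * (h**2 / 2.0) * residual(x, b, h)
    return x

def v_cycle(x, b, h, nu=1):
    # One V(nu,nu) cycle for -u'' = f on a grid with 2^q intervals (x has 2^q - 1 entries).
    if x.size <= 3:
        return smooth(x, b, h, sweeps=50)                     # approximate coarsest-grid solve
    x = smooth(x, b, h, nu)                                   # pre-smoothing
    r = residual(x, b, h)
    rc = 0.25 * (r[0:-2:2] + 2.0 * r[1:-1:2] + r[2::2])       # full-weighting restriction
    ec = v_cycle(np.zeros_like(rc), rc, 2.0 * h, nu)          # recursive coarse-grid correction
    e = np.zeros_like(x)
    e[1::2] = ec                                              # coarse values injected at coarse points
    ecp = np.concatenate(([0.0], ec, [0.0]))
    e[0::2] = 0.5 * (ecp[:-1] + ecp[1:])                      # linear interpolation in between
    return smooth(x + e, b, h, nu)                            # post-smoothing
```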

While geometric multigrid techniques often yield the fastest solvers for the constant-coefficient version of (5), several complications arise in extending these techniques to the case of variable density. The smoothing properties of simple relaxation techniques, such as Jacobi or Gauss-Seidel, are highly dependent on the density; in the case of non-constant density, the dominant errors after relaxation may exhibit sharp transitions and/or cusps, which must be accounted for in the coarsening process. Furthermore, if the variations in density have fine-scale features (as we expect for bubbly flows), it may not be clear how best to represent the equations on the coarser grid, as needed in the multigrid process. In the following subsections, we discuss several approaches for overcoming these obstacles.

Another approach to overcome these complications is to use the geometric multigrid method as a preconditioner for the conjugate gradient iteration. For a problem with bounded density, ρ_min ≤ ρ ≤ ρ_max, we expect

\[
\frac{1}{\rho_{\max}}\, x^T A_1 x \;\le\; x^T A_\rho x \;\le\; \frac{1}{\rho_{\min}}\, x^T A_1 x,
\]

where A_ρ is the discretization of (5) with density ρ, and A_1 is the discretization of the constant-coefficient Poisson problem. Thus, κ(A_1^{-1} A_ρ) ≤ ρ_max/ρ_min, where we ignore the common null spaces of A_1 and A_ρ. This means that, for small density contrasts, the constant-coefficient and variable-coefficient problems are spectrally equivalent, with a small spectral-equivalence bound. A good solver, such as geometric multigrid, for the constant-coefficient case is then expected to make a good preconditioner for the variable-density case, so long as the density contrast is not too significant. Results for this approach are reported in Section 5 as method GMG-CG. All of the other multigrid approaches that we develop here may be applied both as a standalone solver and as a preconditioner for CG. In what follows, we discuss only the case of these techniques being used as standalone solvers; see Section 3.3.5.

3.3.2. Galerkin Coarsening

While GMG-CG (and geometric multigrid in general) performs well when the density contrast is small, its performance suffers greatly when problems with large density contrasts are considered (as shown in Section 5). Improving the multigrid performance requires improvement in one (or both) of the multigrid components, relaxation or coarse-grid correction. In GMG-CG, however, direct relaxation on (5) is replaced by relaxation on the constant-coefficient problem as a preconditioner for the variable-density problem of interest. Thus, a first step in improving the performance of GMG-CG would be to replace the relaxation on the homogeneous problem with that on the real problem of interest.

Making this improvement on the fine scale is simple to implement; both the constant-density and variable-density problems are well-defined on the fine scale, and it is relatively simple to replace relaxation on one with relaxation on the other. On coarse scales, however, we have no direct representation of the fine-scale problem, unless it is possible to represent the variation in the density naturally on the coarse scale. For many flows of interest, this will clearly not be the case for all coarse scales in a multigrid hierarchy. Thus, we need some indirect way to account for fine-scale variations in the density directly in relaxation on the coarse scales.

There are many possible ways to create a coarse-scale model of a fine-scale density distribution; the problem of numerical homogenization, or upscaling, has been studied in many disciplines [52]. While these techniques focus on defining an effective density coefficient that can be represented naturally on the coarse scale alone, we will instead focus on the multigrid point of view that the primary purpose of relaxation on the coarse scales is not to represent the flow on those scales but, rather, to compute an appropriate correction to the errors in the fluid pressures on the fine scale.

Using the coarse-grid models to improve a fine-grid approximation to the pressure naturally leads to the question of how good a correction is possible from a coarse grid. Mathematically, we consider a fixed coarse grid and interpolation operator, Z, that maps from the coarse grid to the fine grid. Asking for the best possible correction from the coarse grid, i.e., the best correction in the range of Z, means that we wish to minimize some norm of the error, e, in our approximation, x^{(L)}, to the discrete pressure, x, that satisfies (6). Writing the corrected approximation as x̂^{(L)} = x^{(L)} + Zy, for some vector y, this means that we wish to minimize

\[
\| \hat{e} \| = \| x - (x^{(L)} + Zy) \| = \| e - Zy \|.
\]

Both the minimum value and the coarse-grid vector, y, for which the minimum is achieved depend strongly on the norm chosen for the minimization. Choosing the A-norm, ||w||_A^2 = w^T A w, implies that the optimal choice for y satisfies

\[
(Z^T A Z)\, y = Z^T \bigl( b - A x^{(L)} \bigr).
\]

That is, the best possible coarse-grid correction for a fixed choice of the multigrid interpolation operator, Z, may be expressed in terms of a coarse-grid matrix, E = Z^T A Z, and restriction of the fine-grid residual, b − Ax^{(L)}, using Z^T as the restriction map. This choice of coarse-grid and restriction operators is known in the multigrid literature as Galerkin coarsening [28, 43] because of its close relationship to Galerkin finite elements.
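The normal equations above follow from a one-line calculation: expanding the A-norm of the corrected error and setting its gradient with respect to y to zero gives

\[
\| e - Zy \|_A^2 = e^T A e - 2\, y^T Z^T A e + y^T Z^T A Z\, y,
\qquad
\nabla_y = 0 \;\Longrightarrow\; (Z^T A Z)\, y = Z^T A e = Z^T \bigl( b - A x^{(L)} \bigr),
\]

where the last equality uses Ae = A(x − x^{(L)}) = b − Ax^{(L)}.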

A first generalization of GMG-CG is then to consider the multigrid method with interpolation, Z, given as in GMG-CG, but with relaxation on all levels replaced by Gauss-Seidel relaxation on the finest-grid matrix and its Galerkin restrictions. This technique, which we will denote by CCMG, was first proposed for problems similar to (5) in [53] and later studied in [20, 49].

3.3.3. The Black Box Multigrid Method

While CCMG offers a great improvement over GMG-CG in terms of its scalability, its performance still degrades as the density contrast increases (see, for example, Table 1 in Section 5). As CCMG arose through improvements to the relaxation phase in GMG-CG, we now consider the role of interpolation in the performance of CCMG. Of particular importance in achieving consistent multigrid performance regardless of the density contrast or configuration of the flow is the principle of complementarity; as multigrid methods aim to reduce errors through two distinct processes, relaxation and coarse-grid correction, optimal multigrid performance can only occur when these processes are appropriately complementary.

While we could aim to improve the performance of CCMG by making further improvements to the relaxation routine, such improvements often dramatically increase the cost of the iteration. Instead, we aim to improve the multigrid performance by making a different choice for the interpolation operator, Z, to better complement the performance of lexicographical Gauss-Seidel relaxation. It has long been recognized that for problems with discontinuous coefficients, such as (5), the errors left after relaxation are not smooth, as in the case of constant-coefficient problems [2]. Thus, while coarse-grid corrections with fixed interpolation operators, such as those analyzed in [25] and discussed above, may be used to effectively complement relaxation for constant-coefficient problems, they are less appropriate when the problem contains large jumps in its coefficients.

The solution to the problem with large jumps in coefficients, first discussed in [2] and further developed in [8, 9], is to allow the coefficients of the interpolation operator, Z, to depend on the coefficients of the matrix, A. Such an operator-induced interpolation is better able to reflect the slow-to-converge errors of relaxation as these errors are, themselves, dependent on the variation in A. The technique of the Black Box Multigrid Method, or BoxMG [8], defines the coefficients of interpolation to a fine-grid point by combining the entries of the matrix in the rows corresponding to the grid point and its graph neighborhood.

In three dimensions, the BoxMG algorithm assumes that the fine-grid matrix comes from the discretization of an equation such as (5) on a logically rectangular grid [9]. The fine-grid operator is then assumed to have at most a 27-point connectivity structure; each grid point may be connected only to grid points residing in cells neighboring (possibly only at corners) the cell in which it lies. The coarse grid is constructed by removing every other plane of grid points in each direction, in contrast to the coarsening used in GMG and CCMG, where finite volumes were aggregated in pairs in each direction. Interpolation in BoxMG then falls into four categories: fine-grid points may themselves be coarse-grid points (as the coarse grid is embedded in the fine grid), they may lie on the line segment connecting two coarse-grid points, they may lie in the same plane as four coarse-grid points, at the center of the square defined by these grid points, or they may lie in a plane with no coarse-grid points, at the center of the cube defined by 8 coarse-grid points.

Interpolation of corrections to embedded coarse-grid points is always done using injection, as the errors at these grid points are directly represented on the coarse grid. For fine-grid points lying between two coarse-grid points, the interpolation weights are defined by first adding the matrix entries in the row of A corresponding to the grid point along the planes orthogonal to the connecting line, collapsing the 27-point stencil to a 3-point stencil joining these three grid points. The correction to the grid point is then computed by setting this three-point stencil to zero, substituting in the injected corrections at the coarse-grid points and solving for the correction at the fine-grid point. A similar approach is used for fine-grid points lying at the center of a square in the plane of four coarse-grid points; however, first the four other neighboring grid points in that plane are resolved using the first approach. In this way, the 27-point stencil need only be collapsed to a 9-point stencil, which can be treated using the previously computed corrections as in the 3-point stencil case. Finally, the correction to the grid point at the center of the cube defined by 8 coarse-grid points may be calculated directly, by satisfying the 27-point stencil at this grid point using the corrections computed at all of its neighbors [4].
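The edge case of this rule can be sketched as follows, assuming the 27-point stencil at a fine-grid point is available as a 3 × 3 × 3 array; the function names are illustrative, and boundary handling, the face- and cell-centre cases, and the exact conventions of the BoxMG code are omitted.

```python
import numpy as np

def collapse_to_line(stencil, axis):
    # Sum the 27-point stencil over the two planes orthogonal to `axis`,
    # collapsing it to a 3-point stencil along that axis.
    other_axes = tuple(a for a in range(3) if a != axis)
    return stencil.sum(axis=other_axes)

def edge_interpolation_weights(stencil, axis):
    # For a fine point lying between two coarse points along `axis`: set the collapsed
    # stencil applied to the error to zero, substitute the (injected) coarse-grid
    # corrections, and solve for the fine-grid correction.
    s = collapse_to_line(stencil, axis)       # s[0]*e_left + s[1]*e_centre + s[2]*e_right = 0
    return -s[0] / s[1], -s[2] / s[1]         # weights applied to the two coarse-grid corrections
```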

The coarse-grid matrix in BoxMG is again defined using a Galerkin coarsening, with Z defined as described above. It can be verified that, if the fine-grid stencil has its non-zero connections confined to within a 27-point stencil pattern, then the coarse-grid operator also has 27-point connectivity. Thus, the BoxMG method may be applied recursively to define a full multigrid hierarchy. While there are many ways to use knowledge of the discrete operator, or of the density distribution itself, to define the multigrid hierarchy, the approach taken in BoxMG has been shown to be successful for a wide variety of problems. In [26], it is shown that one of the reasons for the success of BoxMG is that defining interpolation in this way approximately preserves the continuity of the normal flux, (1/ρ)∇p · n, across an interface with a jump in the density. Thus, BoxMG can be thought of as combining effective multigrid principles with useful physical insight in achieving a stable and efficient solution algorithm.

3.3.4. Algebraic Multigrid

With the early papers on BoxMG [2, 4, 8], it was recognized that the combination of operator-induced interpolation and Galerkin coarsening can lead to very robust methods for a wide class of problems. The algebraic multigrid method [5,33], or AMG, is an algorithm based on a different implementation of the same principles, but which can be applied to an even larger class of problems. In particular, AMG can be applied to problems without a regular grid structure and allows for the choice of unstructured coarse grids regardless of the fine-grid operator structure.

The central idea of AMG is that all components of the coarse-grid correction cycle should be determined by the properties of the fine-grid operator. The first step in coarsening is then to determine whether two grid points that are connected in the fine-grid operator (grid points i and j are said to be connected if a_ij ≠ 0) are connected in a significant way. Each grid point, i, is said to strongly depend on any of its neighboring grid points for which a_ij is of similar size as the largest entry in row i. For M-matrices, such as the natural discretization of (5), we define the set of grid points that i strongly depends upon as

\[
S_i = \left\{ \, j \, : \, -a_{ij} \ge \theta \cdot \max_{J \ne i} \{ -a_{iJ} \} \, \right\},
\]

for some suitable θ, 0 < θ ≤ 1. Once these strong connections have been identified, a coarse grid is formed by taking a maximal independent set of the graph created by the set of edges, a_ij, where j ∈ S_i.
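A direct transcription of this strength-of-dependence test is sketched below for a matrix stored in compressed sparse row format; the function name and the default θ = 0.25 (a common choice in classical AMG) are illustrative, and the maximal-independent-set coarsening itself is not shown.

```python
import numpy as np
import scipy.sparse as sp

def strong_dependencies(A, theta=0.25):
    """For an M-matrix A, return for each row i the set S_i of columns j with
    -a_ij >= theta * max_{J != i} (-a_iJ)."""
    A = sp.csr_matrix(A)
    S = []
    for i in range(A.shape[0]):
        cols = A.indices[A.indptr[i]:A.indptr[i + 1]]
        vals = A.data[A.indptr[i]:A.indptr[i + 1]]
        off = cols != i                                  # ignore the diagonal entry
        if not off.any():
            S.append(set())
            continue
        threshold = theta * (-vals[off]).max()
        S.append({int(j) for j, a in zip(cols[off], vals[off]) if -a >= threshold})
    return S
```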

To define interpolation in AMG, a similar strategy is used to collapse connections between grid points that appear only on the fine grid and define an interpolation operator. Choosing the coarse grid through the maximal independent subset algorithm described above implies that the coarse-grid points are embedded in the fine grid. For any fine-grid point, i, that is not also a coarse-grid point, interpolation can be defined by collapsing the connections from i to other fine-grid points, j, based on their common coarse-grid neighbors. Unlike BoxMG, this is done without using any intuition into the couplings involved; the elimination of these fine-fine connections is a purely algebraic operation. Each fine-fine connection is replaced using a weighted average of the coefficients connecting the fine-grid point, j, to the coarse-grid neighbors of grid point i; see [33, 36] for details.

Because both the choice of the coarse grid and of the interpolation operator in AMG are determined based on the fine-grid operator, we expect AMG to have convergence properties similar to, or possibly even better than, those of BoxMG. However, the price paid in AMG for this robustness is the use of completely unstructured matrix and vector data structures, as a result of the unstructured grid hierarchy. Additionally, the cost of the additional operations to compute the coarse grid (and, in fact, a more expensive computation of the interpolation weights) makes AMG an expensive alternative in situations where BoxMG (or CCMG or GMG) is expected to perform well. Nevertheless, AMG is often the method of choice in commercial codes where robustness is considered more important than achieving the fastest possible solution time. Indeed, in CFD, AMG solvers have been recognized as an important tool for achieving efficient solution in a wide variety of flow regimes [32].

3.3.5. On the Need for Preconditioning

While the multigrid methods discussed here are typically considered as standalone solvers, it is sometimes useful to also consider them as preconditioners for CG. Both BoxMG and AMG aim to directly treat the fine-scale structure of the density field through the use of operator-dependent interpolation algorithms. It is possible, however, that operator-dependent interpolation alone is not sufficient to yield an optimal solution algorithm in all situations.

Consider a case with contrast ε, where a small bubble with density ε is embedded in a background medium with unit density. In both AMG and BoxMG, interpolation weights near the boundary of the bubble are determined to be proportional to the matrix entries in that region. Because the matrix comes from a discretization of (5), these matrix entries will themselves be proportional to the inverses of the densities; some matrix elements will have size O(1/ε) and some will have size O(1). For small ε, both algorithms typically give small interpolation weights from a coarse-grid point that is located outside of a bubble to a fine-grid point inside of the bubble. Physically, this also makes sense. Across the bubble boundary, we expect to see continuity of both the pressure, p, and its normal flux, (1/ρ)∇p · n. When there is a large jump in ρ between the bubble and the background, we expect to also see a large jump in ∇p across the bubble’s edge, which makes interpolation across this edge difficult, particularly in the direction of high density (small ∇p) to low density (large ∇p).

Following this reasoning, we see that it is quite natural for AMG and BoxMG to not interpolate significantly across bubble boundaries. A natural requirement from this point of view would be to require, for each fine-scale bubble, a sufficient number of coarse-grid points to lie within the bubble, to allow for an accurate computation of a coarse-grid correction for all grid points within the bubble. In a real-life simulation, however, this may not be practical, due to bubbles and droplets that may only be resolved at the size of a single grid point on the fine scale. In Table 3, below, we see that the number of iterations for both BoxMG and AMG without the use of a Krylov wrapper increases as the number of bubbles increases. While such an increase is not dramatic, it can easily be attenuated by the use of these multigrid methods as preconditioners for CG; in this case, the multigrid method gives good convergence for almost all types of errors, and the CG acceleration effectively resolves the few error modes associated with these small bubbles. In Section 5, we denote the experiments with CCMG, BoxMG, and AMG preconditioners for CG by CCMG-CG, BoxMG-CG, and AMG-CG, respectively.

4. Implementation and Computational Cost

In the numerical experiments in Section 5, we make use of standard implementations of the methods discussed above, when available. In the following subsections, we discuss the relative costs of these techniques, as well as their scalability for large problem sizes. Relative to CG by itself, or even to IC-CG, all of the other methods considered here have a larger cost per iteration. Their utility lies in the significant reduction in iterations possible using a multilevel technique as compared to a single-level method, such as IC-CG. Here, we stress the details of the relative costs of a single cycle of these algorithms, as a prelude to the numerical results in Section 5.

4.1. Cost of Deflation

The computational cost of the deflation method is discussed in [40]. The setup of the deflation method is rather cheap, since Z may be constructed independently of the problem matrix. Furthermore, it is not necessary to store Z explicitly in memory; AZ and E may be computed beforehand. In the 3D case, construction of AZ and E can each be done in O(n^2 k) flops, assuming that E has dimensions K × K, for K = k^3.

Moreover, DIC-CG needs only one more step than IC-CG at each iteration. The additional cost for the deflation step is O(N + θ) flops, where θ is the number of flops required for the inner solves involving E. Using IC-CG as the inner iterative solver, each inner iteration costs O(K) flops, and at most O(k) iterations are required to achieve sufficient accuracy in the inner solve (based on a simple condition-number bound and (8)), leading to θ = O(k^4) operations. Note that, because AZ can be precomputed and is much sparser than A, there are no additional matrix-vector multiplications with A required at each iteration of DIC-CG.

While, in principle, DIC-CG may be applied in an unstructured manner, the implementation considered here is based on the assumption of a rectangular tensor-product grid. This allows significant savings in both the storage and the computational cost required by the iteration, as structured matrix data structures may be used in place of the more general (and more costly) storage required by an unstructured implementation. In this sense, the DIC-CG methods tested here are more comparable to a geometric multigrid approach than to AMG, although the DIC-CG algorithm, in principle, could be applied in the same unstructured settings as AMG.

It is also important to note that the implementation of DIC-CG is slightly different from that of the other methods considered here. While all other methods are realized as PCG iterations for solving Ax = b with various choices of preconditioners, DIC-CG is implemented as a preconditioned iteration for the deflated system, as in (9). This means that the stopping criterion within the DIC-CG iteration is based on the deflated residual, Pb − PAx̃, and not on the original residual, b − Ax. In exact arithmetic, Pb − PAx̃ = b − Ax and, so, this has no effect on the applicability of DIC-CG. It does, however, have some effect on the overall accuracy, as some additional numerical error is introduced in the post-processing of the solution generated by DIC-CG to get the solution to the original linear system, Ax = b.

4.2. Cost of Multigrid

The relative costs of the multigrid methods studied here, in principle, increase with the complexity of the algorithm. Geometric multigrid can easily be implemented in a very efficient manner. In fact, because GMG-CG uses relaxation only on the homogeneous problem, it may be implemented in a fully matrix-free manner. The same stencil is applied everywhere on each level, up to boundary conditions, and the simple transfer operators can again be implemented with constant stencils away from the boundaries. Thus, the true computational cost of a single iteration of GMG-CG is much smaller than that of a method with a similar number of operations, because of the optimization possible under the assumption of constant coefficients in the operator.

While multigrid methods use much finer coarse grids than typical deflation methods, their recursive treatment of these grids leads to an overall cost per iteration that remains O(N). Consider, for example, the cost of a single GMG V(1,1) cycle (that is, the cost of a single preconditioning step in GMG-CG). On each level of the multigrid hierarchy, 2 relaxation sweeps are performed at the appropriate resolution. On every grid, the cost of these relaxation sweeps is directly proportional to the size of that grid. Thus, O(N) operations are required for each sweep on the finest grid, O(N/8) operations are required for each sweep on the first coarse grid (which has size N/8), and O(N/64) operations per sweep are required on the next coarsest grid, etc. Overall, the total number of operations required to perform two relaxation sweeps on all levels is then bounded by 16/7 times the cost of a single sweep of relaxation on the finest level, which is O(N). Similarly, the additional storage requirements for GMG-CG can also be bounded by a small constant times the fine-scale, O(N) storage requirement for CG.
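Explicitly, with two sweeps per level and each coarser grid one eighth the size of the previous one, the total work per cycle is bounded by the geometric series

\[
2N \left( 1 + \tfrac{1}{8} + \tfrac{1}{64} + \cdots \right) = \frac{2N}{1 - 1/8} = \frac{16}{7}\, N,
\]

i.e., less than two and a half fine-grid relaxation sweeps per V-cycle.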

CCMG adds the cost of structured matrix storage and operations on all levels, as well as that of the Galerkin product in the setup stage of the algorithm. For the numerical results presented in Section 5, we use 64-point interpolation and restriction stencils, corresponding to trilinear interpolation and its adjoint for restriction. This results in some growth in the stencil size on coarser grids, but this growth can be easily quantified and still included in a structured-grid matrix data structure. Alternately, lower-order interpolation and restriction may be used, as in [20], to control the growth of the stencil on coarse grids. While these costs somewhat increase the cost of CCMG relative to GMG, the overall cost per cycle for CCMG remains O(N), as the added storage and computational costs on each level are bounded by a small constant times the number of unknowns on that level.

Here, we have chosen to not focus on an optimal implementation of the CCMG algorithm. As will be seen in Section 5, the iteration counts for CCMG clearly scale less well than those for DIC-CG, BoxMG, and AMG for the problems considered here, particularly as the density contrast increases (Table 1). Instead, the CCMG method has been implemented within the AMG code, with a fixed choice of coarse grids and transfer operators. As a result, the implementation does not use the most efficient data structures and the reported CPU times in Section 5 are much larger than strictly necessary for CCMG, although we stress that the iteration counts and final residuals are the same for this implementation as they would be for a more efficient one. In practice, the setup costs for CCMG should be cheaper than those for BoxMG, due to the fixed interpolation pattern. However, the cost of a single iteration of CCMG is expected to require more operations than a single iteration of BoxMG, as BoxMG uses a 27-point interpolation stencil, compared to CCMG’s 64-point stencil, leading to denser coarse-grid matrices for CCMG when compared with BoxMG.

In contrast to GMG and CCMG, BoxMG is not based on cell-centred data structures. Instead, BoxMG is, primarily, a node-based code; however, its robustness leads to successful results for our cell-centred discretization as well. The added cost in BoxMG is primarily in the setup phase of the algorithm, where coefficients of the operator on each level are used in determining interpolation to that level. Coarsening in BoxMG is also slightly different than that in CCMG and GMG, as it naturally takes nodal fine grids of 2^q + 1 grid points in each dimension into nodal coarse grids with 2^{q-1} + 1 grid points; however, because of the use of operator-induced interpolation, BoxMG is also able to successfully solve problems with grids of arbitrary sizes, as will be seen in Section 5, while maintaining multigrid’s typical O(N) complexity per cycle.

Finally, AMG has the highest cost per iteration of the multigrid (and other) approaches considered here, because of the added cost of its unstructured grid processing and data structures, as well as a significant setup cost. This is to be expected; our choice of a structured-grid discretization naturally suggests that the best efficiency will be obtained with a structured-grid solver. Results for AMG are included to answer two interesting questions. First, it is interesting to see how much of a performance loss is seen with these unstructured data structures; results in Section 5 suggest that AMG is typically a factor of 10 to 15 times slower than BoxMG, when used as a preconditioner. Secondly, we note that while the CPU-time cost of AMG is significantly greater than that of the multilevel structured-grid codes (indeed, it is sometimes greater than that of simple IC-CG), the iteration counts for AMG-CG are quite good (typically comparable to those of BoxMG). This demonstrates the scalability and robustness seen with AMG, while highlighting the advantages of using a structured-grid algorithm when possible.

4.3. Singularity of Coefficient Matrix

As mentioned in Section 2, the coefficient matrix, A, is singular. A non-unique solution, x, always exists, because the linear system (6) is consistent due to (7). However, extra care should be taken in the implementation of the methods we compare. Matrix E is often singular, due to the singularity of A and the construction of Z. In this case, E^{-1} does not exist, and instead the pseudo-inverse, E^+, should be used in the operator P. The iteration process using E^+ does not cause any difficulties in DIC-CG, since the corresponding systems are consistent, see [40] for more details. The multigrid iterations are similarly insensitive to the singularity on all levels but the coarsest, as only iterative approaches are used in reducing errors at all other levels. On the coarsest level, the known form of the null space of E allows a simple perturbation and projection technique to be used in the direct solve of this system; see [10] for details.

4.4. Parallelization

While the tests performed in Section 5 are all done in a serial computing environment where, because of the efficiency seen in the best approaches, problems of up to 8 million degrees of freedom are easily handled, it is important to stress that this was done for convenience alone and not because of any inherent serial nature of the algorithms considered. Indeed, much effort has been invested in the parallelization of exactly the algorithms considered here. Parallelization of deflation solvers has been considered in [14], where it is shown that, with a sensible alignment of the subdomains and processor boundaries, deflation applied to block-incomplete Cholesky preconditioners can be implemented with limited increase in cost over that of the parallel matrix-vector multiplies already required by block-incomplete Cholesky preconditioned CG.

Parallelization of standard multigrid methods has been considered for many problems and many architectures; see, for example, [13, 18]. Similarly, parallel implementations exist for BoxMG [3] and AMG [16]. Because of the use of pointwise smoothers in the relaxation step, virtually no parallel communication is necessary when block-Jacobi relaxation is used in place of pointwise Gauss-Seidel. In AMG, the balance between parallel communication and inherently serial algorithms is a well-studied issue, particularly with respect to the choice of the coarse grid [1].

4.5. Implementation

While we stress that there is nothing “out of the ordinary” in the multigrid implementations considered here, it must be acknowledged that there have been many less-successful attempts to apply multigrid to these problems. Therein lies the attractiveness of the DIC-CG method; given an existing code, with existing data structures and single-level preconditioners, it is relatively simple to add an effective deflation step to the existing pressure solver. In contrast, use of the best-available multigrid codes (as considered here) requires some translation of the discrete problem into data structures that are more natural for multigrid treatment. This trade-off is at the root of the questions investigated here. While a greater investment of programming effort may be necessary to implement a robust, efficient multigrid solver, such as BoxMG, this effort appears to pay off in the reduced computing times seen in the following results.

5. Numerical Experiments

In this section, we present some numerical experiments with 3-D bubbly flow problems as described in Section 2. First, stationary problems are considered, where the time step, l, is fixed and all of the presented methods will be compared. Then, real-life simulations will be carried out, where the performance of the best methods from the stationary problems will be further investigated. The computations are performed on an Intel Core 2 Duo (2.66 GHz) computer with a memory capacity of 8GB. The code is compiled with the Intel FORTRAN compiler, ifort, on LINUX.

5.1. Stationary Problems

Stationary 3-D bubbly flows are considered, where a 1 cm^3 cube is fully filled with one fluid and m spherical bubbles of another fluid can appear. The bubbles have the same radius, s, and are located in a ‘structured’ way in the fluid. The geometry of some test cases can be found in Figure 1. The aim is to solve the resulting linear system (6), where n grid points are chosen in each spatial direction, so that the total number of degrees of freedom is N = n^3. We will examine all of the methods described in Section 3.

The experiments will be divided into several parts, where some parameters will be varied to see how they affect the performance of the methods: N (total number of degrees of freedom), m (number of bubbles), s (radius of each bubble) and ε (contrast between the phases). The results of the experiments will be given in terms of the required computing time for the pressure-correction system (CPU), including both the set-up and solution time, the number of iterations or cycles (IT), and the obtained accuracy (RES), measured as the relative norm of the residual, ||b − Ax^{(L)}||_2 / ||b − Ax^{(0)}||_2, where x^{(L)} denotes the L-th iterate.

For all CG methods, the starting vector is the zero vector (i.e., x^{(0)} = 0), and the stopping criterion is chosen as ||r^{(L)}||_2 / ||r^{(0)}||_2 < δ = 10^{-8}, where r^{(L)} is the CG-generated residual at step L. In theory, ||r^{(L)}||_2 / ||r^{(0)}||_2 should be equal to RES, but they might differ in the experiments due to roundoff errors [15, Section 7.3].

In DIC-CG, we typically take k = (n/8)^3 deflation vectors; k = 4^3, 8^3, 16^3 are chosen for N = 32^3, 64^3, 128^3, respectively. Moreover, in the stationary MG methods, it is common to use a stopping criterion based on the real residuals (i.e., ||b − Ax^(L)|| / ||b − Ax^(0)|| < δ = 10^-8). Therefore, RES is always below δ for the stationary MG methods, while this is not necessarily the case for the CG methods.
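As an illustration of this choice, the sketch below constructs piecewise-constant subdomain deflation vectors on an n × n × n grid with n/8 subdomains per direction. It is again a Python/NumPy sketch for exposition only; in particular, Z is stored densely here for clarity, whereas in practice it would be applied implicitly.

    import numpy as np

    def subdomain_deflation_vectors(n, blocks_per_dir=None):
        """Piecewise-constant subdomain deflation vectors on an n x n x n grid
        (lexicographic ordering assumed): each column of Z is one on a single
        cubic subdomain and zero elsewhere."""
        if blocks_per_dir is None:
            blocks_per_dir = n // 8                  # n/8 subdomains per direction
        size = n // blocks_per_dir                   # grid points per subdomain per direction
        k = blocks_per_dir ** 3                      # total number of deflation vectors
        idx = np.arange(n ** 3)
        ix, iy, iz = idx % n, (idx // n) % n, idx // (n * n)
        sub = (ix // size) + blocks_per_dir * ((iy // size) + blocks_per_dir * (iz // size))
        Z = np.zeros((n ** 3, k))
        Z[idx, sub] = 1.0
        return Z

    # for n = 32 this gives (32/8)^3 = 4^3 = 64 deflation vectors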

Fig. 1. Geometry of some stationary bubbly flows: (a) m = 1 and s = 0.1 cm; (b) m = 8 and s = 0.05 cm; (c) m = 27 and s = 0.025 cm.

5.1.1. Varying Contrasts

The results for the test problems with varying contrast, ε, are given in Table 1.

A smaller ε typically corresponds to a linear system whose coefficient matrix is more ill-conditioned. Therefore, most methods need more iterations and computing time as ε decreases, as shown in Table 1. However, the performance of BoxMG-CG and BoxMG appears to be independent of the contrast. In addition, they are the fastest methods in the experiments, followed by DIC-CG for ε = 10^-3 or ε = 10^-5. Usually, the performance of DIC-CG can also be made independent of ε, if the number of deflation vectors is chosen to be sufficiently large; however, this scalability generally comes at an increased cost for all values of ε. Moreover, we observe in Table 1 that GMG-CG and CCMG are very sensitive to ε: the number of iterations grows quickly with decreasing ε. In fact, for ε = 10^-1, we see that GMG-CG is competitive with both BoxMG and DIC-CG as a solution technique. For smaller ε, however, the significant increase in the number of iterations makes GMG-CG a much less attractive option. For CCMG-CG and AMG-CG, the number of iterations grows only slowly. As mentioned in Section 4.2, CCMG-CG is not implemented as efficiently as possible. Nevertheless, we can get an idea of the cost of an optimal implementation by comparing with the performance of BoxMG. CCMG-CG is at least as expensive per iteration as BoxMG, so, in the case of ε = 10^-3, CCMG-CG would require at least 1.2 sec instead of 4.3 sec and may, therefore, be slightly faster than DIC-CG. However, for ε = 10^-5, CCMG-CG would need at least 2.5 sec, so DIC-CG appears to be faster than CCMG-CG for larger contrasts.

We also observe in Table 1 that the accuracy of DIC-CG is consistently the worst when compared with the other methods, although the differences are generally quite small. As mentioned in Section 4.1, this is caused by the fact that the deflated residuals are used in DIC-CG, leading to extra roundoff errors.

5.1.2. Varying Radii

The results for an experiment with varying radii of the bubbles, s, are given in Table 2. The smallest radius is chosen to be s = 0.01875 cm, because the bubbles are no longer resolved for s < 1/64 = 0.015625 cm. In general, a smaller radius does not significantly affect the conditioning of the coefficient matrix, but it does change the form of the errors that are difficult to resolve, possibly making them more difficult to approximate.

In Table 2, it can be seen that there are changes in the convergence behavior of the various methods for different radii. In general, a smaller s leads to more favorable performance for several of the iterative methods, including IC-CG, GMG-CG and CCMG-CG. The other methods do not show a clear dependence on s. This even seems to hold for the stationary methods, CCMG, BoxMG and AMG, which we might expect to be sensitive to the size of the bubbles due to the challenges discussed in Section 3.3.5. BoxMG and BoxMG-CG seem to be fully insensitive to s. They are also the fastest methods in this experiment, followed again by DIC-CG.


Test Problem    ε = 10^-1              ε = 10^-3              ε = 10^-5
Method          CPU    IT   RES       CPU    IT   RES       CPU    IT   RES
IC-CG           3.1   131   0.1E-7    5.8   244   0.1E-8    6.9   289   0.5E-8
DIC-CG          1.1    35   0.3E-7    1.7    54   0.4E-7    1.9    59   0.4E-7
GMG-CG          1.0    33   0.1E-7    3.9   132   0.9E-8    8.0   267   0.8E-8
CCMG-CG         3.3    10   0.1E-8    4.3    18   0.1E-7    6.7    37   0.4E-8
BoxMG-CG        0.8    12   0.1E-8    0.8    12   0.2E-8    0.8    12   0.2E-8
AMG-CG          8.0     9   0.3E-8    8.9    14   0.1E-8    9.2    16   0.2E-8
CCMG            4.0    17   0.6E-8   11.2    79   0.1E-7   40.9   338   0.1E-7
BoxMG           0.8    17   0.3E-8    0.8    17   0.3E-8    0.8    17   0.3E-8
AMG             8.9    15   0.8E-8   13.0    40   0.1E-7   21.5    92   0.9E-8

Table 1. Convergence results for the experiment with N = 64^3, m = 2^3, s = 0.05 cm and varying the contrast, ε.

It is interesting to note that while GMG-CG is very ineffective in the case of large bubbles, its performance improves as the bubbles shrink and, for the case of s = 0.01875 cm, it becomes competitive with BoxMG and DIC-CG.

Test Problem    s = 0.1 cm             s = 0.05 cm            s = 0.025 cm           s = 0.01875 cm
Method          CPU    IT   RES       CPU    IT   RES       CPU    IT   RES       CPU    IT   RES
IC-CG           6.0   250   0.1E-7    5.8   244   0.1E-8    3.8   159   0.8E-8    4.1   170   0.9E-8
DIC-CG          1.3    39   0.3E-7    1.7    54   0.4E-7    1.5    46   0.9E-7    1.5    45   0.8E-7
GMG-CG          4.3   143   0.9E-8    3.9   132   0.9E-8    2.6    85   0.8E-8    1.3    41   0.4E-8
CCMG-CG         5.1    24   0.4E-8    4.3    18   0.1E-7    3.6    12   0.5E-8    3.8    14   0.4E-8
BoxMG-CG        0.8    12   0.3E-8    0.8    12   0.2E-8    0.8    12   0.2E-8    0.8    12   0.1E-8
AMG-CG          8.3    11   0.6E-8    8.9    14   0.1E-8    8.2    12   0.7E-8    8.8    14   0.3E-8
CCMG           13.0    95   0.8E-8   11.2    79   0.1E-7    7.0    43   0.7E-8   10.1    69   0.9E-8
BoxMG           0.8    17   0.4E-8    0.8    17   0.3E-8    0.8    17   0.3E-8    0.8    17   0.3E-8
AMG            10.6    25   0.8E-8   13.0    40   0.1E-7   10.9    29   0.6E-8   12.8    39   0.7E-8

Table 2. Convergence results for the experiment with N = 64^3, m = 2^3, ε = 10^-3 and varying the radius of the bubbles, s.

5.1.3. Varying Number of Bubbles

In Table 3, we present results demonstrating the performance of the various solvers with variation in the number of bubbles, m. Note that the test case with m = 0 corresponds to the Laplace equation, i.e., a domain with only one phase. It is known that increasing the number of bubbles leads to the appearance of more large eigenvalues in the original coefficient matrix, A, and small eigenvalues in the Incomplete Cholesky preconditioned coefficient matrix, M^-1 A (see, e.g., [41, 50]), resulting in a more difficult linear system to solve, although both the original and preconditioned coefficient matrices are not necessarily worse conditioned.

It can be seen in Table 3 that, for most methods, the convergence worsens with increasing m, as expected. Moreover, GMG-CG is the best method in the case of m = 0, but quickly loses efficiency for m > 0. The number of iterations required by CCMG also grows rapidly with m, whereas it grows gradually for CCMG-CG, AMG-CG and AMG. A single iteration of these methods, however, is more expensive than one of BoxMG or DIC-CG, as mentioned in Section 4. For m > 0, BoxMG and BoxMG-CG are always the fastest methods, followed by DIC-CG.


The performance of BoxMG-CG degrades only a little with increasing m, while the iteration counts for BoxMG increase more substantially. For a sufficiently large number of deflation vectors, K, DIC-CG would be less sensitive to changes in m.

Test Problem    m = 0                  m = 1                  m = 2^3                m = 3^3
Method          CPU    IT   RES       CPU    IT   RES       CPU    IT   RES       CPU    IT   RES
IC-CG           3.0   125   0.8E-8    3.9   163   0.7E-8    5.8   244   0.1E-8    8.2   342   0.8E-8
DIC-CG          1.0    31   0.2E-7    1.5    47   0.2E-7    1.7    54   0.4E-7    2.0    60   0.7E-7
GMG-CG          0.3     8   0.1E-8    3.7   124   0.8E-8    3.9   132   0.9E-8    9.8   329   0.8E-8
CCMG-CG         3.2     9   0.7E-8    3.7    13   0.7E-8    4.3    18   0.1E-7    6.5    35   0.4E-8
BoxMG-CG        0.8    12   0.1E-8    0.8    12   0.1E-8    0.8    12   0.2E-8    1.0    15   0.2E-8
AMG-CG          7.7     7   0.9E-8    8.0     9   0.2E-8    8.9    14   0.1E-8    8.9    14   0.8E-8
CCMG            4.0    17   0.4E-8   10.0    68   0.9E-8   11.2    79   0.1E-7   27.9   223   0.1E-7
BoxMG           0.8    17   0.3E-8    0.8    17   0.3E-8    0.8    17   0.3E-8    1.3    29   0.7E-8
AMG             8.2    11   0.6E-8   11.8    33   0.6E-8   13.0    40   0.1E-7   14.1    45   0.7E-8

Table 3. Convergence results for the experiment with N = 64^3, s = 0.05 cm, ε = 10^-3 and varying the number of bubbles, m.

5.1.4. Varying Number of Grid Points

Table 4 presents results for varying numbers of grid points. A larger number of grid points leads to coefficient matrices that are more ill-conditioned, as mentioned in Section 3. It can be observed in Table 4 that only BoxMG, BoxMG-CG and DIC-CG show perfectly scalable iteration counts with respect to the number of grid points, although the computing times grow relatively quickly. Recall that, for DIC-CG, more deflation vectors are taken for larger N, which results in a decreasing number of iterations for DIC-CG. Moreover, observe that BoxMG and BoxMG-CG outperform the other methods both in terms of the number of iterations required and the computing time. For larger N, the number of iterations for CCMG-CG and AMG-CG grows very slowly; however, the large cost per iteration for these methods still makes them uncompetitive. As mentioned earlier, a lower bound for the cost of CCMG-CG can be given; in the case of N = 128^3, CCMG-CG would require at least 11.1 sec (instead of 38.0 sec for our current implementation) and, therefore, may be competitive with DIC-CG.

Test Problem    N = 32^3               N = 64^3               N = 128^3
Method          CPU    IT   RES       CPU    IT   RES       CPU    IT   RES
IC-CG           0.3   112   0.9E-8    5.8   244   0.1E-8   92.3   444   0.9E-8
DIC-CG          0.2    64   0.8E-7    1.7    54   0.4E-7   11.7    39   0.3E-7
GMG-CG          0.2    81   0.9E-8    3.9   132   0.9E-8   36.0   134   0.8E-8
CCMG-CG         0.4    14   0.3E-8    4.3    18   0.1E-7   38.0    19   0.9E-8
BoxMG-CG        0.1    12   0.2E-8    0.8    12   0.2E-8    7.0    12   0.3E-8
AMG-CG          0.8    10   0.4E-8    8.9    14   0.1E-8   89.0    15   0.2E-8
CCMG            0.8    44   0.1E-7   11.2    79   0.1E-7   99.1    85   0.8E-8
BoxMG           0.1    16   0.8E-8    0.8    17   0.3E-8    7.4    17   0.4E-8
AMG             0.9    20   0.1E-7   13.0    40   0.1E-7  129.2    45   0.8E-8

Table 4. Convergence results for the experiment with m = 2^3, s = 0.05 cm, ε = 10^-3 and varying the number of grid points, N.


In this experiment, the scalability of the methods can be easily observed. Considering the computing time, it can be seen that the required CPU time for both DIC-CG and BoxMG increases by a factor of approximately 8 when the number of grid points, n, in each direction is doubled. This compares favorably with the other methods, which scale with factors of 10 or more. Finally, we observe that IC-CG requires significantly more CPU time as N increases, as predicted by the classical theory.
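These factors can be read off directly from the CPU columns of Table 4; as a small check, the following snippet computes the observed ratios for a few of the methods (values copied from the table):

    # CPU times (sec) taken from Table 4 for N = 32^3, 64^3, 128^3
    cpu = {"IC-CG":  (0.3, 5.8, 92.3),
           "DIC-CG": (0.2, 1.7, 11.7),
           "BoxMG":  (0.1, 0.8, 7.4),
           "AMG-CG": (0.8, 8.9, 89.0)}
    for method, (t32, t64, t128) in cpu.items():
        print(f"{method:8s} {t64 / t32:5.1f} {t128 / t64:5.1f}")
    # DIC-CG and BoxMG grow by factors of roughly 7-9 per doubling of n,
    # whereas IC-CG and AMG-CG grow by factors of about 10-19.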

5.1.5. Difficult Test Problem

We end the stationary experiments with a test case in which the most difficult parameters (based on the previous experiments) are chosen. The results associated with this experiment can be found in Table 5, using stopping tolerance δ = 10^-8. We note that an accuracy higher than δ = 10^-8 cannot be reached, due to the accumulation of roundoff errors and the effects of finite-precision arithmetic.

As in the other experiments, BoxMG-CG and BoxMG perform the best, in terms of both the number of iterations and the computing time. They are again followed by DIC-CG, which also performs rather well. GMG-CG, CCMG, and AMG all typically fail to converge within the allowed number of iterations for δ = 10^-8 (and also for the weaker tolerance, δ = 10^-6). CCMG-CG and AMG-CG do converge but, as always, are not competitive in terms of true CPU time.

Method        CPU      IT    RES
IC-CG       195.4     942    0.1E-7
DIC-CG       19.6      65    0.5E-7
GMG-CG          –   >1000    –
CCMG-CG     114.0      92    0.1E-7
BoxMG-CG      7.4      13    0.4E-8
AMG-CG      107.6      29    0.8E-8
CCMG            –   >1000    –
BoxMG         7.8      18    0.7E-8
AMG             –   >1000    –

Table 5. Convergence results for the difficult test problem. The following parameters are kept constant: N = 128^3, m = 3^3, s = 0.025 cm, ε = 10^-5, and δ = 10^-8.

While neither AMG nor CCMG converges within the allowed number of iterations as a standalone solver, both perform reasonably well as preconditioners. In fact, both unaccelerated solvers do converge, but very slowly, with AMG converging marginally faster than CCMG. While it may be possible to improve this performance somewhat by using different relaxation schemes, or by changing some of the parameters in the AMG setup stage, this is beyond the scope of the current study. Clearly, BoxMG (and BoxMG-CG) and DIC-CG are efficient techniques for solving these problems, and so we restrict the remainder of our study to comparing these two methods with one another and with a single-level preconditioner, IC-CG.

5.2. Time-Dependent Problems

In this section, we consider simulations of three “real-life” bubbly flows. In order to obtain sophisticated geometries, these flows are considered without surface tension. The pressure-correction method is adopted to solve the Navier-Stokes equations, as described in Section 2. The interface advection is carried out using the mass-conserving level-set method [46, 47]. Our main interest in each time step is the pressure solve, which takes the bulk of the computing time in each simulation, especially when the total number of degrees of freedom, N, is relatively large. The actual time step, ∆t, is restricted by

∆t ≤ β := hα / (2 (|u|_max + |v|_max + |w|_max)),

where h is the distance between grid points in one direction and α = 0.35 is the CFL number; see [46] for more details. This means that we use an adaptive time-stepping procedure that accounts for the time-step restriction due to convection of the bubbly flow. At each time step, l, the actual time, t, is advanced by an increment, ∆t, that obeys

∆t = min(β, ∆t_max),

where we choose ∆t_max = 0.0005 sec.
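In code, this adaptive restriction takes roughly the following form (a sketch with placeholder array names; u, v and w denote the discrete velocity components and h the grid spacing):

    import numpy as np

    def adaptive_time_step(u, v, w, h, alpha=0.35, dt_max=5.0e-4):
        """CFL-limited time step: dt = min(beta, dt_max), with
        beta = h * alpha / (2 * (|u|_max + |v|_max + |w|_max))."""
        vel_sum = np.abs(u).max() + np.abs(v).max() + np.abs(w).max()
        if vel_sum == 0.0:                 # quiescent flow: no convective restriction
            return dt_max
        beta = h * alpha / (2.0 * vel_sum)
        return min(beta, dt_max)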

At each time step, the resulting linear system (6), with N = n^3, is solved using both BoxMG-CG and DIC-CG, as these were shown to be the most stable and efficient methods for the stationary problems considered in Section 5.1. DIC-CG with K deflation vectors is denoted DIC-CG−K, where K is chosen to minimize the required CPU time. IC-CG is used as a benchmark in the experiments. The initial guess for each solve is chosen to be the previous solution, except for the first 10 time steps, where the zero starting vector is used (to avoid problems with achieving a too-strict relative residual reduction when the flow is initialized). The stopping criterion for all methods is chosen as in the stationary experiments. For more details on the physical problems simulated here, see [46, 47].
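Schematically, the pressure solves within the time loop then proceed as in the sketch below, where assemble_pressure_system, solve_pressure and advance_velocity_and_interface are hypothetical stand-ins for the discretization, for the BoxMG-CG or DIC-CG solver, and for the remaining steps of the pressure-correction scheme:

    import numpy as np

    def run_pressure_correction(n_steps, n, assemble_pressure_system,
                                solve_pressure, advance_velocity_and_interface,
                                tol=1e-8):
        """Sketch of the per-time-step pressure solves: warm start with the
        previous pressure after the first 10 time steps."""
        p = np.zeros(n ** 3)
        for l in range(n_steps):
            A, b = assemble_pressure_system(l)
            x0 = np.zeros_like(b) if l < 10 else p           # zero start for l < 10
            p, iters = solve_pressure(A, b, x0=x0, tol=tol)  # BoxMG-CG or DIC-CG-K
            advance_velocity_and_interface(p, l)
        return p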

5.2.1. Rising Air Bubble in Water

We first consider a test problem where a cube of 1 cm^3 is filled with water to a height of 0.6 cm. For these experiments, we take the density of water to be 820 times that of air (i.e., ε = 1.22 × 10^-3). At the initial time step, l = 0, a spherical air bubble with a radius of 0.125 cm is located in the middle of the domain at a height of 0.35 cm. The exact material constants and other relevant conditions for this simulation can be found in [46, Sect. 7.2]. The evolution of the geometry during 500 time steps is given in Figure 2. Here, we take a grid with N = 100^3; in this case, the optimal value for K in DIC-CG−K appears to be K = 20^3.

Results of the experiment can be found in Figure 3, showing both the number of iterations and computing time required for each method for the pressure solves at each time step, l.

Fig. 2. Evolution of a rising bubble in water: (a) l = 0 (t = 0 sec); (b) l = 100 (t = 0.013 sec); (c) l = 200 (t = 0.022 sec); (d) l = 300 (t = 0.032 sec); (e) l = 400 (t = 0.050 sec); (f) l = 500 (t = 0.064 sec).

It can be readily observed from Figure 3 that, for each time step, both DIC-CG and BoxMG-CG converge in fewer iterations and require less computing time than IC-CG. Due to the zero starting vector in the first 10 time steps, a peak in the IC-CG cost can be observed around these initial steps, whereas this phenomenon is not clearly visible for DIC-CG and BoxMG-CG. Moreover, BoxMG-CG shows better performance than DIC-CG.


Fig. 3. Results for IC-CG, DIC-CG−20^3 and BoxMG-CG for the pressure solves during the real-life simulation with a rising air bubble in water: (a) number of iterations versus time step; (b) CPU time versus time step.

We remark that both methods behave smoothly in time and seem to be more-or-less independent of the (sometimes complicated) geometry of the density field. This is in contrast to IC-CG, whose convergence is rather erratic. Only some small outliers can be observed in the convergence of DIC-CG and BoxMG-CG. For example, a small peak can be seen around l = 325 in DIC-CG and around l = 390 and l = 410 in BoxMG-CG. This is likely related to particular changes in the density field at these time steps, but it is difficult to pinpoint the cause, due to the complicated surface dynamics at these time steps. Moreover, it can be seen that, for l ∈ [150, 350], more iterations are required, especially for DIC-CG, because the geometry is most complicated in this period, due to the interaction of the bubble with the interface and the appearance of many droplets, as can be observed in Figure 2.

5.2.2. Falling Water Droplet in Air

In the next simulation, we consider a 1 cm^3 cube filled with water to a height of 0.45 cm. At the initial time step, l = 0, a spherical water droplet with a radius of 0.125 cm is located in the middle of the domain at a height of 0.6 cm. The same material constants and conditions are used as in the previous simulation. The evolution of the geometry during the 500 time steps is depicted in Figure 4. Again, the grid resolution is N = 100^3 and the optimal number of deflation vectors is K = 20^3.

The results of the experiment can be found in Figure 5. It can again be noticed that IC-CG performs worse than both DIC-CG and BoxMG-CG. BoxMG-CG is always faster than DIC-CG, although the differences are small in this experiment; the number of iterations and computing time per time step are approximately the same for both methods. We observe a very smooth behavior of the corresponding performance curves, because very few additional bubbles or droplets appear during the simulation. In this experiment, BoxMG and DIC-CG appear to be more-or-less insensitive to the geometry of the density field, while it can be readily observed that the performance of IC-CG does depend on it.

5.2.3. Two Rising and Merging Air Bubbles in Water

In the final simulation, we consider a test problem where a 1 cm^3 cube is filled with water to a height of 0.65 cm. At the initial time step, l = 0, two air bubbles of radius 0.1 cm are located with centers at coordinates (0.5, 0.5, 0.37) and (0.5, 0.3, 0.15). The evolution of the geometry during 2500 time steps can be found in Figure 6. This test problem is, obviously, harder to solve than the previous two test problems, since the two bubbles interact with each other at the same time as they interact with the water interface. In addition, we now consider a refined grid with N = 200^3, resulting in a strongly ill-conditioned coefficient matrix and making the problem very complicated to solve. DIC-CG−K with K = 25^3 appears to be optimal in terms of the required CPU time among the DIC-CG−K methods.
