

Akademia Górniczo-Hutnicza

im. Stanisława Staszica w Krakowie

Wydział Informatyki, Elektroniki i Telekomunikacji

KATEDRA INFORMATYKI

PRACA DOKTORSKA

MARCIN ŁOŚ

EFEKTYWNE ALGORYTMY DLA RÓŻNYCH ARCHITEKTUR MASZYN RÓWNOLEGŁYCH DO PRZEPROWADZANIA SYMULACJI PROCESÓW NIESTACJONARNYCH

PROMOTOR: prof. dr hab. Maciej Paszyński

Aware of criminal liability for making false statements, I declare that I have written this diploma thesis personally and independently, and that I have not used sources other than those listed in the thesis.


AGH

University of Science and Technology in Krakow

Faculty of Computer Science, Electronics and Telecommunications

DEPARTMENT OF COMPUTER SCIENCE

PHD THESIS

MARCIN ŁOŚ

EFFICIENT ALGORITHMS FOR NON-STATIONARY PROBLEM SIMULATIONS ON PARALLEL ARCHITECTURES

SUPERVISOR: prof. Maciej Paszyński


I want to express my sincere gratitude to my thesis advisor, prof. Maciej Paszyński, whose guidance and support were vital for the success of my research efforts. I would also like to thank the rest of my research team, including prof. Robert Schaefer and my fellow PhD students, as well as our collaborators from all over the world – the results presented in this thesis are the fruit of the labor of many people.

I am particularly indebted to prof. Victor Calo for introducing some of the core concepts this thesis builds upon and for inviting me to King Abdullah University of Science and Technology (KAUST) during my Master's degree studies, as well as to prof. Keshav Pingali, who supported my visits to ICES, University of Texas.

The research presented here was supported by the Polish National Science Centre (NCN) – grants no. 2017/26/M/ST1/00281 (HARMONIA), 2016/21/B/ST6/01539 (OPUS) and 2015/17/B/ST6/01867 (OPUS).

Last but not least, I am deeply grateful to my friends and family for their unwavering support in matters both scientific and personal.


Contents

List of Figures
List of Tables
List of Algorithms

1. Introduction
   1.1. Motivation
   1.2. Main thesis
   1.3. Structure of this book

2. Background
   2.1. Finite Element Method
   2.2. hp-FEM
      2.2.1. Automatic hp-adaptivity
   2.3. IGA
   2.4. B-splines
   2.5. State-of-the-art FEM solvers
      2.5.1. Direct solvers
      2.5.2. ADS solver
   2.6. Summary of open problems
   2.7. Main scientific contributions

3. Algorithms for space-time adaptivity
   3.1. Formalism of mesh adaptation
      3.1.1. Informal discussion
      3.1.2. Formal adaptation model
   3.2. General time dependent problem formulation
      3.2.1. Examples
   3.3. Discretization order
   3.4. Difference schemes
   3.5. Strategies
      3.5.1. Independent adaptations in each time step
      3.5.2. Starting with mesh from the previous time step
      3.5.3. Parallel strategy
   3.6. Convergence
      3.6.2. Adaptive FEM

4. Algorithms for explicit dynamics IGA-FEM simulations of non-stationary problems
   4.1. L2-projection problem
      4.1.1. Finite dimensional case
      4.1.2. L2-projections with tensor product B-splines
   4.2. Explicit time discretization for non-stationary problems
      4.2.1. Weak formulation
      4.2.2. B-spline trial space
      4.2.3. Space discretization
      4.2.4. Time discretization
   4.3. Algorithm for solving non-stationary problems
      4.3.1. Computing right-hand side
      4.3.2. Computing right-hand side – parallel algorithm
      4.3.3. Complexity analysis

5. Numerical results
   5.1. L-shape problem
      5.1.1. Strong formulation
      5.1.2. Weak formulation
      5.1.3. Discretization
      5.1.4. Results
   5.2. Bio-heat transfer – Pennes equation
      5.2.1. Strong formulation
      5.2.2. Discrete formulation
      5.2.3. Material data
      5.2.4. Results
      5.2.5. Comparison with domain decomposition-based parallelization
      5.2.6. Combining space-time adaptive algorithm with domain decomposition
   5.3. Heat transfer
      5.3.1. Strong formulation
      5.3.2. Discrete formulation
      5.3.3. Convergence
      5.3.4. Scalability
   5.4. Elastic wave propagation
      5.4.1. Strong formulation
      5.4.2. Weak formulation
      5.4.3. Time discretization
      5.4.4. Energy
      5.4.5. Results
   5.5. Oil extraction
      5.5.1. Model
      5.5.2. Optimization problem
      5.5.3. Discrete formulation
      5.5.4. Results
      5.5.5. Scalability
   5.6. Tumor growth
      5.6.1. Continuous model
      5.6.2. Discrete vasculature model
      5.6.3. Discrete formulation
      5.6.4. Results
      5.6.5. Scalability
      5.6.6. Vasculature cost
   5.7. L2-projection based fitness approximation
      5.7.1. Background
      5.7.2. Problem definition
      5.7.3. Approximation strategy based on L2-projection
      5.7.4. Alternative strategies
      5.7.5. Efficiency comparison
      5.7.6. Accuracy comparison
      5.7.7. Conclusions

6. Conclusions and future research
   6.1. Summary and significance of obtained results
   6.2. Directions for future research

Glossary
Index

List of Figures

2.1 FEM basis functions
2.2 B-spline basis functions
3.1 Partial order on FE spaces
3.2 Diagram of the first strategy
3.3 Diagram of the second strategy
3.4 Diagram of the parallel strategy
5.1 L-shaped domain and boundary conditions
5.2 Errors in L-shape problem
5.3 Meshes in the L-shape problem
5.4 Temperature distribution in the L-shape problem
5.5 Scalability of the space-time adaptive strategy for L-shape problem
5.6 MRI scan used to generate geometry for the bio-heat transfer problem
5.7 Mesh and data approximation created by PBI
5.8 Relative error during the PBI iterations
5.9 Solution of the bio-heat transfer problem in selected time steps
5.10 Scalability of the space-time adaptive strategy for the Pennes equation
5.11 Scalability of domain decomposition-based parallel strategy for the Pennes equation
5.12 Errors in 2D heat transfer problem
5.13 Scalability of the 2D heat transfer simulation
5.14 Scalability of the 3D heat transfer simulation
5.15 Energy in the elastic wave propagation simulations
5.16 Elastic wave propagation simulation
5.17 Unstable elastic wave propagation simulation
5.18 Parameter functions used in oil extraction model
5.19 Simulation of oil extraction
5.20 Scalability of the oil extraction simulation
5.21 Location and relative size of skin layers in tumor growth simulations
5.22 Oxygen concentration in 2D tumor growth simulation
5.23 Strong scalability of the 3D tumor simulation
5.24 Estimated serial fraction in the 3D tumor simulation
5.26 Overhead of the vasculature evolution
5.27 Tumor 2D – pt 1
5.28 Tumor 2D – pt 2
5.29 Tumor 3D – pt 1
5.30 Tumor 3D – pt 2
5.31 Tumor 3D – vasculature and oxygen
5.32 Cost of computing projections depending on population size
5.33 Efficiency comparison of fitness approximation methods
5.34 Results of test run for the U-shape benchmark
5.35 Results of test run no. 14 for the U-shape benchmark
5.36 Results of test run no. 12 for the U-shape benchmark
5.37 Results of test run no. 17 for the X-shape benchmark
5.38 Results of test run no. 7 for the X-shape benchmark
5.39 Results of test run no. 11 for the X-shape benchmark
5.40 Selected results of L2-projection for the 3D X-shape benchmark
6.1 Example of space-related instability

List of Tables

5.1 Values of parameters used in the bio-heat transfer simulation
5.2 Value of cell diffusion coefficient D_b for various skin layers
5.3 Values of parameters related to the continuous part of tumor model
5.4 Values of parameters related to the vasculature model
5.5 Errors for U-shape benchmark
5.6 Errors for X-shape benchmark

List of Algorithms

3.1 Pseudocode of the strategy restarting adaptations from V_init
3.2 Pseudocode of the strategy reusing the mesh from previous step
3.3 Pseudocode of the parallel strategy
4.1 Main loop of the IGA-FEM simulation
4.2 ComputeRHS – sequential version


1. Introduction

1.1. Motivation

The role of partial differential equations (PDEs) in modern science and engineering is difficult to overestimate. They play a central role in the physical description of a wide variety of phenomena in solid mechanics, electromagnetics, fluid dynamics, acoustics, diffusion, general relativity and quantum mechanics. They have found plenty of applications in biology, chemistry, geology, materials science and even economics (the Black-Scholes equation). Models based on PDEs are used extensively in nearly all conceivable branches of industry and engineering, including architecture, manufacturing processes, weather prediction, pharmaceuticals, vehicle design and so on. On the other end of the spectrum, PDEs are of considerable intrinsic interest to pure mathematicians, and their investigation has led to advancements in functional analysis, the development of the rich theory of distributions by Sobolev and Schwartz, and the notion of Sobolev spaces. Suitable generalizations of differential operators play a prominent role in modern geometry (e.g. the Atiyah-Singer theorem).

Given their versatility and pervasiveness, it is no wonder that PDEs attract a lot of attention from mathematicians and engineers alike. Of particular practical interest are numerical methods for obtaining approximate solutions of PDEs, especially since the advent of general-purpose computers, which immensely enhanced the available computational capacity of mankind and made possible large-scale calculations previously considered infeasible. The underlying idea of these strategies is to discretize the continuous PDE in order to reduce it to a finite problem (usually a system of algebraic equations) that can be solved by an algorithm in a finite number of steps. Starting with the conceptually simplest finite difference discretizations, numerous approaches have emerged over the years, one of the most prominent and widely used today being the Finite Element Method (FEM), where the approximate solution is expressed as a linear combination of basis functions constructed on a computational mesh based on the problem domain. The equations defining the coefficients of such a linear combination are derived from the variational formulation of the PDE and suitable orthogonality conditions (the Galerkin method). In the most prevalent case of linear PDEs, the resulting problem is a system of linear equations.

One relatively new and promising variant of FEM is isogeometric analysis (IGA), developed initially by prof. T. Hughes. Its main purpose is to address an impedance mismatch between available FEM software packages and the computer-aided design (CAD) tools used extensively in engineering. Geometry in CAD systems is represented by B-splines and NURBS; in order to perform a FEM simulation, the model needs to be converted to a discrete mesh. Not only does this result in loss of accuracy, as the mesh is an approximate representation of the model created in the CAD system – itself a significant disadvantage for certain classes of problems sensitive to geometric imperfections – but such conversion is usually not fully automatic and may require significant effort. As a result this task, called model development, typically accounts for over 80% of overall analysis time in the design process in some branches of engineering [HCB05, Hug14]. This greatly impacts the flexibility of the process, since feedback from FEM analysis cannot be readily used to iteratively improve the design. Furthermore, the disparity between the CAD model and the computational mesh impedes adaptivity, since it requires frequent communication with the CAD system, which may be difficult to provide in practice. In order to alleviate these problems, IGA directly utilizes the geometry described by B-splines and NURBS, which removes the need for conversion and reduces the aforementioned bottleneck, allowing a significant decrease in the time and cost of the design process. Another major potential advantage over the traditional approach is the higher continuity of the basis functions, which facilitates solving higher order PDEs (e.g. the Cahn-Hilliard equation) directly, without employing reduced order weak formulations.

The computational cost of the Finite Element Method is considerable, especially when high accuracy is desired and large computational meshes must be employed. Two primary operations constitute the majority of the cost – assembling the matrix and the right-hand side of the linear system, and solving it. Since the matrix is typically sparse and the number of its nonzero entries is O(N), where N is the total number of degrees of freedom (DOFs), the first operation has complexity O(N), which means that solving the system usually comprises the majority of the computational cost. For this reason, the subject of efficient linear solvers is of paramount importance for FEM. A wide variety of linear solvers can be applied to FEM systems, both specialized and general-purpose. Ideally, a solver would have O(N) complexity, but such solvers are rarely available for non-trivial problems, and in general, linear complexity is probably too much to hope for. The problem is exacerbated by IGA, where the performance of frequently used direct and iterative solvers deteriorates with increasing order of basis functions faster than in more classical variants.

The computational cost of FEM is especially high in the case of time-dependent problems, like modeling turbulent flows or car crashes. In such simulations, the Finite Element Method is usually combined with a time marching scheme – FEM is used to discretize space, and the time dimension is handled separately by a time stepping strategy. This means that the whole computation is carried out as a sequence of steps, each consisting of solving a single problem using FEM. This effectively lowers the dimension of the problem(s) solved by FEM by one, thereby significantly reducing computational time and, even more importantly, memory usage, in comparison to solving the full space-time problem at once. Nevertheless, the cost remains high due to the necessity of solving multiple, albeit smaller, linear systems, which calls for the development of new, efficient parallel algorithms and techniques for speeding up simulations of transient problems. In this book, two such algorithms are presented.

The first of these techniques is designed to speed up hp-adaptive FEM. In this approach, one starts with a relatively coarse mesh and the solution accuracy is gradually improved by iteratively refining the mesh – either breaking the elements or locally increasing the order of basis functions. This process consists of solving the problem multiple times using FEM on more and more adapted meshes until the desired accuracy is reached, using error estimates to guide the refinement process. While each of these FEM computations can be solved using a parallel algorithm, in the case of time-dependent problems we can also partially overlap the execution of entire consecutive time steps, allowing for even more parallelism. Typically, the next time step depends on the solution of the previous one, so there is a data dependency between steps, forcing their sequential execution. However, since in hp-adaptive FEM each time step produces a sequence of results with increasing accuracy, the partial results produced in one time step can be immediately used to start executing the next time step without waiting for the current one to finish, as the sketch below illustrates.
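A minimal sketch of this pipelining idea (hypothetical types and helper names, not the thesis implementation; the actual strategy and its convergence analysis are the subject of chapter 3):

```cpp
// Pipelined time steps: step i+1 begins from the coarse result of step i,
// overlapping with the remaining hp-adaptations of step i. In the full
// strategy the next step also consumes refined results as they arrive.
#include <future>
#include <vector>

struct Sol { /* mesh + degrees of freedom (placeholder) */ };

Sol solve_step(const Sol& prev);  // hypothetical: coarse FEM solve of one step
Sol adapt(const Sol& u);          // hypothetical: one hp-adaptation pass

std::vector<Sol> run(const Sol& u0, int steps, int passes) {
    std::vector<Sol> out;
    Sol coarse = solve_step(u0);  // coarse result of the first step
    for (int i = 0; i < steps; ++i) {
        // Launch the coarse solve of the next step right away...
        std::future<Sol> next = (i + 1 < steps)
            ? std::async(std::launch::async, solve_step, coarse)
            : std::future<Sol>{};
        // ...while this step keeps improving its own solution.
        Sol refined = coarse;
        for (int p = 0; p < passes; ++p)
            refined = adapt(refined);
        out.push_back(refined);
        if (i + 1 < steps)
            coarse = next.get();
    }
    return out;
}
```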

The second algorithm is geared towards isogeometric computations using the Alternating Direction Solver (ADS). ADS is a fast, linear-time solver exploiting the specific structure of matrices arising in explicit dynamics simulations using regular meshes. It is so extraordinarily efficient that it reduces the cost of solving the linear system to the point where it is basically negligible, and the dominant factor in the overall computational cost becomes assembling the right-hand side (numerical integration). A highly scalable parallel implementation of ADS for machines with distributed memory has been created and analyzed by M. Woźniak in his PhD Thesis [Woź17]. However, using huge meshes with explicit time discretization methods severely limits the time step size due to the CFL condition. Here we present a shared-memory implementation, where the time step size and the size of the mesh are kept in balance and thus the CFL condition is not a significant restriction. The proposed algorithm allows for efficient parallelization of computing the right-hand side in such an environment. This is confirmed by several numerical examples, including propagation of elastic waves, non-linear flow in heterogeneous medium, and two- and three-dimensional tumor growth simulations.

1.2. Main thesis

The purpose of this book is to prove the following statement:

It is possible to design new efficient algorithms for performing fast simulations of time-dependent problems, using either the adaptive Finite Element Method or smooth isogeometric analysis. The proposed solvers will outperform currently available state-of-the-art solutions and will deliver linear or quasi-linear computational cost for a wide class of engineering and bioengineering problems including wave propagation, flow through porous media and tumor growth simulations. The stated goals will be achieved by means of:

– proposing a novel approach allowing for parallel execution of multiple consecutive time steps using the adaptive Finite Element Method in a distributed memory environment with an MPI-like interface

– designing an efficient parallel implementation of the fast ADS for shared memory machines and demonstrating its scalability by applying it to several time-dependent problems – heat transfer, linear elasticity, flow in heterogeneous porous medium and the tumor growth model

The viability and efficiency of the proposed solutions will be demonstrated by applying them to several non-stationary problems, both simple (heat transfer) and complex (tumor growth simulation). We will also hint at alternative applications of the developed tools by using the parallel L2-projection solver to construct an efficient, smooth misfit approximation for inverse problems.

1.3. Structure of this book

Chapter 2 presents background information necessary to properly appreciate the contents of the thesis and briefly describes the state-of-the-art solutions we endeavor to improve upon, in particular pointing out the inadequacies we strive to remedy. It contains a concise description of the idea and history of FEM and of its variants relevant to this work – hp-adaptive FEM and isogeometric analysis (IGA).

Chapter 3 describes a novel algorithm allowing for parallel execution of multiple consecutive time steps using adaptive variants of the Finite Element Method, together with a proof of convergence and discussion of the necessary assumptions. It contains a precise definition of the adaptation model, presents classical alternative strategies and demonstrates their inferior parallelization potential.

Chapter 4 discusses the design and implementation of the efficient parallel shared-memory L2-projection solver and its application to explicit dynamics isogeometric FEM simulations of time-dependent problems.

Chapter 5 consists of several numerical examples of the presented strategies. The first two sections present applications of the parallel space-time adaptive algorithm to heat transfer in an L-shaped domain (sec. 5.1) and to the Pennes equation (sec. 5.2). Subsequent sections demonstrate the shared-memory parallel integration algorithm applied to heat transfer (sec. 5.3), elastic wave propagation (sec. 5.4), non-linear flow in heterogeneous porous medium (sec. 5.5) and the tumor growth model (sec. 5.6). Finally, section 5.7 describes a fast fitness approximation strategy designed to speed up evolutionary optimization algorithms.

The final chapter 6 summarizes the results presented in the thesis, discusses the remaining shortcomings of the proposed methods and points out potential directions for further development.

2. Background

2.1. Finite Element Method

The Finite Element Method (FEM) is a family of numerical methods for constructing approximate solutions of PDEs. The fundamental idea of FEM is to subdivide the spatial domain of the problem into a mesh of smaller subdomains, for each subdomain construct a set of equations representing the problem, and combine them into a global system of equations whose solution describes an approximation of the exact solution. This is typically done by converting the strong problem formulation into a variational (weak) formulation considered on a (subset of a) suitable Sobolev space U and applying the Galerkin method with a finite-dimensional space U_h ⊂ U spanned by basis functions constructed on the mesh (usually piecewise polynomials). An example of classical Lagrange basis functions for a rectangular domain subdivided into two subdomains is presented in fig. 2.1. The approximate solution of the considered PDE is constructed as a linear combination of such functions. FEM can be applied to both stationary and non-stationary problems, leading to a system of linear equations and a system of ordinary differential equations (ODEs) (a semi-discrete formulation), respectively.
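For concreteness, for a stationary problem with a generic weak form a(u, v) = l(v) (notation introduced here only for illustration), the Galerkin method with basis functions B_1, ..., B_n spanning U_h reduces the problem to a system of linear equations:

$$u_h = \sum_{j=1}^{n} u_j\, \mathcal{B}_j, \qquad \sum_{j=1}^{n} a(\mathcal{B}_j, \mathcal{B}_i)\, u_j = l(\mathcal{B}_i), \quad i = 1, \dots, n,$$

i.e. Ku = f with K_ij = a(B_j, B_i) and f_i = l(B_i).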

FEM was conceived independently by various mathematicians and engineers in the 1940s and 1950s, starting with the works of Hrennikoff [Hre41] and Courant [Cou43], and was further developed by J. Argyris, R. W. Clough, O. Zienkiewicz and others in the 1960s and 1970s. Long before the rigorous mathematical basis of the method was established by Strang and Fix [SF73], FEM proved to be an immensely powerful and practically useful tool for numerical modeling of various physical phenomena, especially in the domain of structural mechanics, where it was originally applied. One of the main advantages of FEM is the ease of working with domains possessing complex geometry, which is especially important in practical engineering applications. While other methods, such as the Finite Difference Method (FDM), can be modified to handle irregular meshes, it is often far from straightforward (see e.g. [LO80]).

While the essential idea of FEM – derive a variational formulation of the problem and build a finite set of basis functions to use with the Galerkin method – is clear, there are multiple ways to carry out both of these steps and various possible modifications and extensions, leading to a plethora of variants of the finite element method, such as the ones including adaptive refinements of the space U_h (h-adaptive FEM, p-adaptive FEM, hp-FEM), the Spectral Finite Element Method (SFEM), which uses very high order Chebyshev or Legendre polynomials to achieve a higher order of convergence [Pat84], the Extended Finite Element Method (XFEM), designed to better handle discontinuities of the solution and commonly used for solving problems like crack propagation [DB99], Discontinuous Galerkin (DG), combining features of FEM and the Finite Volume Method (FVM) by using discontinuous basis functions [ABCM02], and others.

One particularly notable variant of FEM is the Discontinuous Petrov-Galerkin (DPG) method – a relatively new approach developed by L. Demkowicz and J. Gopalakrishnan [DG10, DG11] with the goal of creating a Petrov-Galerkin method with an automatic stability guarantee. This is achieved by constructing an approximation of the optimal test space that complies with the discrete inf-sup condition, which ensures well-posedness of the discrete problem.


Figure 2.1: Example FEM basis functions for a rectangular domain subdivided into two elements – basis functions are associated with each vertex, edge and element.

This is of paramount importance in fields such as fluid dynamics, where standard approaches often suffer from stability issues and modifications such as SUPG [NJ82] are required – one of the obstacles that made the success of FEM-based methods in these domains less prominent than in structural mechanics.

2.2. hp-FEM

The concept of adaptivity in computational science has been of paramount importance since its dawn. It appears in many shapes and forms – adaptive methods can be found in the fields of numerical quadrature [PTVF07], linear algebra [DYPD15, KHG04], genetic algorithms [SP94] and, most importantly in the context of this book, in solving differential equations. While the details vary, the central idea is to exploit some properties of the problem to obtain a greater increase of solution quality than more straightforward methods, which perform unnecessary additional work instead of focusing on the most important features. Over the years, a plethora of strategies for controlling the time step size have been developed for systems of ordinary differential equations, ranging from very simple ones, like step doubling, to very sophisticated ones based on complex error and convergence rate estimation techniques [GS97, Gus92]. Similarly, efficient adaptive methods have been developed for numerical PDE discretizations, including wavelet methods [CM00], the Finite Volume Method [CFP08, GL08] and, in particular, FEM.

The accuracy of solutions delivered by standard FEM using piecewise polynomials depends on the size of mesh elements h and the order of the basis functions p. The earliest and most straightforward approach used to increase the accuracy of FEM simulations was the h-method – uniformly decreasing the size of all the mesh elements. For many applications this yields a convergence rate of O(h^p) in the energy norm, but this rate is also limited by the regularity of the solution and thus is lower for problems with singularities, which makes using high p inefficient [Cia94]. This problem can be removed by using h-adaptive methods, whereby the mesh is refined not uniformly, but rather only in carefully chosen regions of the domain [BKP79, LPJ85]. Convergence can also be achieved by increasing the degrees of the polynomial basis functions – the so-called p-method [Vog83]. As a representative of spectral FEM, the p-method is capable of delivering exponential convergence in ideal circumstances, but only under regularity assumptions too strong for practical problems. Furthermore, increasing p leads to much denser matrices, which significantly increases the cost of solving the resulting systems.


The idea of the hp-method is to combine both approaches – refine the mesh and increase the polynomial degrees at the same time [Dem06]. If done properly, for a large class of problems this method can deliver convergence exponential w.r.t. the number of degrees of freedom (DOFs) [GB86b, GB86a, MS98, CDS05]. If properties of the solution are known a priori, a suitable mesh and degrees of basis functions can often be determined before the computation. The main challenge of efficiently using hp-FEM is to devise a fully automatic procedure that constructs an optimally adapted mesh delivering exponential convergence without such knowledge. A number of approaches have been proposed [MA97, OWA95], perhaps the most successful being the one proposed by L. Demkowicz [DRD02] and described in detail in his books [Dem06, DKP+07].

2.2.1. Automatic hp-adaptivity

L. Demkowicz's fully automatic hp-adaptive FEM algorithm starts with some initial mesh and performs a series of adaptation steps. Each adaptation step consists of modifying the mesh and/or the orders of the polynomial basis functions with the goal of improving the quality of the solution while minimally increasing the number of DOFs (and so, the computational cost). The idea is to compare the solution computed using the current mesh with the solution computed using a uniformly refined mesh and estimate which refinements impact the solution the most. With some simplifications, the algorithm of a single adaptation step can be summarized as follows:

1. Compute the solution $u_{h,p}$ using FEM on the coarse mesh.

2. Perform a global hp-refinement – split every element into element sons (4 in 2D, 8 in 3D) and raise the degree of basis functions uniformly throughout the mesh.

3. Solve the problem on the resulting fine mesh to obtain a more accurate reference solution $u_{h/2,p+1}$.

4. Project the fine mesh solution $u_{h/2,p+1}$ onto the coarse mesh using Projection Based Interpolation [Dem04] to obtain $\tilde{u}_{h,p}$.

5. For each mesh element $e \in E$, compute an $H^1$ error estimate

$$\eta_e = \| u_{h/2,p+1} - \tilde{u}_{h,p} \|_{H^1(e)}$$

and mark for refinement all the elements with $\eta_e \geq \frac{1}{3} \max_{a \in E} \eta_a$.

6. For each marked element, choose an optimal refinement type. An element can be broken in one or more dimensions (h-refinement), or have the order of the associated polynomial basis functions increased (p-refinement). The refinement type is chosen to maximize

$$\frac{\| u_{h/2,p+1} - \tilde{u}_{h,p} \|_{H^1(e)} - \| u_{h/2,p+1} - w \|_{H^1(e)}}{\Delta\,\mathrm{nrdof}}$$

where $w$ is the projection of the fine mesh solution $u_{h/2,p+1}$ onto the element refined using the refinement type being evaluated, and $\Delta\,\mathrm{nrdof}$ is the number of additional degrees of freedom resulting from the refinement.

7. Refine each marked element according to the chosen refinement type.

For a more in-depth exposition, see [DKP+07, Chapter 14].
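The steps above map naturally onto code; the following is a minimal sketch (hypothetical types and helper names, not the actual implementation from [DKP+07]):

```cpp
#include <algorithm>
#include <vector>

struct Mesh { int n_elements = 0; };
struct Sol {};
struct Refinement {};

Sol  solve(const Mesh&);                               // FEM solve
Mesh global_hp_refine(const Mesh&);                    // h-split + raise p everywhere
Sol  pbi_project(const Sol& fine, const Mesh& coarse); // Projection Based Interpolation
double h1_error(const Sol& fine, const Sol& coarse, int element);
std::vector<Refinement> candidates(const Mesh&, int element);
double gain_per_dof(const Sol& fine, const Mesh&, int element, const Refinement&);
Mesh apply(Mesh, int element, const Refinement&);

Mesh adaptation_step(Mesh mesh) {
    Sol  u_hp   = solve(mesh);                // 1. coarse mesh solution (in practice
                                              //    also seeds the fine solve)
    Mesh fine   = global_hp_refine(mesh);     // 2. global hp-refinement
    Sol  u_fine = solve(fine);                // 3. reference solution u_{h/2,p+1}
    Sol  u_proj = pbi_project(u_fine, mesh);  // 4. project back onto coarse mesh

    // 5. element-wise H1 error estimates and the (1/3)-of-maximum threshold
    std::vector<double> eta(mesh.n_elements);
    for (int e = 0; e < mesh.n_elements; ++e)
        eta[e] = h1_error(u_fine, u_proj, e);
    double threshold = *std::max_element(eta.begin(), eta.end()) / 3.0;

    // 6.-7. marking is done against the original element list; each marked
    // element gets the refinement with the best error decrease per added DOF
    int n = mesh.n_elements;
    for (int e = 0; e < n; ++e) {
        if (eta[e] < threshold) continue;
        auto cs = candidates(mesh, e);
        auto best = *std::max_element(cs.begin(), cs.end(),
            [&](const Refinement& a, const Refinement& b) {
                return gain_per_dof(u_fine, mesh, e, a)
                     < gain_per_dof(u_fine, mesh, e, b);
            });
        mesh = apply(mesh, e, best);
    }
    return mesh;
}
```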

2.3. IGA

Isogeometric analysis (IGA) is a variant of the Finite Element Method originally proposed in 2005 [HCB05] to address certain shortcomings of the more classical methods, stemming primarily from the discrepancy between geometric representations of models used in CAD software and in traditional finite element analysis. This incompatibility necessitates converting the CAD model to a discrete mesh, which is not fully automatic and in practice significantly contributes to the cost of a simulation, leads to loss of accuracy and impedes the usage of adaptive methods, since mesh refinements require frequent communication with the CAD system.

Isogeometric analysis attempts to tackle these problems by using NURBS and B-splines, widely utilized by CAD systems, to represent geometry and basis functions in FEM. This allows one to easily use highly complex geometries without the need for mesh generation and to maintain the accuracy of the description without significantly increasing the number of degrees of freedom. This facilitates applications of isogeometric analysis especially to problems with complicated spatial domains, like blood flow analysis, which can help physicians predict the outcome of alternative treatment plans for individual patients. A complete NURBS-based framework designed for creating patient-specific vascular models to simulate the flow of blood using isogeometric analysis has been developed [BCZH06, BZC+06, ZBG+06]. Based on input data in the form of a tomography or MRI scan, arterial paths are found using image processing techniques and extracted with isocontouring. The resulting one-dimensional model of the vascular system is used along with the extracted surface mesh to create a clean, simple NURBS representation, which can subsequently be used to simulate blood flow using the Navier-Stokes equations.

Apart from the main motivation, this approach was found to provide a range of additional benefits. Basis functions of classical finite element approximation spaces are continuous, but not differentiable across element boundaries, while the B-spline basis is globally C^{p-1}, which makes it more suitable for higher-order problems that can be very difficult to tackle using FEM with traditional basis functions. One prominent example is the application of isogeometric analysis to the Cahn-Hilliard equation, describing phase transition phenomena [GCBH08]. It can be used to model a wide range of processes, from phase segregation of binary alloy systems to planet formation [Tre03] and cancer growth [FSN+06]. Obtaining numerical solutions is difficult, as the equation is non-linear and nearly ill-posed [Fur01]. The Cahn-Hilliard equation involves a spatial differential operator of order four, which necessitates the usage of globally C^1 basis functions in order to utilize the Finite Element Method. Other approaches used previously [Fur01, Sun95, LS03] were confined to simple spatial domains and so inadequate for the complex geometries encountered in real-world engineering applications. Isogeometric analysis, free from such limitations and unhindered by the basis continuity requirements, has been successfully applied and allowed solutions with a high degree of accuracy to be obtained. Another similar example is provided by the Navier-Stokes-Korteweg equation [GHNC10].

Furthermore, it has been found that the use of higher-order continuous basis functions provides better approximability per degree of freedom in comparison with the traditional methods, thus leading to smaller systems. This has been observed empirically in fluid simulations [ABC+08, ABKF11, EH13a, EH13b] and electromagnetics [BBdVC+06, BSV10, BRSV11], as well as proven mathematically [EBBH09, BdVBRS11]. Since its conception, IGA has been successfully applied to a wide variety of problems in various areas, such as elasticity [DLHC12], structural vibrations [CRBH06], turbulent flows [ABC+08, CHC12], aerodynamics [HAB11], gradient damage models [VSHdB10], phase field models [GCBH08, DBH12] and arterial blood flow [BCZH06, ZBG+07]. A detailed exposition of the subject with remarks on the general philosophy of the approach and a range of example applications can be found in the book by its founders [CHB09].

All these advantages are obtained at the expense of efficiency: while the superior approximability reduces the number of degrees of freedom in the resulting linear systems, these systems are significantly more computationally expensive to solve – especially for high-order bases. This is the case for both direct and iterative solvers [CPD+12, CCPP11].


Figure 2.2: B-spline basis for the open knot vector Ξ = (0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 7, 7, 7) and p = 3.

2.4. B-splines

Basis functions used in IGA are B-splines and Non-Uniform Rational B-Splines (NURBS). A set of B-spline basis functions is determined by its order p and the knot vector

$$\Xi = (\xi_0, \ldots, \xi_M), \qquad \xi_i \leq \xi_{i+1} \tag{2.1}$$

i.e. a non-decreasing sequence of numbers (points of the knot vector may be repeated). A knot vector is said to be open if its first and last elements are repeated p + 1 times. B-spline basis functions are defined on [ξ_0, ξ_M] recursively by the following Cox-de Boor formula:

$$N_i^0(x) = \begin{cases} 1 & x \in [\xi_i, \xi_{i+1}] \\ 0 & x \notin [\xi_i, \xi_{i+1}] \end{cases} \qquad i = 0, \ldots, M - 1$$

$$N_i^p(x) = \frac{x - \xi_i}{\xi_{i+p} - \xi_i}\, N_i^{p-1}(x) + \frac{\xi_{i+p+1} - x}{\xi_{i+p+1} - \xi_{i+1}}\, N_{i+1}^{p-1}(x) \qquad i = 0, \ldots, M - p - 1 \tag{2.2}$$

An example B-spline basis of order 3 with an open knot vector is displayed in fig. 2.2. We shall denote the space spanned by the B-spline basis functions of order p by S^p. A higher-dimensional B-spline basis on a Cartesian product of one-dimensional domains can be obtained from one-dimensional bases using the standard tensor product construction:

$$N^p_{i_1 \cdots i_d}(x) = N^p_{i_1}(x_1) \cdots N^p_{i_d}(x_d) \tag{2.3}$$

B-spline functions possess a number of interesting properties (see [Mar05, Chapter 8], [PT97, Chapter 2]), the most important being higher-order continuity and locality of support. A basis of order p is globally C^{p-1}, except at repeated knot points, at which it is C^{p-k}, k being the multiplicity of the point. B-spline basis functions are non-negative and the support of N_i^p is [ξ_i, ξ_{i+p+1}].
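The Cox-de Boor recursion (2.2) translates directly into code; a naive recursive sketch (production IGA codes use faster evaluation algorithms and a half-open-interval convention at inner knots):

```cpp
#include <vector>

// Value of the B-spline basis function N_i^p at x, for knot vector xi;
// terms with a zero denominator are taken to be zero (the usual 0/0 := 0
// convention for repeated knots).
double bspline(int i, int p, double x, const std::vector<double>& xi) {
    if (p == 0)
        return (xi[i] <= x && x <= xi[i + 1]) ? 1.0 : 0.0;
    double left = 0.0, right = 0.0;
    if (xi[i + p] != xi[i])
        left = (x - xi[i]) / (xi[i + p] - xi[i]) * bspline(i, p - 1, x, xi);
    if (xi[i + p + 1] != xi[i + 1])
        right = (xi[i + p + 1] - x) / (xi[i + p + 1] - xi[i + 1])
              * bspline(i + 1, p - 1, x, xi);
    return left + right;
}
```

For the basis of fig. 2.2 one would call bspline(i, 3, x, xi) with xi = {0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 7, 7, 7}; a tensor product basis function as in (2.3) is then just a product of such one-dimensional evaluations.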

2.5. State-of-the-art FEM solvers

The Finite Element Method, in both its classical variants and IGA, reduces the problem of solving a PDE to solving a system of linear algebraic equations. These systems, however, while sparse, are typically very large and consequently tremendously expensive to solve – in most cases the cost of the solver constitutes the vast majority of the total cost of the computation. It is thus desirable to design solvers capable of leveraging the specific structure and properties of systems arising from FEM discretization.


2.5.1. Direct solvers

Abstractly speaking, direct solvers solve linear systems by performing a series of algebraic transformations producing equivalent systems, usually computing (explicitly or implicitly) an LU decomposition of the system matrix. Assuming exact arithmetic, these solvers produce exact solutions in a finite number of steps – the only source of error in practical implementations is the round-off error introduced by the inherent inaccuracy of floating point operations. The archetypal direct solver is the Gaussian elimination algorithm, which uses elementary row operations to reduce the augmented matrix of the system to row echelon form and then performs backward substitution to find the solution.

Since the complexity of a naive implementation of Gaussian elimination is O(N^3), where N is the number of degrees of freedom (DOFs) of the system [GL13, sec. 3.2.6], this approach is not suitable for systems arising in FEM. For large sparse systems much more efficient approaches that exploit the sparsity of the matrix have been proposed. While the underlying idea is similar, state-of-the-art direct solvers use a plethora of sophisticated strategies to minimize fill-in in the LU decomposition factors. A wide variety of implementations, both commercial and free, is available, including MUMPS [MUM], PARDISO [PAR], PaStiX [PaS], SuperLU [Sup], UMFPACK [UMF] and others.

One of the most successful types of direct solvers, especially well-suited for FEM matrices, is the multifrontal solver [DR83], a parallel-friendly extension of the frontal solver. The multifrontal solver decomposes the matrix into a tree of small submatrices based on a graph of dependencies between the variables and works its way from the leaves up to the root, at each step combining the submatrices, eliminating fully assembled degrees of freedom (resulting in a dense matrix) and propagating the remaining DOFs upwards. In the case of systems arising in FEM, nodes of such a tree may correspond to parts of the mesh, and uneliminated degrees of freedom correspond to basis functions supported in more than one such part. A comprehensive presentation of the multifrontal solver can be found in [Pas16].

The performance of the multifrontal solver depends heavily on the geometry of the mesh and the elimination tree, since the majority of the computational cost comes from performing (partial) elimination (computing the Schur complement or LU factorization) on dense matrices whose size depends on the size of the interfaces between parts of the mesh. For uniform meshes of dimension d ≥ 2 using classical C^0 hierarchical basis functions, the complexity of the multifrontal solver is dominated by the LU factorization in the root and is O(N^{3(1-1/d)}) – that is, O(N^{3/2}) in 2D and O(N^2) in 3D, N being the total number of degrees of freedom [PPC15]. Notably, the complexity is independent of the degree of basis functions. On a mesh consisting of N_e elements there are approximately N ≈ N_e p^d hierarchical basis functions of order p, meaning that the complexity is O(N_e^{3/2} p^3) for 2D and O(N_e^2 p^6) for 3D. For large meshes common in practical applications, this is a very significant cost that easily dominates the rest of the FEM computation.

The situation is even worse in isogeometric analysis. While the B-spline basis of order p is smaller (N = N_e + p ≈ N_e), since the supports of B-splines are larger than those of hierarchical basis functions, the matrices become denser and fewer degrees of freedom (DOFs) can be eliminated at each stage. For B-splines of order p with continuity C^{p-1} on the same uniform mesh the complexity of the multifrontal solver is O(N_e^{3/2} p^6) = O(N^{3/2} p^3) for 2D and O(N_e^2 p^9) = O(N^2 p^3) for 3D [WKP+14, CPD+12], meaning that despite fewer DOFs, for high p the system is more expensive to solve than in the case of hierarchical basis functions. Using a parallel implementation on a machine with sufficiently many cores these estimates can in theory go down to O(N p^2) for 2D and O(N^{4/3} p^2) in 3D [WKP+14, Woź17], which is a significant speedup, but for 3D the cost of the solver is still not linear in N.

2.5.2. ADS solver

The Alternating Direction Solver (ADS) was originally proposed as an efficient linear-time solver for parabolic and hyperbolic PDEs in the context of the finite difference method [PR55, DR56, BVY62]. The method was applied by V. M. Calo and his student L. Gao to systems originating from the isogeometric L2-projection problem on regular patches [GC14], which appears naturally in the context of time-dependent problems where time is discretized using an explicit scheme.

ADS is a specialized direct solver exploiting the Kronecker product structure of the mass matrix appearing in the L2-projection problem when using a tensor product B-spline basis on a regular grid. To be precise, the algorithm reduces the problem of solving a linear system with an N × N matrix M which can be expressed as

$$M = M_1 \otimes M_2 \otimes \cdots \otimes M_d$$

for some matrices M_1, ..., M_d, to the problem of solving multiple linear systems with the matrices M_1, ..., M_d. If each of these matrices can be factorized in time linear with respect to its rank, the total cost of solving the full system with matrix M is O(N), which is significantly faster than the general-purpose direct solvers described earlier. In particular, this is the case for the Gram matrix of the tensor product B-spline basis defined as in eq. (2.3), since it can be written as a Kronecker product of the Gram matrices of the corresponding one-dimensional B-spline bases, and each of these has a banded structure due to the locality of support of B-spline basis functions.
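In the two-dimensional case the mechanism can be stated explicitly. Using the identity (M_1 ⊗ M_2) vec(X) = vec(M_2 X M_1^T), where vec stacks the columns of X, solving (M_1 ⊗ M_2) x = b with b = vec(B) amounts to two sweeps of one-dimensional solves:

$$M_2\, Y = B \quad \text{(one banded solve per column of } B\text{)}, \qquad M_1\, X^T = Y^T \quad \text{(one banded solve per row)},$$

so with banded LU factorizations of M_1 and M_2 computed once, the total cost is O(N) for N = n_1 n_2 unknowns; the d-dimensional case applies one such sweep per direction.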

The requirement that the matrix have a Kronecker product structure is a significant limitation of ADS and severely restricts its applicability to more complex problems. For example, if one wishes to parameterize a domain Ω with complex geometry using a mapping Φ from the reference regular grid to Ω, the integrals defining the entries of the mass matrix change due to an additional factor – the Jacobian of Φ – and the Kronecker product structure is in most cases lost. In this case the ADS solver can still be used as a preconditioner for some iterative solver [GC15]. Similar issues arise when one attempts to use an implicit time stepping scheme – the matrix of the resulting system is no longer just a mass matrix of the basis and in the case of popular implicit schemes (like Crank-Nicolson) does not possess the Kronecker product structure. This restricts ADS to explicit schemes, which causes stability issues and restricts the size of time steps due to the necessity of complying with the CFL condition.

A parallel implementation of ADS for distributed memory machines was considered by M. Woźniak in his PhD Thesis [Woź17]. While the proposed solution exhibited exquisite scalability, delivering near-perfect speedup up to 1728 processors and solving a system with over a billion DOFs in less than a second, for such large meshes the CFL condition enforces time steps too small to be practical:

$$\Delta t \leq C\,(\Delta x)^2 = O(N^{-2})$$

where N is the mesh size in each dimension.

2.6. Summary of open problems

This book attempts to address the following problems:

– The cost of solving time-dependent problems using the Finite Element Method, especially hp-adaptive FEM, is very significant. In order to perform large-scale simulations within a reasonable time frame it is often necessary to employ large distributed memory machines. This creates a need for highly scalable parallel algorithms capable of efficiently harnessing the full power of such machines. While such solutions exist for solving stationary problems with hp-FEM and can be applied to parallelize individual time steps, the limited scalability of multifrontal solvers puts an upper bound on the efficiency of such an approach for time-dependent problems. It is therefore desirable to devise new algorithms supporting concurrent execution of multiple time steps, thus circumventing these limitations and exhibiting superior scalability.

– For isogeometric simulations on regular meshes, the very efficient ADS solver can be used to compute each time step in time linear with respect to the number of degrees of freedom. There exists an efficient parallel ADS implementation for machines with distributed memory, but for many problems using very large meshes forces a prohibitively small time step size due to the CFL condition and is thus impractical. In such cases using smaller shared memory machines and focusing on parallelizing the right-hand side computation rather than the ADS solver itself can be more beneficial, since the cost of the solver is in this case almost insignificant. Consequently, there is a need for an efficient parallel algorithm for assembling the right-hand side in a shared memory environment.

– Solving inverse problems typically entails solving a large number of instances of the forward problem, which is often a FEM simulation of a time-dependent problem. In such situations the resources that can be allocated to solving a single instance are limited, which precludes using large meshes and distributed memory clusters. On the other hand, since different instances are fully independent computations, they can be solved concurrently. In these circumstances the cost of traditional direct solvers might be prohibitive, both in terms of computational time and memory consumption, creating a need for an efficient shared-memory algorithm with a low memory footprint for solving single instances of the forward problem. Due to their very modest memory requirements and excellent efficiency, forward problem solvers utilizing ADS seem to be well-suited for this task.

– In some applications, like modeling tumor growth, the goal is to simulate a process over a long period of time. If stability of the time discretization is not an issue, the choice of time step size is dictated purely by the desired accuracy and the time scale of the external processes influencing the one we model (e.g. vasculature evolution), meaning that explicit schemes can be utilized without sacrificing efficiency to conform with the CFL condition. In these cases, using ADS can significantly reduce the cost of the simulation.

– In order to reduce the number of (often very expensive) evaluations of the fitness function, evolutionary optimization algorithms routinely employ the surrogate fitness technique, whereby fitness function evaluations performed during the optimization process are stored and used to build a model, which can then be used to cheaply obtain approximate fitness values at arbitrary points. While a number of approaches exist, most are either relatively computationally expensive for certain purposes, or lack some desirable properties. An alternative strategy, using the fast L2-projection algorithm and Lagrange interpolation on a Delaunay triangulation of the available evaluation points, can provide an efficient (both to create and to evaluate) and smooth approximation of the fitness function.

2.7. Main scientific contributions

The major research contributions described in this thesis are the following:

– Proposing a novel algorithm for non-stationary problems using space adaptations, which allows parallel execution of multiple consecutive time steps and reuses partial results computed in preceding time steps

– Designing and implementing a parallel IGA-ADS algorithm for machines with shared memory using the GALOIS framework, which facilitates creating efficient simulations of various engineering problems, including but not limited to the ones presented in this book – heat transfer, elastic wave propagation (with a possible extension to car crash simulations) and fossil fuel extraction

– Coupling the explicit simulation using the IGA-ADS algorithm with a discrete graph-based vasculature model to create a full melanoma growth simulation running in linear time

– Adapting the parallel L2-projection algorithm to create an efficient, smooth misfit approximation strategy

3. Algorithms for space-time adaptivity

Using spatial adaptivity in time dependent problems entails additional complexity. This is true from the implementation point of view, since the mesh may differ between time steps and some strategy of mesh management must be developed. It is also true of the formal analysis of convergence: adaptivity gives finer control over the spatial discretization error, and ensuring that the time and space discretization errors play nicely together – while at the same time exploiting this additional control to improve the efficiency of the computation – requires more care than in the non-adaptive case. This chapter attempts to address both concerns, by describing some practical adaptive strategies for time dependent problems, including a novel one that allows parallel execution of time steps, and by presenting some results concerning convergence of fully discrete schemes.

The rest of the chapter is structured as follows. First, we establish some basic notions and definitions regarding mesh adaptation and time dependent problems. Having done that, we take a close look at popular classical strategies for such problems, as well as the one allowing parallel execution of time steps, pointing out the differences and benefits. Finally, we develop some general conditions allowing us to prove convergence of a spatially adaptive scheme for time dependent problems and discuss their applicability to the previously discussed approaches.

3.1. Formalism of mesh adaptation

3.1.1. Informal discussion

Let us start by recalling the basic concepts of the adaptive Finite Element Method. The problem we seek to solve is posed on some suitably regular domain Ω ⊂ R^n. The exact solution – or, in the case of the time dependent problems we consider here, the value of the solution at each point in time – exists in some function space U on Ω, usually a Sobolev space H^k(Ω) with k depending on the order of the spatial differential operators appearing in the weak problem formulation. To make the problem amenable to numerical computation, instead of the exact solution we seek its approximation in some finite-dimensional subspace V ⊂ U. In practice, such spaces arise as the span of some set of basis functions constructed based on a mesh – a discretization of the domain Ω – and some additional data (e.g. orders of the commonly used piecewise polynomial shape functions).

The computation starts out using some initial mesh and an associated set of basis functions. Following that, multiple adaptation steps are performed. In each step, based on the results obtained in the previous step, we perform some refinement that changes the approximation space. In hp-adaptive FEM there are two kinds of refinement: h-refinement, consisting of breaking some mesh element into smaller elements, and p-refinement, which affects the orders of polynomial basis functions. Usually there are multiple possible ways to perform a refinement and some error estimation strategy must be employed (e.g. [DRD02]). Regardless of the specific procedure, refining some space V yields a different space Ṽ that is supposed to be a better approximation of U. Space V may be a subspace of Ṽ, but in general it need not be.

Figure 3.1: Partial order on the set of FE spaces (represented by meshes – h-adaptations only). An arrow from space A to B represents a single elementary adaptation step, i.e. (A, B) ∈ E.

We do, however, often need a way to transfer elements between the approximation spaces. For example, the automatic hp-adaptive FEM developed by Leszek Demkowicz and his group involves projecting fine mesh solutions onto spaces corresponding to coarser grids using the Projection Based Interpolation algorithm [Dem06, DKP+07, DRD02, Dem04, ODRW89]. Moreover, as we shall see, adaptive strategies for time dependent problems often require bringing the solution from the previous time step into the space considered in the current one, where neither of these two spaces is necessarily a refinement of the other. Thus, to accommodate such techniques we need the ability to freely move between any two approximation spaces.

Finally, while it is not essential to our considerations, there are practical limitations on adaptation. For instance, the depth of h-adaptations is limited by the accuracy of the floating point numbers used in the computation – as breaking the elements decreases their diameter exponentially, beyond some point further adaptation becomes impractical. Similarly, using very high order polynomial shape functions can lead to numerical issues. Thus, we can assume the existence of a unique maximally refined space. In the case of hp-adaptive FEM, it arises from the finest mesh and the highest degree basis functions.

3.1.2. Formal adaptation model

Based on the above ideas, we can formulate a mathematical model of adaptation. Since all kinds of adaptations are ultimately reflected in changes of the approximation spaces, we shall not explicitly mention the mesh and basis functions and instead speak of abstract approximation spaces, with the understanding that this perspective is fully equivalent to the more "computational" one.

Let U be a separable Banach space containing the exact solution of the problem being considered. By 𝒱 we shall denote the family of finite dimensional spaces V ⊂ U with norms induced by U. These are the Finite Element approximation spaces.


Let E denote the binary relation on 𝒱 such that (A, B) ∈ E means that B can be obtained from A using a single, elementary adaptation (e.g. breaking a single mesh element, locally increasing an order of polynomial shape functions, etc.). We assume this relation is asymmetric. Let ≺ = E⁺ be the transitive closure of E, so that A ≺ B means that B can be obtained from A by a series of elementary adaptations. The relation ≺ defines a strict partial order on 𝒱. To formalize the intuitive meaning of V_init, we require it to be the least element of 𝒱 ordered by ≺. Analogously, V_end is required to be the greatest element of 𝒱.

Finally, since, as previously mentioned, we want to be able to move approximate solutions between spaces in 𝒱, we require the existence of a family of continuous linear projections π_{A,B}: A → B for all A, B ∈ 𝒱, with π_{A,A} = id_A.

3.2. General time dependent problem formulation

For clarity we shall formulate a fairly general abstract form of a first order time dependent problem, subsuming common cases. Let us fix a separable Banach space U and T > 0, I = [0, T]. Let us consider a time dependent problem with values in U given by

$$\begin{cases} u'(t) = f(t, u(t)), & t \in I \\ u(0) = u_0 \end{cases} \tag{3.1}$$

where u_0 ∈ U is an initial state and f: I × U → U′ is some function assuming values in U′, the continuous dual of U. The exact solution u belongs to the space C^1(I, U) of continuously differentiable U-valued functions on I, that is, u(t) ∈ U for each t ∈ I, u is Fréchet differentiable at each point of I, and u′: I → U′ is continuous.

3.2.1. Examples

This formulation has a convenient generality. By appropriately choosing the value space U and the right hand side function f we can adapt this formulation to various kinds of time dependent problems. Setting U = R^n turns (3.1) into a system of ODEs. Choosing a Sobolev space U = H^1(Ω), on the other hand, leads to a variational formulation of a time-dependent PDE. For example, by setting

$$\langle f(t, u), v \rangle = -(\nabla u, \nabla v) + \int_{\partial\Omega} g\, v \, ds$$

we obtain a weak formulation

$$\left( \frac{\partial u}{\partial t}, v \right) = -(\nabla u, \nabla v) + \int_{\partial\Omega} g\, v \, ds \qquad \forall v \in H^1(\Omega)$$

of the heat equation with non-homogeneous Neumann boundary conditions:

$$\begin{cases} \dfrac{\partial u}{\partial t} = \Delta u \\ \nabla u \cdot \hat{n} = g \end{cases}$$

We can also impose Dirichlet boundary conditions on Γ ⊂ ∂Ω by restricting U to the space of functions vanishing on Γ and modifying f to include the lift function [Dem06, Bre11]. Finally, using some approximation space V ∈ 𝒱 produces the problem formulation after Galerkin spatial semi-discretization (equivalent to a system of ODEs).

3.3. Discretization order

Usually in time dependent problems the time dimension is discretized separately from space. One of them is discretized first, producing a semi-discrete formulation which later, upon discretizing the remaining dimensions, becomes a fully discrete formulation. In a simple, non-adaptive setting the order of discretization (space first or time first) largely does not matter, but the situation becomes more complicated in the presence of adaptations.

Let us illustrate both possibilities using a generic parabolic problem

$$
\begin{cases}
\dfrac{\partial u}{\partial t} - L u = f, & t \in [0, T] \\
u(0) = u_0
\end{cases}
\tag{3.2}
$$

where $L$ is some second-order elliptic spatial differential operator. To perform the spatial semi-discretization first, we start by deriving a weak formulation of (3.2):

$$
\left( \frac{\partial u}{\partial t}, v \right) = b(u, v) + (f, v)
\tag{3.3}
$$

for some bilinear form $b$. Using the Galerkin method with some space $V \in \mathcal{V}$ spanned by a set of basis functions $B_i$, $i = 1, \ldots, n$ leads to the following finite-dimensional problem: for each $t \in [0, T]$, find $u(t) \in V$ such that eq. (3.3) holds for all $v \in V$. We can write $u$ as

$$
u(x, t) = \sum_{i=1}^{n} u_i(t) \, B_i(x)
$$

and the problem formulation is equivalent to a system of ODEs:

$$
\sum_{i=1}^{n} (B_i, B_j) \, u_i' = \sum_{i=1}^{n} b(B_i, B_j) \, u_i + (f, B_j)
$$

for $j = 1, \ldots, n$. This approach is known as the Method of Lines [Sch91, Č11] and has the advantage of reducing the problem to a formulation that allows one to employ the very well developed theory of ODEs. On the other hand, such a formulation essentially freezes the space $V$ and does not allow one to easily and naturally use spatial adaptations.
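For illustration, below is a minimal, self-contained Python sketch of the Method of Lines (our own example, not code used in this thesis) for the 1D heat equation $u_t = u_{xx}$ on $(0, 1)$ with homogeneous Dirichlet boundary conditions: linear FEM on a fixed uniform mesh yields the system $M u' = -K u$, which is handed to a standard ODE integrator.

```python
# A minimal Method of Lines sketch: 1D heat equation, linear FEM on a fixed
# uniform mesh; the space V (hence M and K) is frozen for the whole run.
import numpy as np
from scipy.integrate import solve_ivp

n = 50; h = 1.0 / n
x = np.linspace(0, 1, n + 1)[1:-1]   # interior nodes (Dirichlet BCs)
m = len(x)

# Standard 1D linear-FEM mass and stiffness matrices (tridiagonal).
M = h / 6 * (4 * np.eye(m) + np.eye(m, k=1) + np.eye(m, k=-1))
K = 1 / h * (2 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1))

u0 = np.sin(np.pi * x)               # initial state, exact decay exp(-pi^2 t)

def rhs(t, u):
    # System of ODEs: M u' = -K u  =>  u' = M^{-1} (-K u)
    return np.linalg.solve(M, -K @ u)

sol = solve_ivp(rhs, (0.0, 0.1), u0, method="BDF")
print(sol.y[:, -1].max(), np.exp(-np.pi**2 * 0.1))   # should be close
```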

The other approach – performing the time discretization first – is known as Rothe's method [Č11] and provides different benefits. For the sake of an example, let us use the simple backward Euler scheme. Applying it to eq. (3.2) yields

$$
(1 - \tau L) u^{(i+1)} = u^{(i)} + \tau f
$$

where $\tau$ is the time step size. Since $u^{(i)}$ – the solution from the previous time step – is a known function, the above equation can be seen as a stationary problem, and so all the standard techniques for such problems may be employed. In particular, the approximation space for each time step can be chosen independently. For this reason Rothe's method is a much more suitable framework for approaches involving spatial adaptation, and so we adopt it in the description of the adaptive FEM strategies.
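A corresponding minimal sketch of Rothe's method with backward Euler (again purely illustrative; the space is kept fixed for brevity, although nothing prevents reassembling the matrices on a different space $V_{i+1}$ in every step): each time step amounts to one stationary solve $(M + \tau K)u_{i+1} = M u_i$, the weak counterpart of $(1 - \tau L)u^{(i+1)} = u^{(i)}$.

```python
# A minimal Rothe's-method sketch: backward Euler for the 1D heat equation,
# each time step solved as a stationary problem.  In an adaptive code M and K
# could be reassembled on a new space every step, with u_i projected into it.
import numpy as np

n = 50; h = 1.0 / n
x = np.linspace(0, 1, n + 1)[1:-1]   # interior nodes (Dirichlet BCs)
m = len(x)
M = h / 6 * (4 * np.eye(m) + np.eye(m, k=1) + np.eye(m, k=-1))  # mass
K = 1 / h * (2 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1))  # stiffness

u = np.sin(np.pi * x)                # initial state
tau, steps = 1e-3, 100
for i in range(steps):
    # One stationary solve per time step: (M + tau K) u_{i+1} = M u_i
    u = np.linalg.solve(M + tau * K, M @ u)
print(u.max(), np.exp(-np.pi**2 * tau * steps))   # FEM value vs. exact decay
```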

3.4. Difference schemes

In our considerations we assume that time is discretized using a finite difference method. In other words, we will approximate the exact solution $u \in C^1(I, U)$ by the grid function $u_\tau : S \to U$, where $S = \{0, \tau, \ldots, K\tau = T\}$ for a fixed number of time steps $K$. For simplicity we denote $u_\tau(i\tau)$ by $u_i$. Furthermore, we shall restrict our attention to the class of two-level schemes, that is, schemes involving values of the solution from exactly two time steps. Many of these can be expressed as

$$
L_{i+1}(u_{i+1}) = R_i(u_i)
\tag{3.4}
$$

where $\{L_i\}$, $\{R_i\}$ are families of (not necessarily linear) operators $U \to U'$, i.e. the scheme can be separated algebraically into components containing exclusively $u_{i+1}$ and $u_i$. This class comprises many widely used finite difference schemes, for example:

– explicit Euler

$$
\frac{u_{i+1} - u_i}{\tau} = f(t_i, u_i)
\iff
\underbrace{u_{i+1}}_{L_{i+1}(u_{i+1})} = \underbrace{u_i + \tau f(t_i, u_i)}_{R_i(u_i)}
$$

– implicit Euler

$$
\frac{u_{i+1} - u_i}{\tau} = f(t_{i+1}, u_{i+1})
\iff
\underbrace{u_{i+1} - \tau f(t_{i+1}, u_{i+1})}_{L_{i+1}(u_{i+1})} = \underbrace{u_i}_{R_i(u_i)}
$$

– Crank-Nicolson

$$
\frac{u_{i+1} - u_i}{\tau} = \frac{1}{2}\left[ f(t_i, u_i) + f(t_{i+1}, u_{i+1}) \right]
\iff
\underbrace{u_{i+1} - \frac{\tau}{2} f(t_{i+1}, u_{i+1})}_{L_{i+1}(u_{i+1})} = \underbrace{u_i + \frac{\tau}{2} f(t_i, u_i)}_{R_i(u_i)}
$$
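For a linear right-hand side $f(t, u) = Au$ all three schemes above are instances of the classical $\theta$-scheme, and the operators $L_{i+1}$, $R_i$ reduce to a pair of matrices. The sketch below (illustrative Python; the $\theta$-parametrization is a standard device, not notation used elsewhere in this thesis) makes the splitting explicit:

```python
# A minimal sketch of the two-level splitting L_{i+1}(u_{i+1}) = R_i(u_i)
# for a linear right-hand side f(t, u) = A u.
import numpy as np

def two_level(A, tau, theta):
    """theta = 0: explicit Euler, 1: implicit Euler, 1/2: Crank-Nicolson."""
    I = np.eye(A.shape[0])
    L = I - tau * theta * A          # operator acting on u_{i+1}
    R = I + tau * (1 - theta) * A    # operator acting on u_i
    return L, R

A = np.array([[0.0, 1.0], [-1.0, 0.0]])    # harmonic oscillator, u'' = -u
u = np.array([1.0, 0.0])
L, R = two_level(A, tau=0.01, theta=0.5)   # Crank-Nicolson
for i in range(628):                       # integrate to t ~ 2*pi
    u = np.linalg.solve(L, R @ u)          # one step: solve L u_{i+1} = R u_i
print(u)   # close to the initial state (1, 0) after one full period
```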

In the following discussion of time-dependent adaptive strategies we shall use the operators $L_i$, $R_i$ defined on various approximation spaces $V \in \mathcal{V}$. Technically we should distinguish them, e.g. by adding a superscript denoting the space, but in our setting they can be seen as restrictions of the corresponding operators defined on the exact solution space $U$, and the precise space is always clear from the context, so the space index can be omitted without danger of confusion.

3.5. Strategies

This section presents three possible approaches to spatial $hp$-adaptation using the automatic adaptive Finite Element Method presented in sec. 2.2.1 to solve time-dependent problems. As noted in sec. 3.3, using Rothe's method we reduce a time-dependent problem to a sequence of stationary ones that can be discretized independently. Each time step corresponds to a stationary problem of the form (3.4) posed on some approximation space $V_{i+1} \in \mathcal{V}$, and consists of multiple adaptive iterations, in which the problem is solved on a coarse and a fine mesh and appropriate refinements are chosen based on an error estimate, until an acceptable error level is reached. Since in general $V_i \neq V_{i+1}$, so that $u_i$ and $u_{i+1}$ belong to different approximation spaces, one adjustment to eq. (3.4) needs to be made: $u_i$ must be explicitly brought into $V_{i+1}$ using the projection operators whose existence we required in the adaptation model from sec. 3.1. Thus, the problem solved in each time step can be expressed as

$$
L_{i+1}(u_{i+1}) = R_i(\pi_{V_i, V_{i+1}}(u_i))
\tag{3.5}
$$

It is clear that we can reuse adaptation techniques developed for stationary problems, but there are multiple possibilities regarding how the mesh changes over time. This choice affects performance, solution accuracy and parallelization potential.
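The projection $\pi_{V_i, V_{i+1}}$ can be realized, for example, as the $L^2$-projection onto the new space. A minimal Python sketch for 1D piecewise-linear spaces (function names are ours; the quadrature is deliberately crude) solves the Gram system $Gc = b$ with $G_{jk} = (B_j, B_k)$ and $b_j = (u_i, B_j)$:

```python
# A minimal sketch of the projection pi_{V_i, V_{i+1}} as an L2 projection
# between two 1D piecewise-linear spaces; all names are illustrative.
import numpy as np

def hat(nodes, j, x):
    """j-th piecewise-linear hat function on the given node vector."""
    return np.interp(x, nodes, np.eye(len(nodes))[j])

def l2_project(nodes_from, coeffs, nodes_to, nquad=2000):
    """Project u = sum_j coeffs[j] hat_j (on nodes_from) onto nodes_to by
    solving the Gram system G c = b with simple trapezoidal quadrature."""
    x = np.linspace(nodes_to[0], nodes_to[-1], nquad)
    u = np.interp(x, nodes_from, coeffs)            # evaluate u at quad points
    B = np.array([hat(nodes_to, j, x) for j in range(len(nodes_to))])
    G = np.trapz(B[:, None, :] * B[None, :, :], x)  # Gram (mass) matrix
    b = np.trapz(B * u, x)                          # right-hand side (u, B_j)
    return np.linalg.solve(G, b)

coarse = np.linspace(0, 1, 5)
fine = np.linspace(0, 1, 9)
c = l2_project(coarse, np.sin(np.pi * coarse), fine)  # pi_{V_i,V_{i+1}}(u_i)
```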

3.5.1. Independent adaptations in each time step

The simplest and most natural strategy is to use the same initial mesh in every time step and restart the adaptation process each time. Detailed pseudocode of this approach is presented as Algorithm 3.1. Variants of this approach are commonly used, see e.g. [MSP15, DSCK10, SDK10].

Algorithm 3.1: Pseudocode of the strategy restarting adaptations from $V_{\text{init}}$

 1  $V_0 \leftarrow V_{\text{init}}$
 2  for $i = 0$ to $K - 1$ do
 3      $A \leftarrow V_{\text{init}}$
 4      compute $u^{\text{coarse}}_{i+1} \in A$ by solving $L_{i+1}(u^{\text{coarse}}_{i+1}) = R_i(\pi_{V_i,A}(u_i))$ in the space $A' = V_{\text{init}}'$
 5      repeat
 6          global $hp$-refinement of the space $A$ to $B$, $A, B \in \mathcal{V}$
 7          compute $u^{\text{fine}}_{i+1} \in B$ by solving $L_{i+1}(u^{\text{fine}}_{i+1}) = R_i(\pi_{V_i,B}(u_i))$ in the space $B'$
 8          estimate error $e$ using $u^{\text{coarse}}_{i+1}$, $u^{\text{fine}}_{i+1}$
 9          select optimally $hp$-refined space $C$ with $A \prec C \prec B$
10          compute $u_{i+1} \in C$ by solving $L_{i+1}(u_{i+1}) = R_i(\pi_{V_i,C}(u_i))$ in the space $C'$
11          $A \leftarrow C$
12          $u^{\text{coarse}}_{i+1} \leftarrow u_{i+1}$
13      until $e < \epsilon$
14      $V_{i+1} \leftarrow A$
15  end

Figure 3.2: Diagram of the first strategy (the solution $u_i$ is passed from time step $t_i$ to $t_{i+1}$)

Properties

One clear advantage of rebuilding the mesh from scratch in each step is that all the adaptations are chosen based on the exact problem being solved, so the mesh is free to adapt in an optimal way. This decreases the number of degrees of freedom required to achieve the desired accuracy and avoids the problems stemming from transient localized phenomena present in approaches retaining some adaptations from previous time steps. On the other hand, this approach requires fully recomputing the adaptations from the initial coarse mesh and thus discards all information about the mesh structure from the previous steps. Depending on the nature of the problem, the behavior of the solution and the time step size, the optimal mesh for the current step may be very similar to the one from the previous step.

Parallelization potential

The flow of data in the algorithm is depicted in Figure 3.2. It is clear that there is a data dependency between subsequent time steps – to start step $t_{i+1}$, we need the solution $u_i$ from time step $t_i$. This precludes executing both time steps in parallel, since time step $t_i$ must be finished before it is possible to start executing the next one. Thus parallelism can only be introduced at the level of single time steps – solving the stationary problems.

3.5.2. Starting with mesh from the previous time step

Another idea is to carry out the adaptations at each time step starting from the fine mesh obtained in the previous time step. Examples of using this strategy can be seen e.g. in [SAuS10, BHL15]. A detailed description of this approach is presented in Algorithm 3.2. The pseudocode is very similar to the previous strategy – the only difference is the choice of the initial space in each time step. Note that choosing $V_i$ removes the need to use the projection operator in the first solve of each step, since $u_i$ already belongs to the initial space $A = V_i$ (cf. line 4 of Algorithm 3.2).

Algorithm 3.2: Pseudocode of the strategy reusing the mesh from the previous step

 1  $V_0 \leftarrow V_{\text{init}}$
 2  for $i = 0$ to $K - 1$ do
 3      $A \leftarrow V_i$
 4      compute $u^{\text{coarse}}_{i+1} \in A$ by solving $L_{i+1}(u^{\text{coarse}}_{i+1}) = R_i(u_i)$ in the space $A' = V_i'$
 5      repeat
 6          global $hp$-refinement of the space $A$ to $B$, $A, B \in \mathcal{V}$
 7          compute $u^{\text{fine}}_{i+1} \in B$ by solving $L_{i+1}(u^{\text{fine}}_{i+1}) = R_i(\pi_{V_i,B}(u_i))$ in the space $B'$
 8          estimate error $e$ using $u^{\text{coarse}}_{i+1}$, $u^{\text{fine}}_{i+1}$
 9          select optimally $hp$-refined space $C$ with $A \prec C \prec B$
10          compute $u_{i+1} \in C$ by solving $L_{i+1}(u_{i+1}) = R_i(\pi_{V_i,C}(u_i))$ in the space $C'$
11          $A \leftarrow C$
12          $u^{\text{coarse}}_{i+1} \leftarrow u_{i+1}$
13      until $e < \epsilon$
14      $V_{i+1} \leftarrow A$
15  end

Figure 3.3: Diagram of the second strategy (both the space $V_i$ and the solution $u_i$ are passed from time step $t_i$ to $t_{i+1}$)

Properties

Compared to the first strategy, an advantage of this approach is that it reuses the already adapted mesh from previous time steps. That is clearly desirable if the domain regions requiring increased accuracy remain static during the simulation, since in such a case retaining adaptations from previous time steps avoids recomputing the same or very similar adaptations over and over again in each time step, thus skipping a lot of redundant work. Each step is then likely to require little or no further adaptation, while at the same time the simulation maintains a high-quality mesh. On the other hand, if the solution possesses features requiring higher accuracy that move across the mesh in time, adaptations accumulate over the time steps in the regions that at some point needed increased accuracy. As a result, for such problems retaining previous adaptations leads to an unnecessarily fine mesh, and so a large number of additional degrees of freedom that do not contribute much to solution accuracy.

Parallelization potential

The flow of data is presented in Figure 3.3. In this strategy there is an even stronger dependency between the steps – not only is the solution $u_i$ required to start executing the next step, but so is the mesh with which the adaptation process starts.

Improvement

One variant of this approach, applicable to problems that use some kind of material data, consists in using a mesh adapted to that data as a starting point instead of a simple coarse one. The difference compared to Algorithm 3.2 is that before the loop over time steps a space $\tilde{V}$ is built by adapting $V_{\text{init}}$ to the material data, and $V_0$ is initialized not with $V_{\text{init}}$, but with $\tilde{V}$. In this method we assume that the shape of the material data influences the static problems in such a way that the regions requiring higher accuracy correspond to those where increased precision is needed to approximate the material data. Such an approach has been adopted e.g. in [SPMG14, GSPM13, GSP+13]. In these papers the material data has been approximated using the Projection Based Interpolation algorithm [Dem04].
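The idea can be sketched as follows (a 1D, $h$-refinement-only illustration with hypothetical names; this is a simple greedy interpolation-error loop, not the Projection Based Interpolation algorithm used in the cited papers): elements on which the material data is poorly approximated are bisected before the time stepping begins.

```python
# A minimal sketch of pre-adapting the initial mesh to material data:
# elements where linear interpolation of the data is poor get bisected.
import numpy as np

def material(x):                      # hypothetical material coefficient
    return 1.0 + 9.0 * (x > 0.7)      # jump at x = 0.7

def interp_error(a, b, nsamp=50):
    """Error of interpolating the material data linearly on element (a, b)."""
    x = np.linspace(a, b, nsamp)
    lin = material(a) + (material(b) - material(a)) * (x - a) / (b - a)
    return np.max(np.abs(material(x) - lin))

def pre_adapt(mesh, tol=0.1, max_iter=20):
    # max_iter caps refinement depth (a discontinuity never falls below tol)
    for _ in range(max_iter):
        bad = [e for e in mesh if interp_error(*e) > tol]
        if not bad:
            break
        for (a, b) in bad:            # elementary adaptation: bisection
            mesh = (mesh - {(a, b)}) | {(a, (a + b) / 2), ((a + b) / 2, b)}
    return mesh

V_tilde = pre_adapt(frozenset({(0.0, 1.0)}))   # used as V_0 instead of V_init
print(sorted(V_tilde))   # elements cluster around the discontinuity at 0.7
```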

3.5.3. Parallel strategy

Both previous strategies possess data dependencies between subsequent time steps that preclude parallel execution. While achieving full independence of the time steps is not possible – time step $t_{i+1}$ inevitably requires the value of $u_i$ at some point in the computation – the nature of the $hp$-adaptive algorithm, in particular the fact that intermediate solution approximations are produced in the process, allows us to relax this dependency by using these intermediate solutions. This approach was proposed in [SLS+15]. Numerical results for this strategy are presented in sections 5.1 (heat transfer in an L-shaped domain) and 5.2 (Pennes bio-heat equation).

Assumptions

Unlike the previous algorithms, this one is inherently parallel. We assume the algorithm is executed on a cluster consisting of $K$ processors – the same as the number of time steps. Furthermore, we assume the cluster provides an MPI-like API for interprocess communication – each process is assigned a rank, processes can send and receive data, and it is possible to execute a global barrier to synchronize all the processes.

Algorithm

In this strategy, every processor is responsible for performing the computations of a single time step. All the processors perform $M$ iterations of the adaptation in lockstep – after each iteration a global synchronization takes place. Adaptations are performed starting from the initial coarse mesh, as in the first strategy described in sec. 3.5.1, so all the advantages of that approach are retained in the parallel strategy. Unlike the previously described strategies, this one takes advantage of the structure of the automatic $hp$-adaptive FEM algorithm. Each adaptation iteration consists of the following steps:

– computing the solution on a coarse mesh
– global $hp$-refinement producing a fine mesh
– computing the solution on a fine mesh
– selecting an optimal refinement

In order to compute a solution, the solution from the previous time step is required. Since a single processor computes the solution for a single time step, the processors need to communicate to obtain these solutions. Each processor waits for the solution from the previous processor before computing the solution for its own time step (both on the coarse and the fine mesh) and, having done that, sends its own solution to the next processor. A detailed description of the strategy is presented as Algorithm 3.3. For the first and the last processor the algorithm is slightly different – the first processor uses the initial state instead of receiving a solution from the previous processor (since there is none), and the last one does not need to send its solution. For clarity, the handling of these special cases is omitted in the algorithm description.
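A rough mpi4py sketch of the resulting communication pattern is given below. All FEM routines are stubbed out, and the exact placement of the sends and receives is our reading of the description above; Algorithm 3.3 remains the authoritative formulation.

```python
# A rough mpi4py sketch of the parallel strategy: rank i computes time step i,
# adaptation iterations proceed in lockstep, and each iteration consumes the
# intermediate solution the previous rank produced in its preceding iteration.
from mpi4py import MPI

# Trivial stand-ins so the sketch runs; a real code plugs its FEM in here.
initial_space = lambda: 0                  # hypothetical V_init
initial_state = lambda: 0.0                # hypothetical u_0
solve = lambda space, u_prev: u_prev + 1.0
refine_globally = lambda space: space + 1
select_refinements = lambda A, B, uc, uf: A + 1

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()   # rank i handles time step i
M_ITER = 3                                 # adaptation iterations per step

space = initial_space()
u_prev = initial_state()                   # stand-in until real data arrives
req = None
for it in range(M_ITER):
    if rank > 0 and it > 0:
        # consume the solution the previous rank sent in its last iteration
        u_prev = comm.recv(source=rank - 1)
    u_coarse = solve(space, u_prev)        # solution on the coarse mesh
    fine = refine_globally(space)          # global hp-refinement
    u_fine = solve(fine, u_prev)           # solution on the fine mesh
    space = select_refinements(space, fine, u_coarse, u_fine)
    u = solve(space, u_prev)               # solution on the refined space
    if rank < size - 1 and it < M_ITER - 1:
        if req is not None:
            req.wait()                     # ensure the previous send completed
        req = comm.isend(u, dest=rank + 1) # feed the next time step
    comm.Barrier()                         # iterations proceed in lockstep
```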
