• Nie Znaleziono Wyników

Fast solvers

N/A
N/A
Protected

Academic year: 2021

Share "Fast solvers"

Copied!
64
0
0

Pełen tekst

(1)

Fast solvers

for mesh-based computations

Maciej Paszyński

Department of Computer Science AGH University, Krakow, Poland

home.agh.edu.pl/paszynsk

Collaborators:

Anna Paszyńska (Jagiellonian University,Poland) David Pardo (UPV/BCAM/IKERBASQUE,Spain) Victor Calo (UWA,Australia)

Keshav Pingali (ICES,UT,USA)

Luis Garcia-Castillo (Carlos III Univ. Madrid) Leszek Demkowicz (ICES,UT,USA)

Mikhail Moshkov (KAUST,Saudi Arabia) Lisandro Dalcin (KAUST,Saudi Arabia)

PhD Students:

Piotr Gurgul (PhD def. 10/2014)

Arkadiusz Szymczak (PhD def. 5/2015) Marcin Sieniek (PhD def. 11/2015) Maciej Woźniak

Marcin Łoś Konrad Jopek

Marcin Skotniczny

Grzegorz Gurgul

1

(2)

Outline

PART I - Introduction

1. Sparse-matrix-based direct solvers

(global matrix storage, ordering, elimination trees, LU factorizations) 2. Mesh-based direct solvers

(element partition trees, orderings, LU factorizations)

PART II - Benefits of mesh-based solvers and element partition trees 1. New ordering algorithms

2. Transforming element partition tree into an ordering

3. Straightforward parallelization with element partition trees 4. Reutilization of partial LU factorizations

5. Reuse of identical LU factorizations

2

(3)

Computational mesh, sparse matrix and direct solvers

Sparse matrices as seen by classical sparse-matrix-based direct solvers We miss some important information here:

• What is the structure of the mesh? Is it 1D, 2D or 3D? Uniform or h refined?

• What are the basis functions spread over the mesh? Are they uniform or p adaptive?

• What is the discretization method? 3

(4)

Computational mesh, sparse matrix and direct solvers

Two dimensional mesh, finite element method, linear basis functions

4

(5)

Computational mesh, sparse matrix and direct solvers

Two dimensional mesh, finite element method, quadratic basis functions

5

(6)

Computational mesh, sparse matrix and direct solvers

Two dimensional mesh, isogeometric finite element method, quadratic B-splines

6

(7)

Computational mesh, sparse matrix and direct solvers

Two dimensional mesh, isogeometric collocation method, quadratic B-splines

7

(8)

Sparse matrix based direct solvers

Sparse global matrix, stored in some compressed manner, e.g.

• coordinate format,

• CSC format

• CSR format

(see e.g. Sparse Matrix Computations lecture notes by Jean Yves L’Excellent et al.

http://graal.ens-lyon.fr/~bucar/CR07/introSparse.pdf for more details)

8

(9)

Sparse matrix based direct solvers

Ordering P

P

-1

AP

Ordering can be stored in a vector

Several algorithms for constructing of the ordering looking at the structure of the sparse matrix, e.g.

• nested-dissections (METIS)

• aproximate minimum degree (AMD)

• PORD

… available through MUMPS solver interface

9

Ordering generator

(10)

Sparse matrix based direct solvers

1 X X X X X X X X X X 2 X X 0 X X X X 3 X X 0 X X X X 4 X X 0 X X X X 5 X X 0 X X X X

10

5 X X X X 4 X X 0 X X 3 X X 0 X X 2 X X 0 X X 1 X X X X X 0 X X X X Impact of ordering on factorization

Elimination of the first row with two different orderings:

Ordering P

P

-1

AP

Ordering generator

(11)

Ordering generator

Sparse matrix based direct solvers

Ordering P

P

-1

AP

Elimination tree

Elimination tree is constructed internally by the solver The ordering defines elimination tree (is it 1 to 1 ?)

Elimination tree expresses dependencies between particular steps of the factorization For more details on the elimination tree

see e.g. Sparse Matrix Computations lecture notes by Jean Yves L’Excellent et al..:

http://graal.ens-lyon.fr/~bucar/CR07/lecture-etree.pdf http://graal.ens-lyon.fr/~bucar/CR07/factorization.pdf

It helps with memory management, parallelization of the factorizations)

LU factorization

11

Sparse-matrix-based solver

(12)

Sparse matrix based direct solvers

When you eliminate row a

you need to subtract it from row c So  row c is eliminated after row a

When you eliminate row c or row f you need to subtract them from row g

So  row g is eliminated after rows c and f

12

Ordering generator Ordering P

P

-1

AP

Elimination tree

LU factorization Sparse-matrix-based solver

(13)

Mesh-based solver

Mesh-based solvers

Mesh-based solver can perform LU factorization based on

• element partition tree

• list of element (dense) matrices

LU factorization

Element partition tree

Element matrices

13

Element partition tree generator

(14)

Mesh-based solvers

Sparse-matrix-based solver

Ordering for sparse-matrix-based solver can be also computed based on the element partition tree and pass it to sparse-matrix-based solver

Elimination tree LU factorization

Ordering P

P

-1

AP

14

Element partition tree

Element partition tree generator

(15)

Element partition trees

Several algorithms has been proposed to generate element partition trees:

• Area & neighbors algorithm (2D grids h adaptive )

Paszyńska A., Paszyński M., Jopek K., Woźniak M., Goik D., Gurgul P., AbouEisha H., Moshkov M., Calo V. M., Lenharth V. M., Nguyen D., Pingali K., 2015. Quasi-Optimal Elimination Trees for 2D Grids with Singularities, Scientific

Programming, Volume 2015 Article ID 303024:1-18.

• Volume & neighbors algorith for (3D grids h adaptive )

Paszyńska A., Volume and neighbors algorithm for finding elimination trees for three dimensional h-adaptive grids, Computers & Mathematics with Applications, 68 (10) (2014) 1467-1478.

• Bisections weighted by element size (2D, 3D, h and hp adaptive grids)

H. AbouEisha, V. Calo, K. Jopek, M. Moshkov, A. Paszyńska, M. Paszyński, M.Skotniczny, Element Partition Trees for Two- and Three-Dimensional h-Refined Meshes and Their Use to Optimize Direct Solver Performance, Journal of Computational Science (2016) submitted

15

(16)

Computational mesh, sparse matrix and direct solvers

Bisection weighted by element size for 3D h adaptive and hp adaptive grids

H. AbouEisha, V. Calo, K. Jopek, M. Moshkov, A. Paszyńska, M. Paszyński, M.Skotniczny,

Element Partition Trees for Two- and Three-Dimensional h-Refined Meshes and Their Use to Optimize Direct Solver Performance, Journal of Computational Science (2016) submitted

16

(17)

Element partition trees

17

(18)

Computational mesh, sparse matrix and direct solvers

Bisections weighted by element size GRAPH: VERTICES = ELEMENTS

WEIGHT = SCALLED ELEMENT SIZE * ORDER OF APPROXIMATION EDGES = ADJACENCY RELATION

WEIGHT = ORDER OF APPROXIMATION

18

(19)

Computational mesh, sparse matrix and direct solvers

Bisections weighted by element size

19

(20)

Element partition trees

Ratio between Bisections weighted by element size and METIS: 1.52

Ratio between Bisections weighted by element size and AMD or PORD: 1.26

20

(21)

Ordering based on element partition tree

21

(22)

Ordering generator Sparse-matrix-based solver

Ordering based on element partition tree

Ordering P

P

-1

AP

Element partition tree can be translated into an ordering and passed to sparse matrix based solver

Elimination tree

LU factorization

22

(23)

Ordering based on element partition tree

Ordering is generated by post-order transition of the element partition tree, and listing nodes of the mesh that can be eliminated at this point

23

(24)

Ordering based on element partition tree

List interior of element 1:

44

24

(25)

Ordering based on element partition tree

List faces belonging to element 1 only:

44, 33,35,37,38,41

25

(26)

Ordering based on element partition tree

List edges belonging to element 1 only:

44, 33,35,37,38,41, 16,17,23,24,13,28,20,27,

26

(27)

Ordering based on element partition tree

List vertices belonging to element 1 only:

44, 33,35,37,38,41, 16,17,23,24,13,28,20,27, 1,2,8,7

27

(28)

Ordering based on element partition tree

List interior of element 2:

44, 33,35,37,38,41, 16,17,23,24,13,28,20,27, 1,2,8,7, 45

28

(29)

Ordering based on element partition tree

List faces belonging to element 2 only:

44, 33,35,37,38,41, 16,17,23,24,13,28,20,27, 1,2,8,7, 45, 34,36,39,40,43

29

(30)

Ordering based on element partition tree

List edges belonging to element 2 only:

44, 33,35,37,38,41, 16,17,23,24,13,28,20,27, 1,2,8,7, 45, 34,36,39,40,43, 18,19,25,26,15,27,31,32

30

(31)

Ordering based on element partition tree

List vertices belonging to element 2 only:

44, 33,35,37,38,41, 16,17,23,24,13,28,20,27, 1,2,8,7, 45, 34,36,39,40,43, 18,19,25,26,15,27,31,32, 5,6,12,11

31

(32)

Ordering based on element partition tree

List faces belonging to elements 1 and 2:

44, 33,35,37,38,41, 16,17,23,24,13,28,20,27, 1,2,8,7, 45, 34,36,39,40,43, 18,19,25,26,15,27,31,32, 5,6,12,11, 42

32

(33)

Ordering based on element partition tree

List faces belonging to elements 1 and 2:

44, 33,35,37,38,41, 16,17,23,24,13,28,20,27, 1,2,8,7, 45, 34,36,39,40,43, 18,19,25,26,15,27,31,32, 5,6,12,11, 42, 14,30,21,29

33

(34)

Ordering based on element partition tree

List vertices belonging to elements 1 and 2:

44, 33,35,37,38,41, 16,17,23,24,13,28,20,27, 1,2,8,7, 45, 34,36,39,40,43, 18,19,25,26,15,27,31,32, 5,6,12,11, 42, 14,30,21,29, 3,4,10,9

34

(35)

Ordering based on element partition tree

List vertices belonging to elements 1 and 2:

44, 33,35,37,38,41, 16,17,23,24,13,28,20,27, 1,2,8,7, 45, 34,36,39,40,43, 18,19,25,26,15,27,31,32, 5,6,12,11, 42, 14,30,21,29, 3,4,10,9

 Send this ordering through PERM_IN into MUMPS solver

35

(36)

Ordering based on element partition tree

The orderings generated by bisection weighted by element size are passed to MUMPS in PERM_IN array

( with option icntl(7)=1 ) ( MUMPS automatic icntl(7)=7 )

36

(37)

Ordering based on element partition tree

37

Internal nodes of element 1

Interface nades (between element 1 and 2)

(38)

Ordering based on element partition tree

38

Internal nodes of element 2

Interface nades (between element 1 and 2)

(39)

Ordering based on element partition tree

39

Interface nades (between element 1 and 2)

(40)

Straightforward parallelization

Straighforward parallelization

44, 33,35,37,38,41, 16,17,23,24,13,28,20,27, 1,2,8,7,

45, 34,36,39,40,43, 18,19,25,26,15,27,31,32, 5,6,12,11, 42, 14,30,21,29, 3,4,10,9

Unfortunatelly this information cannot be send to MUMPS (it has to recover it by constructing its own elimination tree)

40

(41)

Straightforward parallelization

Level by level processing of element partition trees

41

Paszyńska A., Paszyński M., Jopek K., Woźniak M., Goik D., Gurgul P., AbouEisha H., Moshkov M., Calo V. M., Lenharth V. M., Nguyen D., Pingali K., 2015. Quasi-Optimal Elimination Trees for 2D Grids with Singularities, Scientific Programming, Volume 2015 Article ID 303024:1-18.

(42)

Straightforward parallelization

Level by level processing of element partition trees

42

(43)

Straightforward parallelization

Level by level processing of element partition trees

43

(44)

Reutilization of partial LU factorizations

Processing element partition tree

44

Maciej Paszyński, David Pardo, Victor Calo, A direct solver with reutilization of LU factorizations for h-adaptive finite element grids with point singularities,

Computers and Mathematics with Applications (2012) 65(8) 1140-1151 Anna Paszyńska, Graph-grammar greedy algorithm for reutilization of

partial LU factorization over 3D tetrahderal grids, Journal of Computational Science (2016) submitted

(45)

Reutilization of partial LU factorizations

Processing element partition tree

45

(46)

Reutilization of partial LU factorizations

Processing element partition tree

46

(47)

Reutilization of partial LU factorizations

Processing element partition tree

47

(48)

Reutilization of partial LU factorizations

Processing element partition tree

48

(49)

Reutilization of partial LU factorizations

Processing element partition tree

49

(50)

Reutilization of partial LU factorizations

Processing element partition tree

50

(51)

Reutilization of partial LU factorizations

Updating the element partition tree when refining the mesh

51

(52)

Reutilization of partial LU factorizations

Processing element partition tree

52

(53)

Reutilization of partial LU factorizations

Updating the element partition tree when refining the mesh

???

53

(54)

Reutilization of partial LU factorizations

Updating the element partition tree when refining the mesh – optimistic case

54

(55)

Reutilization of partial LU factorizations

Updating the element partition tree when refining the mesh – pesimistic case

55

(56)

Reutilization of partial LU factorizations

Updating the element partition tree when refining the mesh

56

(57)

Reuse of partial LU factorizations

Marcin Sieniek, Maciej Paszyński, Subtree reuse in multi-frontal solvers on regular grids in Step-and-Flash Imprint Lithography Modeling, Advanced Engineering Materials (2014) 16(2) 231-240

Ignacio Martinez-Fernandez, Maciej Woźniak, Luis-Garcia Castillo, Maciej Paszyński,

Mesh-Based Multi-Frontal Solver with Reuse of Partial LU Factorizations for Antenna Array

Journal of Computational Science (2016) submitted 57

(58)

Reuse of partial LU factorizations

Reuse of identical LU factors over identical branches of element partition tree

58

(59)

Reuse of partial LU factorizations

Reuse of identical LU factors over identical branches of element partition tree

59

(60)

Reuse of partial LU factorizations

60

First problem is a „toy antenna” approximated with 18 tetrahdrals

(61)

Reuse of partial LU factorizations

61

Second problem is „real antenna” approximated with 3297 tetrahedrals

(62)

Reuse of partial LU factorizations

62

Third problem is „real antenna” approximated with 8324 tetrahedrals

(63)

Conclusions

Mesh-based solvers and element partition trees allow to speed up the LU factorization process This can be done by using

1. New ordering algorithms, such as Bisections weighted by element size 2. Straightforward parallelization based on the element partition trees

3. Reutilization of partial LU factorizations over unmodified mesh elements 4. Reuse of LU factorizations over identical parts of the mesh

63

(64)

Maciej Paszyński, Fast solvers for mesh-based computations, Taylor & Francis, CRC Press 2016

https://www.researchgate.net/profile/Maciej_Paszynski (Table of Contents + Introduction)

64

Cytaty

Powiązane dokumenty

➢ accesses to the same element must be from the tested memory level – separated by sufficient number of accesses to different elements to evict from levels of memory closer to

11 , it can be observed that the total measured concentration of the 10 corresponding individual tar compounds obtained from all three fuels under gasi fication at atmospheric

Narrative representations of the communist period were created in the Karta exhibition entitled “Faces of Totalitarianism,” temporary exhibi- tions organized by the SocLand

Incomplete Cholesky preconditioned CG, or IC-CG, is not, however, an ideal iteration, as is well-known for the single-phase flow case, where the number of iterations required to

We have developed stable finite element schemes for two-fluid flow problems with in- terfacial tension and presented some numerical results for rising bubble problems.. These are

Then based on the analysis of numerical results and a rough estimate of the number of floating point operations required for the frequency domain and time domain imaging algorithms,

(Color online) Intensity patterns of the reflected field when the incident Gaussian beam is TM polarized and the spot is focused in the center of a pit (left) or in the middle

Sixteen of these patients harboured 41 polyps (size range 1 – 20 mm) and 15 patients did not harbour polyps. The results of the evaluations are then checked and feedback is provided