Frontal and multi-frontal solvers: Generalization to isogeometric finite element method


Academic year: 2021


(1)

Maciej Paszynski

Department of Computer Science

AGH University of Science and Technology, Krakow, Poland

maciej.paszynski@agh.edu.pl

http://home.agh.edu.pl/paszynsk

http://www.ki.agh.edu.pl/en/staff/paszynski-maciej

http://www.ki.agh.edu.pl/en/research-groups/a2s

Main collaborators: Victor Calo (KAUST), Leszek Demkowicz (ICES, UT), David Pardo (IKERBASQUE)

Frontal and multi-frontal solvers: Generalization to isogeometric finite element method

(2)

INTRODUCTION

B-SPLINES BASED FINITE ELEMENT METHOD

Strong formulation

$$-\frac{d}{dx}\left(A(x)\frac{du}{dx}\right) + B(x)\frac{du}{dx} + C(x)\,u = f(x)$$

$$u(0) = 0, \qquad A(1)\frac{du(1)}{dx} + \beta\,u(1) = \gamma$$

Weak formulation

Find $u \in V = \{u \in H^1(0,1) : u(0) = 0\}$ s.t.

$$b(v,u) = l(v), \quad \forall v \in V = \{v \in H^1(0,1) : v(0) = 0\}$$

$$b(v,u) = \int_0^1 \left( A(x)\frac{dv}{dx}\frac{du}{dx} + B(x)\,v(x)\frac{du}{dx} + C(x)\,v(x)\,u(x) \right) dx + \beta\,v(1)\,u(1)$$

$$l(v) = \int_0^1 f(x)\,v(x)\,dx + \gamma\,v(1)$$

(3)

INTRODUCTION

B-SPLINES BASED FINITE ELEMENT METHOD

Find $u \in V = \{u \in H^1(0,1) : u(0) = 0\}$ s.t.

$$b(v,u) = l(v), \quad \forall v \in V = \{v \in H^1(0,1) : v(0) = 0\}$$

Using B-splines as basis functions

$$u(x) \approx \sum_i N_{i,p}(x)\,a_i, \qquad v(x) \in \{N_{j,p}(x)\}_j$$

$$\sum_i b\left(N_{j,p}(x), N_{i,p}(x)\right) a_i = l\left(N_{j,p}(x)\right), \quad \forall j$$

Linear B-splines: contribution of $b(N_{2,1}, N_{3,1})$
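The basis functions $N_{i,p}$ used above can be evaluated with the Cox-de Boor recursion. The sketch below is only an illustration under stated assumptions: the function names `bspline_basis` and `open_uniform_knots` are hypothetical, the knot vector is an open uniform one on $[0,1]$, and indices are zero-based (the slides appear to count from one).

```python
def bspline_basis(i, p, knots, x):
    """Value at x of the i-th B-spline of degree p (Cox-de Boor recursion)."""
    if p == 0:
        # Degree 0: indicator of the half-open knot span [knots[i], knots[i+1]).
        if knots[i] <= x < knots[i + 1]:
            return 1.0
        # Close the last non-empty span so that the right end point is covered.
        if x == knots[-1] and knots[i] < knots[i + 1] == knots[-1]:
            return 1.0
        return 0.0
    left = right = 0.0
    # Terms with a zero denominator (repeated knots) are skipped, i.e. treated as 0.
    if knots[i + p] > knots[i]:
        left = ((x - knots[i]) / (knots[i + p] - knots[i])
                * bspline_basis(i, p - 1, knots, x))
    if knots[i + p + 1] > knots[i + 1]:
        right = ((knots[i + p + 1] - x) / (knots[i + p + 1] - knots[i + 1])
                 * bspline_basis(i + 1, p - 1, knots, x))
    return left + right


def open_uniform_knots(n_elements, p):
    """Open uniform knot vector on [0, 1] with n_elements spans (assumed layout)."""
    interior = [k / n_elements for k in range(1, n_elements)]
    return [0.0] * (p + 1) + interior + [1.0] * (p + 1)


if __name__ == "__main__":
    p = 2
    knots = open_uniform_knots(4, p)
    n_basis = len(knots) - p - 1
    values = [bspline_basis(i, p, knots, 0.3) for i in range(n_basis)]
    print(values)
```

At any point inside an element only $p+1$ of these functions are non-zero, which is why a single element contributes to a $(p+1)\times(p+1)$ block of the matrix.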

(4)

INTRODUCTION

B-SPLINES BASED FINITE ELEMENT METHOD

Using B-splines as basis functions

Quadratic B-splines: contribution of $b(N_{2,2}, N_{3,2})$

(5)

GRAPH GRAMMAR PRODUCTIONS AS ATOMIC TASKS

We assign indices to the graph grammar productions in order to identify the places where they were fired.

The elimination tree is obtained by executing the following sequence of productions: (P1)-(P2)1-(P2)2-(P2)3-(P2)4-(P3)1-(P3)2-(P3)3-(P3)4-(P3)5-(P3)6

(6)

SCHEDULER BASED ON GRAPH COLORING

Dependency relation for construction of the elimination tree:

(P1) D {(P2)1, (P2)2}
(P2)1 D {(P2)3, (P2)4}
(P2)3 D {(P3)1, (P3)2}
(P2)4 D {(P3)3, (P3)4}
(P2)2 D {(P3)5, (P3)6}

Alphabet:

A = {(P1), (P2)1, (P2)2, (P2)3, (P2)4, (P3)1, (P3)2, (P3)3, (P3)4, (P3)5, (P3)6}
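One simple way to turn this dependency relation into a schedule is to colour every task with the length of the longest dependency chain that leads to it; tasks sharing a colour have no dependency between them and can run together. The sketch below assumes this particular colouring rule (the slides do not spell out the colouring algorithm), and the names `DEPENDS`, `color_by_level` and `schedule` are hypothetical.

```python
from collections import defaultdict

# Dependency relation from the slide: task -> tasks that must wait for it.
DEPENDS = {
    "(P1)": ["(P2)1", "(P2)2"],
    "(P2)1": ["(P2)3", "(P2)4"],
    "(P2)3": ["(P3)1", "(P3)2"],
    "(P2)4": ["(P3)3", "(P3)4"],
    "(P2)2": ["(P3)5", "(P3)6"],
}


def color_by_level(depends):
    """Colour of a task = length of the longest dependency chain ending at it."""
    preds = defaultdict(list)
    tasks = set(depends)
    for src, dsts in depends.items():
        for dst in dsts:
            preds[dst].append(src)
            tasks.add(dst)
    color = {}

    def level(task):
        if task not in color:
            # A task without predecessors gets colour 0.
            color[task] = 1 + max((level(q) for q in preds[task]), default=-1)
        return color[task]

    for task in tasks:
        level(task)
    return color


def schedule(depends):
    """Group tasks by colour; each group can be executed concurrently."""
    steps = defaultdict(list)
    for task, c in color_by_level(depends).items():
        steps[c].append(task)
    return [sorted(steps[c]) for c in sorted(steps)]


if __name__ == "__main__":
    for step, tasks in enumerate(schedule(DEPENDS)):
        print(step, tasks)
```

For the eleven productions above this yields four groups, so the tree is generated in four concurrent steps instead of eleven sequential ones.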

(7)

SCHEDULER BASED ON GRAPH COLORING

Dependency graph

(8)

SCHEDULER BASED ON GRAPH COLORING

Dependency graph

(9)

TRACE THEORY BASED SCHEDULER

Sequence of productions:

(P1)-(P2)1-(P2)2-(P2)3-(P2)4-(P3)1-(P3)2-(P3)3-(P3)4-(P3)5-(P3)6

Scheduling according to Foata Normal Form:

[(P1)][(P2)1(P2)2][(P2)3(P2)4(P3)5(P3)6][(P3)1(P3)2(P3)3(P3)4]

Thus, the execution of the solver consists of several steps: the independent tasks within a step are executed concurrently, and consecutive steps are separated by synchronization barriers.

Foata Normal Form

$$A = \{a_1, \ldots, a_n\} \ \text{(alphabet)}, \qquad D \subset A \times A \ \text{(dependency relation)}, \qquad I = A \times A \setminus D$$

$$\left[a^1_1 a^1_2 \cdots a^1_{l_1}\right]\left[a^2_1 a^2_2 \cdots a^2_{l_2}\right] \cdots \left[a^m_1 a^m_2 \cdots a^m_{l_m}\right]$$

where $a^k_i \, I \, a^k_j$ for $i \neq j$, and for every $a^{k+1}_j$ there exists $a^k_i$ such that $a^k_i \, D \, a^{k+1}_j$.
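The Foata classes can be computed directly from a word and a dependency test: each letter goes into the class right after the last class that holds an earlier letter it depends on. The following is a minimal sketch under the assumption that the dependency relation is the symmetric, reflexive closure of the pairs listed for the elimination tree; `foata_normal_form`, `dependent`, `PAIRS` and `WORD` are hypothetical names.

```python
def foata_normal_form(word, dependent):
    """Split `word` (a list of task names) into Foata classes.

    dependent(a, b) must return True when tasks a and b are dependent.
    """
    level = []    # level[k] = index of the class that holds word[k]
    classes = []  # classes[c] = tasks executed in step c
    for k, task in enumerate(word):
        lvl = 0
        for m in range(k):
            if dependent(word[m], task):
                lvl = max(lvl, level[m] + 1)
        level.append(lvl)
        if lvl == len(classes):
            classes.append([])
        classes[lvl].append(task)
    return classes


# Direct dependencies between the tree-generation productions (from the slides).
PAIRS = {
    "(P1)": ["(P2)1", "(P2)2"],
    "(P2)1": ["(P2)3", "(P2)4"],
    "(P2)3": ["(P3)1", "(P3)2"],
    "(P2)4": ["(P3)3", "(P3)4"],
    "(P2)2": ["(P3)5", "(P3)6"],
}
D = {(a, b) for a, bs in PAIRS.items() for b in bs}


def dependent(a, b):
    # Assumption: D is used symmetrically and every task depends on itself.
    return a == b or (a, b) in D or (b, a) in D


WORD = ["(P1)", "(P2)1", "(P2)2", "(P2)3", "(P2)4",
        "(P3)1", "(P3)2", "(P3)3", "(P3)4", "(P3)5", "(P3)6"]

if __name__ == "__main__":
    print("".join("[" + "".join(c) + "]" for c in foata_normal_form(WORD, dependent)))
```

The quadratic scan over earlier letters is kept for readability; for long task sequences a per-task record of the latest dependent class would avoid it.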

(10)

GRAMMAR BASED NUMERICAL INTEGRATION

Using Gaussian quadrature, the integration over the domain can be replaced by a weighted summation over Gauss points $x_l$ with weights $w_l$.

$$b\left(N_{j,p}(x), N_{i,p}(x)\right) = \int_0^1 \left( A(x)\frac{dN_{j,p}(x)}{dx}\frac{dN_{i,p}(x)}{dx} + B(x)\,N_{j,p}(x)\frac{dN_{i,p}(x)}{dx} + C(x)\,N_{j,p}(x)\,N_{i,p}(x) \right) dx + \beta\,N_{j,p}(1)\,N_{i,p}(1)$$

$$l\left(N_{i,p}(x)\right) = \int_0^1 f(x)\,N_{i,p}(x)\,dx + \gamma\,N_{i,p}(1)$$

$$\int_0^1 \left( A(x)\frac{dN_{i,p}(x)}{dx}\frac{dN_{j,p}(x)}{dx} + B(x)\frac{dN_{i,p}(x)}{dx}N_{j,p}(x) + C(x)\,N_{i,p}(x)\,N_{j,p}(x) \right) dx = \sum_l w_l \left( A(x_l)\frac{dN_{i,p}(x_l)}{dx}\frac{dN_{j,p}(x_l)}{dx} + B(x_l)\frac{dN_{i,p}(x_l)}{dx}N_{j,p}(x_l) + C(x_l)\,N_{i,p}(x_l)\,N_{j,p}(x_l) \right)$$
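As a small illustration of the weighted summation, the sketch below integrates the bilinear form over a single element for the two linear B-splines that are non-zero there, using a 2-point Gauss-Legendre rule. The names `GAUSS_2` and `element_matrix` are hypothetical, the coefficients are passed as plain Python callables, and the boundary term at $x = 1$ is left out because it is added only on the last element.

```python
import math

# 2-point Gauss-Legendre rule on the reference interval [-1, 1]: (point, weight).
GAUSS_2 = [(-1.0 / math.sqrt(3.0), 1.0), (1.0 / math.sqrt(3.0), 1.0)]


def element_matrix(x_left, x_right, A, B, C):
    """2x2 matrix K[j][i] of the integral part of b(N_j, N_i) on one element.

    N_0 and N_1 are the two linear B-splines restricted to [x_left, x_right].
    """
    h = x_right - x_left
    # Shape functions on the reference interval and their derivatives w.r.t. x.
    N = [lambda t: (1.0 - t) / 2.0, lambda t: (1.0 + t) / 2.0]
    dN = [-1.0 / h, 1.0 / h]
    K = [[0.0, 0.0], [0.0, 0.0]]
    for t, w in GAUSS_2:
        x = x_left + (t + 1.0) * h / 2.0  # Gauss point mapped to the element
        wl = w * h / 2.0                  # weight scaled by the Jacobian h/2
        for j in range(2):
            for i in range(2):
                K[j][i] += wl * (A(x) * dN[j] * dN[i]
                                 + B(x) * N[j](t) * dN[i]
                                 + C(x) * N[j](t) * N[i](t))
    return K


if __name__ == "__main__":
    one = lambda x: 1.0
    zero = lambda x: 0.0
    print(element_matrix(0.0, 0.25, one, zero, zero))  # diffusion part only
    print(element_matrix(0.0, 0.25, zero, zero, one))  # reaction (mass) part only
```

Two Gauss points integrate polynomials up to degree three exactly, which is enough for linear B-splines with constant coefficients; higher-order B-splines or variable coefficients would need more points.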

(11)

GRAMMAR BASED NUMERICAL INTEGRATION

(12)

GRAMMAR BASED NUMERICAL INTEGRATION

(13)

PROCESS OF THE ELIMINATION

EXPRESSED BY GRAPH GRAMMAR PRODUCTIONS

Generation of frontal matrices at the leaves of the elimination tree, expressed as the execution of graph grammar productions (A1)-(A)4-(AN)

(14)

PROCESS OF THE ELIMINATION

EXPRESSED BY GRAPH GRAMMAR PRODUCTIONS

Graph grammar productions generating local frontal matrices for left boundary, interior and right boundary nodes for linear B-splines

(15)

PROCESS OF THE ELIMINATION

EXPRESSED BY GRAPH GRAMMAR PRODUCTIONS

Graph grammar productions merging element frontal matrices at parent level

(16)

PROCESS OF THE ELIMINATION

EXPRESSED BY GRAPH GRAMMAR PRODUCTIONS

Graph grammar production eliminating fully assembled row at parent level

(17)

PROCESS OF THE ELIMINATION

EXPRESSED BY GRAPH GRAMMAR PRODUCTIONS

Graph grammar production for solution at root level

Graph grammar production for merging element frontal matrices at root level

(18)

PROCESS OF THE ELIMINATION

EXPRESSED BY GRAPH GRAMMAR PRODUCTIONS

Graph grammar production for recursive backward substitution

(19)

PROCESS OF THE ELIMINATION

EXPRESSED BY GRAPH GRAMMAR PRODUCTIONS

Expression of the solver execution by graph grammar productions

(A1)-(A)4-(AN) (generation of frontal matrices at the leaves of the elimination tree)

(A2)3 (merging contributions at parent nodes)

(E2)3 (elimination of fully assembled nodes)

(A2) – (E2) (merging at parent node followed by elimination)

(Aroot) – (Eroot) (merging at root node followed by full forward elimination)

(BS)4 (backward substitutions)
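The merge, eliminate, root-solve and back-substitute steps can be followed on the smallest possible case: two linear elements on $[0,1]$ sharing one interior node. The sketch below is an illustration only, not the graph-grammar implementation. It assumes $A = 1$, $B = C = 0$, $f = 0$, and a boundary condition $u'(1) + u(1) = 1$ (both boundary coefficients set to 1), for which the exact solution is $u(x) = x/2$. All function names are hypothetical, and exact rational arithmetic is used to keep the numbers readable.

```python
from fractions import Fraction as F


def merge(left, right):
    """Merge two 2x2 element fronts (K, f) that share one node.

    Ordering of the merged front: [shared node, left outer node, right outer node].
    """
    (Kl, fl), (Kr, fr) = left, right
    K = [[Kl[1][1] + Kr[0][0], Kl[1][0], Kr[0][1]],
         [Kl[0][1], Kl[0][0], F(0)],
         [Kr[1][0], F(0), Kr[1][1]]]
    f = [fl[1] + fr[0], fl[0], fr[1]]
    return K, f


def eliminate(K, f):
    """Eliminate the fully assembled row 0; return the Schur complement and the stored row."""
    pivot = K[0][0]
    row = [k / pivot for k in K[0]]
    rhs = f[0] / pivot
    S = [[K[r][c] - K[r][0] * row[c] for c in (1, 2)] for r in (1, 2)]
    g = [f[r] - K[r][0] * rhs for r in (1, 2)]
    return (S, g), (row, rhs)


def solve_root(S, g):
    """Solve the remaining 2x2 system at the root (Cramer's rule)."""
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    return ((g[0] * S[1][1] - S[0][1] * g[1]) / det,
            (S[0][0] * g[1] - S[1][0] * g[0]) / det)


def back_substitute(stored, u_left, u_right):
    """Recover the eliminated unknown from its stored, normalised row."""
    row, rhs = stored
    return rhs - row[1] * u_left - row[2] * u_right


def demo():
    # Element matrices (1/h) * [[1, -1], [-1, 1]] with h = 1/2.
    left = ([[F(1), F(0)], [F(-2), F(2)]], [F(0), F(0)])    # row 0 enforces u(0) = 0
    right = ([[F(2), F(-2)], [F(-2), F(3)]], [F(0), F(1)])  # Robin terms added at x = 1
    (S, g), stored = eliminate(*merge(left, right))
    u0, u2 = solve_root(S, g)
    u1 = back_substitute(stored, u0, u2)
    return u0, u1, u2


if __name__ == "__main__":
    print(demo())
```

The same pattern repeats at every level of the elimination tree: only the fully assembled rows of a merged front are eliminated, and the Schur complement is passed to the parent.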

(20)

SCHEDULER BASED ON GRAPH COLORING

Dependency relation for the solver algorithm:

{(A1), (A)1} D (A2)1
{(A)2, (A)3} D (A2)2
{(A)4, (AN)} D (A2)3
(A2)1 D (E2)1
(A2)2 D (E2)2
(A2)3 D (E2)3
{(E2)1, (E2)2} D (A2)4
(A2)4 D (E2)4
{(E2)3, (E2)4} D (Aroot)
(Aroot) D (Eroot)
(Eroot) D {(BS)1, (BS)2}
(BS)1 D {(BS)3, (BS)4}

Alphabet:

A = {(A1), (A)1, (A)2, (A)3, (A)4, (AN), (A2)1, (A2)2, (A2)3, (E2)1, (E2)2, (E2)3, (A2)4, (E2)4, (Aroot), (Eroot), (BS)1, (BS)2, (BS)3, (BS)4}

(21)

SCHEDULER BASED ON GRAPH COLORING

Dependency graph

(22)

SCHEDULER BASED ON GRAPH COLORING

Dependency graph

(23)

TRACE THEORY BASED SCHEDULER

Sequence of productions:

(A1)-(A)1-(A)2-(A)3-(A)4-(AN)-(A2)1-(A2)2-(A2)3-(E2)1-(E2)2-(E2)3-(A2)4-(E2)4-(Aroot)-(Eroot)-(BS)1-(BS)2-(BS)3-(BS)4

Scheduling according to Foata Normal Form:

[(A1)(A)1(A)2(A)3(A)4(AN)][(A2)1(A2)2(A2)3][(E2)1(E2)2(E2)3][(A2)4][(E2)4][(Aroot)][(Eroot)][(BS)1(BS)2][(BS)3(BS)4]

Thus, the execution of the solver consists of several steps: the independent tasks within a step are executed concurrently, and consecutive steps are separated by synchronization barriers.
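The step-by-step execution with barriers can be mimicked on a CPU with a thread pool: every Foata class is submitted as a batch, and the next class starts only after the whole batch has finished. This is a sketch of the scheduling pattern only. The slides target a GPU, whereas here `run_schedule` and the placeholder `task` callable are hypothetical, and the class list follows the dependency relation for the solver algorithm, with (Aroot) preceding (Eroot).

```python
from concurrent.futures import ThreadPoolExecutor

# Foata classes of the solver productions; each inner list is one concurrent step.
FOATA_CLASSES = [
    ["(A1)", "(A)1", "(A)2", "(A)3", "(A)4", "(AN)"],
    ["(A2)1", "(A2)2", "(A2)3"],
    ["(E2)1", "(E2)2", "(E2)3"],
    ["(A2)4"],
    ["(E2)4"],
    ["(Aroot)"],
    ["(Eroot)"],
    ["(BS)1", "(BS)2"],
    ["(BS)3", "(BS)4"],
]


def run_schedule(classes, task, workers=4):
    """Run every class concurrently; wait for the whole class before the next one."""
    log = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for step, names in enumerate(classes):
            # list(...) blocks until all tasks of this class have finished,
            # which plays the role of the synchronization barrier.
            results = list(pool.map(task, names))
            log.append((step, results))
    return log


if __name__ == "__main__":
    # Placeholder task: a real solver would generate, merge or eliminate a front here.
    for step, done in run_schedule(FOATA_CLASSES, lambda name: name):
        print(step, done)
```

With this schedule the twenty productions run in nine steps; the first step offers the most parallelism because all leaf frontal matrices are generated independently.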

(24)

GRAPH GRAMMAR PRODUCTIONS EXPRESSING THE SOLVER ALGORITHM

Linear B-splines

(25)

GRAPH GRAMMAR PRODUCTIONS EXPRESSING THE SOLVER ALGORITHM

Quadratic B-splines

(26)

NUMERICAL EXPERIMENTS

(27)

1D NUMERICAL RESULTS LINEAR B-SPLINES

NVidia Tesla c2070, 6GB memory, 448 CUDA cores, each one with 1.15GHz clock

(28)

1D NUMERICAL RESULTS QUADRATIC B-SPLINES

NVidia Tesla c2070, 6GB memory, 448 CUDA cores, each one with 1.15GHz clock

(29)

1D NUMERICAL RESULTS CUBIC B-SPLINES

NVidia Tesla c2070, 6GB memory, 448 CUDA cores, each one with 1.15GHz clock

(30)

1D NUMERICAL RESULTS QUINTIC B-SPLINES

NVidia Tesla c2070, 6GB memory, 448 CUDA cores, each one with 1.15GHz clock

(31)

COMPARISON WITH CPU MUMPS SOLVER

GPU: NVidia Tesla c2070, 6GB memory, 448 CUDA cores, each one with 1.15GHz clock

CPU: Intel(R) Core(TM)2 Quad CPU Q9400 with 2.66GHz clock, 8GB of memory

Panels: linear, quadratic, cubic and quintic B-splines

(32)

GENERATION OF 2D ELIMINATION TREE

(33)

GENERATION OF 2D ELIMINATION TREE

(34)

GENERATION OF 2D ELIMINATION TREE

(35)

GENERATION OF 2D ELIMINATION TREE

(36)

GENERATION OF 2D ELIMINATION TREE

(37)

GENERATION OF 2D ELIMINATION TREE

(38)

GENERATION OF 2D ELIMINATION TREE

(39)

GENERATION OF 2D ELIMINATION TREE

(40)

2D NUMERICAL INTEGRATION

(41)

2D NUMERICAL INTEGRATION

(42)

GENERATION OF 2D ELIMINATION TREE

(43)

2D ELIMINATION

Tasks related to single row subtractions

(44)

2D ELIMINATION

Tasks related to single row subtractions

(45)

2D ELIMINATION

Tasks related to single row subtractions

(46)

2D NUMERICAL RESULTS LINEAR B-SPLINES

NVidia Tesla c2070, 6GB memory, 448 CUDA cores, each one with 1.15GHz clock

(47)

2D NUMERICAL RESULTS QUADRATIC B-SPLINES

NVidia Tesla c2070, 6GB memory, 448 CUDA cores, each one with 1.15GHz clock

(48)

2D NUMERICAL RESULTS CUBIC B-SPLINES

NVidia Tesla c2070, 6GB memory, 448 CUDA cores, each one with 1.15GHz clock

(49)

COMPARISON WITH CPU MUMPS ASSEMBLY

GPU: NVidia Tesla c2070, 6GB memory, 448 CUDA cores, each one with 1.15GHz clock

CPU: Intel(R) Core(TM)2 Quad CPU Q9400 with 2.66GHz clock, 8GB of memory

Panels: linear, quadratic and cubic B-splines

(50)

COMPARISON WITH CPU MUMPS FACTORIZATION

GPU: NVidia Tesla c2070, 6GB memory, 448 CUDA cores, each one with 1.15GHz clock

CPU: Intel(R) Core(TM)2 Quad CPU Q9400 with 2.66GHz clock, 8GB of memory

Panels: linear, quadratic and cubic B-splines

(51)

COMPARISON WITH CPU MUMPS TOTAL TIME

GPU: NVidia Tesla c2070, 6GB memory, 448 CUDA cores, each one with 1.15GHz clock

CPU: Intel(R) Core(TM)2 Quad CPU Q9400 with 2.66GHz clock, 8GB of memory

Panels: linear, quadratic and cubic B-splines

(52)

COMPUTATIONAL COST OF SERIAL $C^0$ AND $C^{p-1}$ SOLVERS

(53)

COMPUTATIONAL COST OF SERIAL $C^0$ AND $C^{p-1}$ SOLVERS

(54)

COMPUTATIONAL COST OF SERIAL $C^0$ AND $C^{p-1}$ SOLVERS

N. Collier, D. Pardo, L. Dalcin, M. Paszynski, V. Calo (2012) The cost of continuity: a study of the performance of isogeometric finite elements using direct solvers, Computer Methods in Applied Mechanics and Engineering, 213-216, p. 353-361

(55)

COMPUTATIONAL COST OF SHARED MEMORY $C^0$ AND $C^{p-1}$ SOLVERS

(56)

COMPUTATIONAL COST OF SHARED MEMORY $C^0$ AND $C^{p-1}$ SOLVERS

(57)

COMPUTATIONAL COST OF SHARED MEMORY $C^0$ AND $C^{p-1}$ SOLVERS

(58)

1D LINEAR B-SPLINES

(59)

1D QUADRATIC B-SPLINES

(60)

1D CUBIC B-SPLINES

(61)

1D QUINTIC B-SPLINES

(62)

2D LINEAR B-SPLINES

(63)

2D QUADRATIC B-SPLINES

(64)

2D CUBIC B-SPLINES

(65)

PAPERS

Multi-frontal solver for IGA discretizations in GPUs:

K. Kuznik, M. Paszynski, V. Calo, D. Pardo (2013) Multi-frontal solver for IGA discretizations in GPUs, submitted to Computers and Mathematics with Applications

Graph grammar based 2D isogeometric FEM solver:

K. Kuznik, M. Paszynski, V. Calo (2012) Graph Grammar-Based Multi-Frontal Parallel Direct Solver for Two-Dimensional Isogeometric Analysis, Procedia Computer Science, 9, p. 1454-1463

Computational costs for 1D/2D/3D sequential isogeometric FEM:

N. Collier, D. Pardo, L. Dalcin, M. Paszynski, V. Calo (2012) The cost of continuity: a study of the performance of isogeometric finite elements using direct solvers, Computer Methods in Applied Mechanics and Engineering, 213-216, p. 353-361

Graph grammar based 2D FEM solver for distributed memory Linux cluster:

M. Paszyński, R. Schaefer (2010) Graph grammar-driven parallel partial differential equation solver, Concurrency and Computation: Practice and Experience, 22 (9), p. 1063-1097

Graph grammar based 3D FEM solver for distributed memory Linux cluster:

M. Paszyński, D. Pardo, A. Paszynska (2010) Parallel multi-frontal solver for p adaptive finite element modeling of multi-physics computational problems, Journal of Computational Science, 1 (1), p. 48-54
