Maciej Paszynski
Department of Computer Science
AGH University of Science and Technology, Krakow, Poland maciej.paszynski@agh.edu.pl
http://www.ki.agh.edu.pl/en/research-groups/a2s
Frontal and multi-frontal solvers:
orderings, elimination trees,
refinement trees
Find temperature distribution such that
Finite difference disretization
Front matrix
Frontal matrix focuses on forward elimination of the first row
Front matrix
2nd row = 2nd row – 1/h2 * 1st row
Front matrix
2nd row = 2nd row – 1/h2 * 1st row
Front matrix
The first row has been eliminated
Frontral matrix focuses on the elimination of the second row
Front matrix
3rd row = 3nd row – [1/h2] / [-2/h2] * 2nd row
Front matrix
3rd row = 3nd row – [1/h2] / [-2/h2] * 2nd row
Construct multiple frontal matrices in such a way so they sum up to the full matrix
Variables must be split into parts
First all frontal matrices are constructed
First and second frontal matrices are sum up into new 3x3 frontal matrix
Its first and second rows are fully assembled +
+
First column is eliminated by using first row 2nd row = 2nd row – [1/h2] * 1st row
+
2nd row = 2nd row – [1/h2] * 1st row
Third and fourth frontal matrices are sum up into new 3x3 frontal matrix
Now only the second row is fully assembled
+ +
Change of the ordering
+ +
+ +
Eliminate entries below the diagonal
2nd row = 2nd row – [1/h2] / [-2/h2] * 1st row 3rd row = 3rd row – [1/h2] / [-2/h2] * 1st row
+ +
Why can I substract fully assembled 1st row from not fully assemled rows 2nd and 3rd ?
Eliminate entries below the diagonal
2nd row = 2nd row – [1/h2] / [-2/h2] * 1st row 3rd row = 3rd row – [1/h2] / [-2/h2] * 1st row
+ +
This is because substraction and addition are interchangable (now I am substructing from the 2nd not fully assembled row, and later I will add the remaining part)
Moreover the 1st row which is utilized for the substractions in other columns contains only zeros
Eliminate entries below the diagonal
2nd row = 2nd row – [1/h2] / [-2/h2] * 1st row 3rd row = 3rd row – [1/h2] / [-2/h2] * 1st row
+ +
Eliminate entries below the diagonal
2nd row = 2nd row – [1/h2] / [-2/h2] * 1st row 3rd row = 3rd row – [1/h2] / [-2/h2] * 1st row
Fifth and sixth frontal matrices are sum up into new 3x3 frontal matrix
Now the second and third rows are fully assembled
…
+ + +
+ +
+
Continue until root node
+ node
+ root
All frontal matrices are generated at the same time
Procesor 1 Procesor 2 Procesor 3 Procesor 4 Procesor 5 Procesor 6
Summing up and elimination are executed at the same time over different pairs of frontal matrices
+ + +
Procesor 1 Procesor 2 Procesor 3 Procesor 4 Procesor 5 Procesor 6
+ +
+
Procesor 1 Procesor 2 Procesor 3 Procesor 4 Procesor 5 Procesor 6 Summing up and elimination are executed at the same time
over different pairs of frontal matrices
+ +
+
The agorithm is recursively repeated until we reach the root of the tree
The algorithm results in upper trianguler matrix stored in distributed manner Computational complexity = height of the tree = logN (where N = #unknowns-1)
Procesor 1 Procesor 2 Procesor 3 Procesor 4 Procesor 5 Procesor 6 +
node
+ root
a ( x ) u ' ( x ) ' b ( x ) u ' ( x ) c ( x ) u ( x ) f ( x )
l
C
u
20 ,
Find such that
0 ) 0 ( u
( ) )
( ' )
( l u l u l
a
Strong formulation
Weak formulation
' ' ' ( ) ( ) ( )
0 0
l v dx
v f l
v l u dx
cuv v
bu v
au
l l
Find such that
u V
1 0 , : ( 0 ) 0
v V v H l v
u v l v
b ,
Finite element method disretization
e e l e j Nb
a i j j
N
i
i , 1,...,
1
Ni
i i
e a u
1
e
jv
1 , 5 . 0 0
5 . 0 , 0 2
1
1 for x
x for x x
e
1 , 5 . 0 0
5 . 0 , 0
1 2
x for
x for dx
x de
1 , 5 . 0 2
2
5 . 0 , 0 2
2 x for x
x for x x
e
1 , 5 . 0 2
5 . 0 , 0
2 2
x for
x for dx
x de
1 , 5 . 0 1
2
5 . 0 , 0 0
3 x for x
x x for
e
1 , 5 . 0 2
5 . 0 , 0
3 0
x for
x for dx
x de
Exemplary basis functions for [0,l] = [0,1], for two finite elements
, ' ' '
( ) ( )0
l e l e dx
e ce e
be e
ae e
e
b i j
l
j i j
i j
i j
i
( )0
l e dx
e f e
l j
l
j
j
3 2 1
3 2 1
3 3 3
2 3
1
2 3 2
2 2
1
1 3 1
2 1
1
, ,
,
, ,
,
, ,
,
e l
e l
e l
a a a
e e b e
e b e
e b
e e b e
e b e
e b
e e b e
e b e
e b
Ni
i i
e a u
1
e
jv
Exemplary basis functions for [0,l] = [0,1], for two finite elements
, ' ' '
( ) ( )0
l e l e dx
e ce e
be e
ae e
e
b i j
l
j i j
i j
i j
i
( )0
l e dx
e f e
l j
l
j
j
1
1 1
e K
1 2
2 2 1
K
e K 2
3 2
e K
,
1 ' ' '
1( ) 3( ) 00
3 1 3
1 3
1 3
1 e
ae e be e ce e dx e l e l eb
0
.5 0
2 1 2
1 2
1 1
1 1
0
2 1 2
1 2
1 2
1 1 1 1
1 1
1
) ( ) ( '
' '
, c dx
dx b d dx
d dx a d l
e l e dx
e ce e
be e
ae e
e
b K K K
K K
K
1 5 . 0
2 1 2
1 2
1 3
2 1
0
3 2 3
2 3
2 3
2 2 2 2
2 2
2
) ( ) ( '
' '
, c dx
dx b d dx
d dx a d l
e l e dx
e ce e
be e
ae e
e
b K K K
K K
K
3 2 1
3 2 1
3 3 3
2 3
1
2 3 2
2 2
1
1 3 1
2 1
1
, ,
,
, ,
,
e l
e l a
a e
e b e
e b e
e b
e e b e
e b e
e
b
Ni
i i
e a u
1
e
jv
Exemplary basis functions for [0,l] = [0,1], for two finite elements
, ' ' '
( ) ( )0
l e l e dx
e ce e
be e
ae e
e
b i j
l
j i j
i j
i j
i
( )0
l e dx
e f e
l j
l
j
j
1
1 1
e K
1
2 2
2 1
K
e K 2
3 2
e K
5 . 0
0
2 1 1
1 1
1 1
1 1
0
1 1 1
1 1 1 1
1 1 1
1 1
1
) ( ) ( '
' '
, c dx
dx bd dx
d dx ad l
e l e dx
e ce e
be e ae e
e
b K K
K K
K
1
5 . 0
2 1 1
1 1
1 5
. 0
0
2 2 2
2 2
2
2 2 1
0
2 2 2
2 2 2 2
2
2 2
2 2
2 1
1 1 1
1
) ( ) ( '
' ' ,
dx dx c
bd dx d dx ad dx
dx c bd dx d dx ad
l e l e dx e ce e
be e ae e
e b
K K
K K
K K
K K K
K
15 . 0
2 2 2
2 2
2 3
3 1
0
3 3 3
3 3
3 3
3 2 2
2 2
2
) ( ) ( '
' '
, c dx
dx b d dx
d dx a d l
e l e dx
e ce e
be e
ae e
e
b K K
K K
K
2 2 1
1
2 2 2
2
2 2 2
2 1
1 1
1
1 1 1
1
2 1 2
1
3 2 1
2 2 2
1
1 2 1
1 2
2 2
1
1 2 1
1
, ,
0
, ,
, ,
0 ,
,
K K K
K
K K K
K
K K K
K K
K K
K
K K K
K
l l l
l a
a a b
b
b b
b b
b b
K K K
K K
K K
K
K K K
K
l l x
x b
b
b b
2 1 2
1 2
2 2
1
1 2 1
1
, ,
, ,
3 2 1
3 2 1
3 3 3
2 3
1
2 3 2
2 2
1
1 3 1
2 1
1
, ,
,
, ,
,
, ,
,
e l
e l
e l
a a a
e e b e
e b e
e b
e e b e
e b e
e b
e e b e
e b e
e b
Notice that when we switch from finite difference to finite elements, it only changes the local systems of equations at tree nodes
,
a ' ' b ' c
dx (l) (l)b iK iK
K
K j K i K
j K i K
j K i K
j K
i
e f dx (l)l Kj
K
K j
j
1 2
2 2 1
K
e K
Global basis functions are composed with local shape functions, e.g.
K K K
K K
K K
K
K K
K K
l l x
x b
b
b b
2 1 2
1 2
2 2
1
1 2
1 1
, ,
, ,
Local system of equations generated over the element K
,
a ' ' b ' c
dx (l) (l)b iK iK
K
K j K i K
j K i K
j K i K
j K
i
e f dx (l)l Kj
K
K j
j
K K K
K K K K K
K K K K
l l x x b
b
b b
2 1 2
1 2 2 2 1
1 2 1
1
, ,
, ,
K K K
K K K K
K
K K K
K
l l x x b
b
b b
2 1 2
1 2 2 2
1
1 2 1
1
, ,
, ,
K K K
K K K K
K
K K K
K
l l x
x b
b
b b
2 1 2
1 2 2 2
1
1 2 1
1
, ,
, ,
K K K
K K K K K
K K K K
l l x x b
b
b b
2 1 2
1 2 2 2 1
1 2 1 1
, ,
, ,
K K K
K K K K K
K K K K
l l x x b
b
b b
2 1 2
1 2 2 2 1
1 2 1
1
, ,
, ,
K K K
K K K K
K
K K K
K
l l x x b
b
b b
2 1 2
1 2 2 2
1
1 2 1
1
, ,
, ,
Notice that when we switch from finite difference to finite elements, it only changes the local systems of equations at tree nodes
1
e
hpe
hp3e
hp57
e
hpe
hp98
e
hp
1 i
i hp i
hp
e u
u
We seek the solution u of some weak form of PDE as a linearcombination of shape functions spread over finite element mesh
e
hpi4
e
hp 2e
hp1
ehp ehp3 ehp5
2
ehp ehp4
7
ehp ehp9
8
ehp
, 1 ,..., 15
15
1
j e
l e
e b
u
hpji
j hp i
hp i
hp
The coefficients
(called „degrees of freedom” d.o.f.) are obtained by solving
system of linear equations – finite element disecretization
of a weak (variational) form of PDE
i
u
hp e
hpie
hpj
b , l e
hpjwhere and
are some integrals of shape functions
j hp i
hp
e e ,
e hp i e hp j
b ,
Generates frontal matrix for the first element, eliminates fully assembled degrees of freedom
SOLUTION BASED ON LINEAR ORDER OF ELEMENTS
Partial
forward elimination O(6*9^2)
Generates frontal matrix for the second element, merges with the current frontal matrix
eliminates fully assembled degrees of freedom
SOLUTION BASED ON LINEAR ORDER OF ELEMENTS
Full
forward elimination
SOLUTION BASED ON THE ELIMINATION TREE
Partial
forward elimination
Partial
forward elimination
SOLUTION BASED ON THE ELIMINATION TREE
Full forward elimination
of the interface problem matrix
SOLUTION BASED ON THE ELIMINATION TREE
Number of operations for partial forward elimination
BASED ON THE HISTORY OF MESH REFINEMENTS
For any initial mesh, the elimination tree can be created based on nested dissection algorithm.
BASED ON THE HISTORY OF MESH REFINEMENTS
The elimination tree created for the initial mesh is updated when the mesh is refined (elimination tree is constructed dynamically, during mesh refinements)
BASED ON THE HISTORY OF MESH REFINEMENTS
The elimination tree created for the initial mesh is updated when the mesh is refined (elimination tree is constructed dynamically, during mesh refinements)
• Local matrices for active elements – leaves of the elimination tree – are created
Level of
initial mesh elements
Level of
refinement trees
• Interior degrees of freedom are eliminated at every leaf
Level of
initial mesh elements
Level of
refinement trees
• Schur complement contributions are merged at parent level
Level of
initial mesh elements
Level of
refinement trees
• Fully assembled degrees of freedom are eliminated at parent nodes level (degrees of freedom related to common edges shared by both son elements can be eliminated now)
Level of
initial mesh elements
Level of
refinement trees
• Contributions to Schur complement are merged again at the next level
Level of
initial mesh elements
Level of
refinement trees
• Fully assembled degrees of freedom are eliminated
Level of
initial mesh elements
Level of
refinement trees
• Finally, Schur complement contributions are merged at the tree root node
Level of
initial mesh elements
Level of
refinement trees
• The common interface problem is solved at the tree root node
(The size of the common interface problem corresponds to the size of crossection of the entire domain)
Level of
initial mesh elements
Level of
refinement trees
Level of
initial mesh elements
Level of
refinement trees
• The solution obtained at root node is utilized at son nodes.
• The backward substitution is executed
Level of
initial mesh elements
Level of
refinement trees
• The solution utilized at the second level nodes is utilizes at their sone nodes.
• The backward substitution is executed
Level of
initial mesh elements
Level of
refinement trees
• The solution obtained at the third level nodes is utilized at leaf nodes.
• The backward substitution is executed on the level of leaf nodes
MEMORY MULTI-FRONTAL PARALLEL SOLVER
- Embeded into hp code
+ Outperforms MUMPS for large number of processors - Slower than MUMPS for low number of processors
Profiling showed that the time consuming part for low number of processors was actually process of merging of matrices –
performed on the level of unknowns, with moving of matrix entries.
Switch to the level of nodes, do not touch matrix entries –work with pointers 1,482,570 degrees of freedom (68,826,475 non-zeros)
http://home.agh.edu.pl/~paszynsk/Publications.html
Maciej Paszyński, David Pardo, Carlos Torres-Verdin, Leszek Demkowicz, Victor Calo
A PARALLEL DIRECT SOLVER FOR SELF-ADAPTIVE HP FINITE ELEMENT METHOD
Journal of Parallel and Distributed Computing, 70, 3 (2013) 270-281
Anna Paszynska, Maciej Paszynski, Konrad Jopek, Maciej Wozniak, Damian Goik, Piotr Gurgul, Hassan AbouEisha, Mikhail Moshkov, Victor Calo, Andrew Lenharth, Donald Nguyen, Keshav Pingali
QUASI-OPTIMAL ELIMINATION TREES FOR 2D GRIDS WITH SINGULARITIES
Scientiffic Programming, (2015) Article ID 303024, 1-18
Maciej Paszynski, David Pardo, Anna Paszynska, Leszek Demkowicz OUT-OF-CORE MULTI-FRONTAL SOLVER FOR MULTI-
PHYSICS HP ADAPTIVE PROBLEMS
Procedia Computer Science, 4 (2011) 1788-1797