
Parallelization of a stochastic Euler-Lagrange model applied to large scale dense bubbly flows

Kamath, S.; Masterov, M. V.; Padding, J. T.; Buist, K. A.; Baltussen, M. W.; Kuipers, J. A. M.

DOI: 10.1016/j.jcpx.2020.100058
Publication date: 2020
Document Version: Final published version
Published in: Journal of Computational Physics: X

Citation (APA):
Kamath, S., Masterov, M. V., Padding, J. T., Buist, K. A., Baltussen, M. W., & Kuipers, J. A. M. (2020). Parallelization of a stochastic Euler-Lagrange model applied to large scale dense bubbly flows. Journal of Computational Physics: X, 8, [100058]. https://doi.org/10.1016/j.jcpx.2020.100058


J.A.M. Kuipers ᵃ

ᵃ Multiphase Reactors Group, Department of Chemical Engineering & Chemistry, Eindhoven University of Technology, P.O. Box 513, 5600 MB Eindhoven, the Netherlands
ᵇ Process and Energy Department, Delft University of Technology, Leeghwaterstraat 39, 2628 CB Delft, the Netherlands

Article history: Received 3 October 2019; Received in revised form 2 April 2020; Accepted 18 April 2020; Available online 4 May 2020

Keywords: Parallelization; Bubble columns; Euler-Lagrange modelling; Direct Simulation Monte Carlo

Abstract

A parallel and scalable stochastic Direct Simulation Monte Carlo (DSMC) method applied to large-scale dense bubbly flows is reported in this paper. The DSMC method is applied to speed up the bubble-bubble collision handling relative to the Discrete Bubble Model proposed by Darmana et al. (2006) [1]. The DSMC algorithm has been modified and extended to account for bubble-bubble interactions arising due to uncorrelated and correlated bubble velocities. The algorithm is fully coupled with an in-house CFD code and parallelized using the MPI framework. The model is verified and validated on multiple cores with different test cases, ranging from impinging particle streams to laboratory-scale bubble columns. The parallel performance is shown using two different large scale systems: with a uniform and a non-uniform distribution of bubbles. The hydrodynamics of a pilot-scale bubble column is analyzed and the effect of the column scale is reported via the comparison of bubble columns at three different scales.

© 2020 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

1. Introduction

Bubbly flows are one of the more complex multi-scale multiphase flow problems encountered across many fields of physics and engineering. Small scale physical changes in the properties of the different phases can have a huge impact on the large scale physics of such systems. These flows are studied at the DNS (Direct Numerical Simulation) level for representative systems to derive closures for interaction forces used in simulations executed at larger length and time scales. Methods such as the Discrete Bubble Model (DBM) and the stochastic Euler-Lagrange (E-L) models (among which DSMC methods) are examples of classes of methods where these closures are applied. These models (DBM/DSMC) have been extensively used across many length and time scales for the simulation of multi-phase flows and have proven to provide good results on relatively coarse grids compared to fully resolved DNS, while remaining more accurate than the further coarse-grained Euler-Euler (E-E) models.

However, due to the large number of individually tracked objects in large-scale systems, the application of Lagrangian Discrete Methods (LDM) to multi-phase flows demands a high computational effort and large memory. These limitations can be overcome with effective parallelization of the numerical code, which can be realized using

* Corresponding authors.
E-mail addresses: j.t.padding@tudelft.nl (J.T. Padding), k.a.buist@tue.nl (K.A. Buist).
1 Authors S. Kamath and M.V. Masterov have contributed equally to this work.


Table 1
List of recent publications on parallel implementations of Lagrangian methods.

Name | Type | CFD grid size | #Lagr. elements | Parallelization technique | #Cores | Efficiency
Ouro et al. [2] | CFD+IBM | 52.59·10^6 | 3.57·10^5 | MPI+OpenMP | 314 | ideal
Rakotonirina et al. [3] | CFD+DEM | - | 2.304·10^8 | MPI | 768 | 90%
Pozzetti et al. [4] | CFD+DEM | 10^6 | 10^7 | MPI | 280 | 70%
Tian et al. [5] | DEM | - | 1.28·10^8 | GPU/MPI | 256/128 | 62.59%/20%
Spandan et al. [6] | CFD+IBM | 2·10^9 | 2.5·10^4 | MPI | 1000 | 78%
Wang et al. [7] | CFD+DEM | 3·10^5 | 2·10^5 | MPI | 64 | 57.5%
Geneva et al. [8] | CFD+IBM | 1.6·10^7 | 7% vol. frac. | MPI | 580 | ideal
Loisy et al. [9] | CFD+DEM | 1.68·10^7 | 8 | MPI | 512 | 90%
Liu et al. [10] | CFD+DEM | 1.25·10^5 | 10^4 | MPI | 1000 | 50%
Yue et al. [11] | DEM | 2·10^4 | 8 | GPU | 2880 | x20 speedup
Mountrakis et al. [12] | LBM+IBM | 1.342·10^8 | 6.624·10^5 | MPI | 8192 | 79%
Ma et al. [13] | DBM | 6.5·10^4 | 1.7·10^5 | OpenMP | 12 | 99%
Goniva et al. [14] | CFD+DEM | 2.05·10^5 | 2.05·10^7 | MPI | 512 | >100%
Berger et al. [15] | DEM | - | 5·10^4 | MPI+OpenMP | 128 | 50%
Wu et al. [16] | CFD+DEM | 1.28·10^5 | 2·10^6 | MPI+OpenMP | 128 | 60%/20%
Niethammer et al. [17] | MD | - | 10^12 | MPI+OpenMP | 65536 | 82.5%
Liu et al. [18] | CFD+DEM | 1.6·10^6 | 5.12·10^6 | MPI+OpenMP | 256/128 | 37.5%
Amritkar et al. [19] | CFD+DEM | - | 1.05·10^6 | MPI+OpenMP | 64 | -
Kuan et al. [20] | CFD+FT | 6.4·10^4 | - | MPI | 128 | 20%
Kafui et al. [21] | CFD+DEM | - | 5·10^4 | MPI | 64 | 58%
Darmana et al. [1] | CFD+DBM | 6.5·10^6 | 10^5 | MPI | 32 | 50%
Maknickas et al. [22] | DEM | - | 10^5 | MPI | 16 | 70%
Pohl et al. [23] | LBM | 1.078·10^9 | - | MPI | 512 | 75%

standalone and combined models for distributed and shared memory platforms, such as the Message Passing Interface (MPI), OpenMP and programming on Graphics Processing Units (GPU). For an overview of recent parallelization strategies applied to LDM methods see Table 1.

The main drawback of MPI-based codes lies in the necessity of keeping an optimal load balance between distributed processes, which is often hard or even impossible to reach in a real case scenario. For instance, in the Euler-Lagrange (E-L) framework, a non-even distribution of the dispersed phase may lead to a reduction of the parallel performance or even idling of some processes during execution of the Lagrangian part. This problem is mainly caused by the static domain decomposition based on the Eulerian grid. The obvious solution is to use a separate dynamic decomposition for the Lagrangian part. The coupling between the two numerical domains can be achieved by means of a map that should be reconstructed during the update of the Lagrangian domain decomposition. This approach was successfully applied by Pozzetti et al. [24], who reported an excellent parallel efficiency of the E-L framework up to 1400 cores. However, in a uniformly distributed system, this approach may also lead to extra communications and, therefore, to a decrease in parallel efficiency.

The computational and parallel efficiency of LDMs highly depends on the type of collision detection method employed in the algorithm. Particle based simulations, which often utilize a soft-sphere or a force based collision model, employ a single loop-based algorithm, thus leading to optimal parallel performance on any platform, as shown by Niethammer et al. [17], Pohl et al. [23] and Tian et al. [5], who performed simulations of systems containing 10^12, 10^9 and 10^8 particles, respectively. Most of these works are based on the decomposition of only the Lagrangian entities in the domain, which can easily incorporate load balancing with the latest state of the art algorithms.

Historically, bubble-based simulations have often employed a hard-sphere model, which is an event-driven algorithm that requires calculation of the shortest collision time between all possible pairs of bubbles. Efficient parallelization of the DBM model on distributed memory platforms is impossible due to the necessity of continuous synchronization between all processes, while shared memory platforms limit the number of bubbles that can be simulated. Several recent works also reported the use of soft sphere models for bubbly flow simulations. However, these soft sphere models generally require determination of a spring constant which defines the amount of overlap that is allowed. This is challenging since a high spring constant leads to very small time steps, while a too low spring constant leads to significant over-packing. To avoid spurious velocities in dense systems modelled with a soft-sphere model, very high spring constants and, therefore, extremely low time-steps are required to resolve the collisions sufficiently accurately. Therefore, the number of works conducted on parallelization of bubbly flows at this scale is very limited.

Darmana et al. [25] simulated O(10^5) bubbles using a mirror domain technique for the Discrete Bubble Model (DBM) combined with an MPI-based CFD code. These authors reported only 50% parallel efficiency on 32 cores. Ma et al. [13] simulated a bubble column with 6.5·10^4 bubbles using a DBM code parallelized with OpenMP and reported 99% parallel efficiency on 12 threads. Sungkorn et al. [26] have reported a 40 m³ stirred bio-reactor with about 15.6 million bubbles using an LBM based flow-solver, with the bubble phase being solved on GPUs. This approach used a ghost-particle technique developed by Sommerfeld et al. [27]. However, the volume displacement of the bubbles is not accounted for in the flow solver via the volume fraction term and is only coupled through the volumetric source term in the momentum equations.

Table 2 (fragment): closures for the virtual mass and wall lubrication forces.

$$\mathbf{F}_{VM} = -C_{VM}\,\rho_l V_b \left( \frac{D_b \mathbf{v}}{D_b t} - \frac{D_b \mathbf{u}}{D_b t} \right), \qquad C_{VM} = 0.5 \quad [25]$$

$$\mathbf{F}_W = C_W R_b \rho_l V_b \frac{1}{D_{bw}^2} |\mathbf{u} - \mathbf{v}|^2\, \mathbf{n}_W, \qquad C_W = \begin{cases} \exp(-0.933\,\mathrm{E\ddot{o}} + 0.179), & 1 \le \mathrm{E\ddot{o}}_d < 5 \\ 0.0007\,\mathrm{E\ddot{o}} + 0.04, & 5 < \mathrm{E\ddot{o}}_d \le 33 \\ 0.179, & \mathrm{E\ddot{o}}_d > 33 \end{cases} \quad [29,30]$$

Recently, Kamath et al. [28] have reported a DSMC based stochastic Euler-Lagrange method which accounts for the volume displacement of the bubbles in the Eulerian phase and also has the potential to be parallelized for large scale bubbly flow simulations.

This paper presents an efficient, highly scalable parallel approach that allows one to simulate dense bubbly flows in a Lagrangian formulation using the MPI framework. The bubble collisions are resolved by means of Direct Simulation Monte Carlo (DSMC), proposed earlier by Kamath et al. [28]. To the authors' knowledge, this is the first time a DSMC based method is used to simulate dense bubbly flows at this scale. The DSMC operates with a collision probability that can be calculated locally; the method therefore does not require global synchronizations and demonstrates high parallel efficiency. As a result, the fully coupled CFD and DSMC code can be applied to simulations of large scale bubble columns with more than 10^6 bubbles.

The paper is structured as follows. Section 2 describes the DSMC method applied to bubbles and the governing equations for the liquid phase. In section 3, the numerical approach for both phases and their coupling is discussed. The parallelization strategy for the gas and liquid phases is described in section 4. In section 5, the verification and validation of the parallel code is reported. Section 6 is dedicated to the investigation of the parallel performance. In section 7.1 an example of a simulation of a large scale bubble column is shown. Finally, the conclusions are presented in section 8.

2. Methodology

2.1. Discrete phase

The discrete phase or bubble phase is solved in a Lagrangian framework where the bubble motion is governed by Newton's equation of motion (equation (1)). The interaction forces included in this model are the gravitational (F_G), pressure (F_P), drag (F_D), lift (F_L), virtual mass (F_VM) and wall lubrication (F_W) forces (equation (2)). The relevant closures used for these forces can be found in Table 2.

$$\rho_b V_b \frac{d\mathbf{v}}{dt} = \sum \mathbf{F} - \left( \rho_b \frac{d V_b}{dt} \right) \mathbf{v} \tag{1}$$

$$\sum \mathbf{F} = \mathbf{F}_G + \mathbf{F}_P + \mathbf{F}_D + \mathbf{F}_L + \mathbf{F}_{VM} + \mathbf{F}_W \tag{2}$$

DSMC methods generally employ a parcel approach where a group of discrete particles of the same size and velocity is represented by one simulated particle. The resolution of the DSMC method is also governed by the number of discrete phase entities per parcel. In the current work, the parcel size for all simulations is taken as 1, to obtain an accurate representation of the Lagrangian phase. Moreover, the extension to larger parcel sizes requires closures determined with careful consideration of the different forces acting on the bubbles as a swarm and mapping functions for the calculation of the porosity. This is caused by the non-linear trajectory of the bubbles due to the coupling with the liquid. The parcel growth cannot be assumed to be isotropic. Moreover, parcels of similar size should be able to break up and coalesce based on the fluid-parcel interactions and the induced turbulence in the liquid. This behaviour is complex due to the four-way coupling and this aspect will be considered in our future work.

The main goal of the DSMC method is to predict the frequency and probability of collisions of particles. This stochastic approach is much more efficient than deterministic collision models. The method was initially developed and applied to molecular systems that have finite/large Knudsen numbers with no external force fields affecting the position and velocity of the parcels [32]. The modified Nanbu method is a DSMC type method used by Pawar et al. [33] to simulate small droplets in spray modelling with a few alterations. Two modifications are made to the algorithm reported by Pawar et al. [33] by Kamath et al. [28]. First, the radial distribution function at contact is added to account for the increase in the collision frequency and the probability of collisions for dense clusters. Secondly, a nearest neighbour collision check is added, accounting for discrete phase entities with correlated velocities, which are not accounted for by the previously mentioned algorithms.

2.2. Liquid phase hydrodynamics

The liquid dynamics are resolved by means of the volume-averaged Navier-Stokes equations:

$$\frac{\partial \varepsilon_l}{\partial t}\,\Omega + \oint_{\partial\Omega} \varepsilon_l \mathbf{u} \cdot d\mathbf{A} = 0 \tag{3}$$

$$\frac{\partial (\rho_l \varepsilon_l \mathbf{u})}{\partial t}\,\Omega + \rho_l \oint_{\partial\Omega} \varepsilon_l \mathbf{u}\mathbf{u} \cdot d\mathbf{A} = -\varepsilon_l \nabla p\,\Omega - \oint_{\partial\Omega} \varepsilon_l \boldsymbol{\tau}_l \cdot d\mathbf{A} + \varepsilon_l \rho_l \mathbf{g}\,\Omega + \boldsymbol{\Phi} \tag{4}$$

where Ω is the volume of the grid cell, A is the face area of the grid cell, ε_l is the local liquid volume fraction, Φ represents an external source term due to the presence of the gas phase and τ_l is the viscous stress tensor given by:

$$\boldsymbol{\tau}_l = -\mu_{eff,l}\left[ (\nabla \mathbf{u}) + (\nabla \mathbf{u})^T - \tfrac{2}{3}\mathbf{I}\,(\nabla \cdot \mathbf{u}) \right] \tag{5}$$

where

$$\mu_{eff,l} = \mu_{L,l} + \mu_{T,l} \tag{6}$$

Here μ_L,l and μ_T,l are the dynamic molecular and eddy viscosity, respectively. To close the system we use the subgrid scale model proposed by Vreman et al. [34]:

$$\mu_{T,l} = \rho_l\, 2.5\, C_s^2 \sqrt{\frac{\beta_b}{\alpha_{ij}\alpha_{ij}}}, \qquad \alpha_{ij} = \frac{\partial u_j}{\partial x_i}, \qquad \beta_b = \beta_{11}\beta_{22} - \beta_{12}^2 + \beta_{11}\beta_{33} - \beta_{13}^2 + \beta_{22}\beta_{33} - \beta_{23}^2, \qquad \beta_{ij} = \Delta_m^2\, \alpha_{mi}\alpha_{mj} \tag{7}$$

where C_s is the Smagorinsky constant, equal to 0.1.
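To make the closure concrete, the following is a minimal stand-alone sketch of how the eddy viscosity of equation (7) could be evaluated for a single cell, assuming a uniform filter width Δ and the Vreman form β_ij = Δ² α_mi α_mj reconstructed above; it is illustrative only and not the FoxBerry implementation.

```cpp
#include <cmath>

// Minimal sketch: Vreman subgrid eddy viscosity for one grid cell.
// grad[i][j] = du_j/dx_i (alpha_ij above), delta = filter width,
// rho_l = liquid density, cs = Smagorinsky constant (0.1 in this work).
double vremanEddyViscosity(const double grad[3][3], double delta,
                           double rho_l, double cs = 0.1) {
    double alpha2 = 0.0;                 // alpha_ij * alpha_ij
    double beta[3][3] = {{0.0}};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j) {
            alpha2 += grad[i][j] * grad[i][j];
            for (int m = 0; m < 3; ++m)  // beta_ij = delta^2 * alpha_mi * alpha_mj
                beta[i][j] += delta * delta * grad[m][i] * grad[m][j];
        }
    const double beta_b = beta[0][0] * beta[1][1] - beta[0][1] * beta[0][1]
                        + beta[0][0] * beta[2][2] - beta[0][2] * beta[0][2]
                        + beta[1][1] * beta[2][2] - beta[1][2] * beta[1][2];
    if (alpha2 < 1e-12 || beta_b <= 0.0) return 0.0;  // laminar / degenerate cell
    return rho_l * 2.5 * cs * cs * std::sqrt(beta_b / alpha2);
}
```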

3. Numerical solution method

3.1. Time marching

The numerical solution of the governing equations for our Euler-Lagrange model is obtained using a time-marching technique similar to the one reported in Darmana et al. [25], where the equations for the discrete and continuous phase are solved sequentially, while accounting for the coupling via the momentum source terms. Three time steps are defined in this model: the flow time-step (δt_flow), the Lagrangian phase time step (δt_bub) and the collision time-step (δt_coll), such that:

$$\delta t_{flow} = N_i\, \delta t_{bub}, \qquad \delta t_{bub} = \sum \delta t_{coll} \tag{8}$$

where N_i is fixed for a given simulation such that the time integration of all interaction forces acting on a given bubble is carried out in an accurate manner. In this work N_i = 20 to maintain sufficient accuracy for both the force integration and the DSMC collisions. δt_coll is calculated on the fly by the DSMC algorithm from the estimated collision frequency for each discrete phase entity, as reported by Kamath et al. [28]. This is explained in more detail in the coming sections. In the DBM, δt_coll is calculated as the time-step to reach the next physical collision, with which the whole system is updated.
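The nesting of the three time steps can be summarized by the following illustrative sketch (all function names are hypothetical stand-ins, not the actual solver API):

```cpp
#include <algorithm>

// Hypothetical stand-ins for the solver components described in the text;
// in the real code these are the coupled DSMC and FoxBerry routines.
static void   computeForcesAndMoveBubbles(double /*dt_bub*/) {}
static double estimateCollisionTimeStep()                     { return 1.0e-6; }
static void   performStochasticCollisions(double /*dt_coll*/) {}
static void   accumulateMomentumSource(double /*weight*/)     {}
static void   solveVolumeAveragedNavierStokes(double /*dt*/)  {}

// One flow time step delta_t_flow = N_i * delta_t_bub; each bubble sub-step is
// covered by adaptive DSMC collision sub-steps (delta_t_bub = sum of delta_t_coll).
void advanceOneFlowStep(double dt_flow, int N_i) {
    const double dt_bub = dt_flow / N_i;
    for (int n = 0; n < N_i; ++n) {
        computeForcesAndMoveBubbles(dt_bub);            // force integration, eq. (9)
        for (double t = 0.0; t < dt_bub; ) {
            double dt_coll = std::min(estimateCollisionTimeStep(), dt_bub - t);
            performStochasticCollisions(dt_coll);       // Algorithm 3.1
            t += dt_coll;
        }
        accumulateMomentumSource(dt_bub / dt_flow);     // weights for time averaging
    }
    solveVolumeAveragedNavierStokes(dt_flow);           // SIMPLE iterations
}
```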

3.2. Coupling

DSMC algorithms have come far in terms of describing the hydrodynamics and heat transfer when implemented at the molecular level, or in applications where particle-particle or droplet-droplet collisions are the main physical phenomena occurring in the system of interest [32,35,36]. In those cases the continuous phase is either absent, weakly coupled or coupled only using momentum based source terms, which results in a completely collision driven system. This is not what we encounter in the current system of bubbly flows. Apart from the right collision frequency, the phase fraction needs to be evaluated for the discretized volume-averaged Navier-Stokes equations (see equations (3) and (4)).

$$\int_{\mathbf{v}^n}^{\mathbf{v}^{n+1}} d\mathbf{v} = \int_{t^n}^{t^n + \delta t_{bub}} \frac{\sum \mathbf{F}^n}{m_b}\, dt \tag{9}$$

The force integration has been tested with a fourth order explicit Runge-Kutta scheme and a first order explicit Euler scheme. The first order scheme is more computationally efficient when the time-step constraint is due to the collisions. Both schemes were checked on a single bubble rise problem (see section 5.2) for many time-steps until the terminal rise velocity is established.
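As a minimal illustration of the first order explicit Euler variant of equation (9) for a single bubble (the bubble structure and field names below are assumptions made for this sketch):

```cpp
#include <array>

using Vec3 = std::array<double, 3>;

// Hypothetical bubble state used for illustration only.
struct Bubble {
    Vec3   v;      // velocity
    Vec3   x;      // position
    double mass;   // rho_b * V_b
};

// First order explicit Euler step of equation (9): v^{n+1} = v^n + (sum F / m_b) * dt_bub,
// followed by the position update. 'forceSum' is the net interface force of eq. (2).
void eulerStep(Bubble& b, const Vec3& forceSum, double dt_bub) {
    for (int d = 0; d < 3; ++d) {
        b.v[d] += forceSum[d] / b.mass * dt_bub;
        b.x[d] += b.v[d] * dt_bub;
    }
}
```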

The bubbles are moved in time using the time-marching technique described in section 3.1. During each bubble time step, the forces (using equation (9)) are calculated at the discrete bubble locations using the quantities mapped from the Eulerian grid cells with the polynomial filter described in section 3.2. Due to Newton's third law of motion, a reaction force is collected to be mapped onto the momentum nodes in the Eulerian framework and applied at the end of the bubble time-step. The collision sequence is then initiated according to Algorithm 3.1. This is repeated until the discrete phase has moved for a full δt_flow. The forces collected in the volumetric source term are time-averaged over the flow time-step δt_flow, since they are calculated multiple times in one flow time-step. The weights for the averaging are calculated based on the ratio δt_bub/δt_flow to achieve the correct momentum balance. With the new bubble positions, the volume fraction in each Eulerian cell is calculated using the same polynomial mapping function.

In any particle based technique, the two major steps are the force calculation step and the collision detection step. In a hard-sphere collision model, among which the DSMC, the force calculation and the collision detection loops are separated. Typically the most expensive step is the collision detection step, especially when done trivially, leading to a time scaling of the order of O(N²), where N is the total number of discrete phase entities in the system. The application of cell lists and neighbour lists reduces the number of comparisons considerably and reduces the time scaling to O(pN), where the factor p depends on the size of the searching scope or the cut-off distance (r_cut). Even then, populating these data structures takes considerable execution time compared to the remaining part of the algorithm. A Verlet list further reduces the number of times the above mentioned data structures need to be populated or refreshed, depending on the shell distance R_shell. This has been implemented and is being utilized in this work for evaluating the neighbours in the searching scopes of different bubbles.
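A minimal sketch of the head/linked-id storage of the cell-linked list described in section 4.4.1 is given below (illustrative only; positions are assumed to lie in [0, L) of the sub-domain and the cell size is taken equal to the cut-off distance). A Verlet (neighbour) list built from these chains is then only refreshed when a particle has moved more than R_shell, as in step 19 of Algorithm 3.1.

```cpp
#include <algorithm>
#include <array>
#include <vector>

// Cell-linked list sketch: 'head[c]' stores the last particle inserted into cell c,
// 'next[i]' chains to the previous particle in the same cell (-1 ends the chain).
struct CellList {
    std::array<int, 3> n;        // number of cells per direction
    double cellSize;             // taken equal to r_cut here
    std::vector<int> head;       // size n[0]*n[1]*n[2]
    std::vector<int> next;       // size = number of particles

    int cellIndex(const std::array<double, 3>& x) const {
        // Positions are assumed to be non-negative sub-domain coordinates.
        int ix = std::min(int(x[0] / cellSize), n[0] - 1);
        int iy = std::min(int(x[1] / cellSize), n[1] - 1);
        int iz = std::min(int(x[2] / cellSize), n[2] - 1);
        return ix + n[0] * (iy + n[1] * iz);
    }

    void build(const std::vector<std::array<double, 3>>& pos) {
        head.assign(n[0] * n[1] * n[2], -1);
        next.assign(pos.size(), -1);
        for (int i = 0; i < int(pos.size()); ++i) {
            int c = cellIndex(pos[i]);
            next[i] = head[c];   // push particle i on top of the chain of cell c
            head[c] = i;
        }
    }
};
```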

The modified DSMC collision detection step is detailed in Algorithm 3.1. The main additions relative to the already existing algorithm proposed by Pawar et al. [33] are: a) an explicit treatment of nearest neighbour collisions to account for correlated and uncorrelated discrete phase velocities, and b) incorporation of the radial distribution function g_ij to account for the increase in collision frequency in dense or clustered arrangements of particles in 3D space [40]. A more detailed explanation can be found in the work of Kamath et al. [28]. A speed up of the DSMC algorithm over the deterministic DBM from Darmana et al. [25] has been reported by Kamath et al. [28], with minimal loss of accuracy for different collision regimes. Moreover, the reason for the speed-up is stated clearly and the serial computational times for a DEM time-step for varied numbers of bubbles and void fractions have also been reported. In this work, an MPI parallelized version of the same algorithm is presented.

The term g_ij is given by Ma and Ahmadi [41] for a mono-disperse system (see equation (10)). This is extended to poly-disperse systems by Santos et al. [42] (see equation (11)).

$$g_{ii} = \frac{1 + 2.5\,\varepsilon_p + 4.5904\,\varepsilon_p^2 + 4.515439\,\varepsilon_p^3}{\left[ 1 - \left( \dfrac{\varepsilon_p}{\varepsilon_{max}} \right)^3 \right]^{0.67802}} \tag{10}$$

$$g_{ij} = \frac{1}{1 - \varepsilon_p} + \left[ g(r)_{contact,ii} - \frac{1}{1 - \varepsilon_p} \right] \frac{d_i d_j}{\bar{d}\, d_{ij}} \tag{11}$$

where d̄ represents the Sauter mean diameter of the particles within the searching scope, and ε_p is the average solids/bubble volume fraction in the neighbourhood of particle i.
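For illustration, equation (10) translates directly into a small routine of the following form (the maximum packing fraction ε_max is taken as 0.64 here purely as an assumed example value):

```cpp
#include <cmath>

// Radial distribution function at contact for a mono-disperse system, equation (10)
// (Ma and Ahmadi [41]). eps_p is the local particle/bubble volume fraction and
// eps_max the maximum packing fraction (0.64 used here for illustration).
double gContactMonodisperse(double eps_p, double eps_max = 0.64) {
    const double numerator = 1.0 + 2.5 * eps_p + 4.5904 * eps_p * eps_p
                           + 4.515439 * eps_p * eps_p * eps_p;
    const double ratio = eps_p / eps_max;
    return numerator / std::pow(1.0 - ratio * ratio * ratio, 0.67802);
}
```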

Algorithm 3.1 DSMC algorithm.

1: for each particle id i do
2:   Choose a bubble/particle id i ∈ X such that X = {n : n is a positive integer and n ≤ N_tot}, where N_tot represents the total number of particles in the domain/sub-domain.
3:   while t < δt_bub do
4:     Calculate the collision frequency f_i for i based on
         f_i = Σ_{j ∈ R_s,i} [ |v_ij| (π/4)(d_i² + d_j²) n_j / ((4/3)π R_s,i³) ] g_ij
       where v_ij is the relative velocity between particles/bubbles i and j, d is the diameter, n_j is the parcel size, R_s,i is the searching scope size for particle/bubble i and g_ij is the radial distribution function at contact for discrete entity i with particle type j.
5:     λ_i = |v_i| / f_i
6:     t_p,i = min( λ_i / (3|v_i|), δt_bub − t_compl ), where t_p,i → discrete phase time step, λ_i → mean free path, t_compl → completed time within the time frame of δt_bub
7:     R_s,i^new = max( |v_i| t_p,i, |v_ij|_max t_p,i )
8:     Choose a random χ = {x ∈ R | 0 ≤ x ≤ 1}
9:     j = int[χ N_i] + 1, where N_i → number of neighbours in R_s,i
10:    P_ij = |v_i − v_j| (π/4)(d_i² + d_j²) n_j t_p,i / ((4/3)π R_s,i³) · g_ij
11:    if χ > j/N_i − P_ij and (v_i − v_j) · (r_i − r_j) < 0 then
12:      Collide discrete phase entities i and j
13:    else if Stk_bub = τ_bub/τ_l < 1 then
14:      Check for nearest neighbour collision
15:    else
16:      no collisions
17:    end if
18:    Update r_i and dr_i
19:    if |dr_i| > R_shell then
20:      neighbour list rebuild = true
21:    else
22:      neighbour list rebuild = false
23:    end if
24:    t = t + t_p,i
25:  end while
26: end for
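The candidate selection and acceptance test of steps 8-11 can be sketched as follows for a parcel size of 1 (a simplified illustration with assumed data structures, not the production code):

```cpp
#include <array>
#include <cmath>
#include <random>
#include <vector>

using Vec3 = std::array<double, 3>;

static double dot(const Vec3& a, const Vec3& b) {
    return a[0]*b[0] + a[1]*b[1] + a[2]*b[2];
}

// Sketch of the partner selection of Algorithm 3.1 (steps 8-11) for one particle i
// with parcel size n_j = 1. 'neigh' holds the indices of the neighbours inside the
// searching scope R_s,i; returns the chosen partner or -1 (no collision accepted).
// g_ij and the time step t_p,i are passed in as precomputed values.
int selectCollisionPartner(int i,
                           const std::vector<Vec3>& r, const std::vector<Vec3>& v,
                           const std::vector<double>& d, const std::vector<int>& neigh,
                           double Rs, double g_ij, double t_p, std::mt19937& rng) {
    if (neigh.empty()) return -1;
    const double pi = 3.14159265358979323846;
    std::uniform_real_distribution<double> uni(0.0, 1.0);
    const double chi = uni(rng);
    const int N = static_cast<int>(neigh.size());
    const int slot = static_cast<int>(chi * N);          // j = int[chi*N] + 1 (1-based)
    const int j = neigh[slot];

    // Collision probability P_ij of step 10 (parcel size n_j = 1).
    const Vec3 vij = { v[i][0]-v[j][0], v[i][1]-v[j][1], v[i][2]-v[j][2] };
    const Vec3 rij = { r[i][0]-r[j][0], r[i][1]-r[j][1], r[i][2]-r[j][2] };
    const double relSpeed = std::sqrt(dot(vij, vij));
    const double scopeVol = 4.0 / 3.0 * pi * Rs * Rs * Rs;
    const double Pij = relSpeed * pi / 4.0 * (d[i]*d[i] + d[j]*d[j]) * t_p / scopeVol * g_ij;

    // Accept only if chi exceeds the slot threshold and the pair is approaching.
    const bool accepted = (chi > double(slot + 1) / N - Pij) && (dot(vij, rij) < 0.0);
    return accepted ? j : -1;
}
```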

3.3.1. Fluid phase numerical scheme

All numerical simulations in the present work are performed using FoxBerry, an in-house developed framework for solving unified transport equations. The discretization of the governing equations (3) and (4) is done on a structured grid with a staggered arrangement of variables² using the Finite Volume method. The SIMPLE algorithm is used for the pressure-velocity coupling, as described by Patankar [43]. All convective fluxes are discretized with the second order Barton scheme [44] and all diffusive fluxes are discretized with the second order central difference scheme. To keep the numerical stencil compact, a deferred correction method is applied to the convective fluxes, resulting in a 7-diagonal matrix in the 3D case. The time integration is carried out using the first order backward Euler method. All discretized terms are treated implicitly, except source terms originating from the discrete phase, which are treated explicitly.

All linear systems are solved by means of an in-house BiCGStab(2) method [45] and the solution of the Pressure Poisson Equation (PPE) is additionally accelerated with the multilevel ML solver [46] from the Trilinos library [47].

4. Parallelization strategy

4.1. Domain decomposition

To obtain optimal parallel performance, the developed code utilizes a full 3D decomposition instead of the often used pencil or stripe decompositions. This allows for a reduction of the sub-domain interface A³ proportionally to its volume V, which helps to decrease the communication message size between sub-domains and, thus, the communication time. An example of the cubic domain decomposed with different numbers of processes is shown in Fig. 1. The decomposition algorithm (Algorithm A.1) is shown in Appendix A.

2 Discretized scalar fields are stored at the center of a control volume, while components of vector fields are stored at face centers.
3 To simplify further explanation and discussion, the internal sub-domain cells are referred to as the computational volume V, whereas the cells adjacent …

Fig. 1. An example of a full 3D decomposition for various numbers of processes (p). Sub-domains are coloured randomly for better visualization.

Fig. 2. Distribution of on-edge momentum cells between adjacent processes 1 and 2. Red points represent a copy of blue points which belong to process 2.

4.2. Parallelization strategy fluid solver

Due to the staggered arrangement of variables, two adjacent sub-domains Ω₁ and Ω₂ share one layer of momentum cells (see Fig. 2), which results in the duplication of matrix rows during the assembly step of the momentum matrix. To preserve the initial size of the distributed matrix and to reduce the potential increase of the solution time and the number of iterations of the linear solvers, the rule of the "positive owner" is applied. Thus, all shared momentum points on the positive (right) side of the Ω₁ sub-domain are considered as an extra layer of halo cells, while their physical representation belongs to the Ω₂ sub-domain.

The application of the "positive owner" rule results in an extra communication step during the assembly of the Pressure Poisson Equation (PPE), caused by the direct dependence of the coefficients in the PPE matrix on the central coefficients of the momentum matrices [43].

To simplify the access to this data, every sub-domain performs a search in its neighbouring sub-domains to determine the ownership of the required elements. This operation is performed once and its result is stored in the form of a hash table, which maps the locally missing and remotely detected elements to the corresponding remote process ID. The constructed hash table is used at every iteration of the SIMPLE algorithm to obtain the off-process data and to populate the local vectors of the diagonal elements from the momentum matrices.
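A sketch of the ownership hash table described above is shown below (the data layout and function name are assumptions; the neighbour-to-neighbour exchange that fills 'ownedByNeighbour' is omitted):

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Build a hash table mapping locally missing momentum-cell indices to the rank of the
// neighbouring process that owns them. This is done once; the resulting map is reused
// at every SIMPLE iteration to fetch off-process diagonal coefficients.
std::unordered_map<long, int> buildOwnershipMap(
        const std::vector<long>& missingGlobalIds,                 // needed but not owned
        const std::vector<int>&  neighbourRanks,
        const std::vector<std::vector<long>>& ownedByNeighbour) {  // known after exchange
    std::unordered_map<long, int> ownerOf;                         // global id -> rank
    for (std::size_t n = 0; n < neighbourRanks.size(); ++n)
        for (long id : ownedByNeighbour[n])
            ownerOf.emplace(id, neighbourRanks[n]);

    // Keep only the entries that are actually missing locally.
    std::unordered_map<long, int> result;
    for (long id : missingGlobalIds) {
        auto it = ownerOf.find(id);
        if (it != ownerOf.end()) result.insert(*it);
    }
    return result;
}
```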

4.3. Parallelization strategy DSMC

The parallelization strategy for the DSMC part follows the same domain decomposition technique as the fluid solver mentioned above. Generally, particle based simulations follow their own domain decomposition, as described in Abraham et al. [48]. Such a decomposition can be uniform or non-uniform, based on the dynamics of the problem itself. Darmana et al. [25] have reported a mirror domain technique to decompose the domain. The advantage of such a method lies in its load balancing, due to an even distribution of bubbles across all domains. The parallel efficiency of such a technique lasts until a maximum within 50 processors, as shown in the same work [25]. As the system becomes larger, the amount of communications for such a technique gets larger due to the frequent synchronizations among processes, and as such the parallel efficiency drops quite quickly.

Algorithm 4.1 Bubble phase calculation steps in a sub-domain P.

1: for n < δt_flow/δt_bub do
2:   Set t_bub = 0.
3:   Calculate interface forces and velocity of bubbles ∀ i ∈ N_tot
4:   Bubble/particle removal from the sub-domains/domain
5:   Bubble/particle transport
6:   Bubble/particle introduction into the sub-domains/domain
7:   if neighbour list rebuild = true then   (see operation 19 in Algorithm 3.1)
8:     Refresh cell-linked lists and neighbour lists ∀ i ∈ N_tot
9:   else
10:    Do not update cell linked lists and neighbour lists
11:  end if
12:  Transfer real bubble data to neighbouring halo cells
13:  Execute Algorithm 3.1 ∀ i ∈ N_tot + N_ghost
14:  Calculate the volume of bubbles in different cells and map them to the Eulerian sub-domain.
15:  Apply data reduction from halo cells of the current sub-domain to real cells of the neighbouring sub-domain for the bubble volumes and the momentum source terms.
16: end for
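Step 12 of Algorithm 4.1 amounts to a halo exchange of bubble data with each neighbouring process; a minimal MPI sketch is given below (the packed double-array layout, e.g. x, y, z, vx, vy, vz, d per bubble, is an assumption made for illustration):

```cpp
#include <mpi.h>
#include <vector>

// Send the bubbles lying within one search radius of a process boundary to the
// corresponding neighbour, where they become halo ("ghost") bubbles, and receive
// the neighbour's boundary bubbles in return.
void exchangeHaloBubbles(const std::vector<double>& sendBuf,  // packed boundary bubbles
                         std::vector<double>& recvBuf,
                         int neighbourRank, MPI_Comm comm) {
    // First exchange the number of doubles so the receiver can size its buffer.
    int sendCount = static_cast<int>(sendBuf.size());
    int recvCount = 0;
    MPI_Sendrecv(&sendCount, 1, MPI_INT, neighbourRank, 0,
                 &recvCount, 1, MPI_INT, neighbourRank, 0,
                 comm, MPI_STATUS_IGNORE);

    // Then exchange the packed bubble data itself.
    recvBuf.resize(recvCount);
    MPI_Sendrecv(sendBuf.data(), sendCount, MPI_DOUBLE, neighbourRank, 1,
                 recvBuf.data(), recvCount, MPI_DOUBLE, neighbourRank, 1,
                 comm, MPI_STATUS_IGNORE);
}
```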

Another important disadvantage of the mirror domain technique is the memory requirement. This is especially apparent when the problem sizes grow such that there are 10^6–10^7 bubbles in the system, or even more. Every process should hold enough memory to maintain at least 7 doubles for the position and velocity vectors and the bubble size, as well as memory for the complex data structures which include the cell linked lists and neighbour lists.

4.4. Algorithm

The overall discrete phase algorithm is described by the combination of Algorithms 4.1 and 3.1. The DSMC algorithm uses the domain decomposition of the Eulerian part itself to avoid extra overhead due to communication of the Eulerian data required for the force calculation. Moreover, the two communication steps mentioned in Algorithm 4.1 only involve the neighbouring domain processes. An additional performance gain comes from the fact that local DSMC processes can run almost independently of each other in the sub-domains, unlike the mirror-domain technique employed for the discrete bubble model (see [25]), where the smallest collision time-step across all domains has to be determined and broadcast to all processors.

4.4.1. Data structures and flagging

Cell linked lists and Verlet lists are very well known for collision detection in several particle based techniques [49]. They are used to minimize the number of particle-pair comparisons required to resolve interactions among the particles themselves. The cell linked list used in the current implementation is stored in 2 arrays: one is a list of head ids and the other is a list of linked ids. An adjacency list is used to store the Verlet list. Apart from these, a particle id → cell id map is also stored to avoid unnecessary cell id calculations. The cell ids are then further used to identify the location type (using the cell flags) or the neighbourhood of the particles in question.

The memory allocations for these data structures are kept contiguous to avoid excessive refreshing of the CPU cache. Moreover, the id storage in memory for the neighbour list is done in the order of its distance from the particle under consideration. Quick access to these variables for the distance calculation (during the collision frequency calculation) results in as many cache hits as possible (see Fig. 3). This is necessary because the collision frequency calculation is the most expensive step in Algorithm 3.1. SIMD vectorization operations have also been shown to improve performance drastically for large numbers of particles [50]. These will be considered for further performance improvements in our future work.

A map of flags on cells is used to identify the location type of the bubbles/particles and also for boundary detection (see Fig. 4). This map is necessary for boundary detection (like walls) and halo cells, preventing extra checks for the bubbles/particles in the bulk. This flagging is done once during initialization of the simulation, either based on the boundary conditions from the Eulerian part or based on explicit boundary conditions provided for the Lagrangian part.

4.4.2. Bubbles in the halo region

Since the DSMC process in every sub-domain is independent, it is important to check the consistency of the physics in the form of momentum conservation across sub-domains. Several stochastic methods in the literature are known not to conserve momentum at the individual particle level (see [27]) but rather at an overall statistical level. Fig. 3 shows a 2D representation of a single layer particle decomposition across neighbouring sub-domains, including the halo regions in colour.

For the pure DSMC algorithm, the exchange of momentum between real and halo particles/bubbles in the current process need not be exactly the same as between the particles/bubbles of halo and real cells of a neighbouring process. This will only happen if the chosen collision pairs are exactly the same across the neighbouring sub-domains. Since the process of collision pair selection is random, the net amount of momentum loss or gain is also random. To minimize the momentum exchange errors, a nearest neighbour collision is used near sub-domain boundaries.

Fig. 3. Decomposition (1 layer of ghost cells) of the discrete phase shown in connected 2D sub-domains with a Verlet list illustration at the top and the neighbour-list memory allocation pattern shown at the bottom. Grey cells represent the internal cells and coloured cells represent the corresponding halo cells.

Fig. 4. Map of DSMC flags shown for a sub-domain for a domain decomposition of 5 cores. The flags are numbered based on the type of cells: 1. Internal cell; 4. Wall; 7. Near a boundary; 8. Halo cells; 9. Near a process boundary.

Table 3
Physical properties of the discrete phase and numerical properties of the simulations used in verification tests for the impinging particle stream problems I1 and I2.

Parameter | Symbol | I1 | I2
Solid density | ρ_l (kg/m³) | 2000 | 2000
Particle diameter | d_p (mm) | 2 | 2
Particle time step | δt_p (s) | 5·10^-5 | 5·10^-5
Nozzle diameter | D_nozzle (m) | 0.25 | 0.1
System size | L×W×H (m×m×m) | 0.5×0.5×0.5 | 0.25×0.25×0.25
Inlet mean velocity | v_mean (m/s) | 2.5 | 2.5
Standard deviation | Δv (m/s) | 1.0 | 1.0

5. Parallel verification and validation

In our previous work [28], we verified and validated the developed DSMC method for two limiting conditions: first for a pure DEM (with no background fluid phase), and second for a fully coupled gas-liquid system. This also included mono-disperse and poly-disperse cases. The calculation of the total number of collisions and also the collision frequency was verified for varying solids and gas fractions. For validation, a comparison was made with the deterministic Discrete Particle Model (DPM) and Discrete Bubble Model (DBM), respectively.

In the current work, similar test cases have been set up to verify the parallel implementation of the DSMC algorithm. Naturally, a stochastic algorithm cannot be expected to execute the exact same sequence of collisions when run multiple times, even when run on a single core. This is possible in the deterministic cases when the initial particle distribution is identical every single time. However, for a stochastic algorithm the statistical properties of the collisions and momentum exchange should not change upon parallelization. The following problems are set up for the verification:

• Impinging particle streams
• Single bubble rise across sub-domains
• Bubble columns

5.1. Impinging particle streams

The system consists of two injection nozzles oriented such that they partly face each other. The solid particles flow into the system through these nozzles. The system boundaries act as exits for the particles. The domain is decomposed based on the number of processors provided. The solid particles are given inlet velocities based on the overall mass flow rate. To obtain a velocity distribution, a fluctuating velocity based on a Gaussian distribution is added to the average velocity. The width of the distribution is varied to obtain different velocity profiles from the nozzles. Two test cases, I1 and I2, have been set up (see Table 3 for the simulation parameters) with the following goals:

1. To verify, with varying number of cores, consistency in the collision frequency and also its independence from the domain decomposition. (I1)
2. To quantify momentum errors that occur across the neighbouring sub-domains because of a mismatch in collision pairs due to independently running DSMC loops. (I2)

A schematic of the problem setup for I1 with different decompositions is reported in Fig. 5. The comparison of the serial version of the method with the deterministic method (DPM) has been reported in Kamath et al. [28] for both mono-disperse and poly-disperse cases at varying solid fractions.

The simulation has been executed for 5 decompositions: 12 cores, 23 cores, 24 cores, 48 cores and 96 cores. The overall collision frequency, the instantaneous collision frequency and the total amount of collisions occurring in the system are reported in Fig. 6. The results are almost independent of the number of cores used, with a maximum relative error of 0.2%. Note that this algorithm does not use the nearest neighbour collisions, as Stk_p → ∞ without any continuous fluid medium.

A smaller representative system of the same problem is simulated (I2) on 5 cores to evaluate the momentum conservation error across neighbouring sub-domains (see Table 3). This error is expected to reduce in time for the DSMC algorithm, with respect to the absolute amount of momentum transferred across the sub-domain, due to the randomization of the particle ids and collision partners, and secondly because the searching scope is isotropic.

A schematic representation of the problem along with the process id numbering is shown in Fig. 7. The relative errors are calculated for momentum conservation in each sub-domain with all of their neighbours as follows:

$$\epsilon_{mom} = |\mathbf{p}_a + \mathbf{p}_b| \tag{12}$$

Fig. 5. Impinging particle streams problem setup with different decompositions (I1).

Fig. 7. Impinging particle streams problem setup decomposed into 5 processes (I2).

Table 4
Momentum conservation errors across processes due to domain decomposition for the DSMC algorithm, reported for problem I2. Note: this analysis is carried out for a domain decomposition of 5 processes and the errors reported are at the time of 2.0 seconds.

Process boundary | % Error
0↔1 | 1.108
0↔2 | 0.553
0↔3 | 4.921
1↔2 | 5.747
1↔3 | 0.331
1↔4 | 1.272
2↔3 | 1.485
3↔4 | 1.204

where p_a represents the momentum stored in the part of the halo region of sub-process a that is shared with sub-process b, and similarly for p_b. These momentum errors (ε_mom) are reported in Table 4 and shown for process id 0 in Fig. 8. It can be seen that the maximum errors reported are for process ids 0↔3 and 1↔2. It should be noted that the overlap between these specific sub-domains is a single row of cells along the diagonal (see Fig. 7b). The net amount of momentum exchanged or the number of collisions occurring here are nearly two orders of magnitude lower than at the other process boundaries. The cumulative error with respect to the amount of momentum exchanged reduces as more collisions occur.

Nevertheless, to reduce this error, a nearest neighbour collision is forced near the process boundaries in the case of bubbly flows. This reduces the momentum error significantly (to less than 0.5%), as the bubble positions are communicated across the process boundaries and thus the relative distances are the same for a given pair in each sub-process.

5.2. Single bubble rise across sub-domains

For the second test, a single bubble is initiated in the middle of a domain with six no-slip boundary walls. The test case differs from the previous one in that the Lagrangian part here is coupled with the flow-solver. Note that because there are no other bubbles to collide with, the purpose of this test is to check whether the domain decomposition has an influence on the momentum coupling between the Lagrangian and Eulerian phase. The simulation settings are specified in Tables 5 and 6. Two tests are carried out to check:

1. the temporal accuracy of the force integration scheme with respect to the chosen time-step.
2. the spatial stability of the force calculation with changing sub-domains.

The temporal accuracy of the first order explicit Euler scheme is checked against a fourth order explicit RK4 scheme (see Fig. 9a). The differences are negligible, which indicates the sufficiency of the flow time-step division into different bubble time-steps. It is clearly seen from Fig. 9b that the bubble passes through the sub-domains without any jump in velocity arising due to entry or exit from the sub-domains, which indicates proper communication of the flow velocities and the pressure field across sub-domains.

5.3. Bubble columns

Finally, the four-way coupling of the Lagrangian and Eulerian phases is verified via simulation of two tall square cross-sectioned bubble columns (see Fig. 10). Apart from the dimensions, the nozzle distribution in the sparger for both problems is different, ultimately inducing different kinds of flows in the two systems. Problem setup parameters are listed in Table 7.

Fig. 8. Cumulative momenta in the three directions collected in the halo regions of the cells belonging to the neighbouring processes of process id 0. Note: CI → Cumulative Impulse and SI → Sum of the Impulses collected in the halo cells of the respective ids containing common edges (p_a + p_b).

Table 5
Physical properties of the discrete phase and numerical properties of the simulations used in verification tests for the single rising bubble problem.

Parameter | Symbol | Value
Bubble diameter | d_bub | 4 mm
Bubble time step | δt_bub | 5·10^-5 s
Box size | L×D×H | 0.15×0.15×0.15 (m×m×m)
Grid cell size | Δx | 5 mm

Table 6
Physical properties of the gas and liquid phases and numerical properties of the simulations used in verification tests.

Parameter | Symbol | Value
Liquid density | ρ_l | 1000 kg/m³
Liquid dynamic viscosity | μ_l | 1.002·10^-3 kg/(m·s)
Gas density | ρ_g | 1 kg/m³
Gas dynamic viscosity | μ_g | 1.85·10^-5 kg/(m·s)
Bubble diameter | d_b | 4 mm
Surface tension | σ | 72.86·10^-3 N/m
Flow time step | δt_f | 10^-3 s
Bubble time step | δt_b | 5·10^-5 s
Tolerance | – | 10^-8

In both cases, the bubbles are introduced into the domain through a number of nozzles arranged in a square cross-sectional arrangement and placed at the bottom of the domain. The superficial gas velocities remain constant during the simulations. On all vertical and bottom walls, a no-slip boundary condition is imposed, whereas the top wall has a free-slip boundary condition mimicking a free surface. To avoid problems with mass conservation when the bubbles are introduced and removed from the system, all vertical walls have an extra slit with a prescribed static pressure outlet boundary condition to facilitate outflow of excess liquid or inflow of liquid into the vacuum created by the outflow of bubbles. The slits' locations are placed symmetrically near the top of the columns. The length and height of the slits are equal to one third of the domain's length and one grid cell, respectively.

Fig. 9. Single bubble rise velocity with time.

Fig. 10. Computational domains for the T1 and T2 test cases. Black zones at the top of the columns indicate outlet slits. Black spheres at the bottom represent the sparger. In both cases, the width of the slits is equal to 1/3 of the domain's width, whereas the height of the slits is equal to one and two grid cell sizes for the T1 and T2 cases, respectively. The black line in figure (a) corresponds to a height of H = 0.25 m and represents the line along which data is collected.

Table 7
Geometry and sparger properties for the verification tests performed on the square bubble columns. Here, d_p represents the pitch distance, v_b is the superficial gas velocity and D_b is the bubble diameter. Dimensions L, D and H correspond to the x, y and z axes, respectively.

Case | L×D×H [m] | #nozzles | Pitch d_p [mm] | D_b [mm] | v_b [cm/s]
T1 | 0.15×0.15×0.45 | 49 | 6.25 | 4 | 0.49
T2 | 0.2×0.2×0.6 | 625 | 8 | 4 | 1 to 5

The results of the T1 case are compared with time-averaged experimental measurements of Deen et al. [51] and results from the serial code reported earlier [53]. Fig. 11 shows the distribution of the time-averaged axial liquid velocity along the x-axis at a fraction of the column height H of z = 0.56H and a fraction of the column depth D of y = 0.5D. It can be seen from Fig. 11 that the simulated velocities are in good agreement with the experimental measurements and with the reference data from the serial code. The minor deviations between the simulated curves from the present work can be attributed to the stochastic nature of bubble collisions in the DSMC and thus a slightly different behaviour of the plume for each realization. The maxima at approximately x/L = 0.5 of the time-averaged axial liquid velocities are slightly higher than those of the serial code. This can be attributed to the nearest-neighbour treatment of bubble-bubble collisions near the edges of the sub-domains, which is used to maintain momentum conservation across the sub-domain borders and renders the collision treatment slightly more deterministic [25].

Fig. 12. Dependency of the integral gas hold-up on the superficial gas velocity. Comparison of simulated data with reference simulations of Darmana et al. [1] and experimental measurements of Harteveld et al. [52]. Data is averaged over the time interval 10−20 s.

Table 8
Parallel performance of the FoxBerry code. Comparison of time spent on communication (T_c), solving the momentum equations (T_m), solving the PPE (T_p), matrix assembly (T_a), full time (T_f), the overall speedup and the parallel efficiency. Results are averaged over 10 time steps and 192 cores are taken as a reference.

Cores/192 | T_c [s] | T_m [s] | T_p [s] | T_a [s] | T_f [s] | Speedup | Efficiency [%]
1 | 0.772 | 21.599 | 32.030 | 53.030 | 116.158 | 1.0 | 100.0
2 | 0.522 | 12.478 | 17.206 | 34.079 | 69.680 | 1.7 | 83.4
4 | 0.686 | 6.403 | 9.963 | 18.698 | 37.783 | 3.1 | 76.9
8 | 0.345 | 4.018 | 4.928 | 14.568 | 27.091 | 4.3 | 53.6
16 | 0.285 | 2.262 | 4.248 | 6.690 | 15.012 | 7.7 | 48.4

In the T2 case, the results are compared with experimental measurements of Harteveld et al. [52] and numerical data, using the DBM, from Darmana et al. [25]. In this test case, the simulations are performed using 24 cores. Fig. 12 demonstrates the integral gas hold-up as a function of superficial gas velocity. The calculated values of the parallelized code show excellent agreement with the experimental data as well as with the reference simulations.

6. Parallel performance and comparison with previous approaches

In this section, we analyze the parallel performance of the implementation by considering different types of problems. All tests are performed on "Cartesius", the Dutch National Supercomputer. Every node consists of 2x12 cores of Intel Xeon E5-2690 v3 with a base clock speed of 2.6 GHz and hyper-threading switched off. The numerical code is compiled using the Intel C++ compiler v16.0.3, Intel MPI v5.1.3 and Trilinos 12.10.

6.1. Single phase flow solver

The parallel efficiency of FoxBerry is measured based on the flow in the initial section of a square duct. The domain is discretized on 125·10^6 grid cells. The chosen time step is 10^-3 s. Each time step takes approximately 20 inner iterations of the SIMPLE algorithm.

The strong scaling of FoxBerry is shown in Table 8 and demonstrates 48.4% parallel efficiency on 3072 tested cores (relative to the reference performance on 192 cores). The results in Table 8 also show a breakdown of the most

(17)

Table 9

MainpropertiesoftheP1andP2cases.NcellsanddxarethetotalnumberofEuleriancells

andtheirsize,respectively.<Nb>istheaveragenumberofbubblesinthefully

devel-opedsystemandvbisthesuperficialgasvelocity.DimensionsL,D andH correspondto

x,y andz axes,respectively.

Case L×D×H, [m] Ncells dx, [mm] #nozzles

<Nb> dp, [mm] Rb, [mm] vb, [cm/s]

P1 0.45×0.45×1.35 2.2·106 5 441

2.3·105 6.25 2 12

P2 0.50×0.50×2.00 4.0·106 5 3844

0.8·106 8.00 2 1

Fig. 13. StrongscalingoftheP1casebasedon10iterations.Thedomainconsistsof2.2·106cellsand2.3·105bubbles.Theresulton24coresistakenas

areference.

computationally expensive parts of the code, indicating a significant contribution of the matrix assembly to the total time. This part can be further improved by implementation of a more suitable algorithm for the pressure-velocity coupling for transient problems, e.g. the fractional step method. In addition, the solution of the PPE demonstrates poor scaling after 768 cores, which can be improved with a better preconditioning technique.
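For reference, the speedup and efficiency columns of Table 8 follow the usual strong-scaling definitions relative to the 192-core baseline, which a small helper like the following reproduces:

```cpp
#include <cstdio>

// Strong-scaling metrics relative to a reference run: speedup S = T_ref / T_p and
// parallel efficiency E = S / (p / p_ref), with p_ref = 192 cores in Table 8.
struct Scaling { double speedup, efficiencyPercent; };

Scaling strongScaling(double t_ref, double t_p, int cores_ref, int cores_p) {
    const double speedup = t_ref / t_p;
    const double efficiency = speedup / (double(cores_p) / cores_ref);
    return { speedup, 100.0 * efficiency };
}

int main() {
    // Example: the 16x192 = 3072 core row of Table 8 (T_f = 15.012 s vs 116.158 s).
    Scaling s = strongScaling(116.158, 15.012, 192, 16 * 192);
    std::printf("speedup %.1f, efficiency %.1f%%\n", s.speedup, s.efficiencyPercent);
    return 0;
}
```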

6.2. Combined code

The parallel efficiency of the coupled Euler-Lagrange code directly depends on the uniformity of the distribution of Lagrangian entities in the Eulerian domain [4,24]. In bubble column reactors, this distribution is directly determined by the geometry and placement of the sparger. This section demonstrates the parallel efficiency of the developed code for the two most common scenarios, where:

(P1) the sparger partially occupies the bottom plate, which leads to the appearance of a plume and an imbalanced workload for the parallel DSMC.
(P2) the sparger fully occupies the bottom plate, which leads to uniform bubble rise and a well balanced workload for the parallel DSMC.

The main properties of the aforementioned cases are shown in Table 9. All nozzles are arranged in squares and placed at the bottom of the domain. The governing equations are discretized on a uniform Cartesian grid. All the following results are obtained by averaging the required data over 10 time steps after the flow in the domain is fully developed. It is important to note that the used Barton convection scheme demands 3 layers of halo cells to calculate the convective fluxes coming into a sub-domain. This results in an increase in the communication message size and adversely affects parallel performance when the decomposition is too fine.

The parallel efficiency and the elapsed time for the P1 case are shown in Fig. 13. The code scales well, demonstrating a 48% combined parallel efficiency on 384 cores; each time step takes 4.4 seconds, of which 1.6 seconds are taken by the DSMC algorithm. The decrease in the parallel performance of the CFD part is due to the low density of the Eulerian grid, causing MPI communications to be dominant over the workload on each process. Additionally, the non-even distribution of the bubbles (see Fig. 14) results in a significant work imbalance among the processes in the DSMC algorithm. This leads to only 39% parallel efficiency of the DSMC algorithm at 384 cores, preventing the coupled code from further scaling.

The problem of unbalanced workload will become more pronounced if the number of bubbles increases, e.g. due to a reduction of their sizes. This issue emphasizes the necessity of having a dynamic domain decomposition for the Lagrangian part in order to improve the parallel performance of the coupled CFD-DSMC code in simulations similar to the P1 case. This requires careful communication of the Eulerian data to the dynamically determined sub-domains for the force calculation step.

Fig. 14. Instantaneous bubble distribution and bubble velocity for the P1 case at different times of the simulation.

Fig. 15. Strong scaling of the P2 case based on 10 iterations. The domain consists of 4·10^6 cells and 10^6 bubbles. The result on 24 cores is taken as a reference.

This communication step is currently completely absent. Therefore, one also needs to consider whether this will actually pay off in terms of performance for general multiphase flow problems.

In the P2 case, the absolutely uniform distribution of bubbles in the domain (see Fig. 16) leads to a substantial increase of the parallel performance compared to the P1 case, as shown in Fig. 15. The DSMC algorithm scales up to 768 cores, maintaining excellent parallel efficiency and demonstrating the potential for further scaling. However, the grid density of the Eulerian solver only allows scaling up to 43% parallel efficiency on 768 cores, which leads to an overall parallel performance of 51%. This corresponds to 4.5 seconds elapsed time, of which only 0.5 seconds are taken by the DSMC algorithm.

7. Results

7.1. Large scale bubble column

In this section, a detailed analysis of the hydrodynamics in a large scale bubble column is presented. The configuration of the considered column corresponds to the P1 case described in section 6.2. The simulations are performed for 95 seconds of physical time and the results are averaged over the time interval between 10 and 95 seconds. The lower boundary of the averaging period corresponds to the time when fully developed flow conditions are reached.

Fig. 17 shows the distribution of the time-averaged velocity vector fields at different heights of the column. As can be seen, at the bottom of the column the liquid velocity reaches approximately 1.3 m/s, while at the top it decreases to almost 0.2 m/s. The high liquid velocity at the bottom is directly related to the high superficial gas velocity and can be explained as follows. The bubbles already released into the domain are decelerated due to the large mass of the liquid above them. Because of the lack of coalescence in this simulation, the new bubbles entering the domain collide with the decelerated bubbles and form a dense swarm. This swarm is pushed by the next sets of bubbles entering the system and thus it experiences an increase in the axial momentum. At the center of the swarm the liquid fraction is low and the bubbles experience less drag. Therefore, they have higher velocities compared to the bubbles at the edges. These dynamics are preserved in time and cause the entrained liquid to behave as a submerged jet, which decays along the height due to the viscous forces and the presence of the top wall.

Fig. 16. Instantaneous bubble distribution and bubble velocity for the P2 case at different times of the simulation.

Fig. 17. Distribution of the time averaged velocity vectors at different heights of the domain: a) H = 0.005 m; b) H = 0.18 m; c) H = 0.36 m; d) H = 0.54 m; e) H = 0.72 m; f) H = 0.90 m; g) H = 1.08 m; h) H = 1.26 m. Data is averaged over the time interval 10...95 s.

Fig. 18. Distribution of the time averaged gas volume fraction in the xy plane at different heights of the domain. Letters correspond to heights indicated in Fig. 17. Data is averaged over the time interval 10...95 s.

Figs. 14c and 14d show that, after the flow is developed, the overall distribution of the bubbles in the domain remains unchanged. This can be clearly seen from Fig. 18, which shows the time-averaged gas volume fraction at different heights in the domain. The results show that the column can be clearly split into two parts. The flow in the bottom part is governed by the bubble plume, which maintains the high axial momentum and induces the motion of the liquid. During the simulation, the bubble plume remains compact (see Figs. 18a-d) and centered. There is only a slight dispersion along the height. However, the bubble plume breaks at half the height of the domain (see Figs. 18e-h). In this part, the bubbles rise at the center of the column while they move down along the side walls, because the force due to the liquid circulation dominates the buoyancy force of the bubbles. At the middle height of the column (see Figs. 17d-e and 18d-e), the strength of the circulation zones reduces and the downwards moving bubbles are carried away by the more dominant liquid flow at the center of the domain.

The distribution of the time-averaged vorticity magnitude at different heights of the column is shown in Fig. 19. At the bottom half of the domain the flow is highly rotational around the very pronounced core in the center. The regions of high vorticity correlate with the swirling effect of the bottom part of the plume, which can be observed in Fig. 14. This swirling effect is caused by the helical trajectories of the bubbles, which is the most energy efficient way for a

Fig. 19. Distribution of the time averaged vorticity magnitude in the xy plane at different heights of the domain. Letters correspond to heights indicated in Fig. 17. Data is averaged over the time interval 10...95 s. The colour legend is adjusted for better visualization of the low values at the top heights (figures f-h). Before the adjustment the maximum value at the bottom (figure a) was 50 s⁻¹.

Table 10
The circulation, Γ, at different heights of the column. Γ is calculated by the integration of the vorticity across the full area of the XY cross-sections.

Height | a | b | c | d | e | f | g | h
Γ [m²/s] | 0.96 | 0.55 | 0.65 | 0.72 | 0.90 | 0.86 | 0.69 | 0.59

high-momentum gas plume to penetrate a viscous fluid. Beside the core, the flow is almost irrotational compared to the central region, except in thin layers near the walls, see Figs. 19c-d. At the top half of the column, the plume breakage and the large amount of bubbles lead to the appearance of rotational motion in the whole domain, which results in the high circulation at H = 0.72 m and H = 0.90 m, as shown in Table 10. Also, despite the lower magnitude of the vorticity at the top, the circulation in that region is comparable to the one at the bottom.

The aforementioned dynamics can also be seen from the distribution of the turbulence kinetic energy (TKE), as shown in Fig. 20. While at the bottom of the column the most energetic eddies are concentrated close to the core of the plume, the second half of the column shows a wide region of high TKE. Nevertheless, at the top of the column, the TKE remains low in the vicinity of the corners and walls, due to viscous effects. In addition, Fig. 20 indicates a region of low TKE at the center of the bubble plume.

Fig. 20. Distribution of the TKE in the xy plane at different heights of the domain. Letters correspond to heights indicated in Fig. 17. Data is averaged over the time interval 10...95 s. The colour legend is adjusted for better visualization of the low values at the top heights (figures f-h). Before the adjustment the maximum value at the bottom (figure a) was 50 s⁻¹.

Similar to a submerged jet, the liquid in this region has a potential core, which can also be observed through the zero vorticity at the center of the bubble plume in Figs. 19b-c. However, the TKE significantly increases towards the edge of the bubble plume, where the shear stresses are high and the fluid flow becomes highly turbulent. Further away from the plume, the TKE reduces, indicating a decrease of the velocity fluctuations.

7.2. Effect of column scale

The results and conclusions obtained from the laboratory-scale bubble column may fail to predict the dynamics in pilot- or industrial-scale columns, which is caused by the influence of geometry and scale effects. To assess the significance of these effects, three bubble columns of different sizes are compared, see Table 11.

The T1 and P1 cases from Table 11 have already been discussed in section 5.3 and section 7.1, respectively. The P3 case represents a domain which is extended towards industrial scales from the case P1. The numerical and physical parameters of the P3 case can be found in Table 6. As in all cases, the sparger in the P3 case is represented by nozzles arranged in a square and placed at the center of the bottom of the column. The P3 case is simulated for 27 seconds of physical time using 768 cores. Both the P1 and P3 cases have been simulated for 5 days, or a wall time of 120 hours, on the respective number of physical cores.

Table 11
The comparative list of the main characteristics of the laboratory-scale (T1), pilot-scale (intermediate) (P1) and pilot-scale (P3) bubble columns. N_cells and dx are the total number of Eulerian cells and their size, respectively. <N_b> is the average number of bubbles in the fully developed system and v_b is the superficial gas velocity. The dimensions L, D and H correspond to the x, y and z axes, respectively.

Case | L×D×H [m] | N_cells | dx [mm] | #nozzles | <N_b> | d_p [mm] | R_b [mm] | v_b [cm/s]
T1 | 0.15×0.15×0.45 | 8.1·10^4 | 5 | 49 | 5·10^3 | 6.25 | 2 | 0.49
P1 | 0.45×0.45×1.35 | 2.2·10^6 | 5 | 441 | 2.3·10^5 | 6.25 | 2 | 1
P3 | 1×1×10 | 10^7 | 10 | 625 | 5·10^5 | 8 | 4 | 0.6

Fig. 21. The instantaneous field of the axial liquid velocity in the mid-plane at different times for the T1 case.

Fig. 22. The instantaneous field of the axial liquid velocity in the mid-plane at different times for the P1 case. The red rectangle indicates the scale of the domain from the T1 case.

Fig. 23. The instantaneous field of the axial liquid velocity in the mid-plane at different times for the P3 case. The red and green rectangles indicate the scales of the domains from the T1 and P1 cases, respectively.

Figs. 21, 22 and 23 show the instantaneous axial liquid velocity fields at the vertical mid-plane of the domain for the T1, P1 and P3 cases, respectively. It can be clearly seen that the maximal axial liquid velocity in the T1 case varies only slightly along the height of the column, unlike in the P1 and P3 cases, where the axial liquid velocity is significantly higher at the bottom part of the domain compared to the top part. This is because of the small distance between the opposite vertical walls in the T1 case. Large recirculation zones with a length comparable to the height of the column appear in the domain shortly after the bubbles reach the top.

Fig. 24. The instantaneous distribution of the bubbles in the domain at different times for the T1 case.

Fig. 25. The instantaneous distribution of the bubbles in the domain at different times for the P1 case. The red rectangle indicates the scale of the domain from the T1 case.

These circulation zones then travel downwards, disturbing the core of the plume and leading to meandering of the plume from the very bottom. In contrast, in the P1 and P3 cases, the rise of the bubble plume is accompanied by multiple eddies of different scales from the very beginning. These eddies dissipate energy at different sections of the plume, leading to unstable large scale structures that break into smaller ones. The core of the plume is relatively more stable at the bottom because these smaller eddies do not have the energy to disturb the bottom part of the plume completely.

The instantaneous distribution of the bubbles in the three domains is shown in Figs. 24, 25 and 26. The bubble plume in the T1 case is highly influenced by the aforementioned recirculation zones, which push the plume towards the different side walls (see Fig. 24, t = 15.0 and 27.0 seconds). Due to the dominance of a single large-scale eddy, the bubble plume may stay close to one of the walls for a long period of time, until this is changed by instabilities arising in the bulk.

Fig. 26. The instantaneous distribution of the bubbles in the domain at different times for the P3 case. The red and green rectangles indicate the scales of the domains from the T1 and P1 cases, respectively.

stayclosetooneofthewallsforalongperiodoftime,untilitischangedbyinstabilitiesarisinginthebulk.Thisdynamics is fundamentally differentfrom the P1andP3 cases,where the plume atthe bottom ofthe domain shifts only slightly fromthecenterandneverreachesthewalls.Inaddition,thebubbleplumeintheP1andP3casesbreaksshortlyafterthe bubblesenterthedomain.BasedontheindicatedscalesinFig.26,thebreakageofthebubbleplumeoccursatalmostthe sameheightsinbothcases.Theoverallinstabilityofthebubbleplumecontributestothealmostuniformdistributionofthe bubblesinthedomainabovethepointofbreakage.

The comparison of the bubble distribution in the P1 and P3 cases demonstrates the influence of the height of the domain on the rising bubbles. The front of the rising bubbles in the P3 case consists of constantly appearing small groups of gas inclusions. These groups experience less drag compared to the massive bubble front, but they break up quickly, creating regions with lower pressure. The bubbles from the rising front are attracted by these regions, which leads to the appearance of the


new groups. These dynamics are not observed in the T1 case, where the bubbles rise in a “mushroom-like” manner [1], nor in the P1 case, where the rising bubbles form a dense column moving along a helical path.

8. Conclusion

In this paper a parallel Euler-Lagrange framework and its application to the simulation of dense gas-liquid flows are presented. The Lagrangian phase is resolved by means of the DSMC algorithm, for which the parallelization is carried out with MPI, whereas the Eulerian phase is resolved using the volume-averaged Navier-Stokes equations incorporated in the in-house numerical framework FoxBerry. The coupled parallel code has been verified on several standard problems and demonstrates high accuracy, which is independent of the number of cores used.

The stochastic nature of the DSMC algorithm allows one to avoid the limitations of conventional deterministic (DBM) methods associated with parallelization for distributed memory platforms. Thus, the DSMC method demonstrates linear scaling up to the 768 tested cores for 8.3·10^5 bubbles uniformly distributed across the domain. In the case of a non-even distribution of bubbles, the DSMC algorithm allows one to simulate 2.3·10^5 gas entities using 384 cores while maintaining 39% parallel efficiency.
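For context, parallel efficiency figures of this kind are typically obtained from wall-clock timings as the speed-up relative to a reference run divided by the ratio of core counts. The minimal sketch below illustrates that calculation; the timing values are placeholders for illustration and are not measurements from this work.

def parallel_efficiency(t_ref, p_ref, t_p, p):
    """Strong-scaling efficiency of a run on p cores relative to a reference run on p_ref cores."""
    speedup = t_ref / t_p          # speed-up with respect to the reference run
    return speedup * p_ref / p     # efficiency = speed-up / (p / p_ref)

if __name__ == "__main__":
    # hypothetical wall times (seconds) for the same problem size
    t_ref, p_ref = 1000.0, 24      # reference run
    t_p, p = 160.0, 384            # larger run
    print(f"parallel efficiency at {p} cores: {parallel_efficiency(t_ref, p_ref, t_p, p):.0%}")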

The developed algorithms allow, for the first time, to simulate and analyze a tall square cross-sectioned bubble column using full four-way coupling of the Eulerian and Lagrangian phases. The results demonstrate a pronounced split of the domain into two sections. In the bottom section, the bubble plume occupies only the central part of the domain and determines the overall dynamics of the Eulerian phase. Close to the center of the domain the liquid demonstrates a high rotational motion, which is related to the swirling behaviour of the bubble plume. In the top section the plume breaks, which leads to a more uniform distribution of bubbles in the bulk and the appearance of a large number of unstable eddies. This splitting is not observed in a smaller-scale bubble column, where the bubble plume remains pronounced along the whole height of the domain and the dynamics of the plume is governed by the large-scale turbulent structures.

Declaration of competing interest

The authors declare that they have no known conflicts of interest or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work is part of the TOP grant research programme “First principles based multi-scale modeling of transport in reactive three phase flows” with project number 716.014.001, financed by the Netherlands Organisation for Scientific Research (NWO). Also, this work was supported by the Netherlands Center for Multiscale Catalytic Energy Conversion (MCEC), which is an NWO Gravitation programme funded by the Ministry of Education, Culture and Science of the government of the Netherlands. The authors also thank SURFsara (www.surfsara.nl) and NWO for the support in using the Cartesius supercomputer.

Appendix A. Domain decomposition

Algorithm A.1 Full 3D domain decomposition.

Input: N - number of cells in the domain in each direction, p - number of processes
 1: ξ = {round(∛p), round(√(p/ξ1)), floor(p/(ξ1·ξ2))}   ▷ Basic number of sub-domains in each direction
 2: d = N/ξ                                             ▷ Number of sub-elements in each sub-domain
 3: Perform a full 3D decomposition of the domain using ξ and d
 4: s = ∏(k=1..3) ξk                                    ▷ Number of decomposed domains
 5: r = |s − p|                                         ▷ Remaining sub-domains
 6: if r ≠ 0 then
 7:     χ = {round(√r), r/χ1}                           ▷ Remaining number of pseudo-2D sub-domains
 8:     q = N/χ                                         ▷ Number of sub-elements in each sub-domain
 9:     η = ξ3·d3                                       ▷ Starting index in 3rd direction for pseudo-2D decomposition
10:     Perform pseudo-2D decomposition of the remaining domain using χ and q, starting from η
11:     Build maps between halo and real cells
12: end if
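To make the counting in Algorithm A.1 concrete, the short Python sketch below reproduces steps 1-5: the basic 3D process grid and the remainder r that is handled by the pseudo-2D split. The exact rounding used for the second direction is an assumption taken from the reconstruction above; the snippet is an illustration only, not the FoxBerry implementation.

import math

def basic_3d_grid(p):
    """Return the basic 3D process grid (xi1, xi2, xi3) and the remainder r."""
    xi1 = round(p ** (1.0 / 3.0))        # sub-domains in the first direction
    xi2 = round(math.sqrt(p / xi1))      # sub-domains in the second direction (assumed form)
    xi3 = math.floor(p / (xi1 * xi2))    # sub-domains in the third direction
    s = xi1 * xi2 * xi3                  # processes used by the full 3D split
    r = abs(s - p)                       # processes left over for the pseudo-2D split
    return (xi1, xi2, xi3), r

if __name__ == "__main__":
    for p in (64, 384, 768):
        grid, r = basic_3d_grid(p)
        print(f"p = {p}: 3D grid {grid}, remaining sub-domains r = {r}")

For p = 64 the sketch gives a 4×4×4 grid with r = 0, so the pseudo-2D step in lines 6-12 of Algorithm A.1 is skipped; for core counts that are not near-perfect cubes a non-zero remainder is distributed over the pseudo-2D layer on top of the 3D block.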

Appendix B. Momentum error for all processes


Fig. B.27. Cumulative momenta in the three directions collected in the halo regions of the cells belonging to the neighbouring processes of the remaining process ids. Note: CI → Cumulative Impulse and SI → Sum of the Impulses collected in the halo cells of the respective ids.
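As a hypothetical illustration of how the cumulative impulse reported in Fig. B.27 could be gathered across processes, the sketch below sums a per-process halo contribution with an MPI reduction; mpi4py stands in for the in-house MPI code and the array contents are random placeholders rather than actual halo data.

# Run with, e.g.: mpirun -np 4 python halo_check.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# hypothetical per-process impulse contributions (x, y, z) collected in halo cells
local_halo_impulse = np.random.default_rng(rank).normal(size=3)

# sum the halo contributions over all processes
global_halo_impulse = np.empty(3)
comm.Allreduce(local_halo_impulse, global_halo_impulse, op=MPI.SUM)

if rank == 0:
    print("cumulative halo impulse (x, y, z):", global_halo_impulse)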

References

[1] D. Darmana, N.G. Deen, J.A.M. Kuipers, Parallelization of an Euler-Lagrange model using mixed domain decomposition and a mirror domain technique: application to dispersed gas-liquid two-phase flow, J. Comput. Phys. 220 (1) (2006) 216–248, https://doi.org/10.1016/j.jcp.2006.05.011.
[2] P. Ouro, B. Fraga, U. Lopez-Novoa, T. Stoesser, Scalability of an Eulerian-Lagrangian large-eddy simulation solver with hybrid MPI/OpenMP parallelisation, Comput. Fluids 179 (2019) 123–136, https://doi.org/10.1016/j.compfluid.2018.10.013.
[3] A.D. Rakotonirina, A. Wachs, Grains3D, a flexible DEM approach for particles of arbitrary convex shape - Part II: parallel implementation and scalable performance, Powder Technol. 324 (2018) 18–35, https://doi.org/10.1016/j.powtec.2017.10.033.
[4] G. Pozzetti, X. Besseron, A. Rousset, B. Peters, A co-located partitions strategy for parallel CFD-DEM couplings, Adv. Powder Technol. (2018), https://doi.org/10.1016/j.apt.2018.08.025.
[5] Y. Tian, S. Zhang, P. Lin, Q. Yang, G. Yang, L. Yang, Implementing discrete element method for large-scale simulation of particles on multiple GPUs, Comput. Chem. Eng. 104 (2017) 231–240, https://doi.org/10.1016/j.compchemeng.2017.04.019.
[6] V. Spandan, V. Meschini, R. Ostilla-Mónico, D. Lohse, G. Querzoli, M.D. de Tullio, R. Verzicco, A parallel interaction potential approach coupled with the immersed boundary method for fully resolved simulations of deformable interfaces and membranes, J. Comput. Phys. 348 (2017) 567–590, https://doi.org/10.1016/j.jcp.2017.07.036.
[7] S. Wang, K. Luo, S. Yang, C. Hu, J. Fan, Parallel LES-DEM simulation of dense flows in fluidized beds, Appl. Therm. Eng. 111 (2017) 1523–1535, https://doi.org/10.1016/j.applthermaleng.2016.07.161.
