Parallelization of a stochastic Euler-Lagrange model applied to large scale dense bubbly
flows
Kamath, S.; Masterov, M. V.; Padding, J. T.; Buist, K. A.; Baltussen, M. W.; Kuipers, J. A.M.
DOI
10.1016/j.jcpx.2020.100058
Publication date
2020
Document Version
Final published version
Published in
Journal of Computational Physics: X
Citation (APA)
Kamath, S., Masterov, M. V., Padding, J. T., Buist, K. A., Baltussen, M. W., & Kuipers, J. A. M. (2020).
Parallelization of a stochastic Euler-Lagrange model applied to large scale dense bubbly flows. Journal of
Computational Physics: X, 8, [100058]. https://doi.org/10.1016/j.jcpx.2020.100058
J.A.M. Kuipers
a Multiphase Reactors Group, Department of Chemical Engineering & Chemistry, Eindhoven University of Technology, P.O. Box 513, 5600 MB Eindhoven, the Netherlands
b Process and Energy Department, Delft University of Technology, Leeghwaterstraat 39, 2628 CB Delft, the Netherlands
Article history:
Received 3 October 2019
Received in revised form 2 April 2020
Accepted 18 April 2020
Available online 4 May 2020

Keywords:
Parallelization
Bubble columns
Euler-Lagrange modelling
Direct Simulation Monte Carlo
A parallel and scalable stochastic Direct Simulation Monte Carlo (DSMC) method applied to large-scale dense bubbly flows is reported in this paper. The DSMC method is applied to speed up the bubble-bubble collision handling relative to the Discrete Bubble Model proposed by Darmana et al. (2006) [1]. The DSMC algorithm has been modified and extended to account for bubble-bubble interactions arising due to uncorrelated and correlated bubble velocities. The algorithm is fully coupled with an in-house CFD code and parallelized using the MPI framework. The model is verified and validated on multiple cores with different test cases, ranging from impinging particle streams to laboratory-scale bubble columns. The parallel performance is shown using two different large scale systems: with a uniform and a non-uniform distribution of bubbles. The hydrodynamics of a pilot-scale bubble column is analyzed and the effect of the column scale is reported via the comparison of bubble columns at three different scales.

© 2020 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
1. Introduction
Bubbly flows are among the more complex multi-scale multiphase flow problems encountered across many fields of physics and engineering. Small scale physical changes in the properties of the different phases can have a huge impact on the large scale physics of systems. Such flows are studied at the DNS (Direct Numerical Simulation) level for representative systems to derive closures for interaction forces used in simulations executed at larger length and time scales. Methods such as the Discrete Bubble Model (DBM) and the stochastic Euler-Lagrange (E-L) models (among which DSMC methods) are examples of classes of methods where these closures are applied. These models (DBM/DSMC) have been extensively used across many length and time scales for the simulation of multi-phase flows and have proven to provide good results on relatively coarse grids compared to fully resolved DNS, while remaining more accurate than the further coarse grained Euler-Euler (E-E) models.
However, due to the large amount of individually tracked objects in large-scale systems, the application of Lagrangian Discrete Methods (LDM) in multi-phase flows is complicated by the high computational effort and memory requirements. These limitations can be overcome with effective parallelization of the numerical code, which can be realized using
* Corresponding authors. E-mail addresses: j.t.padding@tudelft.nl (J.T. Padding), k.a.buist@tue.nl (K.A. Buist).
1 Authors S. Kamath and M.V. Masterov have contributed equally to this work.
Table 1
List of recent publications on parallel implementations of Lagrangian methods.
Name                     Type      CFD grid size   #Lagr. elements   Parallelization technique   #Cores    Efficiency
Ouro et al. [2]          CFD+IBM   52.59·10^6      3.57·10^5         MPI+OpenMP                  314       ideal
Rakotonirina et al. [3]  CFD+DEM   -               2.304·10^8        MPI                         768       90%
Pozzetti et al. [4]      CFD+DEM   10^6            10^7              MPI                         280       70%
Tian et al. [5]          DEM       -               1.28·10^8         GPU/MPI                     256/128   62.59%/20%
Spandan et al. [6]       CFD+IBM   2·10^9          2.5·10^4          MPI                         1000      78%
Wang et al. [7]          CFD+DEM   3·10^5          2·10^5            MPI                         64        57.5%
Geneva et al. [8]        CFD+IBM   1.6·10^7        7% vol. frac.     MPI                         580       ideal
Loisy et al. [9]         CFD+DEM   1.68·10^7       8                 MPI                         512       90%
Liu et al. [10]          CFD+DEM   1.25·10^5       10^4              MPI                         1000      50%
Yue et al. [11]          DEM       2·10^4          8                 GPU                         2880      x20 speedup
Mountrakis et al. [12]   LBM+IBM   1.342·10^8      6.624·10^5        MPI                         8192      79%
Ma et al. [13]           DBM       6.5·10^4        1.7·10^5          OpenMP                      12        99%
Goniva et al. [14]       CFD+DEM   2.05·10^5       2.05·10^7         MPI                         512       >100%
Berger et al. [15]       DEM       -               5·10^4            MPI+OpenMP                  128       50%
Wu et al. [16]           CFD+DEM   1.28·10^5       2·10^6            MPI+OpenMP                  128       60%/20%
Niethammer et al. [17]   MD        -               10^12             MPI+OpenMP                  65536     82.5%
Liu et al. [18]          CFD+DEM   1.6·10^6        5.12·10^6         MPI+OpenMP                  256/128   37.5%
Amritkar et al. [19]     CFD+DEM   -               1.05·10^6         MPI+OpenMP                  64        -
Kuan et al. [20]         CFD+FT    6.4·10^4        -                 MPI                         128       20%
Kafui et al. [21]        CFD+DEM   -               5·10^4            MPI                         64        58%
Darmana et al. [1]       CFD+DBM   6.5·10^6        10^5              MPI                         32        50%
Maknickas et al. [22]    DEM       -               10^5              MPI                         16        70%
Pohl et al. [23]         LBM       1.078·10^9      -                 MPI                         512       75%
standalone and combined models for distributed and shared memory platforms, such as: Message Passing Interface (MPI), OpenMP and programming on Graphics Processing Units (GPU). For an overview of the recent parallelization strategies applied to LDM methods see Table 1.

The main drawback of MPI-based codes is the necessity of keeping an optimal load balance between distributed processes, which is often hard or even impossible to reach in a real case scenario. For instance, in the Euler-Lagrange (E-L) framework, a non-even distribution of the dispersed phase may lead to a reduction of the parallel performance or even idling of some processes during execution of the Lagrangian part. This problem is mainly caused by the static domain decomposition based on the Eulerian grid. The obvious solution is to use a separate dynamic decomposition for the Lagrangian part. The coupling between the two numerical domains can be achieved by means of a map, which should be reconstructed during the update of the Lagrangian domain decomposition. This approach was successfully applied by Pozzetti et al. [24], who reported an excellent parallel efficiency of the E-L framework up to 1400 cores. However, in a uniformly distributed system, this approach may also lead to extra communications and, therefore, to a decrease in parallel efficiency.

The computational and parallel efficiency of LDMs highly depends on the type of collision detection method employed in the algorithm. Particle based simulations, which often utilize a soft-sphere or a force based collision model, employ a single loop-based algorithm, thus leading to optimal parallel performance on any platform, as shown by Niethammer et al. [17], Pohl et al. [23] and Tian et al. [5], who performed simulations of systems containing 10^12, 10^9 and 10^8 particles, respectively. Most of these works are based on the decomposition of only the Lagrangian entities in the domain, which can easily incorporate load-balancing with the latest state of the art algorithms.

Historically, bubble-based simulations have often employed a hard-sphere model, which is an event-driven algorithm that requires calculation of the shortest collision time between all possible pairs of bubbles. The efficient parallelization of the DBM model on distributed memory platforms is impossible due to the necessity of continuous synchronization between all processes, while shared memory platforms limit the amount of bubbles that can be simulated. Several recent works also reported the use of soft sphere models for bubbly flow simulations. However, these soft sphere models generally require determination of a spring constant which defines the amount of overlap that is allowed. This is challenging since a high spring constant leads to very small time steps, while a too low spring constant leads to significant over-packing. To avoid spurious velocities in dense systems modelled with a soft-sphere model, very high spring constants and, therefore, extremely low time-steps are required to resolve the collisions sufficiently accurately. Therefore, the number of works conducted on the parallelization of bubbly flows at this scale is very limited.
Darmana et al. [25] simulated O(10^5) bubbles using a mirror domain technique for the Discrete Bubble Model (DBM) combined with an MPI-based CFD code. These authors reported only 50% parallel efficiency on 32 cores. Ma et al. [13] simulated a bubble column with 6.5·10^4 bubbles using a DBM code parallelized with OpenMP and reported 99% parallel efficiency on 12 threads. Sungkorn et al. [26] have reported a 40 m³ stirred bio-reactor with about 15.6 million bubbles using an LBM based flow-solver and the bubble phase being solved on GPUs. This approach used a ghost-particle technique developed by Sommerfeld et al. [27]. However, the volume displacement of the bubbles is not accounted for in the flow solver via the volume fraction term and is only coupled through the volumetric source term in the momentum equations. Recently, Kamath et al. [28] have reported a DSMC based stochastic Euler-Lagrange method which accounts for the volume displacement of the bubbles in the Eulerian phase and also has the potential to be parallelized for large scale bubbly flow simulations.

Table 2 (fragment). Closures for the virtual mass and wall lubrication forces:
F_{VM} = -C_{VM} \rho_l V_b \left( \frac{D_b v}{D_b t} - \frac{D_b u}{D_b t} \right), \quad C_{VM} = 0.5 \quad [25]
F_W = C_W R_b \rho_l V_b \frac{1}{D_{bw}^2} |u - v|^2 n_W, \quad
C_W = \begin{cases} \exp(-0.933\,E\ddot{o} + 0.179), & 1 \le E\ddot{o}_d < 5 \\ 0.0007\,E\ddot{o} + 0.04, & 5 < E\ddot{o}_d \le 33 \\ 0.179, & E\ddot{o}_d > 33 \end{cases} \quad [29,30]
This paper presents an efficient, highly scalable parallel approach that allows one to simulate dense bubbly flows in a Lagrangian formulation using the MPI framework. The bubble collisions are resolved by means of Direct Simulation Monte Carlo (DSMC), proposed earlier by Kamath et al. [28]. To the authors' knowledge, this is the first time a DSMC based method is used to simulate dense bubbly flows at this scale. The DSMC operates with a collision probability that can be calculated locally and thus the method does not require global synchronizations and demonstrates high parallel efficiency. As a result, the fully coupled CFD and DSMC code can be applied to simulations of large scale bubble columns with more than 10^6 bubbles.

The paper is structured as follows. Section 2 describes the DSMC method applied to bubbles and the governing equations for the liquid phase. In section 3, the numerical approach for both phases and their coupling is discussed. The parallelization strategy for the gas and liquid phases is described in section 4. In section 5, the verification and validation of the parallel code is reported. Section 6 is dedicated to the investigation of the parallel performance. In section 7.1 an example of a simulation of a large scale bubble column is shown. Finally, the conclusions are presented in section 8.
2. Methodology

2.1. Discrete phase

The discrete phase or bubble phase is solved in a Lagrangian framework where the bubble motion is governed by Newton's equation of motion (equation (1)). The interaction forces included in this model are the gravitational (F_G), pressure (F_P), drag (F_D), lift (F_L), virtual mass (F_VM) and wall lubrication (F_W) forces (equation (2)). The relevant closures used for these forces can be found in Table 2.

\rho_b V_b \frac{dv}{dt} = \sum F - \rho_b \frac{dV_b}{dt}\, v \quad (1)

\sum F = F_G + F_P + F_D + F_L + F_{VM} + F_W \quad (2)

The DSMC methods generally employ a parcel approach where a group of discrete particles of the same size and velocity is represented by one simulated particle. The resolution of the DSMC method is also governed by the number of discrete phase entities per parcel. In the current work, the parcel size for all simulations is taken as 1, to obtain an accurate representation of the Lagrangian phase. Moreover, the extension to larger parcel sizes requires closures determined with careful consideration of the different forces acting on the bubbles as a swarm and mapping functions for the calculation of the porosity. This is caused by the non-linear trajectory of the bubbles due to coupling with the liquid. The parcel growth cannot be assumed to be isotropic. Moreover, parcels of similar size should be able to break up and coalesce based on the fluid-parcel interactions and the induced turbulence in the liquid. This behaviour is complex due to the four-way coupling and this aspect will be considered in our future work.
The main goal of the DSMC method is to predict the frequency and probability of collisions of particles. This stochastic approach is much more efficient than deterministic collision models. The method was initially developed and applied to molecular systems that have a finite/large Knudsen number with no external force fields affecting the position and velocity of the parcels [32]. The modified Nanbu method is a DSMC type method used by Pawar et al. [33] to simulate small droplets in spray modelling with a few alterations. Two modifications are made to the algorithm reported by Pawar et al. [33] by Kamath et al. [28]. First of all, the radial distribution function at contact is added to account for the increase in the collision frequency and the probability of collisions for dense clusters. Secondly, a nearest neighbour collision check is added, accounting for discrete phase entities with correlated velocities which are not accounted for by the previously mentioned algorithms.
2.2. Liquid phase hydrodynamics

The liquid dynamics are resolved by means of the volume-averaged Navier-Stokes equations:

V \frac{\partial \varepsilon_l}{\partial t} + \oint_A \varepsilon_l u \cdot dA = 0 \quad (3)

\rho_l V \frac{\partial (\varepsilon_l u)}{\partial t} + \rho_l \oint_A \varepsilon_l u u \cdot dA = -\varepsilon_l V \nabla p - \oint_A \varepsilon_l \tau_l \cdot dA + \varepsilon_l \rho_l V g + \Phi \quad (4)

where V is the volume of the grid cell, A is the face area of the grid cell, \varepsilon_l is the local liquid volume fraction, \Phi represents an external source term due to the presence of the gas phase and \tau_l is the viscous stress tensor given by:

\tau_l = -\mu_{eff,l} \left[ (\nabla u) + (\nabla u)^T - \frac{2}{3} I (\nabla \cdot u) \right] \quad (5)

\mu_{eff,l} = \mu_{L,l} + \mu_{T,l} \quad (6)

Here \mu_{L,l} and \mu_{T,l} are the dynamic molecular and eddy viscosity, respectively. To close the system we use the subgrid scale model proposed by Vreman et al. [34]:

\mu_{T,l} = \rho_l \, 2.5 \, C_s^2 \sqrt{\frac{\beta_b}{\alpha_{ij}\alpha_{ij}}}, \quad \alpha_{ij} = \frac{\partial u_j}{\partial x_i}, \quad \beta_b = \beta_{11}\beta_{22} - \beta_{12}^2 + \beta_{11}\beta_{33} - \beta_{13}^2 + \beta_{22}\beta_{33} - \beta_{23}^2, \quad \beta_{ij} = \Delta_m^2 \alpha_{mi}\alpha_{mj} \quad (7)

where C_s is the Smagorinsky constant equal to 0.1.
3. Numerical solution method

3.1. Time marching

The numerical solution of the governing equations for our Euler-Lagrange model is obtained using a time-marching technique similar to the one reported in Darmana et al. [25], where the equations for the discrete and continuous phase are solved sequentially, while accounting for the coupling via the momentum source terms. Three time steps are defined in this model: the flow time-step (\delta t_{flow}), the Lagrangian phase time step (\delta t_{bub}) and the collision time-step (\delta t_{coll}) such that:

\delta t_{flow} = N_i \, \delta t_{bub}, \qquad \delta t_{bub} = \sum \delta t_{coll} \quad (8)

where N_i is fixed for a given simulation such that the time integration of all interaction forces acting on a given bubble is carried out in an accurate manner. In this work N_i = 20 to maintain sufficient accuracy for both the force integration and also the DSMC collisions. \delta t_{coll} is calculated on the fly by the DSMC algorithm from the estimated collision frequency for each discrete phase entity, as reported by Kamath et al. [28]. This is explained in more detail in the coming sections. In the DBM, \delta t_{coll} is calculated as the time-step to reach the next physical collision, with which the whole system is updated.
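The time-step hierarchy of equation (8) can be summarised in a short sketch: one flow step is subdivided into N_i = 20 bubble steps, and within each bubble step the DSMC advances with its own, locally estimated, collision sub-steps until \delta t_{bub} is covered. The routines and the fixed collision-time estimate below are placeholders for the steps of Algorithms 3.1 and 4.1, introduced here for illustration only.

// Schematic of the nested time marching of equation (8).
#include <algorithm>
#include <cstdio>

double estimate_collision_dt(double t_left) {
    // Placeholder: in the actual method delta_t_coll follows from the local
    // collision frequency (step 6 of Algorithm 3.1); here we simply cap a constant.
    const double dt_coll_estimate = 1.2e-5;
    return std::min(dt_coll_estimate, t_left);
}

int main() {
    const double dt_flow = 1.0e-3;     // flow time step [s]
    const int    N_i     = 20;         // bubble sub-steps per flow step (as in the paper)
    const double dt_bub  = dt_flow / N_i;

    double t = 0.0;
    for (int step = 0; step < 5; ++step) {             // a few flow steps
        for (int n = 0; n < N_i; ++n) {                // bubble steps
            // integrate interface forces, move bubbles, accumulate momentum sources ...
            double t_compl = 0.0;
            while (t_compl < dt_bub) {                 // DSMC collision sub-steps
                const double dt_coll = estimate_collision_dt(dt_bub - t_compl);
                // ... stochastic collision handling for this sub-step ...
                t_compl += dt_coll;
            }
        }
        t += dt_flow;
        // solve the volume-averaged Navier-Stokes equations with the time-averaged
        // momentum sources collected over the N_i bubble steps
    }
    std::printf("advanced to t = %.4f s\n", t);
    return 0;
}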
3.2. Coupling

DSMC algorithms have come far in terms of describing the hydrodynamics and heat transfer, when implemented at the molecular level or in applications where particle-particle or droplet-droplet collisions are the main physical phenomena occurring in the system of interest [32,35,36]. The continuous phase is either absent, weakly coupled or coupled only using momentum based source terms, which results in a completely collision driven system. This is not what we encounter in the current system of bubbly flows. Apart from the right collision frequency, the phase fraction needs to be evaluated for the discretized volume-averaged Navier-Stokes equations (see equations (3) and (4)).

\int_{v^n}^{v^{n+1}} dv = \int_{t^n}^{t^n + \delta t_{bub}} \frac{\sum F^n}{m_b} \, dt \quad (9)

The force integration has been tested with a fourth order explicit Runge-Kutta scheme and a first order explicit Euler scheme. The first order scheme is more computationally efficient when the time-step constraint is due to the collisions. Both schemes were checked on a single bubble rise problem (see section 5.2) for many time-steps until the terminal rise velocity is established.
The bubbles are moved in time using the time-marching technique described in section 3.1. During each bubble time step, the forces (using equation (9)) are calculated at the discrete bubble locations using the quantities mapped from the Eulerian grid cells using the polynomial filter described in section 3.2. Due to Newton's third law of motion, a reaction force is collected to be mapped onto the momentum nodes in the Eulerian framework, to be applied at the end of the bubble time-step. The collision sequence is then initiated according to Algorithm 3.1. This is repeated until the discrete phase has moved for a full \delta t_{flow}.

The forces collected in the volumetric source term are time-averaged over the flow time-step \delta t_{flow}, since they are calculated multiple times in one flow time-step. The weights for the averaging are calculated based on the ratio \delta t_{bub}/\delta t_{flow} to achieve the correct momentum balance. With the new bubble positions, the volume fraction in each Eulerian cell is calculated using the same polynomial mapping function.
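A minimal sketch of this time-averaging, reduced to a single Eulerian cell with a placeholder reaction force (the polynomial mapping filter is omitted): each bubble step contributes with weight \delta t_{bub}/\delta t_{flow}, so the accumulated source handed to the liquid solver is the average over the flow step.

// Sketch of the momentum-source averaging over one flow step.
#include <cstdio>

int main() {
    const double dt_flow = 1.0e-3;
    const int    N_i     = 20;
    const double dt_bub  = dt_flow / N_i;
    const double w       = dt_bub / dt_flow;   // averaging weight per bubble step

    double source_z = 0.0;                      // accumulated momentum source [N] in one cell
    for (int n = 0; n < N_i; ++n) {
        // Reaction force on the liquid from the bubbles in this cell during this bubble
        // step (placeholder value; normally the sum of the mapped -F of each bubble).
        const double reaction_force_z = -3.2e-4;
        source_z += w * reaction_force_z;
    }
    std::printf("time-averaged momentum source = %.3e N\n", source_z);
    return 0;
}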
In any particle based technique, the two major steps are the force calculation step and the collision detection step. In a hard-sphere collision model, among which the DSMC, the force calculation and the collision detection loops are separated. Typically the most expensive step is the collision detection step, especially when done trivially, leading to a time scaling of the order of O(N^2), where N is the total number of discrete phase entities in the system. The application of cell lists and neighbour lists reduces the number of comparisons considerably and reduces the time scaling to O(pN), where the factor p depends on the size of the searching scope or the cut-off distance (r_cut). Even then, populating these data structures takes considerable execution time compared to the remaining part of the algorithm. A Verlet list further reduces the number of times the above mentioned data structures need to be populated or refreshed, depending on the shell distance R_shell. This has been implemented and is being utilized in this work for evaluating the neighbours in the searching scopes of different bubbles.
The modified DSMC collision detection step is detailed in Algorithm 3.1. The main additions relative to the already existing algorithm proposed by Pawar et al. [33] are: a) an explicit treatment of nearest neighbour collisions to account for correlated and uncorrelated discrete phase velocities, and b) incorporation of the radial distribution function g_ij to account for the increase in collision frequency in dense or clustered arrangements of particles in 3D space [40]. A more detailed explanation can be found in the work of Kamath et al. [28]. A speed up of the DSMC algorithm over the deterministic DBM from Darmana et al. [25] has been reported by Kamath et al. [28] with minimal loss of accuracy for different collision regimes. Moreover, the reason for the speed-up is stated clearly and the serial computational times for a DEM time-step for varied numbers of bubbles and void fractions have also been reported. In this work, an MPI parallelized version of the same algorithm is presented.
The term g_ij is given by Ma and Ahmadi [41] for a mono-disperse system (see equation (10)). This is extended to poly-disperse systems by Santos et al. [42] (see equation (11)).

g_{ii} = \frac{1 + 2.5\,\varepsilon_p + 4.5904\,\varepsilon_p^2 + 4.515439\,\varepsilon_p^3}{\left[ 1 - (\varepsilon_p/\varepsilon_{max})^3 \right]^{0.67802}} \quad (10)

g_{ij} = \frac{1}{1-\varepsilon_p} + \left( g(r)_{contact,ii} - \frac{1}{1-\varepsilon_p} \right) \frac{d_i d_j}{\bar{d}\, d_{ij}} \quad (11)

where \bar{d} represents the Sauter mean diameter of the particles within the searching scope, and \varepsilon_p is the average solids/bubble volume fraction in the neighbourhood of particle i.

Algorithm 3.1 DSMC algorithm.
1: for each particle id i do
2:   Choose a bubble/particle id i ∈ X such that X = {n : n is a positive integer and n ∈ N_tot}, where N_tot represents the total number of particles in the domain/sub-domain.
3:   while t < \delta t_{bub} do
4:     Calculate the collision frequency f_i for i based on
         f_i = \sum_{j \in R_{s,i}} \frac{|v_{ij}| \, \frac{\pi}{4}(d_i^2 + d_j^2) \, n_j}{\frac{4}{3}\pi R_{s,i}^3} \, g_{ij}
       where v_ij is the relative velocity between particles/bubbles i and j, d is the diameter, n_j is the parcel size, R_s,i is the searching scope size for particle/bubble i and g_ij is the radial distribution function at contact for discrete entity i with particle type j.
5:     \lambda_i = |v_i| / f_i
6:     t_{p,i} = \min\left( \frac{\lambda_i}{3|v_i|}, \; \delta t_{bub} - t_{compl} \right)
       where t_p,i → discrete phase time step, \lambda_i → mean free path, t_compl → completed time within the time frame of \delta t_{bub}
7:     R_{s,i}^{new} = \max(|v_i| t_{p,i}, |v_{ij}|_{max} t_{p,i})
8:     Choose a random \chi = \{x \in \mathbb{R} \mid 0 \le x \le 1\}
9:     j = int[\chi N_i] + 1, where N_i → number of neighbours in R_s,i
10:    P_{ij} = \frac{|v_i - v_j| \, \frac{\pi}{4}(d_i^2 + d_j^2) \, n_j \, t_{p,i}}{\frac{4}{3}\pi R_{s,i}^3} \, g_{ij}
11:    if \chi > \frac{j}{N_i} - P_{ij} and (v_i - v_j) \cdot (r_i - r_j) < 0 then
12:      Collide discrete phase entities i and j
13:    else if Stk_{bub} = \tau_{bub}/\tau_l < 1 then
14:      Check for nearest neighbour collision
15:    else
16:      no collisions
17:    end if
18:    Update r_i and dr_i
19:    if |dr_i| > R_{shell} then
20:      neighbour list rebuild = true
21:    else
22:      neighbour list rebuild = false
23:    end if
24:    t = t + t_{p,i}
25:  end while
26: end for
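The partner selection of Algorithm 3.1 (steps 8-12) can be sketched as follows for a single entity i: a candidate partner j is drawn at random from the N_i neighbours in the searching scope and accepted with the modified Nanbu criterion, using the collision probability of step 10 and the radial distribution function of equation (10). The neighbour positions, velocities, sub-step and local gas fraction below are illustrative values; the parcel size is 1, as in this work.

// Sketch of one stochastic collision-selection step for entity i (illustrative data).
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

// Radial distribution function at contact for a mono-disperse system, equation (10).
double g_contact(double eps_p, double eps_max = 0.64) {
    const double num = 1.0 + 2.5 * eps_p + 4.5904 * eps_p * eps_p
                     + 4.515439 * eps_p * eps_p * eps_p;
    const double den = std::pow(1.0 - std::pow(eps_p / eps_max, 3.0), 0.67802);
    return num / den;
}

struct Entity { double v[3]; double r[3]; double d; };

int main() {
    std::mt19937 rng(42);
    std::uniform_real_distribution<double> uni(0.0, 1.0);

    const Entity bi {{0.0, 0.0, 0.25}, {0.0, 0.0, 0.0}, 4e-3};
    const std::vector<Entity> nbrs = {                       // neighbours inside R_s,i
        {{0.0, 0.0, 0.20}, {0.0, 0.0, 6e-3}, 4e-3},
        {{0.0, 0.0, 0.31}, {5e-3, 0.0, 0.0}, 4e-3}};
    const double Rs = 1.2e-2, tp = 1.0e-5, nj = 1.0, eps_p = 0.2;
    const double Vs = 4.0 / 3.0 * M_PI * Rs * Rs * Rs;        // searching-scope volume

    const double chi = uni(rng);                              // random number in [0,1)
    const int    Ni  = static_cast<int>(nbrs.size());
    const int    j   = static_cast<int>(chi * Ni);            // candidate partner (0-based)
    const Entity& bj = nbrs[j];

    double vrel[3], rrel[3], vmag = 0.0, approach = 0.0;
    for (int k = 0; k < 3; ++k) {
        vrel[k] = bi.v[k] - bj.v[k];
        rrel[k] = bi.r[k] - bj.r[k];
        vmag    += vrel[k] * vrel[k];
        approach += vrel[k] * rrel[k];
    }
    vmag = std::sqrt(vmag);

    // Collision probability with partner j (step 10 of Algorithm 3.1).
    const double Pij = vmag * M_PI / 4.0 * (bi.d * bi.d + bj.d * bj.d)
                     * nj * tp / Vs * g_contact(eps_p);

    // Modified Nanbu acceptance: chi > (j+1)/N_i - P_ij and the pair is approaching.
    const bool collide = (chi > static_cast<double>(j + 1) / Ni - Pij) && (approach < 0.0);
    std::printf("chi = %.3f, P_ij = %.3e, collide = %d\n", chi, Pij, collide);
    return 0;
}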
3.3.1. Fluid phase numerical scheme

All numerical simulations from the present work are performed using FoxBerry, an in-house developed framework for solving unified transport equations. The discretization of the governing equations (3) and (4) is done on a structured grid with a staggered arrangement of variables² using the Finite Volume method. The SIMPLE algorithm is used for the pressure-velocity coupling, as described by Patankar [43]. All convective fluxes are discretized with the second order Barton scheme [44] and all diffusive fluxes are discretized with the second order central difference scheme. To keep the numerical stencil compact, a deferred correction method is applied to convective fluxes, resulting in a 7-diagonal matrix in a 3D case. The time integration is carried out using the first order backward Euler method. All discretized terms are treated implicitly, except source terms originating from the discrete phase, which are considered explicitly.

All linear systems are solved by means of an in-house BiCGStab(2) method [45] and the solution of the Pressure Poisson Equation (PPE) is additionally accelerated with the multilevel ML solver [46] from the Trilinos library [47].
4. Parallelization strategy

4.1. Domain decomposition

To obtain the optimal parallel performance, the developed code utilizes a full 3D decomposition instead of the often used pencil or stripe decompositions. This allows for reduction of the sub-domain interface A³ proportionally to its volume V, which helps to decrease the communication message size between sub-domains and, thus, the communication time.

² Discretized scalar fields are stored at the center of a control volume, while components of vector fields are stored at face centers.
³ To simplify further explanation and discussion, the internal sub-domain cells are referred to as the computational volume V, whereas the cells adjacent

Fig. 1. An example of a full 3D decomposition for various number of processes (p). Sub-domains are coloured randomly for better visualization.

Fig. 2. Distribution of on-edge momentum cells between adjacent processes 1 and 2. Red points represent a copy of blue points which belong to process 2.

An example of the cubic domain decomposed with different numbers of processes is shown in Fig. 1. The decomposition algorithm (Algorithm A.1) is shown in Appendix A.
4.2. Parallelization strategy fluid solver

Due to the staggered arrangement of variables, two adjacent sub-domains 1 and 2 share one layer of momentum cells (see Fig. 2), which results in the duplication of matrix rows during the assembly step of the momentum matrix. To preserve the initial size of the distributed matrix and to reduce the potential increase of the solution time and the number of iterations of the linear solvers, the rule of the "positive owner" is applied. Thus, all shared momentum points on the positive (right) side of sub-domain 1 are considered as an extra layer of halo cells, while their physical representation belongs to sub-domain 2.

The application of the "positive owner" rule results in an extra communication step during the assembly of the Pressure Poisson Equation (PPE), caused by the direct dependence of coefficients in the PPE matrix on the central coefficients of the momentum matrices [43].

To simplify the access to this data, every sub-domain performs a search in its neighbouring sub-domains to determine the ownership of the required elements. This operation is performed once and its result is stored in the form of a hash table, which maps the locally missing and remotely detected elements with the information on the corresponding remote process ID. The constructed hash table is used at every iteration of the SIMPLE algorithm to obtain the off-process data and to populate the local vectors of the diagonal elements from the momentum matrices.
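A minimal sketch of this ownership map, with the MPI exchange replaced by a stub: the hash table, built once, maps the global index of each locally missing momentum point to its owning rank, and is then reused at every SIMPLE iteration to fetch the diagonal momentum coefficients needed for the PPE assembly. The indices, ranks and values are illustrative.

// Sketch of the "positive owner" ownership map used during PPE assembly.
#include <cstdio>
#include <unordered_map>
#include <vector>

struct RemoteRef { int rank; long remote_index; };

int main() {
    // Globally indexed momentum points missing on this rank (illustrative values).
    const std::vector<long> missing = {1043, 1044, 2090};

    // Built once: global index -> owning rank and index in the owner's local array.
    std::unordered_map<long, RemoteRef> owner;
    owner[1043] = {1, 43};
    owner[1044] = {1, 44};
    owner[2090] = {3, 90};

    // Reused every SIMPLE iteration: gather the off-process diagonal coefficients.
    std::vector<double> diag(missing.size());
    for (std::size_t k = 0; k < missing.size(); ++k) {
        const RemoteRef& ref = owner.at(missing[k]);
        // In the parallel code this is an MPI exchange with ref.rank; here a stub value:
        diag[k] = 1.0 + 0.01 * ref.remote_index;
        std::printf("element %ld owned by rank %d -> a_P = %.3f\n",
                    missing[k], ref.rank, diag[k]);
    }
    return 0;
}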
4.3. Parallelization strategy DSMC

The parallelization strategy for the DSMC part follows the same domain decomposition technique as the fluid solver mentioned above. Generally, particle based simulations follow their own domain decomposition as described in Abraham et al. [48]. Such a decomposition can be uniform or non-uniform, based on the dynamics of the problem itself. Darmana et al. [25] have reported a mirror domain technique to decompose the domain. The advantage of such a method lies in its load balancing, due to an even distribution of bubbles across all domains. The parallel efficiency of such a technique lasts until a maximum within 50 processors, as shown in the same work [25]. As the system becomes larger, the amount of
Algorithm 4.1 Bubble phase calculation steps in a sub-domain P.
1: for n < \delta t_{flow} / \delta t_{bub} do
2:   Set t_{bub} = 0.
3:   Calculate interface forces and velocity of bubbles ∀ i ∈ N_tot
4:   Bubble/particle removal from the sub-domains/domain
5:   Bubble/particle transport
6:   Bubble/particle introduction into the sub-domains/domain
7:   if neighbour list rebuild = true then   (see operation 19 in Algorithm 3.1)
8:     Refresh cell-linked lists and neighbour lists ∀ i ∈ N_tot
9:   else
10:    Do not update cell linked lists and neighbour lists
11:  end if
12:  Transfer real bubble data to neighbouring halo cells
13:  Execute Algorithm 3.1 ∀ i ∈ N_tot + N_ghost
14:  Calculate the volume of the bubbles in the different cells and map them to the Eulerian sub-domain.
15:  Apply data reduction from the halo cells of the current sub-domain to the real cells of the neighbouring sub-domain for the bubble volumes and the momentum source terms.
16: end for
communications for such a technique will get larger due to timely synchronizations among processes and as such the parallel efficiency drops quite quickly.

Another important disadvantage of the mirror domain technique is the memory requirement. This is especially apparent when the problem sizes go up such that there are 10^6 - 10^7 bubbles in the system or even more. Every process should hold enough memory to maintain at least 7 doubles for the position and velocity vectors and the bubble size, as well as memory for the complex data structures which include the cell linked lists and neighbour lists.

4.4. Algorithm
The overall discrete phase algorithm is described in a combination of Algorithms 4.1 and 3.1. The DSMC algorithm uses the domain decomposition of the Eulerian part itself to avoid extra overhead due to communication of the Eulerian data required for the force calculation. Moreover, the two communication steps that are mentioned in Algorithm 4.1 are only between the neighbouring domain processes. The additional performance gain is also due to the fact that local DSMC processes can run almost independently of each other in the sub-domains, unlike the mirror-domain technique employed for the discrete bubble model (see [25]) where the smallest collision time-step across all domains has to be determined and broadcast to all processors.
4.4.1. Data structures and flagging

Cell linked lists and Verlet lists are very well known for collision detection in several particle based techniques [49]. They are always used to minimize the number of particle-pair comparisons to resolve interactions among the particles themselves. The cell linked list used in the current implementation is stored in 2 arrays: one is a list of head ids and the other is a list of linked ids. An adjacency list is used to store the Verlet list. Apart from these, a particle id → cell id map is also stored to avoid unnecessary cell id calculations. The cell ids are then further used to identify the location type (using the cell flags) or the neighbourhood of the particles in question. The memory allocations for these data structures are kept contiguous to avoid excessive refreshing of the CPU cache.

Moreover, the id storage in memory for the neighbour list is done in the order of its distance from the particle under consideration. Quick access to these variables for the distance calculation (during the collision frequency calculation) results in as many cache hits as possible (see Fig. 3). This is necessary because the collision frequency calculation is the most expensive step in Algorithm 3.1. SIMD vectorization operations have also been shown to improve performance drastically for large numbers of particles [50]. These will be considered for further performance improvements in our future work.
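A sketch of this two-array cell-linked list (head ids and linked ids) together with the particle-to-cell map, assuming a small illustrative set of particles; the Verlet (neighbour) list build, which would traverse these chains over the cells inside the searching scope, is omitted.

// Sketch of the cell-linked list stored in two arrays plus a particle -> cell map.
#include <cstdio>
#include <vector>

int main() {
    const int    nx = 4, ny = 4, nz = 4;        // cells per direction in this sub-domain
    const double cell = 5e-3;                   // cell size [m]

    std::vector<double> x = {0.002, 0.003, 0.012, 0.013, 0.018};   // particle x (y, z = 0)
    const int np = static_cast<int>(x.size());

    std::vector<int> head(nx * ny * nz, -1);    // last particle inserted in each cell
    std::vector<int> next(np, -1);              // next particle in the same cell
    std::vector<int> cell_of(np, -1);           // particle id -> cell id map

    for (int p = 0; p < np; ++p) {
        const int ci = static_cast<int>(x[p] / cell);   // flattened index: iy = iz = 0 here
        cell_of[p] = ci;
        next[p]    = head[ci];                  // push particle p on the cell's chain
        head[ci]   = p;
    }

    // Visit all particles of cell 0 (this traversal would feed the neighbour list build).
    for (int p = head[0]; p != -1; p = next[p])
        std::printf("cell 0 contains particle %d (x = %.3f m)\n", p, x[p]);
    return 0;
}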
A map of flags on cells is used to identify the location type of the bubbles/particles and also for the boundary detection (see Fig. 4). This map is necessary for boundary detection (like walls) and halo cells, preventing extra checks for the bubbles/particles in the bulk. This flagging is done once during the initialization of the simulation, either based on the boundary conditions from the Eulerian part or based on explicit boundary conditions provided for the Lagrangian part.
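A possible representation of this flag map, assuming the integer codes listed in the caption of Fig. 4; a single array lookup then tells whether a bubble needs wall, halo or process-boundary handling, so bulk bubbles skip those checks.

// Sketch of the per-cell flag map of Fig. 4 (codes taken from the figure caption).
#include <cstdio>
#include <vector>

enum CellFlag : int { Internal = 1, Wall = 4, NearBoundary = 7, Halo = 8, NearProcBoundary = 9 };

int main() {
    std::vector<int> flag = {Halo, NearBoundary, Internal, Internal, NearProcBoundary};
    const int bubble_cell = 2;                            // cell id of some bubble
    if (flag[bubble_cell] == Internal)
        std::printf("bulk cell: no wall or halo checks needed\n");
    else
        std::printf("special handling required (flag %d)\n", flag[bubble_cell]);
    return 0;
}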
4.4.2. Bubbles in the halo region

Since the DSMC process in every sub-domain is independent, it is important to check the consistency of the physics in the form of momentum conservation across sub-domains. Several stochastic methods in literature are known not to conserve momentum at the individual particle level (see [27]) but rather at an overall statistical level. Fig. 3 shows a 2D representation of a single layer particle decomposition across neighbouring sub-domains, including the halo regions in colour.

For the pure DSMC algorithm, the exchange of momentum between real and halo particles/bubbles in the current process need not be exactly the same as between the particles/bubbles of the halo and real cells of a neighbouring process. This will only happen if the chosen collision pairs are exactly the same across the neighbouring sub-domains. Since the process of

Fig. 3. Decomposition (1 layer of ghost cells) of the discrete phase shown in connected 2D sub-domains with a Verlet list illustration at the top and the neighbour-list memory allocation pattern shown at the bottom. Grey cells represent the internal cells and coloured cells represent corresponding halo cells.

Fig. 4. Map of DSMC flags shown for a sub-domain for a domain decomposition of 5 cores. The flags are numbered based on the type of cells: 1. Internal cell, 4. Wall, 7. Near a boundary, 8. Halo cells, 9. Near a process boundary.

collision pair selection is random, the net amount of momentum loss or gain is also random. To minimize the momentum exchange errors, a nearest neighbour collision is used near sub-domain boundaries.
Table 3
Physical properties of the discrete phase and numerical properties of the simulations used in verification tests for the impinging particle stream problems I1 and I2.

Parameter             Symbol            I1             I2
Solid density         ρ_l (kg/m³)       2000           2000
Particle diameter     d_p (mm)          2              2
Particle time step    δt_p (s)          5·10⁻⁵         5·10⁻⁵
Nozzle diameter       D_nozzle (m)      0.25           0.1
System size           L×W×H (m×m×m)     0.5×0.5×0.5    0.25×0.25×0.25
Inlet mean velocity   v_mean (m/s)      2.5            2.5
Standard deviation    v (m/s)           1.0            1.0
5. Parallel verification and validation

In our previous work [28], we verified and validated the developed DSMC method for two limiting conditions: first for a pure DEM (with no background fluid phase), and second for a fully coupled gas-liquid system. This also included mono-disperse and poly-disperse cases. The calculation of the total number of collisions and also the collision frequency was verified for varying solids and gas fractions. For validation, a comparison was made with the deterministic Discrete Particle Model (DPM) and Discrete Bubble Model (DBM), respectively.

In the current work, similar test cases have been set up to verify the parallel implementation of the DSMC algorithm. Naturally, it cannot be expected from a stochastic algorithm to execute the exact same sequence of collisions when run multiple times, even when run on a single core. This is possible in the deterministic cases when the initial particle distribution is identical every single time. However, for a stochastic algorithm the statistical properties of the collisions and momentum exchange should not change upon parallelization. The following problems are set up for the verification:

• Impinging particle streams
• Single bubble rise across sub-domains
• Bubble columns

5.1. Impinging particle streams
The system consists of two injection nozzles oriented such that they partly face each other. The solid particles flow into the system through these nozzles. The system boundaries act as exits for the particles. The domain is decomposed based on the number of processors provided. The solid particles are provided with inlet velocities based on the overall mass flow rate. To obtain a velocity distribution, a fluctuating velocity, based on a Gaussian distribution, is added to the average velocity. The width of the distribution is varied to obtain different velocity profiles from the nozzles. Two test cases, I1 and I2, have been set up (see Table 3 for the simulation parameters) with the following goals:

1. To verify, with varying number of cores, the consistency in the collision frequency and also its independence from the domain decomposition. (I1)
2. To quantify momentum errors that occur across the neighbouring sub-domains because of a mismatch in collision pairs due to independently running DSMC loops. (I2)

A schematic of the problem setup for I1 with different decompositions is reported in Fig. 5. The comparison of the serial version of the method with the deterministic method (DPM) has been reported in Kamath et al. [28] for both mono-disperse and poly-disperse cases at varying solid fractions.
The simulation has been executed for 5 decompositions: 12 cores, 23 cores, 24 cores, 48 cores and 96 cores. The overall collision frequency, the instantaneous collision frequency and the total amount of collisions occurring in the system are reported in Fig. 6. The results are almost independent of the number of cores used, with a maximum relative error of 0.2%. Note that this algorithm does not use the nearest neighbour collisions as Stk_p → ∞ without any continuous fluid medium.
A smaller representative system of the same problem is simulated (I2) on 5 cores to evaluate the momentum conservation error across neighbouring sub-domains (see Table 3). This error is expected to reduce in time for the DSMC algorithm with respect to the absolute amount of momentum transferred across the sub-domain, firstly due to the randomization of the particle ids and collision partners, and secondly because the searching scope is isotropic.

A schematic representation of the problem along with the process id numbering is shown in Fig. 7. The relative errors are calculated for the momentum conservation in each sub-domain with all of their neighbours as follows:

\epsilon_{mom} = |p_a + p_b|
Fig. 5. Impinging particle streams problem setup with different decompositions (I1).
Fig. 7. Impinging particle streams problem setup decomposed into 5 processes (I2).
Table 4
Momentum conservation errors across processes due to domain decomposition for the DSMC algorithm, reported for problem I2. Note: this analysis is carried out for a domain decomposition of 5 processes and the errors reported are at the time of 2.0 seconds.
Process boundary % Error Process boundary % Error
0↔1 1.108 1↔3 0.331
0↔2 0.553 1↔4 1.272
0↔3 4.921 2↔3 1.485
1↔2 5.747 3↔4 1.204
where p_a represents the momentum stored in the part of the halo region of sub-process a that is shared with sub-process b, and similarly p_b. These momentum errors (\epsilon_{mom}) are reported in Table 4 and shown for process id 0 in Fig. 8. It can be seen that the maximum errors reported are for process ids 0 ↔ 3 and 1 ↔ 2. It should be noted that the overlap between these specific sub-domains is a single row of cells along the diagonal (see Fig. 7b). The net amount of momentum exchanged or the amount of collisions occurring here is nearly two orders of magnitude lower than at the other process boundaries. The cumulative error with respect to the amount of momentum exchanged reduces as more collisions occur. Nevertheless, to reduce this error, a nearest neighbour collision is forced near the process boundaries in the case of bubbly flows. This reduces the momentum error significantly (to less than 0.5%) as the bubble positions are communicated across the process boundaries and thus the relative distances are the same for a given pair in each sub-process.
5.2. Single bubble rise across sub-domains

For the second test, a single bubble is initiated in the middle of a domain with six no-slip boundary walls. The test case differs from the previous one as the Lagrangian part here is coupled with the flow-solver. Note that because there are no other bubbles to collide with, the purpose of this test is to check whether the domain decomposition has an influence on the momentum coupling between the Lagrangian and Eulerian phase. The simulation settings are specified in Tables 5 and 6. Two tests are carried out to check:

1. the temporal accuracy of the force integration scheme with respect to the chosen time-step.
2. the spatial stability of the force calculation with changing sub-domains.

The temporal accuracy of the first order explicit Euler scheme is checked with a fourth order explicit RK4 scheme (see Fig. 9a). There are negligible differences, which indicates the sufficiency of the flow time-step division into different bubble time-steps. It is clearly seen from Fig. 9b that the bubble passes through the sub-domains without any jump in velocity arising due to entry or exit from the sub-domains, which indicates proper communication of the flow velocities and the pressure field across sub-domains.
5.3. Bubble columns

Finally, the four-way coupling of the Lagrangian and Eulerian phases is verified via simulation of two tall square cross-sectioned bubble columns (see Fig. 10). Apart from the dimensions, the nozzle distribution in the sparger for both problems is different, ultimately inducing different kinds of flows in both systems. Problem setup parameters are listed in Table 7.
Fig. 8. Cumulative momenta in the three directions collected in the halo regions of the cells belonging to the neighbouring processes of process id 0. Note: CI → Cumulative Impulse and SI → Sum of the Impulses collected in the halo cells of the respective ids containing common edges (p_a + p_b).
Table 5
Physical properties of the discrete phase and numerical properties of the simulations used in verification tests for the single rising bubble problem.

Parameter          Symbol    Value
Bubble diameter    d_bub     4 mm
Bubble time step   δt_bub    5·10⁻⁵ s
Box size           L×D×H     0.15×0.15×0.15 (m×m×m)
Grid cell size     Δx        5 mm
Table 6
Physical properties of the gas and liquid phases and numerical properties of the simulations used in verification tests.

Parameter                  Symbol   Value
Liquid density             ρ_l      1000 kg/m³
Liquid dynamic viscosity   μ_l      1.002·10⁻³ kg/(m·s)
Gas density                ρ_g      1 kg/m³
Gas dynamic viscosity      μ_g      1.85·10⁻⁵ kg/(m·s)
Bubble diameter            d_b      4 mm
Surface tension            σ        72.86·10⁻³ N/m
Flow time step             δt_f     10⁻³ s
Bubble time step           δt_b     5·10⁻⁵ s
Tolerance                  –        10⁻⁸
In both cases, the bubbles are introduced into the domain through a number of nozzles arranged in a square cross-sectional arrangement and placed at the bottom of the domain. The superficial gas velocities remain constant during the simulations. On all vertical and bottom walls, a no-slip boundary condition is imposed, whereas the top wall has a free-slip boundary condition mimicking a free surface. To avoid problems with mass conservation when the bubbles are introduced and removed from the system, all vertical walls have an extra slit with a prescribed static pressure outlet boundary condition to facilitate outflow of excess liquid or inflow of liquid into the vacuum created by the outflow of bubbles. The slits' locations are symmetrically placed near the top of the columns. The length and height of the slits are equal to one third of the domain's length and one grid cell, respectively.
Fig. 9. Single bubble rise velocity with time.
Fig. 10. Computational domains for the T1 and T2 test cases. Black zones at the top of the columns indicate outlet slits. Black spheres at the bottom represent the sparger. In both cases, the width of the slits is equal to 1/3 of the domain's width, whereas the height of the slits is equal to one and two grid cell sizes for the T1 and T2 cases, respectively. The black line in figure (a) corresponds to a height of H = 0.25 m and represents the line along which data is collected.
Table 7
Geometry and sparger properties for the verification tests performed on the square bubble columns. Here, d_p represents the pitch distance, v_b is the superficial gas velocity and D_b is the bubble diameter. Dimensions L, D and H correspond to the x, y and z axes, respectively.

Case   L×D×H [m]         #nozzles   Pitch d_p [mm]   D_b [mm]   v_b [cm/s]
T1     0.15×0.15×0.45    49         6.25             4          0.49
T2     0.2×0.2×0.6       625        8                4          1 to 5
The results of the T1 case are compared with time-averaged experimental measurements of Deen et al. [51] and results from the serial code reported earlier [53]. Fig. 11 shows the distribution of the time-averaged axial liquid velocity along the x-axis at a fraction of the column height H of z = 0.56H and a fraction of the column depth D of y = 0.5D. It can be seen from Fig. 11 that the simulated velocities are in good agreement with the experimental measurements and with the reference data from the serial code. The minor deviations between the simulated curves from the present work can be attributed to the stochastic nature of bubble collisions in the DSMC and thus, slightly different behaviour of the plume for each realization. The maxima at approximately x/L = 0.5 of the time-averaged axial liquid velocities are slightly higher than
5 ofthetime-averagedaxialliquidvelocitiesareslightlyhigherthanFig. 12. Dependencyoftheintegralgashold-uponsuperficialgasvelocity.ComparisonofsimulateddatawithreferencesimulationsofDarmanaetal.[1] andexperimentalmeasurementsofHarteveldetal.[52].Dataisaveragedoverthetimeinterval10−20 s.
Table 8
Parallel performance of the FoxBerry code. Comparison of time spent on communication (T_c), solving the momentum equations (T_m), solving the PPE (T_p), matrix assembly (T_a), full time (T_f), the overall speedup and the parallel efficiency. Results are averaged over 10 time steps and 192 cores are taken as a reference.

Cores/192   T_c [s]   T_m [s]   T_p [s]   T_a [s]   T_f [s]   Speedup   Efficiency [%]
1           0.772     21.599    32.030    53.030    116.158   1.0       100.0
2           0.522     12.478    17.206    34.079    69.680    1.7       83.4
4           0.686     6.403     9.963     18.698    37.783    3.1       76.9
8           0.345     4.018     4.928     14.568    27.091    4.3       53.6
16          0.285     2.262     4.248     6.690     15.012    7.7       48.4
those of the serial code. This can be attributed to the nearest-neighbour treatment of bubble-bubble collisions near the edges of the sub-domains, applied to maintain the momentum conservation across the sub-domain borders, which renders the collision treatment slightly more deterministic [25].

In the T2 case, results are compared with experimental measurements of Harteveld et al. [52] and numerical data, using the DBM, from Darmana et al. [25]. In this test case, the simulations are performed using 24 cores. Fig. 12 demonstrates the integral gas hold-up as a function of the superficial gas velocity. The calculated values of the parallelized code show excellent agreement with the experimental data as well as with the reference simulations.
6. Parallel performance and comparison with previous approaches

In this section, we analyze the parallel performance of the implementation by considering different types of problems. All tests are performed on "Cartesius", the Dutch National Supercomputer. Every node consists of 2x12 cores of Intel Xeon E5-2690 v3 with a basic clock speed of 2.6 GHz and hyper-threading switched off. The numerical code is compiled using the Intel C++ compiler v16.0.3, Intel MPI v5.1.3 and Trilinos 12.10.
6.1. Single phase flow solver

The parallel efficiency of FoxBerry is measured based on the flow in the initial section of a square duct. The domain is discretized on 125·10^6 grid cells. The chosen time step is 10⁻³ s. Each time step takes approximately 20 inner iterations of the SIMPLE algorithm.

The strong scaling of FoxBerry is shown in Table 8 and demonstrates 48.4% parallel efficiency on 3072 tested cores (relative to the reference performance on 192 cores). The results in Table 8 also show a breakdown of the most

Table 9
Main properties of the P1 and P2 cases. N_cells and dx are the total number of Eulerian cells and their size, respectively. <N_b> is the average number of bubbles in the fully developed system and v_b is the superficial gas velocity. Dimensions L, D and H correspond to the x, y and z axes, respectively.

Case   L×D×H [m]          N_cells    dx [mm]   #nozzles   <N_b>      d_p [mm]   R_b [mm]   v_b [cm/s]
P1     0.45×0.45×1.35     2.2·10^6   5         441        2.3·10^5   6.25       2          12
P2     0.50×0.50×2.00     4.0·10^6   5         3844       0.8·10^6   8.00       2          1
Fig. 13. Strong scaling of the P1 case based on 10 iterations. The domain consists of 2.2·10^6 cells and 2.3·10^5 bubbles. The result on 24 cores is taken as a reference.
computationally expensive parts of the code, indicating a significant contribution of the matrix assembly to the total time. This part can be further improved by implementation of a more suitable algorithm for the pressure-velocity coupling for transient problems, e.g. the fractional step method. In addition, the solution of the PPE demonstrates poor scaling after 768 cores, which can be improved with a better preconditioning technique.

6.2. Combined code

The parallel efficiency of the coupled Euler-Lagrange code directly depends on the uniformity of the distribution of Lagrangian entities in the Eulerian domain [4,24]. In bubble column reactors, this distribution is directly determined by the geometry and placement of the sparger. This section demonstrates the parallel efficiency of the developed code on examples of the two most common scenarios where:

(P1) the sparger partially occupies the bottom plate, which leads to the appearance of a plume and an imbalanced workload for the parallel DSMC.
(P2) the sparger fully occupies the bottom plate, which leads to uniform bubble rise and a well balanced workload for the parallel DSMC.

The main properties of the aforementioned cases are shown in Table 9. All nozzles are arranged in squares and placed at the bottom of the domain. The governing equations are discretized on a uniform Cartesian grid. All the following results are obtained by averaging the required data over 10 time steps after the flow in the domain is fully developed. It is important to note that the used Barton convection scheme demands 3 layers of halo cells to calculate the convective fluxes coming into a sub-domain. This results in an increase in the communication message size and adversely affects parallel performance when the decomposition is too fine.

The parallel efficiency and the elapsed time for the P1 case are shown in Fig. 13. The code scales well, demonstrating a 48% combined parallel efficiency on 384 cores; each time step takes 4.4 seconds, of which 1.6 seconds are taken by the DSMC algorithm. The decrease in the parallel performance of the CFD part is due to the low density of the Eulerian grid, causing MPI communications to be dominant over the workload on each process. Additionally, the non-even distribution of the bubbles (see Fig. 14) results in a significant work unbalance among the processes in the DSMC algorithm. This leads to only 39% parallel efficiency of the DSMC algorithm at 384 cores, preventing the coupled code from further scaling.

The problem of unbalanced workload will become more pronounced if the number of bubbles increases, e.g. due to the reduction of their sizes. This issue emphasizes the necessity of having a dynamic domain decomposition for the Lagrangian part in order to improve the parallel performance of the coupled CFD-DSMC code in simulations similar to the P1 case. This requires careful communication of the Eulerian data to the dynamically determined sub-domains for the force calculation
Fig. 14. Instantaneous bubble distribution and bubble velocity for the P1 case at different times of the simulation.
Fig. 15. Strong scaling of the P2 case based on 10 iterations. The domain consists of 4·10^6 cells and 10^6 bubbles. The result on 24 cores is taken as a reference.
step. This communication step is currently completely absent. Therefore, one also needs to consider whether this will actually pay off in terms of performance for general multiphase flow problems.

In the P2 case, the absolutely uniform distribution of bubbles in the domain (see Fig. 16) leads to a substantial increase of the parallel performance compared to the P1 case, as shown in Fig. 15. The DSMC algorithm scales up to 768 cores, maintaining excellent parallel efficiency and demonstrating the potential for further scaling. However, the grid density of the Eulerian solver only allows scaling up to 43% parallel efficiency on 768 cores, which leads to an overall parallel performance of 51%. This corresponds to 4.5 seconds elapsed time, among which only 0.5 seconds are taken by the DSMC algorithm.
7. Results

7.1. Large scale bubble column

In this section, a detailed analysis of the hydrodynamics in a large scale bubble column is presented. The configuration of the considered column corresponds to the P1 case described in section 6.2. The simulations are performed for 95 seconds of physical time and the results are averaged over the time interval between 10 and 95 seconds. The lower boundary of the averaging period corresponds to the time when fully developed flow conditions are reached.

Fig. 17 shows the distribution of the time-averaged velocity vector fields at different heights of the column. As can be seen, at the bottom of the column the liquid velocity reaches approximately 1.3 m/s, while at the top it decreases to almost 0.2 m/s. The high liquid velocity at the bottom is directly related to the high superficial gas velocity and can be explained as follows. The bubbles already released into the domain are decelerated due to the large mass of the liquid above them. Because of a lack of coalescence in this simulation, the new bubbles entering the domain collide with the decelerated bubbles and form a dense swarm. This swarm is pushed by the next sets of bubbles entering the system and thus, it experiences an increase in the axial momentum. At the center of the swarm the liquid fraction is low and bubbles experience less drag. Therefore, they have higher velocities compared to the bubbles at the edges. These dynamics are preserved
Fig. 16. Instantaneous bubble distribution and bubble velocity for the P2 case at different times of the simulation.
Fig. 17. Distribution of the time averaged velocity vectors at different heights of the domain: a) H = 0.005 m; b) H = 0.18 m; c) H = 0.36 m; d) H = 0.54 m; e) H = 0.72 m; f) H = 0.90 m; g) H = 1.08 m; h) H = 1.26 m. Data is averaged over the time interval 10...95 s.

Fig. 18. Distribution of the time averaged gas volume fraction in the xy plane at different heights of the domain. Letters correspond to heights indicated in Fig. 17. Data is averaged over the time interval 10...95 s.
in time and causes the entrained liquid to behave as a submerged jet which decays along the height due to the viscous forces and the presence of the top wall.

Figs. 14c and 14d show that after the flow is developed the overall distribution of the bubbles in the domain remains unchanged. This can be clearly seen from Fig. 18, which shows the time-averaged gas volume fraction at different heights in the domain. The results show that the column can be clearly split into two parts. The flow in the bottom part is governed by the bubble plume, which maintains the high axial momentum and induces the motion of the liquid. During the simulation, the bubble plume remains compact (see Figs. 18a-d) and centered. There is only a slight dispersion along the height. However, the bubble plume breaks at half the height of the domain (see Figs. 18e-h). In this part, the bubbles rise at the center of the column while they move down along the side walls, because the force due to the liquid circulation dominates the buoyancy force of the bubbles. At the middle height of the column (see Figs. 17d-e and 18d-e), the strength of the circulation zones reduces and the downwards moving bubbles are carried away by the more dominant liquid flow at the center of the domain.

The distribution of the time-averaged vorticity magnitude at different heights of the column is shown in Fig. 19. At the bottom half of the domain the flow is highly rotational around the very pronounced core in the center. The regions of high vorticity correlate with the swirling effect of the bottom part of the plume, which can be observed in Fig. 14. This swirling effect is caused by the helical trajectories of the bubbles, which is the most energy efficient way for a high-
Fig. 19. Distribution of the time averaged vorticity magnitude in the xy plane at different heights of the domain. Letters correspond to heights indicated in Fig. 17. Data is averaged over the time interval 10...95 s. The colour legend is adjusted for better visualization of the low values at the top heights (figures f-h). Before the adjustment the maximum value at the bottom (figure a) was 50 s⁻¹.
Table 10
The circulation, Γ, at different heights of the column. Γ is calculated by the integration of the vorticity across the full area of the XY cross-sections.

Height      a      b      c      d      e      f      g      h
Γ [m²/s]    0.96   0.55   0.65   0.72   0.90   0.86   0.69   0.59
momentum gas plume to penetrate a viscous fluid. Beside the core, the flow is almost irrotational compared to the central region, except in thin layers near the walls, see Figs. 19c-d. At the top half of the column, the plume breakage and the large amount of bubbles lead to the appearance of rotational motion in the whole domain, which results in the high circulation at H = 0.72 m and H = 0.90 m, as shown in Table 10. Also, despite the lower magnitude of the vorticity at the top, the circulation in that region is comparable to the one at the bottom.

The aforementioned dynamics can also be seen from the distribution of the turbulence kinetic energy (TKE), as shown in Fig. 20. While at the bottom of the column the most energetic eddies are concentrated close to the core of the plume, the second half of the column shows a wide region of high TKE. Nevertheless, at the top of the column, the TKE remains low in the vicinity of the corners and walls, due to viscous effects. In addition, Fig. 20 indicates a low region of the TKE at the center
Fig. 20. Distribution of the TKE in the xy plane at different heights of the domain. Letters correspond to heights indicated in Fig. 17. Data is averaged over the time interval 10...95 s. The colour legend is adjusted for better visualization of the low values at the top heights (figures f-h). Before the adjustment the maximum value at the bottom (figure a) was 50 s⁻¹.
of the bubble plume. Similar to a submerged jet, the liquid in this region has a potential core, which can also be observed through the zero vorticity at the center of the bubble plume in Figs. 19b-c. However, the TKE significantly increases towards the edge of the bubble plume, where the shear-stresses are high and the fluid flow becomes highly turbulent. Further away from the plume, the TKE reduces, indicating a decrease of the velocity fluctuations.
7.2. Effect of column scale

The results and conclusions obtained from the laboratory-scale bubble column may fail to predict the dynamics in pilot- or industrial-scale columns, which is caused by the influence of the geometry and scale effects. To assess the significance of these effects, three bubble columns of different sizes are compared, see Table 11.

The T1 and P1 cases from Table 11 have already been discussed in section 5.3 and section 7.1, respectively. The P3 case represents a domain which is extended towards industrial scales from the case P1. The numerical and physical parameters of the P3 case can be found in Table 6. As in all cases, the sparger in the P3 case is represented by nozzles arranged in a square and placed at the center of the bottom of the column. The P3 case is simulated for 27 seconds of physical time using 768 cores. Both the P1 and P3 cases have been simulated for 5 days, or a wall time of 120 hours, on the respective number of physical cores.
Table 11
The comparative list of the main characteristics of the laboratory-scale (T1), pilot-scale (intermediate) (P1) and pilot-scale (P3) bubble columns. N_cells and dx are the total number of Eulerian cells and their size, respectively. <N_b> is the average number of bubbles in the fully developed system and v_b is the superficial gas velocity. The dimensions L, D and H correspond to the x, y and z axes, respectively.

Case   L×D×H [m]          N_cells    dx [mm]   #nozzles   <N_b>      d_p [mm]   R_b [mm]   v_b [cm/s]
T1     0.15×0.15×0.45     8.1·10^4   5         49         5·10^3     6.25       2          0.49
P1     0.45×0.45×1.35     2.2·10^6   5         441        2.3·10^5   6.25       2          1
P3     1×1×10             10^7       10        625        5·10^5     8          4          0.6
Fig. 21. The instantaneous field of the axial liquid velocity in the mid-plane at different times for the T1 case.
Fig. 22. The instantaneous field of the axial liquid velocity in the mid-plane at different times for the P1 case. The red rectangle indicates the scale of the domain from the T1 case.

Fig. 23. The instantaneous field of the axial liquid velocity in the mid-plane at different times for the P3 case. The red and green rectangles indicate the scales of the domains from the T1 and P1 cases, respectively.
Figs. 21, 22 and 23 show the instantaneous axial liquid velocity fields at the vertical mid-plane of the domain for the T1, P1 and P3 cases, respectively. It can be clearly seen that the maximal axial liquid velocity in the T1 case varies only slightly along the height of the column, unlike in the P1 and P3 cases, where the axial liquid velocity is significantly higher at the bottom part of the domain compared to the top part. This is because of the small distance between the opposite vertical walls in the T1 case. Large recirculation zones with a length comparable to the height of the column appear in the
Fig. 24. The instant distribution of the bubbles in the domain at different times for the T1 case.
Fig. 25. The instant distribution of the bubbles in the domain at different times for the P1 case. The red rectangle indicates the scale of the domain from the T1 case.
domain shortly after the bubbles have reached the top. These circulation zones then travel downwards, disturbing the core of the plume and leading to meandering of the plume from the very bottom. In contrast, in the P1 and P3 cases, the rise of the bubble plume is accompanied by multiple eddies of different scales from the very beginning. These eddies dissipate energy at different sections of the plume, leading to unstable large scale structures that break into smaller ones. The core of the plume is relatively more stable at the bottom because these smaller eddies do not have the energy to disturb the bottom part of the plume completely.

The instantaneous distribution of the bubbles in the three domains is shown in Figs. 24, 25 and 26. The bubble plume in the T1 case is highly influenced by the aforementioned recirculation zones, which push the plume towards the different side walls (see Fig. 24, t = 15.0 and 27.0 seconds). Due to the dominance of a single large-scale eddy, the bubble plume may

Fig. 26. The instant distribution of the bubbles in the domain at different times for the P3 case. The red and green rectangles indicate the scales of the domains from the T1 and P1 cases, respectively.

stay close to one of the walls for a long period of time, until it is changed by instabilities arising in the bulk. These dynamics are fundamentally different from the P1 and P3 cases, where the plume at the bottom of the domain shifts only slightly from the center and never reaches the walls. In addition, the bubble plume in the P1 and P3 cases breaks shortly after the bubbles enter the domain. Based on the indicated scales in Fig. 26, the breakage of the bubble plume occurs at almost the same heights in both cases. The overall instability of the bubble plume contributes to the almost uniform distribution of the bubbles in the domain above the point of breakage.

The comparison of the bubble distribution in the P1 and P3 cases demonstrates the influence of the height of the domain on the rising bubbles. The front of the rising bubbles in the P3 case consists of constantly appearing small groups of gas inclusions. These groups experience less drag compared to the massive bubble front but they break fast, creating regions with lower pressure. The bubbles from the rising front are attracted by these regions, which leads to the appearance of new groups. These dynamics are not observed in the T1 case, where the bubbles rise in a "mushroom-like" manner [1], and in the P1 case, where the rising bubbles represent a dense column moving in a helical path.
8. Conclusion

In this paper a parallel Euler-Lagrange framework and its application to the simulation of dense gas-liquid flows are presented. The Lagrangian phase is resolved by means of the DSMC algorithm, for which parallelization is carried out with MPI, whereas the Eulerian phase is resolved using the volume-averaged Navier-Stokes equations incorporated in the in-house numerical framework FoxBerry. The coupled parallel code has been verified on several standard problems and demonstrates high accuracy, which is independent of the number of used cores.

The stochastic nature of the DSMC algorithm allows one to avoid limitations of the conventional deterministic (DBM) methods associated with parallelization for distributed memory platforms. Thus, the DSMC method demonstrates a linear scaling up to 768 tested cores for 8.3·10^5 bubbles uniformly distributed across the domain. In the case of a non-even distribution of bubbles, the DSMC algorithm allows one to simulate 2.3·10^5 gas entities using 384 cores and maintaining 39% parallel efficiency.

The developed algorithms allow, for the first time, to simulate and analyze a tall square cross-sectioned bubble column using full four-way coupling of the Eulerian and Lagrangian phases. The results demonstrate a pronounced split of the domain in two sections. In the bottom section, the bubble plume occupies only the central part of the domain and determines the overall dynamics of the Eulerian phase. Close to the center of the domain the liquid demonstrates a high rotational motion, which is related to the swirling behaviour of the bubble plume. In the top section the plume breaks, which leads to a more uniform distribution of bubbles in the bulk and the appearance of a large amount of unstable eddies. This splitting is not observed in a smaller scale bubble column, where the bubble plume remains pronounced along the whole height of the domain and the dynamics of the plume is governed by the large-scale turbulent structures.
Declaration of competing interest

The authors declare that they have no known conflicts of interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements

This work is part of the TOP grant research programme "First principles based multi-scale modeling of transport in reactive three phase flows" with project number 716.014.001, financed by the Netherlands Organisation for Scientific Research (NWO). Also, this work was supported by the Netherlands Center for Multiscale Catalytic Energy Conversion (MCEC), which is an NWO Gravitation programme funded by the Ministry of Education, Culture and Science of the government of the Netherlands. The authors also thank SURF SARA (www.surfsara.nl) and NWO for the support in using the Cartesius supercomputer.
Appendix A. Domain decomposition

Algorithm A.1 Full 3D domain decomposition.
Input: N - number of cells in the domain in each direction, p - number of processes
1: ξ = {round(∛p), round(p/ξ_1²), floor(p/(ξ_1 ξ_2))}    Basic number of sub-domains in each direction
2: d = N/ξ    Number of sub-elements in each sub-domain
3: Perform a full 3D decomposition of the domain using ξ and d
4: s = Π_{k=1}^{3} ξ_k    Number of decomposed domains
5: r = |s − p|    Remaining sub-domains
6: if r ≠ 0 then
7:   χ = {round(√r), r/χ_1}    Remaining number of pseudo-2D sub-domains
8:   q = N/χ    Number of sub-elements in each sub-domain
9:   η = ξ_3 d_3    Starting index in the 3rd direction for the pseudo-2D decomposition
10:  Perform a pseudo-2D decomposition of the remaining domain using χ and q, starting from η
11:  Build maps between halo and real cells
12: end if
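The factorisation step of Algorithm A.1 can be sketched as follows; the pseudo-2D treatment of the remainder r and the construction of the halo maps are omitted, and the rounding choices follow the reconstructed algorithm above.

// Sketch of the near-cubic process-grid factorisation of Algorithm A.1.
#include <cmath>
#include <cstdio>

int main() {
    const int p   = 24;                                                   // number of MPI processes
    const int xi1 = static_cast<int>(std::lround(std::cbrt(p)));
    const int xi2 = static_cast<int>(std::lround(static_cast<double>(p) / (xi1 * xi1)));
    const int xi3 = static_cast<int>(std::floor(static_cast<double>(p) / (xi1 * xi2)));
    const int s   = xi1 * xi2 * xi3;
    const int r   = (s > p) ? s - p : p - s;                              // remaining sub-domains

    std::printf("process grid %d x %d x %d covers %d processes, remainder r = %d\n",
                xi1, xi2, xi3, s, r);
    // For r != 0, the remaining r processes decompose the left-over slab in a
    // pseudo-2D fashion, starting at index eta = xi3 * d3 in the third direction.
    return 0;
}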
Appendix B. Momentum error all processes

Fig. B.27. Cumulative momenta in the three directions collected in the halo regions of the cells belonging to the neighbouring processes of the remaining process ids. Note: CI → Cumulative Impulse and SI → Sum of the Impulses collected in the halo cells of the respective ids.
References
[1] D. Darmana, N.G. Deen, J.A.M. Kuipers, Parallelization of an Euler-Lagrange model using mixed domain decomposition and a mirror domain technique: application to dispersed gas liquid two-phase flow, J. Comput. Phys. 220 (1) (2006) 216-248, https://doi.org/10.1016/j.jcp.2006.05.011.
[2] P. Ouro, B. Fraga, U. Lopez-Novoa, T. Stoesser, Scalability of an Eulerian-Lagrangian large-eddy simulation solver with hybrid MPI/OpenMP parallelisation, Comput. Fluids 179 (2019) 123-136, https://doi.org/10.1016/j.compfluid.2018.10.013.
[3] A.D. Rakotonirina, A. Wachs, Grains3D, a flexible DEM approach for particles of arbitrary convex shape - Part II: parallel implementation and scalable performance, Powder Technol. 324 (2018) 18-35, https://doi.org/10.1016/j.powtec.2017.10.033.
[4] G. Pozzetti, X. Besseron, A. Rousset, B. Peters, A co-located partitions strategy for parallel CFD-DEM couplings, Adv. Powder Technol. (2018), https://doi.org/10.1016/j.apt.2018.08.025.
[5] Y. Tian, S. Zhang, P. Lin, Q. Yang, G. Yang, L. Yang, Implementing discrete element method for large-scale simulation of particles on multiple GPUs, Comput. Chem. Eng. 104 (2017) 231-240, https://doi.org/10.1016/j.compchemeng.2017.04.019.
[6] V. Spandan, V. Meschini, R. Ostilla-Mónico, D. Lohse, G. Querzoli, M.D. de Tullio, R. Verzicco, A parallel interaction potential approach coupled with the immersed boundary method for fully resolved simulations of deformable interfaces and membranes, J. Comput. Phys. 348 (2017) 567-590, https://doi.org/10.1016/j.jcp.2017.07.036.
[7] S. Wang, K. Luo, S. Yang, C. Hu, J. Fan, Parallel LES-DEM simulation of dense flows in fluidized beds, Appl. Therm. Eng. 111 (2017) 1523-1535, https://doi.org/10.1016/j.applthermaleng.2016.07.161.