
AGH University of Science and Technology in Krakow
Faculty of Computer Science, Electronics, and Telecommunications
Department of Computer Science

Ph.D. Thesis

Adrian Kłusek

Using Supermodeling in Computer Simulation of Cancer Dynamics

Supervisor: Prof. Witold Dzwinel, Ph.D.
Auxiliary Supervisor: Prof. Paweł Topa, Ph.D.

Krakow 2020


Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie
Wydział Informatyki, Elektroniki i Telekomunikacji
Katedra Informatyki

Rozprawa Doktorska

Adrian Kłusek

Wykorzystanie supermodelowania w komputerowej symulacji dynamiki nowotworu

Promotor: prof. dr hab. inż. Witold Dzwinel
Promotor pomocniczy: dr hab. inż. Paweł Topa

Kraków 2020


Copyright © 2020 Adrian Kłusek

AGH UNIVERSITY OF SCIENCE AND TECHNOLOGY DEPARTMENT OF COMPUTER SCIENCE

Declaration

I hereby declare that the work in this thesis is my own original work except where indicated in the text.

Various parts of this work were supported by:

– the National Science Center under Grant No. 2013/10/M/ST6/00531 HARMONIA: ’Multi-scale model of tumor dynamics as a key component of the system for optimal anti-cancer therapy,’

(Principal Investigator: Professor Witold Dzwinel);

– the National Science Center under Grant No. 2016/21/B/ST6/01539 OPUS: ’Fast isogeometric finite element method solver as a key component in supermodeling of cancer proliferation,’

(Principal Investigator: Professor Maciej Paszyński);

and

the Dean Projects:

– 15.11.230.394: ’Efektywne algorytmy predykcji dynamiki przykładowych systemów: dyskretnego oraz ciągłego, zrealizowane w środowisku GPU/CUDA’ (Principal Investigator: Adrian Kłusek).

August 2020


Acronyms

2D Two-Dimensional.

3D Three-Dimensional.

ABC Approximate Bayesian Computation.

ABC-SMC Approximate Bayesian Computation – Sequential Monte Carlo.

AGH AGH University of Science and Technology.

AWESUMM Adaptive Wavelet Environment for in Silico Universal Multi-scale Modeling.

BM Basement Membrane.

CA Cellular Automata.

CFD Computational Fluid Dynamics.

CPU Central Processing Unit.

CS Complex Systems.

CT Computer Tomography.

CUDA Compute Unified Device Architecture.

DA Data Assimilation.

DEVS Discrete Event System Specification.

DNA Deoxyribonucleic Acid.

DNS Direct Numerical Simulation.

DSMC Direct Simulation Monte Carlo.

ECM Extracellular Matrix.

EMS Earth Modeling Systems.

FDM Finite Difference Method.

FEM Finite Element Method.

GPU Graphics Processing Unit.

GT Ground Truth.

HIF-1α Hypoxia-inducible Factor 1-alpha.

IGA Isogeometric Analysis.


LBG Lattice Boltzmann Gas.

MD Molecular Dynamics.

MMPs Matrix Metalloproteinases.

MSc Master of Science.

MTD Mean Tumor Diameter.

NCN National Science Center.

ODE Ordinary Differential Equation.

OPAL Occam Plausibility Algorithm.

PDE Partial Differential Equation.

PI Principal Investigator.

RC Reservoir Computing.

SIMT Single-Instruction Multiple Threads.

SOC Self-Organized Criticality.

SPH Smoothed Particle Hydrodynamics.

TAF Tumor Angiogenic Factor.

UV Ultraviolet.

VEGF Vascular Endothelial Growth Factor.


USING SUPERMODELING IN COMPUTER SIMULATION OF CANCER DYNAMICS

Adrian Kłusek

Abstract

Data assimilation (DA) is a key procedure that synchronizes a computer model with real observations. However, in the case of complex and multi-scale system modeling (such as a tumor evolution simulation), the solution space can expand exponentially with the number of parameters. This results in an unacceptably large computational overhead, which currently makes the use of tumor modeling as a prognostic tool in anti-cancer therapies an unrealistic dream.

However, we believe that, by having a sufficiently fast model of cancer evolution, remission/degradation (due to anti-cancer therapy), and finally its recurrence, we could develop a prediction/correction scheme similar to that of weather/climate prediction that could be employed for supporting the decision processes in anti-cancer therapies.

This cancer model should not only be quite computationally efficient but also sufficiently accurate in simulating the main mesoscopic (observable) phenomena of cancer evolution. On the other hand, it should allow for a fast data-assimilation process, so that the model can smoothly follow the incoming real diagnostic data. To this end, we first developed (1) a 3D numerical heterogeneous model of cancer evolution that describes the main processes of its proliferation and degradation, (2) the boundary and initial conditions for modeling melanoma cancer, (3) a CPU implementation of the model by using a few numerical engines (based on both the finite difference [FDM] and finite element [FEM] methods), and (4) GPU implementations of the tumor models for these solvers.

Moreover, we devised a novel and faster data-assimilation scheme that can overcome the curse of dimensionality and model overparameterization. To this end, we defined the supermodeling of a tumor as a kind of ensembling scheme that consists of a few sub-models representing various instances of a baseline cancer model. The sub-model parameters are fixed and selected with respect to expert knowledge. The sub-models are synchronized through the single most sensitive dynamical variable of the tumor model. Therefore, assuming that the supermodel consists of only a few coupled sub-models, the number of parameters in a baseline tumor model can greatly outnumber the number of coupling factors in the supermodel (in our case, 34 and 6, respectively). This makes the supermodel a second level of abstraction of a cancer model, with a small number of latent parameters (the coupling factors of the sub-models) that can be matched to real observations in a relatively short time. In this thesis, we demonstrate that the supermodeling of cancer can be an interesting modeling paradigm that can be applied in the future both in predictive oncology and as a generic data-assimilation meta-procedure.


WYKORZYSTANIE SUPERMODELOWANIA W KOMPUTEROWEJ SYMULACJI DYNAMIKI NOWOTWORU

Adrian Kłusek

Streszczenie

Asymilacja danych (DA) to kluczowa procedura synchronizująca model komputerowy z rzeczywistymi obserwacjami. Jednak w przypadku modelowania złożonych i wieloskalowych systemów, takich jak symulacja ewolucji nowotworu, ogromna liczba parametrów może uniemożliwić ich dopasowanie do danych w realistycznym czasie, potrzebnym do określenia metody leczenia i oceny jej rokowań. Pojemna, rosnąca wykładniczo przestrzeń parametrów powoduje niedopuszczalnie duży narzut obliczeniowy, który sprawia, że obecnie używanie modelowania dynamiki nowotworów jako narzędzia prognostycznego w terapiach przeciwnowotworowych jest nierealne. Uważamy jednak, że mając szybki obliczeniowo model ewolucji raka, jego zaniku (z powodu leczenia lekami/radioterapią) i wreszcie jego nawrotu, możemy opracować schemat typu prognoza/korekta, podobny do tego wykorzystywanego w przewidywaniu pogody/klimatu, który można zastosować w celu wspierania procesów decyzyjnych w terapiach przeciwnowotworowych w onkologii predykcyjnej. Ten model raka powinien być nie tylko bardzo wydajny obliczeniowo, ale wystarczająco dokładny, aby umożliwić symulację głównych mezoskopowych (obserwowalnych) procesów towarzyszących ewolucji raka. Z drugiej strony powinien pozwolić na szybką adaptację danych do modelu, która powinna umożliwić z kolei płynne dostosowanie modelu do kolejnych obserwacji. W tym celu najpierw opracowaliśmy: (1) numeryczny, heterogeniczny model ewolucji raka w 3D, który opisuje główne procesy jego wzrostu i zaniku, (2) warunki brzegowe i początkowe dla modelowania czerniaka złośliwego, (3) implementację modelu raka za pomocą kilku silników numerycznych (FDM i FEM) na procesorach wielowątkowych, (4) implementacje modeli nowotworów dla tych solwerów w środowisku procesora graficznego (GPU). Ponadto opracowaliśmy nowatorski i szybki schemat asymilacji danych, który może rozwiązać problem przekleństwa wymiarowości w złożonym modelu raka opisywanym wieloma parametrami. W tym celu definiujemy supermodel guza jako rodzaj schematu łączenia podmodeli bazowych, który składa się z kilku różnie sparametryzowanych podmodeli. Parametry są ustalone w podmodelach na podstawie wiedzy eksperckiej. Podmodele są zsynchronizowane poprzez tylko jedną, najbardziej wrażliwą zmienną dynamiczną modelu nowotworu i nawzajem ze sobą przez tą zmienną sprzężone. Dlatego, zakładając, że supermodel to kilka połączonych podmodeli, liczba parametrów w podstawowym modelu guza może znacznie przewyższyć liczbę czynników sprzęgania w supermodelu (w naszym przypadku, dla trzech podmodeli, ten stosunek to odpowiednio 34:6). Można zatem potraktować supermodel jako drugi poziom abstrakcji podstawowego modelu raka z małą liczbą ukrytych parametrów (tj. współczynników sprzęgania podmodeli), które można w stosunkowo krótkim czasie dopasować do rzeczywistych obserwacji. W tej pracy pokazuję, że supermodeling raka może być interesującym paradygmatem modelowania, który można zastosować w przyszłości w onkologii predykcyjnej. Co więcej, może stanowić metaprocedurę w procesie asymilacji danych do każdego złożonego i wieloparametrycznego modelu komputerowego.


Acknowledgements

My deep gratitude goes first to my supervisor, Professor Witold Dzwinel, who expertly guided me through my MSc and PhD studies and who shared the excitement of more than six years of discovery. His enthusiasm kept me constantly engaged with my research. I would like to thank my auxiliary supervisor, Professor Paweł Topa, for the close cooperation and precious advice and opinions about this thesis. I cannot forget about my friends who contributed in various ways to my research: Gregory Duane (University of Bergen), Keshav Pingali (University of Texas at Austin), Oleg V. Vasilyev (University of Colorado at Boulder), Maciej Paszyński, Marcin Kurdziel, Aleksander Byrski, Wojciech Turek, and Marcin Łoś (all from the Department of Computer Science, AGH).

Finally, I thank my fiancée, Anna, who was supportive like no one else in the moments when I wanted to give up and who always motivated me to finish what I started. Above all, I am indebted to my family, whose value to me only grows with age.


Contents

Acronyms

1. Introduction
   1.1. Motivation
   1.2. Dissertation theses and main goals of research
   1.3. Structure of dissertation

2. Supermodels in simulation of complex systems
   2.1. Methods and limits of computer modeling
   2.2. Data assimilation
        2.2.1. Supermodeling — mathematical formulation
        2.2.2. Supermodeling — proof of concept
   2.3. Idea of supermodel application in data assimilation

3. Prognostic system for personalized anti-cancer therapy
   3.1. Biology of melanoma cancer
   3.2. Cancer modeling
   3.3. Concept of prognostic system for personalized anti-tumor therapy

4. Efficient model of cancer
   4.1. Heterogeneous single-phase cancer model
        4.1.1. Mathematical model
        4.1.2. Skin tissue and vascular network
        4.1.3. Vasculature remodeling
        4.1.4. Initial and boundary conditions
   4.2. Efficient FDM-based computer CPU and GPU implementations of cancer model
        4.2.1. Implementation
        4.2.2. Results and timings
   4.3. Isogeometric GPU solver for simulation of cancer dynamics
        4.3.1. Description of method
        4.3.2. Implementation
        4.3.3. Results
   4.4. Sensitivity analysis

5. Supermodel of cancer – results and discussion
   5.1. Supermodel of glioma cancer under treatment in 0D
   5.2. Supermodel of solid cancer evolution in 3D
   5.3. Prediction of cancer dynamics in 3D after treatment by using supermodeling
   5.4. Supermodel of melanoma dynamics in 3D

6. Conclusions and future work


1. Introduction

1.1. Motivation

Data assimilation is a key component of computer simulation that synchronizes a model with a real system based on limited observations. However, the classical data-assimilation procedures are of limited use due to the exponential expansion of the solution space with the number of parameters. In particular, for multi-scale computer models of complex systems such as climate models, biological systems, and collective behavior models, it is necessary to match the values of many parameters connected to various spatio-temporal scales of the modeled phenomenon to the real data (the ground truth – GT). The large number of model parameters causes a problem called the "curse of dimensionality", which usually results in prohibitively large inaccuracies in any prediction.

Because the computational time of a single simulation of a complex 3D model can be substantial, the prospects of obtaining a satisfactory solution to this very hard, ill-posed inverse problem deteriorate further. This is one of the important reasons that the direct application of computer simulation to the prediction of cancer dynamics is still underestimated by clinicians.

Consequently, the clinical application of computer simulation for predicting tumor dynamics in personalized anti-cancer therapies is still in its infancy despite the 50-year history of cancer modeling. In my opinion, this poor development is caused mainly by the inappropriate conceptual approaches represented by the state-of-the-art research in tumor modeling; e.g., the lack of both proper data-assimilation approaches and tumor models that are accurate, reliable, and computationally realistic. In this dissertation, we propose a novel idea for a prognostic system for personalized anti-tumor therapy. We present its computer implementation and discuss the experiments.

The system is based on the recent concept of supermodeling, which is employed in climate/weather modeling. In this thesis, we demonstrate that supermodeling can be treated as a new data-assimilation paradigm that can help make the prediction of tumor dynamics on the basis of computer modeling possible in the near future.

1.2. Dissertation theses and main goals of research

The supermodeling paradigm, where the "supermodel" consists of a few coupled instances of imperfect baseline models (sub-models), was first used by climatologists for the better approximation and tuning of weather forecasting and climate modeling [1]. Here, we demonstrate that a supermodel can be treated as a general paradigm for the modeling and simulation of complex multiple-parameter systems and as an efficient metaprocedure for data assimilation. The problem of tumor dynamics modeling is a non-trivial example of its application.

The generic thesis of this dissertation is as follows:

Supermodeling, in which different "sub-models" partially synchronize with one another, is the next abstraction layer of the data-assimilation procedure, which allows for a radical increase in its performance in terms of computational time and the number of learning parameters.


Here is a more specific thesis:

Supermodeling allows for efficient data assimilation to a multiple-parameter complex cancer model, which makes the prediction of its dynamics (i.e., growth, remission, and recurrence) possible in a realistic amount of computational time.

To be more precise, "a radical increase in performance" means a several-times reduction in the number of parameters that must be adapted, while "a realistic amount of computational time" means no more than a few hours of wall clock time on a modern-day mid-range server.

To support these theses, two main goals must be achieved:

1) The development and implementation of an ultra-fast yet still complex 3D model of cancer (i.e., incorporating the main phenomena of cancer growth/remission/recurrence as well as realistic boundary and initial conditions);

2) The design, development, and implementation of a novel efficient data-assimilation strategy based on the supermodel.

This dissertation is a substantial extension of the master’s thesis defended by Adrian Kłusek, entitled Development of the supermodel of tumor proliferation, AGH University of Science and Technology (defended in June 2016) [2]. The following extensions are reported here:

1. We extended and generalized a heterogeneous (discrete-continuous) tumor model and its layout to enable the simulation of not only its growth but also the cancer’s remission and recurrence due to anticancer therapy [3].

2. We radically improved the implementation of the classical numerical model of a tumor (finite difference method – FDM) [4]. The model was parallelized and tuned to a multi-GPU computer architecture [3, 5]. In this way, we obtained a generic and extremely efficient computer model of tumor dynamics that can be matched to a variety of solid tumor types. The experiments with melanoma dynamics described in the scope of the MSc thesis [6–8] were performed for much larger tumors with dense vascularization.

3. We tested a competitive modern numerical solver (the finite element method [9]) for tumor evolution. The author actively participated both in matching the tumor model to the isogeometric solver and in overseeing its parallel (GPU) implementation.

4. The author developed, implemented, tested, and visualized the data-assimilation (DA) strategy based on the supermodel.

This dissertation was realized in the scope of two NCN (National Science Center) research projects:

1. Harmonia call: Project No. 2013/10/M/ST6/00531 (2013-2016), Multi-scale model of tumor dynamics as a key component of the system for optimal anti-cancer therapy (PI: Professor Witold Dzwinel [AGH], Professor Oleg Vasilyev [University of Colorado Boulder])

2. Opus call: Project No. 2016/21/B/ST6/01539 (2016-2019), Fast isogeometric FEM solver as a key component in supermodeling of cancer proliferation (PI: Professor Maciej Paszyński)

For dissertation consistency, we also provide a brief presentation of the results obtained by the research groups that realized the grants above. Our contribution to the reported publications is clearly explained. The thesis presents the author’s contribution in the domain of computer science, recognized by the Polish National Science Center as "Scientific computing, simulation, and modeling tools."


1.3. Structure of dissertation

In the second chapter of this dissertation, we define the supermodel paradigm in the context of its former applications in climate modeling. We present its informal explanation, two mathematical formulations, and the state-of-the-art training methods. As a proof of concept, we present our simulation results obtained for the toy models known as Lorenz63 [10] and perturbed Lorenz63, where the sub-models differ in the values of the parameters. We discuss the results of attractor and trajectory learning for various numbers of sub-models and sizes of the training data. We conclude that (1) the number of sub-models cannot be large (just a few), (2) the supermodel can give surprisingly good results for trajectory learning (not only attractor learning, as was considered earlier), and (3) one should be very careful in selecting the training data size and the sub-model parameters due to under- and over-fitting problems and local minima issues in the loss function minimization. We also discuss the difference between complex multi-scale models and the supermodel.

In the third chapter, we familiarize the reader with the basic knowledge of cancer biology. This may help the reader understand the complex problems connected to cancer modeling. In the subsequent section, we briefly present the various cancer-modeling approaches. Consequently, we present the prediction/correction concept of a prognostic system for personalized anti-tumor therapy, and we discuss the role and necessity of developing a computationally efficient cancer model and data-assimilation scheme.

Next, we demonstrate our mathematical 3D single-phase cancer models and the model of their vascularization (developed earlier) as well as its extensions, which allow for (1) the simulation of larger tumors and (2) a tumor’s dynamics in the presence of anti-cancer drugs. We describe our classical FDM solver and the details of its implementation in a multi-GPU environment. We present timings and snapshots from the simulation of cancer dynamics, including a melanoma tumor. We conclude that the model is sufficiently complex and fast enough to be used as a baseline model for sub-model and supermodel construction. We compare this model to other state-of-the-art solvers, and we show the potential of its further speed-up by using an isogeometric FEM approach. Then, we describe the results obtained by using an isogeometric model. We also discuss an important factor of our data-assimilation strategy; namely, a sensitivity analysis of the tumor model. This procedure allows us to find the most sensitive dynamical variables that can be used for sub-model coupling, thus decreasing the number of supermodel parameters.

Unlike the original application of the supermodel in climatology (which is used to increase climate forecast accuracy by the tight coupling of a few very complex heterogeneous climate models), we propose to employ the supermodel in a much broader context in the fifth chapter. Herein, we formulate the hypothesis that a loosely coupled supermodel can be used as a second abstraction layer in the data-assimilation procedure. The sub-models can represent either arbitrarily selected baseline model instances or various local minima of the loss function (i.e., the error between the ground truth and the model predictions) in the parameter space; e.g., found during the initial phases of sub-model generation (using a Bayesian data-assimilation algorithm, for example). We show that they can be coupled only through the most sensitive dynamical variable, which substantially decreases the number of free parameters that should be assimilated to the ground truth.

Unlike in a classical data-assimilation process, where all of the model parameters are matched to the real (ground truth) data (e.g., by using Bayesian-, variational-, or Kalman filter-based algorithms [11–13]), we demonstrate that the training procedure is used for matching only a few coupling factors in the supermodel, which can be treated as latent model parameters. We discuss the advantages as well as the problems we encountered when using this concept for data assimilation, both for toy dynamical systems and complex multi-scale models.

In support of this hypothesis, we present the preliminary results of a supermodel of melanoma growth (the results obtained earlier in [5]). We also simulate cancer growth/remission/recurrence that mimics the "in vitro" conditions in which we can observe its dynamics in isolated tissue.

The conclusions and future work are discussed in the last chapter.


2. Supermodels in simulation of complex systems

2.1. Methods and limits of computer modeling

Computer modeling and simulation

Computer modeling and simulation is an interdisciplinary branch of science, which was one of the principal stimulating factors of the development of computer science as a new discipline [14]. In general, it is the third research paradigm after experiment and theory (data-intensive science is the fourth one) [15–17]. Although there are many ways of understanding the terms "model" and "simulation" (e.g., they can have very different meanings in medicine and physics), we herein concentrate on the methods and tools whose role is to explain and predict a baseline process evolution on the basis of rules extracted from systematized knowledge. This type of modeling is vastly different from so-called surrogate models [16], which are mainly data-driven; their role is the fast approximation of the considered process and the prediction of its future dynamics (e.g., in the case of solving inverse problems) without an explanatory factor. These two types of modeling paradigms can be used separately. However, modern explanatory knowledge-based models of real systems must cooperate with black-box data-driven paradigms for the purpose of data assimilation and for supplementing time-consuming numerical procedures (e.g., mesh generation).

Below, we present a slightly extended picture (as compared to its original version; see Figure 2.1) from the paper written by the supervisor of this thesis [18], which displays the taxonomy of the major simulation paradigms and methods. Despite the fact that the above-mentioned paper was written 21 years ago and a plethora of new methods and algorithms have since been devised, the main paradigms remain the same. The gap between discrete models and continuous ones was filled at the end of the 90s by the dissipative particle dynamics method [19, 20] and its clones [17] as well as the specialized lattice Boltzmann gas (LBG) paradigm [21].

To address the issue of modeling physical phenomena at various spatio-temporal scales, the scheme of coarse graining of a particle method (such as molecular dynamics) [22] was introduced. In the new methods that encompass larger and larger spatio-temporal scales, the definition of a particle changes with the scale: elementary particle; atom; molecule; collection of molecules; clump of fluid; droplet; individual (crowd dynamics [23]); astronomical object. Due to the coarse graining of the interacting objects, the interactions between particles also evolved from elementary physical interactions to heuristic pairwise forces. Simultaneously, the numerical schemes used for solving the Newtonian equations were greatly simplified, allowing us to increase the number of particles and, simultaneously, extend the volume of a simulated piece of matter [22, 24]. In the extreme case, both the time-step and the translation of the particles are normalized to {0,1}, and the possible movements of particles after a collision are limited to a few possible directions on the lattice (lattice gas methods [21]). In other approximate methods, both the collisions and particle translations are randomized (the direct simulation Monte Carlo method – DSMC [25]). Popularized by Stephen Wolfram’s famous book "A New Kind of Science" [26], the cellular automata (CA) paradigm of computer simulation has become widely used for simulating a variety of real-world phenomena on a square lattice (2D and 3D) by means of simple local automata rules.


Figure 2.1: Modeling paradigms in various spatio-temporal scales (on the basis of [18])

As shown in [27], the CA paradigm can be efficiently hybridized with the dynamic particle method (the particle automata method – PAM), allowing for the simulation of multi-scale phenomena such as tumor evolution [28].

On the other hand, continuous methods are based on solving partial differential equations (PDEs) from computational physics, such as diffusion-reaction equations and the Navier-Stokes or Maxwell equations. These are used to calculate the continuous density field(s) of a particular physical quantity on a discrete mesh. Because this is a huge discipline that comprises fluid dynamics, computational mechanics, and many other engineering fields, we will not present its detailed taxonomy here [29]. In the course of this thesis, we will use a few numerical approaches to simulate the evolution of cancer based on two types of numerical solvers: the finite difference and finite element methods (FDM and FEM, respectively). To avoid numerical problems with complex boundary conditions (multi-phase flows, flows through granular media, the simulation of suspensions, breaking, cracking, crushing, etc.) and to enable the simulation of interactions between the fluid and solid phases, the smoothed particle hydrodynamics (SPH) method was devised [30]. Moreover, to focus attention on the interface between the various phases, SPH can be combined [31] with the powerful level-set method [32, 33].

Apart from the discrete and continuous paradigms, the complex nature of the simulated phenomena and intricate boundary conditions enforce the development of heterogeneous continuous-discrete simulation methods. For example, an expanding tumor exerts pressure on the adjoining tissue (both can be represented as Cahn-Hilliard fluids [34]), simultaneously remodeling the discrete structure of the tissue vascularization network. Typical hybrid models use the mean-field particle approximation (e.g., the particle-in-cell algorithm [35]), the fluidization of discrete spheres in direct numerical simulation (DNS) [36], and many other specific algorithms that couple density fields with discrete objects [37]. In the course of this thesis, we will present our approach to this issue by modeling the interactions of density fields (e.g., the densities of tumor cells) with the discrete structure of a vascular network.


Apart from the modeling algorithms, methods, and simulation paradigms, many modern modeling and simulation meta-paradigms exist, such as the following:

a) DEVS: Discrete Event System Specification is a timed-event system. It is a modular and hierarchical formalism for modeling and analyzing general discrete event systems. Such systems can be described by state transition tables, continuous formalisms (such as differential equations), and hybrid continuous-state and discrete-event systems [38];

b) Bond graphs: graphical representations of physical dynamic systems. These allow for the conversion of a system into a state-space representation. A bond graph is similar to a block diagram or signal-flow graph, with the major difference being that the arcs in bond graphs represent the bi-directional exchange of physical energy, while those in block diagrams and signal-flow graphs represent a uni-directional flow of information [39].

Other simulation tools include modern modeling languages such as Modelica as well as plenty of multipurpose (graphical, textual, specific) and domain-specific modeling languages [40].

Limits

According to the physical form of the Church–Turing thesis formulated by David Deutsch in 1985 [41], a universal virtual reality generator (computing device) can simulate every physical process. However, some processes can be undecidable and computationally irreducible, which means that their outcome cannot be obtained by any shortcut to running the process itself. Thus, the limits of modeling and simulation are determined by (1) the quality of the virtual reality generator and (2) the degree of irreducibility of the simulated problem [42].

This means that the former represents a technological limit imposed by the power of current computer systems; i.e., computational speed, memory, and arithmetic precision. These in turn decide the spatio-temporal size, complexity, and accuracy of the simulated system. In particular, the technological limit determines the maximal number of simulated particles and the number of finite elements for discrete and continuous simulation methods, respectively. High-performance simulations in computational molecular biology (protein functions) and weather/climate prediction require top computational power to simulate billions of atoms and finite elements [43]. However, even the largest molecular dynamics simulation, employing trillions of particles (i.e., $10^{12}$), is able to simulate a cube of matter with 5-micrometer-long edges. Similarly, the maximal resolution of the grid in a global weather simulation is about $1\,\mathrm{km}^2$ (now even 100 meters for the most modern Earth-modeling systems [EMS]) with an Earth surface of about 0.5 billion $\mathrm{km}^2$ [44]. Of course, we can still use the approximations discussed in the previous section (coarse graining) or tricks such as using surrogate models [45]. In this way, we can increase the size of the modeled objects or control the spatial and temporal resolutions in dedicated areas and moments of time, respectively. For these simulations, we can use a supercomputer [46] as well as specially designed FPGAs [47].

The second limit is formal (yet more demotivating). This means that the final state of a given complex phenomenon cannot be formally decided before the end of the calculations (the "halting" problem). The first reason is the chaotic behavior of physical systems, which can be found even in toy models (such as logistic maps) [48]. A plethora of physical systems exist whose behavior strongly depends on the initial conditions. A small fluctuation in the initial conditions can radically change the system dynamics. In the scope of this thesis, we try to predict the time evolution of simple dynamical systems such as the Lorenz63 system. It appears that we are able to predict both their local behavior (in terms of trajectory; e.g., weather prediction) and global behavior (in terms of the system's strange attractor; e.g., climate dynamics). However, we are not able to follow the system trajectory exactly over an arbitrarily long time interval.
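To make this sensitivity concrete, the minimal Python sketch below (an illustration only; the RK4 step size, the integration horizon, and the 1e-8 perturbation are arbitrary choices, not values used anywhere in this thesis) integrates two Lorenz63 trajectories that differ only by a tiny perturbation of the initial condition and prints how quickly they separate.

```python
import numpy as np

def lorenz63(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    # Right-hand side of the classical Lorenz63 system.
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def integrate(state, dt=0.01, steps=3000):
    # Fixed-step RK4 integration; returns the whole trajectory as an array.
    traj = [state]
    for _ in range(steps):
        s = traj[-1]
        k1 = lorenz63(s)
        k2 = lorenz63(s + 0.5 * dt * k1)
        k3 = lorenz63(s + 0.5 * dt * k2)
        k4 = lorenz63(s + dt * k3)
        traj.append(s + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4))
    return np.array(traj)

a = integrate(np.array([1.0, 1.0, 1.0]))
b = integrate(np.array([1.0, 1.0, 1.0 + 1e-8]))   # tiny perturbation of z(0)
for k in (500, 1000, 2000, 3000):
    print(f"step {k}: separation = {np.linalg.norm(a[k] - b[k]):.3e}")
```

After a few thousand steps the two trajectories are macroscopically different, even though they started essentially at the same point.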

Moreover, dynamical systems can be intrinsically complex. A system becomes complex if it consists of multiple objects and its behavior emerges from their mutual interactions. This emergent behavior cannot be deduced from a single-object description. However, a dynamical system's structure can be even more complicated. The term


"system of complex systems" relates to complex systems (CS) that are constructed of mutually interacting CSs in multiple spatio-temporal scales. Notwithstanding, complex systems in various scales can be totally different. For example, the blood flow in large blood arteries can be simulated by fluid dynamic equations, where the blood is treated as a homogeneous (non-Newtonian) liquid [49, 50], while the blood in micro-capillaries can be represented as a multi-phase fluid with scattered discrete elements (red blood cells, white blood cells, fibrinogens, etc.). It may happen that, in order to explain a phenotypic response in a tissue, the phenomena on a genotype molecular level or protein level should be modeled. This type of heterogeneity is connected with multiphysics [51]; i.e., simulations that involve multiple physical models or multiple simultaneous physical phenomena. The computational models and simulations that encompass many spatio-temporal scales are called multi-scale models [52, 53]. At least three important factors make multi-scaling extremely difficult:

a) Time-stepping is defined by the fastest modes connected to the "most" microscopic scales. This considerably influences the efficiency of the entire model;

b) Multi-scale models are usually overparameterized, which makes data assimilation difficult and the model susceptible to overfitting;

c) Models from various scales must be glued together (scale bridging [53, 54]), which is usually a very time-consuming and unreliable procedure.

If fluctuations in one of the neighboring scales propagate through the following scales with the time-stepping defined for each of them, multi-scale models can reflect reality well and (in the case of huge available computing power) can have a high explanatory value. However, in the case of self-organized critical phenomena [27], where a microscopic event can have a fast and direct impact on the higher spatio-temporal scales, multi-scale modeling might not help. In self-organized criticality (SOC), the modeled system tunes itself without any external parameter tuning. Such an SOC system may generate even better results without any additional schemes. For example, SOC may improve graph coloring [55], which is one of the well-known computational optimization problems [56]. The authors of [55] show that SOC reduces the chance of getting stuck in a local minimum by finding effective patterns in the random search used in the optimization.

A very important limit of modeling and simulation, resulting from those discussed above, is the inverse problem of matching the model to reality through the data-assimilation procedure. We discuss this issue in the following sections.

2.2. Data assimilation

The classical data-assimilation (DA) procedure, which synchronizes a computer model with a real phenomenon through a set of observations, is an ill-posed inverse problem that suffers from the curse of dimensionality when used to estimate the parameters of a complex model. Consequently, the time complexity of DA methods grows exponentially with the number of parameters. This makes them useless in the face of multi-scale and complex models such as models of tumor evolution and of climate and weather dynamics (e.g., [3, 57]).

Popular DA procedures are formulated on the basis of variational and Bayesian frameworks [58]. Data-assimilation algorithms can be divided into two main groups: sequential Monte Carlo-based methods (e.g., [59]) and Kalman filter-based methods (e.g., [60]) (Kalman filtering is the equivalent of the popular 4D-Var algorithm [61] for a perfect model). These algorithms have formed the core of many other DA procedures (e.g., [62]).

Over the years, the majority of research in this direction has been primarily focused on improving the prediction accuracy of tasks ranging from small-scale problems (e.g., [63]) to weather prediction [64]. Recently, more and


more studies have attempted to speed up data-assimilation methods to enable their use with extremely complex multi-scale models (e.g., [3, 57, 65]). For Kalman filter-based data assimilation, studies such as [3] have proposed faster implementations of the algorithms by using numerical or programming tricks and hybridization with Bayesian and Monte Carlo approaches (e.g., [66]). Nevertheless, the greatest issue that arises with sequential Monte Carlo-based DA methods (e.g., the approximate Bayesian computation with sequential Monte Carlo [ABC-SMC] algorithm [67]) is the requirement that a very large number of simulations be performed. Additionally, the number of required simulations increases exponentially with the number of model parameters (see, e.g., [68]).
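To illustrate why the simulation count explodes, the sketch below implements plain ABC rejection sampling under purely hypothetical assumptions: the forward model `simulate`, the uniform prior box, the tolerance `eps`, and the target sample size are illustrative placeholders rather than anything used in this thesis, and a real ABC-SMC scheme adds tolerance schedules and importance weights on top of this basic loop.

```python
import numpy as np

rng = np.random.default_rng(0)

def abc_rejection(simulate, observed, prior_low, prior_high, eps, n_accept):
    """Plain ABC rejection: draw parameters from a uniform prior box,
    run the forward model, and keep draws whose output lies within eps
    of the data. The number of forward runs grows rapidly with the
    dimension of theta and with decreasing eps."""
    accepted, n_runs = [], 0
    dim = len(prior_low)
    while len(accepted) < n_accept:
        theta = rng.uniform(prior_low, prior_high, size=dim)
        n_runs += 1
        if np.linalg.norm(simulate(theta) - observed) < eps:
            accepted.append(theta)
    return np.array(accepted), n_runs

# Toy forward model (hypothetical): a cheap nonlinear map of three parameters.
simulate = lambda th: np.array([th[0] ** 2 + th[1], th[1] * th[2]])
observed = simulate(np.array([0.5, -0.2, 1.0]))
post, runs = abc_rejection(simulate, observed, [-1, -1, -1], [1, 1, 1],
                           eps=0.05, n_accept=100)
print(f"accepted {len(post)} samples after {runs} forward simulations")
```

Even for this three-parameter toy, tens of thousands of forward runs are typically needed; with a 3D tumor model costing minutes or hours per run, such a loop is clearly infeasible.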

To improve the classical DA schemes, current studies usually introduce either small algorithmic nuances (e.g., [69, 70]) or algorithm implementations that support multi-threaded and multiprocessor massive parallelization (e.g., [65]). However, the aforementioned optimization approaches neither change the basic paradigms nor radically improve DA performance.

In the era of deep learning, formal predictive models are often replaced (or supplemented) with faster data models, for which the role of data assimilation is played by training a black box (e.g., a neural network) to estimate the parameters. In general, training a black box is a simpler procedure than assimilating data to a formal model. An interesting data-modeling concept that is very competitive with formal models in the prediction of spatio-temporal patterns in chaotic systems is that of echo state networks [71, 72]; particularly, the reservoir computing (RC) approach [73]. No prior model based on physics or any other knowledge is used. The combination of this surrogate model with formal approaches is tempting, but it remains an unresolved issue. As mentioned at the beginning of this section, one of the biggest problems of DA is the high computational complexity that arises from the exponential growth of the parameter space. In the case of multi-scale and multi-parameter models, this complexity grows much faster due to the complex shape of the loss function and the size of the population of solutions. Thus, the only direction that gives the prospect of improvement is model ensembling.

Consequently, we can have a few inaccurate but still very complex models (we can call them sub-models) of the same phenomenon. They can be heterogeneous (as they are in climatology [1]), or they can represent various instances of the same model (e.g., differing in parameter values and initial or boundary conditions). In the latter case, the difference in the parameter values may come from the model's formal inaccuracies – uncertainties connected with the factors that represent the fastest spatio-temporal scales or the unknown phenomena that the model lacks. The decision fusion of the baseline model responses (employing the majority voting, average, or weighted-average formulas, for example) is taken as the response of the model ensemble. Usually, this is more accurate than a single model's outcome. The sub-models act independently, and the ensemble response depends on the fusion rule.

In contrast to regular ensembling, a supermodel represents a very different approach. The sub-models of a supermodel actively interact with one another during the entire simulation process. The behavior of the respective dynamic variables of the sub-models partly synchronizes step by step. Simultaneously, the synchronized model is attracted to the ground truth (data). The principal difference between a model ensemble and a supermodel is that the sub-models of a supermodel are trying to become as similar to each other as possible due to the process of synchronization. In a model ensemble, the sub-models are independent; in the case of their divergence, their averaging can produce large artifacts. Therefore, a model ensemble may require more sub-models than a supermodel (as well as an additional procedure for their control). More formal arguments for the supermodel can be found in [74]. In the following section, we present the basics of a supermodel as well as two toy examples of its application.


2.2.1. Supermodeling — mathematical formulation

The supermodel technique is an idea that was first investigated in the SUMO FET European Project¹, which aimed to improve the quality of climate prediction [7, 10, 75, 76]. A supermodel is an interconnected ensemble of models that combines the strengths of individual models to create a model that outperforms all of them. This is possible because, by connecting individual models, they can synchronize or partially synchronize; through optimal connections, the systematic errors of the individual models partly cancel each other out as the models evolve. The connections among the models are "learned" from observations so as to combine the strengths of the individual models. Crucially, a supermodel combines different models as the simulation or prediction evolves. Thus, it is unique and markedly different from the standard approach of running models separately and post-processing their output.

The supermodel concept is schematically explained in Figure 2.2. As shown in [7], the couplings between the sub-models can radically change the position of the sub-model attractors in the phase space. The final attractor can be located in another domain of the phase space that represents completely different behavior than the sub-model attractors show.

Figure 2.2: Scheme explaining the supermodel concept. In a phase space divided into three separate volumes (I, II, III) representing various system behaviors, one can observe two trajectories (1, 2); these correspond to two independent models with different parameter sets. After model coupling, the two trajectories synchronize with each other and start to penetrate another phase space domain, preferred by the ground truth data. Finally, the supermodel converges to a different attractor than Trajectories 1 and 2 [7].

From a mathematical point of view, supermodeling is defined as follows. First, we assume that we have a measured reality, which we treat as a ground truth (GT). This GT information can be represented by a data set of sampled measurements $\mathbf{x}_{gt}$ (a time series of finite length). Furthermore, we employ a set of $M$ imperfect models

¹ http://projects.knmi.nl/sumo/


(called sub-models) labeled $\mu$, which are previously pre-trained or just hypothesized to mimic reality as best they can. Below, we present the definitions of both the model ensemble and the supermodel, adapted to the same notation².

The model ensemble represents a decoupled dynamical system whose dynamics are governed by a set of ordinary/partial (parabolic) differential equations:

$$\frac{dx^i_\mu}{dt} = f^i_\mu(\mathbf{x}_\mu), \qquad (2.1)$$

where $\mathbf{x}_\mu = (x^1_\mu, \ldots, x^{D_\mu}_\mu)$ is a $D_\mu$-dimensional state vector of sub-model $\mu$, and $f$ represents the transformation.

As the model ensemble, we define a model combined from differently parameterized instances of the previously introduced $M$ basic models $\mu$ (for example). In general, the sub-models can also be completely heterogeneous. For parabolic PDEs, the left-hand side of Eq. 2.1 can be replaced with the partial time derivative. The model ensemble is defined as follows:

$$\frac{d\mathbf{x}_{sumo}}{dt} = F_{sumo}(\mathbf{x}_{sumo}, W_{sumo}), \qquad (2.2)$$

where $W_{sumo}$ are the ensemble parameters, $\mathbf{x}_{sumo}$ denotes the entire state of the ensemble model, and $F_{sumo}$ is the transformation function.

In the case of the supermodel idea, the sub-models are no longer independent [77]. Here, we consider the more general "connected supermodel" approach demonstrated in Figure 2.2. Similar to the model ensemble, we assume that the supermodel consists of the $M$ sub-models defined by Eq. 2.1; however, we introduce couplings between them as follows:

$$\frac{dx^i_\mu}{dt} = f^i_\mu(\mathbf{x}_\mu) + \sum_{\nu} C^i_{\mu\nu}\,(x^i_\nu - x^i_\mu), \qquad (2.3)$$

where $C = \{C^i_{\mu\nu}\}$ is the coupling tensor. The supermodel behavior is calculated as the average of the sub-models:

$$\mathbf{x}_{sumo}(C, t) \equiv \frac{1}{M} \sum_{\mu} \mathbf{x}_\mu(C, t). \qquad (2.4)$$
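A minimal numerical sketch of Eqs. 2.3-2.4 is given below. It assumes forward Euler time-stepping, generic sub-model right-hand sides `f[mu]`, and an already given coupling tensor `C` of shape (D, M, M); these are illustrative choices, not the solvers used later in this thesis.

```python
import numpy as np

def supermodel_step(x, f, C, dt):
    """One forward-Euler step of the connected supermodel (Eq. 2.3).

    x : array (M, D)     -- current states of the M sub-models
    f : list of M callables, each mapping a D-vector to a D-vector
    C : array (D, M, M)  -- coupling tensor C^i_{mu nu} (assumed given/trained)
    """
    M, D = x.shape
    x_new = np.empty_like(x)
    for mu in range(M):
        drift = f[mu](x[mu])
        # sum_nu C^i_{mu nu} * (x^i_nu - x^i_mu), computed component-wise in i
        coupling = np.array([np.dot(C[i, mu], x[:, i] - x[mu, i]) for i in range(D)])
        x_new[mu] = x[mu] + dt * (drift + coupling)
    return x_new

def supermodel_output(x):
    """Supermodel state as the sub-model average (Eq. 2.4)."""
    return x.mean(axis=0)
```

With positive couplings, each step pulls the sub-models toward one another, which is what produces the partial synchronization discussed above.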

Additionally, we define the synchronization quality as follows:

$$e^i(t) = \frac{1}{l_p} \sum_{(\mu,\nu)} \frac{1}{N} \sum_{k=0}^{N-1} \left[ x^i_\mu(k\Delta t) - x^i_\nu(k\Delta t) \right]^2, \qquad (2.5)$$

where $N$ is the number of samples that discretize the trajectory, $l_p$ is the number of couplings between the sub-models, and $\Delta t$ is the discretization interval. As shown in [7, 75], the $C$ tensor coefficients can be trained by using the "ground truth" vector $\mathbf{x}_{gt}$, minimizing the weighted squared error $E(C)$ over $K$ consecutive time-steps $\Delta t$; i.e.,

$$E(C) = \frac{1}{K\Delta t} \sum_{i=1}^{K} \int_{t_i}^{t_i+\Delta t} \left| \mathbf{x}_{sumo}(C, t) - \mathbf{x}_{gt}(t) \right|^2 \gamma^t \, dt. \qquad (2.6)$$

The error function $E(C)$ measures an accumulated numerical error, which includes the imperfections in the definition of the initial conditions. The discount value $\gamma^t$ is from the (0,1) interval [75, 76]; it discounts the contribution of errors that accumulate with time.

² www.projects.knmi.nl/sumo/


In the "connected supermodel," the values of the couplings are positive in order to stimulate the synchronization (a consensus) between the sub-models. Due to the linear connections, the negative couplings can lead to imper- fections and instabilities that cause system divergence [7, 75]. The author uses two training methods for matching the coupling values. The first one is called "nudging" [10] and concerns two procedures in the model. The first procedure is called "nudging to reality"; i.e., during the training, we use an additional coupling in the real data by modifying Eq. 2.3 to the form of Eq. 2.7:

$$\frac{\partial x^i_\mu}{\partial t} = f^i_\mu(\mathbf{x}_\mu, t) + \sum_{\nu} C^i_{\mu\nu}\,(x^i_\nu - x^i_\mu) + C_{gt}\left(x^i_{gt}(t) - x^i_\mu\right), \qquad (2.7)$$

while the process of learning the coupling coefficients is defined by Eq. 2.8:

$$\frac{\partial C^i_{\mu\nu}}{\partial t} = a\,(x^i_\nu - x^i_\mu)\left(x^i_{gt}(t) - \frac{1}{M}\sum_{\mu} x^i_\mu\right) - \frac{\varepsilon}{(C^i_{\mu\nu} - C_{max})^2} + \frac{\varepsilon}{(C^i_{\mu\nu} - \delta)^2}, \qquad (2.8)$$

where $a$ is a learning rate, and the terms $\frac{\varepsilon}{(C^i_{\mu\nu} - C_{max})^2}$ and $\frac{\varepsilon}{(C^i_{\mu\nu} - \delta)^2}$ are constraints that force the $C^i_{\mu\nu}$ parameters to stay within the range $(\delta, C_{max})$. For more details about the origin of the method and the possible values of the $\varepsilon$ parameter, see [78]. A Lyapunov function argument shows that the method converges [79].
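Under simplifying assumptions (forward Euler stepping, a scalar nudging constant `C_gt`, and illustrative values of `a`, `eps`, `delta`, and `C_max` that are not taken from this thesis), the nudging rule of Eqs. 2.7-2.8 could be realized as in the sketch below: the sub-models are nudged toward the observations while the couplings are adapted and kept inside $(\delta, C_{max})$ by the barrier terms.

```python
import numpy as np

def nudged_training_step(x, C, x_gt, f, dt, C_gt=1.0, a=0.01,
                         eps=1e-3, delta=0.0, C_max=5.0):
    """One training step: Eq. 2.7 for the states, Eq. 2.8 for the couplings.
    x: (M, D) sub-model states, C: (D, M, M) couplings, x_gt: (D,) observation.
    C must be initialized strictly inside (delta, C_max) for the barriers."""
    M, D = x.shape
    x_mean = x.mean(axis=0)
    x_new, C_new = np.empty_like(x), C.copy()
    for mu in range(M):
        drift = f[mu](x[mu])
        for i in range(D):
            coupling = np.dot(C[i, mu], x[:, i] - x[mu, i])
            nudge = C_gt * (x_gt[i] - x[mu, i])          # nudging to reality
            x_new[mu, i] = x[mu, i] + dt * (drift[i] + coupling + nudge)
            for nu in range(M):
                if nu == mu:
                    continue
                dC = (a * (x[nu, i] - x[mu, i]) * (x_gt[i] - x_mean[i])
                      - eps / (C[i, mu, nu] - C_max) ** 2   # upper barrier
                      + eps / (C[i, mu, nu] - delta) ** 2)  # lower barrier
                C_new[i, mu, nu] = C[i, mu, nu] + dt * dC
    return x_new, C_new
```

Calling this step repeatedly along the observed trajectory adapts the couplings while the states are kept close to the data.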

The second training method is similar to a machine-learning task. We define a cost function (Eq. 2.6) and try to find its minimum over the selected supermodel parameters $C^i_{\mu\nu}$ by employing the conjugate gradient scheme [80]. The second supermodel formulation (the so-called "weighted supermodel") is very similar to the model ensemble concept [81]; it is a simplification of the "connected supermodel" [77]. The "weighted supermodel" technique does not use couplings between the sub-models. The supermodel of the whole complex system is defined by the weighted average dynamics of the individual imperfect models. This system uses weights $w^i_\mu$ with values greater than or equal to zero, subject to the condition $\sum_\mu w^i_\mu = 1$. The equation for the $i$-th dynamical variable is as follows:

$$\frac{\partial x^i}{\partial t} = \sum_{\mu} w^i_\mu f^i_\mu(\mathbf{x}). \qquad (2.9)$$

Unlike the model ensemble, the "weighted supermodel" uses training for matching the sub-model weights.

At the very beginning, we assume that we have an observed (real) trajectory in the phase space to which the supermodel is "attracted" as well as a starting point (that is, the first observation). From the first observed point, we run each of the sub-models for several time-steps. The obtained endpoints are then treated as new starting points from which we run all of the sub-models again. This procedure is repeated many times; it can be stopped once we reach the assumed accuracy. In this process, we generate a very large number of short trajectories.

In order to maintain a reasonable number of trajectories, we choose for further processing only those trajectories that are closest to reality at a specific time point. After conducting a reasonable number of short predictions, statistics are calculated regarding how often a given sub-model was located closest to the observed reality. These statistics are then converted into the weighting factors of the sub-models.
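A simplified, hypothetical reading of this procedure is sketched below: `step_model` is assumed to advance one sub-model by a few time-steps from a given state, closeness is measured by the Euclidean distance, and the win counts are normalized directly into weights; the actual implementation may differ in how trajectories are restarted and pruned.

```python
import numpy as np

def estimate_weights(observations, sub_models, step_model, horizon):
    """Estimate weighted-supermodel weights by counting how often each
    sub-model's short prediction ends up closest to the next observation.

    observations : array (T, D) of ground-truth samples
    sub_models   : list of M model definitions accepted by step_model
    step_model   : callable(model, state, horizon) -> predicted state (D,)
    """
    M = len(sub_models)
    wins = np.zeros(M)
    for t in range(len(observations) - 1):
        start, target = observations[t], observations[t + 1]
        # Run every sub-model from the same observed starting point.
        preds = [step_model(m, start, horizon) for m in sub_models]
        errors = [np.linalg.norm(p - target) for p in preds]
        wins[np.argmin(errors)] += 1        # the closest sub-model "wins"
    return wins / wins.sum()                # convert counts to weights
```

The resulting weights sum to one and can be plugged directly into Eq. 2.9.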

In summary, the supermodel technique can be understood as a way of extracting an additional (and not formalized) portion of knowledge that makes the sub-models different and is hidden in the GT data. The couplings between the sub-models (latent parameters) are responsible for the knowledge exchange between the sub-models as well as between the sub-models and the data. These two sources of knowledge are shown in Figure 2.3 as macro and micro factors, respectively.

2.2.2. Supermodeling — proof of concept

Herein, we try to answer one of the basic questions about the supermodel: how does the accuracy of the short-time (weather) and long-time (climate) predictions depend on the number of sub-models and the size of the training data?


Figure 2.3: Scheme of the supermodel and data assimilation. Models are paired with each other by $C_{ij}$ and with the data by $K_j$ learning weights. In data assimilation, only a few $C_{ij}$ and $K_i$ parameters are matched. 'A' and 'B' represent factors of reality.

As toy examples of the ground truth (the reality), we use the Lorenz63 (three dynamic variables) [81] and perturbed Lorenz63 (six dynamic variables) systems [82]. The imperfect models for both consist of coupled Lorenz63 systems with parameters that differ from those in the GT models. We can say that the macro factors (representing the basic physical phenomena) are represented by the equations, while the micro factors (associated with the specific microscopic properties of the fluid) are connected with the parameters. The values of the GT parameters comprise the knowledge that the supermodel is trying to learn.

In the following sections, we briefly present the test concept and the results.

Toy dynamical systems

For evaluating the supermodel of dynamic systems from two different perspectives, we have chosen two toy examples that play the role of the ground truth. The first one, which was used in the first test batches, is the Lorenz63 system; it is defined as follows:

$$\frac{\partial x_v}{\partial t} = \sigma(y_v - x_v) \qquad (2.10)$$
$$\frac{\partial y_v}{\partial t} = x_v(\rho - z_v) - y_v \qquad (2.11)$$
$$\frac{\partial z_v}{\partial t} = x_v y_v - \beta z_v, \qquad (2.12)$$

with the standard parameters enumerated in Eq. 2.19. We assume that, in the sub-models (imperfect models), the parameters differ from these standard values by 30 to 40%. To simulate a more complicated system, we used a modified version of Lorenz63 in our tests, which is defined as the visible Lorenz63 perturbed by a hidden Lorenz63 system [82]. The dynamics of this system are described by the following set of ODEs:

$$\frac{\partial x_v}{\partial t} = \sigma(y_v - x_v) + \varepsilon z_h \qquad (2.13)$$
$$\frac{\partial y_v}{\partial t} = x_v(\rho - z_v) - y_v \qquad (2.14)$$
$$\frac{\partial z_v}{\partial t} = x_v y_v - \beta z_v + \delta(x_h + \eta) \qquad (2.15)$$
$$\frac{\partial x_h}{\partial t} = \sigma(y_h - x_h) \qquad (2.16)$$
$$\frac{\partial y_h}{\partial t} = x_h(\rho - z_h) - y_h \qquad (2.17)$$
$$\frac{\partial z_h}{\partial t} = x_h y_h - \beta z_h \qquad (2.18)$$
$$\sigma = 10, \quad \rho = 28, \quad \beta = 8/3, \quad \varepsilon = 1, \quad \delta = 5, \quad \eta = 1. \qquad (2.19)$$

This model consists of six dynamical variables and is treated as the GT. In this case, the GT is approximated by a supermodel consisting of coupled sub-models that are defined by only three ODEs (see Eqs. 2.10, 2.11, and 2.12) with parameters differing by 30-40% from those of the standard Lorenz63 system.
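For concreteness, both ground-truth systems can be coded directly from Eqs. 2.10-2.19. The plain-numpy sketch below only defines the right-hand sides with the parameter values of Eq. 2.19; the 30-40% parameter perturbations of the imperfect sub-models would be applied on top of these definitions.

```python
import numpy as np

# Standard parameters of Eq. 2.19.
SIGMA, RHO, BETA, EPS, DELTA, ETA = 10.0, 28.0, 8.0 / 3.0, 1.0, 5.0, 1.0

def lorenz63(s, sigma=SIGMA, rho=RHO, beta=BETA):
    """Classical Lorenz63 system (Eqs. 2.10-2.12); s = (x, y, z)."""
    x, y, z = s
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def perturbed_lorenz63(s):
    """Visible Lorenz63 perturbed by a hidden Lorenz63 (Eqs. 2.13-2.18);
    s = (xv, yv, zv, xh, yh, zh)."""
    xv, yv, zv, xh, yh, zh = s
    return np.array([
        SIGMA * (yv - xv) + EPS * zh,
        xv * (RHO - zv) - yv,
        xv * yv - BETA * zv + DELTA * (xh + ETA),
        SIGMA * (yh - xh),
        xh * (RHO - zh) - yh,
        xh * yh - BETA * zh,
    ])
```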

The learning algorithm uses the conjugate gradient method for finding the minimum of the cost function given by Eq. 2.6, with an appropriate maximum number of iterations.
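A possible realization of this training step is sketched below. It assumes a discretized version of Eq. 2.6 (a sum over the sampled time-steps instead of the integral) and treats the forward supermodel run as a black-box callable `run_supermodel`; the function names and the discount handling are illustrative, not the implementation used in this work.

```python
import numpy as np
from scipy.optimize import minimize

def make_cost(run_supermodel, x_gt, dt, gamma=0.99):
    """Discretized version of Eq. 2.6.

    run_supermodel : callable(C_flat) -> array (K, D); the averaged supermodel
                     trajectory produced with coupling coefficients C_flat
    x_gt           : array (K, D) of ground-truth samples at the same time-steps
    gamma          : discount factor from the (0, 1) interval
    """
    K = len(x_gt)
    discounts = gamma ** (dt * np.arange(K))   # gamma^t at the sampling times

    def cost(C_flat):
        x_sumo = run_supermodel(C_flat)
        sq_err = np.sum((x_sumo - x_gt) ** 2, axis=1)
        return float(np.sum(discounts * sq_err) / K)

    return cost

# Hypothetical usage (run_supermodel, x_gt, and the initial guess C0 assumed given):
# result = minimize(make_cost(run_supermodel, x_gt, dt=0.01), C0,
#                   method="CG", options={"maxiter": 200})
# C_trained = result.x
```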

Results

We performed a large number of experiments that scrutinized the predictive power of the supermodel. We compare the results they produced to the ground truth dynamics represented by both the exact (Eqs. 2.10-2.12 and 2.19) and perturbed (Eqs. 2.13-2.19) Lorenz63 systems. We focused on two factors that influence the prediction accuracy: (1) the length $K$ of the fragment of the training trajectory (Eq. 2.6) sampled with time-step $\Delta t$; and (2) the number of sub-models $M$. We trained the system starting from a random point (the same for the respective tests) positioned on the Lorenz63 (classical and perturbed) attractors.

            Lorenz63 (left)                     perturbed Lorenz63 (right)
M \ K     200       500      1,000      |     200       500      1,000
  3      10.32      5.57      5.10      |    11.751     9.294     1.179
  5       1.28      0.41      0.005     |     1.233     1.068     1.173
  8       1.76      0.45      0.004     |     NaN       NaN       NaN

Table 2.1: Prediction error for the supermodel approximating the Lorenz63 system (left) and the perturbed Lorenz63 system (right) with various $M$ and training trajectory lengths $K\Delta t$. The error was calculated over $N$ predicted time-steps.

The trajectory was extrapolated by the supermodel for the long-time dynamics (attractor prediction, $N = 5{,}000$ time-steps; left) and for the short-time dynamics (trajectory prediction, $N = 100, 200$ time-steps; right). In Table 2.1, we see that the long-time prediction error decreases with $K$ for the Lorenz63 system (left panel). However, as shown in Figure 2.4, the attractor is well approximated for $M = 5, 8$, while the supermodel produces a closed cycle for $M = 3$, which signals a problem with underfitting.

Consequently, the overfitting problem starts to appear for $M = 8$ and learning cycle $K = 200$ (as shown in Table 2.1, left panel). Note that the number of parameters grows with $m \cdot M^2$. The overfitting effect is better seen for the perturbed Lorenz63 system (Table 2.1, right panel), where the prediction error becomes extremely large for $M = 8$ and $K \le 1{,}000$. However, the correct reconstruction of the perturbed Lorenz63 system is still possible for $M = 5$ ($K = 200, 1{,}000$) (see Figure 2.5, bottom panel) and for $M = 3$ with two learning rounds of $K = 1{,}000$ time-steps each (then, the error drops from 9.29 [in dimensionless units] for $K = 500$ to 1.18 for


Figure 2.4: Snapshots of the Lorenz63 attractors and supermodel predictions: a) ($M = 3$, $K = 30$); b) ($M = 3$, $K = 1{,}000$); c) ($M = 5$, $K = 1{,}000$); d) ($M = 8$, $K = 1{,}000$). The learning trajectories are of the same length and were sampled with different resolutions: (a) $K = 30$; (b, c, d) $K = 1{,}000$. The attractor predicted by the supermodel is marked in red ($N = 5{,}000$ time-steps), while the "ground truth" attractor is shown in the background. The learning phase in (b, c, d) consists of two rounds with $K = 1{,}000$.

$K = 1{,}000$). The unexpectedly small decrease of the prediction error for a greater number of sub-models, $M = 5$ and $K = 1{,}000$, results from using only one series of training in this case. Then, the learning procedure gets stuck in a local minimum of the error function (see Eq. 2.6).

In Figures 2.6 and 2.7, we present the preliminary results of trajectory prediction using the supermodel.

In the case of short-time system evolution, we can observe in Figure 2.6 (b, d) that, for M = 3 and K = 1,000, the trajectory predictions are very poor for both the Lorenz63 and perturbed Lorenz63 systems. They improve slightly (especially in the Lorenz63 case) when the training fragments of the trajectories have the same length but are sampled at a much lower resolution (K = 30) (Figure 2.6 a, c). However, as shown in Figure 2.7, the predictions improve greatly when the supermodel is trained on shorter fragments of trajectories of length L = K∆ = 30∆.

In the case of Lorenz63 (upper panel), we can observe excellent prediction for M = 8, and we can see that the error decreases with the number of sub-models.

As shown in Figure 2.7 (bottom panel), the situation is somewhat reversed for the perturbed Lorenz63 system.

We see that, although the total prediction error decreases with M, a supermodel consisting of fewer sub-models generalizes the trajectory shape much better than those with greater M. In any case, more experiments are required in the trajectory prediction case, as it is not obvious how the prediction error will depend on the length of the learning fragments of the trajectories.

The results we obtained are still ambiguous; e.g., different error behaviors for M and K are obtained for trajectories located in different regions of the attractor. For example, as shown in Figures 2.6 and 2.7, long and entangled fragments of the testing trajectories produce quite different prediction errors.

Figure 2.5: Snapshots from long-time prediction for Lorenz63 (upper panels) and perturbed Lorenz63 (bottom panels). The learning trajectories are of different lengths (K = 200 and K = 1,000) and were sampled with the same resolution (∆). The attractor predicted by the supermodel is marked in red (N = 5,000 time-steps), while the ground truth attractor is shown in the background. The first column is for a supermodel consisting of three sub-models (M = 3) with K = 200 learning time-steps. The second column is for M = 3 and two series of the learning phase with K = 1,000. The last column is for M = 5 with only K = 200 training time-steps. Prediction errors for the upper panels (from left): 1.92, 1.76, and 0.58; for the bottom panels: 1.75, 1.11, and 1.23.


Figure 2.6: Snapshots from short-time prediction for Lorenz63 (a, b) and perturbed Lorenz63 (c, d), assuming M = 3. The training trajectories have the same length but were sampled with different resolutions: (a, c) K = 30; (b, d) K = 1,000. The testing trajectories have the same length (N = 100 time-steps).


Figure 2.7: Snapshots from short-time prediction using a supermodel consisting of M = 3, 5, and 8 sub-models (in columns, from left) for the classical (upper row) and perturbed (bottom row) Lorenz63 systems. The trajectory predicted by the supermodel is marked in red (N = 200 time-steps), while the ground truth trajectory is shown in the background (blue). The supermodel was trained on short fragments of trajectories of length K = 30∆, not shown in this figure. An example of overfitting for M = 8 is shown in the bottom row.

Conclusions

We have demonstrated that the supermodel behavior follows that observed for data models in machine learning (e.g., ensemble learning schemes). The increase of the number of sub-models (i.e., the increase of the number of C coupling coefficients), which may cause the overfitting problem, can be compensated for by a larger amount of training data (i.e., longer or more densely sampled trajectories). We have shown that the long-time prediction accuracy can be improved by using appropriately long teaching sequences and a proper number of sub-models (well balanced with the teaching length K). This property is observed both for the standard Lorenz63 system and for the more difficult perturbed Lorenz63 system. However, though a short learning phase can result in a very crude approximation of the attractor (long-time behavior), the short-time predictions can improve when compared to those obtained with long-time training. In any case, our experiments have confirmed the results signaled in [76] that a supermodel consisting of imperfect sub-models (standard Lorenz63) can, with adequate accuracy, predict both the short- and long-time behavior of the much more complicated and twice-as-large perturbed Lorenz63 system.

2.3. Idea of supermodel application in data assimilation

As shown above, the supermodel works quite well on toy examples such as the Lorenz63 dynamical system, giving very good prognoses not only for the attractor (climate) shape but also for the dynamics of a single trajectory. The crucial assumption behind sub-model synchronization is that the sub-models should not be too "distant" from each other. Some mathematical conditions are described in [83]; however, a more constructive and general proof of the supermodel's convergence still does not exist, so we have to rely on experiments.


As shown in [83], the convergence conditions could be improved by pre-training the sub-models using a classical data-assimilation procedure. However, we skipped this problem here, realizing that it needs additional deep mathematical and experimental research (which greatly surpasses the scope of this thesis). Nevertheless, we can assume that the approximate values of the model parameters (and their dispersion) are known by experts, and that the sub-models will eventually synchronize. We would like to emphasize that, in the experiments on the Lorenz63 model, the number of parameters was extremely small (only three), while the sub-models were tightly coupled through all of the dynamical variables. Consequently, the number of coupling factors (latent supermodel parameters) highly outnumbered the number of parameters of the baseline model (for three sub-models: 18 versus 3, respectively). One can conclude that the supermodel in this particular case (and in the case of climate modeling research [1]) is constructed to increase the accuracy of the prognoses, disregarding the problem of computational and storage complexity.

More interesting is the case of the perturbed Lorenz63 model, which has not only double the number of parameters and dynamical variables but also half the number of couplings. In other words, the sub-models are coupled through only three out of six dynamical variables. The "perturbed Lorenz63" variables of a sub-model (Eqs. 2.16, 2.17, and 2.18) are not coupled to the corresponding variables of the other sub-models. Meanwhile, although the evolution of such a system is highly complex (the chaotic model is perturbed by another chaotic and de-synchronized model, introducing a sort of randomness into the overall system dynamics [82]), the supermodel produces good predictions both in terms of the attractor shape and the trajectory course. In the cases of the Lorenz63 and perturbed Lorenz63 models, all of the parameters (and dynamical variables) are "equally valuable"; i.e., the sensitivity of the baseline models to their fluctuations is similar, which is reflected in the dispersion of the eigenvalues calculated from the Jacobian matrix (for Lorenz63) [84].
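For the classical Lorenz63 system, the Jacobian entering such a sensitivity analysis has the standard closed form (quoted here only for reference; this is a textbook result rather than material taken from [84]):

\[
J(x, y, z) = \begin{pmatrix} -\sigma & \sigma & 0 \\ \rho - z & -1 & -x \\ y & x & -\beta \end{pmatrix},
\]

whose eigenvalues, evaluated along the trajectory, characterize the local sensitivity of the model state to perturbations.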

We presume that, for more complex models (but with considerably lower variability), there exists a single dominant dynamical variable (or at most a few) in terms of sensitivity through which the sub-models can be synchronized. In this way, for a supermodel consisting of three sub-models whose baseline model has tens of parameters, the number of couplings (latent parameters) is only six (in fact, three, due to the symmetry). This hypothesis was verified for both HANDY (a small and very flexible sociological model [83]) and the 3D model of melanoma dynamics [8].
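Under the same counting convention as before (one coefficient per ordered pair of sub-models and per coupled variable), coupling M = 3 sub-models through a single dynamical variable gives

\[
N_C = 1 \cdot M(M-1) = 1 \cdot 3 \cdot 2 = 6, \qquad C_{ij} = C_{ji} \;\Rightarrow\; N_C = 3,
\]

instead of the 18 couplings required in the tightly coupled Lorenz63 supermodel.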

In Figure 2.8, we present the schemes of the tightly (upper) and loosely (lower) coupled supermodels; the description is given in the figure caption. The proposed scheme of the supermodel design (see Figure 2.8) consists of three phases (a schematic code sketch follows the list):

1. The short pre-training of each sub-model using a standard data-assimilation procedure. This research will be conducted in the future; currently, the sub-models are generated by disturbing the parameters of the ground truth (GT) baseline model;

2. Coupling the imperfect models through a single (or as few as possible) most sensitive dynamical variable(s). This can be done by means of sensitivity analysis procedures [85];

3. Assimilation of the GT data by matching the coupling (latent) parameters using the procedures presented in [77].
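A minimal illustrative sketch of phases 2 and 3, for a generic baseline model f(state, params), is given below. All names are hypothetical: the sub-models are coupled only through a single most sensitive variable (index K_SENS, assumed to be identified beforehand by sensitivity analysis), and only the three symmetric latent couplings are matched to the ground-truth data; phase 1 (pre-training) is omitted, as in the current setup where the sub-models are generated by perturbing the GT parameters.

```python
# Hedged sketch of the loosely coupled supermodel used as a DA procedure
# (illustrative names, not the thesis implementation).
import numpy as np
from scipy.optimize import minimize

M, K_SENS = 3, 0                         # three sub-models, coupled via variable 0

def coupled_step(f, states, params, c_sym, dt):
    """Euler step; c_sym holds the three symmetric couplings C01, C02, C12."""
    C = np.zeros((M, M))
    C[0, 1] = C[1, 0] = c_sym[0]
    C[0, 2] = C[2, 0] = c_sym[1]
    C[1, 2] = C[2, 1] = c_sym[2]
    new = np.empty_like(states)
    for i in range(M):
        nudge = np.zeros_like(states[i])
        nudge[K_SENS] = sum(C[i, j] * (states[j, K_SENS] - states[i, K_SENS])
                            for j in range(M) if j != i)
        new[i] = states[i] + dt * (f(states[i], params[i]) + nudge)
    return new

def assimilation_cost(c_sym, f, params, gt, dt):
    """Phase 3: distance of the supermodel mean trajectory to the GT data."""
    states = np.tile(gt[0], (M, 1))
    err = 0.0
    for target in gt[1:]:
        states = coupled_step(f, states, params, c_sym, dt)
        err += np.sum((states.mean(axis=0) - target) ** 2)
    return err

# Only the 3 latent couplings are fitted, instead of the tens of parameters of
# the baseline model f, e.g.:
#   result = minimize(assimilation_cost, np.zeros(3),
#                     args=(f, params, gt, dt), method="CG")
```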

It can clearly be seen that the supermodel from the lower part of Figure 2.8 can be treated as a second abstraction level for the data-assimilation and parameter-matching procedures. Instead of matching tens of parameters of a complex baseline model (which can be too computationally demanding), the real data can be assimilated to the supermodel through a small number of latent coupling parameters.

According to the motivation of this thesis, such a way of data assimilation can be a breakthrough in employing sophisticated cancer models as a valuable tool in predictive oncology. However, the efficiency of such a tool will also depend on the high computational efficiency of the tumor model.


Figure 2.8: (upper) Original supermodel scheme [1]. Sub-models S1, S2, and S3, with respective parameter sets P1, P2, and P3 (#P1 = #P2 = #P3 = N) and described by the dynamical variables A and B, are tightly synchronized by coupling the sub-models through both A and B. The synchronized supermodel is attracted to the ground truth through an additional meta-parameter (Ki). (lower) Novel idea of using the supermodel scheme as a DA procedure. We assume that the sub-models are coupled through only the single most sensitive dynamical variable, and that the supermodel coupling factors are trained and matched to the ground-truth data. Figure comes from [86].


In the following chapters, we focus on developing the cancer models, numerical solvers, and implementations that we will further use for the supermodel of 3D cancer dynamics in the presence of anticancer treatment. This model is much more complicated than the toy models presented in the previous sections of this chapter. On the other hand, within the considered simulation time slot, we cannot expect the kind of variability that we observed in the Lorenz models. In particular, we are interested in the evolution of the mean tumor diameter and other "integral", measurable tumor characteristics. For smaller spatio-temporal scales, where chaotic behavior might be observed, we should use the different multi-scale models that have been considered in this dissertation. We hope that the averaged effects of the small-scale cancerous tissue variability are present in the measured data. We believe that one can follow the mutual interaction of these effects by using the prediction-correction procedure described in the following sections of the dissertation.


