Academic year: 2021

AGH University of Science and Technology in Krakow
Faculty of Electrical Engineering, Automatics, Computer Science and Electronics
Department of Computer Science

Master of Science Thesis

Marcin Nowak

Multiscale Applications Composition and Execution Tools Based on Simulation Models Description Languages and Coupling Libraries

Supervisor: dr inż. Katarzyna Rycerz

Author's Declaration

Aware of criminal liability for certifying untruth, I declare that I have written this diploma thesis personally and independently, and that I have not used sources other than those cited in this work.

Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie
Wydział Elektrotechniki, Automatyki, Informatyki i Elektroniki
Katedra Informatyki

Praca magisterska

Marcin Nowak

Narzędzia do konstruowania i wykonywania wieloskalowych aplikacji w oparciu o języki opisu modeli symulacyjnych i biblioteki łączące

Opiekun pracy: dr inż. Katarzyna Rycerz


Acknowledgements

I wish to thank my supervisor, Dr. Katarzyna Rycerz, for the invaluable help and patient guidance she has provided during my work on this Thesis. Without her support and contribution this Thesis would not have been possible.

I would like to express my gratitude to Dr. Marian Bubak for his accurate advice.

I would also like to thank my colleague Paweł Pierzchała, with whom I worked on the initial version of the MUST User Support Tool.

Next, thanks for the help with integrating the MUST User Support Tool with the GridSpace virtual laboratory go to Eryk Ciepiela and Daniel Harężlak from ACC CYFRONET.

Last, but not least, I would like to thank Małgorzata Palej for her suggestions and for proofreading this work.

This work is related to the Mapper project, which receives funding from the EC's Seventh Framework Programme (FP7/2007-2013) under grant agreement no. RI-261507.


Abstract

Multiscale applications are crucial in better understanding of processes from various fields of science. The modeled processes frequently require substantial computing capabilities. This is the reason why the grid and the cloud computing environments are chosen as suitable platforms for executing multiscale applications.

However, execution of multiscale applications in remote distributed environments is relatively complicated. Therefore, as a part of this Thesis, we developed the MUST User Support Tool, which aids execution of multiscale applications in the grid and the cloud environments. The proposed tool provides a simple interface which may be used to facilitate the work with multiscale applications in remote distributed environments.

Furthermore, we thoroughly compare the grid and the cloud computing environments. Various aspects, ranging from the definitions of the grid and the cloud, through computing and programming models, to performance results, are taken into consideration.

This Thesis is organized as follows: Chapter 1 presents the problems discussed, the scope and the main goals of this Thesis. Chapter 2 introduces multiscale applications, discusses their main requirements and describes examples from various fields of science. Chapter 3 describes model description languages commonly used to describe the single-scale and multiscale models frequently used in multiscale applications. Various libraries aiding development of multiscale applications are described in Chapter 4. Chapter 5 is a detailed comparison of the grid and the cloud computing environments. Chapter 6 presents various middleware tools used for accessing the remote environments. The MUST User Support Tool is introduced and exhaustively described in Chapter 7. Chapter 8 presents MUST implementation details and the possibilities of expansion. Performance results of executing a scientific application using the MUST User Support Tool with various setups are presented in Chapter 9. Finally, Chapter 10 summarizes this Thesis and presents its results and conclusions.

Appendix A is a glossary of terms and abbreviations used in this Thesis. Appendix B contains various MUST User Support Tool usage examples and installation details. Appendix C presents the "Comparison of Cloud and Local HPC approach for MUSCLE-based Multiscale Simulations" paper related to this Thesis.


Contents

1 Introduction
  1.1 Problem outline
  1.2 Goals and scope
  1.3 Organization of this Thesis
  1.4 Contribution of other authors
2 Multiscale applications
  2.1 Introduction
  2.2 Requirements
  2.3 Examples
  2.4 Summary
3 Model description languages
  3.1 SBML and CellML
  3.2 MML and CxA
  3.3 Comparison
  3.4 Summary
4 Coupling libraries
  4.1 AMUSE
  4.2 MCT
  4.3 MUSCLE
  4.4 Comparison
  4.5 Summary
5 Infrastructures
  5.1 Grid
  5.2 Cloud
  5.3 Comparison
  5.4 Summary
6 Accessing infrastructures
  6.1 Local resources: Portable Batch System
  6.2 Grid resources: GridSpace
  6.3 Cloud resources: Amazon Web Services
    6.3.1 Elastic Compute Cloud
    6.3.3 Elastic Block Storage
    6.3.4 Simple Queue Service
  6.4 Summary
7 MUST User Support Tool
  7.1 Concept of MUST
  7.2 Requirements
  7.3 Use cases
  7.4 Architecture
    7.4.1 MUST Architecture: Grid
    7.4.2 MUST Architecture: Cloud
  7.5 Summary
8 Implementation details
  8.1 MUST implementation
  8.2 Expansion possibilities
  8.3 Tools used
  8.4 Summary
9 Case study
  9.1 ISR2D performance results
  9.2 Performance results interpretation
  9.3 Summary
10 Summary
  10.1 MUST User Support Tool
  10.2 Grid and cloud comparison
List of Figures
List of Tables
References
Appendices
A Glossary
B Examples
  B.1 Example GridSpace experiment
  B.2 Example usage
    B.3.1 Prerequisites
    B.3.2 Access machine and nodes configuration
    B.3.3 EC2 instance configuration
  B.4 Summary
C Publication: Comparison of Cloud and Local HPC approach for MUSCLE-based Multiscale Simulations


1 Introduction

This Chapter introduces the user support tool proposed in this Thesis, which allows automatic execution of multiscale applications in distributed environments (the MUST User Support Tool), and its main requirements. The MUST tool facilitates the work with multiscale applications in the grid and the cloud environments. This Chapter also specifies the goals and the scope of this Thesis (Sections 1.1 and 1.2). Sections 1.3 and 1.4 present the organization of this Thesis and the contribution of other authors, respectively.

1.1 Problem outline

The need for high-performance computing and for storing vast amounts of data is constantly growing. Not everyone, however, is able to build and maintain their own supercomputers or mass storage devices. Therefore, this need is nowadays frequently fulfilled by remote systems. Details concerning the infrastructure or the physical location of such systems are irrelevant to the end-users. Both grid and cloud computing provide applicable resources. Grid computing focuses on large-scale computations, commonly in the form of batch jobs. Grid systems process a few large requests (e.g. jobs allocating several hundreds of nodes). On the other hand, cloud systems primarily allow users to deploy their services. Consequently, cloud systems process sizeable numbers of small requests.

Multiscale applications depend on such substantial computing capabilities. Simulation of three-dimensional models is a key task in numerous fields of science. Models at different spatial and temporal scales are widely used. Multiscale applications depend on numerous model description languages, coupling libraries and workflow management systems. Integration and execution of different models is a crucial task. Unfortunately, it often has to be performed manually and, in consequence, is very time-consuming. In this Thesis, we propose a user support tool which assists in automatic execution and distribution of multiscale applications on various infrastructures.

Multiscale applications are generally used by non-programmers. Therefore, their execution on distributed architectures should be relatively uncomplicated.

The user support tool presented in this Thesis was built as a part of the GridSpace virtual laboratory, which allows users to create experiments. Created experiments can be executed locally (on a computing cluster) or remotely (on the grid). The GridSpace virtual laboratory provides a user-friendly web interface. MUST can be used to facilitate work with multiscale applications in distributed environments. It allows users to automatically distribute their GridSpace experiments and execute them on the grid and the cloud infrastructures.

Utilization of the resources provided by grid and cloud computing can be a great opportunity to develop multiscale applications.

1.2 Goals and scope

There were two main goals of this Thesis:

• Design and development of a user support tool which allows automatic distributed execution of multiscale applications in various infrastructures (hereinafter referred to as MUST, the MUST User Support Tool). The main requirements:

– Support for distributed execution of multiscale applications. The purpose of the tool is to allow distributed execution of multiscale applications, i.e. applications which use coupling libraries to link multiple separate single-scale models. In this Thesis we focus on support for multiscale applications built using the MUSCLE coupling library. The MUSCLE coupling library was chosen based on the conclusions of Chapter 4. The distributed execution of MUSCLE-based applications must not require substantial changes to the existing applications. The distribution should be based on the existing features of the supported library.

– Support for both grid and cloud distributed infrastructures. The proposed tool should allow execution of multiscale applications on both grid and cloud infrastructures. Live transmission of the standard input and error streams from the remote environments should be possible, as well as optional upload of the input sandbox files, allowing automatic deployment of simple multiscale applications (assuming that other prerequisites are available on the remote machines).

– Access from the GridSpace virtual laboratory level. The proposed tool should be accessible from the GridSpace virtual laboratory web interface. Automated execution and distribution of GridSpace experiments based on multiscale applications should be possible using the proposed tool.

A detailed description of the requirements is presented in Section 7.2. Chapters 7 and 8 describe the MUST tool in detail.

• A thorough comparison of the grid and the cloud computing environments, covering the following aspects:

– Theoretical comparison. The aim of the thorough theoretical comparison of grid and cloud computing is to highlight the differences and similarities between them. Aspects such as programming model, computing model, usability, standardization, etc. should be taken into consideration.

– Performance results. Performance results of the grid and the cloud infrastructures, based on the execution of a scientific multiscale application using various setups, should be compared. The comparison should specify, describe and separately confront all the steps necessary to execute an application in the distributed environment, including the preparation step, the execution itself and downloading results from the remote machines.

– Ease of access, difficulty of installation and amount of changes required in legacy applications. The middleware tools used for accessing the grid and the cloud infrastructures should be described and compared, as well as the APIs exposed and the standards/proprietary solutions used. The required installation and configuration steps should also be discussed.

A comparison of the grid and the cloud infrastructures is presented in Chapter 5 while the performance results are discussed in Chapter 9. To achieve these goals, we developed the MUST tool and then performed appropriate tests.

1.3 Organization of this Thesis

This Chapter presented the problem outline and briefly introduced MUST, a user support tool facilitating execution of multiscale applications in various distributed environments. The subsequent chapters present theoretical concepts related to multiscale applications and distributed environments (the Grid and the Cloud) as well as tools used for execution and description of multiscale applications on various levels (ranging from low-level job scheduling systems, through middleware coupling libraries, to high-level description tools and execution environments).

The concept of multiscale applications is presented in Chapter 2, which discusses their requirements thoroughly and presents problems related to accurate model description and comprehensible inter-model communication. Eventually, a few examples of multiscale applications from various fields of science are described (with particular emphasis on the different spatial and temporal scales at which each application operates).


The subsequent Chapter (3) focuses on an efficient and understandable description of multiscale applications. Therefore, we present various model description languages. Firstly, some recognized and widely used languages concentrating on single model description, such as the Cell Markup Language and the Systems Biology Markup Language, are presented. Then, we describe the concept of the Multiscale Modeling Language (MML), enabling a description of multiple models and inter-model interactions. CxA, a format preceding MML, is also shortly discussed as a format closely related to the MUltiScale Coupling Library and Environment (MUSCLE). Finally, a comparison of all the languages is presented.

Chapter 4 presents the Astrophysical Multipurpose Software Environment (AMUSE), the Model Coupling Toolkit (MCT) and the MUltiScale Coupling Library and Environment (MUSCLE): three different coupling libraries facilitating the building and execution of multiscale applications. First of all, each library is separately described. Then, all the libraries are compared with emphasis on aiding execution of multiscale applications in distributed environments.

Chapter 5 leaves the subject of multiscale applications and focuses on generic distributed environments. The Grid and the Cloud computing environments are exhaustively described and compared. A theoretical comparison starting with quotations of various Grid and Cloud definitions is followed by a presentation of more pragmatic aspects, such as programming model, computing model and usability.

The middleware tools enabling access to the Grid and the Cloud environments are described in Chapter 6. It includes job scheduling systems used for local computing clusters as well as an API allowing access to various Cloud-based resources (on the example of the Amazon Web Services API). In this Chapter, we also introduce and describe the GridSpace virtual laboratory. Utilization of the previously listed middleware tools in MUST is also mentioned in the summary.

The MUST tool is described in detail in Chapter 7: a general concept of the tool is specified, followed by a presentation of detailed requirements and a description of some use cases. Subsequently, the tool's layered architecture is described and depicted, divided into two sections, one for the Grid and one for the Cloud architecture. Detailed diagrams showing the cooperation of consecutive layers are also included. We also list the steps performed before, during and after execution of a particular multiscale application using the MUST tool, which are depicted on the suitable sequence diagrams.

Chapter 8 describes MUST's implementation details, with a class diagram included, showing the similarities and implementation differences on both the Grid and the Cloud computing infrastructures. The chapter ends with a discussion of the possibilities of expansion, divided into supporting new execution environments and supporting other coupling libraries.

Various performance results are included in Chapter 9. The tests performed include a division into various stages of execution (including the submission, execution and results gathering stages). A scientific application (In-stent restenosis 2D, described in detail in Section 2.3) was used to achieve conclusive results. All tests were performed on both the Grid and the Cloud infrastructures. The chapter is summarized by a discussion of the measured performance results.

Appendix A is a short glossary of the uncommon terms and abbreviations used in this Thesis.

Appendix B shows various examples of MUST usage. A sample GridSpace experiment which may be launched using MUST is included, together with its detailed description. Some of MUST's command-line usage examples are also demonstrated. The Appendix ends with a short description of the MUST installation.

Appendix C includes the "Comparison of Cloud and Local HPC approach for MUSCLE-based Multiscale Simulations" paper (by K. Rycerz and co-authors M. Nowak, P. Pierzchala, M. Bubak, E. Ciepiela and D. Harezlak).

1.4 Contribution of other authors

MUST was developed as a part of the GridSpace virtual laboratory (GS) [44]. GS enables development, sharing, execution and reusability of the so-called experiments: sets of high-level scripts created by scientists (more details in Section 6.2).

The initial version allowed running MUSCLE-based multiscale applications on the grid infrastructure. It was developed by Paweł Pierzchała and the author of this Thesis. Paweł Pierzchała focused on the integration with GridSpace, while the author of this Thesis concentrated on the usage of the grid resources. MUSCLE usage, live results streaming, etc. were the result of joint work of Paweł Pierzchała and the author of this Thesis.

After finishing the initial version, Paweł Pierzchała continued by building a graphical tool which allows mapping groups of single-scale simulations to computing nodes. That JavaScript-based tool was developed to cooperate with the GridSpace virtual laboratory as an external Web application [2].

Concurrently, the author of this Thesis developed a tool which allowed launching MUSCLE multiscale applications on the Amazon Web Services-based cloud infrastructure.


2 Multiscale applications

In this Chapter, we introduce the concept of multiscale applications (Section 2.1). The common requirements of multiscale applications are presented in Section 2.2, while Section 2.3 presents some exemplary multiscale problems and applications from various fields of science.

2.1 Introduction

Multiscale modeling is nowadays used in multiple fields of science. Examples include Physiology, Computational Biology, Engineering and Nano-material Science [1].

A multiscale model may be perceived either as a single model spanning many spatial and temporal scales or as a set of coupled single-scale models (as presented in Figure 1). Models at various scales require simulation at different levels of detail (ranging from intermolecular to macroscale interactions).


Figure 1: The same model shown as a single multi-scale model and decom-posed to the set of single-scale models (based on [1]).

Multiscale applications cannot be easily decomposed by nature: they consist of many components which are often tightly coupled. Each component may be simulated separately using different techniques (molecular dynamics, cellular automata, Monte-Carlo methods, etc.).
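The decomposition sketched above can be illustrated with a toy example: a coarse (macro) solver coupled to a fine (micro) solver that advances on a smaller time step, with the two exchanging data at every coarse step. All model names and update rules below are hypothetical placeholders, not part of any application described in this Thesis.

```python
# Toy coupling of two single-scale models: the micro model runs several fine
# steps per macro step, and its time-averaged state feeds back into the macro
# model. The update rules are arbitrary stand-ins for real solvers.

def micro_step(state):
    # Fine-scale update: simple geometric decay (placeholder dynamics).
    return state * 0.9

def macro_step(state, micro_feedback):
    # Coarse-scale update driven by the averaged fine-scale result.
    return state + micro_feedback

def run_coupled(macro0, micro0, macro_steps, micro_per_macro=10):
    macro, micro = macro0, micro0
    for _ in range(macro_steps):
        samples = []
        for _ in range(micro_per_macro):
            micro = micro_step(micro)
            samples.append(micro)
        # Coupling point: pass the time-averaged micro state up one scale.
        macro = macro_step(macro, sum(samples) / len(samples))
    return macro, micro

final_macro, final_micro = run_coupled(1.0, 1.0, macro_steps=3)
print(final_macro, final_micro)  # macro grows while micro decays toward zero
```

In real multiscale applications, this kind of exchange is what coupling libraries mediate, including the case where the submodels run as separate, distributed processes.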

2.2 Requirements

Multiscale applications are widely used as tools helping to understand complex processes. The multiple single-scale simulations forming one multiscale application are often computationally heavy. Receiving results in real time is crucial in some applications (e.g. when aiding health-related decisions).


Computing, communication and storage requirements

Multiscale applications may require very large computing capabilities (sometimes exceeding PetaFlops [1]). Communication between computational processes (both at the intra- and inter-simulation level) may easily become the application's bottleneck.

This is the main reason why the concurrent execution of, and effective communication between, single-scale models is a crucial task. There are several approaches to communication between single-scale models (as described in [3]: e.g. the timing of a particular simulation may be regulated by another one, a rollback may be needed, etc.).

Significant amounts of data are often used and/or produced by multiscale applications. This is why many multiscale applications require the utilization of large and easily accessible storage devices.

Simulation dependencies and model description

Dependencies between single-scale models are highly diversified. Models may have to be simulated concurrently or sequentially. Simulation of the same period at various spatial scales may require varied computing capabilities, resulting in different simulation times (i.e. a simulation of 1 second of blood flow may take significantly longer when simulated at the cell level than when simulated at the vessel level; nevertheless, simulation results on each level may affect another one).

The description of the models themselves and of their cooperation is an important requirement. Models should be described in a clear, structured way so that they can be easily understood and reused. XML-based model description languages (e.g. SBML, CellML) usually meet these requirements. Although there are many formats available, a single standard has not been chosen yet (as discussed further in Chapter 3).

While there are a few model description languages describing single-scale models, there are significantly fewer languages which describe connections and dependencies between models (i.e. languages which would model a whole multiscale application). For instance, the MML language (described also in Chapter 3) attempts to achieve this goal.

There are multiple tools facilitating the execution of multiscale applications. For example, the AMUSE library [18] is used for simulation of stellar systems. Other examples include MCT [19, 20] and MUSCLE [21], which are more general purpose tools. MCT operates on a slightly lower level than MUSCLE. MUSCLE uses the CxA model description format (described in Section 3.2). The tools mentioned above, allowing simulation of existing models, are described more widely in Chapter 4.


2.3 Examples

Multiscale problems from various fields of science are briefly described below. Example domains include Physiology, Flow Control and Fusion processes. All the presented applications span many spatial and temporal scales and require supercomputing capabilities to be modeled.

Physiology

The first example of a multiscale application is a multiscale model of in-stent restenosis created in the COAST project [4].

Coronary artery disease remains the most common cause of death in Europe. It refers to stenosis of coronary arteries caused by accumulation of atheromatous plaque [1, 5]. Possible treatment involves the use of a metal frame (stent) to maintain an open vessel lumen. Unfortunately, there is a common complication, in-stent restenosis (ISR), which is the return of the vessel lumen to a size similar to that before the intervention.

Figure 2: A Scale Separation Map depicting the dependencies between single scale models forming the simulation of In-Stent Restenosis (based on [6, 5]).

Modeling ISR may help to understand and prevent this ailment. The processes participating in ISR act on scales from microns up to centimeters. The temporal scales involved are also widely separated (from seconds to months). Factors such as administered drugs or changing blood pressure affect the patient on multiple scales, ranging from individual cell characteristics up to global hemodynamics. Figure 2 shows a simplified Scale Separation Map (a diagram type described in Section 3.2) proposed by Evans et al. in [6], depicting dependencies between the single-scale models forming a simulation of ISR.

A real time simulation of 3D ISR models is crucial in many areas (e.g. designing patient-specific chemotherapy and radiotherapy applications). This simulation, however, requires up to thousands of processors.

Flow Control

Figure 3: A Scale Separation Map depicting the single-scale models used to simulate water flow (based on [1]).

The flow control model developed by the Université de Genève in collaboration with the École Supérieure d'Ingénieurs en Systèmes Industriels Avancés Rhône-Alpes is another example of a multiscale problem.

Simulating the flow of canals and rivers is an important task. It may help to maintain the adequate water levels needed in agriculture, water transport, etc., and, which is even more important, it may help to avoid flooding.

Three models at different scales are involved in simulating the full system of irrigating canals [1, 9]:


1. A one-dimensional shallow water equation for simulating the water flow in long canal sections. This model considers the sediment transport, which may have a significant influence on the water flow (i.e. irrigation efficiency reduction).

2. A two-dimensional shallow water equation for simulating branching and large water pools.

3. A three-dimensional, free-surface model used in simulations of a detailed flow of water in gates and/or in descriptions of the sediment transport. This simulation uses the Lattice Boltzmann Model for Free Surface Flow [10].

The third model requires supercomputing capabilities. Figure 3 shows a Scale Separation Map which includes the 1D shallow water model, the 3D free-surface model and the scales at which each model is used.
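For orientation, the one-dimensional shallow water (Saint-Venant) equations underlying the first model can be written in a standard textbook form (quoted here from general hydraulics literature, not from [1, 9]):

```latex
\frac{\partial A}{\partial t} + \frac{\partial Q}{\partial x} = 0,
\qquad
\frac{\partial Q}{\partial t}
  + \frac{\partial}{\partial x}\!\left(\frac{Q^{2}}{A}\right)
  + gA\,\frac{\partial h}{\partial x} = gA\,(S_{0} - S_{f}),
```

where A is the wetted cross-section area, Q the discharge, h the water depth, g the gravitational acceleration, S_0 the bed slope and S_f the friction slope. The 2D model generalizes these balance laws to two horizontal dimensions.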

Fusion

Another example of a multiscale problem is the modeling of fusion processes proposed as a part of the ITER [11] project. The project focuses on a description of core plasma in a tokamak. A tokamak¹ is a toroidal device using a magnetic field to confine plasma. The range of scales modeled in the application is depicted in Figure 4.

Hydrogen atoms collide in the core of the Sun and fuse into heavier Helium atoms. The fusion of two atoms produces great amounts of energy (as given by Einstein's formula E = mc²). The ITER project models fusion processes so that they can be used to produce commercially available energy. To achieve that, complex states need to be modeled.
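As a rough back-of-the-envelope illustration of the energy scale involved (our own arithmetic, not a figure quoted from the ITER project), converting even a tiny mass defect yields enormous energy:

```python
# Rest-mass energy released by a mass defect, via E = m * c^2.
# The 1-gram mass defect is an illustrative choice, not a value from this work.

C = 299_792_458.0  # speed of light in vacuum, m/s

def mass_to_energy(mass_kg):
    """Return the rest-mass energy in joules for the given mass in kilograms."""
    return mass_kg * C ** 2

energy_j = mass_to_energy(0.001)  # a 1-gram mass defect
print(f"{energy_j:.2e} J")  # about 8.99e13 J, i.e. roughly 20 kilotons of TNT
```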

An example of such a state is an equilibrium describing the reference plasma state and a series of equilibria varying in the edge current and edge pressure profiles. A stability analysis needs to be performed on each equilibrium in order to show the stable region and the instabilities at its boundaries [1].

Computational Biology

The last example of a multiscale problem presented in this Chapter is the modeling of the bile acid and xenobiotic system, performed as a part of the NucSys program [8].

An analysis of the interactions between several components of a biological system may help to understand the system as a whole. Efforts have been made to model processes at different scales, ranging from the molecular to the organ level and from fractions of a second to months.

¹ Tokamak: from the Russian "toroidal'naya kamera s magnitnymi katushkami", a toroidal chamber with magnetic coils.

Figure 4: Range of scales modeled in the fusion multiscale application.

An example of such a multiscale process is the previously mentioned modeling of the bile acid and xenobiotic system (BAXS). It comprises modeling the processes of metabolism, conjugation and modification, and transport phases [1].

The supercomputing capabilities used for modeling this system may result in a better understanding of the model's parameters and of the model's reaction to various experimental conditions.


2.4 Summary

In this Chapter, we introduced multiscale applications, their requirements and examples. Studying the requirements of multiscale applications shows that they need not only supercomputing capabilities to run simulations but also an efficient description language to aid the development, cooperation and reusability of models. Popularization of easy-to-use middleware libraries which enable automatic code-stub creation is also vital.

This Chapter presented various examples of multiscale applications. Concurrent (and/or sequential) simulation of different single-scale models is used in domains as distant from each other as Flow Control and Fusion processes. Frequently used and recently proposed model description languages are described in the subsequent Chapter. A review of the various single-scale model coupling tools is presented in Chapter 4.


3 Model description languages

This Chapter briefly presents various model description languages, their common usage and their collaboration with the MUST tool. In Section 3.1 we focus on the languages used mainly for describing single-scale models. Section 3.2 introduces a new language, MML, the main target of which is the modeling of multiscale applications as a whole (including interactions between single-scale models). In Section 3.3 we compare the previously introduced types of languages.

3.1 SBML and CellML

Both the Systems Biology Markup Language (SBML) [12] and the Cell Markup Language (CellML) [13] focus on the description of the models themselves (i.e. they do not stress describing connections and interactions between models, or simulation details). SBML focuses on modeling physical and chemical phenomena, whereas CellML is primarily used to describe mathematical models of cellular biological function [13]. The physical phenomena are mostly described by differential equations and linear algebra.

SBML

SBML models are hierarchical. An SBML model may contain various child elements: notes, functions, units, compartments, species, parameters, rules, reactions and events. The so-called compartment is, in fact, a single model with a certain spatial and temporal scale defined. Unfortunately, compartments can only be contained in another compartment (i.e. no other relation between compartments is possible). The species sections may represent chemicals used in the model (ranging from simple ions to complex structures like RNA).

Figure 5 shows an exemplary fragment of an SBML file. Example units, compartments and species are defined.

CellML

CellML models are built from a set of smaller components. A component may represent a physical object (e.g. a cell), a physical or chemical reaction, or a simple variable.

    ...
    <model name="EnzymaticReaction">
      <listOfUnitDefinitions>
        <unitDefinition id="per_second">
          <listOfUnits>
            <unit kind="second" exponent="-1"/>
          </listOfUnits>
        </unitDefinition>
        <unitDefinition id="litre_per_mole_per_second">
          <listOfUnits>
            <unit kind="mole" exponent="-1"/>
            <unit kind="litre" exponent="1"/>
            <unit kind="second" exponent="-1"/>
          </listOfUnits>
        </unitDefinition>
      </listOfUnitDefinitions>
      <listOfCompartments>
        <compartment id="cytosol" size="1e-14"/>
      </listOfCompartments>
      <listOfSpecies>
        <species compartment="cytosol" id="ES" initialAmount="0" name="ES"/>
    ...

Figure 5: SBML file fragment example.

Besides components, a CellML model may contain unit, group, connection and import sections. The unit section is used for defining complex units (just like the similar section in SBML). The group section describes physical and logical component relations. There are two predefined relations: physical containment and encapsulation (the latter representing a logical component hierarchy). The connection section is used to connect variables between components (so that a change in a particular model may affect another model). The import section may be utilized to reuse previously defined compartments and units.
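To illustrate the connection section mentioned above: in CellML, variables are mapped between components using map_components and map_variables elements. The sketch below is hand-written for illustration (the "membrane" component name is hypothetical, and namespace attributes are omitted):

```xml
<connection>
  <!-- The two components whose variables are linked. -->
  <map_components component_1="intra_cellular_space"
                  component_2="membrane"/>
  <!-- Map the Ca variable exported by one component into the other. -->
  <map_variables variable_1="Ca" variable_2="Ca"/>
</connection>
```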

Figure 6 shows a fragment of a CellML file. The units are defined in a fashion similar to SBML. An example component with several variables is also shown.

SBML models may be described at a higher level of detail than CellML models. CellML models are more loosely coupled; therefore they allow slightly easier reusability (through the use of components).


...
<Units name="concentration_units">
  <Unit units="mole" prefix="milli"/>
  <Unit units="litre" exponent="-1"/>
</Units>
<Component name="intra_cellular_space">
  <Variable name="Na" units="concentration_units"
            public_interface="out"/>
  <Variable name="Ca" units="concentration_units"
            public_interface="out"/>
  <Variable name="time" units="second"
            public_interface="in"/>
...

Figure 6: CellML file fragment example.

Both SBML and CellML may describe models from different fields at many scales. Both languages were designed for ease of model reuse. Unfortunately, in CellML the only possible relation between models (or components) is inclusion, although connections between components can be defined. On the other hand, SBML does not allow defining relations or connections between models at all. Therefore, SBML and CellML are tools suited for describing a single scale model rather than a set of models at different scales.

There are many tools which support the creation (OpenCell, E-Cell, CellML Viewer) and/or simulation (JSim, which supports both CellML and SBML) of CellML and SBML models. Large repositories² of both CellML and SBML models exist.

3.2 MML and CxA

Multiscale Modeling Language (MML) [14, 15, 16] is a concept language proposed as a part of the Mapper project. MML can be used to describe models similarly to SBML and CellML. Furthermore, MML can describe the coupling between single scale models (including relations between computational domains and scales), and the types of coupling (coarse graining, scale

² Such as http://physiome.org, http://models.cellml.org, http://e-cell.org/


splitting, amplification). MML models can be described in the xMML format (XML Multiscale Modeling Language) and depicted using the graphical representation of MML: gMML (Graphical Multiscale Modeling Language). MML accurately describes the connections between models. Various types of transferred information are distinguished (e.g. information sent during or after computation, initial conditions, information updating the domains or boundaries, information regarding scales, etc.).

Single scale models in MML are standalone and were designed for ease of reuse. MML introduces filters: they can be used as links between models and perform the required transformations of the exchanged data (including change of scale, interpolation or decimation).

MML also describes mappers: entities controlling the flow of information between models. A mapper gathers information from all connected models, then processes and combines it as required and, finally, sends it to the receiving models.
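As a toy illustration of these two concepts (plain Python, not an actual MML or MUSCLE API; all names are invented for the example), a filter that decimates fine-scale data and a fan-in mapper that combines values from two submodels could look like:

```python
def decimation_filter(samples, factor):
    """A toy MML-style filter: coarse-grain data by keeping every n-th sample."""
    return samples[::factor]

def mapper(inputs):
    """A toy fan-in mapper: gather values from all connected submodels,
    combine them (here: average) and return the value sent to receivers."""
    values = list(inputs.values())
    return sum(values) / len(values)

# Fine-scale output of one submodel, decimated before entering the mapper.
fine = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
coarse = decimation_filter(fine, 4)

combined = mapper({"submodel_a": coarse[-1], "submodel_b": 0.6})
print(coarse)    # [0.0, 0.4]
print(combined)  # 0.5
```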

Figure 7: An example Coupling Diagram showing the ISR application (described in Section 2.3); based on [1].

MML models may be easily depicted using gMML. There are two kinds of diagrams in gMML: Scale Specification Maps (SSM) and Coupling Diagrams. An SSM illustrates the coupling of different scale models and is suited for depicting models with well separated scales. A Coupling Diagram, which shows the interactions between single scale models, is best suited for depicting more


complex multiscale applications. It may contain all the elements of MML (including filters, mappers and the flow of information).

Figure 7 shows an example of a Coupling Diagram. Single scale submodels are shown as rectangular boxes. Couplings between submodels are shown as differently styled connectors. Filled ends of a connector are attached to the submodels sending data and empty ends are attached to the submodels receiving data. Connectors may be labeled to describe the data transferred between models. The different connector types are described in detail in [14].

# set physical properties
cxa.env["kin_viscosity[m2/s]"] = 4E-6
cxa.env["U_max[m/s]"] = 0.121
cxa.env["rho0[kg/m3]"] = 1000
# ...
# declare kernels
cxa.add_kernel('bf', 'kernel.flow3d.FlowTestController')
cxa.add_kernel('smc', 'kernel.smc2d.SMCController')
cxa.add_kernel('smc2bf', 'cxa.cxa3d.smc2bf.ObsArray2IncrementalLists3D')
# ...
# configure connection scheme
cs.attach('smc2bf' => 'bf') {
  tie('StaticSolid', 'BFObsExit')
  tie('NewSolid', 'BFincSolidExit')
  tie('NewFluid', 'BFincFluidExit')
}
# ...

Figure 8: CxA file fragment example.

The xMML format may be used as a base for generating code for coupling libraries such as MUSCLE (described in Section 4.3).

There are several tools using the xMML format, e.g. MAD (Multiscale Application Designer) and MAME (Mapper Memory) [1], developed as a part of the MAPPER project. MAD allows modeling multiscale applications in


a graphical environment. The resulting graphs can be stored as (or loaded from) xMML. Moreover, MAD supports the creation of CxA file stubs. The second tool, MAME, allows storing, sharing and reusing various metadata describing multiscale applications (including mappers, filters and even implementations). MAME can also store xMML application descriptions (so that they can be easily accessed by other MAPPER tools).

CxA: an MML prototype

The Complex Automata theory (CxA) [17] is a methodology for modelling complex multiscale systems. A Complex Automaton is a set of connected Cellular Automata and agent-based models, each of them representing a single-scale simulation. Each automaton may consist of several other automata (hierarchical coupling).

The CxA theory is a base for the MUSCLE framework (described in Section 4.3). Each MUSCLE simulation must contain a configuration file, defining kernels (each representing a single automaton) and the connections between them.

The configuration file itself is written in the Ruby programming language, and is therefore strongly tied to the MUSCLE framework.

Figure 8 shows an example fragment of a CxA file. First, the environmental physical properties are set. Then the computing kernels are declared and a connection scheme between them is configured.

3.3 Comparison

CellML and SBML are primarily used as single scale model description languages, therefore their possible uses as a primary tool for describing a multiscale application are limited. However, they may be the best choice when it comes to modeling a single model contained in the application.

MML is a promising format, although not yet an established standard (as CellML and SBML are in their fields). Tools such as MAME or MAD [1], which use MML, may help to promote the language. If MML earns well-deserved recognition, it may become a leading format for the description of multiscale models. The use of MML would certainly facilitate the design and development of multiscale applications. The proposed MML diagrams are also a tool which may ease the transfer of knowledge about legacy applications.

CxA is a successfully used format, although not a model description language per se. Its tight coupling to the MUSCLE framework greatly limits the possibility of using it with other libraries.


Tools allowing conversion from xMML to a library-specific description format or even code stubs (as proposed in [14]) would certainly help promote MML as a multiscale model description language.

3.4 Summary

In this Chapter, we presented various model description languages. Languages aiming at detailed single scale model descriptions (SBML, CellML) were introduced and described. XML code samples were shown and discussed.

The MML language and the related diagram types, which describe multiscale applications as a whole, were introduced in Section 3.2. We also presented an example of a CxA file and its relation to the MML language and the MUSCLE library.

This Chapter ended with a short comparison of the different types of model description languages.


4 Coupling libraries

This Chapter presents different coupling libraries used for building multiscale applications. In Sections 4.1–4.3 we introduce three example libraries. Section 4.4 compares the previously introduced tools.

4.1 AMUSE

Astrophysical Multipurpose Software Environment (AMUSE) is a framework for large-scale simulations of stellar systems (dynamics, stellar evolution, hydrodynamics, radiative transfer, etc.).

Figure 9: AMUSE architecture overview (based on [18]).

AMUSE is a three-layer framework, the layers being:

1. User Script layer: defines a specific problem and couples two or more codes from the lower layers.

2. AMUSE Code layer: a generic layer providing an object oriented interface on top of the legacy codes.

3. Legacy Code layer: defines interfaces to the legacy codes and contains the existing legacy codes (the actual simulations).


Each higher layer adds functionality to the lower layer.

The AMUSE framework is written in Python with use of the Message Passing Interface (MPI). The first layer user scripts are written in Python. It is possible to integrate legacy C++ or Fortran code with the AMUSE framework.

Figure 9 shows an overview of the AMUSE architecture. Python scripts can be used to access the AMUSE code layer, which in turn enables accessing the underlying codes using MPI.
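The division of labour between the layers can be sketched with toy stand-ins. The classes below are illustrative only: a real AMUSE user script imports actual community codes (which do expose an evolve_model() call) and exchanges data with them over MPI rather than in-process.

```python
class ToyGravityCode:
    """Stand-in for a legacy code wrapped by the AMUSE code layer."""
    def __init__(self):
        self.model_time = 0
        self.positions = [0.0, 1.0]

    def evolve_model(self, t_end):
        # Advance the simulation in unit time steps until t_end.
        while self.model_time < t_end:
            self.model_time += 1
            self.positions = [p + 0.01 for p in self.positions]

class ToyStellarCode:
    def __init__(self):
        self.model_time = 0
        self.mass = 1.0

    def evolve_model(self, t_end):
        while self.model_time < t_end:
            self.model_time += 1
            self.mass *= 0.999          # toy mass loss per step

# User-script layer: advance both codes in lockstep, exchanging state
# between steps (here the stellar mass would feed back into gravity).
gravity, stellar = ToyGravityCode(), ToyStellarCode()
for t in range(1, 6):
    gravity.evolve_model(t)
    stellar.evolve_model(t)

print(gravity.model_time, stellar.mass)  # mass after 5 steps: 0.999**5
```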

4.2 MCT

The Model Coupling Toolkit (MCT) [19, 20] is a library for coupling models to form a parallel coupled model. MCT was built to bring together the parallel submodels which form the Community Climate System Model (CCSM). MCT solves the problems of transferring data between different parallel programs, allowing for efficient data transfer for demanding interpolation algorithms. The MCT library is scalable and high-performing.

Figure 10: MCT usage in the Community Climate System Model (CCSM).

Figure 10 shows the cooperation between MCT and the various submodels forming the CCSM model. MCT is used to transfer significant amounts of data


between submodels.

MCT is available as a Fortran library, but C++ and Python bindings are also available through the external Babel library.
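The kind of task MCT automates can be illustrated by a toy, sequential regridding step. MCT itself performs interpolation in parallel (via sparse matrix operations distributed over processors); the function below is only a conceptual sketch with hypothetical grid values.

```python
def regrid_linear(src_x, src_v, dst_x):
    """Toy sequential analogue of a coupler regrid: linearly interpolate
    field values from a source grid onto a destination grid.
    Both grids are assumed sorted, dst_x within the range of src_x."""
    out = []
    for x in dst_x:
        # find the enclosing source interval and interpolate within it
        for i in range(len(src_x) - 1):
            if src_x[i] <= x <= src_x[i + 1]:
                w = (x - src_x[i]) / (src_x[i + 1] - src_x[i])
                out.append((1 - w) * src_v[i] + w * src_v[i + 1])
                break
    return out

# Coarse "atmosphere" grid -> finer "ocean" grid, hypothetical field values.
atm_x = [0.0, 1.0, 2.0]
atm_v = [10.0, 20.0, 30.0]
ocn_x = [0.0, 0.5, 1.0, 1.5, 2.0]
print(regrid_linear(atm_x, atm_v, ocn_x))  # [10.0, 15.0, 20.0, 25.0, 30.0]
```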

4.3 MUSCLE

Multiscale Coupling Library and Environment (MUSCLE) [21] is a framework for running multiscale simulations. MUSCLE was primarily developed in the COAST project and is currently used in the MAPPER project [4]. MUSCLE allows running simulations based on the Complex Automata theory (described briefly in Section 3.2).

A complex automaton is a group of cellular automata and agent-based models. A MUSCLE simulation is a group of independent kernels: single scale simulations wrapped into controller agents communicating with the core kernel (a plumber). Optional conduit filters may be used to alter the data transferred between kernels. A conduit filter may perform simple transformations such as scale altering or coordinate conversion. A MUSCLE simulation can often be depicted using a Scale Specification Map (Section 3.2).

Figure 11 shows an example MUSCLE environment. Kernel 1 communicates with kernels 2 and 3. A conduit filter is used for altering the data transferred from kernel 1 to kernel 3. A plumber kernel is also depicted.

The MUSCLE framework is written in Ruby and Java, based on the Java Agent DEvelopment framework (JADE) [22].

MUSCLE allows developers to write kernels in Java and (using a supplementary library) in native code (C/C++, Fortran). Each kernel must define the scale it is operating at and the portals (in/out connections) it uses. Kernels can be connected using a configuration file (CxA, Section 3.2).
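This kernel contract (a declared scale plus named in/out portals wired together by the configuration) can be mimicked in a few lines of Python. MUSCLE kernels are really written in Java or native code; the class and attribute names below are illustrative only.

```python
class ToyKernel:
    """A toy analogue of a MUSCLE kernel: a declared scale plus named
    in/out portals (all names here are invented for the example)."""
    def __init__(self, name, scale, in_portals, out_portals):
        self.name = name
        self.scale = scale                        # e.g. the kernel's time step
        self.inbox = {p: [] for p in in_portals}  # data received per in-portal
        self.out_portals = dict.fromkeys(out_portals)  # portal -> (kernel, portal)

    def connect(self, portal, receiver, receiver_portal):
        self.out_portals[portal] = (receiver, receiver_portal)

    def send(self, portal, data):
        receiver, receiver_portal = self.out_portals[portal]
        receiver.inbox[receiver_portal].append(data)

# Mirror the Figure 11 configuration: kernel1 feeds kernel2 and kernel3.
k1 = ToyKernel("kernel1", scale={"dt": 1e-3},
               in_portals=[], out_portals=["OutA", "OutB"])
k2 = ToyKernel("kernel2", scale={"dt": 1e-1},
               in_portals=["InA"], out_portals=[])
k3 = ToyKernel("kernel3", scale={"dt": 1e-1},
               in_portals=["InB"], out_portals=[])
k1.connect("OutA", k2, "InA")
k1.connect("OutB", k3, "InB")

k1.send("OutA", [1.0, 2.0])
print(k2.inbox)  # {'InA': [[1.0, 2.0]]}
```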



(a) A simple MUSCLE environment example.

# declare kernels
cxa.add_kernel('kernel1', 'example.kernel.Kernel1')
cxa.add_kernel('kernel2', 'example.kernel.Kernel2')
cxa.add_kernel('kernel3', 'example.kernel.Kernel3')

# configure connection scheme
cs.attach('kernel1' => 'kernel2') {
  tie('OutA', 'InA')
}
cs.attach('kernel1' => 'kernel3') {
  tie('OutB', 'InB',
      Conduit.new("example.filter.ConduitFilter"))
}

(b) A CxA file fragment.

Figure 11: A MUSCLE environment example with a CxA file fragment describing it.


4.4 Comparison

Table 1 presents a short comparison of the previously described coupling libraries. The AMUSE and MUSCLE frameworks seem to operate on a similar level of abstraction, whereas the MCT library solves lower level problems, more related to parallel programming than to multiscale simulations.

The MUSCLE library is more generic than AMUSE (which is limited to large-scale stellar simulations).

                      AMUSE                    MCT                      MUSCLE

Level                 Multiscale simulations   Parallel programming     Multiscale simulations
Scope                 Stellar simulations      Generic                  Generic
Supported languages   C/C++, Fortran, Python   C/C++, Fortran, Python   C/C++, Fortran, Ruby, Java

Table 1: Coupling libraries comparison.

The three libraries compared allow the developer to use similar sets of programming languages (C, C++ and Fortran for native code). AMUSE and MCT offer Python bindings, whereas MUSCLE kernels can be written in Java and the configuration files in Ruby (as the whole framework is based on these two languages).

The MUSCLE framework seems to be the tool best suited for high level multiscale simulations. On the other hand, the MCT library may be the best choice for parallel applications in which performance is crucial.


4.5 Summary

In this Chapter, we presented three example coupling libraries (AMUSE, MCT and MUSCLE) which can be used to aid the execution of multiscale simulations. After introducing each library, we presented a short comparison of them. The MUST tool presented in this Thesis works with applications based on the MUSCLE framework. The MUSCLE framework was chosen as the tool best suited for running multiscale applications because it operates on a high level of abstraction (contrary to the MCT toolkit) and is not a problem-specialized tool (as AMUSE is). Its close relationship to the MML language was also an advantage, as MML aims at describing single scale models and the interactions between them (i.e. whole multiscale applications). Section 3.2 describes MML in detail.


5 Infrastructures

This Chapter presents the grid and cloud infrastructures. Various definitions of the grid are quoted in Section 5.1, which also contains a list of the most notable grid characteristics. Section 5.2 describes cloud computing: it introduces its definitions and characteristics. Section 5.3 compares the grid and cloud infrastructures, highlighting the differences and similarities between them. Aspects such as the programming model, computing model, usability and standardization are compared in detail. Furthermore, the grid and cloud architectures are compared.

5.1 Grid

The term grid computing was first used in Ian Foster and Carl Kesselman's work The Grid: Blueprint for a New Computing Infrastructure [23]. It is a metaphor for computing power being as common and easy to access as public utilities such as electricity and water. Since then, many grid definitions have emerged.

Definitions

Ian Foster, in his 2002 work What is the Grid? A Three Point Checklist [24], defines a grid as a system that:

1. coordinates resources that are not subject to centralized control,

2. using standard, open, general-purpose protocols and interfaces,

3. to deliver nontrivial qualities of service.

In his definition, Foster highlights the grid's independence from any centralized form of control. A grid system serves many different types of users oriented to various types of resources. A grid system should also use standard protocols and interfaces to perform such fundamental tasks as user authentication, application of local and global security policies, and resource management. These qualities should allow the grid system to deliver a service of higher quality than the sum of its parts (e.g. meaning the automation of inter-system communication, a single sign-on policy, lower response times, and higher availability and security levels).

Vaidy Sunderam proposes the following definition of a grid system [25]:

• A grid system is a paradigm/infrastructure that enables the sharing, selection and aggregation of geographically distributed resources (computers, software, data/databases, people)


• depending on availability, capability, cost and QoS requirements

• for solving large-scale problems/applications

• within virtual organizations.

Moreover, according to Vaidy Sunderam, a grid system is NOT:

• the next generation Internet,

• a new operating system,

• just (a) a way to exploit unused cycles, (b) a new mode of parallel computing, or (c) a new mode of P2P networking.

Sunderam emphasizes the geographic distribution of resources and their structured arrangement within Virtual Organizations. A Virtual Organization (VO) is a logical set of groups/institutions created to solve a common problem using grid resources. Resource sharing, access restrictions, security policies, etc. may be applied on the level of a Virtual Organization.

In [23], yet another grid definition is proposed by Foster and Kesselman: "A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities." This concise definition stresses the high quality of grid services and the large scale computing capabilities of the grid.

A common vision of a grid system emerges from all these definitions. It is a dependable infrastructure, offering the highest computing (and not only computing) capabilities through the common effort of many geographically spread institutions. Below, we present the main differences between the grid and a traditional distributed system or a computing cluster.

Grid Characteristics

The most significant grid characteristics which distinguish the grid from other distributed systems and computing clusters are briefly described below:

• Grid systems bring together heterogeneous resources (as opposed to a distributed system) from different physical locations. This forces cooperation over Wide Area Networks (which is rarely the case in computing clusters).

• Cooperation of numerous Virtual Organizations allows solving higher scale problems. Interoperability between VOs (intra-grid) and between


grid systems (inter-grid) is assured by common open protocols and interfaces (few to no closed proprietary solutions are used).

• Grid systems are more generic and offer diversified resources, as opposed to distributed systems and clusters, which are often computing power-oriented.

Grid systems are generic tools with wide problem-solving applicability. Although the use of grid-specific tools is often needed, once they are employed it is easy to migrate between different grid systems thanks to the unified architecture.

5.2 Cloud

Cloud computing is a metaphor for the abstraction of the actual resources it represents. In general, cloud computing is associated with scalable computational resources available on a pay-per-use basis. There are three basic types of services exposed by clouds.

1. Infrastructure as a Service (IaaS): basic services (such as computational power and data storage) are made available to customers through abstraction and virtualization (i.e. computational power is abstracted to virtual machine instances, storage to filesystems). Amazon Web Services (described further in Section 6.3) are probably the largest cloud services offered in this model.

2. Platform as a Service (PaaS) adds an extra abstraction level to IaaS. The service provided is an actual platform on which customer applications may run. This model hides unnecessary details from the customer (i.e. scaling of the application may be transparent to the customer's developers). An example of this model is the Google App Engine [28].

3. Software as a Service (SaaS) exposes to users applications running in cloud systems. Such applications may be accessed from devices with little computational power (such as mobile devices and PDAs). Google Docs [29] is an example of SaaS. Although this application does not require large computing power (so it could run on a PC), it takes advantage of being hosted in the cloud differently: by enabling document sharing between users (such functionality is more natural since the whole application runs in the cloud).


Clouds may expose services on slightly different levels (theoretically, SaaS could be built on PaaS, which could be deployed to IaaS), but there are definitions which try to find the intersection between these kinds of services. The definition proposed in [27] is quoted below:

"Clouds are a large pool of easily usable and accessible virtualized resources (such as hardware, development platforms and/or services). These resources can be dynamically reconfigured to adjust to a variable load (scale), allowing also for an optimum resource utilization. This pool of resources is typically exploited by a pay-per-use model in which guarantees are offered by the Infrastructure Provider by means of customized Service Level Agreements."

This definition includes the key cloud features: virtualization, dynamic scalability, a pay-per-use (utility-like) model and the application of SLAs. Other characteristics of the cloud mentioned in [27] are: user-friendliness, availability via the Internet, variety of resources, automatic adaptation and resource optimization.

Some of these features (especially the variety of resources, automatic adaptation and resource optimization) could also characterize grids. On the other hand, user-friendliness does not seem to be a feature of grids. Doerksen in [30] even goes as far as saying that "cloud computing is ... the user-friendly version of grid computing", which is an obvious exaggeration.

5.3 Comparison

In the 1990s, Grids were thought of as a technology which would allow consumers to use computing capabilities as a public utility (the term Grid itself comes from a comparison to the electrical grid). The growth of Grids was motivated by a good cause: to provide high computing capabilities, transparent to everyone, using open protocols and interfaces.

Two decades later, supported by the multi-billion dollar budgets of companies like Amazon, Google and Microsoft, Clouds emerged. The aim of those companies was slightly different: to bring computing power to the customers. Clouds use mostly proprietary solutions; they offer simple APIs, but there is little to no support for interoperability between different providers.

Both the Grid and the Cloud serve a similar purpose: to provide large scale computing capabilities for end-users. Table 2 shows a short comparison of the Grid and Cloud infrastructures in a few different fields.


Definitions

Grid: In 2002, Ian Foster created a short checklist defining a Grid: (1) Grid resources are not subject to centralized control; (2) the Grid uses standard, open, general-purpose protocols and interfaces; (3) the Grid provides non-trivial qualities of service.

Cloud: A corresponding checklist describing a Cloud: (1) Cloud resources are often governed by a single organization; (2) proprietary solutions are used, and no interoperability between providers has been established so far; (3) the Cloud provides non-trivial qualities of service.

Business Model

Grid: Resources (i.e. CPU hours) are assigned per project. Virtual Organizations use resource trading.

Cloud: End-users pay for the resources consumed (a payment model similar to that of basic utilities such as electricity, water, etc.).

Compute Model

Grid: Resources are governed by a queuing system (such as PBS). Interactive tasks are rarely supported. However, efforts have been made to lower the latencies to resources.

Cloud: Resources are shared by all users at the same time. This model allows latency-sensitive applications to operate. QoS may be difficult to preserve when Clouds grow in scale and number of users.

Data Model

Grid and Cloud: As the amounts of data to process grow exponentially, data transfer becomes the bottleneck of large scale computing. This is a concern for Grids as well as Clouds. Efforts have been made to ensure that data is processed in the nearest available location. Nevertheless, assuring high QoS in such scenarios seems to be a challenge yet to be faced by both Grids and Clouds.

Virtualization

Grid: Grids do not use virtualization as often as Clouds (although some VOs may do so). However, there are technologies, such as Nimbus, which provide virtualization in the Grid. Its main purpose is to create virtual workspaces in separation from the physical resources.

Cloud: Virtualization in the Cloud is used for separating the service from the physical infrastructure. It also helps to provide Service Level Agreements, as virtualized resources are easier to bring back up in case of failure.

Programming Model

Grid: Grid programming does not differ much from standard distributed programming, although there are some extra factors which have to be taken into consideration, such as restricted access to resources, heterogeneous resources and difficult exception handling.

Cloud: Clouds (e.g. Amazon Web Services, Microsoft's Azure) generally offer pre-defined web service APIs (although the integration of applications between different service providers is not a trivial task).

Security Model

Grid: Security has always been a fundamental part of Grid design, as a Grid spans multiple organizations, each applying its own administration policy. Grids support single sign-on, delegation (a user's program may inherit the user's access rights) and privacy. User credentials required for login are never transported in an insecure way (i.e. emailed); this policy is time-consuming but ensures a high level of security.

Cloud: Clouds generally use a slightly lower level of security than Grids. All user credentials may be created/changed online, which may present a potential risk (e.g. emailed passwords). It is in the Cloud service providers' interest to assure the customers that their data is treated in a Grid-like manner (i.e. redundant data storage, knowledge of the physical localization of data, restricted access to data, etc.).

Usability

Grid: Although they have aimed at usability from the beginning, Grids are still not very user-friendly. The developer has to be aware of many middleware tools necessary for using the grid. Applications require a special design to be grid-runnable.

Cloud: Clouds have been designed for usability. No special changes or design are required for applications to run in the cloud.

Standardization

Grid: Grids are based on open protocols and interfaces. Interoperability has always been a major concern in grid design. Well defined standards on every level of communication help to expand, upgrade and maintain Grids.

Cloud: Clouds are mostly proprietary and their inner mechanisms are closed to the public. The lack of standards for inner APIs and, for example, virtual machine image formats is an inhibitory factor for the global expansion of Clouds.

Typical Usage

Grid: Grids are typically used for demanding batch jobs (spanning hundreds to thousands of nodes) requiring little to no user interaction.

Cloud: Clouds are usually used for exposing scalable services to the outside.

Table 2: Grid and Cloud comparison (based on [26, 27])

Figure 12 shows a comparison of the Grid and Cloud architectures. The architectures may seem similar, but there are many differences.

(a) Grid architecture overview: the Application, Collective, Resource, Connectivity and Fabric layers.

(b) Cloud architecture overview: the Application, Platform, Unified Resource and Fabric layers.

Figure 12: Grid and Cloud architectures comparison (based on [26]).

The Grid's fabric layer provides access to the underlying resources, including not only computing power and storage devices but also code repositories, computed data and organization specific equipment. Resource managers (such as PBS) reside in the fabric layer. The connectivity layer above it ensures stable and secure communication. The resource layer is responsible for managing and restricting/providing access to individual resources. The application layer uses the Grid resources provided by the lower layers' APIs.

The Cloud's fabric layer represents the actual physical resources used by cloud services (most notably, the compute and storage resources). The unified resource layer above it exposes and encapsulates the fundamental resources (e.g. in the form of a virtual machine or a file system). The higher platform layer adds the necessary middleware and APIs to the lower layers, so that they can be accessed from the outside. The topmost application layer contains the actual applications running in the cloud.

The Grid's layers seem to be better separated and more specialized than those of the Cloud. Each of them has a single responsibility. This may be the result of the well defined protocols and interfaces of the Grid.

5.4 Summary

In this Chapter, we presented grid and cloud computing. Firstly, various grid and cloud definitions were quoted and, secondly, the characteristics of the Grid and the Cloud were specified. We also described the three basic types of services exposed by cloud computing (Infrastructure as a Service, Platform as a Service and Software as a Service).

Later, grid and cloud computing were compared in detail, with various aspects taken into consideration, including the programming model, computing model and standardization of both infrastructures. We also compared their layered architectures separately.


6 Accessing infrastructures

This Chapter presents middleware tools used for accessing the grid and cloud infrastructures described in the previous Chapter. Section 6.1 describes a queuing system used to access local PL-Grid resources (on the example of PBS, the Portable Batch System). Section 6.2 introduces the GridSpace virtual laboratory: a high level framework allowing access to Grid-based resources. Section 6.3 describes an API used to access cloud computing, storage and messaging resources (on the example of the Amazon Web Services API).

6.1 Local resources: the Portable Batch System

One of the main requirements for the MUST tool was integration with locally available PL-Grid resources. Those resources can be accessed using a PBS queue.

The Portable Batch System (PBS) [31] is a tool which allows job scheduling on distributed resources (most notably, cluster environments). PBS consists of both server and client side components. The server component manages queues and jobs, while the client components are mostly the commands allowing the user to handle batch jobs.

The most important PBS features are:

• running tasks serially or concurrently,

• a scheduler which supports priorities and time restrictions,

• support for policies restricting access to certain resources for various jobs.
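A PBS batch job is described by a shell script whose #PBS directive lines declare the job's name and resource requirements. As a sketch, the helper below assembles such a script programmatically and submits it only if a PBS client is present; the directive names (-N, -l, -j) are standard PBS, while the job name, resource values and command are hypothetical.

```python
import shutil
import subprocess

def make_pbs_script(job_name, nodes, ppn, walltime, command):
    """Compose a minimal PBS batch script as a string."""
    return "\n".join([
        "#!/bin/bash",
        f"#PBS -N {job_name}",               # job name
        f"#PBS -l nodes={nodes}:ppn={ppn}",  # nodes and processors per node
        f"#PBS -l walltime={walltime}",      # maximum wall-clock run time
        "#PBS -j oe",                        # merge stdout and stderr
        "cd $PBS_O_WORKDIR",                 # start in the submission directory
        command,
    ]) + "\n"

script = make_pbs_script("muscle-sim", nodes=2, ppn=4,
                         walltime="01:00:00", command="./run_simulation")
print(script)

# Submit the job only if a PBS client (qsub) is actually installed.
if shutil.which("qsub"):
    subprocess.run(["qsub"], input=script, text=True)
```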

There are tools which use PBS and expose a higher level API, such as the grid middleware PBS-Globus interface [32]. It allows running jobs on grid-based resources without using grid-specific job scheduling interfaces.

The GridSpace Virtual Laboratory allows using PBS on the PL-Grid resources through language specific gems (described further in Section 6.2).

Other examples of queuing systems include OGE (Oracle Grid Engine, previously Sun Grid Engine)[42] and LSF (Load Sharing Facility)[43].

6.2 Grid resources: GridSpace

Integration with the GridSpace virtual laboratory was another vital requirement for the MUST tool.


The GridSpace Virtual Laboratory (GS) [44] is a high-level framework which improves and facilitates the usage of Grid-based resources. It allows scientists to develop, share and execute so-called virtual experiments. A GridSpace experiment is a set of scripts (each script is called a snippet; available languages include Python, Ruby, Perl and Bash) which can be run on the provided HPC resources.

Figure 13: GridSpace architecture overview (based on [44]): the Experiment Workbench layer, the Experiment Execution layer, the Gem layer and the Grid Fabric layer.

GridSpace has been developed with high usability in mind. The main GS advantages include:

• High-level experiments can be prepared by scientists in isolation from the grid middleware layer.

• All GridSpace features are accessible through a simple to use yet powerful Web 2.0 portal.

• Usage of the Single Sign-On authentication mode (e.g. the PL-Grid installation uses the PL-Grid LDAP directory), so that a logged-in user may use the grid resources without any further authentication.

• Support for collaborative work (sharing experiments) with secure handling of sensitive data such as user credentials or certificates.

Figure 13 shows a simplified overview of the GridSpace layered architecture. The Experiment Workbench (a web-browser-accessible portal) layer


includes, inter alia, the Experiment Console and the Credential Manager. The Experiment Execution Layer (EE) contains a plan of the experiment execution and various script interpreters. The EE layer calls the lower gem layer. Gems are small libraries providing APIs for the lower-level resources in a particular scripting language.

Examples of grid fabric middleware accessible through the gem layer include PBS[31], gLite[33], Unicore[34] and QosCosGrid[35].

The role of GridSpace is similar to that of workflow systems (e.g. Swift[36], Kepler[37], Taverna[38], Triana[39], MyExperiment[40], Pegasus[41], etc.).

6.3 Cloud resources: Amazon Web Services

The Amazon Web Services (AWS) based cloud infrastructure was chosen as the second execution environment for the MUST tool. Ease of access, wide use and the availability of many middleware tools providing access to AWS were the main reasons behind the choice of this cloud environment. A wide range of services was another significant advantage.

Amazon provides a web-services-based platform allowing users to use the cloud infrastructure. All AWS are accessible through a SOAP API which serves as a base for APIs in many programming languages (including Ruby's RightAWS library[50], the AWS Java Library or the AWS .NET Library). The most important web services are briefly described below.

6.3.1 Elastic Compute Cloud

Amazon Elastic Compute Cloud (EC2)[45] is the core web service providing scalable computing capabilities. A simple API allows developers to run and manage numerous virtual machine instances (instance types differing in memory, CPU and hard drive configuration are available). Various operating systems (including Microsoft Windows Server and many distributions of Linux) may be used. Virtual machine images with preinstalled specialist software (databases, web hosting software, etc.) are also available.

A virtual machine (called an instance) may be started by means of a SOAP API. It can be accessed via SSH. EC2 has been designed for easy use with other AWS, such as Simple Storage Service (S3), Relational Database Service (RDS) or Simple Queue Service (SQS).
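Besides SOAP, EC2 also exposes a Query (REST-style) API in which a call is essentially a signed set of request parameters. The sketch below builds the parameter set of a `RunInstances` request; the image identifier and instance type are placeholders, and request signing and transport are omitted, so this is only an illustration of the call's shape, not a working client.

```python
def run_instances_params(image_id, instance_type, count=1):
    """Build the parameters of an EC2 Query API RunInstances request.

    Signing and HTTP transport are deliberately omitted; this only
    illustrates which parameters describe the requested instances.
    """
    return {
        "Action": "RunInstances",
        "ImageId": image_id,            # which machine image to boot
        "MinCount": str(count),         # launch at least this many instances...
        "MaxCount": str(count),         # ...and at most this many
        "InstanceType": instance_type,  # memory/CPU/disk profile
    }

params = run_instances_params("ami-12345678", "m1.small")
print(params["Action"])  # → RunInstances
```

Once an instance reported by such a call is running, it can be reached over SSH as described above.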

6.3.2 Simple Storage Service

Amazon Simple Storage Service (S3)[46] allows users to store data online. Stored data may be public or private. Data is stored in so-called buckets. A


bucket is accessible through various authentication methods, while an object within a bucket is identified by a unique name. Each object may contain up to 5 terabytes of data. Standard REST and SOAP APIs are available for data manipulation. The HTTP protocol is the default method of data transfer, but a BitTorrent interface may also be used. The data is stored redundantly in a few locations to minimize the probability of data loss.

6.3.3 Elastic Block Storage

Amazon Elastic Block Storage (EBS)[47] provides storage volumes which can be used with running EC2 instances. Typical uses of such volumes include databases, file systems or raw block-level storage. The state of an EBS volume persists while it is detached from a running EC2 instance; redundant copies of the data are created automatically (transparently to the end user). It is possible to create a so-called snapshot of an EBS volume, which may serve as a backup copy or as the source for a new EBS volume. Amazon hosts public data sets which may be used as EBS volumes (examples of data available in such sets include the Annotated Human Genome Data, US Census Databases, the Freebase Data Dump, etc. [48]).

6.3.4 Simple Queue Service

Amazon Simple Queue Service (SQS)[49] is a simple distributed queue messaging service. SQS may be used to build a workflow whose elements may reside on different networks, use different technologies and not even run at the same time. SQS allows creating, reading and deleting messages. A read message becomes locked (i.e. not visible) for other machines processing messages from the same queue. If processing fails, the message is unlocked again. Otherwise, the message may be deleted so that it is not processed multiple times.
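The lock-and-delete cycle described above can be modelled in a few lines. The class below is an in-memory sketch of SQS visibility-timeout semantics (illustrative only; it is not the real SQS API, and the class and method names are assumptions made for this example):

```python
import time

class QueueModel:
    """In-memory sketch of the SQS receive/lock/delete cycle."""

    def __init__(self, visibility_timeout=30.0):
        self.visibility_timeout = visibility_timeout
        self.messages = {}      # message id -> body
        self.locked_until = {}  # message id -> time when the lock expires
        self.next_id = 0

    def send(self, body):
        self.next_id += 1
        self.messages[self.next_id] = body
        return self.next_id

    def receive(self, now=None):
        """Return the first visible message and lock it; None if all are locked."""
        now = time.time() if now is None else now
        for mid, body in self.messages.items():
            if self.locked_until.get(mid, 0.0) <= now:
                # Lock the message: invisible to other readers until the timeout.
                self.locked_until[mid] = now + self.visibility_timeout
                return mid, body
        return None

    def delete(self, mid):
        """Remove a successfully processed message so it is not handled again."""
        self.messages.pop(mid, None)
        self.locked_until.pop(mid, None)

q = QueueModel(visibility_timeout=30.0)
mid = q.send("job-1")
print(q.receive(now=0.0))   # locks the message
print(q.receive(now=1.0))   # still locked for other readers: None
print(q.receive(now=31.0))  # lock expired (processing failed): visible again
q.delete(mid)
print(q.receive(now=32.0))  # deleted after successful processing: None
```

In the real service the lock corresponds to the queue's visibility timeout, after which an unprocessed (not deleted) message automatically becomes visible again.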

SQS queues may be shared between applications (i.e. multiple applications may have access to the same queue), but access to the queues is nevertheless restricted and requires the use of one of several authentication methods (unless the queue is accessible anonymously).

6.4 Summary

This Chapter presented the different middleware tools, operating on various levels, used by MUST to access and manage resources provided by the Grid and the Cloud infrastructures.


The PBS queuing system, which is used by the MUST tool to execute applications on the local PL-Grid infrastructure, was described in Section 6.1.

The GridSpace virtual laboratory, whose web interface allows the usage of the MUST tool, was introduced in Section 6.2. Configuration files, startup arguments and input sandboxes may be defined from the GS level. While running in the Grid environment, MUST employs the previously mentioned PBS pbsdsh command to allocate nodes and run a slave script on each of them.

The Amazon Web Services API described in Section 6.3 is used to communicate with AWS from the MUST sender script level.

Detailed information regarding the usage of the mentioned tools is provided in subsequent chapters.

