
Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie (AGH University of Science and Technology in Kraków)
Faculty of Electrical Engineering, Automatics, Computer Science and Electronics
Department of Computer Science

Szacowanie efektywności aplikacji gridowych z wykorzystaniem czasowych kolorowanych sieci Petriego
(Estimation of the Efficiency of Grid Applications Using Timed Colored Petri Nets)

Doctoral dissertation

Author: mgr inż. Wojciech Rząsa
Supervisor: prof. dr hab. inż. Edward Nawarecki

Kraków, April 2011

AGH University of Science and Technology
Faculty of Electrical Engineering, Automatics, Computer Science and Electronics
Department of Computer Science

Timed Colored Petri Net Based Estimation of Efficiency of the Grid Applications

PhD Dissertation

Author: mgr inż. Wojciech Rząsa
Supervisor: prof. dr hab. inż. Edward Nawarecki

Kraków, April 2011

Acknowledgments

I would like to thank my supervisor, prof. Edward Nawarecki, for his support and valuable remarks that helped me develop my research and improve this dissertation.

I express my special gratitude to dr Marian Bubak, who consulted on this research, spent a lot of time and effort helping me improve this work, and whose kindness and scientific advice I could always count on. Without his invaluable help and support this research would not have been possible.

My gratitude goes to prof. Marian Wysocki for his interest, valuable consultations and suggestions that helped me develop my research.

I would like to thank prof. Leszek Trybus for his continuous support and his effort to ensure conditions for scientific work in the Division of Informatics and Control at Rzeszow University of Technology.

I am grateful to prof. Jacek Kluska for his valuable remarks concerning the concepts of this research at its beginning.

I would like to acknowledge the valuable help of dr Sławomir Samolej, who helped me become familiar with Petri nets, patiently answered my questions and was always ready to offer his advice and help.

I express my gratitude to my dad, Jacek Rząsa, who was the first consultant of my concepts and who spent his time and effort helping me improve the elaboration of the experimental and simulation results.

I owe my thanks to Bartosz Baliś, Tomasz Szepieniec and my colleagues from the Division of Informatics and Control at Rzeszow University of Technology, with special gratitude to Dariusz Rzońca for all the discussions and valuable remarks.

Warm thanks to my wife and daughter for their patience and to my whole family for their support and help.

Polish Summary (Polskie streszczenie)

Since the idea of the Grid emerged, loosely coupled, highly distributed applications that run on heterogeneous resources and communicate over the Internet have become increasingly popular. The development of the grid computing concept can be expected to give access to high-quality resources not only to scientists but also to the general public. However, making use of computing resources requires efficient software, and designing and implementing high-quality software for such a demanding environment is a complex task. For this reason, tools are needed that are intended for application developers and support them in solving new problems.

The research described in this dissertation concerns a method intended for application developers that facilitates estimating the efficiency of applications designed for a distributed environment, in particular for the Grid. The method lets developers describe the concept of an application and of the resources it uses at a high level of abstraction, which enables convenient analysis, if necessary already at an early design stage. Such a model is then automatically transformed into the formalism of Timed Colored Petri Nets (TCPN), which allows a reliable representation of the actions that occur in distributed systems. The automatic transformation of models is performed without the involvement of application developers, so they do not need to know the formal analysis method, which may be unintuitive or inconvenient for them. The accuracy of the estimation is ensured by the Petri net formalism, supported by external, replaceable modules that model problems related to data transmission in computer networks.

The result is a flexible and convenient analysis method for developers and designers of distributed applications, based on simulation performed with Petri nets. The method fills the gap between application developers, who are often not interested in investing their time and effort in analyzing their designs with formal methods, and the formalism of Timed Colored Petri Nets, which can provide significant support at various stages of application development. The reliability of the method was verified on a prototype implementation by comparing the results obtained in simulation with the results of real experiments. The comparison showed that the method can provide valuable information about the efficiency of the analyzed application.

The main contribution of the described research is the simulation method described above, which combines the advantages of analysis based on a formalism supported by specialized modules with the convenience of rapid modeling at a high level of abstraction. The dissertation also describes solutions to several well-defined scientific and technical problems that were encountered during the research and that may be useful in other work.

Abstract

Since the idea of the Grid emerged, loosely coupled, highly distributed applications running on heterogeneous resources connected by Internet links have become more and more popular. The concept promises exceptionally efficient access to high-performance resources not only for scientists but also for the general public. However, efficient software is required to make the resources widely available. Designing and developing high-quality software for such a demanding environment is a challenging issue. Developer-oriented tools are therefore necessary to support the design and development process and to help developers handle the new challenges.

The work described in this dissertation concerns a developer-oriented method that facilitates estimating the efficiency of applications designed for a distributed environment, with special attention paid to the Grid. The method enables developers to describe their concepts of applications and of the exploited resources at a high level of abstraction, enabling convenient analysis, if necessary at early stages of development. The model is then automatically transformed into the formalism of Timed Colored Petri Nets (TCPN), which is capable of reliably reflecting the activities that occur in distributed systems. The automatic transformation does not involve application developers; consequently, they need not be aware of the formal analysis methods, which may not be intuitive for them. Precision of estimation is ensured by the TCPN formalism, supported by specialized external modules that model network-related issues and can be easily swapped. The result is a flexible and convenient analysis method for developers of distributed applications, based on discrete-event simulation of a Petri net.

The method fills a gap between application developers, who are frequently unwilling to invest their time and effort in formalism-based analysis of their designs, and the formalism of Timed Colored Petri Nets, which is capable of providing valuable support at different stages of software development. The reliability of the method was verified on a prototype implementation by comparing results obtained from simulations with those from real-world experiments. The precision of the results showed that the method is capable of providing valuable estimations.

The main contribution of this research is the simulation method described above, which combines the virtues of analysis based on a formalism and supported by specialized modules with convenience and rapid modeling. Additionally, the thesis describes solutions for several well-defined scientific and technical problems that were encountered during the research and may be useful in other work. These include: a basis for research concerning a complete model of application and resources that is convenient for developers; a modular design of a discrete-event simulator, combined with the concept of hybrid simulation, which ensures flexibility and precision of the solution; the incorporation of TCPN into an application implemented in a general-purpose programming language, which makes it possible to improve software quality through the reliability of the formalism; and an interconnection between TCPN and another discrete-event simulator, which allows different scopes of a problem to be analyzed with specialized solutions in a single simulator.

Contents

1 Research Scope and Roadmap
  1.1 Motivation for Analysis of the Grid Applications
  1.2 Distributed Computing Infrastructures in Europe
  1.3 The Goal and the Hypothesis of the Research
  1.4 User Requirements
    1.4.1 Pass Over the Grid Management Systems
    1.4.2 Convenient Modeling
    1.4.3 Reliable Results
    1.4.4 Wide Spectrum of Analyzed Applications
  1.5 Scientific Problems
    1.5.1 The High-level Model
    1.5.2 Reliable Analysis
    1.5.3 Contradictions Between Requirements
  1.6 Summary of the Goals and the Requirements
  1.7 Research Roadmap
    1.7.1 Evolution of the Concept
    1.7.2 The Roadmap
  1.8 Organization of This Thesis

2 Related Work
  2.1 Performance Prediction in the Grid
  2.2 Performance Related Support for the Grid Application Developers
  2.3 Simulation
    2.3.1 Simulation in the Grid
    2.3.2 Simulation of non-Grid Distributed Applications
  2.4 Simulation in Developer Oriented Performance Prediction
    2.4.1 DIMEMAS
    2.4.2 Performance Prophet
    2.4.3 POEMS
  2.5 Estimation of TCP Transmission Efficiency
    2.5.1 TCP Characteristics
    2.5.2 TCP Modeling and Analysis
    2.5.3 TCP Models in the Grid
  2.6 Summary

3 Petri Nets: Modeling Language for Distributed Applications
  3.1 Low Level Nets
    3.1.1 Definition and Representation
    3.1.2 The Other Naming Conventions and Net Classes
    3.1.3 Modeling Basics
    3.1.4 Behavioral Properties
    3.1.5 Simulation
    3.1.6 Analysis Methods
    3.1.7 Modeling, Analysis and Simulation
  3.2 Colored Petri Nets
    3.2.1 Definition
    3.2.2 Behavior
    3.2.3 Expressiveness and Relation to Low-Level Nets
    3.2.4 Hierarchical Colored Petri Nets
    3.2.5 Properties and Formal Analysis
    3.2.6 Simulation
  3.3 Time in Petri Nets
  3.4 Summary
    3.4.1 The Petri Nets Formalism
    3.4.2 Application in Distributed Applications Modeling

4 Developer-oriented Modeling and Analysis of Distributed Applications
  4.1 Model of Distributed Application and Environment
    4.1.1 Describing Dynamic Parameters of the Model
    4.1.2 Model of Resources
    4.1.3 Model of Distributed Application
    4.1.4 Mapping of Application and Resources
  4.2 Enabling Model-based Analysis of the Applications
    4.2.1 Analysis by Simulation
    4.2.2 Requirements for Reliable Simulation
    4.2.3 Enabling Simulation
  4.3 Formalization of the Model
    4.3.1 Resources
    4.3.2 Application
    4.3.3 Mapping of Application and Resources

5 Enabling Reliable Simulation
  5.1 Architecture of the Simulation Method
  5.2 The Main Part of the Model
    5.2.1 Timed Colored Petri Net Model
    5.2.2 The non-TCPN parts of the Model
  5.3 Network Schedulers
  5.4 Network Transport
    5.4.1 Model of the Connection Establishment Phase
    5.4.2 Interfaces
    5.4.3 Available modules
    5.4.4 State of the TCPN and an External Module

  5.5 Automatic Transformation from High-level Model
    5.5.1 Mapping of Application and Resources
    5.5.2 Model of Resources
    5.5.3 Model of Application
    5.5.4 Summary

6 Simulation Tool: the Basis for Evaluation of the Method
  6.1 Overview of the Tool's Architecture
  6.2 The High-level Model
  6.3 The Executable Model – Timed Colored Petri Net
  6.4 Automatic Transformation – the Model Parser
  6.5 Simulation of the Petri Net Model
  6.6 Expression Interpreter
  6.7 Network Scheduler Modules
  6.8 Network Transport Modules
  6.9 Simulator Output
  6.10 Summary

7 Validation of the Model and the Tool
  7.1 Kinds and Goals of the Experiments
  7.2 Partial Experiments
    7.2.1 Configuration and Environment
    7.2.2 Network Bandwidth
    7.2.3 Network Capacity
    7.2.4 CPU Load
    7.2.5 Conclusions
  7.3 Comprehensive Experiment
    7.3.1 Configuration
    7.3.2 Analysis of the Results
  7.4 Conclusions
  7.5 Possible Future Application of the Method
    7.5.1 Experiments
    7.5.2 Conclusions

8 Summary and Future Work
  8.1 Summary of the Goals and Achievements
    8.1.1 Hypothesis and Goal of the Research
    8.1.2 Responding to the User Requirements
    8.1.3 Solutions for the Scientific Problems
  8.2 Possible Future Research and Development
    8.2.1 High-level Model
    8.2.2 Executable Model
    8.2.3 Output Management Solution
    8.2.4 External Network Modules
    8.2.5 Production-ready Implementation
  8.3 Conclusions

Bibliography

A APIs for External Network Transport Modules
  A.1 Higher-level Interface
  A.2 Lower-level Interface
    A.2.1 Static Methods – Controlling Simulation
    A.2.2 Instance Methods – Controlling Connections

B Document Type Definitions for High-level Model
  B.1 Resources
  B.2 Application
  B.3 Resource Mapping

C Simulation Tool Usage
  C.1 GUI Based Simulation
  C.2 Command Line Simulation

D Test Application
  D.1 Communication Topology
  D.2 Computations
  D.3 Command Line Parameters

List of Figures

3.1 Petri net model of concurrency.
3.2 Petri net model of conflict.
3.3 Petri net model of synchronization.
3.4 Petri net model of concurrent processing.
3.5 Petri net model of a task exploiting a resource with self-loop.
3.6 Petri net model of a task exploiting a resource.
3.7 Petri net model of two concurrent tasks competing for a single resource.
4.1 Example of High-level model of application and resources.
5.1 Conceptual architecture of the simulator.
5.2 Simplified Petri net model of the connection establishment stage of network communication.
5.3 Simplified Petri net model of triggering new data transmissions depending on simulation time.
5.4 Example of data traversal via subsequent layers of sender's and receiver's protocol stack.
5.5 Simplified Petri net model of data transmission over a network connection. In order to facilitate interpretation the main flow of data tokens is marked with bold arcs.
5.6 Simplified Petri net model of the node part of data transmission over a network connection.
5.7 Simplified Petri net model of network transmission.
5.8 Simplified Petri net model of data processing after the data are received from a network connection.
5.9 Simplified Petri net model of generating new data after reception of a data package from a network connection.
5.10 Simplified Petri net model of loading node's CPUs independently from network traffic.
5.11 Interconnection between TCPN based model of application and external model of network protocol on the example of TCP.
5.12 Algorithm of synchronization of clocks and events between TCPN based model of application and external model of network protocol.
6.1 Architecture of the Simulation Tool.
7.1 Topology and configuration of resources for the partial experiments.

7.2 Results of the network bandwidth experiment.
7.3 Results of the network capacity experiment.
7.4 Results of the node3 CPU load experiment.
7.5 Results of the node3 CPU load experiment – limited range.
7.6 Results of the node2 CPU load experiment.
7.7 Results of the node2 CPU load experiment – limited range.
7.8 Results of the CPU load experiment with all CPUs performing computations.
7.9 Results of the CPU load experiment with all CPUs performing computations – limited range.
7.10 Topology and configuration of resources for the Comprehensive Experiment.
7.11 Results of the Comprehensive Experiment.
7.12 Results of the security overhead test.
7.13 Size of the packets and CPU time consumption for different security levels.

List of Tables

2.1 Summary of the related works and tools (examples), part 1.
2.2 Summary of the related works and tools, part 2 – the directly related works.
4.1 Grammar of expressions describing parameters in High-level model.
4.2 Description of symbols available in expressions describing parameters in High-level model.
4.3 Summary of the model of resources from High-level model, its parameters and available symbols.
4.4 Summary of the model of application from High-level model, its parameters and available symbols.
7.1 Results of the network bandwidth experiment.
7.2 Results of the network capacity experiment.
7.3 Results of the node3 CPU load experiment.
7.4 Results of the node2 CPU load experiment.
7.5 Results of the CPU load experiment with all CPUs performing computations.
7.6 Results of the Comprehensive Experiment.
7.7 Security levels used in the experiment.
7.8 Average CPU time for 100-byte packet.
7.9 Results of the simulated DoS attack.
7.10 TCP connections established in one second.

Chapter 1. Research Scope and Roadmap

This chapter introduces the problems discussed in the thesis. It presents the motivation for the research concerning the efficiency of distributed applications, its goal and its hypothesis. The research area is specified by describing the requirements for the analysis method and discussing the scientific problems that result from them. Finally, a roadmap of the research and an outline of the thesis are presented.

The work described in this thesis concerns the analysis and assessment of the efficiency of distributed applications, especially those designed for the Grid environment [29, 30]. This chapter provides introductory information about the research area and about the thesis itself.

1.1 Motivation for Analysis of the Grid Applications

The first motivation for the research concerning analysis of Grid applications came from work on the security of the Grid application monitoring system OCM-G [9, 10]. A security solution was an obvious necessity for a monitoring system capable not only of delivering information but also of manipulating application processes on different Grid sites. The security solution implemented in the OCM-G was based on the Globus Toolkit¹ version 2, which, among other functionality, provided an implementation of the Grid Security Infrastructure² (GSI). The cryptography-based solution ensured the required security level; however, it introduced additional overhead, which in this case particularly affected communication. Experiments verifying the efficiency of the secured OCM-G showed that the overhead was acceptable for the on-line monitoring system but could not be considered negligible [74].

¹ http://www.globus.org
² http://www.globus.org/toolkit/docs/2.4/gsi/

Further research [11] showed that transmission efficiency is not the only thing affected by including cryptography-based security solutions in distributed software. Security of data transmission over a network is ensured by symmetric cryptography, which provides confidentiality, supported by a Message Authentication Code (MAC) computed with a hash algorithm, which ensures integrity [25]. These mechanisms require data processing that consumes noticeable CPU time but should be considered relatively efficient. The other security aspects, e.g. authentication and non-repudiation, together with the establishment of the secure channel that enables the data transfer described above, are handled during the connection establishment phase. This stage of communication exploits asymmetric cryptography algorithms, which deliver the required functionality but also require significantly more computation. During the research described in [11] we observed that a fairly fast server was able to establish only 30 secured connections per second, while the same machine could establish almost 1700 raw TCP connections per second. The conclusion is that the connection establishment phase may not only decrease the efficiency of the whole application if the connection pool is managed wastefully, but also expose it to a Denial of Service (DoS) attack.

Obviously, security is not the only source of communication overhead in distributed applications. In computer science, problems are frequently solved by means of an additional layer of abstraction. If the layer includes communication, it usually requires additional metadata to be transmitted and additional data processing to be performed. Thus the security overhead problem mentioned above is not an isolated case.
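The connection establishment rates quoted above can be turned into a rough per-connection cost. The sketch below is only an illustration: it assumes the server establishes connections one after another, so that the inverse of the rate approximates the time spent on one connection, and the reuse factor `messages_per_connection` is a hypothetical parameter that does not come from the measurements in [11].

```python
# Rates reported in the text (connections established per second on one server).
SECURED_RATE = 30.0     # secured connections per second
RAW_TCP_RATE = 1700.0   # raw TCP connections per second ("almost 1700")


def setup_time_ms(rate_per_second: float) -> float:
    """Average time spent establishing one connection, in milliseconds.

    Assumes connections are established serially, so time = 1 / rate.
    """
    return 1000.0 / rate_per_second


def amortized_setup_ms(rate_per_second: float, messages_per_connection: int) -> float:
    """Setup cost attributed to each message when a connection is reused.

    messages_per_connection is a hypothetical pool-reuse factor.
    """
    if messages_per_connection < 1:
        raise ValueError("messages_per_connection must be at least 1")
    return setup_time_ms(rate_per_second) / messages_per_connection


if __name__ == "__main__":
    print(f"secured setup: {setup_time_ms(SECURED_RATE):.2f} ms per connection")
    print(f"raw TCP setup: {setup_time_ms(RAW_TCP_RATE):.2f} ms per connection")
    print(f"secured / raw cost ratio: {RAW_TCP_RATE / SECURED_RATE:.1f}")
    for reuse in (1, 10, 100):
        cost = amortized_setup_ms(SECURED_RATE, reuse)
        print(f"reuse x{reuse}: {cost:.2f} ms of setup per message")
```

Under these assumptions a secured connection costs roughly 33 ms to establish against well under 1 ms for a raw TCP connection, which is why reusing pooled connections matters far more once security is enabled.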
Web-services related research is an important example for the Grid, since the technology is the basis for OGSA³ and WSRF⁴, included in the Globus Toolkit from version 3 and exploited in a number of Grid projects. In XML-based Web-services communication, data are packed into an "envelope" of XML, and the XML messages need to be generated by the sender and parsed by the receiver. Thus an overhead is introduced for both the volume of transmitted data and data processing. It is especially important for binary data transmissions, because of the encoding required to include them in an XML message.

The earlier research led to the conclusion that it would be useful for a developer to be able to assess the efficiency of solutions designed for an application before it is implemented. One would then be able to verify how implementing the required security level in communication influences its efficiency on a defined infrastructure, or how hardware requirements would change if a more convenient, e.g. Web-services based, architecture were exploited. Estimating efficiency before the actual implementation not only saves unnecessary work but also improves the quality of the software, which is tailored to the needs and requires fewer unforeseen alterations or optimizations.

The need for estimating the efficiency of applications is connected not only with communication overhead. The architecture of the developed application, the structure of data processing and transmissions, the topology of processes and connections, and the required parameters of resources and network links could be verified and assessed, at least to some extent, facilitating the development process and supporting the developers' decisions. Frequently the final, efficient distributed application is the result of a number of iterations that include optimizations, remodeling or rewriting fragments of the program. If only a few of them were done on the basis of simulation, the whole process would be significantly more convenient and less expensive.
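The volume overhead of carrying binary data in an XML message, mentioned above for Web-services, can be illustrated with a small sketch. The text does not name the encoding; Base64 is assumed here because it is a common choice for binary content in XML, and the `<envelope>` and `<data>` element names are hypothetical placeholders rather than any real SOAP schema.

```python
import base64
import os

# Hypothetical wrapper elements; a real Web-services envelope is larger.
PREFIX = b"<envelope><data>"
SUFFIX = b"</data></envelope>"


def encoding_ratio(payload: bytes) -> float:
    """Size of the Base64 text divided by the size of the raw payload."""
    if not payload:
        raise ValueError("payload must not be empty")
    return len(base64.b64encode(payload)) / len(payload)


def wrap_in_envelope(payload: bytes) -> bytes:
    """Encode binary data as Base64 text and wrap it in the placeholder XML."""
    return PREFIX + base64.b64encode(payload) + SUFFIX


if __name__ == "__main__":
    payload = os.urandom(3000)  # arbitrary binary data
    message = wrap_in_envelope(payload)
    print(f"raw payload:    {len(payload)} bytes")
    print(f"Base64 ratio:   {encoding_ratio(payload):.3f}")
    print(f"whole message:  {len(message)} bytes")
```

Base64 maps every 3 input bytes to 4 output characters, so the transmitted volume grows by about one third before any envelope markup is added, and both sides pay CPU time for encoding and decoding.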
The Grid environment is characterized by a complex set of resource management and job 3 4. http://www.globus.org/ogsa/ http://www.globus.org/wsrf/.

(15) Chapter 1. Research Scope and Roadmap. 3. scheduling solutions. Consequently, the actual location of subsequent elements (processes or sets of processes) of analyzed application is unknown in advance and is unlikely to be controlled by a user. This fact makes experimenting with real applications even more difficult. Concurrently, the estimation of efficiency, dependent on the actual assignment between application and resources must necessarily be done for a specific resource configuration. Therefore the estimation can be started after the considered application is assigned to a specific set of resources. The useful method should then pass over the Grid infrastructure and assume that the analyzed application was already assigned by the Grid infrastructure to a specific set of resources before the estimation starts. If the set of resources is configured by the application developers, then the developers will be given a chance to check how their application will behave if it is assigned by the Grid management systems to the specific resource configuration. This gives a possibility to investigate and assess a solution in the crucial extreme configurations and also in the ones that are predicted as usual and desirable runtime environment. This approach is also useful for designers of the applications being the part of the Grid infrastructure and thus not being managed by the Grid schedulers. The need for support for distributed application developers was obviously noticed before the Grid concept emerged. Consequently tools like simulation-based MPISim [69] and COMPASS [6] aimed at estimating efficiency of applications were already developed. This problem was also researched in the Grid related projects. The more complete survey of related work is presented in Chapter 2.. 
1.2 Distributed Computing Infrastructures in Europe

Intensive research on the Grid concept and the need for pervasive access to high-quality computing resources resulted in the emergence of distributed computing infrastructures in Europe. The most important examples are described below.

The European Grid Initiative (EGI, http://www.egi.eu/) is the European Grid project that succeeds the Enabling Grid for E-science in Europe (EGEE, http://public.eu-egee.org/), Enabling Grid for E-science in Europe II (EGEE II, http://www.eu-egee.org/) and Enabling Grid for E-science in Europe III (EGEE III) projects. The goal of the EGEE projects was to create a Grid infrastructure for European scientists. A number of other Grid projects cooperated with EGEE, EGEE II and EGEE III, providing and using the resources. The infrastructure that emerged during these three projects and the DataGrid project (http://eu-datagrid.web.cern.ch/eu-datagrid/), mainly based on clusters of PCs, is now maintained and expanded in the EGI. The EGEE projects were meant to create and organize the infrastructure, while the EGI is a federation of resource providers coordinated by the EGI.eu foundation. The computer-cluster-based EGI infrastructure provides European researchers from different fields with free access to the computing resources. It also maintains an Application Database that stores tailor-made computing tools for scientists to use.

The Partnership for Advanced Computing in Europe (PRACE, http://www.prace-project.eu/) is an infrastructure that provides access to High Performance Computing (HPC) resources for Europe. Each system available in the PRACE project offers computing power of several petaFLOPS. The target is to reach exaFLOPS-level performance in the year 2019. PRACE cooperates with the Distributed European Infrastructure for Supercomputing Applications (DEISA, http://www.deisa.eu/) and aims at integrating many of its aspects. The project assumes a three-tier distributed model: a few European-level centers offering computing services at the highest performance level, national centers with performance sufficient to run most advanced computations, and local computer centers in universities, research labs, etc. PRACE and DEISA cooperate with the EGI in order to ensure interoperability between the infrastructures.

The Grid environment was the original motivation for the research presented in this thesis, therefore it is treated as the reference infrastructure. However, applications developed for other distributed environments, and their developers, may also benefit from the presented solutions. In particular, the dynamically emerging Cloud Computing (CC) technologies may be a kind of distributed infrastructure where the method can be exceptionally useful. A virtualized, highly distributed infrastructure such as CC may benefit from the estimation of efficiency of distributed applications similarly to the Grid. From a cloud owner's point of view it is important, since the method may support improvements in the efficiency of resource management and may also facilitate the development of user-oriented software. From the users' point of view it may be important to verify when it is profitable to deploy their application on a cloud or clouds and what gain in efficiency should be expected.

1.3 The Goal and the Hypothesis of the Research

Among the different tools designed to support the development of distributed applications (discussed in detail in Chapter 2), there is no solution that could reliably support the estimation of efficiency of designs at the early stages of their development. This gap in the spectrum of developer-oriented tools, particularly those meant for the Grid environment, needs to be filled, so that developers are able to verify their designs before the actual, usually laborious, implementation.

The central hypothesis of the research is that it is possible to enable the developers of distributed applications to conveniently model and reliably analyze their designs before actual implementation, in order to estimate efficiency on the basis of communication overhead, the topology of the application, basic information about the application logic, and the state and parameters of the exploited resources. Reliable results can be provided thanks to the advantages of Timed Colored Petri Nets [40, 41], designed to enable modeling and analysis of concurrent activities with time taken into consideration. The method, operating at a high level of abstraction and implementing automatic transformation of the models, should not require a user to be aware of the details of the lower-level model that exploits Petri nets.

The objective, which verifies the hypothesis, is to design and validate, on the basis of a prototype implementation, a method capable of estimating the efficiency of distributed applications, particularly those designed for the Grid environment, on the basis of limited information concerning the resources and the application. The estimation should be possible at the early stages of application development, when not all details are determined. Quick prototyping, enabled by the limited level of detail of the model, should allow different possible design options to be assessed. The method, based on the formalism of TCPN, should deliver reliable estimations to the application developers, and for their convenience its design should ensure that the developers need not be familiar with, or even aware of, the exploited formalism.

1.4 User Requirements

The basic requirements of the research were mentioned in the description of the hypothesis and the goal. This section covers and discusses the details. The user requirements concerning the method are related to the four main issues described below. They result from the general demand to make the method suitable for rapid prototyping by developers, even at the early stages of application design and development.

1.4.1. Pass Over the Grid Management Systems

As discussed in Section 1.1, the estimation of the efficiency of an application should start when the application is already assigned to a set of resources. The Grid management systems should not be considered, for several reasons. First, the problem of predicting the behavior of schedulers is complex enough to deserve separate research. As such, it would certainly obscure the results and make using the estimation method more difficult. Second, the estimation should be done in a predictable, repeatable environment. The application developers would then be able to improve their designs, repeat the estimations and obtain results suitable for comparison. Including the Grid schedulers in the method would effectively prevent this scenario. Third, when considering an application design, the developers require an estimation of the behavior of their application depending on the actual assignment to a specific set of resources, not on the methods and algorithms used by the schedulers, information services or other Grid management applications, which are affected by a number of other factors and are usually unknown at the development stage. Thus considering the Grid management systems does not seem justified.

1.4.2. Convenient Modeling

Convenience of modeling is an obvious demand for developer-oriented methods.
Usually developers are not interested in putting significant effort into learning analysis methods and formalisms and into creating models, especially if that effort is not insignificant in comparison to the real implementation. If a method does not offer comfortable modeling that demands limited work, developers are usually more willing to undertake real implementation and tests than model-based analysis.

The requirement of convenient modeling concerns the level of abstraction at which the model should be created and the number of details that should be provided. A developer should be able to include in the model all important characteristics of the system, but should not be forced to provide lower-level descriptions of its functionality. The structure of the system, the data transmissions and the volumes of processed data are an obviously essential part of the system design that should be analyzed, as are the demands concerning the exploited resources. However, the actual functionality and data processing details can safely be omitted, saving unnecessary work.

The second important issue is to allow developers to describe their application using primitives that they consider natural and intuitive. The view of the analyzed application required by the simulation method should correspond to the view on which the developers operate every day. Consequently, the developers will not be forced to adapt themselves in order to provide the method with the required information, which is both tiresome and error-prone; the modeling will be more comfortable and the effort required to exploit the method will be decreased.

1.4.3. Reliable Results

In order to be useful, the method should be able to provide application developers with results reliable enough to improve their designs and support development decisions. The actual required precision is, however, difficult to express in numbers, since it strongly depends on the analyzed application, the phase of development and thus the precision of the provided model. Additionally, in some cases a rapid analysis that allows a number of different designs to be surveyed and compared in a relatively short time is more valuable than high precision, so it may be useful to provide such an option. Although reliability and precision cannot in general be defined by exact values, they should be considered in the research and in the assessment of the method.

1.4.4. Wide Spectrum of Analyzed Applications

There are analysis and simulation methods tailored for specific classes of applications; a survey is presented in Chapter 2. The method presented here, operating at a high level of abstraction, is not meant to be tied to a specific distributed architecture or library. It also should not be limited to specific groups of applications. CPU-intensive programs should be analyzed as well as communication-bounded ones, together with the ones where both communication and data processing play an important role.

1.5 Scientific Problems

The user requirements immediately result in a list of scientific problems that should be considered in order to make the analysis feasible. These issues are discussed below. This section does not, however, exhaust the list of all problems solved during the research. Additional issues result from the solutions devised for the ones described below.
The roadmap of the whole research is presented in Section 1.7, and all important problems are also summarized in Chapter 8.

1.5.1. The High-level Model

The first task that follows from the user requirements is to work out a form for the model provided by application developers. The model should respond to the described requirements of users concerning the level of abstraction and the number of details. In this case the hybrid modeling approach proposed in [78] and applied e.g. in [67] is a promising possibility. Devising the optimal form of the model certainly requires significant effort and a number of prototypes tested and criticized by application developers over a significant period of time. The goal of this thesis does not include providing the final form of the model, since that is a substantial field for separate research in software engineering. However, a model that can be a basis for such research should be prepared as a proof of concept, enabling the method to be assessed on the basis of a prototype implementation.

1.5.2. Reliable Analysis

Reliable analysis of the model provided by the user is another challenging issue. It requires all important activities of the analyzed application to be reflected correctly, by means that ensure the validity and precision of the results. The important problems are described below.

Concurrency

Concurrency is a pervasive phenomenon in contemporary computer science. It is even more important when we consider distributed applications, since in such an environment one can hardly assume sequentiality of activities. If the activities take place on distributed, mostly independent entities, concurrency becomes the natural manner of their operation. Consequently, ensuring sequentiality of distributed computations requires additional effort, which must take the concurrent nature of the environment into consideration. Therefore a proper reflection of concurrent activities becomes crucial for reliable analysis of distributed applications.

Competition for Resources

Although distributed applications usually exploit distributed and independent resources, two or more distinct activities frequently need to take advantage of the same resource. The exploitation of a resource usually requires a limited, but non-zero, period of time. If the periods for distinct processes overlap and the nature of the resource requires exclusive access, one has to deal with competition. This is another problem that frequently has to be solved when analyzing applications in a distributed environment. The competition has an impact on the efficiency of applications, but also on resource usage. Therefore a proper model of this issue is not only important for reflecting the activities of distributed applications, but also crucial for the reliability of the analysis.
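The effect of competition described above can be sketched with a few lines of simulation code. This is only an illustration under stated assumptions, not the model used in this thesis: a single resource with exclusive access, requests served in order of arrival, and fixed service times. The function name and the request format are hypothetical.

```python
def simulate_exclusive_resource(requests):
    """Simulate activities competing for one exclusive resource.

    requests: iterable of (arrival_time, service_time, name) tuples.
    Returns {name: (start, finish, waited)}, where `waited` is the delay
    caused purely by competition for the resource.
    Assumption: first-come, first-served order by arrival time.
    """
    resource_free_at = 0.0
    schedule = {}
    for arrival, service, name in sorted(requests):
        # An activity starts when it has arrived AND the resource is free.
        start = max(arrival, resource_free_at)
        finish = start + service
        resource_free_at = finish
        schedule[name] = (start, finish, start - arrival)
    return schedule


if __name__ == "__main__":
    # Activity B arrives while A still holds the resource, so B has to wait.
    result = simulate_exclusive_resource([(0.0, 3.0, "A"), (1.0, 2.0, "B")])
    for name, (start, finish, waited) in result.items():
        print(f"{name}: start={start}, finish={finish}, waited={waited}")
```

In this example the usage periods of A and B overlap, so B is delayed even though neither activity is slow on its own; this is the kind of delay that an analysis ignoring competition would miss.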
Estimation of Network Transmission Time

Network transmission is an integral part of each distributed application; however, the importance of this factor varies depending on the application design. The network is exploited to exchange large amounts of data, but also to enable the necessary synchronization of computations. Depending on the type of application, it may be important to quickly transfer large volumes of data, to pass a small piece of synchronization information immediately, without significant delay, or to perform a long-lasting transmission with network link parameters (e.g. delay and bandwidth) that are as constant as possible. Therefore the estimation of network transmission time is not only crucial for proper analysis but also a complex task that should be considered from various perspectives.

Transmission Control Protocol

The most pervasive protocol of Internet communication, besides the Internet Protocol (IP), is the Transmission Control Protocol (TCP). Of the whole stack of Internet-related protocols, TCP is the one that has an important impact on the efficiency of data transmissions. The reason is that TCP ensures flow control of data and is responsible for detecting and avoiding congestion of the network links. TCP has been successively developed from 1981 [87] until the present [4]. Such a long period of subsequent modifications and corrections, together with the changing environment of constantly improved network links exploited by successive new applications, has resulted in a number of different approaches and versions of the protocol. The specification of TCP is described in a number of distinct RFC documents covering its different aspects. Consequently, different implementations combine the specifications in their own peculiar ways.

Reliable analysis of the efficiency of TCP data transmissions presents a substantial challenge: first, because of the complexity of the protocol; second, because of the significant number of available options (e.g. congestion control algorithms); finally, because of the diverse implementations, which are usually consistent with the specification but certainly differ from each other, potentially also with respect to efficiency.

Dependency on Different Algorithms

The efficiency of network data transmissions may also be influenced by other solutions, including network schedulers. They might interfere, e.g. with TCP, at the level of their concepts or due to conditions related to implementation, reducing the transmission speed in an unintentional way [47]. The aims of the schedulers can be diverse, from simple bandwidth limiting (e.g. TBF, http://lartc.org/howto/lartc.qdisc.classless.html#AEN691) to fair treatment of concurrent network connections (e.g. SFQ, http://lartc.org/howto/lartc.qdisc.classless.html#LARTC.SFQ). A significant number of already implemented network schedulers is available, e.g. in Linux, and the modularity of the Linux kernel makes it easy to include new implementations. Different specialized network devices use their own implementations of various algorithms.

1.5.3. Contradictions Between Requirements

There are several contradictions between the requirements described above. In the case of modeling and reliable analysis, the high level of abstraction used in the model provided by a developer, and its adaptation to user needs, do not correspond to the need for precise simulation. The required precision and credibility of simulation could be ensured by applying a proper formalism; the subject of this thesis assumes the exploitation of Petri nets. However, the descriptions required for formal analysis do not correspond to the requirements of comfortable modeling, since they are based on different primitives and operate at different levels of abstraction than those required for the developers' convenience.

Additionally, the analysis must necessarily consider the issues connected with the estimation of network transmission time. These could certainly be included in the formal model, but given the diversity of TCP flavors and implementations, this requires substantial work not directly connected with the researched problem. Moreover, the resulting solution would not be easily extensible and would thus be of small scientific and utilitarian value. The same problems are connected with other algorithms, e.g. network schedulers.

Reliable estimation of network transmission time is certainly essential for network-bounded applications. However, if network transmission time is not crucial for the efficiency of an application, developers may prefer a faster and less precise simulation of this aspect in order to obtain their results in a shorter time, with less computing time consumed. This problem results from the lack of assumptions limiting the application of the method to a specific class of software.

The contradictions should be resolved in order to provide the developers with a convenient, useful and precise modeling and analysis method.
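The difference between transfers dominated by data volume and transfers dominated by delay, discussed in Section 1.5.2, can be shown with the simplest possible first-order estimate. This sketch is an assumption made for illustration only: it treats transmission time as the one-way delay plus the payload size divided by the bandwidth, and it deliberately ignores TCP flow control, congestion control and scheduler effects, which are exactly the factors that make a precise estimation hard. The function and parameter names are hypothetical.

```python
def rough_transfer_time(size_bytes: float,
                        bandwidth_bps: float,
                        one_way_delay_s: float) -> float:
    """First-order estimate of the transmission time in seconds.

    Assumption: time = one-way delay + size / bandwidth.
    No protocol dynamics (slow start, retransmissions, queuing) are modeled.
    """
    if bandwidth_bps <= 0:
        raise ValueError("bandwidth must be positive")
    return one_way_delay_s + (size_bytes * 8) / bandwidth_bps


if __name__ == "__main__":
    link_bps = 100e6   # 100 Mbit/s link (illustrative value)
    delay_s = 0.010    # 10 ms one-way delay (illustrative value)

    bulk = rough_transfer_time(100_000_000, link_bps, delay_s)  # 100 MB of data
    sync = rough_transfer_time(100, link_bps, delay_s)          # 100 B message

    print(f"bulk transfer: {bulk:.3f} s (bandwidth-dominated)")
    print(f"sync message:  {sync:.6f} s (delay-dominated)")
```

For the bulk transfer the bandwidth term dominates, while for the small synchronization message almost the whole time is the link delay, which is why a single fixed level of modeling detail is unlikely to suit all classes of applications.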

1.6 Summary of the Goals and the Requirements

The main goal of the thesis is to devise an efficiency estimation method for Grid applications that will be reliable and convenient for Grid application developers, and that will enable rapid modeling and assessment of designs at the early stages of development. The detailed goals and requirements, described at length in the previous sections, can be briefly summarized by the following tasks and problems that should be solved while working on the method.

1. Pass over the Grid management systems to avoid unnecessary difficulties while using the method.
2. Enable convenient, quick modeling in order to encourage application developers to use the method and to limit the effort required for estimation.
3. Ensure reliability of the results to make the method valuable and useful.
4. Ensure a wide spectrum of analyzed applications to allow different developers to take advantage of the method.

These general aims are described from the point of view of the method's user. Their scientific analysis makes it possible to prepare the list of scientific goals that should be achieved to make the method respond to the above-mentioned user requirements.

1. Work out a prototype of the High-level model to enable the method's users to conveniently provide details of their designs.
2. Reliably model the activities of concurrent applications, which requires credible models of:
⇒ concurrency
⇒ competition for resources
⇒ estimation of network transmission time
3. Reconcile the two above demands in a single solution, to join the advantages of convenient modeling with the reliability of a formal analysis method.

1.7 Research Roadmap

This section outlines the research, briefly describing the different problems and applied solutions that are presented in the following chapters.

1.7.1. Evolution of the Concept
As mentioned before, at the beginning the research was motivated by the communication overhead resulting from applying cryptographic algorithms to ensure secure communication in the Grid monitoring system OCM-G. Therefore the first concepts concentrated mainly on the dependency between communication overhead and the efficiency of the whole application. These concepts were described in [72]. However, it quickly turned out that this is not the only important issue for application developers and that the other issues can be naturally included in the research.

Further research considered the impact of partial delays, caused by data processing on different kinds of resources, on the overall efficiency of the analyzed application [73]. Currently two main sources of delay are considered:
⇒ network data transmission,
⇒ data processing by CPUs.
The method does not preclude considering other possibilities; however, the research currently concentrates on these two.

1.7.2. The Roadmap

The first issue that should be solved is to reconcile the contradiction, already mentioned in Section 1.5, between the need for comfortable modeling and the requirements of formal, reliable analysis. From the beginning, the concept was to provide developers with the possibility to describe their application and resources using a form of High-level model. The High-level model was designed to respond to all requirements connected with the convenience of the developers, and additionally it can be translated to a formalism capable of reliable analysis. Since the model does not have to correspond strictly to the syntax of a formalism, it is feasible to adjust it to the needs of application developers. At the same time, since the model describes its domain precisely and unambiguously, it is possible to translate it to a formalism that is capable of reflecting the domain but uses a different syntax and a different set of primitives, suitable for reliable analysis.

As mentioned before, the form of the High-level model in this work is an example, aimed at proving that the concept is feasible. The final form can be worked out after collecting experience with the application of the method by different developers, which requires a considerable period of time. Therefore it can change during future research and development connected with software engineering rather than with the scope of this thesis. The High-level model required to prove the concepts presented in this thesis was, however, developed and described.
It ensures that the description of the application and resources can be done at a level of abstraction that is natural for application developers. The information that has to be provided to complete the modeling process is the information that is necessary for the analysis and that should be available to developers at least as approximations. Parameters of the model can be described using mathematical expressions depending on different factors, e.g. simulation time. Thus the concept of hybrid modeling [78] was applied to create a model of the application that takes its dynamism into account.

As the formalism enabling analysis of the model we have chosen the Timed Colored Petri Net (TCPN). Petri nets, proposed by C. Petri in [65], are considered a graphical and mathematical modeling tool [62]: they can be used as a graphical presentation of the activity of a system and also as a mathematical model that enables theoretical analysis. The Colored Petri Net (CPN), introduced by K. Jensen [40], is an important extension of the classical Petri net, extending its primitives with notions corresponding to the ones known from programming languages. A model created with CPN can be described by a more compact and less complex graph than in the case of a classical Petri net. Therefore the scalability of a CPN model is frequently significantly better in comparison to classical PNs. In order to reflect time relationships in the model, K. Jensen designed Timed Colored Petri Nets (TCPN) [41] as an extension of Colored Petri Nets. In the TCPN a global simulation clock is introduced. The tokens are delayed in the places by their timestamps, which can be set by the transitions that produce the tokens. For the research presented in this thesis, the most important advantage of the TCPN is the ability to naturally reflect the crucial activities of distributed applications:

⇒ concurrency
⇒ competition for resources

The use of the TCPN formalism ensures that these aspects of the analyzed application are reflected correctly and precisely. The formalism does not introduce additional simplifications. Thanks to colored tokens, the models are scalable and relatively easy to maintain, also by means of their graphical representation. Thanks to the time extension, analysis of the TCPN model is capable of providing the required statistics concerning efficiency.

Simulation is a convenient choice of analysis method when considering a Petri net based model and the requirements concerning observation of different application activities on the model being analyzed. Petri nets enable convenient simulation with the possibility to observe subsequent states of the model and the dependencies between them, facilitating comprehension and location of possible problems. Simulation also makes a wide spectrum of results obtainable from a model, therefore there is no need to limit in advance the set of results that can be obtained from the method. Consequently, this aspect can be left to the discretion of particular application developers. Additionally, simulation does not introduce excessive restrictions on the model, so it is possible to take full advantage of the whole expressive power of PNs with the chosen extensions.

The High-level model is automatically transformed to the TCPN-based executable model. Thus the developers need not be aware of the existence of the formalism and, more importantly, need not adapt themselves to its specifics. The transformation is formally described and implemented in the prototype simulator.

The problem of reliable estimation of TCP transmission time required significant effort in the research. At the very beginning, an analytical model of TCP performance was sought. Such a solution could be naturally included in the Timed Colored Petri Net.
However, the analytical models turned out to be too specific for a general-purpose simulator. Thereafter we tried to include a more complicated model of real TCP in the TCPN-based model. A similar approach was already a part of other research described in [21, 91]. This solution promised good results and could be made more general. However, it made it difficult, and sometimes impossible, to reflect the specifics and changes of real implementations. Therefore it could be well suited to theoretical research, but it would be difficult to adapt it to real implementations and future changes. Nevertheless, we created a TCP model in Timed Colored Petri Nets, and the simulation results presented in [73] were obtained with this model.

Finally, the complexity of the estimation of network transmission time led to expanding the TCPN-based simulator with specialized external modules better suited to reflecting network-related algorithms. The module for TCP uses the Linux TCP implementation from the well-known Ns-2 simulator, designed to provide a model as close to reality as possible [89]. At the same time, the modular design enables switching the fine-grained simulation module to a simpler one that provides less precise results in considerably shorter time. Thus application developers are given the possibility to choose a network protocol module that fits the needs of their specific application. If required, another module can be used without the need to alter the TCPN part of the executable model. Similarly, network scheduler modules can optionally be connected to the simulator. Two TCP modules (one for precise and one for quick, rough estimation) and an optional scheduler module were exploited in the research described in the thesis.

In the proposed solution, the advantages of the TCPN formalism were joined with the expressive power and convenience of a general-purpose programming language. The concurrency-related problems were reflected in the model using TCPN, whereas the other aspects were implemented with a programming language. Thus the TCPN was used only to reflect the aspects it had been designed for, and there was no need for unnatural constructions describing the other functions of the simulator. While working on this thesis, this concept was also exploited for another application, designed for banking and described in [23], where a Petri net model was responsible for credibly steering the execution of a credit decision module implemented in an imperative programming language.

The method was assessed on the basis of a comparison between simulation results obtained from the prototype implementation and real-world results from experiments. The experiments were conducted for different aspects of the application with different sources of delay: for communication-bounded and CPU-intensive cases, for different models of network protocols, and with and without a network scheduler.

1.8 Organization of This Thesis

The thesis is organized as follows. Chapter 2 describes other works and publications connected with different aspects of the presented research. Chapter 3 is devoted to the formalism of Petri nets and their extensions, since they play an important role in this work. In Chapter 4 the analysis method is described with all details, including the models, the concepts enabling simulation and the formalization of the High-level model. Chapter 5 describes the method of simulation with the TCPN-based model, the models of network-related algorithms and the automatic transformation from the High-level model. The prototype implementation of the concepts in the Simulation Tool is described in Chapter 6. The method and the Simulation Tool are evaluated in Chapter 7, where the results of simulations performed with the prototype implementation are compared with the results of real-world experiments.
Chapter 8 summarizes the research and outlines some plans for possible future work.

Chapter 2. Related Work

This chapter describes other works that are related to the topics presented in this thesis. It covers different aspects concerning performance prediction, simulation and support for application developers, in order to present a general view of the background. Finally, the works strictly related to the research presented in this thesis are discussed.

The aim of this chapter is to describe the works related to the research presented in this thesis. These works concentrate on several different topics that are combined in this work. In other publications, however, they are usually not connected to one another, or only selected ones are joined. Therefore the sections of this chapter present distinct descriptions that are summarized and discussed in the last section in order to create a consistent view of the whole. The major Grid-related topics discussed in this chapter are the following:
⇒ performance prediction of applications
⇒ performance-related support for application developers
⇒ simulation
⇒ simulation as a method of performance analysis of applications, designed for their developers

The last topic is the one that is directly related to this work. Additionally, the chapter contains a discussion of the characteristics of, and analysis methods for, the Transmission Control Protocol (TCP). It is required because of the pervasiveness of TCP in Internet communication and its impact on transmission efficiency. TCP should be carefully considered in the analysis of the efficiency of distributed applications.

2.1 Performance Prediction in the Grid

Performance prediction is an important field of research in the Grid environment. Dynamism, decentralization and the lack of a single administrative entity make Grid resource management an exceptionally challenging issue. From a general point of view, Grid resources are loaded non-deterministically, since they do not serve the Grid alone. Heterogeneity of the resources further increases the difficulty. Therefore methods of predicting the performance of applications on different resources as accurately as possible have always been considered an important way to enhance resource management. Thus the main aim of Grid research on performance prediction of applications is to supply valuable information to application schedulers and resource brokers.

An important requirement for performance prediction for schedulers is the efficiency of the prediction process itself. The prediction time strongly influences the efficiency of the scheduling process, and thus it is important to make the prediction solutions as efficient as possible. Prediction of performance in principle concerns two issues:
⇒ prediction of queue wait time (or job start time)
⇒ prediction of job execution time

Some works address only one of these problems, either queue wait time (e.g. [46, 51]) or job execution time (e.g. [38, 60, 79]), and some address both (e.g. [52]). The peculiarity of the scheduling process results in performance prediction solutions that follow a common pattern with a few departures. Usually the prediction is based on historical data about the performance of applications and resources, which is subjected to analytical processing [38, 46, 50, 52, 60, 79] to produce data from which a prediction for a specific case can be made quickly. The methods of processing historical data and the form of the produced output differ depending on the selected solution.
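
The common pattern described above, in which historical data is first processed into an intermediate model that then answers individual prediction queries quickly, can be illustrated with a minimal sketch. It does not reproduce any of the cited methods; the class `RuntimePredictor`, the record fields and the mean-based estimate are assumptions made purely for illustration.

```python
from collections import defaultdict
from statistics import mean


class RuntimePredictor:
    """Hypothetical two-phase predictor: slow model building, fast lookup."""

    def __init__(self, history):
        # history: iterable of (application, resource, runtime_seconds) records.
        samples = defaultdict(list)
        for application, resource, runtime in history:
            samples[(application, resource)].append(runtime)
        if not samples:
            raise ValueError("history must contain at least one record")
        # Intermediate model: one mean runtime per (application, resource) pair.
        self._model = {key: mean(values) for key, values in samples.items()}
        # Fallback estimate for pairs that were never observed together.
        self._overall = mean(r for values in samples.values() for r in values)

    def predict(self, application, resource):
        # Single dictionary lookup; no historical data is scanned at query time.
        return self._model.get((application, resource), self._overall)


# Illustrative, made-up history records.
history = [
    ("render", "site-a", 100.0),
    ("render", "site-a", 120.0),
    ("render", "site-b", 60.0),
]
predictor = RuntimePredictor(history)
print(predictor.predict("render", "site-a"))  # mean of the two site-a runs
print(predictor.predict("solver", "site-a"))  # unseen pair: overall mean
```

All of the expensive work happens in the constructor, so a scheduler could issue many `predict` calls at negligible cost, which is the property the surveyed solutions aim for.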
Conclusions about the behavior of specific applications on specific resources are frequently based on similarities between applications and resources [52, 59, 60, 79], with a differently defined notion of similarity being an important contribution of individual authors.

Iverson et al. [38] propose a statistical method of non-parametric regression for estimating task execution times in heterogeneous distributed computing. The statistical model is built on the basis of historical information. The authors argue that a statistical model is able to efficiently compensate for a number of factors and has the potential to improve over time as the number of available observations grows. However, Hui et al. [52] find the statistical method unlikely to work well due to a specific characteristic of the scheduling problem, namely the heavy-tailed distribution of run times and queue wait times. They propose a prediction method based on Instance Based Learning (IBL), define a suitable distance function and induction models, and apply genetic search to tune the IBL parameters. Smith, Foster and Taylor [79] use historical information to prepare a set of templates categorizing jobs that should be considered similar, and then use the templates to find the required prediction. Kurowski et al. [50] process historical information into a rule-based expert system, and Kalantari et al. [46] predict queue wait time for jobs using a state-space model.

The knowledge obtained from the historical data is stored in a form adequate to the processing method: for instance, as an analytic model if an analytic or statistical method is used [38, 46], or as an expert system [50]. The existence of the intermediate model ensures efficiency of the prediction process, since the most time-consuming processing is done during model creation.

Similarity between jobs and resources is considered an important issue that can support prediction of the performance of specific computations on particular resources. Therefore a number of similarity models have been proposed in order to efficiently exploit historical information for predicting the future behavior of similar jobs, or of jobs on similar resources. In [52] the authors enumerate a number of job parameters identifying a job in openPBS. For resources, two sets of attributes are combined: those describing the resource state and those concerning its policy. The authors define a distance function to judge similarity. The weights of the individual attributes in the function are tuned using a genetic search algorithm. Genetic search, alongside greedy search, is also used by Smith, Foster and Taylor [79], who apply it to find a suitable set of templates for categorizing jobs into similarity classes. G-Prophet, a part of the Askalon project described in [59, 60], uses a combined solution based on normalizing the performance value for different Grid sites against the performance of a reference problem; if required parameters are lacking, it also uses a simple statistical model such as B-splines, piecewise linear fit, etc.

Alternatively, performance prediction can be based on simulation, which is discussed in Section 2.3.

2.2 Performance Related Support for the Grid Application Developers

The problem of developer-oriented tools is an important issue considered in different Grid-related projects. The works include monitoring as well as performance profiling and visualization. Their aim is to provide application developers with tools corresponding to the debuggers and profilers available for sequential programs.
An example of a monitoring system is OCM-G [8], developed as a part of the CrossGrid project to form an infrastructure capable of delivering to a developer details concerning an application executed on the Grid. An important assumption of the system is to enable on-line monitoring. The system delivers not only data about the status of individual application processes; the application can also be instrumented to provide data concerning the efficiency of successive phases of computing. The system is secured with GSI as implemented in the Globus Toolkit [9]. It is fully compliant with the European Grid infrastructure created by the EGEE¹ project. The functionality of OCM-G is complemented by GP-M, a tool intended as a front-end for developers, currently being replaced by a project called Candle².

Performance analysis and visualization capability is also provided by PROVE, a part of the P-GRADE³ toolkit developed at MTA Sztaki. The toolkit is designed to deliver a complete solution for the development and deployment of applications in the Grid environment. One of its important goals is to provide a comfortable, high-level environment that supports developers in the same manner as the program development tools available for more traditional architectures.

A comparison of different monitoring systems developed as part of Grid projects can be found in [8]. Additionally, Zanikolas and Sakellariou [93] provide an extensive study of different Grid monitoring systems, compare their characteristics and systematize them in a taxonomy. The tools are capable of facilitating the work of developers while writing code, deploying it on the Grid, debugging and profiling. They are, however, not suitable for rapid prototyping, where the effort required to perform assessments must be limited.

¹ http://www.eu-egee.org/
² http://grid.cyfronet.pl/ocmg/
³ http://www.p-grade.hu

2.3 Simulation

This section describes simulation-related works that are connected with the Grid environment or with distributed applications. Those that concern distributed applications in the Grid environment, and thus are directly related to this thesis, are described in more detail in Section 2.4.

2.3.1 Simulation in the Grid

Simulation has been exploited in Grid-related research from the very beginning. Therefore there is a significant number of solutions based on this method. There are also works aimed at systematizing the efforts concerning simulation [24, 83, 84]. Simulation was, however, rarely used to analyze Grid applications. In most cases its advantages supported research concerning the management of Grid resources. A significant number of such works emerged when no real Grid infrastructure was widely available to scientists, and thus simulation was the only feasible method of testing the management algorithms. Therefore most of the Grid simulators concern one or both of the following issues:
⇒ job scheduling
⇒ data replication

In these works simulation is not an integral part of production implementations, but a method that enables convenient and inexpensive testing, in an artificial Grid-like environment, of solutions meant for production.

One of the most prominent Grid simulators is GridSim [16]. It is based on the SimJava discrete-event simulation library and emerged as a toolkit designed to enable assessment of Grid scheduling algorithms. GridSim is capable of simulating different classes of heterogeneous resources.
It focuses on the Grid economy, treating resource owners as producers and end users as consumers; brokers (schedulers) are responsible for allocation between users and resources. A recent publication [82] describes an extension that enables GridSim to be applied to the simulation of data replication strategies.

Bricks [86] is another example of a performance evaluation system designed to compare and evaluate different scheduling algorithms. It allows simulation of scheduling algorithms, programming modules for scheduling and different network topologies of clients and servers. It is component-based and thus allows not only convenient replacement of the algorithms, but also incorporation of existing components.

ChicagoSim, developed at the University of Chicago and used e.g. in [70, 71], is a tool designed to evaluate DataGrid management algorithms. It allows assessment of scheduling strategies that consider the location of data sets. Data replication strategies can be included in models and analysis. It is based on the general-purpose discrete-event simulation language Parsec [7].
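
GridSim and ChicagoSim are described above as based on discrete-event simulation, in which the simulated clock jumps from one scheduled event to the next. The following minimal sketch shows that principle for first-come-first-served scheduling of jobs on heterogeneous resources. It is not based on the API of GridSim, SimJava, Bricks, ChicagoSim or Parsec; the function `simulate_fcfs`, the job and resource parameters and the scheduling policy are all assumptions chosen for illustration.

```python
import heapq
from collections import deque


def simulate_fcfs(jobs, speeds):
    """Simulate FCFS scheduling; return {job_name: finish_time}.

    jobs:   list of (arrival_time, name, work) tuples (hypothetical units).
    speeds: {resource_name: work units processed per time unit}.
    """
    # Future event list ordered by time; seq breaks ties deterministically.
    events = []
    seq = 0
    for arrival, name, work in jobs:
        heapq.heappush(events, (arrival, seq, "arrive", name, work))
        seq += 1

    idle = sorted(speeds, key=speeds.get, reverse=True)  # fastest first
    waiting = deque()
    finished = {}

    while events:
        # The clock jumps straight to the next event time.
        now, _, kind, name, payload = heapq.heappop(events)
        if kind == "arrive":
            waiting.append((name, payload))  # payload is the job's work
        else:
            # "finish" event: payload is the resource that becomes idle.
            finished[name] = now
            idle.append(payload)
            idle.sort(key=speeds.get, reverse=True)
        # Dispatch waiting jobs to idle resources, fastest resource first.
        while waiting and idle:
            job, work = waiting.popleft()
            resource = idle.pop(0)
            done = now + work / speeds[resource]
            heapq.heappush(events, (done, seq, "finish", job, resource))
            seq += 1
    return finished


# Illustrative, made-up workload and resource speeds.
jobs = [(0.0, "j1", 10.0), (0.0, "j2", 10.0), (1.0, "j3", 4.0)]
speeds = {"fast": 2.0, "slow": 1.0}
print(simulate_fcfs(jobs, speeds))
```

With this workload, j1 runs on the fast resource and finishes at time 5.0, j3 waits for it and finishes at 7.0, and j2 occupies the slow resource until 10.0. Replacing the dispatch loop is enough to try a different scheduling policy, which is the kind of experiment the simulators described above were designed for.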