CONTENTS
I Background
1 Introduction 3
1.1 Motivation . . . 3
1.2 Main Thesis and Objectives . . . 5
1.3 Structure of This Thesis . . . 5
2 Cloud Infrastructure 7
2.1 Introduction to Cloud Computing . . . 7
2.2 Service Models . . . 8
2.2.1 Infrastructure-as-a-Service . . . 9
2.2.2 Platform-as-a-Service . . . 10
2.2.3 Function-as-a-Service . . . 10
2.2.4 Software-as-a-Service . . . 11
2.3 Deployment Models . . . 11
2.4 Infrastructure-as-a-Service Building Blocks . . . 12
2.4.1 Virtual Machines . . . 12
2.4.2 Storage Services . . . 13
2.5 Billing in the Clouds . . . 14
2.6 Summary . . . 15
3 Workload Structure and Characteristics 17
3.1 Classes of Applications . . . 17
3.1.1 Workflows . . . 17
3.1.2 Bag of Tasks . . . 19
3.2 Resource Demand Estimation . . . 19
3.3 Workflow Execution Environment . . . 20
3.4 Summary . . . 22
4 State of the Art 23
4.1 Scientific Workflow Scheduling in the Cloud . . . 23
4.1.1 Scheduling Strategies . . . 24
4.1.2 Application Model . . . 25
4.1.3 Static vs. Dynamic Scheduling . . . 25
4.1.4 Optimization Objectives . . . 25
4.1.5 Resource Elasticity . . . 25
4.1.6 Resource Quotas . . . 26
4.1.7 Multi-provider Provisioning . . . 26
4.1.8 Multi-core Parallelism . . . 26
4.1.9 Data Transfer and Storage Cost . . . 27
4.1.10 Hourly Billing . . . 27
4.2 Evaluating Cloud Functions . . . 28
4.2.1 FaaS for Scientific Applications . . . 28
4.2.2 Performance Characteristics . . . 28
4.3 Summary . . . 29
II Contributions
5 MILP-based Planning and Scheduling 33
5.1 Applications of Mixed Integer Linear Programming . . . 34
5.2 Problem Scope and Definition . . . 36
5.3 Cloud Performance Benchmarking . . . 36
5.4 Case Study: Workflow Scheduling . . . 38
5.4.1 Application Model . . . 38
5.4.2 Infrastructure Model . . . 38
5.4.3 Formulation of the Scheduling Problem . . . 39
5.4.4 Application and Infrastructure Data . . . 47
5.4.5 Evaluation of Optimization Models . . . 49
5.5 Other Uses . . . 53
5.6 Summary . . . 53
6 Stateful Divide-and-Conquer Algorithm Case Study 55
6.1 Porting Methodology . . . 57
6.1.1 Partitioning and Communication . . . 57
6.1.2 Agglomeration . . . 58
6.1.3 Mapping . . . 59
6.1.4 Workflow Generation . . . 61
6.1.5 Workflow Execution . . . 62
6.2 Experiments . . . 63
6.2.1 Experiment Setup . . . 63
6.2.2 Results . . . 65
6.3 Discussion . . . 67
6.4 Conclusion . . . 69
7 Cloud Functions 71
7.1 Motivation and Scientific Questions . . . 73
7.1.1 Motivating Use Cases . . . 73
7.1.2 Scientific Questions . . . 74
7.2 Benchmarking Framework for Cloud Functions . . . 76
7.3 Experiment Setup . . . 78
7.3.1 Integer-based CPU Intensive Benchmark . . . 78
7.3.2 Instance Lifetime Experiment . . . 79
7.4 Performance Evaluation Results . . . 80
7.4.1 Overheads Evaluation . . . 82
7.4.2 Instance Lifetime . . . 85
7.4.3 Cost Comparison . . . 85
7.4.4 Infrastructure Heterogeneity . . . 88
7.4.5 Serverless vs IaaS Cost Analysis . . . 88
7.5 Discussion of Results . . . 91
III Conclusions
8 Discussion and Future Work 95
8.1 Summary of Contributions . . . 95
8.2 Auxiliary Contributions . . . 96
8.3 Lessons Learned . . . 97
8.4 Future Work . . . 98
Author’s Publications 99
Bibliography 103
List of Figures 113
List of Tables 114
Appendices
Appendix A: Modeling, Optimization and Performance Evaluation of Scientific Workflows in Clouds 119
Appendix B: Scheduling Multilevel Deadline-Constrained Scientific Workflows on Clouds Based on Cost Optimization 121
Appendix C: Porting HPC applications to the cloud: a multi-frontal solver case study 134
Appendix D: Serverless execution of scientific workflows: Experiments with HyperFlow, AWS Lambda and Google Cloud Functions 145
Appendix E: Performance evaluation of heterogeneous cloud