Computing in Large-Scale Dynamic Systems


DISSERTATION

for the degree of doctor at the Technische Universiteit Delft,

by the authority of the Rector Magnificus, prof. ir. K.C.A.M. Luyben, chairman of the Board for Doctorates,

to be defended in public on Friday 29 May 2013 at 15:00

by Andrei Pruteanu, Master of Science, Computer Science


Composition of the doctoral committee:

Rector Magnificus, chairman
Prof. Dr. K.G. Langendoen, Technische Universiteit Delft, promotor
Dr. Ir. S.O. Dulman, Technische Universiteit Delft, copromotor
Prof. Dr. Ir. M. van Steen, Vrije Universiteit Amsterdam
Prof. Dr. Giovanna Di Marzo Serugendo, University of Geneva, Switzerland
Prof. Dr. Ir. D.H.J. Epema, TU Delft, TU Eindhoven
Dr. Ir. Zoltan Papp, TNO, the Netherlands
Dr. Ir. Jacob Beal, Raytheon BBN Technologies, USA
Prof. Dr. C. Witteveen, Technische Universiteit Delft, reserve member

ISBN: 978-94-6186-160-3

Copyright © 2013 by Andrei Pruteanu

All rights reserved. No part of the material protected by this copyright notice may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording or by any information storage and retrieval system, without written permission of the author.


This work was funded by the Free project, a Point-One project funded by SenterNovem.

This work was funded by DEMANES, an Artemis European Union project.

Advanced School for Computing and Imaging

This work was carried out in the ASCI graduate school. ASCI dissertation series number 281.


Acknowledgements

When I started my PhD, I knew the road towards the finish would be long and difficult. The transformation I went through over the last four years has been quite remarkable. I learned to be independent and to face difficult challenges in both my private and professional life. While my work was sometimes a solitary effort, I had lots of support from the people around me as well as from my family back home. First, I would like to thank Stefan Dulman, my supervisor, for guiding my steps all along the PhD track. His advice and ideas have helped me publish at a rapid pace. His constant push towards new and interesting research ideas has motivated my efforts and kept my interest high for new and challenging research projects. He was a friend more than a supervisor, and that has greatly influenced the way I performed during my PhD.

I would also like to thank my promotor, Koen Langendoen, for offering me the possibility to conduct research as a PhD candidate at the Embedded Software Group. His down-to-earth attitude towards research has helped me improve the quality of my publications. His guidance with respect to "selling" my work in a comprehensible manner is invaluable for my future career.

During the past years I have interacted with many BSc and MSc students and worked together with them on different projects. I would like to thank Qingzhi Liu, Bogdan Mihoci, Steffan Karger, Agostino di Figlia, Sjors van Berkel, Daniel Turi, Victor Spiridon, Maurice Boss, Chiel de Roest and Harmjan Treep for their efforts. I would like to acknowledge my colleagues and office mates Kavitha Muthukrishnan, Alexander Feldman, Niels Brouwers, Venkat Iyer, Coen van Leeuwen, Andreas Loukas, Philipp Glatz, Marco Cattani and Martin Bor for their support, their constructive ideas and simply for the amount of fun we had while working together.

Next, I would like to thank my friends Robert Mihail, Bogdan Necula, Mihai Capota, Marius Enachescu, Maria Gheorghe, Ruxandra Mustata, Tomek Jaskiewicz and Carl Mair for the amazing time we have spent together on numerous vacations and weekend trips. A special appreciation goes to Dr. Alexandru Voitecovici for motivating me to follow a PhD track when I was still undecided and thought this was not the right path for me.


Introduction

Moore's law has successfully predicted the exponential increase in microchip transistor density over the last decades. This important trend has led not only to a continuous increase in performance but also to a reduction in size, power consumption and cost in all classes of computing devices. For embedded systems, reduced power consumption and cost have always been the most important product requirements. The addition of various communication interfaces has enabled the creation of many feature-rich applications. This has driven mass adoption of the technology in various application domains and has accelerated innovation in research fields such as wireless sensor networks, swarm robotics and mobile ad-hoc networks. While pilot projects that make use of a few computational elements have been developed for quite a while, applications that are large-scale in terms of number of devices have always been difficult to tackle due to problems caused by their intrinsic properties.

Above a certain network size (roughly one hundred devices), services such as code updating, discovery of network topology and data dissemination in general are increasingly difficult to manage. This is mainly due to the sheer size of the system in terms of the number of components that have to be controlled. Practical aspects such as harsh and unpredictable deployment conditions and varying system properties, such as the quality of the communication links, can no longer be ignored or mitigated in the same manner as for small-scale systems. The larger the network size, the higher the probability that individual devices or groups of devices become mobile. Examples range from networks in which mobility occurs relatively rarely (e.g., static networks with occasional device relocation due to maintenance operations) to highly dynamic networks (e.g., monitoring freight in transport and logistics applications). From the application point of view, device mobility usually decreases the quality of the communication service. This is mainly caused by the use of wireless links: although they remove the need for wires, they bring along tremendous challenges with respect to the robustness of the underlying communication protocols. These negative effects are usually mitigated via advanced scheduling techniques at the MAC layer and smart buffering at the routing layer. The drawback is higher complexity and increased communication latency. Compared to wired solutions, wireless systems require much more sophisticated techniques to achieve (almost) similar performance.

For all these reasons, there is a high need for a paradigm shift in the way we develop applications and the underlying algorithms for large-scale networked embedded systems. To address this important challenge, we introduce novel algorithms that make use of randomised techniques such as gossiping in order to detect and cope well with the dynamic nature of large-scale networks. To validate our work, we show implementation details on real hardware and run extensive experiments on it.

1.1 Application Scenarios for Large-Scale Networked Embedded Systems

There are multiple application domains where networks of embedded devices are employed on a large scale for enabling complex surveillance and control applications. In the following paragraphs we present a few of them and show some of their challenges.

Track and Tracing in Transport and Logistics

The research for this thesis was partly funded by the Dutch government via the Agentschap NL PointOne Free project for enabling future mesh networks for transport and logistics applications. The use cases addressed in the Free project involve tracking and tracing goods along the logistics chain with wireless sensor networks. The communication model is mostly unidirectional: from the pallets towards the wireless infrastructure, where all data is aggregated. There are also situations where devices cannot communicate with the access points due to their limited transmission range. Consequently, they have to make use of multi-hop communication in order to disseminate information. In large warehouses, pallets are continuously moved from one trailer to another via mobile robots [62]. Due to the dynamic nature of the topology, there is a drop in performance for networking-related services such as data collection protocols. Link failures are also common and are caused by specific deployment conditions (the metal structure of the warehouses drastically influences the propagation of the radio waves).

Emergency Communication Systems

Mobile devices are omnipresent nowadays. For large crowds, cellular networks are not always the most reliable means of communication, especially in emergency situations. When mobile telephony capacity is overloaded, precious information that could sometimes save lives cannot be successfully transmitted. Although this capability is rarely used, mobile phones have means of communicating with each other in an ad hoc manner. In disaster situations, when services provided by mobile operators are temporarily unavailable, the devices could talk to each other in a peer-to-peer manner in order to exchange valuable emergency information [108]. Moreover, they can self-organise hierarchically into clusters based on various criteria in order to facilitate information dissemination and reduce the amount of traffic that is generated when relying on flooding protocols [65]. The challenge is to uncover local interaction rules that produce quasi-static overlays on top of mobile networks.

Distributed Management of Large-Scale Networks

For large-scale computer (embedded systems) networks, the failure probability of the communication links or the devices themselves is usually proportional to the size of the system. To cope with such phenomena, network management policies make use of feedback information coming from network surveillance services. They maintain accurate information about communication and device failures and compute aggregate status information based on event logs. While this approach is certainly straightforward, it requires an extensive amount of collected data, and the aggregate status information is available only after complex data processing activities have been performed. This leads to delays and demands powerful computing capabilities. An alternative and challenging approach is to use distributed feedback mechanisms created via randomised methods such as gossiping protocols. They are able to compute aggregates with reduced traffic overhead, while being simple to implement. In contrast to centralised approaches, aggregate communication and device failure information is readily available at each individual device.
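The gossip-based alternative can be illustrated with a small simulation. The following is a minimal sketch, assuming pairwise (push-pull) averaging as the gossip primitive; it is not the algorithm developed later in this thesis. The function name gossip_average, the ring topology and the per-device loss values are hypothetical and chosen only for illustration.

```python
import random


def gossip_average(values, neighbours, rounds, rng):
    """Return per-node estimates after `rounds` of pairwise averaging."""
    estimates = list(values)
    for _ in range(rounds):
        for node in range(len(estimates)):
            # Each node contacts one random neighbour per round.
            peer = rng.choice(neighbours[node])
            mean = (estimates[node] + estimates[peer]) / 2.0
            # Push-pull: both ends adopt the mean, so the sum is preserved.
            estimates[node] = mean
            estimates[peer] = mean
    return estimates


# Hypothetical example: 8 devices on a ring, each with a local packet-loss rate.
n = 8
ring = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
loss = [0.0, 0.1, 0.4, 0.0, 0.2, 0.3, 0.0, 0.6]
estimates = gossip_average(loss, ring, rounds=200, rng=random.Random(1))
print(estimates)  # every entry ends up close to the mean of `loss`
```

Each exchange preserves the sum of the two estimates involved, so every device drifts towards the network-wide mean without any single node collecting the raw data.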

Advanced Traffic Monitoring in Future Transport Systems

Traffic monitoring in future transport systems will most likely use wireless sensors or vehicular area networks to measure the number of vehicles that pass through certain road segments. Technical solutions based on video cameras have been in place for years to support this class of applications. There is a need to replace them with more scalable and cheaper traffic surveillance methods. One approach is to use sensor networks and vehicular communication technologies. This approach could be used to monitor the traffic in remote locations where video cameras are difficult to install. One application subclass that maps well to the problem of detecting vehicles that pass through a certain road segment is the so-called churn detection problem. In the broadest sense, churn describes the number of items that are moving in or out of a group over a certain period of time. In the traffic monitoring context, it describes the percentage of vehicles that are actively entering or exiting a given road segment. Vehicles capable of communicating with each other wirelessly can use gossiping protocols to estimate, in a distributed manner, the number of vehicles passing through a given road segment and use this information to detect traffic jams or other potentially hazardous situations.


Adaptive Media Delivery Services in Peer-to-Peer Systems

The notion of peer-to-peer systems usually refers to networks of computers that are connected to the Internet and exchange information via point-to-point protocols. The terminology has also been extended to embedded devices that, although limited in their computational capabilities, are interconnected in similar ways. As a consequence, algorithms that estimate system properties such as a sudden increase in network size, also called FlashCrowd, can be applied to both types of computing infrastructure. Adaptive systems require feedback mechanisms for establishing reconfiguration policies. For networks where bandwidth allows for dissemination of multimedia information, such algorithms enable dynamic resource allocation based on user demand. Embedded systems can also adapt various energy conservation policies in accordance with relative increases in network size.

Interactive Architecture

Interactive Architecture is a field of architecture in which every-day objects and human-inhabited space have the ability to respond to changing needs in relation with evolving individual, social, and environmental conditions [49]. In other domains it is also referred to as responsive architecture. Interactive architectural environments [14] have increased in popularity, boosted by several waves of technological improvements in computing (embedded systems) and networking. The field has evolved through several decades to include concepts from cybernetics, ubiquitous computing, organic computing and lately spatial computing. The underlying technology that facilitates the realisation of Interactive Spaces relies on spatially distributed networks of embedded devices, each equipped with sensors and actuators. Due to its high inter-dependency with the creative industries, the embedded systems technology that powers these devices has to offer support for fast and easy prototyping and testing of ideas.

A recent development in the research on distributed embedded systems is the Spatial Computing Paradigm [147]. It uses spatial abstractions as basic ingredients for creating self-organising, large-scale autonomous systems. An additional challenge related to embedded software middleware is to provide support for specific services such as viral code updating and runtime adaptation of application behaviours in order to achieve good robustness to frequent device and communication failures. An example application is the ProtoSpace environment at the Faculty of Architecture in Delft. The room is equipped with an interactive floor (see Figure 1.1). Each individual tile is equipped with sensors and actuators and is connected to its neighbours in a mesh network topology (see Figure 1.2). The floor interacts with the people in the room by means of pressure sensors and light patterns.


Figure 1.1: Protospace Interactive Floor. Each tile has one sensor (pressure) and one actuator (LED).

Figure 1.2: Interactive Floor - Embedded Devices.

1.2 Problem Statement

The objective of the thesis is to address some of the challenges associated with the creation of distributed algorithms used by software middleware for large-scale networked embedded systems, as outlined in Figure 1.3. In the following paragraphs we describe these challenges in more detail.

Topology Dynamics

One of the most important challenges associated with large-scale networks is related to changes in topology due to device mobility and random hardware and software failures. For networks that exhibit low mobility, algorithms developed for static networks perform reasonably well. Usually, changes of the device position are detected and the routing layer finds alternative paths, either via a dedicated mechanism within the protocol itself (e.g., the DSR algorithm [82]) or by simply refreshing all the routes periodically [3]. For networks that exhibit high device mobility, however, radically new solutions are required, because classic solutions produce far too much communication overhead [122].

Figure 1.3: System Architecture.

Link Failures

In communication networks, device mobility also induces the problem of low-quality communication links. Low link quality is further caused either by increased contention due to high network density, or by environments in which wireless communication is negatively influenced by phenomena such as fading. Although local information about the average packet loss is beneficial for various communication protocols [115], having aggregate information about regions or entire systems offers the benefit of a global feedback loop on the overall communication performance [129].

Churn

An important challenge in distributed systems research is to compute the number of nodes that are actively entering and exiting the system. In the most general sense, churn describes the number of items that are moving in or out of a group over a fixed period of time. In peer-to-peer systems, churn distorts the information contained in routing tables and leads to inconsistencies of the stored resource items [114]. In networked embedded systems, churn has the same negative effects; in addition, it causes an increase in power consumption and, just like for peer-to-peer systems, a reduction in the application quality of service [105]. The challenge of detecting network churn is thus important, and solving it will benefit applications trying to mitigate churn.
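To make the definition of churn concrete, the following is a minimal, centralised sketch that counts joins and departures between two membership snapshots. It only illustrates the quantity being estimated and is not the distributed detection algorithm of this thesis. The function names and the normalisation by the earlier group size are assumptions made for illustration.

```python
def churn(before, after):
    """Count nodes that joined and left between two membership snapshots."""
    joined = after - before   # present now, absent earlier
    left = before - after     # present earlier, absent now
    return len(joined), len(left)


def churn_rate(before, after):
    """Churn normalised by the earlier group size (an assumed convention)."""
    joined, left = churn(before, after)
    return (joined + left) / len(before) if before else 0.0


# Hypothetical snapshots of node IDs seen in two consecutive periods.
previous = {1, 2, 3, 4}
current = {3, 4, 5}
print(churn(previous, current))       # (1, 2)
print(churn_rate(previous, current))  # 0.75
```

In a deployed system no single node holds both snapshots, which is why a distributed estimate of this quantity is needed.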

Flashcrowds

In peer-to-peer systems, a phenomenon called flashcrowd has been intensively studied recently [34]. It describes the ongoing activity of peers suddenly connecting or disconnecting due to various reasons. The amount of bandwidth required to feed information to the clients, for instance in video streaming applications, depends on runtime knowledge of the increase in network size. Current approaches make use of statistical methods to describe and estimate such phenomena a posteriori [148]. Online methods, although preferable, are not extensively studied due to the challenge of estimating runtime phenomena in dynamic large-scale systems.
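As an illustration of what an online method has to decide, the following is a minimal sketch that flags a flashcrowd when a sequence of network-size estimates grows by more than a given fraction within a short window. The detection rule, the function name detect_flash_crowd and the parameters window and growth are assumptions for illustration, not the method proposed in this thesis; the size estimates are assumed to be available at runtime.

```python
def detect_flash_crowd(sizes, window=3, growth=0.5):
    """Return the time steps at which size grew by >= `growth` over `window` steps."""
    alarms = []
    for t in range(window, len(sizes)):
        old = sizes[t - window]
        # Relative increase compared with the estimate `window` steps earlier.
        if old > 0 and (sizes[t] - old) / old >= growth:
            alarms.append(t)
    return alarms


# Hypothetical size estimates: stable at first, then a sudden surge of peers.
sizes = [100, 102, 101, 103, 104, 180, 260, 270]
print(detect_flash_crowd(sizes))  # [5, 6, 7]
```

The hard part in a large dynamic system is obtaining the size estimates themselves at every node and at runtime, rather than applying the threshold.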

Complex Spatial Computing Middleware

For networked embedded systems, the ultimate validation of distributed algorithms on real hardware is always preferable. Spatial Computing [6] is a novel paradigm in the study of algorithms for distributed systems. It advocates the usage of spatial and temporal constructs for describing various algorithmic building blocks and simplifying application development. Unfortunately, the default middleware implementation is built around a Lisp-like programming language called Proto [11]. For complex applications, the code becomes difficult to maintain, and non-IT specialists find it difficult to learn and use. The challenge is to simplify their work by offering them an easy-to-use programming language and a corresponding spatial computing middleware.
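A typical spatial building block of this kind is a hop-count gradient, in which every device estimates its distance in hops to the nearest source using only its neighbours' values. The following is a minimal sketch in Python rather than Proto, assuming synchronous rounds; the function name, the dictionary-based topology and the four-node line network are hypothetical.

```python
import math


def hop_count_gradient(neighbours, sources, rounds):
    """Estimate hop distance to the nearest source using neighbour values only."""
    hops = {n: (0 if n in sources else math.inf) for n in neighbours}
    for _ in range(rounds):
        updated = {}
        for node, nbrs in neighbours.items():
            if node in sources:
                updated[node] = 0
            else:
                # One hop more than the closest neighbour seen last round.
                updated[node] = 1 + min((hops[m] for m in nbrs), default=math.inf)
        hops = updated
    return hops


# Hypothetical line network 0 - 1 - 2 - 3 with node 0 as the source.
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(hop_count_gradient(line, sources={0}, rounds=3))  # {0: 0, 1: 1, 2: 2, 3: 3}
```

Because each node recomputes its value from its current neighbours in every round, the estimates repair themselves after the topology changes.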

1.3 Methodology

The problems addressed in this thesis emerged mainly from discussions with several research project partners. From the Free project we learned about the engineering challenges in IT systems deployed in transport and logistics products. Problems such as limited energy resources, device mobility, device and communication failures, and automatic code updating came out as major concerns for the logistics companies. Our algorithms specifically target these problems and offer novel ways of solving them or limiting their effects. The DEMANES project [47] has as goals the design, monitoring and operation of adaptive networked embedded systems. The major challenges addressed by the industrial partners involve allowing networked embedded systems to adapt and reconfigure on the fly in case of dynamic deployment environments, changing application characteristics and the various failures associated with large-scale systems in general. The embedded software middleware briefly described in the previous section and detailed in Chapter 6 facilitates all these features while incorporating some of the complex algorithms studied in the first chapters of the thesis.

We address the challenges of large-scale systems with novel distributed algorithms. Initially, the validation of our work was done via simulations using tools such as Matlab and Mathematica. In order to test our algorithms with more realistic device mobility scenarios, we made use of state-of-the-art mobility models such as SLAW [95] or fed in real mobility traces such as the ones recorded by the cabs in San Francisco [92]. Later on, we tested our algorithms and the underlying spatial computing middleware on real hardware. We ran our experiments using two testbeds. The first one is a wireless testbed made of roughly one hundred devices [137] deployed in the ceiling of the 9th floor at the Faculty of Electrical Engineering, Mathematics and Computer Science in Delft. The second one, called ProtoDeck [112], uses serial interfaces to wire the embedded devices and consists of roughly one hundred and eighty nodes interconnected as a mesh [83]. It is deployed in the Protospace Lab, at the Faculty of Architecture in Delft.

1.4 Thesis Contributions

In the thesis we address the research questions mentioned in Section 1.2. The contributions are summarized briefly as follows.

Creating Quasi-Static Overlays

One of the most widely used techniques for managing large-scale networks is the creation of overlays. Clustering algorithms partition the systems using various strategies such as device location, topological information (network density) as well as functional aspects (types of sensors and actuators). While all these approaches are suitable for systems below a certain scale and with limited dynamics (e.g., low device mobility), above a certain scale and mobility level a completely new approach is required. The ASH clustering algorithm introduced in Chapter 2 copes well with device mobility and is very resilient to topology dynamics. It is able to create quasi-static overlays while the underlying devices are mobile. Unlike traditional clustering schemes, ASH does not use location information, and the computation is based on local information only. It benefits from the so-called emergent behaviour in complex networked systems. Clusters self-organise and adapt continuously via an aggregate feedback mechanism called "pressure". The publications associated with this chapter are:


1. Andrei Pruteanu, Stefan Dulman and Koen Langendoen. "ASH: Tackling node mobility in large-scale networks." 4th IEEE International Conference on Adaptive and Self-Organising Systems (SASO), 2010, pages 144-153.

2. Andrei Pruteanu and Stefan Dulman. "ASH: Tackling node mobility in large-scale networks." Springer Computing, September 2012, Volume 94, Issue 8-10, pages 811-832.

Estimating Aggregate Link Failures

In large-scale networks, there is a high probability of variation in the quality of the communication links [116]. Data dissemination protocols employed in wireless systems make use of aggregate information, such as the average packet failure rate, to adjust various communication protocol policies such as the selected channel, the duty-cycling rate and the back-off intervals in case of packet collisions. In Chapter 3, we introduce an algorithm, named LossEstimate, for runtime estimation of the aggregate number of communication failures present in a region or in the entire distributed system. The new algorithm has the advantage of being completely decentralized: each device computes an estimate of the number of errors using a localised, gossip-like approach. The proposed method is adaptive in the sense that it can follow changes in the mean value of the communication failures over time. The publications associated with this chapter are:

1. Andrei Pruteanu, Venkat Iyer and Stefan Dulman. "Faildetect: Gossip-based failure estimator for large-scale dynamic networks." Proceedings of the 20th International Conference on Computer Communications and Networks (ICCCN), 2011, pages 1-6.

2. Andrei Pruteanu and Stefan Dulman. "LossEstimate: Distributed failure estimation in wireless networks." Elsevier Journal of Systems and Software, 2012, volume 85, number 12, pages 2785-2795.

Distributed Detection of Churn

Network churn distorts the information found in routing tables and leads to inconsistencies in the shared resource items [114]. Applications for peer-to-peer systems could considerably improve their performance if the phenomenon were detected at runtime. In Chapter 4, we introduce a distributed algorithm for detecting churn in large-scale dynamic systems. The publication associated with this chapter is:

1. Andrei Pruteanu, Venkat Iyer and Stefan Dulman. ”ChurnDetect: a gossip-based churn


Online Detection of Flashcrowds

Nowadays, peer-to-peer applications generate most of the traffic on the Internet. It is thus important that they attain high performance in order to ensure good quality of service for the users. Apart from off-line analysis of traces, online mechanisms for estimating real-time changes of the network characteristics (i.e., increases in network size, churn, failures, etc.) are needed in order to enable products that use adaptive algorithms for content delivery applications. In Chapter 5 we specifically focus on the problem of online detection of the flashcrowd phenomenon, defined as a sudden, unexpected increase in the number of peers requesting a piece of content. To the best of our knowledge, this is one of the first online methods for detecting the FlashCrowd phenomenon. The publication associated with this chapter is:

1. Andrei Pruteanu, Lucia d'Acunto and Stefan Dulman. "Distributed Online Flash-Crowd Detection in P2P Swarming Systems." Elsevier Computer Communications, 2012.

Spatial Computing Applied for Interactive Architecture

The technologies that architects (non-IT specialists) use when designing interactive environments could benefit from the use of the Spatial Computing paradigm. The two domains match very well in terms of envisioned application characteristics. They both use spatial and temporal concepts to describe behaviours for distributed systems. This led us to link the two domains and develop the necessary embedded software middleware. In Chapter 6, we introduce the software tool chain that enables fast prototyping of ideas on large-scale systems by abstracting away the underlying technological complexity related to communication protocols, programming languages, operating systems, virtual machines, hardware platforms, etc. The publications associated with this chapter are as follows:

1. Steffan Karger, Andrei Pruteanu and Stefan Dulman. "Spatial computing for non-it specialists." Spatial Computing 2012, collocated with AAMAS.

2. Andrei Pruteanu, Agostino di Figlia and Stefan Dulman. "Large-Scale Networked Embedded Systems for Interactive Architecture." 2013.

1.5 Thesis Outline

The outline of the thesis is as follows. Chapter 2 introduces a novel distributed algorithm that creates a virtual infrastructure partitioning the network into domains of quasi-equal size while coping well with topology dynamics caused by device mobility. In Chapters 3, 4 and 5, we introduce mechanisms based on gossiping that track aggregate system phenomena such as average packet loss, churn and flash crowds in peer-to-peer systems. In Chapter 6 we present an experimental evaluation of spatial computing algorithms, such as hop-count distance estimation and synchronisation algorithms, via a feature-rich embedded software middleware. Chapter 7 concludes the thesis.


2 ASH: Tackling Node Mobility in Large-Scale Networks

We begin our research by tackling the problem of device mobility in large-scale wireless networks. Recent years have seen a significant increase in the number and the diversity of the devices that form the wireless networks around us. The number of devices per network has grown substantially, and research domains such as mobile ad-hoc networks (MANETs) and wireless sensor networks (WSNs) have studied the corresponding scalability issues, for example, by providing theoretical boundaries [60, 66]. However, large collections of networked devices also bring in the problem of mobility; the larger the network, the higher the probability that individual devices or groups of devices become mobile. Examples range from networks in which mobility occurs relatively rarely (e.g., static networks with occasional node relocation due to maintenance operations) to highly dynamic networks (e.g., monitoring freight in transport and logistics applications).

One of the main approaches for handling topology dynamics is the creation of hierarchies (also called overlays or clusterings). Recent research projects targeting the development of large-scale cyber-physical systems, including programmable matter [58], swarms of tiny robots [84] and amorphous computing [2], take mobility as a default assumption. A common approach to tackling mobility has not materialized yet; most of the research efforts are focused on the scalability aspect, in particular the need to program the network as a whole rather than as a set of individual nodes [71, 143].

In this chapter we propose a novel mechanism (called ASH) for handling mobility in large-scale networks; in essence we "slow down" the network by creating a quasi-static overlay on top of the highly mobile network. A local rule is defined as a pairwise interaction between two neighbouring sensor nodes (1-hop distance). A unique feature of ASH is that it is based on the execution of local rules only: there is no knowledge of the global structure of the network and no usage of additional information related to the position, speed and direction of nodes. A key idea behind ASH is that nodes are not addressed individually, but rather that the network is composed of a set of domains (groups of nodes) whose membership constantly changes (see Figure 2.1(a)). By observing their neighbours, nodes can decide for themselves which domain they currently belong to; the decision policy is tuned to lead to domains whose centres of gravity drift slowly, effectively providing a "quasi-static" overlay. The name ASH was inspired by the metaphor of an ash cloud in which tiny particles float together in the air. The cloud slowly traverses the sky, while the contained particles move around each other randomly and fast. This natural phenomenon resembles the dynamics of our mechanism; the cloud can be compared to our domains, while the volcanic ash particles resemble the moving individual nodes.
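To illustrate what a neighbour-based membership decision can look like, the following is a minimal sketch of a generic local rule in which a node adopts the domain most common among its current one-hop neighbours. This majority rule is an assumption chosen for illustration and is not the ASH decision policy, which is described later in this chapter; the function name and the tie-breaking choices are hypothetical.

```python
from collections import Counter


def choose_domain(own_domain, neighbour_domains):
    """Pick a domain using only the domains reported by one-hop neighbours."""
    if not neighbour_domains:
        return own_domain  # isolated node: keep the current domain
    counts = Counter(neighbour_domains)
    best = max(counts.values())
    candidates = [d for d, c in counts.items() if c == best]
    # Prefer the current domain on ties to avoid needless switching.
    return own_domain if own_domain in candidates else min(candidates)


# Hypothetical node in domain 1 that has moved next to nodes of domains 2 and 3.
print(choose_domain(1, [2, 2, 3]))  # 2
```

The rule uses no position, speed or direction information: a node that moves simply re-evaluates its membership against whichever neighbours it currently hears.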

ASH can be used directly as an efficient overlay mechanism, for example, by assigning different application-level functionality to the different domains. Alternatively, ASH can be used as a clustering protocol by adding a leader election mechanism (see Section 2.4). ASH is, by design, very robust to node and link failures. It is based on a combination of gossiping, which is topology agnostic, and a periodic adjustment procedure that reconstructs local state based on the actual neighbourhood. Message loss and node failures are simply regarded and treated as nodes leaving the network (domain), which makes ASH extremely robust in contrast to many existing protocols. Simulations show that ASH succeeds in providing a robust overlay mechanism at low cost, that is, with minimal message exchange.

The domains defined by ASH are quasi-static with respect to the deployment area. They move at speeds that are orders of magnitude lower than the average speed of the nodes (see Figure 2.1). From this perspective, if applications target the domains that ASH provides rather than the individual nodes, then employing algorithms for static networks on top of the overlay becomes possible.

The remainder of this chapter is organized as follows. In Section 2.1 we describe related work. In Section 2.2 we present an overview of ASH, while in Section 2.4 we introduce a clustering mechanism as an application example. We analyze the performance of ASH in Section 2.5. Finally, Section 2.6 concludes the chapter.

2.1 Related Work

In ASH, individual mobile nodes use simple behavioural rules (i.e., periodic local exchange of a set of variables) to generate a pattern at the collective level (i.e., a static overlay) that is more intricate than the simple, one-hop interaction from which it emerges. From this perspective, ASH resembles algorithms met in the area of complex systems [104], such as the techniques inspired by biological systems, which are based on simple local (1-hop) interactions and are fully decentralised. For example, the firefly-inspired synchronisation [134] has several striking features that make it attractive for large-scale networks. To synchronise with each other, nodes execute very simple computations and interact in a simple manner, maintaining no internal state regarding neighbours or network topology. The synchronism provably emerges in a completely decentralised manner, without any explicit leaders and regardless of the starting state. The algorithm is very robust to network topology changes. On the other hand, desynchronisation is the logical opposite of synchronisation; instead of nodes attempting to perform periodic tasks at the same time, nodes perform their tasks at moments in time equally spaced apart from each other. Desync [36] is such a self-maintaining desynchronisation primitive and achieves desynchronisation in a single-hop network. Other types of emergent algorithms, similar to ASH, exist as well. In the MIT amorphous computing project [2], researchers used the peer pressure algorithm to regularise the regions of a surface covered by smart paint, smooth the edges and fill in the surface holes. Similar to the assumptions ASH makes, the myriad of computational elements are uniform-randomly distributed throughout the smart paint.

Figure 2.1: ASH Geometry and Time Evolution. (a) ASH Overlay (colors mark different cluster IDs); nodes superimposed (black dots). (b) Cluster Size (x-axis: Time [rounds]; y-axis: #nodes; green = actual; red = estimated).

ASH uses gossiping as the basis for its communication mechanism. Gossiping (also known as epidemic algorithms) is a simple randomised procedure, finding its use in disseminating information in large-scale networks. It was first introduced to maintain consistency for distributed databases when performing updates [28], offering a resource-efficient and robust alternative to complex deterministic algorithms. From a communication perspective, the underlying communication mechanism of ASH is related to the work presented in [89], where epidemic algorithms were proposed to forward information in mobile networks with intermittent communication links. Similar mechanisms, such as random walks, are explored in more recent work [4]. The focus in these approaches is on algorithms that reliably spread information in large-scale networks, while minimising the energy usage [155].

When dealing with the challenges introduced by network mobility, there exist few alternative techniques to constructing a static overlay (as ASH proposes) or a network hierarchy in general. The simplest one is flooding: flooding small-sized packets in the network is a common practice in routing algorithms such as DSR [82], but induces big overheads for large networks.

A second alternative is using knowledge on the geographical position of the nodes – as is the case of geographical routing algorithms [38,154]. Geographical routing algorithms have the default assumption that sensor nodes have a means to determine their locations and usually come with the overhead that the position of the final destination of a message is explicitly included in the message. Unfortunately, in the case of wireless sensor networks, location information acquired through GPS is usually expensive energy-wise and unavailable for indoor applications.

2.2 The ASH Algorithm

Since we noted that most existing networking protocols cannot handle large-scale networks exhibiting high node mobility, we set out to design an architecture that would be capable of handling these networks of the future. The result is a fully decentralised mechanism (ASH) that makes use only of local (1-hop) interactions between nodes to create a quasi-static overlay, a virtual partitioning of the network into domains. For the sake of clarity, we define a domain as a group of neighbouring nodes that share the same identifier (domain ID). Each domain usually spans multiple communication hops and tends to maintain, from a global perspective, an almost convex shape – see Figure 2.1(a).

The domains in ASH can be thought of in analogy with a number of gas balloons filling a fixed physical space (i.e., a box). Due to disturbances, the shape of the balloons and their relative positions may change. The amount (mass) of gas in each balloon is nevertheless constant, although the pressure in each balloon may fluctuate. Despite the random movement of air molecules, the system will converge to a stable state. ASH works on the same principle: each domain has a total mass $M$ distributed over the nodes in that domain. The share of mass a node $i$ holds is denoted by $m_i$. In a domain $S$, we have $\sum_{i \in S} m_i = M$. $p_i$ represents the local pressure inside a domain available at each node and is a function of the total mass $M$ and the number of nodes in a domain.

For a domain containing a large number of nodes, the share of the mass variable on each node will be small (we say that the domain has a low pressure). Neighbouring domains containing a smaller number of nodes (i.e., having a higher pressure) will extend by pushing the boundaries of the first domain until the number of nodes in each domain is approximately equal, that is, until the pressures equalise.

For a static network, the system will converge to an equilibrium at the borders between the domains. In a dynamic network, mobile nodes continuously trigger border changes, although the resulting domain macro-mobility is orders of magnitude lower than the mobility of the nodes themselves (this is somewhat similar to immiscible fluids interaction modelled with cellular automata [9] or a low-pass filter applied to device mobility of the nodes).

In order to explain ASH in detail, we consider the abstraction of network communication occurring in rounds – similarly to the work presented in [76]. Rounds are fixed-length time intervals with each node acting once every round. This model does not reduce the generality of the solution as the rounds do not need to be synchronised, avoiding cumbersome clock synchronisation between nodes. In practice, this translates to nodes performing actions periodically, such that, when averaging over a large period of time, all nodes perform equal numbers of actions.

During each round, each node has to perform the following three phases (see Algorithm 1):

1. domain ID selection – nodes will decide their domain ID based on the domain IDs and local pressures of their neighbours (see Section 2.2.2);

2. residual mass return – nodes that just changed their domain ID will distribute their mass value to neighbouring nodes from the old domain, if any (see Section 2.2.3);

3. diffusion of mass – nodes will attempt to equally distribute the mass in each domain (i.e., equalise the pressure in each domain) by means of gossiping (see Section 2.2.4).

The second action (residual mass return) can lead to domain mass loss; nodes belonging to one domain can move out so fast that they do not have the chance to return their share of the mass back to the old domain. This phenomenon leads to a steady drop of mass in the domains over time – see Section 2.2.5 for a solution.

2.2.1 Initialization

ASH considers a fixed number of domains at start ($N_S$). In the initialization part, the algorithm starts by randomly assigning a domain ID $k$ ($k = 1..N_S$) and mass $M$ to $N_S$ nodes. They act as “seeds” from which the domains will “grow”. Thus, each domain starts out as a single node, and will expand until the whole deployment area is covered by domains.
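The seeding step described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: the function name `init_domains`, the dictionary layout of the node state and the example values (1000 nodes, 10 domains, mass 100) are assumptions made for the example.

```python
import random

def init_domains(node_ids, n_domains, mass, rng=None):
    """Pick n_domains distinct seed nodes; each gets a domain ID and mass M."""
    rng = rng or random.Random()
    # Every node starts unassigned (domain None) and with zero mass.
    state = {i: {"domain": None, "mass": 0.0} for i in node_ids}
    seeds = rng.sample(list(node_ids), n_domains)  # distinct random nodes
    for k, j in enumerate(seeds, start=1):
        state[j]["domain"] = k   # node j seeds domain k
        state[j]["mass"] = mass  # and holds the whole domain mass M
    return state

# Hypothetical setup: 1000 nodes, N_S = 10 domains, M = 100 per domain.
state = init_domains(range(1000), n_domains=10, mass=100.0, rng=random.Random(1))
```

All other nodes stay unassigned until the domain ID selection phase lets the seeded domains grow over them.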

If we consider the clustering algorithms for comparison (although ASH is a more general framework!), the fixed number of domains might seem a limitation. Almost all existing clustering algorithms dynamically decide on the number of clusters in the network. This is because clustering is being used as a hierarchical way of controlling a large network by aggregating information. ASH can be easily extended to comply with this behaviour, by allowing domains to be dynamically created at run-time: when the pressure of a domain is very low (meaning that the domain is made up of a large number of nodes), a new domain can be spawned. A similar rule based on checking the pressure level can be used to remove unwanted domains. With the current approach, we are also covering a class of applications less often addressed: having a constant number of multihop domains leads to the so-called functional partitioning of a network. Each domain will be associated with a different functionality. As individual nodes randomly roam through domains, a high-level scheduling policy, specifying how many nodes should perform a certain functionality at each given moment, is easily implementable without keeping track of each node separately.

Algorithm 1 ASH algorithm.

function ASH(N_S, n_r, M)
    N_S – domain count
    n_r – round count
    M – initial domain mass

    for all N_S domains k do                  ⊲ initialization
        pick random unassigned node j
        node j ← domain k
        node j ← mass M
    end for
    for n_r rounds do                         ⊲ main algorithm
        for all nodes i do                    ⊲ algorithm phase 1
            node i updates its domain ID
        end for
        for all domain leader nodes j do      ⊲ algorithm phase 2
            node j runs pressure correction phase
        end for
        for all nodes i do                    ⊲ algorithm phase 3
            node i runs diffusion phase
        end for
    end for
end function

2.2.2 Domain ID Selection

Each node will decide its domain ID (“domain color” in Figure 2.1(a)) based on a weighted combination between majority voting (the dominant domain ID of its neighbours) and the pressure difference between neighbouring domains (via a weight $\eta \in [0, 1]$).

Let us assume a node $i$ has in its vicinity nodes from a set $D_i$ of distinct domains. The number of neighbours of node $i$, including itself, belonging to a domain $k$ (where $k = 1..|D_i|$) is $n_{i,k}$. The average pressure of the surrounding nodes in domain $k$ is $p_{i,k}$. The node $i$ will compute a series of values $\theta_{i,k}$:

$$\theta_{i,k} = (1 - \eta) \cdot \frac{n_{i,k}}{\sum_{t \in D_i} n_{i,t}} + \eta \cdot \frac{p_{i,k}}{\sum_{t \in D_i} p_{i,t}} \tag{2.1}$$

Let $\tilde{k}$ be the domain ID corresponding to the maximum $\theta_{i,k}$ for node $i$ ($\tilde{k} = \arg\max_k \theta_{i,k}$). The node $i$ will consider switching its domain ID to the domain $\tilde{k}$. Let $k_0$ be the previous domain ID of the node $i$. To allow for a smooth functioning of the network, the switch to a new domain is subject to a threshold mechanism: node $i$ will switch its domain only if $\theta_{i,\tilde{k}} - \theta_{i,k_0} > \Delta$, with $\Delta$ being a predefined threshold.
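The selection rule of Equation 2.1 can be illustrated with a short sketch. It assumes that a node switches when the score of the best candidate exceeds the score of its current domain by more than the threshold; the function name `select_domain`, the dictionary inputs and the default values of `eta` and `delta` are assumptions for the example, not values taken from the text.

```python
def select_domain(current, counts, pressures, eta=0.5, delta=0.1):
    """Return the domain ID a node adopts, following Equation 2.1.

    counts[k]    -- n_{i,k}: neighbours (including the node itself) in domain k
    pressures[k] -- p_{i,k}: average pressure of nearby nodes in domain k
    eta, delta   -- weight and switching threshold (illustrative values)
    """
    total_n = sum(counts.values())
    total_p = sum(pressures.values())  # assumed strictly positive
    theta = {
        k: (1 - eta) * counts[k] / total_n + eta * pressures[k] / total_p
        for k in counts
    }
    best = max(theta, key=theta.get)
    # Threshold (hysteresis): switch only if the best candidate wins by > delta.
    if best != current and theta[best] - theta.get(current, 0.0) > delta:
        return best
    return current

# A node in domain 1 surrounded mostly by high-pressure domain 2 switches.
switched = select_domain(1, {1: 2, 2: 6}, {1: 1.0, 2: 3.0})
# A balanced neighbourhood with a small pressure difference leaves it in place.
kept = select_domain(1, {1: 4, 2: 4}, {1: 1.0, 2: 1.2})
```

In the first call the scores are 0.25 for domain 1 and 0.75 for domain 2, so the node switches; in the second the difference is below the threshold and the node keeps its domain ID.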

Since the domain selection process is carried out independently by all nodes, it can happen that a small domain simply dissolves when all members join a neighbouring domain. If not prevented, this effect will carry through and cause all domains (clusters) to eventually collapse into a single one, spanning the complete network. This undesirable effect is avoided by introducing a domain leader (see Section 2.4), which is prevented from changing its domain ID.

2.2.3 Residual Mass Return

This phase involves all nodes that decided, in the previous algorithmic phase, to change their domain ID. The nodes crossing to a different domain need to adjust their mass variable: they need to return the current mass value to the old domain and enter the new domain with mass 0. The diffusion phase that follows will make sure that mass redistributes equally in both the old and the new domain.

The simplest way a node can return mass to the old domain is to select one or more of its neighbours from the old domain and hand them its mass. This approach works in most of the cases, with one exception though. It can happen, due to various dynamics, that a node finds itself in a situation in which it has no neighbours from the old domain anymore. In this case, the mass on the node will actually be lost, unless a mechanism such as routing is in place and being used. We decided to use the simple solution of discarding the mass in this particular case, and repairing the loss later. The reason is that we avoid the complexity of routing in a volatile network, and the repair mechanism described in Section 2.2.5 is easy to implement.
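A minimal sketch of the residual mass return, under the simplest variant in which the whole mass goes to a single randomly chosen neighbour from the old domain. The function name `return_residual_mass` and the dictionary layout of the node records are assumptions for illustration.

```python
import random

def return_residual_mass(node, old_domain, neighbours, rng=None):
    """Hand the node's mass to one neighbour that is still in the old domain.

    node, neighbours -- dicts with "domain" and "mass" keys (illustrative layout)
    Returns the amount of mass lost (0.0 if it could be handed back).
    """
    rng = rng or random.Random()
    mass, node["mass"] = node["mass"], 0.0  # enter the new domain with mass 0
    old_members = [n for n in neighbours if n["domain"] == old_domain]
    if not old_members:
        return mass  # no old-domain neighbour in range: the mass is discarded
    rng.choice(old_members)["mass"] += mass
    return 0.0

node = {"domain": 2, "mass": 3.0}  # just switched from domain 1 to domain 2
nbrs = [{"domain": 1, "mass": 1.0}, {"domain": 2, "mass": 5.0}]
lost = return_residual_mass(node, old_domain=1, neighbours=nbrs)
```

The returned value makes the discarded mass explicit, which is the quantity the mass correction mechanism of Section 2.2.5 has to compensate.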

2.2.4 Diffusion

The purpose of this phase in the ASH algorithm is to diffuse the mass inside a domain such that all nodes share the same view on what the average value is in that cluster. As nodes are not synchronised, and may deploy sleep schedules to conserve energy (behaviour common to sensor networks), there is no guarantee that all neighbours are ready to communicate when a node enters the diffusion phase. This situation is aggravated by nodes moving in and out of range, as well as errors on the wireless communication channel. To handle the resulting volatility ASH employs a gossiping style of communication on top of a periodic mechanism of neighbourhood discovery (the diffusion phase may actually consist of several gossiping rounds - see Section 2.5).

Periodic neighbourhood discovery is done by nodes sending short “Hello” messages containing a tuple <node ID, domain ID, local pressure>. Periodic neighbourhood discovery is a common mechanism in mobile networks, with its functionality being assured by the media access control (MAC) mechanism. For the implementation of the gossiping mechanism, an acknowledgment mechanism for the messages being sent is assumed to be in place. This need not be perfect, as the pressure correction mechanism described in Section 2.2.5 compensates for the effects of message loss.

Similarly to the Push-Sum gossiping algorithm [85], each node $i$ needs to store the following local variables: the local mass ($m_i$), a weight factor ($\omega_i$) and the domain ID of the node ($d_i$). Local pressure is computed as $p_i = \frac{m_i}{\omega_i}$ via the averaging mechanism described in the Push-Sum algorithm.

In short, the gossiping protocol works as follows. Assume a node $i$ has the values $m_{i,t}$ and $\omega_{i,t}$ at the beginning of communication round $t$. Node $i$ randomly picks a neighbour from the same domain, and sends to that node and to itself the set $\left\{\frac{m_{i,t}}{2}, \frac{\omega_{i,t}}{2}\right\}$. During that round $t$, the node receives updates $\left\{m^r_{j,t}, \omega^r_{j,t}\right\}$ from a set $S_0$ of $n_i$ neighbours, including itself ($j \in S_0$). The node updates its mass value and weight, for the communication round $t + 1$, as follows: $m_{i,t+1} = \sum_{j \in S_0} m^r_{j,t}$ and $\omega_{i,t+1} = \sum_{j \in S_0} \omega^r_{j,t}$. As shown in Proposition 2.2 in [85], the sum $\sum_i m_{i,t}$ remains constant at each moment in time.
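The diffusion step can be sketched as a small simulation of Push-Sum rounds inside one domain. It is a simplified illustration: rounds are synchronous, no messages are lost, and the topology is a static ring of 10 nodes. The function name `push_sum_round` and all numeric values are assumptions for the example.

```python
import random

def push_sum_round(mass, weight, neighbours, rng):
    """One synchronous Push-Sum round inside a single domain.

    mass, weight -- dicts node -> m_i, omega_i
    neighbours   -- dict node -> list of same-domain neighbours
    """
    new_mass = {i: 0.0 for i in mass}
    new_weight = {i: 0.0 for i in weight}
    for i in mass:
        # Keep one half, send the other half to a random same-domain neighbour.
        target = rng.choice(neighbours[i]) if neighbours[i] else i
        for j in (i, target):
            new_mass[j] += mass[i] / 2
            new_weight[j] += weight[i] / 2
    return new_mass, new_weight

rng = random.Random(0)
n = 10
# Ring topology; the whole domain mass M = 100 starts on node 0.
neighbours = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
mass = {i: 0.0 for i in range(n)}
mass[0] = 100.0
weight = {i: 1.0 for i in range(n)}
for _ in range(2000):
    mass, weight = push_sum_round(mass, weight, neighbours, rng)
pressure = {i: mass[i] / weight[i] for i in range(n)}  # p_i = m_i / omega_i
```

The total mass and total weight are preserved by every round, and each local pressure approaches the domain average $M/n$ (here 10).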

For fixed infrastructures, standard gossiping has a convergence time for computing an average value across the network within accuracy $e$ that requires $\Theta(n^2 \log e^{-1})$ messages. The solution of constructing a spanning tree and flooding back the average in an ad-hoc network introduces a lot of overhead and complexity. It has been proved that any kind of mobility is beneficial, especially the fully-random one, as is the case with our scenarios [126]. Different mobility patterns can have significantly different effects on the convergence of distributed algorithms such as gossiping [41]. If $m$ nodes have full mobility and the others are fixed, the convergence time drops to $\Theta(n^2/m \cdot \log e^{-1})$.

To ensure that information is spread across the complete domain, the diffusion phase may actually consist of several gossiping rounds. The right number of rounds depends on the application, more exactly, on the average speed of the nodes and their density, and the desired domain stability. We evaluate this dependency via simulation and present the results in Section 2.5. On the other hand, one is not limited to gossiping alone. Broadcasting each message might come as a natural solution in many wireless networks as well. From the perspective of dissemination speed, gossiping presents the worst case scenario and we chose to use it to characterise the lower limit of the performance of our algorithm.

2.2.5 Mass Correction

In practice, domains lose mass over time. This happens primarily as an effect of node mobility; when a node suddenly finds itself surrounded by neighbours all belonging to a different domain than itself, it must switch domain ID, but cannot hand its residual mass back to the originating domain. Simulation results show that the mass loss is kept at small levels even for high mobility, across all domains. Nonetheless, if no measures are taken, the mass in each domain will constantly drop towards $-\infty$. In the case of real networks mass loss may also occur due to failures. For example, nodes that suddenly crash or messages getting lost in the diffusion phase lead to additional mass loss. Thus, providing a mechanism for solving the mass loss issue leads not only to a solution to the problem of nodes having no neighbours from the same domain, but also constitutes a self-healing mechanism for two of the most common failures met in mobile wireless networks.

Simplistic approaches for solving the mass loss issue may rely on knowing the statistical characteristics of the network: average density and flux of nodes in and out of domains. Based on these numbers, the mass could be periodically increased in each domain by a precomputed amount. This mechanism, however, cannot guarantee that the average mass across all domains is stable (it may diverge to either infinity) due to the lack of a feedback mechanism. We propose a solution for keeping the average mass level constant in the domains, which assumes the existence of a leader in each domain (similar to a conventional cluster head). The basic idea is that a diffusion-based mechanism (called the ASH-NetSize algorithm) is used to estimate the number of nodes inside a domain. By multiplying this estimate with the average mass value obtained in the previous round, a domain leader can determine the total mass in its domain. If it drops below a threshold, the leader can inject additional mass into the domain to compensate for the loss.
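The leader's correction step can be sketched as below. The text does not specify the threshold or the injected amount, so both are assumptions here: the threshold is taken as a fraction of the target mass, and the leader injects exactly the estimated deficit. The function name `correct_mass` is likewise hypothetical.

```python
def correct_mass(leader_mass, avg_mass, size_estimate, target_total, threshold=0.9):
    """Return the leader's mass after an optional injection.

    avg_mass      -- average mass per node obtained through diffusion
    size_estimate -- estimated number of nodes in the domain (e.g. ASH-NetSize)
    target_total  -- desired total domain mass M
    threshold     -- fraction of M below which mass is injected (an assumption)
    """
    total = size_estimate * avg_mass         # estimated total mass in the domain
    if total < threshold * target_total:
        leader_mass += target_total - total  # inject the estimated deficit
    return leader_mass

# 100 nodes with average mass 0.8: estimated total 80 < 90, so 20 is injected.
after_loss = correct_mass(1.0, avg_mass=0.8, size_estimate=100, target_total=100.0)
# Average mass 0.95: estimated total 95 is above the threshold, nothing happens.
no_change = correct_mass(1.0, avg_mass=0.95, size_estimate=100, target_total=100.0)
```

The injected mass then spreads through the domain by the regular diffusion phase, so no extra messages are needed for the correction itself.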

ASH-NetSize makes use of the gossiping mechanism [126] to estimate the domain size at a moment in time. The idea behind gossiping algorithms is that they are able to compute the mean value of some shared variable $\varphi$ over all the $n$ nodes of the domain ($\varphi = \frac{\sum_i \varphi_i}{n}$ with $i = 1..n$). Let us assume that all nodes have a value of $\varphi_i = 1$. The domain leader (node $k$) estimates the size of the network to be $n_e$ and subtracts this value locally ($\varphi_k \leftarrow 1 - n_e$). By using gossiping, after a number of rounds, the set $\varphi_i$ converges to a new set $\varphi'_i = \varphi'$ – the new average of the distributed variable. If the network size was exactly estimated, then $\varphi'_i = 0, \forall i$. If not, then the sign and value of $\varphi'_i$ give an indication of how the estimate of the network size needs to be updated (if $\varphi'_i < 0$ then $n_e$ was overestimated, else it was underestimated). By constantly updating it, $n_e$ will follow the variations in the network size.
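The estimation loop can be illustrated with an idealised sketch. Two simplifications are assumed: the gossip average is replaced by the exact mean (as if diffusion had fully converged), and the update rule for $n_e$, a proportional correction with a gain of 0.5, is a hypothetical choice since the text only states that the sign and value of the average guide the update. The names `gossip_average` and `netsize_step` are illustrative.

```python
def gossip_average(values):
    """Idealised stand-in for the diffusion phase: returns the exact mean."""
    return sum(values) / len(values)

def netsize_step(n_true, n_est, gain=0.5):
    """One estimation cycle of the leader; returns the updated estimate n_e."""
    phi = [1.0] * n_true           # every node starts with phi_i = 1
    phi[0] = 1.0 - n_est           # the leader (node 0 here) subtracts n_e
    phi_avg = gossip_average(phi)  # equals (n - n_e) / n after convergence
    # phi_avg < 0: overestimate, shrink n_e; phi_avg > 0: underestimate, grow it.
    return n_est + gain * phi_avg * n_est

n_est = 10.0
for _ in range(60):                # the domain holds 50 nodes
    n_est = netsize_step(50, n_est)
estimate_50 = n_est
for _ in range(60):                # the domain grows to 80 nodes
    n_est = netsize_step(80, n_est)
estimate_80 = n_est
```

With this update the estimate settles at the true domain size and follows it when the size changes; in a real deployment the average is only approximate, which acts as noise on the estimate.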

In Figure 2.1(b) we show the accuracy of the ASH-NetSize algorithm: it follows the fluctuations of the domain sizes quite closely, smoothing out temporary “noise”. The traffic overhead associated with ASH-NetSize is minimal, since the correction information is piggybacked through the already existing mechanism of diffusion (see Section 2.2.4). As a result the average mass in the domains will stay around a desired value, steadily decreasing with time as an effect of nodes leaving and periodically increasing due to the domain leaders injecting mass.

Figure 2.2: ASH Modeling. (a) Model State Diagram. (b) Convergence (x-axis: time (rounds); y-axis: node count; unclustered nodes, model and simulation).

2.3 ASH convergence

The creation of the ASH clusters depends solely on local rules. The clusters converge to an equilibrium in which they have the same number of nodes. To prove convergence, we will first consider a general case, in which we analyse the variation of the number of nodes $n_i$ and $n_j$ in two neighbouring clusters $i$ and $j$.

The amount of nodes that switch between the two clusters $i$ and $j$ is proportional to the contact border $\varphi_{i,j}$ between them. Let the node density in cluster $i$ be $\rho_i = \frac{n_i}{S_i}$, where $n_i$ is the number of nodes in cluster $i$ and the cluster surface is $S_i$. Let $r_i = \sqrt{\frac{n_i}{\pi \cdot \rho_i}}$ be the radius of the cluster $i$. Let the length of the contact border between the two clusters $i$ and $j$ be

$$L_{i,j} = f_{i,j} \cdot 2\pi r_i = 2 f_{i,j} \sqrt{\frac{\pi n_i}{\rho_i}} \tag{2.2}$$

where $f_{i,j}$ is the fraction of the border of cluster $i$ that is “touching” cluster $j$. We can approximate $L_{i,j}$ to:

$$L_{i,j} \approx K \cdot \sqrt{n_i} \tag{2.3}$$

$K$ is a constant resulting from the fact that $\rho_i$ is also constant at equilibrium. Since we are dealing with a model that considers the physical space, the distance between the domains also plays an important role. We capture it via the function $\alpha(d_{i,j})$ that models the relation between the distance between domains $i$ and $j$ and the node exchange rate between the domains.

A sketch of the state transitions that model the system is shown in Figure 2.2(a). $T_{U,i}$ describes the flow of unclustered nodes that become part of cluster $i$. $T_{i,j}$ models a node transition from cluster $i$ to cluster $j$.

$$T_{U,i} \approx K \cdot \sqrt{n_i} \tag{2.4}$$

$$T_{i,j} \approx \mathrm{Sign}(n_i - n_j) \cdot K \sqrt{n_i} \cdot \alpha(d_{i,j}) \tag{2.5}$$

We approximate $\alpha(d_{i,j})$ with an exponential distribution to show that the rate of transfer between neighbouring domains decreases exponentially fast with the distance between them. Although in practice $\alpha(d_{i,j})$ has a very abrupt profile, being actually equal to 0 after a distance threshold, the exact form of the function does not influence the convergence proof (although it certainly influences the convergence rate). The model considers a 1000 x 1000 meters deployment area.

$$\alpha(d_{i,j}) = e^{-\psi \cdot d_{i,j}} H(d_{i,j}) \tag{2.6}$$

where $\psi$ is a constant and $H$ is a modified version of the Heaviside step function, defined as follows:

$$H(d_{i,j}) = \begin{cases} 0 & \text{if } d_{i,j} > 500 \text{ meters} \\ 1 & \text{otherwise} \end{cases}$$

From a node flow perspective, the number of nodes in a domain $j$ varies as follows:

$$\frac{dn_j}{dt} = \sum_{i=1, i \neq j}^{N} \mathrm{Sign}(n_i - n_j) \cdot K \sqrt{n_j} \cdot \alpha(d_{i,j}) \tag{2.7}$$

We check the model by comparing it with simulation results as shown in Figure 2.2(b). The model considered a number of 10 clusters with node mobility following a Random Walk pattern. Node speed is selected uniformly at random from 0 to 10 mps. The average neighbourhood size is 15. For the constants, we used $K = 0.1$ and $\psi = 3$.

As shown by Figure 2.2(b), the number of unclustered nodes converges fast to 0. The results in the simulation closely matched the model. In order to compute the equilibrium point, we need to solve the system given by $\frac{dn_j}{dt} = 0$, for all clusters $j$ in the setup. This translates into the search for the conditions for which there is an equilibrium and there is no fluctuation of the domain (cluster) sizes. If, for all clusters $j$, we sum up Equation 2.7 and reduce $K$ and $\alpha_{i,j}$, we have (with $K$ now denoting the number of clusters and $N$ the total number of nodes):

$$\begin{aligned} 0 = {} & \sqrt{n_1}\,\big(\mathrm{Sign}(n_2 - n_1) + \dots + \mathrm{Sign}(n_K - n_1)\big) \\ & + \sqrt{n_2}\,\big(\mathrm{Sign}(n_1 - n_2) + \dots + \mathrm{Sign}(n_K - n_2)\big) \\ & \dots \\ & + \sqrt{n_K}\,\big(\mathrm{Sign}(n_1 - n_K) + \dots + \mathrm{Sign}(n_{K-1} - n_K)\big) = f(n_{i=1..K}) \end{aligned} \tag{2.8}$$

Since the number of nodes in each cluster is strictly positive, at equilibrium $\mathrm{Sign}(n_2 - n_1) + \dots + \mathrm{Sign}(n_K - n_1) = 0$, $\mathrm{Sign}(n_1 - n_2) + \dots + \mathrm{Sign}(n_K - n_2) = 0$, etc. For the general case, the equilibrium point of the dynamic system occurs when $n_1 = n_2 = n_3 = \dots = n_K = \frac{N}{K}$, as expected. In other words, the system converges to an equilibrium only if the clusters have the same number of nodes. The convergence occurs under the following conditions: nodes transition from one domain to the other depending on how close they are to the border, and the domains exchange more nodes if the border size between them is large. This maps very well to our simulation conditions.
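The equalising behaviour of the node-flow model can be checked numerically with a small sketch. It makes several simplifying assumptions that are not in the text: all clusters are mutually adjacent with $\alpha = 1$, unclustered nodes are ignored, each pair of clusters exchanges nodes at the rate of Equation 2.5 evaluated for the larger cluster (so that the total number of nodes is conserved), and time is advanced with explicit Euler steps of size 0.1. The names `sign` and `step` and the initial sizes are illustrative.

```python
import math

def sign(x):
    """Sign function returning -1, 0 or 1."""
    return (x > 0) - (x < 0)

def step(sizes, k=0.1, alpha=1.0, dt=0.1):
    """One Euler step of the node-flow model for mutually adjacent clusters."""
    delta = [0.0] * len(sizes)
    for i, n_i in enumerate(sizes):
        for j, n_j in enumerate(sizes):
            if i < j:
                # Nodes flow from the larger to the smaller cluster, at a rate
                # proportional to the square root of the larger cluster's size.
                flow = sign(n_i - n_j) * k * math.sqrt(max(n_i, n_j)) * alpha * dt
                delta[i] -= flow
                delta[j] += flow
    return [n + d for n, d in zip(sizes, delta)]

sizes = [10.0, 40.0, 100.0, 250.0]  # N = 400 nodes in 4 clusters
for _ in range(5000):
    sizes = step(sizes)
```

Under these assumptions the sizes settle around $N/K = 100$ nodes per cluster; because of the sign function, the discrete-time trajectory keeps oscillating slightly around that point instead of reaching it exactly.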

2.4 Application Example – ASH-Cluster

In this section we show how the quasi-static domains of ASH can be used by applications. We revert to the example of clustering, for which a large number of algorithms have been surveyed and compared in works such as [1,146]. The performances of these clustering algorithms are compared on a multitude of metrics: communication overhead, power balancing, re-clustering ripple effect, cluster formation time, etc. The large majority of these algorithms target static networks – as soon as mobility is involved the clustering problem becomes increasingly more difficult to solve, although possible alternatives have been proposed [145].

The domains defined by ASH can be readily used as clusters, hence, ASH-Cluster was developed as a natural algorithm on top of the overlay. ASH-Cluster provides multihop clustering for mobile networks by solving the problem of re-clustering ripple effect in an elegant way, while keeping the communication overhead at a low value. The key is that ASH-Cluster determines the domains (clusters) independently of the decision of electing a cluster head. This makes it superior to the large majority of existing clustering algorithms, in the sense that the mobility of a cluster head does not trigger re-clustering. Actually, the cluster head election is a mechanism implemented in ASH-Cluster after the clusters have been created.

Assume node $i$ belongs to domain $D_i$. To establish a gradient on node $i$ (see Figure 2.3), we use the ratio between the number of neighbouring nodes from other domains ($\sum_{j \in D_i, j \neq i} n_{i,j}$) and the number of all neighbouring nodes ($\sum_{j \in D_i} n_{i,j}$). A low pass filter is also being applied. Figure 2.3 shows the gradient in colors, blue indicating the center of the domains. The probability of nodes “hosting” the cluster head agent is smallest in the red regions and highest in the dark blue ones.

Routing of information takes place in a unidirectional way, in the sense that nodes can send data towards the cluster head (fitting the data-collection type of applications) as in [56], via the gradient mechanism described above. The communication between the cluster heads is similar to the one used by the LEACH protocol [70]. The cluster heads form a network backbone (a spanning tree rooted at the gateway) and can make use of an increased transmit power to communicate with each other. The cluster head is, in our case, a software agent that “jumps” to different nodes to perform the data collection and communication with other agents. It is usually located in the minimum gradient area, and also uses the gradient to restrict the search area (ideally to the centre of the domain) when looking for a new candidate to “jump to”. Nodes route data towards the minimum gradient point, where it will be met by the cluster head agent.
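The gradient and the movement of the cluster head agent can be sketched as follows. The sketch omits the low pass filter, uses the ratio of other-domain neighbours to all neighbours plus one (the node itself), and lets the agent stay on its current node when no same-domain neighbour has a lower gradient; that last rule, the function names and the five-node line topology are assumptions for illustration.

```python
def gradient(node, neighbours, domain):
    """g_i = (neighbours from other domains) / (all neighbours + 1)."""
    others = sum(1 for j in neighbours[node] if domain[j] != domain[node])
    return others / (len(neighbours[node]) + 1)

def elect_cluster_head(head, neighbours, domain):
    """Move the cluster head agent to the same-domain neighbour with the
    lowest gradient; on ties the agent stays on the current node."""
    g = {i: gradient(i, neighbours, domain) for i in domain}
    candidates = [j for j in neighbours[head] if domain[j] == domain[head]]
    # min() returns the first minimum, so listing the head first keeps it on ties.
    return min([head] + candidates, key=lambda j: g[j])

# Line topology 0-1-2-3-4; nodes 0..2 are in domain "A", nodes 3..4 in "B".
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
domain = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B"}
new_head = elect_cluster_head(2, neighbours, domain)  # border node hands over
```

Node 2 sits on the border (gradient 1/3), so the agent jumps to node 1, whose gradient is 0; repeated calls move the agent away from the domain border.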

2.5 ASH Algorithm Analysis

To evaluate the stability of ASH, we need some means of characterising the shapes of the domains as well as their fluctuation in size (number of nodes). We introduce four metrics to measure the stability of domains over time: the motion of the centroid of the node positions (motion metric), the domain shape variation (variation metric), the standard deviation of the domain sizes (Std. Dev. Domain Size metric) and the ratio between the largest domain size and the smallest domain size (ratio metric). The domains in ASH “hover” around, somewhat similarly to Brownian motion, at a speed significantly smaller than the average speed of the nodes in the network. At the same time, the shapes of the domains fluctuate around a stable, roughly circular perimeter. The motion metric captures the actual mobility of the domains, and the variation metric captures the fluctuations in surface size. The stdSize metric tracks the variation of domain sizes through time and the ratio metric shows the imbalance between the largest domain size (measured in number of nodes) and the smallest.

Figure 2.3: ASH Gradient (both axes: Distance [Km]).

The motion metric is equal to the distance traveled by the centroid of the node positions in a domain. The centroid is defined as $(x_c, y_c) = \left(\frac{1}{n} \sum_i x_i, \frac{1}{n} \sum_i y_i\right)$, $i = 1..n$, where $x_i$ and $y_i$ are the coordinates of the nodes. The motion metric is defined as the average traveled distance of the centroids through time.

Let $d_t$ be the mean value of the distances between the centroid of a domain and all the nodes in the domain, at time round $t$. The variation metric is equal to the difference $m_2 = d_{t+1} - d_t$. This metric will show increasing values with the fluctuations of the sizes of the domains.

We simulated ASH and ASH-Cluster using Matlab. We considered 1000 mobile nodes deployed in a square space with an edge of 1000 meters. The transmission range of the nodes is set to 100 meters and we assumed a very simplified unit disk transmission model. We configure ASH to operate with 10 domains in our simulations (cf. Figure 2.1(a)). One might argue that such a high-level simulation is not a true representation of an actual deployment since a lot of problems occur from unexpected places (software bugs, hardware failures, communication interference, scalability issues, etc.). The goal is to present a mechanism that is agnostic to the lower layers, such as the MAC and PHY, that are being used. To preserve generality, we model all low-level errors as mass loss, with the implication that the simplified assumptions do not affect the overall stability.
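The motion and variation metrics defined above can be computed from per-round snapshots of the node positions of one domain, as in the following sketch. The function names and the snapshot format (a list of `(x, y)` tuples per round) are assumptions for the example; `math.dist` requires Python 3.8 or later.

```python
import math

def centroid(points):
    """Centroid (x_c, y_c) of a list of (x, y) node positions."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def mean_radius(points):
    """d_t: mean distance between the centroid and the nodes of a domain."""
    c = centroid(points)
    return sum(math.dist(c, p) for p in points) / len(points)

def motion_metric(snapshots):
    """Average distance traveled by the centroid between consecutive rounds."""
    cs = [centroid(p) for p in snapshots]
    steps = [math.dist(a, b) for a, b in zip(cs, cs[1:])]
    return sum(steps) / len(steps)

def variation_metric(snapshots):
    """List of differences d_{t+1} - d_t for consecutive rounds."""
    ds = [mean_radius(p) for p in snapshots]
    return [b - a for a, b in zip(ds, ds[1:])]

# Two rounds: the same square of four nodes, shifted by 3 units along x.
snaps = [[(0, 0), (2, 0), (0, 2), (2, 2)], [(3, 0), (5, 0), (3, 2), (5, 2)]]
motion = motion_metric(snaps)
variation = variation_metric(snaps)
```

For this toy input the centroid moves by 3 units while the shape is unchanged, so the motion metric is 3 and the variation is 0.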

A node moves through space with a speed ranging from a minimum of 1 mps to a maximum of 10 mps. In our simulations, we use four mobility models: Random Walk [15], Random Direction [124], Random Waypoint [15] and SLAW [95]. The choice was made such that they cover a wide range of behaviors, from nodes traveling all over the deployment area (Random Waypoint) to nodes moving in a localized manner (Random Walk) to realistic human mobility (SLAW). Each experiment consisted of simulations running for 500 time rounds, for each mobility case. The maximum speed was varied across simulations to achieve different characteristics for mobility.

Algorithm 2 ASH cluster head election

function ASH-CLUSTERHEAD(c_{t-1}) returns c_t
    local node variables:
        g_i – local gradient
        n_i – nr. neighbors of i
        n_{o,i} – nr. neighbors of i from other domains
    notations:
        i – node identifier
        c_r – clusterhead node ID at round r
        N_i – set of neighbors of i in the same domain

    for all nodes in domain do                ⊲ algorithm phase 1
        g_i ← n_{o,i} / (n_i + 1)
    end for
    for all nodes in domain do                ⊲ algorithm phase 2
        if i == c_{t-1} then
            c_t ← arg min_{j ∈ N_i} g_j
        end if
    end for
end function

In the Random Walk model [15], also named Markovian Mobility model, nodes move freely anywhere in the simulation area. The direction of the movement $\varphi$ is taken from a uniform distribution on the interval $[0, 2\pi]$. The speed values $\vartheta$ follow a uniform distribution. Once the node reaches a destination, it chooses a new direction and starts moving toward it after a randomly chosen time interval, taken from an exponential distribution.
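One leg of such a Random Walk can be sketched as below. The speed range (1 to 10 mps) and the 1000 meter square follow the simulation setup of this section; the leg duration, the mean pause and the clamping at the area boundary are assumptions, as is the function name `random_walk_leg`.

```python
import math
import random

def random_walk_leg(pos, rng, v_min=1.0, v_max=10.0, leg_time=1.0,
                    mean_pause=1.0, side=1000.0):
    """Advance one node by one leg; returns (new_position, pause_duration)."""
    phi = rng.uniform(0.0, 2.0 * math.pi)      # direction ~ U[0, 2*pi]
    speed = rng.uniform(v_min, v_max)          # speed ~ U[v_min, v_max]
    x = pos[0] + speed * leg_time * math.cos(phi)
    y = pos[1] + speed * leg_time * math.sin(phi)
    # Clamp to the deployment area (boundary handling is an assumption).
    x = min(max(x, 0.0), side)
    y = min(max(y, 0.0), side)
    pause = rng.expovariate(1.0 / mean_pause)  # exponentially distributed pause
    return (x, y), pause

rng = random.Random(3)
trace = [(500.0, 500.0)]
for _ in range(1000):
    new_pos, _pause = random_walk_leg(trace[-1], rng)
    trace.append(new_pos)
```

Because every leg draws a fresh direction, the node tends to stay close to its starting region, which matches the localized movement attributed to this model.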

The Random Direction model [124] operates similarly to the Random Walk model, except that nodes continue to travel until they are within some distance of the simulation space boundary. They then stop and choose new, random destinations.

In the Random Waypoint model [15], a node randomly chooses a destination point in the deployment area, moves toward it on a straight line with constant speed υ (chosen uniformly between υ_min and υ_max), then pauses for a random time before choosing a new destination.

SLAW [95] simulates the social contexts present among people who share common interests or belong to a single community, such as university campuses, companies and theme parks. It expresses the mobility patterns involving these contexts through fractal waypoints and heavy-tail flights on top of the waypoints.
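For comparison, one leg of the Random Waypoint model described above might be sampled as follows. The rectangular area given by `width` and `height`, the uniform pause bounded by `max_pause` and the function name are assumptions made for this sketch; the text only states that the pause is random.

```python
import math
import random


def random_waypoint_leg(pos, width, height, v_min, v_max, max_pause, rng):
    """Sample one leg: destination, constant speed, travel time and pause."""
    dest = (rng.uniform(0.0, width), rng.uniform(0.0, height))
    speed = rng.uniform(v_min, v_max)            # constant along the leg
    travel_time = math.dist(pos, dest) / speed   # straight-line movement
    pause = rng.uniform(0.0, max_pause)          # assumption: uniform pause
    return dest, speed, travel_time, pause


rng = random.Random(7)  # fixed seed only to make the sketch repeatable
dest, speed, travel_time, pause = random_waypoint_leg(
    (50.0, 50.0), width=100.0, height=100.0,
    v_min=1.0, v_max=10.0, max_pause=5.0, rng=rng)
```

Because every destination lies inside the area and legs are straight lines, paths tend to cross the middle of the area more often than its edges, which matches the higher node density in the centre noted for this model.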

2.5.1 Influence of Network Mobility

The Random Waypoint (RWP) mobility model leads to the formation of more stable domains than the other mobility patterns (see Figure 2.4(a) and Figure 2.4(b)). The results are similar to well-known experiments, such as those presented in [27]. The cause is the spatial distribution of the nodes: under RWP, the node density is higher in the centre of the deployment area than under the other two models.

The Random Waypoint mobility model is used in many prominent simulation studies of ad-hoc network protocols. Although its ability to produce realistic mobility patterns is debatable, the flexibility of the model has led to its adoption in many simulation scenarios. The Random Direction model provides a uniform distribution of nodes over the deployment space. As seen in our simulation results, this mobility model leads to results comparable to those of the Random Waypoint model. The Random Walk model causes ASH to perform the worst. The SLAW model produces human-like mobility and, as such, creates clusters that are more stable (reduced speed around specific points). ASH performs best for this model.

Figure 2.4(a) and Figure 2.4(b) show that increasing the node speed has practically no influence on the overall stability of ASH for all mobility models except Random Walk. In fact, both the variation and motion metrics exhibit similar behavior. This is the most important characteristic of our algorithm. For the Random Walk model, on the other hand, ASH shows a steadily increasing degradation in performance as the speed increases. At high node speeds, the second term in the domain selection formula leads to irregularly shaped domains, which affects the stability of the algorithm. Up to a speed of 12 mps, the metrics are relatively constant for all models. Above this limit, the clusters are less stable under the Random Walk model. The authors of [30] showed that the idea of a critical radius (the smallest transmission radius that minimizes the energy consumed for transmission without compromising connectivity) being determined solely by the node density is not accurate. For uniform mobility models [79], it is expressed as a function of the node velocity as well. The benefits of a uniform node density and the resulting dependence of the connected graph on the node velocity parameters greatly influence the performance of the algorithms.

Figure 2.4(b) shows the distribution of the domains' average movement (the motion metric) over the simulation. The average speed of a domain's centre of mass is more than one order of magnitude smaller than the average speed of a node.
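One way to compute the speed of a domain's centre of mass between two consecutive rounds is sketched below. The exact definition of the motion metric is not restated here, so the unweighted centroid, the round length `dt` and the function names are assumptions for illustration only.

```python
import math


def centroid(points):
    """Unweighted centre of mass of a list of (x, y) positions."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def centroid_speed(members_prev, members_now, dt=1.0):
    """Speed of a domain's centre of mass between two rounds."""
    return math.dist(centroid(members_prev), centroid(members_now)) / dt


# Two nodes moving toward each other at 5 mps: the centre of mass stays put.
opposing = centroid_speed([(0.0, 0.0), (10.0, 0.0)], [(5.0, 0.0), (5.0, 0.0)])

# Two nodes moving in the same direction at 3 mps: the centre follows them.
aligned = centroid_speed([(0.0, 0.0), (4.0, 0.0)], [(3.0, 0.0), (7.0, 0.0)])
```

The first case hints at why the centre of mass can move much more slowly than individual nodes: uncorrelated node movements largely cancel out when averaged over a domain.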
