Modular Structures, Robustness and Protection of Complex Networks: Theory, Complexity and Algorithms


Stojan Trajanovski




Dissertation (Proefschrift)

for the purpose of obtaining the degree of doctor at the Technische Universiteit Delft, by the authority of the Rector Magnificus Prof. ir. K.C.A.M. Luyben, chair of the Board for Doctorates, to be defended publicly on 13 October 2014 at 12:30

by

Stojan TRAJANOVSKI

Master of Philosophy in Advanced Computer Science, University of Cambridge, England, United Kingdom, born in Skopje, Macedonia.

This dissertation has been approved by the promotor: Prof. dr. ir. P.F.A. Van Mieghem

Composition of the doctoral committee:
Rector Magnificus, chair
Prof. dr. ir. P. F.A. Van Mieghem, Technische Universiteit Delft, promotor
Prof. dr. K. Aardal, Technische Universiteit Delft
Prof. dr. ir. R. E. Kooij, Technische Universiteit Delft and TNO, Delft
Prof. dr. E. A. Cator, Radboud Universiteit Nijmegen, the Netherlands
Prof. dr. J. Crowcroft, University of Cambridge, United Kingdom
Prof. dr. E. Altman, INRIA, Sophia-Antipolis, France
Prof. dr. J. van Leeuwaarden, Technische Universiteit Eindhoven, the Netherlands
Prof. dr. ir. C. Witteveen, Technische Universiteit Delft, reserve member

Keywords: Modularity, Network Robustness, Complex Networks, Virus-Spread
Author's email: stojan.trajanovski@gmail.com
Author's web page: http://www.trajanovski.net
Cover design: Ivana Balamovska
Printed & Lay Out by: Proefschriftmaken.nl | Uitgeverij BOXPress
Published by: Uitgeverij BOXPress, 's-Hertogenbosch, the Netherlands

Copyright © 2014 by S. Trajanovski
All rights reserved. No part of the material protected by this copyright notice may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording or by any information storage and retrieval system, without the prior permission of the author.

ISBN 978-90-8891-972-5

An electronic version of this dissertation is available at http://repository.tudelft.nl/.

Typeset by the author with the LaTeX Documentation System.

To my family.


Thesis Summary

Modular Structures, Robustness and Protection of Complex Networks: Theory, Complexity and Algorithms

Community structure is observed in many real-world networks, such as (online) social networks, where the friends of a given person are often also friends of each other. Newman's modularity has been explored as an important quantitative metric for detecting communities and clusters in networks. We present new expressions and bounds for the modularity. These expressions reveal conditions for, and properties of, the maximum modularity of a network in both the topological and the spectral domain. Finding the maximum modularity of a given graph has been proven to be NP-complete and, therefore, several heuristic algorithms have been proposed in the past. We investigate the problem of finding the maximum modularity of classes of graphs that have the same number of links and/or nodes, and we determine analytical upper bounds. Moreover, from the set of all connected graphs with a fixed number of links and/or nodes, we construct graphs that can attain maximum modularity, named maximum modular graphs. The maximum modularity is shown to depend on the residue obtained when the number of links is divided by the number of partitions. The modularity depends on the chosen partitioning of the network into communities, which makes finding the partition that leads to the maximum modularity a hard problem. In this thesis, we prove that deciding whether a graph with a given number of links, number of communities and modularity exists is NP-complete. We subsequently propose a heuristic algorithm for generating random graphs with a given modularity and number of links but different topological properties. The generator can be used in the broad field of modeling and analyzing clustered social or organizational networks.
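Newman's modularity is used throughout this summary without being written out. As a reminder, for a partition of the nodes into communities it is defined as $Q = \frac{1}{2L}\sum_{i,j}\left(a_{ij} - \frac{d_i d_j}{2L}\right)\mathbf{1}\{c_i = c_j\}$. Grouping the terms by community gives $Q = \sum_{c}\left[\frac{L_c}{L} - \left(\frac{D_c}{2L}\right)^2\right]$, where $L_c$ is the number of links inside community $c$ and $D_c$ is the sum of the degrees of its nodes.

The minimal Python sketch below evaluates the second form. It assumes a simple undirected graph given as an edge list plus a node-to-community dictionary. The function name `modularity` is our own and is not taken from the thesis.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman's modularity of an undirected simple graph.

    edges:     list of (i, j) pairs, each undirected link listed once.
    community: dict mapping every node to its community label.
    Evaluates Q = sum_c [ L_c / L - (D_c / (2L))**2 ].
    """
    num_links = len(edges)
    internal = defaultdict(int)    # L_c: links with both end nodes in community c
    degree_sum = defaultdict(int)  # D_c: total degree of the nodes in community c
    for i, j in edges:
        degree_sum[community[i]] += 1
        degree_sum[community[j]] += 1
        if community[i] == community[j]:
            internal[community[i]] += 1
    return sum(
        internal[c] / num_links - (degree_sum[c] / (2 * num_links)) ** 2
        for c in degree_sum
    )
```

Consider two triangles joined by a single link and split into the two triangles. Here each community has $L_c = 3$ and $D_c = 7$, so $Q = 2(3/7 - 1/4) = 5/14 \approx 0.357$. Putting all nodes into one community gives $Q = 0$.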
We also propose a model that randomly generates simple graphs which are line graphs of other simple graphs, using an iterative merging procedure. These graphs possess interesting properties; for example, if the cliques all have the same size, the assortativities of the line graphs
in each step are close to 0, and the assortativities of the corresponding root graphs increase linearly from −1 to 0 with the steps of the nodal merging process.

Because of their importance to society, communication systems, represented as complex networks, should be built and operated to withstand failures. We define robustness as the maintenance of functionality under node or link removal, where functionality is measured by several graph metrics. We study the robustness of both static and time-varying networks under node removal, considering random node failures as well as targeted node attacks based on network centrality measures. In static networks, targeted and random failures have been studied in the literature. However, existing approaches tend to study random failure in terms of average-case behavior, which gives no idea of how badly network performance can degrade purely by chance. Instead of considering average network performance under random failure, we compute the probability density functions of network performance as functions of the fraction of nodes removed. We show that many centrality measures produce similar targeted attacks in both static and time-varying networks. We also show that a combination of degree centrality and eigenvector centrality may be enough to evaluate the worst-case behavior of static networks: even small subsets of highly connected nodes act as a bottleneck in the static or temporal information flow and become critical weak points of the entire system. Finally, we study the robustness envelope and targeted-attack responses of static networks that are rewired to have high and low degree assortativities. We find that moderate increases in assortativity confer more robustness against targeted attacks, whilst moderate decreases confer more robustness against random uniform attacks.
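The idea of replacing the average-case view of random failures by the full spread of outcomes can be illustrated with a small Python sketch. It rests on several illustrative assumptions, none of which are the exact metrics and procedures of the thesis:

- Functionality is measured by the fraction of nodes in the largest connected component (LCC).
- The random-failure envelope is summarized by empirical 5th and 95th percentiles over many uniformly random removal orders.
- The targeted attack removes nodes in decreasing order of initial degree.
- All function names are our own.

```python
import random


def largest_component_size(alive, adj):
    """Number of nodes in the largest connected component induced by `alive`."""
    remaining, best = set(alive), 0
    while remaining:
        stack = [remaining.pop()]
        size = 1
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v in remaining:  # neighbour still present and not yet visited
                    remaining.remove(v)
                    stack.append(v)
                    size += 1
        best = max(best, size)
    return best


def removal_curve(order, adj):
    """LCC fraction after removing the first k nodes of `order`, for k = 0..N."""
    n = len(order)
    return [largest_component_size(order[k:], adj) / n for k in range(n + 1)]


def random_envelope(adj, runs=200, low=0.05, high=0.95, seed=0):
    """For each number of removed nodes, the (low, high) empirical percentiles
    of the LCC fraction over `runs` uniformly random removal orders."""
    rng = random.Random(seed)
    nodes = list(adj)
    curves = []
    for _ in range(runs):
        order = nodes[:]
        rng.shuffle(order)
        curves.append(removal_curve(order, adj))
    envelope = []
    for k in range(len(nodes) + 1):
        values = sorted(curve[k] for curve in curves)
        envelope.append((values[int(low * (runs - 1))],
                         values[int(high * (runs - 1))]))
    return envelope


def degree_attack(adj):
    """Targeted attack: remove nodes in decreasing order of their initial degree."""
    order = sorted(adj, key=lambda u: len(adj[u]), reverse=True)
    return removal_curve(order, adj)
```

On a star graph with four leaves, `degree_attack` returns `[1.0, 0.2, 0.2, 0.2, 0.2, 0.0]`, because removing the hub isolates every leaf. By contrast, a random single removal hits a leaf with probability 4/5 and then leaves an LCC fraction of 0.8.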
In randomly generated time-varying networks, where all nodes have similar properties, we show that random errors and intelligent attacks exhibit similar behavior. However, cost considerations make network providers less inclined to take robustness measures against failures that are unlikely to manifest, such as several failures coinciding simultaneously. Considering networks embedded in a two-dimensional plane, we study the problem of finding a critical region. A critical region is a part of the network that can be enclosed by a given elementary figure of predetermined size and whose destruction would lead to the highest network disruption. We show that this problem is solvable in polynomial time and propose appropriate algorithms. In addition, we consider region-aware network augmentation to decrease the impact of a regional failure. We subsequently address the region-disjoint paths problem, which asks for two paths with minimum total weight between a source and a destination that cannot both be cut by a single regional failure of a given diameter (unless that failure includes the source and the destination). We prove that deciding whether region-disjoint paths exist is NP-hard and propose a heuristic region-disjoint paths algorithm. Defining an optimal protection strategy against viruses, spam propagation or any other kind of contamination process is important when designing new networks and architectures. The first approach we consider is network adaptation, which concerns the interplay between disease dynamics on a network and the topology dynamics. A continuous-time adaptive Susceptible-Infectious-Susceptible (ASIS)
model is introduced to investigate this interaction. In this model, a susceptible node avoids infection by breaking its links to its infected neighbors, while it strengthens its connections with other susceptible nodes by creating links to them. When the initial topology of the network is a complete graph, an exact solution for the average metastable-state fraction of infected nodes is derived without resorting to any mean-field approximation. We find that the epidemic threshold scales linearly with the effective link-breaking rate. The metastable-state topology shows high connectivity and low modularity in two cases: (i) a “strongly adaptive” region with a very high effective spreading rate, and (ii) a “weakly adaptive” region with a very low effective spreading rate. These two regions are separated from the other half-open, elliptical-like regions of low connectivity and high modularity in a contour-line-like way. Our results indicate that adapting the topology in response to disease dynamics suppresses the infection. At the same time, it drives the network towards a topology that exhibits assortative mixing, modular structure and a binomial-like degree distribution. In the second approach, we consider decentralized optimal protection strategies when a virus propagates over a network through an SIS epidemic process. Assuming that each node in the network can decide to protect itself from infection at a constant cost, we model the system using a game-theoretic framework. We find pure and mixed equilibria and the Price of Anarchy (PoA) in several network topologies, and we propose algorithms to compute a pure equilibrium.

Stojan Trajanovski


Samenvatting (Summary)

Modular structures, robustness and protection of complex networks: Theory, Complexity and Algorithms

Many networks are built up from group structures. An example is (online) social networks, in which the friends of a given person are usually also friends with each other. Newman's modularity is often used as an important quantitative measure for recognizing groups and clusters in networks. In this thesis, we formulate new expressions for the modularity and establish bounds on it. Via these new expressions, we derive conditions on and properties of the maximum modularity of a network in both the topological and the spectral domain. It has been proven that finding the maximum modularity of a given graph is NP-complete, which is why several heuristic algorithms have been developed in the past. We address the problem of finding the maximum modularity in classes of graphs with the same number of links and/or nodes, and we derive analytical bounds. Moreover, from the set of all connected graphs with a fixed number of links and/or nodes, we create graphs in which the modularity can be maximal: these graphs are maximum modular. We show that the maximum modularity depends on the remainder when the number of links is divided by the number of partitions. The modularity depends on how the network is divided into groups, which makes finding the partition that yields the maximum modularity a hard problem. In this thesis, we prove that deciding whether a graph with a given number of links, number of groups and modularity exists is NP-complete. We subsequently develop an algorithm to generate random graphs with a given modularity and number of links but with different topological properties.
Our algorithm is useful in the field concerned with modeling and analyzing clustered social or organizational networks. We also develop a model that can generate simple graphs which are line graphs of other simple graphs by iteratively merging nodes. These graphs have interesting properties. For example,
if all cliques are of equal size, the assortativity of the line graph is nearly 0 in every step, while the assortativity of the corresponding root graph increases linearly from −1 to 0 as nodes are merged.

Because communication systems are so important in our modern society, they must be designed to be robust against failures. We define robustness as the preservation of functionality when nodes or links are removed from the network. In this context, functionality is measured by several graph properties. We investigate the robustness of both static and time-varying networks under node removal, caused either by random failures or by targeted attacks on nodes based on how central they are. The consequences of random failures and targeted removal of nodes in static networks have received much attention in the literature. However, existing methods focus mainly on the average impact of node failures, without giving insight into how severe the consequences of random node failures can be. As an alternative to computing the average impact of random node failures, we compute the probability distribution of the impact as a function of the percentage of removed nodes. We show that many methods for determining how central a node is lead to the same attacks on both static and time-varying networks. We also show that a combination of degree centrality and eigenvector centrality could be enough to evaluate the worst-case behavior of static networks. Even a small number of highly connected nodes can be enough to form a bottleneck in the static or time-dependent information flow, and thereby a critical weak point of the whole system.
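The line-graph statement at the start of this passage compares degree assortativities of line graphs and their root graphs. The Python sketch below does not reproduce the thesis's random merging model. It only shows, using our own function names, two building blocks:

- how the line graph of a root graph is formed (one node per root link; two such nodes are adjacent when the root links share an end node);
- how the degree assortativity is computed (the Pearson correlation of the degrees at the two ends of a link).

```python
from itertools import combinations


def line_graph(edges):
    """Line graph of a simple undirected graph given as an edge list: each root
    link becomes a node, and two such nodes are adjacent when the two root
    links share an end node."""
    links = [tuple(sorted(e)) for e in edges]
    return [(a, b) for a, b in combinations(links, 2) if set(a) & set(b)]


def degree_assortativity(edges):
    """Pearson correlation of the degrees at the two ends of each link,
    counting every undirected link in both orientations."""
    degree = {}
    for i, j in edges:
        degree[i] = degree.get(i, 0) + 1
        degree[j] = degree.get(j, 0) + 1
    x = [degree[i] for i, j in edges] + [degree[j] for i, j in edges]
    y = [degree[j] for i, j in edges] + [degree[i] for i, j in edges]
    mean = sum(x) / len(x)  # x and y hold the same values, so they share the mean
    cov = sum((a - mean) * (b - mean) for a, b in zip(x, y))
    var = sum((a - mean) ** 2 for a in x)
    return cov / var  # undefined (ZeroDivisionError) for regular graphs
```

Take the path 0-1-2-3, whose line graph is the path on three nodes. Up to floating-point rounding, the sketch returns −0.5 for the root graph and −1.0 for the line graph.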
We also study the robustness envelope and the response to attacks of static networks that are rewired to have a high or a low degree assortativity. We found that a small increase in assortativity makes networks more robust against targeted attacks, while a small decrease in assortativity makes networks more robust against random node failures. We demonstrate that in randomly generated time-varying networks in which all nodes have the same properties, random node failures have the same effect as targeted attacks on nodes. However, for cost reasons, network providers are not inclined to protect their networks against unlikely situations such as several simultaneous failures. In networks embedded in a two-dimensional plane, we have studied the problem of finding a critical region. A critical region is a part of the network that can be enclosed within a given simple shape of a given size and whose failure has the maximum impact on the functioning of the network. With the algorithm we formulated, a critical region can be found in polynomial time. We also treat region-aware improvement of a network to reduce the impact of a regional failure. Furthermore, we consider the region-disjoint paths problem:
finding two paths with minimum total weight between a source and a destination that cannot both be cut by a single regional failure of a given diameter (unless the failure covers both the source and the destination). We prove that deciding whether region-disjoint paths exist is NP-hard, and we give a heuristic region-disjoint paths algorithm. In the design of networks and architectures, it is important to provide protection against viruses, spam propagation and other contamination processes. A first approach involves network adaptation. We introduce a continuous-time adaptive Susceptible-Infectious-Susceptible (ASIS) model that describes the behavior of both the topology and the state of the nodes. In this model, healthy nodes break their links with infected nodes to avoid infection and instead make connections with other healthy nodes. We derive an exact solution for the metastable fraction of infected nodes when the initial topology is a complete graph, without using mean-field theory. The epidemic threshold scales linearly with the effective link-breaking rate. The metastable topology is highly connected and has a low modularity in two cases: (i) in the “strongly adaptive” region with a high effective spreading rate, and (ii) in the “weakly adaptive” region with a very low effective spreading rate. These two regions are separated from the other half-open elliptical regions of weak connectivity and high modularity in a contour-like way. Our results show that when the topology adapts to the dynamics of the infection, the infection is suppressed. The network then evolves into a modular topology in which nodes mix assortatively and which has a binomial-like degree distribution.
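The ASIS dynamics described above can be explored numerically with a Gillespie (continuous-time Markov chain) simulation. The Python sketch below is only an illustration under the following assumptions:

- The initial topology is the complete graph $K_n$.
- An infected node infects each susceptible neighbor at rate beta and is cured at rate delta.
- Each susceptible-infected link is broken at rate zeta.
- Each absent link between two susceptible nodes is created at rate xi.

The function and parameter names are hypothetical. The thesis obtains its metastable-state results analytically, not with this code.

```python
import random
from itertools import combinations


def simulate_asis(n, beta, delta, zeta, xi, t_end, initially_infected=1, seed=0):
    """Gillespie simulation of an adaptive SIS process that starts on K_n.

    Assumed event rates (a hypothetical reading of the ASIS description):
      infection  beta  per susceptible-infected (S-I) link,
      curing     delta per infected node,
      breaking   zeta  per S-I link (the susceptible end drops the link),
      creation   xi    per absent link between two susceptible nodes.
    Returns (time-averaged infected fraction over [0, t_end],
             number of infected nodes at the end, number of links at the end).
    """
    rng = random.Random(seed)
    pairs = list(combinations(range(n), 2))
    links = set(pairs)  # initial topology: the complete graph K_n
    infected = set(range(initially_infected))
    t, area = 0.0, 0.0
    while t < t_end and infected:
        si = [e for e in links if (e[0] in infected) != (e[1] in infected)]
        ss_absent = [e for e in pairs if e not in links
                     and e[0] not in infected and e[1] not in infected]
        rates = [beta * len(si), delta * len(infected),
                 zeta * len(si), xi * len(ss_absent)]
        total = sum(rates)
        if total == 0:  # no event can occur any more
            area += len(infected) * (t_end - t)
            break
        dt = min(rng.expovariate(total), t_end - t)
        area += len(infected) * dt  # integrate the number of infected over time
        t += dt
        if t >= t_end:
            break
        event = rng.choices(range(4), weights=rates)[0]
        if event == 0:  # infection across a random S-I link
            a, b = rng.choice(si)
            infected.add(b if a in infected else a)
        elif event == 1:  # a random infected node is cured
            infected.remove(rng.choice(sorted(infected)))
        elif event == 2:  # a random S-I link is broken
            links.remove(rng.choice(si))
        else:  # a random absent S-S link is created
            links.add(rng.choice(ss_absent))
    return area / (n * t_end), len(infected), len(links)
```

The returned fraction is averaged over the whole interval [0, t_end], including the initial transient. A long t_end and several seeds are therefore needed before it approximates a metastable-state value.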
A second method to protect a network against an SIS infection relies on an optimal decentralized protection strategy. We model the system using game theory, under the assumption that each node in the network can protect itself at a constant cost. We find equilibria and the Price of Anarchy (PoA) for various graphs, and we give algorithms to compute a pure equilibrium.

Stojan Trajanovski


Contents

Thesis Summary
Samenvatting
1 Introduction
  1.1 Basic Graph Theory notation
  1.2 Research Objectives and Challenges
  1.3 Thesis Outline
    1.3.1 Modular Structures in Networks
    1.3.2 Robustness, Design and Protection of Networks
I Modular Structures in Networks
2 Topological and spectral properties of network metrics
  2.1 Modularity
    2.1.1 New topological form and bounds for the modularity
    2.1.2 Spectral form for the modularity
    2.1.3 Alternative forms for the modularity
    2.1.4 Relations with the assortativity and other spectral properties
  2.2 Fiedler's Clustering
    2.2.1 m-dimensional lattice graphs
3 Maximum modular networks
  3.1 Maximum modular graphs for given numbers of links L and communities c
    3.1.1 Graph modifications
    3.1.2 A maximum modular connected graph
  3.2 Maximum modular graph for given L, N and c
    3.2.1 The two communities case (c = 2)

4 Generating tunable random networks
  4.1 Generating graphs that approach a prescribed modularity: hardness and algorithms
    4.1.1 Complexity of modular graph generation
    4.1.2 Tunable modularity graph generator
    4.1.3 Properties of the obtained graphs
  4.2 Generating random line graphs and inverse line graph conversion
    4.2.1 Theoretical preliminaries
    4.2.2 A random line graph model
    4.2.3 The assortativity of line graph H and corresponding root graph during the merging process
II Robustness, Design and Protection of Networks
5 Framework for topological metrics evaluation
  5.1 Framework and robustness in static networks
    5.1.1 Envelope computation and comparison
    5.1.2 Network perturbations or challenges
    5.1.3 Robustness of random and real networks
    5.1.4 Similarity of node-centrality measures
    5.1.5 Robustness optimization by degree-preserving rewiring
  5.2 Robustness in temporal networks
    5.2.1 Temporal Robustness and attacking strategies
    5.2.2 Temporal Models
    5.2.3 Real temporal networks
6 Regions-aware network optimization
  6.1 Model and problems statements
  6.2 Finding critical regions and links mitigation
    6.2.1 Theoretical basis
    6.2.2 Polynomial-time algorithm for detecting critical regions
    6.2.3 Region-critical network augmentation
  6.3 Region-disjoint paths problem
    6.3.1 Complexity of the problem
    6.3.2 Heuristic region-disjoint paths algorithm
  6.4 Evaluation study
    6.4.1 Used data
    6.4.2 Effect of critical regions
    6.4.3 Evaluation of region-critical network augmentation
    6.4.4 Evaluation of region-disjoint paths algorithms

7 Virus-spread protection in networks
  7.1 The SIS epidemic model
  7.2 Virus-spread in adaptive networks
    7.2.1 Adaptive SIS model
    7.2.2 The steady-state infection in the adaptive ε-SIS model
    7.2.3 The metastable state in a complete graph K_N
  7.3 Decentralized SIS epidemics protection strategies: a game theoretic approach
    7.3.1 NIMFA approximation for the SIS epidemics
    7.3.2 Game model on a single community network
    7.3.3 Game model in bipartite network
    7.3.4 Game model in multi-communities network
    7.3.5 Numerical evaluation
8 Conclusions
  8.1 Modular structures in networks
  8.2 Assessment, optimization and design of robust networks
  8.3 Adaptive virus-spread and protection in networks
  8.4 Directions for Future Work
Bibliography
Publications by the author
  Relations to the thesis
Acknowledgements
Curriculum Vitae


Chapter 1

Introduction

Telecommunication, computer and transportation networks, as well as the Internet, power grids, and social and financial networks, are more important than ever in people's lives and in society. All these networks, commonly known as complex networks, are interconnected and operate together. Although they are structurally different, many complex networks share several interesting phenomena, such as power-law degree distributions and preferential attachment, the small-world property, and the existence of communities. Many real-world networks are composed of hierarchically connected communities. For instance, people in online social networks tend to connect with their friends, forming cohesive groups of schoolmates, colleagues and others with similar interests. Web pages covering the same topic tend to link to each other rather than to web pages covering different topics. We are aware of the existence of communities in many of these networks, but it is important to understand their internal structure, their interconnections, and how they operate together.

The study of real-world communication systems via complex network models has greatly expanded our understanding of how information flows. Due to their importance to society, communication systems should be built to withstand failures. Over the last decade, there have been many efforts to characterize a network and its “goodness” via a few indicators, or even a single number. Several terms have been proposed: reliability, resilience, safety, maintainability, dependability and degree-distribution entropy [13, 133, 219, 57]. However, the research community has not reached a consensus, owing to the lack of a common vocabulary and the differences in the performance metrics that have to be maintained. We define robustness as the maintenance of functionality under node or link removal. In this context, functionality is measured by several graph metrics.
However, cost considerations make network providers less inclined to take robustness measures against failures that are unlikely to manifest, such as several failures coinciding simultaneously in different geographic regions of their network.

The coupling between process and topology is natural in many cases. In epidemics, for example, after observing that a relative is infected, one may either avoid him or her (in which case the social contact network changes) or take medicine as protection against the virus (in which case the curing rate increases, without any coupling to the topology). In human brain networks, Hebbian learning alters the connectivity between brain regions that are trained or neurally excited. Ant routing [53] and self-restoring and self-adaptive networks are other examples. Self-adaptation occurs naturally in biology. Adaptive networks, in which the process interacts with the topology, are nevertheless difficult to analyze, and it is fair to say that we have only just started to understand the interplay between process and topology. Apart from self-adaptation, another important issue is defining an optimal protection strategy against viruses, spam propagation or any other kind of contamination while accounting for the cost of such protection. The theory behind complex networks is network science, a quickly growing interdisciplinary research field that relies on graph theory, spectral theory and dynamical systems theory.

1.1 Basic Graph Theory notation

We start with the basic definition of a graph and some related concepts, which are used throughout the thesis. A network is represented as a graph $G(\mathcal{N}, \mathcal{L})$ consisting of a set $\mathcal{N}$ of $N$ nodes and a set $\mathcal{L}$ of $L$ links. Each node $i \in \mathcal{N}$ represents an entity, such as a computer host or a server in a computer network, a person in a social network, or a city or a traffic hub in a transportation network.
Each link $(i, j) \in \mathcal{L}$, which connects two nodes $i$ and $j$ from $\mathcal{N}$, represents a connection between the two nodes. Examples are an optical-fiber connection in a wired network, a wireless connection in a wireless network, a friendship relation in a social network or a road in a transportation network. A graph can be weighted or unweighted. In a weighted graph, a link $(i, j)$ has a weight $w(i, j)$, which may represent the capacity or delay in a communication network, or the length (distance) in a transportation network. All link weights in an unweighted graph are equal to 1; therefore, an unweighted graph is a special case of a weighted graph. A graph can also be directed or undirected. If, for two nodes $i$ and $j$ from $\mathcal{N}$, there is a link only from $i$ to $j$ (but not from $j$ to $i$), then $G(\mathcal{N}, \mathcal{L})$ is a directed graph. If $(j, i) \in \mathcal{L}$ holds for each link $(i, j) \in \mathcal{L}$, all connections are bidirectional and $G(\mathcal{N}, \mathcal{L})$ is an undirected graph. Directed graphs are good models for client-server computer networks, social networks with follower relations (e.g., Twitter), or one-way transportation networks. A simple graph is an unweighted, undirected graph containing neither self-loops (links starting and ending at the same node) nor multiple links between the same pair of nodes. The adjacency matrix $A = \{a_{ij}\}_{i,j=1,2,\ldots,N}$ is the most common representation of a
graph. If there is a link from $i$ to $j$, i.e., $(i, j) \in \mathcal{L}$, then $a_{ij} = 1$; otherwise $a_{ij} = 0$. The adjacency matrix of an undirected graph is symmetric. The adjacency matrix of a simple graph is symmetric with zeros on the main diagonal ($a_{ii} = 0$). In an undirected graph, we denote by $d_i$ the degree of node $i$, which is the number of neighbors of $i$. It holds that

$$d_i = \sum_{j=1}^{N} a_{ij} \qquad \text{and} \qquad 2L = \sum_{i=1}^{N} d_i = \sum_{i=1}^{N} \sum_{j=1}^{N} a_{ij}.$$

In a directed graph, we distinguish between the in-degree and the out-degree of node $i$. The incidence matrix and adjacency lists are other representations of a graph. More specific extensions of a graph, for example, time-varying (or temporal) graphs and graphs embedded in a plane, are described in Chapters 5 and 6, respectively.

1.2 Research Objectives and Challenges

This thesis addresses several fundamental questions in complex networks, ranging from structures in networks to network robustness assessment and processes on networks. Some of the research challenges considered in this thesis are the following:
• What do the graphs that attain the maximum modularity look like? How does placing a link within a community contribute to the modularity, and how does placing a link that connects two communities contribute?
• What is the computational hardness of constructing a graph that attains a prescribed modularity?
• How do different realizations of random and targeted attacks differ? What is the gap in robustness between removing the highest-centrality and the lowest-centrality nodes? Is there any redundancy among the network centrality rankings?
• What is the effect of degree-preserving rewiring on network robustness?
• How does the computational hardness of the problems change when the geometry of the network and the embedding of its nodes are introduced? Does the geometry make robustness problems “easier”? Is it possible to propose approximation algorithms for the NP-hard problems?
• What does the initial network finally look like after network adaptation, in terms of topology and node susceptibility?

1.3 Thesis Outline

The remainder of the thesis is organized as follows. The thesis is divided into two parts. The first part deals with common structures in networks, in particular the existence of modular structures. The second part aims to evaluate network performance under failures. In the second part, the interplay between some
dynamic processes in a network, such as virus spread, and the topology dynamics is also studied, with the aim of ensuring network protection. The thesis structure is visualized in Figure 1.1.

[Figure 1.1: Thesis structure. Chapter 1, Introduction. Part I, Modular Structures in Networks: Chapter 2, Topological and spectral properties of network metrics; Chapter 3, Maximum modular networks; Chapter 4, Generating tunable random networks. Part II, Assessment, Design and Processes in Networks: Chapter 5, Framework for topological metrics evaluation; Chapter 6, Regions-aware network optimization; Chapter 7, Virus-spread in adaptive networks. Chapter 8, Conclusions.]

1.3.1 Modular Structures in Networks

In Chapter 2, we consider the existence of communities in networks. We adopt Newman's modularity, which is considered an important quantitative metric for detecting communities and clusters in networks. Several related metrics, such as the assortativity, are also considered. In this chapter, we derive several new expressions and bounds for the modularity, in both spectral and topological forms. These expressions and bounds offer an opportunity to find the structure of maximum modular graphs. We also study several other metrics, such as Fiedler's partitioning of the non-trivial m-dimensional lattice, which is another way to characterize the existence of communities in a network. In addition, we examine the relation of the modularity to other metrics such as the hop-count and the assortativity.

Based on the bounds determined in Chapter 2, the maximum modular graphs, namely the graphs that attain the maximum modularity under certain conditions,
(25) 1.3. THESIS OUTLINE. 5. are considered in Chapter 3. Here, we start with an analyzes of the effect of link rewiring on the modularity of a graph. We proceed with finding the class of maximum modular graphs with given numbers of links and communities. The modularity is determined by the numbers of links within communities and the number of links that link communities. We show that the residue obtained when the number of links is divided by the number of partitions plays a special role in finding the maximum modular graphs. The complexity of generating graphs with a given modularity is discussed in Chapter 4. We prove that deciding whether a graph, with a given modularity, number of links, and a partitioning into two communities, exists, is NP-complete. Subsequently, based on the link the effect of link rewiring, which was discussed in Chapter 3, we propose a heuristic algorithm for generating network structures with a given modularity and number of communities. Due to the nice properties of the line graphs and the relation with their original root graphs, in this chapter, we also consider other types of graphs that approximately approach a prescribed negative assortativity by an iterative merging procedure.. 1.3.2. Robustness, Design and Protection of Networks. Our robustness envelope metrics and a framework for robustness assessment are presented in Chapter 5. In particular, metric envelopes of random networks as well as real-world networks, both static and time-varying, are studied in this chapter. Here, we consider the extent to which different targeted attack strategies overlap. The changes of the network envelopes under degree-preserving rewiring are also explored. Due to the importance of the nodes location in a network as well as the geographical nature of network failures, in Chapter 6, we study non-trivial robustness optimization problems that are aware of a network and failures geometry. 
We first show that finding the most vulnerable location for a predefined failure shape is polynomially solvable, and subsequently propose an appropriate algorithm. On the other hand, we prove that the region-disjoint paths problem is NP-hard, even in its decision variant. That variant asks whether two paths exist between a dedicated source and destination such that all intermediate nodes of the first path are at least a given distance away from all intermediate nodes of the second path. The existence of such paths implies that no single failure with a diameter of at most that distance can destroy both paths, unless that failure affects the source or the destination. In Chapter 7, we study how to protect a network against virus spread, either by network adaptation or by defining an optimal protection strategy. We start by studying the interplay between the topology and the virus-spread dynamics. Subsequently, we study a continuous-time adaptive SIS model (ASIS model) and provide theoretical results for various metrics of interest (e.g., the fraction of infected nodes, the epidemic threshold, and the modularity), which are also verified by simulations. The properties of the final topology in the meta-stable state are also studied. Finally, we consider finding optimal decentralized protection strategies from a game-theoretic perspective. Here, we find pure and mixed equilibria in some network topologies and propose a fast iterative algorithm to compute the Nash equilibrium. The main conclusions and directions for future work are given in Chapter 8.

Part I. Modular Structures in Networks


Chapter 2. Topological and spectral properties of network metrics

Graph communities reveal important structural features of the topology. Communities can be observed as sections of the graph topology that exhibit relatively dense connections within the regions and sparser connections between the regions. Such structure plays a significant role both in the sorting of nodes and in the evolution of processes on graphs, including slowing down spreading epidemics and containing cascading failures [34, 77, 164]. There is no commonly accepted definition of a community that applies to all graph-based systems. Nevertheless, a metric known as modularity has led to a surge of research in community structure discovery and analysis [160]. Modularity is a quantitative measure for detecting the presence of communities. It can aid network exploration by dividing the analysis of interactions into inter-community and intra-community analyses. A thorough survey of community definitions, community detection algorithms, the modularity and its close variants is given by Fortunato [64]. A significant fraction of this research focuses on ways to assign nodes to communities so as to maximize the modularity [106]. Other work has characterized the modularity metric itself. It has investigated the counterintuitive, non-trivial expected values of the modularity for random graph models and lattices [84], as well as its upper bound and partitioning resolution [50, 177, 65, 110, 10, 78]. Related to the modularity, Delvenne et al. [52] proposed a measure for the quality of communities and found that “balanced” communities lead to a high modularity. In contrast to that work, we rewrite the modularity in a form in which the role of “balanced” communities becomes directly visible. Finally, Fortunato and Barthélemy [65] determined an upper bound that depends only on the number of links between the communities, to order $O(L^{-1})$. We extend this result by deducing a tighter upper bound with an additional dependence on the cumulative degree differences, to order $O(L^{-2})$.

The modularity can also be characterized by examining how it relates to other significant graph metrics. Introduced only shortly before the modularity, the assortativity is a correlation of the similarities of nodes sharing a link [151, 152]. Newman suggested that one driving factor in the formation of communities is the preference of nodes to connect to other nodes with characteristics similar to their own. This has been observed in some social networks where the similarities are race related [159]. Within topological analysis, assortativity is most commonly used with the node degrees or node strengths. Previous work has shown relations between the assortativity of a graph and the characteristic path length, the fraction of nodes in the giant component, the clustering coefficient, the robustness, the spectra of the adjacency matrix, and the modularity [234, 23, 90, 95].

2.1. Modularity

The modularity $m$, proposed by Newman [160], measures the quality of a particular division of the network and is defined in [156] as

$$ m = \frac{1}{2L} \sum_{i=1}^{N} \sum_{j=1}^{N} \left( a_{ij} - \frac{d_i d_j}{2L} \right) 1_{\{i \text{ and } j \text{ belong to the same community}\}} \tag{2.1} $$

where $a_{ij}$ is the element of the adjacency matrix $A$ of the graph with $N$ nodes and $L$ links, and $d_j$ is the degree of node $j$. The modularity is proportional to the number of links falling within clusters or groups, minus the expected number in an equivalent network with links placed at random. If links are placed at random, the expected number of links between node $i$ and node $j$ equals $\frac{d_i d_j}{2L}$. Thus, if the number of links within a group is no better than random, the modularity is zero. A modularity approaching one reflects networks with strong community structure: a dense intra-group and a sparse inter-group connection pattern.

2.1.1. New topological form and bounds for the modularity

The general definition (2.1) is first rewritten as follows. We transform the nodal representation to a counting over links $l = i \sim j$ such that

$$ \sum_{i=1}^{N}\sum_{j=1}^{N} a_{ij}\, 1_{\{i \text{ and } j \text{ belong to the same cluster}\}} = 2\sum_{k=1}^{c} L_k $$

where $L_k$ is the number of links of cluster $C_k$. The factor 2 arises because, by the symmetry $A = A^T$ of the adjacency matrix, every link is counted twice. Let $L_{\text{inter}}$ denote the number of inter-community links, i.e. the number of links that are cut by partitioning the network into $c$ communities or clusters. Then

$$ L = \sum_{k=1}^{c} L_k + L_{\text{inter}} $$

Similarly,

$$ \sum_{i=1}^{N}\sum_{j=1}^{N} d_i d_j\, 1_{\{i \text{ and } j \text{ belong to the same cluster}\}} = \sum_{k=1}^{c} \Big( \sum_{i\in C_k} d_i \Big)\Big( \sum_{j\in C_k} d_j \Big) = \sum_{k=1}^{c} D_{C_k}^2 $$

where $D_{C_k} = \sum_{i\in C_k} d_i$ is the sum of the degrees of all nodes that belong to cluster $C_k$. Clearly, $D_{C_k} \ge 2L_k$, because some nodes in cluster $C_k$ may possess links connected to nodes in other clusters. The basic law of the degree then shows that $\sum_{k=1}^{c} D_{C_k} = 2L$. Substituting these expressions in the definition (2.1) leads to an alternative expression¹ for the modularity

$$ m = \sum_{k=1}^{c} \left( \frac{L_k}{L} - \left( \frac{D_{C_k}}{2L} \right)^2 \right) \tag{2.2} $$

The fraction under the square in (2.2) can also be written as

$$ \frac{D_{C_k}}{2L} = \frac{n_k}{N}\, \frac{E[D_k]}{E[D]} \tag{2.3} $$

where $n_k$ and $D_k$ are the number of nodes and the degree of a random node in cluster $C_k$, respectively. According to (2.2) and (2.3), two quantities determine the modularity. The first is the fraction of internal links in the clusters, $\frac{L_k}{L}$. The second is the square of the fraction of nodes per cluster, $\frac{n_k}{N}$, multiplied by the ratio $\frac{E[D_k]}{E[D]}$ of the average degree in a cluster over the average degree in the network. Invoking the Cauchy identity (see [212]) and $\sum_{k=1}^{c} D_{C_k} = 2L$,

$$ \sum_{k=1}^{c} D_{C_k}^2 = \frac{(2L)^2}{c} + \frac{1}{c}\sum_{j=2}^{c}\sum_{k=1}^{j-1} \left( D_{C_j} - D_{C_k} \right)^2 $$

results in another expression for the modularity

$$ m = 1 - \frac{L_{\text{inter}}}{L} - \frac{1}{c} - \frac{1}{c}\sum_{j=2}^{c}\sum_{k=1}^{j-1} \left( \frac{D_{C_j} - D_{C_k}}{2L} \right)^2 \tag{2.4} $$

¹ Newman [157] presents still another expression for the modularity.
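A small numerical check makes the equivalence of (2.1), (2.2) and (2.4) concrete. The following minimal Python sketch evaluates all three expressions for a given partition. It assumes a simple undirected graph given as an edge list over the nodes 0, ..., N−1. It also assumes a list `part` that assigns the community labels 0, ..., c−1, each label being used at least once. The function names and the two-triangle example are illustrative only.

```python
from itertools import combinations


def degrees(n, edges):
    """Degree vector d of a simple undirected graph given as an edge list."""
    d = [0] * n
    for i, j in edges:
        d[i] += 1
        d[j] += 1
    return d


def modularity_definition(n, edges, part):
    """Definition (2.1): double sum over all node pairs in the same community."""
    L = len(edges)
    d = degrees(n, edges)
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1
    total = sum(a[i][j] - d[i] * d[j] / (2 * L)
                for i in range(n) for j in range(n) if part[i] == part[j])
    return total / (2 * L)


def community_sums(n, edges, part):
    """Return the internal link counts L_k, the degree sums D_Ck and L_inter."""
    c = max(part) + 1
    d = degrees(n, edges)
    L_k, D_C = [0] * c, [0] * c
    for i, j in edges:
        if part[i] == part[j]:
            L_k[part[i]] += 1
    for i in range(n):
        D_C[part[i]] += d[i]
    return L_k, D_C, len(edges) - sum(L_k)


def modularity_links(n, edges, part):
    """Topological form (2.2)."""
    L = len(edges)
    L_k, D_C, _ = community_sums(n, edges, part)
    return sum(lk / L - (dc / (2 * L)) ** 2 for lk, dc in zip(L_k, D_C))


def modularity_balance(n, edges, part):
    """Form (2.4): the bound 1 - 1/c - L_inter/L minus the degree-sum imbalance."""
    L = len(edges)
    _, D_C, L_inter = community_sums(n, edges, part)
    c = len(D_C)
    imbalance = sum(((D_C[j] - D_C[k]) / (2 * L)) ** 2
                    for j, k in combinations(range(c), 2))
    return 1 - L_inter / L - 1 / c - imbalance / c


# Two triangles joined by the bridge (2, 3), split into the two triangles.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
part = [0, 0, 0, 1, 1, 1]
for f in (modularity_definition, modularity_links, modularity_balance):
    print(f.__name__, f(6, edges, part))
```

Up to floating-point rounding, all three values equal 5/14 ≈ 0.357 for this example. Both triangles have the degree sum 7, so the imbalance term of (2.4) vanishes.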

Since the double sum is always non-negative, (2.4) provides us with an upper bound for the modularity,

$$ m \le 1 - \frac{1}{c} - \frac{L_{\text{inter}}}{L} \tag{2.5} $$

The upper bound (2.5) is attained only if all clusters have the same degree sum. In passing, we mention that (2.4) rigorously proves the upper bound derived by Fortunato [65]. That bound is based on a cyclic chain of identical subgraphs, for which, indeed, $D_{C_j} = D_{C_k}$ for each pair $(j,k)$. In addition, the upper bound (2.5) shows that $m \le 1$. A modularity of 1 is only reached asymptotically, when the number of clusters $c \to \infty$ and $L_{\text{inter}} = o(L)$. The latter condition means that the fraction of inter-community links over the total number of links $L$ is vanishingly small for large graphs ($N \to \infty$ and $L \to \infty$).

Regarding the lower bound, Brandes et al. [22] proved that the modularity of any graph is never smaller than $-\frac{1}{2}$. We provide another lower bound in terms of $L$ and $c$. For $c > 1$ and $D_{C_k} \ge 1$ for all $k$, using $\sum_{k=1}^{c} D_{C_k} = 2L$, we have

$$ \sum_{k=1}^{c} D_{C_k}^2 = \sum_{k=1}^{c} \left( (D_{C_k} - 1) + 1 \right)^2 = \sum_{k=1}^{c} (D_{C_k} - 1)^2 + 2\sum_{k=1}^{c} (D_{C_k} - 1) + \sum_{k=1}^{c} 1 = \sum_{k=1}^{c} (D_{C_k} - 1)^2 + 4L - c \tag{2.6} $$

The inequality $\sum_{k=1}^{c} x_k^2 \le \left( \sum_{k=1}^{c} x_k \right)^2$ holds for all $x_k \ge 0$, $k = 1, 2, \ldots, c$. It reduces to an equality if at most one of the $x_k$ is strictly positive. Using this inequality for $x_k = D_{C_k} - 1$ in (2.6) yields

$$ \sum_{k=1}^{c} D_{C_k}^2 \le \Big( \sum_{k=1}^{c} (D_{C_k} - 1) \Big)^2 + 4L - c = (2L - c)^2 + 4L - c \tag{2.7} $$

Applying (2.7) to (2.2) gives

$$ m \ge \sum_{k=1}^{c} \frac{L_k}{L} - \frac{(2L - c)^2 + 4L - c}{4L^2} = 1 - \frac{L_{\text{inter}}}{L} - \frac{(2L - c)^2 + 4L - c}{4L^2} = 1 - \frac{1}{c} - \frac{L_{\text{inter}}}{L} - \left(1 - \frac{1}{c}\right)\left(1 - \frac{c}{2L}\right)^2 \tag{2.8} $$

which is yet another lower bound of the modularity. Interestingly, the upper bound (2.5) and the lower bound (2.8) only differ in the last term of (2.8). Equality in (2.8) holds if and only if all but one of the $D_{C_k}$ are equal to 1. Using the fact that $L_{\text{inter}} \le L$, the bound in (2.8) can be further simplified to a (weaker) bound that no longer depends on $L_{\text{inter}}$,

$$ m \ge -\frac{1}{c} - \left(1 - \frac{1}{c}\right)\left(1 - \frac{c}{2L}\right)^2 \tag{2.9} $$
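As a quick sanity check of (2.5), (2.8) and (2.9), the following sketch compares them with the exact modularity on many small random graphs. It keeps the assumptions of the derivation: a simple graph, $c > 1$ communities and $D_{C_k} \ge 1$ for every $k$, so cases with an empty degree sum are skipped. The random test graphs, the seed and the helper names are arbitrary choices made for illustration.

```python
import random


def modularity_and_bounds(n, edges, part):
    """Return (bound (2.9), bound (2.8), m from (2.2), bound (2.5)), or None if some D_Ck = 0."""
    L, c = len(edges), max(part) + 1
    d, L_k, D_C = [0] * n, [0] * c, [0] * c
    for i, j in edges:
        d[i] += 1
        d[j] += 1
        if part[i] == part[j]:
            L_k[part[i]] += 1
    for i in range(n):
        D_C[part[i]] += d[i]
    if min(D_C) == 0:                          # the derivation of (2.8) assumes D_Ck >= 1
        return None
    m = sum(lk / L - (dc / (2 * L)) ** 2 for lk, dc in zip(L_k, D_C))
    upper = 1 - 1 / c - (L - sum(L_k)) / L      # (2.5)
    gap = (1 - 1 / c) * (1 - c / (2 * L)) ** 2  # last term of (2.8) and (2.9)
    return -1 / c - gap, upper - gap, m, upper


def count_violations(trials=500, seed=1, eps=1e-12):
    """Check lower (2.9) <= lower (2.8) <= m <= upper (2.5) on random small graphs."""
    rng = random.Random(seed)
    checked = violations = 0
    for _ in range(trials):
        n = rng.randint(4, 12)
        pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
        edges = rng.sample(pairs, rng.randint(1, len(pairs)))
        c = rng.randint(2, n // 2)
        part = [k % c for k in range(n)]       # every label 0..c-1 is used
        rng.shuffle(part)
        res = modularity_and_bounds(n, edges, part)
        if res is None:
            continue
        checked += 1
        lo9, lo8, m, up = res
        if not (lo9 <= lo8 + eps and lo8 <= m + eps and m <= up + eps):
            violations += 1
    return checked, violations


print(count_violations())
```

The second number printed is the count of violated inequalities. The two-triangle graph split into its triangles attains the upper bound (2.5), because both clusters have the same degree sum.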

Equality in (2.9) holds if and only if all but one of the $D_{C_k}$ are equal to 1 and $L_{\text{inter}} = L$, i.e. $L_k = 0$ for $k = 1, 2, \ldots, c$. Hence, $c - 1$ communities have degree sum 1, and thus each has exactly one inter-community link. This is possible for a star graph in which each node forms a single community. In that case $L = N - 1 = c - 1$, and the bound in (2.9) reduces to $m \ge -\frac{1}{c} - \left(1 - \frac{1}{c}\right)\left(1 - \frac{c}{2(c-1)}\right)^2 = -\frac{1}{4\left(1 - \frac{1}{c}\right)} \ge -\frac{1}{2}$ for $c \ge 2$, in accordance with the bound in [22].
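Rational arithmetic confirms this equality case exactly. The sketch below assumes a star $K_{1,N-1}$ in which every node is its own community, so that $c = N$ and $L = c - 1$. It evaluates (2.2) and the bound (2.9) with Python's standard `fractions.Fraction`. The function names are illustrative.

```python
from fractions import Fraction


def star_singleton_modularity(N):
    """(2.2) for a star on N nodes in which every node is its own community.

    All L_k vanish, so m = -sum_k (D_Ck / 2L)^2, where D_Ck is simply the node degree.
    """
    L = N - 1
    node_degrees = [N - 1] + [1] * (N - 1)      # the hub first, then the N - 1 leaves
    return -sum(Fraction(dk, 2 * L) ** 2 for dk in node_degrees)


def bound_29(c, L):
    """Right-hand side of the lower bound (2.9)."""
    return -Fraction(1, c) - (1 - Fraction(1, c)) * (1 - Fraction(c, 2 * L)) ** 2


for N in (2, 3, 5, 10):
    print(N, star_singleton_modularity(N), bound_29(N, N - 1), Fraction(-N, 4 * (N - 1)))
```

Each printed line repeats one and the same fraction three times; for example, the line for N = 5 reads `5 -5/16 -5/16 -5/16`. The value approaches −1/4 for large stars and equals −1/2 only for c = 2.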


Let $D_{\Delta C} = \max_{\{C_j, C_k\}} \left| D_{C_j} - D_{C_k} \right|$. A lower bound of the modularity, deduced from the expression (2.4), is then

$$ m \ge 1 - \frac{L_{\text{inter}}}{L} - \frac{1}{c} - \frac{c-1}{2}\left( \frac{D_{\Delta C}}{2L} \right)^2 \tag{2.10} $$

The lower bound (2.10) equals the upper bound (2.5), so that equality can occur, only if $D_{\Delta C} = 0$. Excluding the case $D_{\Delta C} = 0$, not all $D_{C_j}$ are equal. We may then assume an ordering $D_{C_1} \ge D_{C_2} \ge \ldots \ge D_{C_c}$ with at least one strict inequality. We now show that, for $c > 2$, the differences $|D_{C_j} - D_{C_k}|$ cannot all be equal to $D_{\Delta C}$. Assume the contrary, $D_{C_1} - D_{C_2} = D_{C_2} - D_{C_3} = D_{C_1} - D_{C_3} = D_{\Delta C} > 0$. Then $D_{\Delta C} = D_{C_1} - D_{C_3} = (D_{C_1} - D_{C_2}) + (D_{C_2} - D_{C_3}) = 2D_{\Delta C}$, which cannot hold for $D_{\Delta C} > 0$. Hence, if $D_{\Delta C} > 0$ and $c > 2$, the inequality in (2.10) is strict; in other words, the lower bound (2.10) is not attainable in that case.

We now focus on non-negative modularity, which reflects the case of physically more distinguishable communities. We require the lower bound (2.10) to be non-negative. This supplies an upper bound for the maximum difference $D_{\Delta C}$ in the nodal degree sum between two clusters in a “modular” graph:

$$ D_{\Delta C} \le 2L \sqrt{ \frac{2}{c-1} \left( 1 - \frac{L_{\text{inter}}}{L} - \frac{1}{c} \right) } \tag{2.11} $$

For $c > 1$, (2.11) demonstrates that $D_{\Delta C} < 2L$. To examine closely the properties of graphs that exhibit modular structure, we want the modularity to be as high as possible. Ignoring the integer nature of $c$, the lower bound (2.10) is maximized with respect to the number of communities $c$ at²

$$ c^* = \frac{\sqrt{2}\; 2L}{D_{\Delta C}} > \sqrt{2} \tag{2.12} $$

This results in an improved lower bound that is independent of $c$,

$$ m \ge 1 - \frac{L_{\text{inter}}}{L} - \sqrt{2}\,\frac{D_{\Delta C}}{2L} + \frac{1}{2}\left( \frac{D_{\Delta C}}{2L} \right)^2 \tag{2.13} $$

² The first derivative with respect to $c$ of the right-hand side of (2.10) is $\frac{d}{dc}\left( -\frac{L_{\text{inter}}}{L} - \frac{1}{c} - \frac{c-1}{2}\left(\frac{D_{\Delta C}}{2L}\right)^2 \right) = \frac{1}{c^2} - \frac{1}{2}\left( \frac{D_{\Delta C}}{2L} \right)^2$, which is set equal to zero. The second derivative is $-\frac{2}{c^3} < 0$, hence the stationary point is a maximum.
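The optimization over $c$ can be checked numerically. The sketch below evaluates the right-hand side of (2.10) at integer values of $c$ and compares the result with the continuous optimum (2.12) and the resulting bound (2.13). The values L = 1000, L_inter = 100 and D_ΔC = 50 are arbitrary inputs chosen for illustration; they are not data from the thesis.

```python
import math


def bound_210(c, L, L_inter, D_delta):
    """Right-hand side of the lower bound (2.10) for c communities."""
    x = D_delta / (2 * L)
    return 1 - L_inter / L - 1 / c - (c - 1) / 2 * x ** 2


def c_star(L, D_delta):
    """Stationary point (2.12) of (2.10) when c is treated as a real number."""
    return math.sqrt(2) * 2 * L / D_delta


def bound_213(L, L_inter, D_delta):
    """Lower bound (2.13), i.e. (2.10) evaluated at c = c*."""
    x = D_delta / (2 * L)
    return 1 - L_inter / L - math.sqrt(2) * x + x ** 2 / 2


L, L_inter, D_delta = 1000, 100, 50
cs = c_star(L, D_delta)
best_c = max(range(2, 1000), key=lambda c: bound_210(c, L, L_inter, D_delta))
print(cs, best_c, bound_210(best_c, L, L_inter, D_delta), bound_213(L, L_inter, D_delta))
```

The right-hand side of (2.10) is concave in $c$, so the best integer lies next to $c^* \approx 56.6$. Its value can only be slightly below the continuous optimum (2.13).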

The right-hand side of the lower bound (2.13) can be rewritten as $\frac{1}{2}\left( \frac{D_{\Delta C}}{2L} - \sqrt{2}\left(1 + \sqrt{\frac{L_{\text{inter}}}{L}}\right) \right)\left( \frac{D_{\Delta C}}{2L} - \sqrt{2}\left(1 - \sqrt{\frac{L_{\text{inter}}}{L}}\right) \right)$. We have $\frac{D_{\Delta C}}{2L} < 1 < \sqrt{2}\left(1 + \sqrt{\frac{L_{\text{inter}}}{L}}\right)$. Therefore, the lower bound in (2.13) is positive only if $\frac{D_{\Delta C}}{2L} < \sqrt{2}\left(1 - \sqrt{\frac{L_{\text{inter}}}{L}}\right)$, and a positive bound implies that the graph exhibits modular structure. If $\frac{L_{\text{inter}}}{L} \le \frac{3 - 2\sqrt{2}}{2}$, then $\frac{D_{\Delta C}}{2L} < 1 \le \sqrt{2}\left(1 - \sqrt{\frac{L_{\text{inter}}}{L}}\right)$, hence there is no new insight. In summary, the lower bound in (2.13) is positive for $\frac{D_{\Delta C}}{2L} < \min\left\{1, \sqrt{2}\left(1 - \sqrt{\frac{L_{\text{inter}}}{L}}\right)\right\}$.

Another presentation of the modularity applies the identity

$$ \Big( \sum_{j=1}^{n} x_j \Big)^2 = \sum_{j=1}^{n}\sum_{k=1}^{n} x_j x_k = \sum_{j=1}^{n} x_j^2 + 2\sum_{j=2}^{n}\sum_{k=1}^{j-1} x_j x_k \tag{2.14} $$

to $x_j = D_{C_j}$,

$$ \sum_{k=1}^{c} D_{C_k}^2 = (2L)^2 - 2\sum_{j=2}^{c}\sum_{k=1}^{j-1} D_{C_j} D_{C_k} $$

such that (2.2) is rewritten as

$$ m = \frac{2}{(2L)^2} \sum_{j=2}^{c}\sum_{k=1}^{j-1} D_{C_j} D_{C_k} - \frac{L_{\text{inter}}}{L} \tag{2.15} $$

Using the basic law of the degree, $\sum_{k=1}^{c} D_{C_k} = 2L$, a Lagrangian argument similar to the one before shows when the first term in (2.15) is maximized. This happens when the degree is distributed uniformly across the communities, $D_{C_k} = 2L/c$, resulting in

$$ m \le \frac{2}{(2L)^2}\left(\frac{2L}{c}\right)^2 \sum_{j=2}^{c} (j-1) - \frac{L_{\text{inter}}}{L} = \frac{2}{c^2}\,\frac{c^2 - c}{2} - \frac{L_{\text{inter}}}{L} = 1 - \frac{1}{c} - \frac{L_{\text{inter}}}{L} $$

This is again the upper bound (2.5) and, hence, agrees with the degree balancing of (2.4). Equations (2.4) and (2.15) present the maximization of the modularity from dual perspectives. Yet both point to a common solution: balancing the degree and minimizing $L_{\text{inter}}$.

Finally, we present a probabilistic setting for the modularity $m$. We define the random variable $D_C$ as the sum of the degrees in an arbitrary cluster. Its average is $E[D_C] = \frac{1}{c}\sum_{k=1}^{c} D_{C_k} = \frac{2L}{c}$, so it always holds that $c = \frac{2L}{E[D_C]}$. Comparing this with the estimate $c^*$ in (2.12) suggests that the extreme difference $D_{\Delta C}$ is not that far away from the mean: at the optimum, the two differ by roughly a factor of $\sqrt{2}$. Further, with

$$ \frac{1}{c}\sum_{k=1}^{c} D_{C_k}^2 = \operatorname{Var}[D_C] + \left( E[D_C] \right)^2 $$

the expression (2.2) for the modularity becomes

$$ m = 1 - \frac{L_{\text{inter}}}{L} - \frac{1}{c} - \frac{c}{(2L)^2}\operatorname{Var}[D_C] \tag{2.16} $$

which again leads to the upper bound (2.5) when the variance is zero, i.e. when all clusters have an equal degree sum. Incidentally, comparing (2.4) and (2.16), we find that

$$ \operatorname{Var}[D_C] = \frac{1}{c^2}\sum_{j=2}^{c}\sum_{k=1}^{j-1} \left( D_{C_j} - D_{C_k} \right)^2 $$

and this is a general result that holds for any random variable in a specific graph [212].

2.1.2. Spectral form for the modularity

The $N \times c$ community matrix $S$, defined as

$$ S_{ik} = \begin{cases} 1 & \text{if node } i \text{ belongs to community } k \\ 0 & \text{otherwise} \end{cases} $$

can be used to rephrase the condition in (2.1) as

$$ 1_{\{i \text{ and } j \text{ belong to the same community}\}} = \sum_{k=1}^{c} S_{ik} S_{jk} $$

This leads to the matrix representation of the modularity

$$ m = \frac{1}{2L}\sum_{k=1}^{c}\sum_{i=1}^{N}\sum_{j=1}^{N} S_{ik}\, m_{ij}\, S_{jk} = \frac{\operatorname{trace}\left( S^T M S \right)}{2L} \tag{2.17} $$

where

$$ M = A - \frac{1}{2L}\, d\, d^T \tag{2.18} $$

is the modularity matrix and $d$ is the degree vector. We define the community vector $s_k$ as the $k$-th column of the community matrix $S$; it specifies the $k$-th cluster. All components of $s_k$ corresponding to nodes in cluster $C_k$ are equal to one, and all other components are zero. Consider the eigenvalue decomposition of the symmetric modularity matrix, $M = W \operatorname{diag}(\lambda_j(M))\, W^T$. Here $W$ is the orthogonal $N \times N$ matrix whose column $j$ is the eigenvector $w_j$ belonging to $\lambda_j(M)$. The general spectral expression for the modularity $m$ then follows from (2.17) as

$$ m = \frac{\operatorname{trace}\left( (W^T S)^T \operatorname{diag}(\lambda_j(M))\, W^T S \right)}{2L} = \frac{1}{2L}\sum_{j=1}^{N} \lambda_j(M) \sum_{k=1}^{c} \left( w_j^T s_k \right)^2 \tag{2.19} $$

because $(W^T S)_{jk} = \sum_{l=1}^{N} W_{lj} S_{lk} = w_j^T s_k$. In particular, the scalar product $w_j^T s_k = \sum_{q \in C_k} (w_j)_q$ is the sum of those eigenvector components of $w_j$ that belong to cluster $C_k$. Suppose we write the community vector as a linear combination of the eigenvectors of $M$, $s_k = \sum_{j=1}^{N} \beta_{kj} w_j$. The orthogonality of the eigenvectors then implies that the coefficients equal $\beta_{kj} = w_j^T s_k$. Moreover, the vectors $s_1, s_2, \ldots, s_c$ are orthogonal and, by definition, $\sum_{k=1}^{c} s_k = u$. The vector $u$ is an eigenvector of $M$ belonging to the zero eigenvalue, as follows from the definition (2.18) of the modularity matrix:

$$ M u = A u - \frac{1}{2L}\, d\, d^T u = d - d = 0 $$

because $Au = d$ and $d^T u = 2L$. We therefore observe that

$$ \sum_{k=1}^{c} w_j^T s_k = w_j^T u = 0 $$

provided the eigenvector $w_j$ is orthogonal to $u$, i.e. $w_j$ is not the normalized eigenvector $u/\sqrt{N}$. Using the Cauchy identity

$$ c\sum_{k=1}^{c} \left( w_j^T s_k \right)^2 - \Big( \sum_{k=1}^{c} w_j^T s_k \Big)^2 = \frac{1}{2}\sum_{m=1}^{c}\sum_{k=1}^{c} \left( w_j^T (s_m - s_k) \right)^2 = \sum_{m=2}^{c}\sum_{k=1}^{m-1} \left( w_j^T (s_m - s_k) \right)^2 $$

we find that

$$ m = \frac{1}{2Lc}\sum_{j=1}^{N} \left( \sum_{m=2}^{c}\sum_{k=1}^{m-1} \left( w_j^T (s_m - s_k) \right)^2 \right) \lambda_j(M) \tag{2.20} $$

Consider $c = 2$ and $y = s_1 - s_2$, the vector with component $y_j = 1$ if node $j$ belongs to cluster $C_1$ and $y_j = -1$ if node $j$ belongs to cluster $C_2$. For this case, the general relation (2.20) reduces to

$$ m_2 = \frac{1}{4L}\sum_{j=1}^{N} \beta_j^2\, \lambda_j(M) \tag{2.21} $$

where $y = \sum_{j=1}^{N} \beta_j w_j$ with $\beta_j = y^T w_j$. Expression (2.21) was Newman's starting point in [156] for his iterated bisection method.
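A few lines of NumPy can check the spectral expression (2.19) against the trace form (2.17) and the variance form (2.16). This is a minimal sketch that assumes a dense symmetric 0/1 adjacency matrix and community labels 0, ..., c−1 that are all used. `numpy.linalg.eigh` returns an orthonormal eigenbasis of the symmetric matrix $M$, so the weights below are exactly the sums of the squared scalar products $(w_j^T s_k)^2$. The helper names are hypothetical.

```python
import numpy as np


def community_matrix(part):
    """N x c matrix S with S[i, k] = 1 if node i belongs to community k."""
    part = np.asarray(part)
    return np.eye(part.max() + 1)[part]


def modularity_matrix(A):
    """M = A - d d^T / (2L), see (2.18); d.sum() equals 2L."""
    d = A.sum(axis=1)
    return A - np.outer(d, d) / d.sum()


def modularity_trace(A, part):
    """Trace form (2.17)."""
    S = community_matrix(part)
    return np.trace(S.T @ modularity_matrix(A) @ S) / A.sum()


def modularity_spectral(A, part):
    """Spectral form (2.19): (1/2L) sum_j lambda_j(M) sum_k (w_j^T s_k)^2."""
    S = community_matrix(part)
    lam, W = np.linalg.eigh(modularity_matrix(A))
    weights = ((W.T @ S) ** 2).sum(axis=1)
    return lam @ weights / A.sum()


def modularity_variance(A, part):
    """Variance form (2.16), using the population variance of the degree sums."""
    part = np.asarray(part)
    d = A.sum(axis=1)
    two_L = d.sum()
    c = part.max() + 1
    D_C = np.bincount(part, weights=d, minlength=c)
    L_inter = A[part[:, None] != part[None, :]].sum() / 2   # each cut link counted twice
    return 1 - 2 * L_inter / two_L - 1 / c - c * D_C.var() / two_L ** 2


edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1
part = [0, 1, 0, 1, 1, 1]          # a deliberately poor split of the two triangles
print(modularity_trace(A, part), modularity_spectral(A, part), modularity_variance(A, part))
```

The three printed numbers coincide up to rounding, at 3/98 ≈ 0.031 for this split. The natural split into the two triangles gives 5/14 with every form.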

Since $W W^T = I$, we have that

$$ \operatorname{trace}\left( (W^T S)^T W^T S \right) = \operatorname{trace}\left( S^T S \right) = N $$

(see [212]), such that

$$ \sum_{j=1}^{N}\sum_{k=1}^{c} \left( w_j^T s_k \right)^2 = N \tag{2.22} $$

In the bi-cluster case $c = 2$, we see that $y^T y = N$, such that $\sum_{j=1}^{N} \beta_j^2 = N$. Let $w_q = \frac{u}{\sqrt{N}}$ denote the eigenvector of $M$ belonging to the eigenvalue $\lambda_q(M) = 0$. Then

$$ \sum_{k=1}^{c} \left( w_q^T s_k \right)^2 = \frac{1}{N}\sum_{k=1}^{c} \left( u^T s_k \right)^2 = \frac{1}{N}\sum_{k=1}^{c} n_k^2 $$

where $n_k$ is the number of nodes in cluster $C_k$. We apply the inequality

$$ \min_{1\le k\le n} \frac{a_k}{q_k} \le \frac{a_1 + a_2 + \cdots + a_n}{q_1 + q_2 + \cdots + q_n} \le \max_{1\le k\le n} \frac{a_k}{q_k} \tag{2.23} $$

where $q_1, q_2, \ldots, q_n$ are positive real numbers and $a_1, a_2, \ldots, a_n$ are real numbers, to

$$ \frac{\sum_{j=1; j\ne q}^{N} \lambda_j(M) \sum_{k=1}^{c} \left( w_j^T s_k \right)^2}{\sum_{j=1; j\ne q}^{N} \sum_{k=1}^{c} \left( w_j^T s_k \right)^2} \le \max_{1\le j\le N} \frac{\lambda_j(M) \sum_{k=1}^{c} \left( w_j^T s_k \right)^2}{\sum_{k=1}^{c} \left( w_j^T s_k \right)^2} = \lambda_1(M) $$

With $E[D] = \frac{2L}{N}$, we find a spectral upper bound for the modularity:

$$ m \le \frac{\lambda_1(M)}{E[D]}\left( 1 - \frac{1}{N^2}\sum_{k=1}^{c} n_k^2 \right) $$

This bound can also be written as

$$ m \le \frac{\lambda_1(M)}{E[D]}\left( 1 - \frac{1}{c} - \frac{c}{N^2}\operatorname{Var}[n_C] \right) $$

where $n_C$ is the number of nodes in an arbitrary cluster, because $E[n_C] = \frac{1}{c}\sum_{k=1}^{c} n_k = \frac{N}{c}$. Since $\operatorname{Var}[n_C] \ge 0$, we arrive at the upper bound

$$ m \le \frac{\lambda_1(M)}{E[D]}\left( 1 - \frac{1}{c} \right) \tag{2.24} $$

We observe that (2.24) may lead to a sharper upper bound than (2.5) if $\lambda_1(M) < E[D]$ (see, e.g., Figure 2.2b below). We have shown in [212] that the eigenvalues of the modularity matrix $M = A - \frac{1}{2L}\, d\, d^T$ are interlaced with those of $A$,

$$ \lambda_1(A) > \lambda_1(M) \ge \lambda_2(A) \ge \lambda_2(M) \ge \ldots \ge \lambda_N(A) \ge \lambda_N(M) $$

Hence, increasing $\lambda_2(A)$ raises the lower bound $\lambda_2(A)$ on $\lambda_1(M)$. For regular graphs, all eigenvalues of the modularity matrix $M$ are the same as those of the adjacency matrix $A$, except that $\lambda_1(A)$ is replaced by a zero eigenvalue.
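The interlacing property, the normalization (2.22) and the spectral bound (2.24) can be spot-checked numerically. The sketch below uses a random graph with independent links, generated with a fixed seed, together with a balanced random partition. The graph size, link probability, seeds and function names are arbitrary assumptions for illustration. Since $M$ is $A$ minus a positive semidefinite rank-one matrix, the interlacing inequalities hold for every graph. The bound, by contrast, is only as tight as $\lambda_1(M)/E[D]$.

```python
import numpy as np


def random_graph(N, p, seed=0):
    """Symmetric 0/1 adjacency matrix with independent links of probability p."""
    rng = np.random.default_rng(seed)
    upper = np.triu(rng.random((N, N)) < p, k=1).astype(float)
    return upper + upper.T


def spectral_checks(A, part, tol=1e-9):
    part = np.asarray(part)
    N = len(part)
    S = np.eye(part.max() + 1)[part]
    c = S.shape[1]
    d = A.sum(axis=1)
    two_L = d.sum()
    M = A - np.outer(d, d) / two_L
    lam_A = np.linalg.eigvalsh(A)[::-1]          # decreasing order, as in the text
    lam_M, W = np.linalg.eigh(M)
    lam_M, W = lam_M[::-1], W[:, ::-1]
    v = ((W.T @ S) ** 2).sum(axis=1)             # sum over k of (w_j^T s_k)^2
    m = lam_M @ v / two_L                        # spectral form (2.19)
    interlaced = (np.all(lam_A >= lam_M - tol)
                  and np.all(lam_M[:-1] >= lam_A[1:] - tol))
    n_k = S.sum(axis=0)
    E_D = two_L / N
    bound_nk = lam_M[0] / E_D * (1 - (n_k ** 2).sum() / N ** 2)   # before dropping Var[n_C]
    bound_224 = lam_M[0] / E_D * (1 - 1 / c)                      # (2.24)
    return {"m": m, "sum_v": v.sum(), "interlaced": interlaced,
            "bound_nk": bound_nk, "bound_224": bound_224}


N, c = 60, 3
A = random_graph(N, 0.1, seed=7)
part = np.random.default_rng(3).permutation(np.arange(N) % c)
print(spectral_checks(A, part))
```

For a balanced partition, the two bounds coincide because $\operatorname{Var}[n_C] = 0$. An unbalanced partition makes the intermediate bound strictly smaller than (2.24).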

2.1.3. Alternative forms for the modularity

Here, two more forms of the modularity are given: one in terms of the eigenvalue spacings and one in matrix form.

Eigenvalue spacing form

Partial (or Abel) summation

$$ \sum_{k=1}^{n} a_k b_k = \sum_{k=1}^{n-1} \Big( \sum_{l=1}^{k} a_l \Big)(b_k - b_{k+1}) + b_n \sum_{l=1}^{n} a_l $$

applied to $m^* = (2L)\,m = \sum_{j=1}^{N} v_j \lambda_j(M)$ gives

$$ m^* = \sum_{j=1}^{N-1} \Big( \sum_{k=1}^{j} v_k \Big)\left( \lambda_j(M) - \lambda_{j+1}(M) \right) + \lambda_N(M)\sum_{k=1}^{N} v_k = \sum_{j=1}^{N-1} \Big( \sum_{k=1}^{j} v_k \Big)\left( \lambda_j(M) - \lambda_{j+1}(M) \right) + N\lambda_N(M) $$

Only the last term is negative, while all others are non-negative. Further, the condition (2.22) gives $\sum_{k=1}^{j} v_k + \sum_{k=j+1}^{N} v_k = N$. Using this, we can write

$$ m^* = \sum_{j=1}^{c} \Big( \sum_{k=1}^{j} v_k \Big)\left( \lambda_j(M) - \lambda_{j+1}(M) \right) + \sum_{j=c+1}^{N-1} \Big( N - \sum_{k=j+1}^{N} v_k \Big)\left( \lambda_j(M) - \lambda_{j+1}(M) \right) + N\lambda_N(M) $$

$$ = \sum_{j=1}^{c} \Big( \sum_{k=1}^{j} v_k \Big)\left( \lambda_j(M) - \lambda_{j+1}(M) \right) + N\lambda_{c+1}(M) - \sum_{j=c+1}^{N-1} \Big( \sum_{k=j+1}^{N} v_k \Big)\left( \lambda_j(M) - \lambda_{j+1}(M) \right) $$

The last sum is non-negative. It vanishes when all the weight is shifted to the first $c$ eigenvalues, such that $\lambda_{c+1}(M) \ge 0$ and $v_k = 0$ for $k \ge c+1$. Hence, an upper bound of the modularity can be written in terms of the spacings $\lambda_j(M) - \lambda_{j+1}(M)$ as

$$ m^* \le \sum_{j=1}^{c} \Big( \sum_{k=1}^{j} v_k \Big)\left( \lambda_j(M) - \lambda_{j+1}(M) \right) + N\lambda_{c+1}(M) \tag{2.25} $$
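Partial summation is easy to get wrong by one index. A small NumPy sketch can therefore confirm that the spacing form reproduces $m^* = 2Lm$ and that (2.25) bounds it from above, here with $c$ taken as the number of communities. The sketch uses the two-triangle example and computes the weights $v_j$ (as defined in (2.27) below) from the orthonormal eigenbasis returned by `numpy.linalg.eigh`. The function name and the example are illustrative assumptions.

```python
import numpy as np


def spacing_form(A, part):
    """Return m* = 2Lm, its Abel-summed spacing form, and the upper bound (2.25)."""
    part = np.asarray(part)
    S = np.eye(part.max() + 1)[part]
    c = S.shape[1]
    d = A.sum(axis=1)
    N = len(d)
    M = A - np.outer(d, d) / d.sum()
    lam, W = np.linalg.eigh(M)
    lam, W = lam[::-1], W[:, ::-1]           # lambda_1 >= ... >= lambda_N
    v = ((W.T @ S) ** 2).sum(axis=1)         # weights v_j, summing to N by (2.22)
    m_star = v @ lam
    V = np.cumsum(v)                         # partial sums v_1 + ... + v_j
    gaps = lam[:-1] - lam[1:]                # spacings lambda_j - lambda_{j+1} >= 0
    abel = V[:-1] @ gaps + N * lam[-1]
    bound_225 = V[:c] @ gaps[:c] + N * lam[c]   # lam[c] is lambda_{c+1} (0-based)
    return m_star, abel, bound_225


edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1
print(spacing_form(A, [0, 0, 0, 1, 1, 1]))
```

For the split into the two triangles, $m^* = 2L \cdot 5/14 = 5$. The Abel form returns the same value up to rounding, and (2.25) lies above it.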

Expression (2.25) shows that the larger indices of $j$ are more heavily weighted. Recall that the spacing $\Delta\lambda_j = \lambda_j(M) - \lambda_{j+1}(M) \ge 0$ is not necessarily decreasing with $j$. Due to interlacing, the number of positive eigenvalues of $M$ is less (by one or zero) than the number of positive eigenvalues of $A$. The latter is larger than the independence number (see [212]), which is equal to the size of the largest co-clique. Thus, when maximizing the modularity (such that $\lambda_{c+1}(M) \ge 0$), the number of clusters $c$ should be smaller than the independence number of the graph.

Matrix form for the modularity

The identities (2.14) and (2.15) also offer a new matrix representation of the modularity:

$$ m = \frac{1}{(2L)^2}\, u^T \left( D_C D_C^T - \operatorname{diag}\left( D_{C_i}^2 \right) \right) u - \frac{L_{\text{inter}}}{L} \tag{2.26} $$

where $D_C$ is the $c \times 1$ vector of community degree sums, which equals $D_C = S^T d$. Using that relation gives

$$ m = \frac{1}{(2L)^2}\, u^T D_C D_C^T u - \frac{1}{(2L)^2}\, D_C^T D_C - \frac{L_{\text{inter}}}{L} = \frac{1}{(2L)^2}\, u^T S^T d\, d^T S u - \frac{1}{(2L)^2}\, d^T S S^T d - \frac{L_{\text{inter}}}{L} $$

With $S u_{c\times 1} = u_{N\times 1}$ and $u^T d = 2L$, we arrive at

$$ m = 1 - \frac{L_{\text{inter}}}{L} - \frac{1}{(2L)^2}\, d^T S S^T d $$

which equals (2.2).

Maximizing the modularity

Brandes et al. [22] have proved that finding a clustering with maximum modularity in a given graph is NP-hard. In this section, we use the spectral form (2.19) to deduce further insight, along the lines of Newman in [155]. We define the non-negative weights

$$ v_j = \sum_{k=1}^{c} \left( w_j^T s_k \right)^2 $$

and the modularity in (2.19) becomes

$$ m = \frac{1}{2L}\sum_{j=1}^{N} v_j \lambda_j(M) \tag{2.27} $$
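The matrix form (2.26) and its reduction to (2.2) only involve the $c \times 1$ vector $D_C = S^T d$, which a short NumPy sketch makes explicit. The sketch assumes a dense symmetric 0/1 adjacency matrix and community labels 0, ..., c−1 that are all used. It also obtains $L_{\text{inter}}/L$ from the diagonal of $S^T A S$, whose entries equal $2L_k$. The function name is hypothetical.

```python
import numpy as np


def matrix_forms(A, part):
    """Evaluate (2.26) and the reduced form 1 - L_inter/L - d^T S S^T d / (2L)^2."""
    part = np.asarray(part)
    S = np.eye(part.max() + 1)[part]                    # N x c community matrix
    d = A.sum(axis=1)
    two_L = d.sum()
    D_C = S.T @ d                                       # community degree sums D_C = S^T d
    u = np.ones(S.shape[1])                             # c x 1 all-one vector
    L_inter_over_L = 1 - np.trace(S.T @ A @ S) / two_L  # diag(S^T A S) holds 2 L_k
    m_226 = u @ (np.outer(D_C, D_C) - np.diag(D_C ** 2)) @ u / two_L ** 2 - L_inter_over_L
    m_reduced = 1 - L_inter_over_L - d @ S @ S.T @ d / two_L ** 2
    return m_226, m_reduced


edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1
print(matrix_forms(A, [0, 0, 0, 1, 1, 1]))
```

Both entries equal 5/14 up to rounding. The first uses only the $c$ community degree sums, so its cost beyond forming $D_C$ does not depend on $N$.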

First, assume that $v_1 = \sum_{k=1}^{c} \left( w_1^T s_k \right)^2 = N$. Then (2.22) implies that $w_j^T s_k = 0$ for all $1 \le k \le c$ and all $j > 1$. This means that the vectors $s_1, s_2, \ldots, s_c$ (or any linear combination of them, apart from $\sum_{k=1}^{c} s_k = u$) are orthogonal to the eigenvectors $w_2, w_3, \ldots, w_N$. Since the eigenvectors span the $N$-dimensional space, all $s_k$ must be parallel or proportional to $w_1$. However, the vectors $s_1, s_2, \ldots, s_c$ are orthogonal, so this is not possible. With $c$ clusters and taking (2.22) into account, it seems that we must require

$$ \sum_{j=1}^{r} v_j = N $$

for $r \ge c - 1$. Necessarily, at least $c - 1$ eigenvalues in (2.19) then play a role, because $w_j^T s_k = 0$ for all $1 \le k \le c$ and all $j \ge c$. This means that the vectors $s_1, s_2, \ldots, s_c$ (or any linear combination of them, apart from $\sum_{k=1}^{c} s_k = u$, which is proportional to $w_q = \frac{u}{\sqrt{N}}$) are orthogonal to the eigenvectors $w_c, w_{c+1}, \ldots, w_N$. Equivalently, each community vector $s_k$ (with zero or one components) is a linear combination of the first $c - 1$ eigenvectors of $M$, $s_k = \sum_{j=1}^{c-1} \beta_{kj} w_j$, where $\beta_{kj} = w_j^T s_k = \sum_{q\in C_k} (w_j)_q$. Together with the equation $\sum_{k=1}^{c} s_k = u$, this set of equations is sufficient to determine all $c$ community vectors. How to choose the coefficients $\beta_{kj}$ to optimize (2.19) remains a difficult problem. For example, we may adopt the strategy to choose the weights $v_j$ so that $v_j \ge v_{j+1}$ for each $1 \le j < c$. However, incorporating an additional eigenvector $w_c$ can make it possible to increase the weight $v_1$ corresponding to $\lambda_1(M)$ further. This can happen even though a lower eigenvalue $\lambda_c(M)$ then enters the sum (2.19) and an extra weight $v_c$ enters $\sum_{j=1}^{c} v_j = N$, so that the average weight $E[v] = \frac{N}{c}$ decreases. Numerical computations (see Figure 2.4 below) show that all vector components in (2.19) seem to play a role, but that the first eigenvector $w_1$ is by far the most important.
For a regular graph with degree $r$, it is known [212] that $\lambda_1(M) = \lambda_2(A) < \lambda_1(A) = r = E[D]$, such that the bound (2.24) equals

$$ m_{\text{regular graph}} \le \frac{\lambda_2(A)}{\lambda_1(A)}\left( 1 - \frac{1}{c} \right) < 1 - \frac{1}{c} $$

Since in general $\lambda_1(A) > \lambda_1(M) \ge \lambda_2(A)$, we observe that the lowest upper bound in (2.24) is reached for regular graphs. Another consequence of the interlacing concerns graphs with a large spectral gap $\lambda_1(A) - \lambda_2(A)$: in such graphs, the largest eigenvalue $\lambda_1(M)$ can be much smaller than $\lambda_1(A)$. For example, the complete graph, which possesses the largest possible spectral gap, equal to $N$, has $\lambda_1(M) = 0$. This is the lowest possible largest eigenvalue of any modularity matrix $M$. Intuitively, graphs with a large spectral gap are difficult to tear apart: they already form a rather tight community or cluster. Further dividing such a graph is hardly possible, resulting in a low modularity $m$.
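Both extreme cases are easy to reproduce numerically. For the complete graph $K_N$, the modularity matrix is $M = J/N - I$, whose largest eigenvalue is 0. For a regular graph such as the cycle $C_N$, $\lambda_1(M)$ coincides with $\lambda_2(A) = 2\cos(2\pi/N)$. The NumPy sketch below checks both cases. It also evaluates the regular-graph bound $\frac{\lambda_2(A)}{\lambda_1(A)}(1 - \frac{1}{c})$ for the cycle cut into two paths. The graph sizes are arbitrary, and the helper names are illustrative.

```python
import numpy as np


def largest_modularity_eigenvalue(A):
    d = A.sum(axis=1)
    M = A - np.outer(d, d) / d.sum()
    return np.linalg.eigvalsh(M)[-1]        # eigvalsh returns eigenvalues in ascending order


def complete_graph(N):
    return np.ones((N, N)) - np.eye(N)


def cycle_graph(N):
    A = np.zeros((N, N))
    for i in range(N):
        A[i, (i + 1) % N] = A[(i + 1) % N, i] = 1
    return A


K = complete_graph(8)
print(largest_modularity_eigenvalue(K))                  # numerically zero

C = cycle_graph(10)
lam_A = np.linalg.eigvalsh(C)
print(largest_modularity_eigenvalue(C), lam_A[-2], 2 * np.cos(2 * np.pi / 10))
# Regular-graph form of (2.24) for c = 2, next to the exact modularity (2.2) of the
# cycle cut into two paths of 5 nodes each (L_k = 4, D_Ck = 10, L = 10).
print(lam_A[-2] / lam_A[-1] * (1 - 1 / 2), 2 * (4 / 10 - (10 / 20) ** 2))
```

For $C_{10}$, the bound is about 0.405, while the two-path split has modularity 0.3, consistent with the bound.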

2.1.4. Relations with the assortativity and other spectral properties

Networks where high-degree nodes preferentially connect to other high-degree nodes are called assortative. Networks where high-degree nodes connect to low-degree nodes are called disassortative. Assortativity is measured by the linear degree correlation coefficient $\rho_D$, and here we use the terms assortativity and $\rho_D$ interchangeably. Formally, it is defined [152] as

$$ \rho_D = 1 - \frac{\sum_{i\sim j} (d_i - d_j)^2}{\sum_{i=1}^{N} d_i^3 - \frac{1}{2L}\left( \sum_{i=1}^{N} d_i^2 \right)^2} \tag{2.28} $$

where $i \sim j$ denotes a link between nodes $i$ and $j$, $d_i$ is the degree of node $i$, and $D = [d_1, d_2, \ldots, d_N]$ is the degree sequence of the network. The degree assortativity has been shown [49] to be an important indicator for epidemic spread: assortative networks are more prone to the propagation of epidemics. Van Mieghem et al. [217] have shown that increasing the assortativity also increases the lower bound for $\lambda_1(A)$, but not necessarily $\lambda_2(A)$. However, degree-preserving rewiring does not change the matrix $\frac{1}{2L}\, d\, d^T$; only $A$ changes. Consequently, increasing $\rho_D$ via degree-preserving rewiring does not change the sum of the eigenvalues of $M$. It may, however, increase the upper bound of $\lambda_1(M)$ and, hence, via (2.24) also the upper bound on the modularity. In any case, it will not decrease the upper bound for $\lambda_1(M)$, as follows from the interlacing property above.

Assortativity, maximum modularity and the spectrum of A and M

Figure 2.3 shows the twenty largest (in absolute value) eigenvalues of both the adjacency matrix $A$ and the modularity matrix $M$ of a realization of the Barabási-Albert scale-free graph. This realization has $N = 500$ nodes, $L = 1960$ links, $E[D] = 2L/N = 7.84$ and $\rho_D \simeq -0.05$. Each elementary degree-preserving rewiring step³, specified by the lemma in [217] that changes the assortativity, results in a different connected graph with the same degree vector.
From one rewiring step to another, the large majority of the eigenvalues of $A$ (and, similarly, of $M$) are interlaced, as shown in [217], while the eigenvalues of $M$ are always interlaced by those of $A$. The white band of eigenvalues around zero in Figure 2.3 thus contains the 480 smaller eigenvalues. They are not shown because the picture would become completely colored. The figure also shows the relationship between the assortativity (via $\rho_D$) and the percentage of degree-preserving rewired links. It further shows the hop-count distribution of the original graph (no rewiring) and those at $\rho_{D\max}$ and $\rho_{D\min}$. Figures 2.2a and 2.2b concern a realization of the rewired Barabási-Albert scale-free graph and of the rewired Erdős-Rényi random graph, respectively, with equal $N$ and an almost equal number of links $L$. They show how the first few eigenvalues of the adjacency and modularity matrices vary with the linear degree correlation coefficient $\rho_D$. For these two realizations of different classes of graphs, we observe the following. In the disassortative region ($\rho_D < 0$), $\lambda_1(M)$ follows $\lambda_2(A)$ reasonably well, while in the assortative region, $\lambda_1(M)$ starts increasing towards $\lambda_1(A)$. However, $\lambda_1(M)$ never reaches $\lambda_1(A)$, because $\lambda_1(M)$ is always strictly smaller than $\lambda_1(A)$, as proved in [212]. As a consequence, assuming that there is room to increase or decrease $\rho_D$, the following holds. The larger the spectral gap $\lambda_1(A) - \lambda_2(A)$, the larger the potential increase in modularity that degree-preserving rewiring can achieve. Figures 2.2a and 2.2b also illustrate that the maximum modularity is roughly proportional to $\lambda_1(M)$ as long as $\lambda_1(M)$ is close to $\lambda_2(A)$. For increasing assortativity, the maximum modularity seems to increase faster than $\lambda_1(M)$.

[Figure 2.1: The 20 largest (in absolute value) eigenvalues of the adjacency matrix A and of the modularity matrix M as a function of the percentage of degree-preserving rewired links in an instance of the Barabási-Albert scale-free graph with N = 500 and L = 1960 links. The relation between ρD and the percentage of rewired links, as well as three hop-count distributions, are also plotted.]

³ A degree-preserving rewired Barabási-Albert scale-free graph (and similarly an Erdős-Rényi random graph) whose $\rho_D$ is significantly changed is no longer a Barabási-Albert scale-free graph. The latter is characterized by $\rho_D \to 0$ asymptotically (and $\rho_D = 0$ for Erdős-Rényi random graphs, as shown in [217]).
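A plain-Python sketch can illustrate formula (2.28) and the effect of degree-preserving rewiring. The rewiring step below is a generic random double-link swap that rejects self-loops and multiple links. It preserves every node degree, and hence the matrix $\frac{1}{2L}\, d\, d^T$. It is not, however, the assortativity-directed rewiring rule of the lemma in [217]. The random graph, the seed and all function names are illustrative assumptions.

```python
import random


def degrees(n, edges):
    d = [0] * n
    for i, j in edges:
        d[i] += 1
        d[j] += 1
    return d


def assortativity(n, edges):
    """Linear degree correlation coefficient rho_D via (2.28); each link is listed once."""
    d = degrees(n, edges)
    L = len(edges)
    num = sum((d[i] - d[j]) ** 2 for i, j in edges)
    den = sum(x ** 3 for x in d) - sum(x ** 2 for x in d) ** 2 / (2 * L)
    return 1 - num / den        # undefined (den = 0) for regular graphs


def swap_once(edges, rng, max_tries=100):
    """Replace links {a,b}, {x,y} by {a,x}, {b,y}; every node keeps its degree."""
    present = {frozenset(e) for e in edges}
    for _ in range(max_tries):
        (a, b), (x, y) = rng.sample(edges, 2)
        if rng.random() < 0.5:
            x, y = y, x                              # also try the other pairing
        new1, new2 = frozenset((a, x)), frozenset((b, y))
        if len(new1) < 2 or len(new2) < 2 or new1 in present or new2 in present:
            continue                                 # would create a self-loop or a multi-link
        old = {frozenset((a, b)), frozenset((x, y))}
        return [e for e in edges if frozenset(e) not in old] + [(a, x), (b, y)]
    return list(edges)                               # no admissible swap found


rng = random.Random(42)
n = 40
edges = [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < 0.15]
d_before = degrees(n, edges)
rho_before = assortativity(n, edges)
for _ in range(200):
    edges = swap_once(edges, rng)
print(degrees(n, edges) == d_before, rho_before, assortativity(n, edges))
```

The first printed value confirms that the degree vector is unchanged, while $\rho_D$ generally changes. As a check of (2.28), a star $K_{1,3}$ gives $\rho_D = -1$, and the two triangles joined by a bridge give $\rho_D = -1/6$. Steering $\rho_D$ towards a target would require accepting only the swaps that move it in the desired direction.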

[Figure 2.2: (a) The largest and second largest eigenvalues of the adjacency matrix A and the largest eigenvalue of the modularity matrix M versus the linear degree correlation coefficient ρD for the Barabási-Albert graph with N = 500 nodes and L = 1960 links; (b) the same for the Erdős-Rényi random graph Gp(N) with N = 500 nodes, L = 1955 links and ρD ≃ −0.01. Thus, the link density $p = L/\binom{N}{2}$ equals $p \simeq 1.25\, p_c$, where $p_c \sim \frac{\log N}{N}$ is the critical disconnectivity threshold. The right-hand axis shows the corresponding maximum modularity.]

The rewired Barabási-Albert scale-free graph (Figure 2.2a) and the rewired Erdős-Rényi random graph (Figure 2.2b) have the same number of nodes and almost the same number of links. Apart from the extent of their assortativity ranges, they behave surprisingly similarly, in spite of their different degree vectors. Figure 2.4 concerns three instances of the rewired Barabási-Albert scale-free graph. It draws each term $v_j \lambda_j(M)$ in the spectral form (2.19) of the modularity, as well as each weight $v_j$ and eigenvalue $\lambda_j(M)$, for $1 \le j \le 500$. The inset in Figure 2.4 shows that the weights $v_j$ vary irregularly, like a noisy signal around the mean 1. It also shows a very high peak (on a log scale) corresponding to $\lambda_q(M) = 0$, which inspired the general bound (2.24). Apart from that peak, which corresponds to the eigenvector $w_q = \frac{u}{\sqrt{N}}$, the weights roughly decrease with the eigenvector index $j$. Apart from a few values, the resulting product $v_j \lambda_j(M)$ is decreasing in $j$. Although only shown for the Barabási-Albert scale-free graph, these observations are made more generally. The first term $v_1 \lambda_1(M)$ contributes dominantly to the modularity (2.27), which illustrates that the bound (2.24) can be sharp. The other terms $j > 1$ are initially positive but then negative (because the eigenvalues become negative), and the whole sum is needed to compute the modularity.
Remarkably, we found that the sum (2.19) is close to its first term, which means that the remaining terms largely cancel each other.

“Shifting-the-weights” principle

Since the eigenvalues of $M$ are ordered as usual, $\lambda_1(M) \ge \lambda_2(M) \ge \dots \ge \lambda_N(M)$, the maximum modularity is achieved by shifting as much weight as possible in (2.19) to the larger eigenvalues, which we call the “shifting-the-weights” principle.

Figure 2.3: The 20 largest (in absolute value) eigenvalues of the adjacency matrix $A$ and of the modularity matrix $M$ as a function of the percentage of degree-preserving rewired links in an instance of the Barabási-Albert scale-free graph with $N = 500$ nodes and $L = 1960$ links. The relation between $\rho_D$ and the percentage of rewired links, as well as three hopcount distributions, are also plotted.

Figure 2.4 supports this principle: owing to the condition (2.22), fewer eigenvalues in (2.19) imply that the individual weights $v_j$ are higher on average. Furthermore, Figure 2.4 illustrates that, especially in the high-assortativity regime, the first $c$ eigenvalues are clearly dominant, as argued in Section 2.1.3. When the largest eigenvalues are close to each other, incorporating additional eigenvalues may increase the weights on the largest eigenvalues, which leads to a larger modularity. As the assortativity increases, the largest eigenvalues of $M$ appear to move apart from each other (see Figure 2.3). In other words, the spacings $\lambda_1(M) - \lambda_2(M)$, $\lambda_2(M) - \lambda_3(M)$, ... between the largest eigenvalues of $M$ seem to grow as the assortativity increases.

Figure 2.4: The product $v_j \lambda_j(M)$ for each component $j$ for three instances: the original BA graph (red), the maximally assortative rewired version (black) and the maximally disassortative rewired graph (green). The sum over all $j$ equals $2Lm$ according to (2.19). The inset shows the weights $v_j$ (left, on a log scale) and the eigenvalues $\lambda_j(M)$ (right, bold dashed lines).

For large $\rho_D$, the maximum modularity involves a minimal number⁴ of eigenvalues in (2.19). The chance to increase the modularity $m$ by incorporating more eigenvalues is small because (a) the average weight is reduced due to the condition (2.22), and (b) the extra eigenvalues included are far smaller when the spacings between the leading eigenvalues are large. As a result, the modularity increases with the assortativity $\rho_D$ faster than $\lambda_1(M)$, as shown in Figures 2.2a and 2.2b, because $v_1$ also increases with $\rho_D$. Since the smallest number of eigenvalues that play a role in the maximum modularity is $c - 1$, the “shifting-the-weights” principle also implies that the number of clusters $c$ decreases with increasing assortativity.

⁴ This observation is consistent with the modularity bound (2.25), expressed in terms of the eigenvalue spacings: when the spacings are generally large, including an extra eigenvalue (i.e., $c \to c + 1$ in (2.25) in Appendix 2.1.3) decreases the last term $N \lambda_{c+1}(M)$ of (2.25), and this decrease is hardly outweighed by the increase of the first sum, owing to the large weight $N$ in that last term and the condition (2.22).

2.2. Fiedler’s Clustering

Among community detection techniques, eigenvector partitioning has gained popularity owing to the simplicity of its definition. The idea is that the eigenvector components of nodes belonging to the same community have close values, so eigenvectors can serve as coordinates that represent the nodes as points in a metric space, as exemplified in Figure 2.5. Therefore, with $k$ eigenvectors, the nodes can be embedded in a $k$-dimensional space in which the distances between nodes reflect their clustering proximity. The roots of eigenvector partitioning
date back to the early 1970s [63], when Fiedler suggested that the eigenvector of the second-smallest eigenvalue of the Laplacian matrix separates the network into two communities with (approximately) the fewest connections between them. Spectral partitioning can be applied recursively to find hierarchical graph partitions: these techniques partition a network by repeated bisections, as illustrated in Figure 2.5. For illustration, we show how the Fiedler partitioning algorithm works. We define the number of links $R$ running between our two groups of nodes, also called the cut size, by

$$R = \frac{1}{2} \sum_{i,j \text{ in different groups}} a_{ij}$$

where the factor $\frac{1}{2}$ compensates for counting each link twice in the sum. Let us now define an index vector $s$ such that

$$s_i = \begin{cases} +1 & \text{if node } i \text{ belongs to cluster 1} \\ -1 & \text{if node } i \text{ belongs to cluster 2} \end{cases}$$

Then

$$\frac{1}{2}(1 - s_i s_j) = \begin{cases} 1 & \text{if nodes } i \text{ and } j \text{ are in different clusters} \\ 0 & \text{if nodes } i \text{ and } j \text{ are in the same cluster} \end{cases}$$

which allows us to rewrite $R$ in terms of $s_i$ as

$$R = \frac{1}{4} \sum_{i,j} (1 - s_i s_j)\, a_{ij} = \frac{1}{4} \sum_{i,j} s_i s_j (d_i \delta_{ij} - a_{ij}) = \frac{1}{4} s^T Q s$$

where $d_i$ is the degree of node $i$, $\delta_{ij}$ is the Kronecker delta and $Q$ is the Laplacian matrix corresponding to $A$. The second equality follows from $\sum_{i,j} a_{ij} = \sum_i d_i = \sum_i s_i^2 d_i$, because $s_i^2 = 1$. Our objective is now to choose a vector $s$ that minimizes the cut size $R$. The vector $s$ can be expressed as a linear combination of the (normalized) eigenvectors $x_i$ of the Laplacian matrix as

$$s = \sum_{i=1}^{N} (x_i^T s)\, x_i$$

so that $R$ can be expressed as

$$R = \frac{1}{4} \sum_{i} (x_i^T s)\, x_i^T Q \sum_{j} (x_j^T s)\, x_j = \frac{1}{4} \sum_{i,j} (x_i^T s)(x_j^T s)\, \mu_j \delta_{ij} = \frac{1}{4} \sum_{i} (x_i^T s)^2 \mu_i \qquad (2.29)$$

where $\mu_i$ is the eigenvalue of $Q$ corresponding to the eigenvector $x_i$. Without loss of generality, we assume that $\mu_N \le \mu_{N-1} \le \dots \le \mu_1$. Ignoring the trivial solution $R = 0$, provided by the eigenvector of the smallest eigenvalue $\mu_N = 0$ (all nodes in one group), $R$ is minimized, when the constraint $s_i = \pm 1$ is relaxed to $s^T s = N$, by choosing $s$ proportional to the eigenvector $x_{N-1}$ of the second-smallest eigenvalue $\mu_{N-1}$.
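The cut-size identities and the Fiedler bisection can be checked on a small example. The sketch below is a minimal illustration under stated assumptions, not the author's implementation. It verifies $R = \frac{1}{2}\sum a_{ij}$, $R = \frac{1}{4}s^T Q s$ and the expansion (2.29) for a natural split, and compares $R$ with $N\mu_{N-1}/4$, which lower-bounds the cut size of any balanced split ($\sum_i s_i = 0$, so that $s$ is orthogonal to the eigenvector of $\mu_N = 0$). It then rounds the relaxed solution by assigning each node to a group according to the sign of its Fiedler-vector component (a common choice that the text does not spell out) and repeats the bisection on each part. The test graph `triangle_chain` and all function names are hypothetical; the bisection assumes that every induced subgraph is connected.

```python
import numpy as np

def laplacian(A):
    """Laplacian Q = Delta - A, with Delta the diagonal matrix of the degrees d_i."""
    return np.diag(A.sum(axis=1)) - A

def cut_size_direct(A, s):
    """R = (1/2) * sum of a_ij over ordered pairs (i, j) in different groups."""
    different = s[:, None] != s[None, :]
    return 0.5 * A[different].sum()

def cut_size_quadratic(A, s):
    """R = (1/4) s^T Q s."""
    return 0.25 * s @ laplacian(A) @ s

def cut_size_spectral(A, s):
    """R = (1/4) sum_i (x_i^T s)^2 mu_i, i.e. the expansion (2.29)."""
    mu, X = np.linalg.eigh(laplacian(A))   # columns of X are the orthonormal x_i
    return 0.25 * np.sum((X.T @ s) ** 2 * mu)

def fiedler_bisection(A, nodes):
    """Split `nodes` by the sign of the Fiedler vector of their induced subgraph.

    Assumes the induced subgraph is connected, so that the second-smallest
    Laplacian eigenvalue is nonzero.
    """
    sub = A[np.ix_(nodes, nodes)]
    mu, X = np.linalg.eigh(laplacian(sub))  # ascending order: X[:, 1] is the Fiedler vector
    x = X[:, 1]
    left = [n for n, xi in zip(nodes, x) if xi >= 0]
    right = [n for n, xi in zip(nodes, x) if xi < 0]
    return left, right

def recursive_bisection(A, nodes, depth):
    """Apply the bisection `depth` times to every part (hierarchical partition)."""
    if depth == 0 or len(nodes) < 2:
        return [sorted(nodes)]
    left, right = fiedler_bisection(A, nodes)
    if not left or not right:               # degenerate split: stop here
        return [sorted(nodes)]
    return (recursive_bisection(A, left, depth - 1)
            + recursive_bisection(A, right, depth - 1))

def triangle_chain(k):
    """Hypothetical test graph: k triangles in a row, consecutive ones joined by one link."""
    A = np.zeros((3 * k, 3 * k))
    for t in range(k):
        a, b, c = 3 * t, 3 * t + 1, 3 * t + 2
        for i, j in [(a, b), (a, c), (b, c)]:
            A[i, j] = A[j, i] = 1.0
        if t + 1 < k:
            A[c, c + 1] = A[c + 1, c] = 1.0   # link to the first node of the next triangle
    return A

if __name__ == "__main__":
    A = triangle_chain(2)
    s = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])   # index vector of the natural split
    print(cut_size_direct(A, s), cut_size_quadratic(A, s), cut_size_spectral(A, s))  # all ≈ 1
    mu = np.linalg.eigvalsh(laplacian(A))              # ascending: mu[1] is mu_{N-1}
    print(len(s) * mu[1] / 4)                          # ≈ 0.658, at most R for balanced splits
    print(recursive_bisection(triangle_chain(4), list(range(12)), depth=2))  # one part per triangle
```

Sign rounding is the simplest choice; splitting at the median of the Fiedler vector instead would force groups of equal size.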