Rearranging Phylogenetic Networks

Delft University of Technology

Janssen, R.
DOI: 10.4233/uuid:1b713961-4e6d-4bb5-a7d0-37279084ee57
Publication date: 2021
Document version: Final published version

Citation (APA): Janssen, R. (2021). Rearranging Phylogenetic Networks. https://doi.org/10.4233/uuid:1b713961-4e6d-4bb5-a7d0-37279084ee57


Dissertation

for the purpose of obtaining the degree of doctor at Delft University of Technology,

by the authority of the Rector Magnificus Prof.dr.ir. T.H.J.J. van der Hagen, chair of the Board for Doctorates,

to be defended publicly on Wednesday 26 May 2021 at 12:30 o'clock by

Remie Janssen

Master of Science in Mathematical Sciences, Utrecht University, the Netherlands,


This dissertation has been approved by the promotors.

Composition of the doctoral committee:

Rector Magnificus, chairperson
Prof.dr.ir. K.I. Aardal, Delft University of Technology, promotor
Dr.ir. L.J.J. van Iersel, Delft University of Technology, copromotor

Independent members:

Prof.dr. M. Fischer, Universität Greifswald, Germany
Prof.dr. V. Moulton, University of East Anglia, United Kingdom
Prof.dr.ir. M.J.T. Reinders, Delft University of Technology
Dr. S.M. Kelk, Maastricht University
Prof.dr.ir. A.W. Heemink, Delft University of Technology, reserve member

Other member:

Dr. M.E.L. Jones, Delft University of Technology

This research was partly funded by the Netherlands Organisation for Scientific Research (Vidi grant 639.072.602).

Keywords: Graph theory, Mathematical biology, Phylogenetics, Rearrangement moves

Printed by: GVO drukkers & vormgevers

Front & Back: Photo by P.A.M.E. Janssen, Design by R. Janssen and S. Janssen

Copyright © 2021 by R. Janssen

ISBN 978-94-6332-758-9

An electronic version of this dissertation is available at http://repository.tudelft.nl/.



A.2.1 MCMC_GT
A.2.2 InferNetwork_MP
A.2.3 InferNetwork_ML and InferNetwork_MPL
A.2.4 MCMC_SEQ
A.3 BEAST 2.5
A.3.1 SpeciesNetwork
A.3.2 BACTER
A.3.3 CoalRe
A.4 PhyloNetworks: SNaQ
A.5 GTmix
A.6 RF-Net

B Open Problems
B.1 Gaps in this thesis
B.1.1 Connectedness
B.1.2 Diameter bounds
B.1.3 Computational complexity
B.1.4 Improved algorithms
B.2 Alternative network definitions
B.2.1 Extra structure
B.2.2 Classes of networks
B.3 Rearrangement moves in reconstruction
B.3.1 Interaction with reconstruction methods
B.3.2 Comparing networks

Curriculum Vitæ
List of Publications
Bibliography
Symbol Index
Index

Summary

Evolution plays an important role in biology, to such an extent that one of the best-known quotes about biology is Theodosius Dobzhansky's "Nothing in biology makes sense except in the light of evolution." To study evolution, it is important to have a structured and standardized way to represent hypotheses about evolutionary histories. This is where phylogenetic networks come in. These provide a mathematical and graphical representation of an evolutionary history as a graph.

Finding the most accurate phylogenetic network given some genetic data gives rise to many computationally hard problems. So, one often has to resort to heuristics. An important part of many of these heuristics is a local search through the space of phylogenetic networks; the aim is to find a good phylogenetic network by taking small steps through this space. These steps correspond to small changes made to a network, which are called rearrangement moves.

There is currently no standard type of rearrangement move, and each piece of software defines its own set of moves. When such software is published, the authors often mention the types of rearrangement moves they use. However, they rarely justify their choice of moves, even though this choice can have large consequences for the functionality of the heuristic. For example, to guarantee that an optimal network can be found, each network must be reachable from every other network by taking small steps through the space. In this thesis we study such problems, which are all aimed at answering the following question: Which rearrangement moves can be used in local search heuristics?

To answer this question, we take a mathematical approach, where we use a graph to represent the space of phylogenetic networks, which are graphs themselves as well. A graph is a collection of nodes (points) connected by edges (lines). In this graph, each node represents a network, and there is an edge between two networks if there is a rearrangement move that changes one into the other. The requirement for a good move we mentioned before (each network must be reachable from any other network) can then be stated compactly in graph-theoretical language as follows: Is the space of phylogenetic networks connected under a certain rearrangement move?
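As a rough illustration of this graph view, the following sketch checks connectedness of a small, made-up space with a breadth-first search. The network names N1 to N4, the TOY_MOVES table, and the function names are hypothetical and not taken from this thesis; moves are assumed to be reversible, so the space is treated as an undirected graph.

```python
from collections import deque

def reachable(start, neighbours):
    """Return every network reachable from `start` by repeated moves (BFS)."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in neighbours(current):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def is_connected(networks, neighbours):
    """True if every network can be reached from every other one.

    Assumes moves are reversible, so one search from any start suffices.
    """
    networks = set(networks)
    if not networks:
        return True
    start = next(iter(networks))
    return reachable(start, neighbours) >= networks

# Hypothetical toy space: each key is a network, each value lists the
# networks obtainable from it by a single rearrangement move.
TOY_MOVES = {
    "N1": {"N2"},
    "N2": {"N1", "N3"},
    "N3": {"N2"},
    "N4": set(),  # no move leads to or from N4
}

def toy_neighbours(network):
    return TOY_MOVES[network]

print(is_connected(["N1", "N2", "N3"], toy_neighbours))
print(is_connected(TOY_MOVES, toy_neighbours))
```

In this toy space, N1, N2 and N3 form a connected space, while adding the isolated network N4 makes it disconnected. A local search started at N4 could then never reach the other networks.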

The main part of this thesis is aimed at answering this question for several rearrangement moves that are quite similar to moves that are used in practice. The general conclusion of this study is that most spaces are connected. Moreover, as a result of the techniques used, we can additionally show that the number of steps between each pair of networks is relatively small compared to the number of networks. This is a nice property for the use of these rearrangement moves in local search heuristics, as it shows that an optimal network can (in principle) be found quickly if the right moves are chosen.

The computational hardness of the reconstruction problems unfortunately implies that choosing the right moves is hard as well. This also holds for another computational problem we study in this thesis: finding the shortest sequence of rearrangement moves between two networks. We show that several versions of this problem are NP-hard. This implies that, given two networks, there is no fast way to find a rearrangement move that modifies one network so that it becomes more like the other network.
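To make the distance problem concrete, here is a minimal brute-force sketch that computes the smallest number of moves between two networks by breadth-first search over an explicitly listed space. It is not a method from this thesis: the space TOY_MOVES and all names are hypothetical, and listing the whole space is only feasible for very small examples, which is consistent with the hardness result above.

```python
from collections import deque

def move_distance(source, target, neighbours):
    """Minimum number of moves from `source` to `target`, or None if unreachable."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        if current == target:
            return dist[current]
        for nxt in neighbours(current):
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)
    return None

# Hypothetical toy space: keys are networks, values are the networks
# that can be obtained from them by one rearrangement move.
TOY_MOVES = {
    "N1": ["N2", "N3"],
    "N2": ["N1", "N4"],
    "N3": ["N1", "N4"],
    "N4": ["N2", "N3", "N5"],
    "N5": ["N4"],
}

def toy_neighbours(network):
    return TOY_MOVES[network]

print(move_distance("N1", "N5", toy_neighbours))
```

Here one shortest sequence from N1 to N5 is N1, N2, N4, N5, which uses three moves.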

Finally, in the discussion, we apply our results to published reconstruction software. As mentioned, most of these publications do not study their search spaces, so it needs to be checked that, at the very least, these search spaces are connected. As the moves used in the software are similar to the moves studied in this thesis, we can apply our results to the search spaces used in the software. Fortunately, we conclude that, with one exception, all these search spaces are connected. This solidifies the theoretical basis of these methods, and justifies their application to biology.


Samenvatting (Summary)

Evolution plays an important role in biology. One of the best-known statements about biology even says that you need evolution in order to understand biology: "Nothing in biology makes sense except in the light of evolution" (Theodosius Dobzhansky). It is therefore important that we have a structured and standardized way to represent hypotheses about evolution. This is where phylogenetic networks enter the stage: these mathematical structures are used as a (graphical) representation of possible evolutionary histories.

Reconstructing the true evolutionary history then comes down to finding the phylogenetic network that best fits the (genetic) data. This gives us computational problems that are generally hard to solve; they are often NP-hard. It is therefore often necessary to use heuristics. An important part of these heuristics is a local search for a good network: for this, we consider the search space (all possible phylogenetic networks) as a graph called the space of phylogenetic networks, and we take small steps through this space. These steps, which we call rearrangement moves (herschikkingsstappen), correspond to small changes in a network.

There is currently no standardized definition of these rearrangement moves. Each software tool uses its own definition. When such tools are published, usually no attention is paid to this choice, even though it can greatly influence how the implemented heuristic works. It may, for example, be impossible to change one network into another network with a given type of rearrangement move. In that case, it may also be impossible to find the best network using only this type of rearrangement move. Therefore, in this thesis we study a number of rearrangement moves and the corresponding spaces of phylogenetic networks. In particular, we try to answer the following question.

Which rearrangement moves are suitable for use in heuristics?

To answer this question, we use mathematical techniques from graph theory. For us, a space of phylogenetic networks is a graph in which each node represents a phylogenetic network. The rearrangement moves are encoded by the edges in this graph: there is an edge between two networks precisely when one network can be changed into the other in one rearrangement move. As mentioned before, it is important to ask whether, with a certain type of rearrangement move, we can change every network into every other network. This question can be expressed compactly in the language of graph theory: Is the space of phylogenetic networks connected as a graph?

The largest part of this thesis is aimed at answering this question for several rearrangement moves that are very similar to the rearrangement moves used in practice. In general, we conclude that the spaces of phylogenetic networks are connected for these rearrangement moves. Moreover, the techniques we use to prove this are constructive. This means that we can actually find a sequence of rearrangement moves between two given networks, and that we can bound the distances between networks. These distances turn out to be relatively small compared to the number of phylogenetic networks in a given space. This is a nice property in practice, because it means that the best network can, in principle, always be found in a small number of steps.

Unfortunately, we cannot easily find such a short sequence of steps. This is because finding the best network is often NP-hard. Another NP-hard problem is finding the shortest sequence of steps between two networks. In this thesis, we prove that this problem is indeed NP-hard for a number of types of rearrangement moves. This means that, although we can define a distance between two networks as the minimum number of steps between these networks, this distance is not easy to compute. Based on our connectedness proofs, we do give a number of heuristics for determining these distances. It will turn out that these heuristics can produce a reasonably short sequence of steps in many cases.

Finally, in the discussion, we consider spaces of phylogenetic networks that occur in published software tools. There, we use our results to check whether the minimal requirement for a good search space, connectedness, is met. Because in this thesis we study rearrangement moves that closely resemble the rearrangement moves in these software tools, we can easily verify this. Fortunately, we can conclude that most of these search spaces are connected, with a single exception. This gives an additional theoretical justification for the use of this software. This thesis thus strengthens the foundation of the biological research that uses these heuristics for phylogenetic networks.



1. Introduction

1.1 What are phylogenetic networks?

Phylogenetic networks are a type of graph used in biology to represent evolutionary history. The most common shape for these networks is a tree. Trees have a long history in biology. This starts with their use in taxonomy, where they became popular in the eighteenth century [Rag09], but examples from as early as 1592 exist as well [Zal40]. These trees had nothing to do with evolution; taxonomic trees simply represented a classification of (living) things.

One of the first examples of evolutionary trees can be found in the book Philosophie Zoologique by Jean-Baptiste Lamarck in 1809. However, the most well-known early examples are by the hand of Charles Darwin, who laid the basis for the currently accepted theory of evolution. For a more complete overview of the history of trees in the representation of evolutionary history, see, for example, [Arc14].

Modern evolutionary trees, also called phylogenetic trees, show a branching pattern that corresponds to the branching pattern of evolution caused by speciation. Such trees are often interpreted both as taxonomies and as phylogenies. This dual interpretation of a phylogeny as a taxonomy breaks down when additional non-vertical processes, such as hybridization [e.g. AAA+13], horizontal gene transfer (HGT) [e.g. ZD11, KGDO05, KP08], and recombination [e.g. VB15] are involved as well.

With such additions, evolutionary histories become reticulate (i.e., net-like), so they can no longer be represented by trees, but only by phylogenetic networks. In such networks, there is no clear hierarchical grouping of the taxa as in a tree. Hence, unlike a phylogenetic tree, a phylogenetic network cannot simply be read as a taxonomy, although some taxonomic information may still be extracted, for example by studying clusters [NW05, KNTX08, HRS10, Ste16]. The main use of phylogenetic networks is therefore as a representation of evolutionary history.

Phylogenetic trees and networks represent evolutionary histories by showing the flow of hereditary information. In biological applications, this is most often in the form of genetic information. There are also applications outside of biology, such as in linguistics [e.g. Dun15, JL19, LS20] and other anthropological topics like board games [e.g. Kra00, Car14, BSP+19] and archaeology [Pre19], where, for example, the evolution of tools is subjected to phylogenetic analysis [e.g. Hou12, OBB+14, WPR19]. In those cases, it is less clear exactly which flow of information is represented in the network, and these types of information may not behave similarly to genetic information, which makes accurate reconstruction of these phylogenies challenging [Mor13, Str19]. Nevertheless, in all these cases, phylogenetic trees or networks are assumed to represent some kind of evolutionary history.

Figure 1.1: A phylogenetic network with six leaves (representing extant taxa) at the bottom, and the root (ancestral taxon) at the top. Edges are directed downwards, showing the passing of time. The red nodes are the three reticulations (i.e., reticulate evolutionary events), which make this network a tier-3 network.

In its broadest mathematical sense, a phylogenetic network can be thought of as a leaf-labelled graph, usually without parallel edges and degree-2 nodes (Figure 1.1) [Mor11, HRS10]. The underlying graphs of the networks may be directed (and acyclic) or undirected. Of the two, directed networks have the simplest interpretation as evolutionary histories (Figure 1.2). In a directed tree, the arcs represent periods of descent with modification, and the nodes represent speciation/divergence events. In a directed network, there is a third type of node, a reticulation node. Such a node represents the combination of hereditary information, as in hybrid speciation.
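The node types described above can be sketched in code by reading them off the in-degree and out-degree of each node. The conventions used here are assumptions for illustration rather than definitions quoted from this text: the root has in-degree 0, a leaf has out-degree 0, a reticulation has in-degree at least 2, and every other node is a tree node. The example network and the function names are hypothetical.

```python
from collections import defaultdict

def classify_nodes(arcs):
    """Label each node of a directed network given as (tail, head) arcs.

    Assumed conventions: in-degree 0 is the root, out-degree 0 is a leaf,
    in-degree >= 2 is a reticulation, anything else is a tree node.
    """
    indeg = defaultdict(int)
    outdeg = defaultdict(int)
    nodes = set()
    for tail, head in arcs:
        outdeg[tail] += 1
        indeg[head] += 1
        nodes.update((tail, head))

    kinds = {}
    for node in nodes:
        if indeg[node] == 0:
            kinds[node] = "root"
        elif outdeg[node] == 0:
            kinds[node] = "leaf"
        elif indeg[node] >= 2:
            kinds[node] = "reticulation"
        else:
            kinds[node] = "tree node"
    return kinds

def count_reticulations(arcs):
    """Number of reticulation nodes in the network."""
    return sum(1 for kind in classify_nodes(arcs).values() if kind == "reticulation")

# Hypothetical network: root r, tree nodes u and v, reticulation h,
# and leaves a, b, c. Arcs point from ancestor to descendant.
EXAMPLE_ARCS = [
    ("r", "u"), ("r", "v"),
    ("u", "a"), ("u", "h"),
    ("v", "h"), ("v", "c"),
    ("h", "b"),
]

kinds = classify_nodes(EXAMPLE_ARCS)
for node in sorted(kinds):
    print(node, kinds[node])
print("reticulations:", count_reticulations(EXAMPLE_ARCS))
```

In this example, h has two parents (u and v) and is therefore the only reticulation. In the terminology of the caption of Figure 1.1, where three reticulations give a tier-3 network, this example would be a tier-1 network.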

Undirected networks often only represent genetic data, but, in some cases, they may be thought of as the undirected version of a directed network, in which we simply ignore or are ignorant of the direction. These two types of networks are sometimes confused, leading to controversy: [FFRF20] uses a median joining network (MJN; a data displaying network) and reads it as an evolutionary history, as [SPKPS+20] point out. This paints a sufficient, albeit strongly simplified, picture of the interpretation of phylogenetic networks as evolutionary histories, to which we return in Section 8.2.1 of the Discussion.
