Reconstructibility of unrooted level-k phylogenetic networks from distances

Authors: Leo van Iersel, Vincent Moulton, Yukihiro Murakami

DOI: 10.1016/j.aam.2020.102075

Publication date: 2020

Document version: Final published version

Published in: Advances in Applied Mathematics

Citation (APA): van Iersel, L., Moulton, V., & Murakami, Y. (2020). Reconstructibility of unrooted level-k phylogenetic networks from distances. Advances in Applied Mathematics, 120, 1-30. [102075]. https://doi.org/10.1016/j.aam.2020.102075


Leo van Iersel (a), Vincent Moulton (b), Yukihiro Murakami (a,*)

(a) Delft Institute of Applied Mathematics, Delft University of Technology, Van Mourik Broekmanweg 6, 2628 XE, Delft, the Netherlands

(b) School of Computing Sciences, University of East Anglia, NR4 7TJ, Norwich, United Kingdom

Article info

Article history: Received 30 October 2019. Received in revised form 12 June 2020. Accepted 21 June 2020. Available online xxxx.

MSC: 05

Keywords: Phylogenetic networks; Level-k networks; Distance matrix; Reconstructibility

Abstract

A phylogenetic network is a graph-theoretical tool that is used by biologists to represent the evolutionary history of a collection of species. One potential way of constructing such networks is via a distance-based approach, where one is asked to find a phylogenetic network that in some way represents a given distance matrix, which gives information on the evolutionary distances between present-day taxa. Here, we consider the following question. For which k are unrooted level-k networks uniquely determined by their distance matrices? We consider this question for shortest distances as well as for the case that the multisets of all distances are given. We prove that level-1 networks and level-2 networks are reconstructible from their shortest distances and multisets of distances, respectively. Furthermore we show that, in general, networks of level higher than 1 are not reconstructible from shortest distances and that networks of level higher than 2 are not reconstructible from their multisets of distances.

* Corresponding author.

E-mail addresses: L.J.J.vanIersel@tudelft.nl (L. van Iersel), V.Moulton@uea.ac.uk (V. Moulton), Y.Murakami@tudelft.nl (Y. Murakami).

1. Introduction

Phylogenetic trees are often used to represent the evolutionary history of species, or more generally, taxa [11]. Trees can be a powerful tool for elucidating relationships between species, especially in case the species in question have evolved only via speciation events. However, other events often also drive evolution, including hybridisation, introgression, and lateral gene transfer. When such reticulate events occur, more general graphical structures, known as phylogenetic networks [2,17], can be a useful addition to trees.

There are two main types of phylogenetic networks: rooted and unrooted networks. A rooted network is a directed acyclic graph that represents how extant taxa have evolved from a single common ancestor, also known as the root. Internal vertices denote either speciation or reticulate events, and edges have directions to indicate the transfer of genetic material between the two vertices that are incident to it. Unrooted networks have similar properties except they have no direction on the edges. A lack of direction could, for example, represent an ambiguity in knowledge of the direction in which genetic material is transferred between species. Note that every rooted network has an underlying unrooted network, which can be obtained by suppressing the root vertex and ignoring edge directions. Conversely, one can try to obtain a rooted network from an unrooted network by estimating the location of the root via an outgroup, if it is known which vertices represent reticulations [16]. In this paper we will only consider unrooted networks, which we shall call networks for short. We present an example of such a network in Fig. 1.

As the shift from phylogenetic trees to networks has become more prevalent in the biological literature, finding good ways to construct phylogenetic networks has become a core theme in phylogenetics. Such an undertaking has experienced major developments through various reconstruction approaches (e.g., maximum-likelihood [20]; building blocks [18,23,19]; distance-based [7,5]; see [17] for an overview). In this paper we consider the distance-based approach, in which one is given a distance matrix on the set of taxa in question and then aims to build a network representing this matrix. An entry in a distance matrix gives the evolutionary distance, a measure of genetic divergence between distinct taxa. This raises the following question. 'Is there a network that precisely represents the given distance matrix?'

The groundwork for distance-based methods is well-established for phylogenetic trees [27,26,11,24]. For networks, the story is more complicated. Since networks can contain cycles, there can be more than one path between two taxa, which can lead to more than one distance. This results in various types of distances that can be associated to a network. Two such types, which we cover in this paper, include the shortest distances and the multisets of distances. For the shortest distances, we search for a network in which the distance of a shortest path between each pair of taxa coincides with the matrix; for the multisets of distances, we search for a network in which the multiset of distances of all paths between each pair of taxa coincides with the matrix. In Fig. 1, […] be worked out from the multisets of distances by taking the smallest element for each matrix entry.

Before proceeding any further, we must acquaint ourselves with two similar, yet subtly different notions that are vital in understanding distance-based methods for networks. One can either construct or reconstruct networks from distance matrices. Constructing a network means that we initially start with a distance matrix and come up with a network that is consistent in some way with such a matrix. Some classical network construction methods from distances include Neighbor-Net [7] and T-Rex [22]. In the process, one is sometimes interested in finding a network that optimises some particular criterion, such as the hybridisation number [9,10]. Networks obtained via construction methods are often non-unique, which is the biggest distinction between construction and reconstruction methods.

Reconstructing a network means that we start with a network, find the distance matrix that is associated to it (e.g., shortest distances), and try to reconstruct the original network from the distance matrix. The goal then is to decide which networks can be uniquely reconstructed from their distances, in other words, to decide upon the reconstructibility of different classes of networks from their various distance matrices. The main results of [8,3,4,6,5,28,15] follow this exact format; they show that some unrooted/rooted networks (or a representative of the equivalence class) can be reconstructed from certain distance matrices. Roughly speaking, they show that within a particular network class, if two networks have the same particular distance matrix then the networks are equivalent. Interestingly, although distance-based reconstruction results have been recently developed for rooted networks, similar problems have been less studied for unrooted networks.

As a first step in this direction, we focus on reconstructing unweighted unrooted networks. Every edge in the network has a weight of 1, which means that distances between two taxa correspond to the number of edges contained in paths between two taxa. Now, to identify which networks are reconstructible from certain distance matrices, we call on the notion of the level of a network. The level of a network is the maximum number of edges that need to be deleted from a biconnected component to obtain a tree [12]. In this paper we consider the problem of reconstructing level-k networks in general, both from their shortest distances and their multisets of distances.

A recent paper has shown that optimal cactus graphs are reconstructible from their shortest distances, while in general there could be many cactus graphs that realise the same shortest distances [15]. Cactus graphs are connected graphs in which each edge belongs to at most one cycle; these graphs are a generalisation of level-1 networks. Here, an optimal network refers to one that realises the shortest distance matrix, in which the total sum of edge weights is minimal. The difference between this result and our result is that we consider unweighted networks, for which we may leave out the optimality restriction. The problem of reconstructing cactus graphs has also been of interest within the graph theory literature. Some have considered reconstructing them […] different from the distance data that we consider in this paper. Therefore, our problem of reconstructing networks from distances is fundamentally different from both of these papers.

The rest of the paper is organised as follows. In the next section we introduce basic definitions and notations. In Section 3, we show that in general, level-2 networks are not reconstructible from their shortest distances (Lemma 3.1), and that networks of level higher than 2 are not reconstructible from their shortest distances nor from their multisets of distances (Lemma 3.2). In Section 4, we show that level-1 networks as well as level-2 networks on fewer than 4 leaves are reconstructible from their shortest distances (Theorem 4.2 and Lemma 4.4). In Section 5, we show that level-2 networks are reconstructible from their multisets of distances (Theorem 5.1). We conclude with a discussion in Section 6 on open problems and possible future directions in this area.

2. Preliminaries

Definition 2.1. Let X be a non-empty finite set. An (unweighted unrooted binary phylogenetic) network N on X is a simple graph (an unweighted, undirected graph with no loops or multiple edges) with

1. $|X|$ vertices of degree-1 (the leaves); and

2. all other vertices of degree-3 (the internal vertices).

The leaves are bijectively labelled by the set X. If $|X| = 1$ then we define the singleton graph with one vertex labelled by the element of X as the network on X. A network with no cycles is a (phylogenetic) tree.

Deleting an edge uv from a network is the action of removing the edge uv and suppressing any degree-2 vertices in the resulting subgraph. Deleting a vertex from a network is the action of removing the vertex, deleting all its incident edges, and suppressing any degree-2 vertices in the resulting subgraph. A cut-edge of a network is an edge whose deletion disconnects the network. We call a cut-edge trivial if the edge is incident to a leaf, and non-trivial otherwise. Note that for a network N on X, deleting a cut-edge breaks the network into two components. The leaf-set X can be partitioned into the leaves that are contained in one component and the leaves that are contained in the other; therefore every cut-edge of a network induces a partition $X = Y \cup Z$ of X (where one of Y or Z could possibly be empty). These partitions are not unique in general (i.e., two distinct cut-edges can induce the same partition). Upon cutting a non-trivial cut-edge, if one of the components is a tree, then we say that the subgraph that corresponds to this component is a pendant subtree. Given a cut-edge uv we say that a leaf x can be reached from u via uv if, upon deleting the edge uv without suppressing degree-2 […]

A biconnected component (blob) of a network is a maximal 2-connected subgraph with at least three vertices. We say that a network is a level-k network if at most k edges must be deleted from every blob to obtain a tree. We say that a leaf is contained in a blob if the neighbour of the leaf is a vertex of the blob. A cut-edge is incident to a blob if one of the endpoints of the edge is a vertex of the blob. A blob is pendant if there is exactly one non-trivial cut-edge that is incident to the blob. We say that a leaf x can be reached from a blob B via a cut-edge uv if u is a vertex of B and x can be reached from u via uv.

Let N be a network on X and let x and y be leaves in N. We recall the notation used in [5]. The multiset of distances between x and y, denoted $d(x,y)$ (and sometimes as $d_N(x,y)$ where necessary), is the multiset consisting of lengths of all possible paths between x and y in N. Since N is an unweighted network, the length of a path is simply the number of edges contained in the path. We let $\mathcal{D}(N)$ denote the $|X| \times |X|$ matrix whose $(x,y)$-th entry is $d(x,y)$. We further define the shortest distance between x and y, denoted $d_m(x,y)$, by taking $d_m(x,y) = \min d(x,y)$. We analogously define $\mathcal{D}_m(N)$ to be the $|X| \times |X|$ matrix whose $(x,y)$-th entry is $d_m(x,y)$. An example of a network with its multisets of distances is illustrated in Fig. 1.
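As an illustration of these two notions, the following Python sketch enumerates all simple paths between two leaves and records their lengths as a multiset, then takes the minimum to get the shortest distance. It is only a sketch: the helper names (`build_adjacency`, `path_length_multiset`) and the example network (a single 4-cycle blob with one leaf per cycle vertex) are assumptions introduced here, not taken from the paper, and the brute-force enumeration is only practical for small networks.

```python
from collections import Counter

def build_adjacency(edges):
    """Adjacency sets of an undirected simple graph given as a list of edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def path_length_multiset(adj, x, y):
    """Multiset (as a Counter) of the lengths of all simple paths from x to y."""
    lengths = Counter()

    def dfs(v, visited, length):
        if v == y:
            lengths[length] += 1
            return
        for w in adj[v]:
            if w not in visited:
                dfs(w, visited | {w}, length + 1)

    dfs(x, {x}, 0)
    return lengths

# Hypothetical example: a level-1 network whose only blob is the 4-cycle
# p1-p2-p3-p4, with one leaf a_i attached to each p_i.
edges = [("p1", "p2"), ("p2", "p3"), ("p3", "p4"), ("p4", "p1"),
         ("a1", "p1"), ("a2", "p2"), ("a3", "p3"), ("a4", "p4")]
adj = build_adjacency(edges)

d_a1_a2 = path_length_multiset(adj, "a1", "a2")  # one path of length 3, one of length 5
d_a1_a3 = path_length_multiset(adj, "a1", "a3")  # two paths of length 4
dm_a1_a2 = min(d_a1_a2)                          # shortest distance: smallest element
```

Here a `Counter` maps each path length to its multiplicity, so `min` over its keys gives the corresponding entry of the shortest distance matrix.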

We use the following notation for the multisets. A multiset is a tuple $(A, m)$ where A is a set and m is a function that specifies the multiplicity of each element in A. For $x \notin A$, we let $m(x) = 0$. We will, for the most part, write $(A, m)$ as $A = \{a_1^{m(a_1)}, \ldots, a_k^{m(a_k)}\}$. Let n be an integer. We let $A - n$ denote the multiset obtained by subtracting n from each element of A (i.e., $A - n = \{(a_i - n)^{m(a_i)} : i \in [k]\}$). Given two multisets $(A, m_A)$ and $(B, m_B)$, the sum $A + B$ is defined as the multiset $(A \cup B, m_{A+B})$ where $m_{A+B}(x) = m_A(x) + m_B(x)$ for $x \in A \cup B$.
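The two multiset operations can be mirrored with Python's `collections.Counter`, which stores each element together with its multiplicity. This is a minimal sketch; the function names `shift` and `multiset_sum` and the sample multisets are assumptions chosen for illustration.

```python
from collections import Counter

def shift(A, n):
    """A - n: subtract n from every element, keeping its multiplicity."""
    return Counter({a - n: mult for a, mult in A.items()})

def multiset_sum(A, B):
    """A + B: the multiplicity of each element is m_A(x) + m_B(x)."""
    return A + B

A = Counter({3: 1, 6: 2})   # the multiset {3^1, 6^2}
B = Counter({6: 1, 7: 1})   # the multiset {6^1, 7^1}

A_minus_1 = shift(A, 1)         # {2^1, 5^2}
A_plus_B = multiset_sum(A, B)   # {3^1, 6^3, 7^1}
```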

A network N realises the multisets of distances $\mathcal{D}$ if $\mathcal{D}(N) = \mathcal{D}$. Similarly, a network N realises the shortest distances $\mathcal{D}_m$ if $\mathcal{D}_m(N) = \mathcal{D}_m$. As we will show in the next section, there could be many distinct networks that realise the same distance matrix. Therefore we emphasise the following notion.

Definition 2.2. A network N is reconstructible from its multisets of distances (respectively the shortest distances) if N is the only network that realises $\mathcal{D}(N)$ (respectively $\mathcal{D}_m(N)$).

We now introduce two substructures of networks, the cherry and the chain, which are key ingredients in proving the main results of this paper.

Definition 2.3. Two leaves x and y form a cherry if they share a common neighbour.

Observe that x and y form a cherry if and only if $d(x,y) = \{2\}$. In addition, x and y form a cherry if and only if $d_m(x,y) = 2$.
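The second characterisation gives a direct test for cherries on a shortest distance matrix. The sketch below is illustrative only: the function name `find_cherries` and the dictionary layout are assumptions made here, and the sample values are the smallest elements of the multisets listed for the network of Fig. 1.

```python
from itertools import combinations

def find_cherries(leaves, dm):
    """Return every pair of leaves whose shortest distance equals 2.

    `dm` maps frozenset({x, y}) to the shortest distance between x and y
    (a storage layout assumed for this sketch).
    """
    return [(x, y) for x, y in combinations(leaves, 2)
            if dm[frozenset((x, y))] == 2]

# Shortest distances read off Fig. 1 (minimum of each multiset).
fig1_leaves = ["a", "b", "c", "d", "e"]
fig1_dm = {frozenset(pair): dist for pair, dist in {
    ("a", "b"): 3, ("a", "c"): 4, ("a", "d"): 5, ("a", "e"): 5,
    ("b", "c"): 4, ("b", "d"): 5, ("b", "e"): 5,
    ("c", "d"): 5, ("c", "e"): 5, ("d", "e"): 2,
}.items()}

fig1_cherries = find_cherries(fig1_leaves, fig1_dm)
```

On this input the only pair returned is the cherry on d and e, in agreement with the caption of Fig. 1.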

Definition 2.4. A chain of length $k \geq 1$ is a k-tuple of leaves $(a_1, \ldots, a_k)$ such […]

[Figure 1: drawing of a network on the leaves a, b, c, d, e with an edge labelled f; image not reproduced. The accompanying matrix of multisets of distances is given below.]

|   | a | b | c | d | e |
|---|---|---|---|---|---|
| a | $\{0^1\}$ | $\{3^1, 6^2\}$ | $\{4^1, 5^1, 6^1, 7^1\}$ | $\{5^1, 6^1, 7^1, 8^1\}$ | $\{5^1, 6^1, 7^1, 8^1\}$ |
| b |   | $\{0^1\}$ | $\{4^1, 5^1, 6^1, 7^1\}$ | $\{5^1, 6^1, 7^1, 8^1\}$ | $\{5^1, 6^1, 7^1, 8^1\}$ |
| c |   |   | $\{0^1\}$ | $\{5^2, 8^2\}$ | $\{5^2, 8^2\}$ |
| d |   |   |   | $\{0^1\}$ | $\{2^1\}$ |
| e |   |   |   |   | $\{0^1\}$ |

Fig. 1. A level-2 network with its multisets of distances. The network contains two chains (a, b) and (c), and a cherry {d, e}. All edges incident to leaves are trivial cut-edges, and edge f is the only cut-edge that is non-trivial. The dashed path is the side of the blob that contains the leaf c. In the distance matrix, the diagonal elements are {0}, and as the matrix is symmetric, many of the elements are omitted. The shortest distance matrix can be obtained by taking the smallest element in each multiset to be the element of the matrix in the same position.

Call a chain $(a_1, \ldots, a_k)$ maximal if there is no chain $(b_1, \ldots, b_\ell)$ such that $\{a_1, \ldots, a_k\} \subsetneq \{b_1, \ldots, b_\ell\}$. We assume all chains to be maximal, unless stated otherwise. Two chains $(a_1, \ldots, a_k)$ and $(b_1, \ldots, b_\ell)$ are adjacent if $d_m(a_i, b_j) = 4$ for at least one of $i \in \{1, k\}$ and $j \in \{1, \ell\}$. Two chains are adjacent twice if $d_m(a_1, b_1) = d_m(a_k, b_\ell) = 4$ or if $d_m(a_1, b_\ell) = d_m(a_k, b_1) = 4$.

Given a chain $a = (a_1, \ldots, a_k)$, let $p_i$ denote the neighbour of the leaf $a_i$ for $i \in [k]$. The edges $p_i p_{i+1}$ for $i \in [k-1]$ are called the edges of the chain. We say that the chain is incident to cut-edges if the edges of the chain are cut-edges. Observe that one of these edges is a cut-edge if and only if they are all cut-edges. We say that the chain is contained in a blob B if the edges of the chain are edges in B. Observe that one of these edges is an edge of B if and only if they are all edges in B.

Note that a leaf can be in both a cherry and a chain. In a network without cherries, it is possible to partition the leaves into chains.

Let B be a level-2 blob of some network N. We may obtain the generator of B by deleting all cut-edges that are incident to B and taking the component that is B. The edges of the generator of B are called the sides of the generator, or simply the sides of B. Let N be a network with no pendant subtrees, let e be a side of B, and let x be a leaf in N. If the neighbour of x, say p, subdivides e in N then we say that x is on the side e or that the side e contains x. We say that a chain $a = (a_1, \ldots, a_k)$ is on the side e or that the side e contains the chain a if every leaf $a_i$ in the chain is on the side e. If an endpoint of a cut-edge uv subdivides e then we say that the side e is incident to uv.

For an overview of the definitions presented in this section, see Fig. 1.

3. Networks that cannot be reconstructed

In this section we give examples of networks that cannot be reconstructed from their shortest distances or from their multisets of distances. Fig. 2 shows two distinct level-2 networks with the same shortest distance matrix. Observing that we may replace the leaves with the same label by the same pendant subtree to extend this example to a […]

Fig. 2. Two level-2 networks with the same shortest distances between any pair of leaves.

Fig. 3. Two level-3 networks that have the same shortest distances and the same multisets of distances between any pair of leaves.

Lemma 3.1. There exist two distinct level-2 networks on n leaves for $n \geq 4$ with the same shortest distance matrix.

Note that the networks in Fig. 2 have different multisets of distances. We investigate this further in Section 5 and show there that level-2 networks are reconstructible from their multisets of distances.

Fig. 3 presents two level-3 networks on 2 leaves that have the same multisets of distances. Because the shortest distance matrix can be obtained by taking the smallest number for each element in the multisets of distances, the two networks also have the same shortest distance matrix. Observe that this can be generalized to level-k networks for $k \geq 3$ by replacing the level-3 blob by an arbitrary level-k blob. In addition, applying the same pendant subtree argument as in the level-2 network case gives us the following lemma.

Lemma 3.2. There exist two distinct level-k networks for all $k \geq 3$ with the same shortest distance matrix/multisets of distances.

Therefore, networks of level higher than 1 are not reconstructible from their shortest distances in general; networks of level higher than 2 are not reconstructible from their multisets of distances in general.

4. Reconstructibility from shortest distances

In this section we show that level-1 networks as well as level-2 networks on fewer than 4 leaves are reconstructible from their shortest distances. We first look at level-1 networks. Noting that pendant blobs contain exactly one chain, the following lemma […]

Lemma 4.1. Let $(a_1, \ldots, a_k)$ be a chain of length $k \geq 2$ in a level-1 network. Then $(a_1, \ldots, a_k)$ is contained in a pendant blob if and only if $d_m(a_1, x) = d_m(a_k, x)$ for all $x \in X - \{a_1, \ldots, a_k\}$.

Proof. Suppose first that a chain $(a_1, \ldots, a_k)$ is contained in a pendant blob B. Let $p_1$ and $p_k$ denote the neighbours of $a_1$ and $a_k$ respectively, and let q denote the common neighbour of $p_1$ and $p_k$. Let $x \in X - \{a_1, \ldots, a_k\}$. Observe that any shortest path from x to a leaf contained in B must pass through the vertex q. Therefore we have that

$$d_m(a_1, x) = 2 + d_m(q, x) = d_m(a_k, x).$$

To show the other direction, we prove the contrapositive. Suppose that $(a_1, \ldots, a_k)$ is not contained in a pendant blob. Then either the chain is incident to cut-edges, or the chain is contained in a non-pendant blob. Let $p_i$ denote the neighbours of $a_i$ for $i \in [k]$, and let q denote the neighbour of $p_1$ that is not $a_1$ nor $p_2$. Suppose first that the chain is incident to cut-edges. Let x be a leaf in the network that is not on the chain, such that x is reachable from $p_1$ via $p_1 q$. Then every path between x and $a_k$ must pass through the vertices $p_i$ for $i \in [k]$, and therefore $d_m(x, a_k) = d_m(x, a_1) + k - 1$. Since $k \geq 2$, the equality in the statement of the theorem does not hold.

So now consider the case that the chain is contained in a non-pendant blob. Then q is not a neighbour of $p_k$; the path between q and $p_k$ that does not contain the vertices $\{p_1, \ldots, p_{k-1}\}$ contains at least three vertices. Now let x be a leaf not on the chain that can be reached from q via its incident non-trivial cut-edge. The shortest path from x to $a_1$ and the shortest path from x to $a_k$ both contain the shortest path from x to q. By observing that the shortest path from q to $a_1$ is shorter than the shortest path between q and $a_k$, it follows that $d_m(x, a_1) < d_m(x, a_k)$. Therefore the equality in the statement of the theorem does not hold.
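Lemma 4.1 translates into a simple equality check on the shortest distance matrix. The following sketch is a hedged illustration: the helper names and both example networks are hypothetical, and it assumes (as the proof above uses it) that a chain is a tuple of leaves whose neighbours form a path. The first example has a triangle as a pendant blob carrying the chain (a1, a2); the second is a tree in which the chain (a1, a2) is incident to cut-edges.

```python
from collections import deque

def shortest_distances(edges, leaves):
    """Shortest distances between all ordered pairs of leaves, computed by BFS."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    dm = {}
    for src in leaves:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        for other in leaves:
            dm[src, other] = dist[other]
    return dm

def chain_on_pendant_blob(chain, leaves, dm):
    """Condition of Lemma 4.1: d_m(a_1, x) = d_m(a_k, x) for every x off the chain."""
    first, last = chain[0], chain[-1]
    return all(dm[first, x] == dm[last, x] for x in leaves if x not in chain)

# Hypothetical example 1: triangle p1-p2-q is a pendant blob; q is joined to r,
# and r carries the leaves c and d.
pendant_edges = [("p1", "p2"), ("p1", "q"), ("p2", "q"), ("a1", "p1"),
                 ("a2", "p2"), ("q", "r"), ("r", "c"), ("r", "d")]
pendant_leaves = ["a1", "a2", "c", "d"]
pendant_dm = shortest_distances(pendant_edges, pendant_leaves)

# Hypothetical example 2: a tree with internal path u-p1-p2-v.
tree_edges = [("u", "p1"), ("p1", "p2"), ("p2", "v"), ("x1", "u"), ("x2", "u"),
              ("a1", "p1"), ("a2", "p2"), ("y1", "v"), ("y2", "v")]
tree_leaves = ["x1", "x2", "a1", "a2", "y1", "y2"]
tree_dm = shortest_distances(tree_edges, tree_leaves)

on_pendant = chain_on_pendant_blob(("a1", "a2"), pendant_leaves, pendant_dm)
on_tree = chain_on_pendant_blob(("a1", "a2"), tree_leaves, tree_dm)
```

In the tree example the check fails because the distance from x1 to a2 exceeds the distance from x1 to a1 by k - 1 = 1, as in the cut-edge case of the proof.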

Theorem 4.2. Level-1 networks are reconstructible from their shortest distances.

Proof. First we show that we can recognise cherries, reduce them and change the shortest distances accordingly. Note that as mentioned above, a pair of leaves forms a cherry precisely if their shortest distance is 2. If there exists a cherry $\{x, y\}$, we replace it by a leaf z and set $d_m(z, a) := d_m(x, a) - 1$ for all $a \in X - \{x, y\}$. All other shortest distances between leaf-pairs remain unchanged. After reconstructing the network from the modified distance matrix, we replace the leaf z by a cherry on $\{x, y\}$. So, without loss of generality, we assume from now on that there are no cherries.
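The cherry reduction step can be sketched in a few lines of Python. The function name `reduce_cherry`, the dictionary layout, and the example distances are assumptions for illustration; the distances belong to a hypothetical network in which a triangle carries the leaves a1 and a2 and is joined by a cut-edge to the common neighbour of a cherry {c, d}.

```python
def reduce_cherry(dm, x, y, z):
    """Replace the cherry {x, y} by a new leaf z in a shortest distance matrix.

    `dm` maps frozenset({u, v}) to d_m(u, v); the result satisfies
    d_m(z, a) = d_m(x, a) - 1 and leaves all other entries unchanged.
    """
    reduced = {}
    for pair, dist in dm.items():
        if pair == frozenset((x, y)):
            continue                      # the cherry itself disappears
        if x in pair:
            (a,) = pair - {x}
            reduced[frozenset((a, z))] = dist - 1
        elif y not in pair:
            reduced[pair] = dist          # pairs not touching the cherry
    return reduced

# Hypothetical example distances (leaves a1, a2, c, d; cherry {c, d}).
dm = {frozenset(pair): dist for pair, dist in {
    ("a1", "a2"): 3, ("a1", "c"): 4, ("a1", "d"): 4,
    ("a2", "c"): 4, ("a2", "d"): 4, ("c", "d"): 2,
}.items()}

reduced = reduce_cherry(dm, "c", "d", "z")
```

After the reduction all three remaining leaves are at pairwise distance 3, which is the shortest distance matrix of a triangle with one leaf attached to each of its vertices.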

We now consider the case that there is exactly one blob. Since there are no cherries, all leaves are contained in this blob. We can recognise this by seeing that there is a chain $(a_1, \ldots, a_k)$ of length $k \geq 3$ that satisfies $d_m(a_1, a_k) = 3$. This immediately shows how to reconstruct level-1 networks that contain exactly one blob. Hence, we assume from […]

Note that pendant blobs must contain a chain of length at least 2 since networks do not contain parallel edges. By Lemma 4.1, we can find chains on pendant blobs. We reduce a chain $(a_1, \ldots, a_k)$ contained in a pendant blob by replacing the blob by a leaf z and setting $d_m(x, z) := d_m(x, a_1) - 2$ for all $x \in X - \{a_1, \ldots, a_k\}$. All shortest distances between other leaf-pairs remain unchanged, since their paths do not travel through pendant blobs. It is again easy to reconstruct the blob after reconstructing the reduced network, since we know that $(a_1, \ldots, a_k)$ must form a chain on the blob, in that order.
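The pendant blob reduction is analogous to the cherry reduction, with an offset of 2 instead of 1. A minimal sketch follows; the function name `reduce_pendant_blob` and the example distances are hypothetical, and the example assumes a triangle blob carrying the chain (a1, a2) whose third vertex is joined to the common neighbour of the leaves c and d.

```python
def reduce_pendant_blob(dm, chain, z):
    """Replace a pendant blob carrying `chain` by a new leaf z.

    `dm` maps frozenset({u, v}) to d_m(u, v). For every leaf x off the chain,
    d_m(x, z) = d_m(x, a_1) - 2; pairs that avoid the chain are unchanged.
    """
    chain_set = set(chain)
    a1 = chain[0]
    reduced = {}
    for pair, dist in dm.items():
        on_chain = pair & chain_set
        if not on_chain:
            reduced[pair] = dist
        elif on_chain == {a1}:
            (x,) = pair - chain_set
            reduced[frozenset((x, z))] = dist - 2
    return reduced

# Hypothetical example distances (leaves a1, a2 on the pendant triangle; c, d elsewhere).
dm = {frozenset(pair): dist for pair, dist in {
    ("a1", "a2"): 3, ("a1", "c"): 4, ("a1", "d"): 4,
    ("a2", "c"): 4, ("a2", "d"): 4, ("c", "d"): 2,
}.items()}

reduced = reduce_pendant_blob(dm, ("a1", "a2"), "z")
```

The reduced matrix has all pairwise distances equal to 2, which is realised by a single internal vertex adjacent to the three leaves z, c and d.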

This finishes the proof of the theorem since any level-1 network has a cherry, a pendant blob, or exactly one blob.

We note that the restriction of Theorem 4.2 to networks without triangles also follows from Theorem 5 of [15]. We give the proof above to account for the triangle case and to give a more direct graph-theoretical proof that is independent of the results provided by Hayamizu et al. Observe that trees (level-0 networks) are also level-1 networks. Thus Theorem 4.2 gives the following corollary, which we include here for completeness. This is a classical result that was proven in [14].

Corollary 4.3. Trees are reconstructible from their shortest distances.

Next, we show that level-2 networks on fewer than 4 leaves are also reconstructible from their shortest distances.

Lemma 4.4. Level-2 networks on X for $|X| \leq 3$ are reconstructible from their shortest distances.

Proof. There can only be one network on a single taxon, namely the singleton graph. Such a graph is trivially reconstructible from its shortest distances. So suppose that $|X| = 2$, say $X = \{x, y\}$, and let N be a network on X. Below, we will prove the claim that N consists only of level-2 blobs, where each level-2 blob is incident to exactly two cut-edges. In particular, N contains at most two pendant blobs, one of which contains the neighbour of x and the other the neighbour of y. Since each additional level-2 blob increases the shortest distance between x and y by 3, it follows that $d_m(x, y) = 3k + 1$ where k denotes the number of level-2 blobs in N. From there, it follows that N is reconstructible from its shortest distances.
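The relation $d_m(x, y) = 3k + 1$ can be checked on small instances. The sketch below builds the two-leaf network made of k level-2 blobs in a row and recovers k from the shortest distance. All names are hypothetical, and the blob shape used here (two degree-3 vertices s and t joined by an edge and by two paths of length 2 through u and v) is an assumption about what a level-2 blob with exactly two incident cut-edges looks like in a simple graph.

```python
from collections import deque

def two_leaf_network(k):
    """Adjacency sets of a network on leaves x and y with k level-2 blobs in a row."""
    edges = []
    prev = "x"
    for i in range(k):
        u, v, s, t = (f"{name}{i}" for name in "uvst")
        # Cut-edge into the blob at u, then the blob edges s-u, u-t, s-v, v-t, s-t.
        edges += [(prev, u), (s, u), (u, t), (s, v), (v, t), (s, t)]
        prev = v
    edges.append((prev, "y"))
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def shortest_distance(adj, source, target):
    """Breadth-first search distance in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist[target]

def blob_count(dm_xy):
    """Number of level-2 blobs k recovered from d_m(x, y) = 3k + 1."""
    k, remainder = divmod(dm_xy - 1, 3)
    if dm_xy < 1 or remainder != 0:
        raise ValueError("not the shortest distance of a two-leaf level-2 network")
    return k

distances = [shortest_distance(two_leaf_network(k), "x", "y") for k in range(4)]
```

Each blob contributes 2 to the distance inside the blob and 1 for the cut-edge leading into it, which accounts for the increment of 3 per blob.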

We now prove the claim. Note first that every blob in N must be incident to exactly two cut-edges. A blob cannot be incident to only one cut-edge. If the blob is level-1 then this would imply that it contains a loop; if the blob is level-2 then this would imply that it contains parallel edges. This also implies that every pendant blob must be incident to at least one trivial cut-edge. On the other hand if a blob is incident to more than two cut-edges, say c cut-edges, then this would imply that the network contains at least c […] this implies that the network contains at least $c > 2$ leaves, which is a contradiction. Therefore every blob in N must be incident to exactly two cut-edges. Now observe that a level-1 blob that is incident to exactly two cut-edges contains parallel edges. It follows that every blob in N must be a level-2 blob that is incident to exactly two cut-edges. This proves the claim, from which it follows by the argument presented above that N is reconstructible from its shortest distances for $|X| = 2$.

Fig. 4. The four possible degree-3 vertices in the blob-tree of a level-2 network on three leaves {x, y, z}. (a) An internal vertex. (b) A level-1 blob. (c) A level-2 blob with all leaves reachable from different sides of the blob. (d) A level-2 blob where y and z are reachable from the same side of the blob. The dashed lines can be replaced by paths that contain any number of level-2 blobs. This is possible because we take the distances modulo 3 and since each additional level-2 blob contributes an extra length-3 to the shortest inter-taxa distance.

Suppose now that $|X| = 3$, and let $X = \{x, y, z\}$. Here we consider $BT(N)$, the blob-tree of N, which is obtained from N by replacing each blob of N by a single vertex. Since $|X| = 3$, $BT(N)$ contains exactly one vertex of degree-3, three vertices of degree-1 (which are the leaves x, y, and z), and all other vertices are of degree-2. By a similar argument as presented in the $|X| = 2$ case, the degree-2 vertices of $BT(N)$ correspond to level-2 blobs. The degree-3 vertex could be an internal vertex of the network, a level-1 blob, or a level-2 blob. In the case that it is a level-2 blob, there are two possibilities. Either the three edges are incident to different sides of the blob, or two edges are incident to the same side of the blob and the third edge to another side. See Fig. 4 for these four possibilities. Observe that these four possibilities all contribute different distance lengths to inter-taxa distances. In particular, we have that the degree-3 vertex is a(an)

• internal vertex if and only if

$$(d(x, y), d(y, z), d(x, z)) = (2 \pmod 3,\; 2 \pmod 3,\; 2 \pmod 3);$$

• level-1 blob if and only if […]

• level-2 blob with all edges on different sides if and only if […]

• level-2 blob with the two edges that lead to leaves x and y on the same side if and only if

$$(d(x, y), d(y, z), d(x, z)) = (0 \pmod 3,\; 1 \pmod 3,\; 1 \pmod 3).$$

Therefore we may identify the blob corresponding to the degree-3 vertex of the blob-tree by taking the distances modulo 3.

To finish the proof, take two networks $N, N'$ with the same shortest distance matrix. By the previous paragraph, we may assume that N and $N'$ have the same blob corresponding to the degree-3 vertex of their blob-trees. Assume that $N \neq N'$. Then the two blob-trees $BT(N)$ and $BT(N')$ are different. Note that the shortest distances are determined by the number of degree-2 vertices between leaves in the blob-tree. Since $\mathcal{D}_m(N) = \mathcal{D}_m(N')$, we have that the number of degree-2 vertices between two leaves, say x and y, is the same in both $BT(N)$ and $BT(N')$. However since $BT(N)$ differs from $BT(N')$, the positioning of the degree-3 vertex must differ. But this would imply that upon placing z together with some degree-2 vertices, we can only satisfy one of $d_m^{N}(x, z) = d_m^{N'}(x, z)$ or $d_m^{N}(y, z) = d_m^{N'}(y, z)$. This contradicts the assumption that $\mathcal{D}_m(N) = \mathcal{D}_m(N')$. Therefore we must have $N = N'$, and that level-2 networks on X for $|X| = 3$ are reconstructible from their shortest distances.

5. Reconstructibility of level-2 networks from their multisets of distances

In the last two sections, we showed that level-1 networks are reconstructible from their shortest distances, level-k networks for $k \geq 2$ are in general not reconstructible from their shortest distances, and level-k networks for $k \geq 3$ are in general not reconstructible from their multisets of distances. In this section, we investigate the remaining case, and show that level-2 networks are reconstructible from their multisets of distances. The main theorem is the following.

Theorem 5.1. Level-2 networks are reconstructible from their multisets of distances.

The key ideas in proving the theorem are as follows. We first identify and reduce all cherries of the network. To identify cherries we observe that two leaves x and y form a cherry if and only if $d(x, y) = \{2\}$. To reduce a cherry we replace it by a new leaf z and adjust the distance matrix accordingly, as done for the level-1 networks in the proof of Theorem 4.2. Next, we identify all leaves that are not contained in blobs, delete those leaves, and adjust the distance matrix accordingly. We show that each leaf that is deleted in this manner can be reattached to the reduced network in a unique fashion. After applying these two reductions, two chains are adjacent if and only if they are contained in the same blob. Using this observation, we then show that it is possible to identify pendant blobs, replace them by a new leaf, and adjust the distance matrix accordingly. Continuing in this fashion, we eventually reach the situation when the reduced network contains exactly one blob. We show that networks on single blobs are reconstructible from their multisets of distances, at which point it follows that simply reversing the […]

We start with the two easy cases, when the network contains a cherry or a single blob.

Observation 5.2. Let N be a level-2 network on X and suppose that leaves x and y form a cherry in N. Upon replacing the cherry by a leaf z, we obtain a network $N'$ on $X' = X \cup \{z\} - \{x, y\}$ such that the multisets of distances for $N'$ contain the elements

$$d_{N'}(a, b) = \begin{cases} d_N(a, b) & \text{if } a, b \in X - \{x, y\}, \\ d_N(a, x) - 1 & \text{if } a \in X - \{x, y\} \text{ and } b = z. \end{cases}$$

One may obtain N from $N'$ by replacing the leaf z by a cherry $\{x, y\}$.

Lemma 5.3. Level-2 networks containing a single blob are reconstructible from their shortest distances.

Proof. Let N be a level-2 network containing a single blob. Assume without loss of generality that N contains no cherries, as we can recognise them from the shortest distances and reduce them by Observation 5.2. If N is a level-1 blob then we may reconstruct it from shortest distances by Theorem 4.2. If N is a level-2 blob then the blob must contain at least two chains since it has no parallel edges, and at most three chains. Noting that chains can be identified from the shortest distances, the placement of the chains on the blob sides can be done by matching the end-leaves of chains that have shortest distance 4.

5.1. Leaves not contained in blobs

Lemma 5.4. Let N be a level-2 network on X where $|X| \geq 3$. A leaf x is not contained in a blob if and only if there exists a unique partition $Y \cup Z$ of $X - \{x\}$ such that $Y, Z \neq \emptyset$ and $d_m(y, z) = d_m(x, y) + d_m(x, z) - 2$ for all $y \in Y$ and $z \in Z$.
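For small leaf sets, the criterion of Lemma 5.4 can be tested by brute force over all bipartitions of $X - \{x\}$. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the search is exponential in the number of leaves, and the two example matrices are hand-computed for a quartet tree (cherries {a, b} and {c, d}) and for a single 4-cycle blob with one leaf per cycle vertex.

```python
from itertools import combinations

def valid_partitions(x, leaves, dm):
    """All partitions {Y, Z} of leaves - {x} with Y, Z non-empty that satisfy
    d_m(y, z) = d_m(x, y) + d_m(x, z) - 2 for every y in Y and z in Z.

    `dm` maps frozenset({u, v}) to d_m(u, v).
    """
    others = sorted(leaf for leaf in leaves if leaf != x)
    first, rest = others[0], others[1:]
    found = []
    # Keep `first` in Y so each unordered partition is generated exactly once;
    # choosing fewer than len(rest) extra leaves keeps Z non-empty.
    for size in range(len(rest)):
        for extra in combinations(rest, size):
            Y = {first, *extra}
            Z = set(others) - Y
            if all(dm[frozenset((y, z))]
                   == dm[frozenset((x, y))] + dm[frozenset((x, z))] - 2
                   for y in Y for z in Z):
                found.append((Y, Z))
    return found

def leaf_outside_blobs(x, leaves, dm):
    """Criterion of Lemma 5.4: exactly one valid partition exists."""
    return len(valid_partitions(x, leaves, dm)) == 1

def as_matrix(pairs):
    return {frozenset(pair): dist for pair, dist in pairs.items()}

# Hypothetical example 1: quartet tree with cherries {a, b} and {c, d}.
tree_dm = as_matrix({("a", "b"): 2, ("c", "d"): 2, ("a", "c"): 3,
                     ("a", "d"): 3, ("b", "c"): 3, ("b", "d"): 3})
# Hypothetical example 2: 4-cycle blob with leaves a1..a4 in cyclic order.
cycle_dm = as_matrix({("a1", "a2"): 3, ("a2", "a3"): 3, ("a3", "a4"): 3,
                      ("a1", "a4"): 3, ("a1", "a3"): 4, ("a2", "a4"): 4})

tree_result = leaf_outside_blobs("a", ["a", "b", "c", "d"], tree_dm)
cycle_result = leaf_outside_blobs("a1", ["a1", "a2", "a3", "a4"], cycle_dm)
```

For the tree, the only valid partition for the leaf a is Y = {b}, Z = {c, d}; for the cycle, no bipartition satisfies the equation, so the leaf a1 is reported as lying in a blob.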

Proof. Suppose first that a leaf x is not contained in a blob. Let $p_x$ denote the neighbour of x, and let p, q denote the two neighbours of $p_x$ that are not x. Observe that every leaf in $X - \{x\}$ can be reached from $p_x$ via one of the cut-edges $p_x p$ or $p_x q$. Let Y and Z denote the sets of all leaves that can be reached from $p_x$ via the cut-edges $p_x p$ and $p_x q$, respectively. Note that a shortest path between some $y \in Y$ and some $z \in Z$ passes through the edges $p_x p$ and $p_x q$. Then by observing that the shortest path from x to y and the shortest path from x to z use the same edges as the shortest path from y to z, bar the use of the edge incident to x twice, we obtain the equation $d_m(y, z) = d_m(x, y) + d_m(x, z) - 2$ for all $y \in Y$ and $z \in Z$.

We now show that such a partition is unique. We claim that all leaves that can be reached from $p_x$ via the edge $p_x p$ must be contained in the same set in the partition. […] edge $p_x p$, and suppose for a contradiction that they are placed in different sets of the partition. Then,

$$\begin{aligned} d_m(x, y_1) + d_m(x, y_2) - 2 &= d_m(p, y_1) + d_m(p, y_2) + 2 \\ &> d_m(p, y_1) + d_m(p, y_2) \\ &\geq d_m(y_1, y_2), \end{aligned}$$

where the final inequality is the triangle inequality. Hence $y_1$ and $y_2$ must be contained in the same set of the partition; since $y_1$ and $y_2$ were chosen arbitrarily, all leaves that can be reached from $p_x$ via the edge $p_x p$ must be contained in the same set in the partition. Similarly, all leaves that can be reached from $p_x$ via the edge $p_x q$ must be contained in the same set in the partition. Observe that all leaves in $X - \{x\}$ can be reached from $p_x$ via the edge $p_x p$ or via the edge $p_x q$. Since neither set of the partition can be empty, it follows then that the partition must be unique, with Y and Z containing all leaves that can be reached from $p_x$ via $p_x p$ and $p_x q$, respectively.

To prove the other direction, we show that if a leaf x is contained in a blob B, then there is no such partition that satisfies the given equation. Let p_x denote the neighbour of x. We first show that for leaves y, z ∈ X − {x}, if all shortest paths between y and z do not contain the vertex p_x, then the equation is not satisfied by y and z. Let p_y and p_z denote the vertices on B that are closest to the leaves y and z respectively. Note that it is possible to have p_y = p_z; this is the case where all shortest paths between y and z do not pass through B. Then the following equations hold:

\begin{align*}
d_m(x, y) &= 1 + d_m(p_x, p_y) + d_m(p_y, y), \\
d_m(x, z) &= 1 + d_m(p_x, p_z) + d_m(p_z, z).
\end{align*}

We now distinguish two cases.

1. If p_y ≠ p_z, then by the triangle inequality and as all shortest paths between y and z do not contain the vertex p_x, we must have that

\[ d_m(p_y, p_z) < d_m(p_x, p_y) + d_m(p_x, p_z). \tag{1} \]

It follows that

\begin{align*}
d_m(y, z) &= d_m(y, p_y) + d_m(p_y, p_z) + d_m(p_z, z) \\
&= (d_m(x, y) - d_m(p_x, p_y) - 1) + d_m(p_y, p_z) + (d_m(x, z) - d_m(p_x, p_z) - 1) \\
&= d_m(x, y) + d_m(x, z) - 2 + d_m(p_y, p_z) - (d_m(p_x, p_y) + d_m(p_x, p_z)) \\
&< d_m(x, y) + d_m(x, z) - 2,
\end{align*}

where the final inequality follows from Inequality (1).

2. If p_y = p_z, then let p denote the neighbour of p_y that is not on the blob B. Then

\begin{align*}
d_m(y, z) &\leq d_m(y, p_y) + d_m(z, p_y) - 2 d_m(p_y, p) \\
&= (d_m(x, y) - d_m(p_x, p_y) - 1) + (d_m(x, z) - d_m(p_x, p_y) - 1) - 2 \\
&= d_m(x, y) + d_m(x, z) - 2 - 2 d_m(p_x, p_y) - 2 \\
&< d_m(x, y) + d_m(x, z) - 2,
\end{align*}

where the first inequality follows since the shortest path between y and z may not pass through p (e.g., if p is a vertex on a blob), and the final inequality follows as d_m(p_x, p_y) ≥ 1 and d_m(p_y, p) = 1.

It remains to show that for any partition Y ∪ Z of X − {x} where Y, Z ≠ ∅, there exists a leaf pair y ∈ Y and z ∈ Z such that no shortest path between y and z uses p_x.

Suppose first that B is a level-1 blob. Since our network contains no parallel edges, B must be incident to at least two cut-edges in addition to the edge p_x x. If two leaves that can be reached from B via the same cut-edge are placed in different sets of the partition, then we are done as no shortest path between these leaves uses p_x; therefore we may assume that leaves that can be reached from B via the same cut-edge are placed in the same set in the partition. Since Y and Z are both non-empty, there must exist two cut-edges e_1, e_2 (excluding p_x x) whose endpoints form an edge of B, such that there exists a leaf that can be reached from B via e_1 and a leaf that can be reached from B via e_2 for which the two leaves lie in different sets of the partition. Every shortest path between these two leaves passes through the edge connecting the endpoints of e_1 and e_2 and therefore does not use p_x. Therefore we are done.

Now suppose that B is a level-2 blob. For the same reason as in the level-1 case (see proof of Theorem 4.2), if there are two leaves that can be reached from B via the same cut-edge that are placed in different sets of the partition, then we are done; therefore we may assume that leaves that can be reached from B via the same cut-edge are placed in the same set in the partition. Since Y and Z are both non-empty, it follows that there exist two cut-edges e_1, e_2 incident to B, such that leaves y, z can be reached from B via e_1, e_2, respectively, for which y ∈ Y and z ∈ Z. There must exist a pair of such cut-edges such that all shortest paths between their endpoints on B do not contain p_x, since there exist enough cut-edges to ensure there are no parallel edges in B. Given such a pair of cut-edges, take one leaf that can be reached from B via the first cut-edge and take another leaf that can be reached from B via the other cut-edge. Then no shortest path between this pair of leaves uses p_x, and thus we are done.
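The criterion of Lemma 5.4 can be tested directly on a matrix of shortest distances. The following is a minimal sketch, assuming the distances are stored as a symmetric nested dictionary `dm`; the function name `induced_partition` and this data layout are illustrative choices, not part of the paper. It uses the fact that two leaves violating the equation can never be separated, so any admissible partition is a union of connected components of the "violation" graph, and such a partition with two non-empty sets is unique exactly when there are two components.

```python
from itertools import combinations


def induced_partition(dm, leaves, x):
    """Return the unique partition (Y, Z) of leaves - {x} from Lemma 5.4, or None.

    dm is assumed to be a symmetric nested dict of shortest distances.
    """
    others = [v for v in leaves if v != x]
    parent = {v: v for v in others}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Leaves y, z that violate the equation must lie in the same set.
    for y, z in combinations(others, 2):
        if dm[y][z] != dm[x][y] + dm[x][z] - 2:
            parent[find(y)] = find(z)

    components = {}
    for v in others:
        components.setdefault(find(v), set()).add(v)

    # Exactly two components <=> exactly one bipartition into non-empty sets.
    if len(components) != 2:
        return None
    Y, Z = components.values()
    return Y, Z
```

For the quartet tree with cherries {x, y} and {z, w}, the call for leaf x returns the sets {y} and {z, w}; for a triangle with one leaf on each vertex it returns None, since every leaf lies in the blob.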

Lemma 5.4 does not hold in general for networks of level higher than 2. An example is shown in Fig. 5.

Fig. 5. A level-3 network on X = {x, y, z} where all of its leaves are contained in a blob. Y = {y} and Z = {z} is a partition of X − {x} such that Y, Z ≠ ∅ and d_m(y, z) = d_m(x, y) + d_m(x, z) − 2 for all y ∈ Y and z ∈ Z. Observe that this holds in general for level-k networks where k ≥ 3 by replacing the level-3 blob by an arbitrary level-k blob.

We now show that after identifying a leaf that is not contained in a blob, we can delete it from the network and adjust the distance matrix accordingly. We also show that upon reconstructing the reduced network from the modified distance matrix, there is a unique cut-edge to which we may reattach the deleted leaf. Reattaching a leaf x to a cut-edge is the action of subdividing the cut-edge by a vertex p_x, and adding an edge p_x x. In the setting of Lemma 5.4, we say that the unique partition Y ∪ Z is induced by the leaf x.

Lemma 5.5. Let N be a level-2 network on X where |X| ≥ 3, and let x be a leaf that is not contained in a blob. Let Y ∪ Z denote the unique partition of X' = X − {x} that is induced by x. Then upon deleting the leaf x, we obtain a network N' on X' such that the multisets of distances for N' contain the elements

\[
d_{N'}(y, z) =
\begin{cases}
d_N(y, z) & \text{if } y, z \in Y \text{ or } y, z \in Z, \\
d_N(y, z) - 1 & \text{if } y \in Y, z \in Z \text{ or } z \in Y, y \in Z.
\end{cases}
\]

In addition, there is only one edge location in N' where x can be reattached to obtain a network with the same multisets of distances as N. In particular, this network is isomorphic to N.

Proof. Let p_x be the neighbour of x in N, and let p and q be the other neighbours of p_x that are not x. As shown in the proof of Lemma 5.4, the sets Y and Z correspond to the leaves that can be reached from p_x via p_x p and via p_x q, respectively. Upon deleting x from N, we note that p_x becomes a vertex of degree 2 and is therefore suppressed in the resulting subgraph. Then all paths in N that used the edge p_x p and the edge p_x q have their length decreased by 1 in N'; all paths in N that did not use the edges p_x p and p_x q are unaffected by this vertex suppression. Observe that any path between a leaf in Y and a leaf in Z uses the edges p_x p, p_x q in N. Furthermore, any path between two leaves in Y or any path between two leaves in Z did not use the edges p_x p, p_x q in N. Therefore the multisets of distances of N' can be obtained from the multisets of distances of N as


We now prove the second statement, namely that N' contains only one edge where x can be reattached to, so as to obtain a network with the same multisets of distances as N. By Lemma 5.4, we know that x is not in a blob, and that x induces a partition Y ∪ Z of X'. This implies that x must be reattached to N' at a cut-edge that induces the partition Y ∪ Z. We now show that there is only one such cut-edge in N' if we are to obtain a network with the same multisets of distances as N upon reattaching x. If there are two cut-edges e_1, e_2 in N' that induce the same required partition Y ∪ Z, observe that any path from e_1 to e_2 must consist only of level-2 blobs that are incident to exactly two cut-edges. Note that level-1 blobs cannot be included here as otherwise we would produce parallel edges. Now take any leaf y ∈ X − {x}, and let N_1 and N_2 denote the networks obtained by attaching x to e_1 and e_2 respectively. Because of the level-2 blobs between e_1 and e_2, we have that d_m^{N_1}(x, y) ≠ d_m^{N_2}(x, y). But we know that there must exist one cut-edge e in N' to which we can attach x to obtain N. We locate this edge e by finding one that induces the correct partition and satisfies the equation d_m^{N_e}(x, y) = d_m^{N}(x, y), where N_e denotes the network obtained by attaching x to e. This proves the claim that x can be added back to N' via a unique edge to obtain a network with the same multisets of distances as N. Since there is a unique edge where x can be attached to in order to obtain a network with the same multisets of distances as N, the network obtained this way must be isomorphic to N.
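The distance update described in Lemma 5.5 is a simple bookkeeping step once the induced partition is known. A minimal sketch follows, assuming the multisets of distances are stored as a dictionary from unordered leaf pairs (`frozenset`) to lists of path lengths; the name `delete_leaf` and the storage format are assumptions made for illustration only.

```python
def delete_leaf(dist, x, Y, Z):
    """Multisets of distances after deleting leaf x, following Lemma 5.5.

    dist maps frozenset({u, v}) to a list of path lengths in the network
    before deletion; Y and Z are the two sets of the partition induced by x.
    """
    reduced = {}
    for pair, lengths in dist.items():
        if x in pair:
            continue  # x is no longer a leaf of the reduced network
        u, v = tuple(pair)
        crosses = (u in Y and v in Z) or (u in Z and v in Y)
        # Every path between Y and Z loses one edge when p_x is suppressed;
        # paths inside Y or inside Z are unchanged.
        reduced[pair] = [d - 1 for d in lengths] if crosses else list(lengths)
    return reduced
```

Applied to the quartet tree with cherries {x, y} and {z, w}, deleting x with Y = {y} and Z = {z, w} gives distance 2 between every remaining pair, which is the star tree on three leaves.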

5.2. Pendant blobs

For the remainder of this section, we will restrict to level-2 networks with at least two blobs and in which all leaves are contained in blobs. We can do this by Observation 5.2 and Lemmas 5.3, 5.4, and 5.5.

5.2.1. Pendant level-1 blobs

Lemma 5.6. Let N be a level-2 network on X. A chain (a_1, ..., a_k) with k ≥ 2 is contained in a pendant level-1 blob if and only if d(a_1, a_k) = {4^1, (k + 1)^1}.

Proof. Suppose first that a chain (a_1, ..., a_k) with k ≥ 2 is contained in a pendant level-1 blob B. As there is only one non-trivial cut-edge incident to B, this chain is the only chain that is contained in B. It is then clear that we must have d(a_1, a_k) = {4^1, (k + 1)^1}.

Now suppose that there exists a chain (a_1, ..., a_k) with k ≥ 2 such that d(a_1, a_k) = {4^1, (k + 1)^1}. Clearly the distance k + 1 corresponds to the path between a_1 and a_k that passes through the neighbours of a_i for i ∈ [k]. Therefore we examine the path between a_1 and a_k that does not pass through the neighbours of a_{i+1} for i ∈ [k − 2]. Note first that the chain cannot be contained in a non-pendant level-1 blob, as otherwise this path between a_1 and a_k would pass through at least two vertices that are incident to non-trivial cut-edges. In this case, the length of the path between a_1 and a_k would be at


otherwise the set d(a_1, a_k) would contain at least 3 elements. Therefore the chain must be contained in a pendant level-1 blob.

Lemma 5.7. Let N be a level-2 network on X in which (a_1, ..., a_k) is a chain that is contained in a pendant level-1 blob. Let N' be the network on X' = X ∪ {z} − {a_1, ..., a_k} obtained from N by replacing the pendant blob by a leaf z. For every x ∈ X' − {z}, we can uniquely partition the multiset of distances d_N(x, a_1) into two equal sized sets A and B such that A − 2 = B − (k + 1). Then the multisets of distances of N' contain the elements

\[
d_{N'}(x, y) =
\begin{cases}
d_N(x, y) & \text{if } x, y \in X' - \{z\}, \\
A - 2 & \text{if } y = z.
\end{cases}
\]

Proof. We first prove the claim that for every x ∈ X' − {z}, we can uniquely partition the multiset of distances d(x, a_1) into two equal sized sets A and B such that A − 2 = B − (k + 1). As usual, let p_i denote the neighbour of a_i for i ∈ [k], and let q denote the neighbour of p_1 that is neither a_1 nor p_2. Note that k ≥ 2 since otherwise there would be parallel edges. Let x ∈ X. Then any path from x to a_1 consists of a path from x to q and a path from q to a_1. There are two possible paths from q to a_1: one is of length 2 and uses the edges q p_1, p_1 a_1; the other is of length k + 1 and uses the edges q p_k, p_k p_{k−1}, ..., p_2 p_1, p_1 a_1. Therefore every path from x to q yields two paths from x to a_1, for which one of the paths is longer than the other by a length of k − 1. This implies that the size of the multiset d(x, a_1) is even, since every path from x to a_1 can be matched to another path from x to a_1 that shares the same part of the path between x and q. Now take the smallest element d ∈ d(x, a_1). By the argument presented above, there must exist a corresponding element d + k − 1 ∈ d(x, a_1). We place d in set A and we place d + k − 1 in set B, remove both elements from d(x, a_1) and recurse. By continuing this for the smallest element in d(x, a_1) at each step, this partitions the multiset into a bipartition d(x, a_1) = A ∪ B where |A| = |B| = |d(x, a_1)|/2, such that A + (k − 1) = B. It follows from iteratively adding the smallest element from d(x, a_1) to A, that this bipartition is unique. This proves the claim.
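The greedy matching in this argument translates almost line by line into code. The sketch below is an illustration under the assumption that the multiset d(x, a_1) is given as a plain list of integers; the function name `split_pendant_level1` is hypothetical.

```python
from collections import Counter


def split_pendant_level1(distances, k):
    """Greedy bipartition of d(x, a_1) into A and B with B = A + (k - 1)."""
    remaining = Counter(distances)
    A, B = [], []
    for d in sorted(distances):
        if remaining[d] == 0:
            continue  # already used as the longer path of an earlier pair
        remaining[d] -= 1
        partner = d + k - 1  # same x-to-q path, long way round the cycle
        if remaining[partner] == 0:
            raise ValueError("multiset is not of the form A + (A + k - 1)")
        remaining[partner] -= 1
        A.append(d)
        B.append(partner)
    return A, B
```

For instance, with k = 3 the multiset {5, 6, 7, 8} splits into A = {5, 6} and B = {7, 8}, so the distances from x to the new leaf z would be A − 2 = {3, 4}.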

To prove the second part of the lemma, first observe that any path between a leaf x ∈ X' − {z} and z in the network N' corresponds to a path between x and q in N. Now the multiset of distances between x and q in N can be obtained by finding the multiset of distances between x and a_1 that used the edges q p_1, p_1 a_1, and subtracting 2 from each element. This is precisely the set A − 2 that we have found above. For any other leaf y ∈ X' − {z}, we have that all paths between x and y are unaffected by the replacement of the blob by z, as the blob is pendant in N. Therefore d(x, y) remains unchanged for x, y ∈ X' − {z}.

It is again easy to reconstruct the blob after reconstructing the reduced network, since

Fig. 6. A pendant level-2 blob of the form (k, 0, 0, 0) containing the chain (a_1, ..., a_k) (left) and a pendant level-2 blob of the form (k, ℓ, 0, 0) containing the chains (a_1, ..., a_k) and (b_1, ..., b_ℓ) (right). The edges labelled f denote the non-trivial cut-edges in both networks.

5.2.2. Pendant level-2 blobs

We adopt the following notation for pendant level-2 blobs. Let B be a pendant level-2 blob, and let a, b, c, d denote the four chains contained in B of lengths k, ℓ, m, n ≥ 0 such that chains c and d are on the same side of B as the non-trivial cut-edge. Then we say that B is of the form (k, ℓ, m, n). For ease of notation, a side without leaves is seen as a length-0 chain. See Fig. 6 for pendant level-2 blobs of the forms (k, 0, 0, 0) and (k, ℓ, 0, 0).

Lemma 5.8. A level-2 network N contains a pendant level-2 blob of the form (k, 0, 0, 0) for k ≥ 2 with the chain (a_1, ..., a_k) if and only if d(a_1, a_k) = {5^1, 6^1, (k + 1)^1}.

Proof. Suppose first that N contains a pendant level-2 blob B of the form (k, 0, 0, 0). Let e denote the non-trivial cut-edge that is incident to B. Then the paths from a_1 to a_k that use the side of B without e and without the chain, the side of B with e, and the side of B with the chain are of lengths 5, 6, and k + 1 respectively.

Suppose now that there exists a chain (a_1, ..., a_k) where k ≥ 2 such that d(a_1, a_k) = {5^1, 6^1, (k + 1)^1}. First, since |d(a_1, a_k)| > 2, we note that the chain (a_1, ..., a_k) must be contained in a level-2 blob. Consider a level-2 blob B that contains the chain (a_1, ..., a_k) on one of its sides, and suppose that there is a single non-trivial cut-edge e on another one of its sides. There must be at least one such edge e because otherwise there would be parallel edges. Currently we have that d(a_1, a_k) = {5^1, 6^1, (k + 1)^1}: adding more cut-edges (trivial or non-trivial) to the sides of B would change the set of distances. Since B is incident to exactly one non-trivial cut-edge, it is a level-2 pendant blob.
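Lemmas 5.6 and 5.8 both recognise a pendant blob from the single multiset d(a_1, a_k), so the two tests can be combined. The following sketch assumes the multiset is passed as a list of path lengths and that the network satisfies the standing assumptions of this section; the function name and the returned labels are illustrative only.

```python
from collections import Counter


def pendant_blob_type(d_a1_ak, k):
    """Classify a chain (a_1, ..., a_k), k >= 2, from the multiset d(a_1, a_k)."""
    if k < 2:
        raise ValueError("Lemmas 5.6 and 5.8 assume k >= 2")
    observed = Counter(d_a1_ak)
    if observed == Counter([4, k + 1]):
        return "pendant level-1 blob"  # Lemma 5.6
    if observed == Counter([5, 6, k + 1]):
        return "pendant level-2 blob of the form (k,0,0,0)"  # Lemma 5.8
    return None  # neither lemma applies to this chain
```

Using `Counter` keeps multiplicities, which matters when k + 1 coincides with one of the other lengths (for example k = 3 in Lemma 5.6 gives the multiset {4, 4}).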

Lemma 5.9. A level-2 network N contains a pendant level-2 blob of the form (1, 0, 0, 0) containing the leaf a if and only if d_m(a, x) ≥ 6 for all x ∈ X − {a} and for any two leaves y, z ∈ X − {a}, d_m(a, y) + d_m(a, z) − d_m(y, z) ≥ 8.

Proof. Suppose first that a pendant level-2 blob B contains only the leaf a. Let uv denote the non-trivial cut-edge incident to B, where u is the vertex that is on B. Now, the shortest distance from a to u is exactly 3. Furthermore, the shortest distance from u to a leaf x that is not a is at least 3, since such a path must contain the edge uv, an edge of


another blob since all leaves are assumed to be contained in blobs. Therefore d_m(a, x) ≥ 6 for all x ∈ X − {a}. To prove the second statement, let y, z ∈ X − {a}. Then by the triangle inequality, we have

\[ d_m(a, y) + d_m(a, z) - d_m(y, z) = d_m(v, y) + d_m(v, z) - d_m(y, z) + 8 \geq 8. \]

Now suppose that d_m(a, x) ≥ 6 for all x ∈ X − {a} and for any two leaves y, z ∈ X − {a}, we have d_m(a, y) + d_m(a, z) − d_m(y, z) ≥ 8. The first condition implies that (a) is a maximal chain. Suppose first that a was contained in a level-1 blob B. Note that B cannot be pendant as otherwise the network would have parallel edges. Let p_a denote the neighbour of a (a vertex of B), and let p_y, p_z denote the two neighbours of p_a on B that are not a. The vertices p_y and p_z are necessarily incident to non-trivial cut-edges, as otherwise a would be contained in a chain, in which case the condition d_m(a, x) ≥ 6 would be violated for some leaf x in the chain. Now let y and z denote any leaves in X − {a} that can be reached from B via the cut-edges incident to p_y and p_z respectively. Then we have that d_m(a, y) + d_m(a, z) − d_m(y, z) = 2 if a shortest path between p_y and p_z passes the vertex p_a, and we have d_m(a, y) + d_m(a, z) − d_m(y, z) = 3 otherwise. This contradicts our second condition, and therefore we may assume that the leaf a is contained in a level-2 blob B. Suppose that B is a non-pendant blob, in other words, that there are at least two non-trivial cut-edges incident to B. Take two non-trivial cut-edges that are closest to a, and take any two leaves y and z that can be reached from B via these cut-edges. The shortest distance from a to the endpoints of these cut-edges on B is at most 3. Therefore we have d_m(a, y) + d_m(a, z) − d_m(y, z) ≤ 6, which contradicts our second condition. Therefore we may assume that the leaf a is contained in a pendant level-2 blob B. But aside from the leaf a and the single non-trivial cut-edge, no other cut-edges can be incident to B. Indeed, having another leaf that is contained in B violates the first condition, and having another non-trivial cut-edge contradicts the fact that B was pendant. Therefore B is a pendant level-2 blob of the form (1, 0, 0, 0) that contains a single leaf a.
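The two conditions of Lemma 5.9 involve only shortest distances, so they can be checked with a pair of loops. The sketch below assumes a symmetric nested dictionary `dm` of shortest distances and a network satisfying the standing assumptions of this section; the function name is hypothetical.

```python
from itertools import combinations


def is_single_leaf_pendant_level2(dm, leaves, a):
    """Check the two conditions of Lemma 5.9 for the leaf a."""
    others = [v for v in leaves if v != a]
    # First condition: every other leaf is at shortest distance at least 6.
    if any(dm[a][x] < 6 for x in others):
        return False
    # Second condition: d_m(a,y) + d_m(a,z) - d_m(y,z) >= 8 for all pairs.
    return all(dm[a][y] + dm[a][z] - dm[y][z] >= 8
               for y, z in combinations(others, 2))
```

As a small example, take a pendant blob of the form (1, 0, 0, 0) joined by its cut-edge to a triangle carrying leaves y and z: then d_m(a, y) = d_m(a, z) = 6 and d_m(y, z) = 3, so both conditions hold.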

Lemma 5.10. Let N be a level-2 network on X containing a pendant level-2 blob of the form (k, 0, 0, 0) for k ≥ 1 with the chain (a_1, ..., a_k). Then we can replace the pendant blob by a leaf z to obtain a network N' on X' = X ∪ {z} − {a_1, ..., a_k}. For every x ∈ X' − {z}, we can uniquely partition the multiset of distances d(x, a_1) into four equal sized sets A, B, C, D such that A − 3 = B − 4 = C − (k + 2) = D − (k + 3). Then the multisets of distances of N' contain the elements

\[
d_{N'}(x, y) =
\begin{cases}
d_N(x, y) & \text{if } x, y \in X' - \{z\}, \\
A - 3 & \text{if } y = z.
\end{cases}
\]

Proof. We first show that the partition of d(x, a_1) exists and that it is unique. Let B


that is an endpoint of a non-trivial cut-edge. Let x ∈ X' − {z}. Every path from x to a_1 consists of a path from x to q and a path from q to a_1. There are four possible paths from q to a_1 of lengths 3, 4, k + 2, and k + 3. By an analogous argument used in the proof of Lemma 5.7, there is a unique partition of d(x, a_1) into four equal sized sets A, B, C, D such that A − 3 = B − 4 = C − (k + 2) = D − (k + 3).
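The four-way partition can be computed by the same smallest-element-first strategy, now removing one element per route from q to a_1 at each step. The sketch below is written for an arbitrary tuple of route lengths, so that `offsets = (3, 4, k + 2, k + 3)` covers this lemma; the function name and the list-based representation of multisets are assumptions made for illustration.

```python
from collections import Counter


def split_by_offsets(distances, offsets):
    """Split a multiset into one class per offset.

    Class i collects the lengths root + offsets[i], where root ranges over
    the (unknown) lengths of the paths from x to q.
    """
    base = min(offsets)
    remaining = Counter(distances)
    classes = [[] for _ in offsets]
    for d in sorted(distances):
        if remaining[d] == 0:
            continue  # already assigned to a class in an earlier step
        # d is the shortest unassigned length, so it uses the shortest route.
        root = d - base
        for cls, off in zip(classes, offsets):
            if remaining[root + off] == 0:
                raise ValueError("multiset does not match the given offsets")
            remaining[root + off] -= 1
            cls.append(root + off)
    return classes
```

With k = 3 and paths from x to q of lengths 2 and 5, the multiset {5, 6, 7, 8, 8, 9, 10, 11} yields A = {5, 8}, and subtracting 3 from each element of A recovers {2, 5}.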

Upon replacing the pendant blob B by a leaf z, we note that the multiset of distances between a leaf x ∈ X' − {z} and z in N' is equivalent to the multiset of distances between x and q in N. This multiset of distances is precisely the set A − 3. Let y ∈ X' − {z} be another leaf that is not x. Then all paths between x and y in N are unaffected after replacing B by a leaf z; therefore d_{N'}(x, y) = d_N(x, y).

Pendant level-2 blobs with at least two chains

Lemma 5.11. A level-2 network N on X contains a pendant level-2 blob of the form (k, ℓ, 0, 0) with chains a = (a_1, ..., a_k) and b = (b_1, ..., b_ℓ) with k, ℓ ≥ 1 if and only if a and b are adjacent twice, and for all c ∈ a ∪ b, we have d_m(c, x) ≥ 6 for all x ∈ X − (a ∪ b) and d_m(c, y) + d_m(c, z) − d_m(y, z) ≥ 8 for any two leaves y, z ∈ X − (a ∪ b).

Proof. One direction follows an analogous argument used in the proof of Lemma 5.9.

To show the other direction, suppose that a and b are adjacent twice, and for all c ∈ a ∪ b, we have d_m(c, x) ≥ 6 for all x ∈ X − (a ∪ b) and d_m(c, y) + d_m(c, z) − d_m(y, z) ≥ 8 for any two leaves y, z ∈ X − (a ∪ b). Since a and b are adjacent twice, either a and b are contained in the same level-1 blob such that the cycle of the blob is u p_1 p_2 ... p_k v q_1 q_2 ... q_ℓ u where p_i and q_j denote the neighbours of a_i and b_j for i ∈ [k], j ∈ [ℓ], respectively, and u and v are incident to non-trivial cut-edges, or a and b are contained in the same level-2 blob B in which a and b are on two different sides of B and there are no other vertices that subdivide these two sides of B (see Fig. 7).

In the first case, let B denote the level-1 blob. We take leaves y and z that can be reached from B via the two non-trivial cut-edges. Without loss of generality, assume that k ≤ ℓ. Then the shortest path from y to z must pass through the neighbours of a_i for all i ∈ [k]. But then for any c ∈ a, we have that

\[ d_m(c, y) + d_m(c, z) - d_m(y, z) = 2, \]

which contradicts our original assumption.

In the second case, let B denote the level-2 blob and let e denote the side of B that does not contain a nor b. Since the network contains at least two blobs, the side e must be incident to at least one non-trivial cut-edge. Suppose for a contradiction that there are at least two cut-edges incident to the side e. Let p and q denote the vertices on side e such that if k ≥ 2 then they have shortest distance 3 and 4 from a_1, respectively, and if k = 1 then they have shortest distance 3 and at most 4 from a_1, respectively. Note


Fig. 7. The two possibilities for when two chains a = (a_1, ..., a_k) and b = (b_1, ..., b_ℓ) are adjacent twice and they are not contained in a pendant level-2 blob, as in the proof of Lemma 5.11. A level-1 blob (left) and a non-pendant level-2 blob (right). The dashed edges in both networks represent paths that are not trivial cut-edges from the blob to the leaves y and z. In the non-pendant level-2 blob, there could be additional cut-edges on the side not containing the chains a and b.

would contradict our assumption that for any leaf x ∈ X − (a ∪ b), we have d_m(a_1, x) ≥ 6. Let y and z denote leaves that can be reached from B via the cut-edges incident to p and q, respectively. Then

\begin{align*}
d_m(a_1, y) + d_m(a_1, z) - d_m(y, z) &\leq 3 + d_m(p, y) + 4 + d_m(q, z) - d_m(y, z) \\
&= 7 - d_m(p, q) \leq 6,
\end{align*}

where the final inequality follows as d_m(p, q) > 0. This is a contradiction. Therefore there is exactly one cut-edge that is incident to the side e, from which it follows that a and b are the only chains contained in a pendant level-2 blob of the form (k, ℓ, 0, 0).

Lemma 5.12. Let N be a level-2 network on X that contains a pendant level-2 blob of the form (k, ℓ, 0, 0) with chains a = (a_1, ..., a_k) and b = (b_1, ..., b_ℓ). Then we can replace the pendant blob by a leaf z to obtain a network N' on X' = X ∪ {z} − (a ∪ b). For every x ∈ X, we can uniquely partition the multiset of distances d(x, a_1) into four equal sized sets A, B, C, D such that A − 3 = B − (ℓ + 4) = C − (k + 2) = D − (k + ℓ + 3). Then the multisets of distances of N' contain the elements

\[
d_{N'}(x, y) =
\begin{cases}
d_N(x, y) & \text{if } x, y \in X' - \{z\}, \\
A - 3 & \text{if } y = z.
\end{cases}
\]


Table 1
The number of green edges between two adjacent chains a = (a_1, ..., a_k) and b = (b_1, ..., b_ℓ) for different k and ℓ values.

          ℓ = 1             ℓ = 2                 ℓ > 2
k = 1     m_A(5)            m_{A+B}(5) − 1        m_{A+B}(5)
k = 2     m_{A+C}(5) − 1    m_{A+B+C+D}(5) − 2    m_{A+B+C+D}(5) − 1
k > 2     m_{A+C}(5)        m_{A+B+C+D}(5) − 1    m_{A+B+C+D}(5)

Chain-Adjacency Graphs. We have now dealt with pendant level-2 blobs of the forms (k, 0, 0, 0) (Lemmas 5.8 and 5.9) and (k, ℓ, 0, 0) (Lemma 5.11). For the remaining four cases (ignoring symmetric cases) left to examine, (k, 0, m, 0); (k, 0, m, n); (k, ℓ, m, 0); and (k, ℓ, m, n), we employ the following graph.

Definition 5.13. A chain-adjacency graph (CAG) has a vertex for each chain, and between two vertices,

• we insert a red edge if the chains are adjacent once and two red edges if the chains are adjacent twice; and

• if the two chains are adjacent once, we insert a green edge for each length-5 path between endpoints of the chains (one per chain) that does not contain any edges of the two chains.

The condition for joining two vertices on the CAG via a green edge can indeed be verified from the multisets of distances. Let a = (a_1, ..., a_k) and b = (b_1, ..., b_ℓ) denote two chains that are adjacent once, and suppose without loss of generality that d_m(a_1, b_1) = 4. To count the number of green edges between a and b, we fall into the 9 cases shown in Table 1. This number is obtained by taking the multiplicity of 5's in the multiset of distances between a pair of endpoints, minus the number of length-5 paths that pass through edges of the chains. Let (A, m_A) = d(a_1, b_1); (B, m_B) = d(a_1, b_ℓ); (C, m_C) = d(a_k, b_1); (D, m_D) = d(a_k, b_ℓ).
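The nine cases of Table 1 follow a regular pattern, which the sketch below makes explicit. It rests on one reading of the table that is an assumption here: m_{A+B}(5) is taken to mean m_A(5) + m_B(5), the multiplicity of 5 in the multiset sum, and likewise for the other subscripts. The function name and the list representation of multisets are also illustrative.

```python
def green_edge_count(k, l, A, B, C, D):
    """Number of green edges between chains a and b, following Table 1.

    A, B, C, D are the multisets d(a_1,b_1), d(a_1,b_l), d(a_k,b_1) and
    d(a_k,b_l), given as lists; l stands for the chain length written as
    ell in the text. Assumes d_m(a_1, b_1) = 4 as in the text.
    """
    def fives(multiset):
        return sum(1 for d in multiset if d == 5)

    # Only distinct endpoint pairs are counted: a_1 = a_k when k = 1, and
    # b_1 = b_l when l = 1.
    if k == 1 and l == 1:
        total = fives(A)
    elif k == 1:
        total = fives(A) + fives(B)
    elif l == 1:
        total = fives(A) + fives(C)
    else:
        total = fives(A) + fives(B) + fives(C) + fives(D)

    # A chain of length exactly 2 contributes one length-5 path that runs
    # along its own chain edge; this path is not a green edge.
    through_chains = (1 if k == 2 else 0) + (1 if l == 2 else 0)
    return total - through_chains
```

The correction term reproduces the "− 1" and "− 2" entries of the table: it is non-zero precisely in the row k = 2 and the column ℓ = 2.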

We only insert green edges between chains that are adjacent, rather than between all chains that are distance-5 apart, to ensure that chains contained in different blobs are not connected in the CAG. Since we may assume that all leaves are contained in blobs, we note that two chains are adjacent and in the same blob if and only if they are connected by a red edge in the CAG. Note that there may be multiple edges between two vertices in a CAG (see Fig. 8). We now show how we can use the CAG to distinguish the configurations of pendant blobs from non-pendant blobs, and how it can be used to distinguish the remaining level-2 pendant blob structures.

Observe that every edge in the CAG corresponds to a distinct distance-4 or distance-5 path between a pair of chain endpoints. We say that this path in the network is covered by the edge of the CAG. In particular, we also say that the edges of the path of the


Fig. 8. Each subfigure shows a pendant level-2 blob together with its CAG directly below it: (a) (k, 0, m, 0); (b) (k, ℓ, m, 0); (c) (k, 0, m, n); (d) (k, ℓ, m, n). On each blob, f denotes the non-trivial cut-edge. Each of the leaves a, b, c, d can be replaced by a longer chain whilst keeping the same CAG. By Theorem 5.14, we have that the network contains one of the four pendant blobs if and only if the CAG (which can be obtained from the multisets of distances) is exactly the one in the same subfigure. In the CAG, the dashed lines represent the red edges and the solid lines represent the green edges. In (c), the green edge cd in the CAG covers the dotted path between c and d. (For interpretation of the colours in the figure, the reader is referred to the web version of this article.)

covered by more than one edge of the CAG. See Fig. 8(c) for an example of a distance-5


Theorem 5.14. (See Fig. 8.) Let N be a level-2 network on X with at least two blobs, in which all leaves are contained in blobs and no pendant blob is of the form (k, 0, 0, 0) or (k, ℓ, 0, 0). For k, ℓ, m, n ≥ 1, N contains a pendant level-2 blob of the form

• (k, 0, m, 0) if and only if there exist vertices a and c which form a blob in the CAG with 1 red edge and 2 green edges between them.

• (k, ℓ, m, 0) if and only if there exist vertices a, b, and c which form a blob in the CAG, where a and b are connected by 2 red edges and the other two pairs are connected by 1 red edge and 1 green edge.

• (k, 0, m, n) if and only if there exist vertices a, c, and d which form a blob in the CAG, where every pair of vertices is connected by 1 red edge and 1 green edge.

• (k, ℓ, m, n) if and only if there exist vertices a, b, c, and d which form a blob in the CAG, where every pair of vertices is connected by 1 red edge, and a and b are connected by an additional red edge.
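The four characterisations amount to a lookup on the edge multiplicities of a CAG blob. The sketch below encodes each case as the sorted list of (red, green) counts over all vertex pairs. Two points are assumptions rather than statements of the theorem: pairs for which the theorem mentions no green edge are taken to have none, and the CAG blob is assumed to be given as a vertex list plus a dictionary of edge counts. The names `SIGNATURES` and `classify_cag_blob` are hypothetical, and `l` stands for ℓ in the returned labels.

```python
from itertools import combinations

# Sorted (red, green) edge counts over all vertex pairs, one entry per case
# of Theorem 5.14.
SIGNATURES = {
    ((1, 2),): "(k,0,m,0)",
    ((1, 1), (1, 1), (2, 0)): "(k,l,m,0)",
    ((1, 1), (1, 1), (1, 1)): "(k,0,m,n)",
    ((1, 0),) * 5 + ((2, 0),): "(k,l,m,n)",
}


def classify_cag_blob(vertices, edges):
    """Return the pendant-blob form matching a CAG blob, or None.

    edges maps frozenset({u, v}) to a pair (red, green); vertex pairs that
    are absent from the dictionary count as (0, 0).
    """
    signature = tuple(sorted(
        edges.get(frozenset(pair), (0, 0))
        for pair in combinations(vertices, 2)
    ))
    return SIGNATURES.get(signature)
```

Because the signature ignores vertex names, the function does not need to know which vertex plays the role of a, b, c or d; a path of red edges, for example, matches none of the four signatures and returns None.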

Proof. All other possible pendant level-2 blobs are of the form (k, 0, 0, 0) or of the form (k, ℓ, 0, 0). The CAG of the blob of the form (k, 0, 0, 0) is the singleton graph; the CAG of the blob of the form (k, ℓ, 0, 0) is two vertices connected by 2 red edges. The CAG for either of these two pendant blobs is not the same as any of the CAGs for the four pendant blobs that we investigate here. Therefore we may distinguish the CAGs of the pendant level-2 blobs from one another.

Now we consider non-pendant level-2 blobs. First, if the blob contains no leaves then the CAG of such a blob is empty, so we are done. Hence, suppose that some non-pendant level-2 blob B contains some leaves. Observe that B can be obtained by introducing non-trivial cut-edges to one of the six possible level-2 pendant blobs.

Suppose first that B can be obtained by introducing non-trivial cut-edges to a pendant blob of the form (k, 0, 0, 0). Then, B contains one or more chains on one side of the blob, and the possible CAGs would be a path (or disjoint paths) of red edges that connect adjacent chains, or if it contains a green edge, two vertices that are connected by 1 red and 1 green edge. However, none of these CAGs correspond to that of the four pendant blobs we consider here.

Now suppose that B can be obtained by introducing non-trivial cut-edges to a pendant blob of the form (k, ℓ, 0, 0). Then, B contains one or more chains on two sides of the blob, and at least one non-trivial cut-edge on the third side. None of the edges in the CAG of B will cover an edge of this third side, since all paths between chain endpoints that use this side will be of length at least 6. Therefore the only possible CAGs we can get on B are a cycle or a path (or paths) of red edges, or two vertices connected by 1 red and 1 green edge.

Suppose now that B can be obtained by introducing non-trivial cut-edges to one of the four remaining level-2 pendant blobs. Upon introducing non-trivial cut-edges to the