Reconstructibility of unrooted level-k phylogenetic networks from distances
Authors: van Iersel, Leo; Moulton, Vincent; Murakami, Yukihiro
DOI: 10.1016/j.aam.2020.102075
Publication date: 2020
Document version: Final published version
Published in: Advances in Applied Mathematics
Citation (APA)
van Iersel, L., Moulton, V., & Murakami, Y. (2020). Reconstructibility of unrooted level-k phylogenetic networks from distances. Advances in Applied Mathematics, 120, 1-30. [102075].
https://doi.org/10.1016/j.aam.2020.102075
Leo van Iersel (a), Vincent Moulton (b), Yukihiro Murakami (a,*)
(a) Delft Institute of Applied Mathematics, Delft University of Technology, Van Mourik Broekmanweg 6, 2628 XE, Delft, the Netherlands
(b) School of Computing Sciences, University of East Anglia, NR4 7TJ, Norwich, United Kingdom
Article info
Article history: Received 30 October 2019; Received in revised form 12 June 2020; Accepted 21 June 2020; Available online xxxx
MSC: 05
Keywords: Phylogenetic networks; Level-k networks; Distance matrix; Reconstructibility
Abstract
A phylogenetic network is a graph-theoretical tool that is used by biologists to represent the evolutionary history of a collection of species. One potential way of constructing such networks is via a distance-based approach, where one is asked to find a phylogenetic network that in some way represents a given distance matrix, which gives information on the evolutionary distances between present-day taxa. Here, we consider the following question. For which k are unrooted level-k networks uniquely determined by their distance matrices? We consider this question for shortest distances as well as for the case that the multisets of all distances are given. We prove that level-1 networks and level-2 networks are reconstructible from their shortest distances and multisets of distances, respectively. Furthermore we show that, in general, networks of level higher than 1 are not reconstructible from shortest distances and that networks of level higher than 2 are not reconstructible from their multisets of distances.
0196-8858/© 2020 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
* Corresponding author.
E-mail addresses: L.J.J.vanIersel@tudelft.nl (L. van Iersel), V.Moulton@uea.ac.uk (V. Moulton), Y.Murakami@tudelft.nl (Y. Murakami).
1. Introduction
Phylogenetic trees are often used to represent the evolutionary history of species, or more generally, taxa [11]. Trees can be a powerful tool for elucidating relationships between species, especially in case the species in question have evolved only via speciation events. However, other events often also drive evolution, including hybridisation, introgression, and lateral gene transfer. When such reticulate events occur, more general graphical structures, known as phylogenetic networks [2,17], can be a useful addition to trees.
There are two main types of phylogenetic networks: rooted and unrooted networks. A rooted network is a directed acyclic graph that represents how extant taxa have evolved from a single common ancestor, also known as the root. Internal vertices denote either speciation or reticulate events, and edges have directions to indicate the transfer of genetic material between the two vertices that are incident to it. Unrooted networks have similar properties except they have no direction on the edges. A lack of direction could, for example, represent an ambiguity in knowledge of the direction in which genetic material is transferred between species. Note that every rooted network has an underlying unrooted network, which can be obtained by suppressing the root vertex and ignoring edge directions. Conversely, one can try to obtain a rooted network from an unrooted network by estimating the location of the root via an outgroup, if it is known which vertices represent reticulations [16]. In this paper we will only consider unrooted networks, which we shall call networks for short. We present an example of such a network in Fig. 1.
As the shift from phylogenetic trees to networks has become more prevalent in the biological literature, finding good ways to construct phylogenetic networks has become a core theme in phylogenetics. Such an undertaking has experienced major developments through various reconstruction approaches (e.g., maximum-likelihood [20]; building blocks [18,23,19]; distance-based [7,5]; see [17] for an overview). In this paper we consider the distance-based approach, in which one is given a distance matrix on the set of taxa in question and then aims to build a network representing this matrix. An entry in a distance matrix gives the evolutionary distance, a measure of genetic divergence between distinct taxa. This raises the following question. 'Is there a network that precisely represents the given distance matrix?'
The groundwork for distance-based methods is well-established for phylogenetic trees [27,26,11,24]. For networks, the story is more complicated. Since networks can contain cycles, there can be more than one path between two taxa, which can lead to more than one distance. This results in various types of distances that can be associated to a network. Two such types, which we cover in this paper, are the shortest distances and the multisets of distances. For the shortest distances, we search for a network in which the length of a shortest path between each pair of taxa coincides with the matrix; for the multisets of distances, we search for a network in which the multiset of lengths of all paths between each pair of taxa coincides with the matrix. In Fig. 1, the shortest distances can be worked out from the multisets of distances by taking the smallest element for each matrix entry.
Before proceeding any further, we must acquaint ourselves with two similar, yet subtly different notions that are vital in understanding distance-based methods for networks. One can either construct or reconstruct networks from distance matrices. Constructing a network means that we initially start with a distance matrix and come up with a network that is consistent in some way with such a matrix. Some classical network construction methods from distances include Neighbor-Net [7] and T-Rex [22]. In the process, one is sometimes interested in finding a network that optimises some particular criterion, such as the hybridisation number [9,10]. Networks obtained via construction methods are often non-unique, which is the biggest distinction between construction and reconstruction methods.
Reconstructing a network means that we start with a network, find the distance matrix that is associated to it (e.g., shortest distances), and try to reconstruct the original network from the distance matrix. The goal then is to decide which networks can be uniquely reconstructed from their distances, in other words, to decide upon the reconstructibility of different classes of networks from their various distance matrices. The main results of [8,3,4,6,5,28,15] follow this exact format; they show that some unrooted/rooted networks (or a representative of the equivalence class) can be reconstructed from certain distance matrices. Roughly speaking, they show that within a particular network class, if two networks have the same particular distance matrix then the networks are equivalent. Interestingly, although distance-based reconstruction results have been recently developed for rooted networks, similar problems have been less studied for unrooted networks.
As a first step in this direction, we focus on reconstructing unweighted unrooted networks. Every edge in the network has a weight of 1, which means that distances between two taxa correspond to the number of edges contained in paths between two taxa. Now, to identify which networks are reconstructible from certain distance matrices, we call on the notion of the level of a network. The level of a network is the maximum number of edges that need to be deleted from a biconnected component to obtain a tree [12]. In this paper we consider the problem of reconstructing level-k networks in general, both from their shortest distances and their multisets of distances.
A recent paper has shown that optimal cactus graphs are reconstructible from their shortest distances, while in general there could be many cactus graphs that realise the same shortest distances [15]. Cactus graphs are connected graphs in which each edge belongs to at most one cycle; these graphs are a generalisation of level-1 networks. Here, an optimal network refers to one that realises the shortest distance matrix, in which the total sum of edge weights is minimal. The difference between this result and our result is that we consider unweighted networks, for which we may leave out the optimality restriction. The problem of reconstructing cactus graphs has also been of interest within the graph theory literature. Some have considered reconstructing them
different from the distance data that we consider in this paper. Therefore, our problem of reconstructing networks from distances is fundamentally different from both of these papers.
The rest of the paper is organised as follows. In the next section we introduce basic definitions and notations. In Section 3, we show that in general, level-2 networks are not reconstructible from their shortest distances (Lemma 3.1), and that networks of level higher than 2 are not reconstructible from their shortest distances nor from their multisets of distances (Lemma 3.2). In Section 4, we show that level-1 networks as well as level-2 networks on fewer than 4 leaves are reconstructible from their shortest distances (Theorem 4.2 and Lemma 4.4). In Section 5, we show that level-2 networks are reconstructible from their multisets of distances (Theorem 5.1). We conclude with a discussion in Section 6 on open problems and possible future directions in this area.
2. Preliminaries
Definition 2.1. Let X be a non-empty finite set. An (unweighted unrooted binary phylogenetic) network N on X is a simple graph (an unweighted, undirected graph with no loops or multiple edges) with
1. |X| vertices of degree-1 (the leaves); and
2. all other vertices of degree-3 (the internal vertices).
The leaves are bijectively labelled by the set X. If |X| = 1 then we define the singleton graph with one vertex labelled by the element of X as the network on X. A network with no cycles is a (phylogenetic) tree.
Deleting an edge uv from a network is the action of removing the edge uv and suppressing any degree-2 vertices in the resulting subgraph. Deleting a vertex from a network is the action of removing the vertex, deleting all its incident edges, and suppressing any degree-2 vertices in the resulting subgraph. A cut-edge of a network is an edge whose deletion disconnects the network. We call a cut-edge trivial if the edge is incident to a leaf, and non-trivial otherwise. Note that for a network N on X, deleting a cut-edge breaks the network into two components. The leaf-set X can be partitioned into the leaves that are contained in one component and the leaves that are contained in the other; therefore every cut-edge of a network induces a partition X = Y ∪ Z of X (where one of Y or Z could possibly be empty). These partitions are not unique in general (i.e., two distinct cut-edges can induce the same partition). Upon cutting a non-trivial cut-edge, if one of the components is a tree, then we say that the subgraph that corresponds to this component is a pendant subtree. Given a cut-edge uv we say that a leaf x can be reached from u via uv if, upon deleting the edge uv without suppressing degree-2
A biconnected component (blob) of a network is a maximal 2-connected subgraph with at least three vertices. We say that a network is a level-k network if at most k edges must be deleted from every blob to obtain a tree. We say that a leaf is contained in a blob if the neighbour of the leaf is a vertex of the blob. A cut-edge is incident to a blob if one of the endpoints of the edge is a vertex of the blob. A blob is pendant if there is exactly one non-trivial cut-edge that is incident to the blob. We say that a leaf x can be reached from a blob B via a cut-edge uv if u is a vertex of B and x can be reached from u via uv.
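The definition of the level can be illustrated with a short computation. The following is a minimal sketch, assuming that a network is given as a list of undirected edges; the function names and the two small example networks are hypothetical and are not taken from the paper. It uses the standard fact that the number of edges to delete from a connected graph to obtain a tree is |E| − |V| + 1, and finds the blobs with a textbook depth-first search for biconnected components.

```python
from collections import defaultdict


def biconnected_components(adj):
    """Return the biconnected components of a simple graph as lists of edges."""
    index, low, stack, components = {}, {}, [], []

    def dfs(u, parent):
        index[u] = low[u] = len(index)
        for v in adj[u]:
            if v == parent:
                continue
            if v not in index:
                stack.append((u, v))
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] >= index[u]:
                    # u separates the edges pushed since (u, v) from the rest.
                    component = []
                    while True:
                        edge = stack.pop()
                        component.append(edge)
                        if edge == (u, v):
                            break
                    components.append(component)
            elif index[v] < index[u]:
                # Back edge to an ancestor of u.
                stack.append((u, v))
                low[u] = min(low[u], index[v])

    for start in list(adj):
        if start not in index:
            dfs(start, None)
    return components


def level(edges):
    """Smallest k such that the network with these edges is a level-k network."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    best = 0
    for component in biconnected_components(adj):
        vertices = {w for edge in component for w in edge}
        if len(vertices) >= 3:  # blobs have at least three vertices
            best = max(best, len(component) - len(vertices) + 1)
    return best


# Hypothetical examples: a triangle blob with leaves a1, a2 and a cherry {c, d},
# and a single level-2 blob on the two leaves x, y.
LEVEL1 = [("a1", "p1"), ("a2", "p2"), ("p1", "p2"), ("p1", "q"),
          ("p2", "q"), ("q", "r"), ("r", "c"), ("r", "d")]
LEVEL2 = [("x", "px"), ("y", "py"), ("u", "px"), ("px", "v"),
          ("u", "py"), ("py", "v"), ("u", "v")]
print(level(LEVEL1), level(LEVEL2))
```

The recursive search is adequate for small illustrative networks; for large inputs an iterative variant would avoid Python's recursion limit.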
Let N be a network on X and let x and y be leaves in N. We recall the notation used in [5]. The multiset of distances between x and y, denoted d(x,y) (and sometimes as d_N(x,y) where necessary), is the multiset consisting of lengths of all possible paths between x and y in N. Since N is an unweighted network, the length of a path is simply the number of edges contained in the path. We let D(N) denote the |X| × |X| matrix whose (x,y)-th entry is d(x,y). We further define the shortest distance between x and y, denoted d_m(x,y), by taking d_m(x,y) = min d(x,y). We analogously define D_m(N) to be the |X| × |X| matrix whose (x,y)-th entry is d_m(x,y). An example of a network with its multisets of distances is illustrated in Fig. 1.
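To make the two kinds of distances concrete, the following sketch enumerates the paths between two leaves of a small network and collects their lengths. It assumes that "path" means a simple path (no repeated vertices), and it stores a multiset as a `collections.Counter` mapping each length to its multiplicity. The function name and the example network (a triangle blob with leaves a1, a2 attached to a cherry {c, d}) are hypothetical.

```python
from collections import Counter


def path_length_multiset(adj, x, y):
    """Multiset d(x, y): lengths of all simple paths from x to y, as a Counter."""
    lengths = Counter()

    def extend(u, visited, length):
        if u == y:
            lengths[length] += 1
            return
        for v in adj[u]:
            if v not in visited:
                extend(v, visited | {v}, length + 1)

    extend(x, {x}, 0)
    return lengths


# Hypothetical level-1 network: triangle p1, p2, q; leaves a1, a2 on the
# triangle; cut-edge q-r; cherry {c, d} at r.
EDGES = [("a1", "p1"), ("a2", "p2"), ("p1", "p2"), ("p1", "q"),
         ("p2", "q"), ("q", "r"), ("r", "c"), ("r", "d")]
adj = {}
for u, v in EDGES:
    adj.setdefault(u, []).append(v)
    adj.setdefault(v, []).append(u)

d_a1_a2 = path_length_multiset(adj, "a1", "a2")   # two paths around the triangle
dm_a1_a2 = min(d_a1_a2)                           # shortest distance d_m(a1, a2)
print(dict(d_a1_a2), dm_a1_a2)
```

Path enumeration is exponential in general, so this is only meant for small examples such as the one in Fig. 1.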
We use the following notation for the multisets. A multiset is a tuple (A,m) where A is a set and m is a function that specifies the multiplicity of each element in A. For x ∉ A, we let m(x) = 0. We will, for the most part, write (A,m) as A = {a_1^{m(a_1)}, ..., a_k^{m(a_k)}}. Let n be an integer. We let A − n denote the multiset obtained by subtracting n from each element of A (i.e., A − n = {(a_i − n)^{m(a_i)} : i ∈ [k]}). Given two multisets (A,m_A) and (B,m_B), the sum A + B is defined as the multiset (A ∪ B, m_{A+B}) where m_{A+B}(x) = m_A(x) + m_B(x) for x ∈ A ∪ B.
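The two multiset operations can be sketched with Python's `collections.Counter`, which maps each element to its multiplicity. The helper names `shift` and `multiset_sum` are hypothetical; the sample multiset {5^2, 8^2} is the entry d(c,d) of the matrix in Fig. 1.

```python
from collections import Counter


def shift(A, n):
    """A - n: subtract n from every element, keeping multiplicities."""
    return Counter({a - n: mult for a, mult in A.items()})


def multiset_sum(A, B):
    """A + B: multiplicities add elementwise, m_{A+B}(x) = m_A(x) + m_B(x)."""
    return A + B


A = Counter({5: 2, 8: 2})          # the multiset {5^2, 8^2}
B = Counter({5: 1, 6: 1})          # the multiset {5^1, 6^1}
print(dict(shift(A, 1)), dict(multiset_sum(A, B)))
```

Note that `Counter` addition discards entries whose count is not positive, which is harmless here because multiplicities of elements in a multiset are positive.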
A network N realises the multisets of distances D if D(N) = D. Similarly, a network N realises the shortest distances D_m if D_m(N) = D_m. As we will show in the next section, there could be many distinct networks that realise the same distance matrix. Therefore we emphasise the following notion.
Definition 2.2. A network N is reconstructible from its multisets of distances (respectively the shortest distances) if N is the only network that realises D(N) (respectively D_m(N)).
We now introduce two substructures of networks, the cherry and the chain, which are key ingredients in proving the main results of this paper.
Definition 2.3. Two leaves x and y form a cherry if they share a common neighbour.
Observe that x and y form a cherry if and only if d(x,y) = {2}. In addition, x and y form a cherry if and only if d_m(x,y) = 2.
Definition 2.4. A chain of length k ≥ 1 is a k-tuple of leaves (a_1, ..., a_k) such
      a       b            c                      d                      e
a   {0^1}   {3^1, 6^2}   {4^1, 5^1, 6^1, 7^1}   {5^1, 6^1, 7^1, 8^1}   {5^1, 6^1, 7^1, 8^1}
b           {0^1}        {4^1, 5^1, 6^1, 7^1}   {5^1, 6^1, 7^1, 8^1}   {5^1, 6^1, 7^1, 8^1}
c                        {0^1}                  {5^2, 8^2}             {5^2, 8^2}
d                                               {0^1}                  {2^1}
e                                                                      {0^1}
Fig. 1. A level-2 network with its multisets of distances. The network contains two chains (a,b) and (c), and a cherry {d,e}. All edges incident to leaves are trivial cut-edges, and edge f is the only cut-edge that is non-trivial. The dashed path is the side of the blob that contains the leaf c. In the distance matrix, the diagonal elements are {0}, and as the matrix is symmetric, many of the elements are omitted. The shortest distance matrix can be obtained by taking the smallest element in each multiset to be the element of the matrix in the same position.
Call a chain (a_1, ..., a_k) maximal if there is no chain (b_1, ..., b_ℓ) such that {a_1, ..., a_k} ⊊ {b_1, ..., b_ℓ}. We assume all chains to be maximal, unless stated otherwise. Two chains (a_1, ..., a_k) and (b_1, ..., b_ℓ) are adjacent if d_m(a_i,b_j) = 4 for at least one of i ∈ {1,k} and j ∈ {1,ℓ}. Two chains are adjacent twice if d_m(a_1,b_1) = d_m(a_k,b_ℓ) = 4 or if d_m(a_1,b_ℓ) = d_m(a_k,b_1) = 4.
Given a chain a = (a_1, ..., a_k), let p_i denote the neighbour of the leaf a_i for i ∈ [k]. The edges p_i p_{i+1} for i ∈ [k − 1] are called the edges of the chain. We say that the chain is incident to cut-edges if the edges of the chain are cut-edges. Observe that one of these edges is a cut-edge if and only if they are all cut-edges. We say that the chain is contained in a blob B if the edges of the chain are edges in B. Observe that one of these edges is an edge of B if and only if they are all edges in B.
Note that a leaf can be in both a cherry and a chain. In a network without cherries, it is possible to partition the leaves into chains.
Let B be a level-2 blob of some network N. We may obtain the generator of B by deleting all cut-edges that are incident to B and taking the component that is B. The edges of the generator of B are called the sides of the generator, or simply the sides of B. Let N be a network with no pendant subtrees, let e be a side of B, and let x be a leaf in N. If the neighbour of x, say p, subdivides e in N then we say that x is on the side e or that the side e contains x. We say that a chain a = (a_1, ..., a_k) is on the side e or that the side e contains the chain a if every leaf a_i in the chain is on the side e. If an endpoint of a cut-edge uv subdivides e then we say that the side e is incident to uv.
For an overview of the definitions presented in this section, see Fig. 1.
3. Networks that cannot be reconstructed
In this section we give examples of networks that cannot be reconstructed from their shortest distances or from their multisets of distances. Fig. 2 shows two distinct level-2 networks with the same shortest distance matrix. Observing that we may replace the leaves with the same label by the same pendant subtree to extend this example to a
Fig. 2. Two level-2 networks with the same shortest distances between any pair of leaves.
Fig. 3. Two level-3 networks that have the same shortest distances and the same multisets of distances between any pair of leaves.
Lemma 3.1. There exist two distinct level-2 networks on n leaves for n ≥ 4 with the same shortest distance matrix.
Note that the networks in Fig. 2 have different multisets of distances; we investigate this further in Section 5 and show there that level-2 networks are reconstructible from their multisets of distances.
Fig. 3 presents two level-3 networks on 2 leaves that have the same multisets of distances. Because the shortest distance matrix can be obtained by taking the smallest number for each element in the multisets of distances, the two networks also have the same shortest distance matrix. Observe that this can be generalized to level-k networks for k ≥ 3 by replacing the level-3 blob by an arbitrary level-k blob. In addition, applying the same pendant subtree argument as in the level-2 network case gives us the following lemma.
Lemma 3.2. There exist two distinct level-k networks for all k ≥ 3 with the same shortest distance matrix/multisets of distances.
Therefore, networks of level higher than 1 are not reconstructible from their shortest distances in general; networks of level higher than 2 are not reconstructible from their multisets of distances in general.
4. Reconstructibility from shortest distances
In this section we show that level-1 networks as well as level-2 networks on fewer than 4 leaves are reconstructible from their shortest distances. We first look at level-1 networks. Noting that pendant blobs contain exactly one chain, the following lemma
Lemma 4.1. Let (a_1, ..., a_k) be a chain of length k ≥ 2 in a level-1 network. Then (a_1, ..., a_k) is contained in a pendant blob if and only if d_m(a_1,x) = d_m(a_k,x) for all x ∈ X − {a_1, ..., a_k}.
Proof. Suppose first that a chain (a_1, ..., a_k) is contained in a pendant blob B. Let p_1 and p_k denote the neighbours of a_1 and a_k respectively, and let q denote the common neighbour of p_1 and p_k. Let x ∈ X − {a_1, ..., a_k}. Observe that any shortest path from x to a leaf contained in B must pass through the vertex q. Therefore we have that
d_m(a_1, x) = 2 + d_m(q, x) = d_m(a_k, x).
To show the other direction, we prove the contrapositive. Suppose that (a_1, ..., a_k) is not contained in a pendant blob. Then either the chain is incident to cut-edges, or the chain is contained in a non-pendant blob. Let p_i denote the neighbour of a_i for i ∈ [k], and let q denote the neighbour of p_1 that is neither a_1 nor p_2. Suppose first that the chain is incident to cut-edges. Let x be a leaf in the network that is not on the chain, such that x is reachable from p_1 via p_1 q. Then every path between x and a_k must pass through the vertices p_i for i ∈ [k], and therefore d_m(x,a_k) = d_m(x,a_1) + k − 1. Since k ≥ 2, the equality in the statement of the lemma does not hold.
So now consider the case that the chain is contained in a non-pendant blob. Then q is not a neighbour of p_k; the path between q and p_k that does not contain the vertices {p_1, ..., p_{k−1}} contains at least three vertices. Now let x be a leaf not on the chain that can be reached from q via its incident non-trivial cut-edge. The shortest path from x to a_1 and the shortest path from x to a_k both contain the shortest path from x to q. By observing that the shortest path from q to a_1 is shorter than the shortest path between q and a_k, it follows that d_m(x,a_1) < d_m(x,a_k). Therefore the equality in the statement of the lemma does not hold.
Theorem 4.2. Level-1 networks are reconstructible from their shortest distances.
Proof. First we show that we can recognise cherries, reduce them and change the shortest distances accordingly. Note that as mentioned above, a pair of leaves forms a cherry precisely if their shortest distance is 2. If there exists a cherry {x,y}, we replace it by a leaf z and set d_m(z,a) := d_m(x,a) − 1 for all a ∈ X − {x,y}. All other shortest distances between leaf-pairs remain unchanged. After reconstructing the network from the modified distance matrix, we replace the leaf z by a cherry on {x,y}. So, without loss of generality, we assume from now on that there are no cherries.
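The cherry reduction step can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the shortest distance matrix is stored as a nested dictionary, the helper names are hypothetical, and the example matrix belongs to a hypothetical level-1 network with a triangle blob carrying leaves a1, a2 and a cherry {c, d}.

```python
def symmetric_matrix(pairs):
    """Build a symmetric dict-of-dicts matrix with zero diagonal from pair distances."""
    leaves = sorted({leaf for pair in pairs for leaf in pair})
    dm = {a: {a: 0} for a in leaves}
    for (a, b), dist in pairs.items():
        dm[a][b] = dm[b][a] = dist
    return dm


def find_cherry(dm):
    """Return a pair of leaves at shortest distance 2, or None if there is none."""
    leaves = sorted(dm)
    for i, x in enumerate(leaves):
        for y in leaves[i + 1:]:
            if dm[x][y] == 2:
                return x, y
    return None


def reduce_cherry(dm, x, y, z):
    """Replace the cherry {x, y} by a new leaf z with d_m(z, a) = d_m(x, a) - 1."""
    rest = [a for a in dm if a not in (x, y)]
    reduced = {a: {b: dm[a][b] for b in rest} for a in rest}
    reduced[z] = {z: 0}
    for a in rest:
        reduced[a][z] = reduced[z][a] = dm[x][a] - 1
    return reduced


# Shortest distances of the hypothetical example network.
dm = symmetric_matrix({("a1", "a2"): 3, ("a1", "c"): 4, ("a1", "d"): 4,
                       ("a2", "c"): 4, ("a2", "d"): 4, ("c", "d"): 2})
cherry = find_cherry(dm)
reduced = reduce_cherry(dm, *cherry, "z")
print(cherry, reduced["z"])
```

In the example, the new leaf z takes the place of the common neighbour of c and d, so its distance to a1 and a2 drops by one while d_m(a1, a2) is unchanged.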
We now consider the case that there is exactly one blob. Since there are no cherries, all leaves are contained in this blob. We can recognise this by seeing that there is a chain (a_1, ..., a_k) of length k ≥ 3 that satisfies d_m(a_1,a_k) = 3. This immediately shows how to reconstruct level-1 networks that contain exactly one blob. Hence, we assume from
Note that pendant blobs must contain a chain of length at least 2 since networks do not contain parallel edges. By Lemma 4.1, we can find chains on pendant blobs. We reduce a chain (a_1, ..., a_k) contained in a pendant blob by replacing the blob by a leaf z and setting d_m(x,z) := d_m(x,a_1) − 2 for all x ∈ X − {a_1, ..., a_k}. All shortest distances between other leaf-pairs remain unchanged, since their paths do not travel through pendant blobs. It is again easy to reconstruct the blob after reconstructing the reduced network, since we know that (a_1, ..., a_k) must form a chain on the blob, in that order.
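The test of Lemma 4.1 and the pendant blob reduction can be sketched in the same style. The sketch assumes that the chain has already been identified and is passed in as a tuple of leaves, and that at least one leaf lies outside the chain; the function names and the example matrix (a hypothetical level-1 network with a triangle blob carrying the chain (a1, a2), attached to leaves c and d) are illustrative assumptions.

```python
def chain_in_pendant_blob(dm, chain):
    """Lemma 4.1 test: d_m(a_1, x) = d_m(a_k, x) for every leaf x off the chain."""
    first, last = chain[0], chain[-1]
    others = [x for x in dm if x not in chain]
    return all(dm[first][x] == dm[last][x] for x in others)


def reduce_pendant_blob(dm, chain, z):
    """Replace the pendant blob containing the chain by a leaf z,
    setting d_m(x, z) = d_m(x, a_1) - 2 for every leaf x off the chain."""
    rest = [x for x in dm if x not in chain]
    reduced = {a: {b: dm[a][b] for b in rest} for a in rest}
    reduced[z] = {z: 0}
    for x in rest:
        reduced[x][z] = reduced[z][x] = dm[x][chain[0]] - 2
    return reduced


# Shortest distances of the hypothetical example network (dict of dicts).
PAIRS = {("a1", "a2"): 3, ("a1", "c"): 4, ("a1", "d"): 4,
         ("a2", "c"): 4, ("a2", "d"): 4, ("c", "d"): 2}
dm = {leaf: {leaf: 0} for leaf in ("a1", "a2", "c", "d")}
for (a, b), dist in PAIRS.items():
    dm[a][b] = dm[b][a] = dist

pendant = chain_in_pendant_blob(dm, ("a1", "a2"))
reduced = reduce_pendant_blob(dm, ("a1", "a2"), "z")
print(pendant, reduced["z"])
```

In the example the reduced matrix has all off-diagonal entries equal to 2, which is the matrix of the star tree on {c, d, z}, as expected when the triangle is collapsed to a single leaf.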
This finishes the proof of the theorem since any level-1 network has a cherry, a pendant blob, or exactly one blob.
We note that the restriction of Theorem 4.2 to networks without triangles also follows from Theorem 5 of [15]. We give the proof above to account for the triangle case and to give a more direct graph-theoretical proof that is independent of the results provided by Hayamizu et al. Observe that trees (level-0 networks) are also level-1 networks. Thus Theorem 4.2 gives the following corollary, which we include here for completeness. This is a classical result that was proven in [14].
Corollary 4.3. Trees are reconstructible from their shortest distances.
Next, we show that level-2 networks on fewer than 4 leaves are also reconstructible from their shortest distances.
Lemma 4.4. Level-2 networks on X for |X| ≤ 3 are reconstructible from their shortest distances.
Proof. There can only be one network on a single taxon, namely the singleton graph. Such a graph is trivially reconstructible from its shortest distances. So suppose that |X| = 2, say X = {x,y}, and let N be a network on X. Below, we will prove the claim that N consists only of level-2 blobs, where each level-2 blob is incident to exactly two cut-edges. In particular, N contains at most two pendant blobs, one of which contains the neighbour of x and the other the neighbour of y. Since each additional level-2 blob increases the shortest distance between x and y by 3, it follows that d_m(x,y) = 3k + 1 where k denotes the number of level-2 blobs in N. From there, it follows that N is reconstructible from its shortest distances.
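The relation d_m(x,y) = 3k + 1 can be inverted directly. The following small sketch, with a hypothetical function name, recovers the number k of level-2 blobs of a two-leaf network from its single shortest distance, and rejects values that cannot arise from the formula.

```python
def count_level2_blobs(dm_xy):
    """Number k of level-2 blobs in a two-leaf network with d_m(x, y) = 3k + 1."""
    if dm_xy < 1 or (dm_xy - 1) % 3 != 0:
        raise ValueError("no two-leaf level-2 network has this shortest distance")
    return (dm_xy - 1) // 3


# k = 0 is the single edge xy; each further blob adds 3 to the distance.
print([count_level2_blobs(d) for d in (1, 4, 7)])
```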
We now prove the claim. Note first that every blob in N must be incident to exactly two cut-edges. A blob cannot be incident to only one cut-edge. If the blob is level-1 then this would imply that it contains a loop; if the blob is level-2 then this would imply that it contains parallel edges. This also implies that every pendant blob must be incident to at least one trivial cut-edge. On the other hand if a blob is incident to more than two cut-edges, say c cut-edges, then this would imply that the network contains at least c
Fig. 4. The four possible degree-3 vertices in the blob-tree of a level-2 network on three leaves {x,y,z}. (a) An internal vertex. (b) A level-1 blob. (c) A level-2 blob with all leaves reachable from different sides of the blob. (d) A level-2 blob where y and z are reachable from the same side of the blob. The dashed lines can be replaced by paths that contain any number of level-2 blobs. This is possible because we take the distances modulo 3 and since each additional level-2 blob contributes an extra length-3 to the shortest inter-taxa distance.
this implies that the network contains at least c > 2 leaves, which is a contradiction. Therefore every blob in N must be incident to exactly two cut-edges. Now observe that a level-1 blob that is incident to exactly two cut-edges contains parallel edges. It follows that every blob in N must be a level-2 blob that is incident to exactly two cut-edges. This proves the claim, from which it follows by the argument presented above that N is reconstructible from its shortest distances for |X| = 2.
Suppose now that |X| = 3, and let X = {x,y,z}. Here we consider BT(N), the blob-tree of N, which is obtained from N by replacing each blob of N by a single vertex. Since |X| = 3, BT(N) contains exactly one vertex of degree-3, three vertices of degree-1 (which are the leaves x, y, and z), and all other vertices are of degree-2. By a similar argument as presented in the |X| = 2 case, the degree-2 vertices of BT(N) correspond to level-2 blobs. The degree-3 vertex could be an internal vertex of the network, a level-1 blob, or a level-2 blob. In the case that it is a level-2 blob, there are two possibilities. Either the three edges are incident to different sides of the blob, or two edges are incident to the same side of the blob and the third edge to another side. See Fig. 4 for these four possibilities. Observe that these four possibilities all contribute different distance lengths to inter-taxa distances. In particular, we have that the degree-3 vertex is a (an)
• internal vertex if and only if
(d(x, y), d(y, z), d(x, z)) = (2 (mod 3), 2 (mod 3), 2 (mod 3));
• level-1 blob if and only if
(d(x, y), d(y, z), d(x, z)) = (0 (mod 3), 0 (mod 3), 0 (mod 3));
• level-2 blob with all edges on different sides if and only if
(d(x, y), d(y, z), d(x, z)) = (1 (mod 3), 1 (mod 3), 1 (mod 3));
• level-2 blob with the two edges that lead to leaves x and y on the same side if and only if
(d(x, y), d(y, z), d(x, z)) = (0 (mod 3), 1 (mod 3), 1 (mod 3)).
Therefore we may identify the blob corresponding to the degree-3 vertex of the blob-tree by taking the distances modulo 3.
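The case distinction above amounts to a lookup on the residues modulo 3. The sketch below encodes the four listed cases; the two extra entries for the pairs {y,z} and {x,z} are an assumption obtained by relabelling the leaves in the last case, and the function name and the returned strings are hypothetical.

```python
# Residues of (d(x, y), d(y, z), d(x, z)) modulo 3 and the corresponding
# structure at the degree-3 vertex of the blob-tree.
CENTRAL_VERTEX = {
    (2, 2, 2): "internal vertex",
    (0, 0, 0): "level-1 blob",
    (1, 1, 1): "level-2 blob, all edges on different sides",
    (0, 1, 1): "level-2 blob, x and y on the same side",
    # Assumed by symmetry (relabelling of the leaves), not listed in the text:
    (1, 0, 1): "level-2 blob, y and z on the same side",
    (1, 1, 0): "level-2 blob, x and z on the same side",
}


def classify_central_vertex(d_xy, d_yz, d_xz):
    """Identify the degree-3 vertex of the blob-tree from three shortest distances."""
    residues = (d_xy % 3, d_yz % 3, d_xz % 3)
    if residues not in CENTRAL_VERTEX:
        raise ValueError("distances do not match a level-2 network on three leaves")
    return CENTRAL_VERTEX[residues]


# A star tree has distances (2, 2, 2); inserting one level-2 blob on the path
# towards y adds 3 to d(x, y) and d(y, z) but leaves the residues unchanged.
print(classify_central_vertex(2, 2, 2), "|", classify_central_vertex(5, 5, 2))
```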
To finish the proof, take two networks N, N' with the same shortest distance matrix. By the previous paragraph, we may assume that N and N' have the same blob corresponding to the degree-3 vertex of their blob-trees. Assume that N ≠ N'. Then the two blob-trees BT(N) and BT(N') are different. Note that the shortest distances are determined by the number of degree-2 vertices between leaves in the blob-tree. Since D_m(N) = D_m(N'), we have that the number of degree-2 vertices between two leaves, say x and y, is the same in both BT(N) and BT(N'). However since BT(N) differs from BT(N'), the positioning of the degree-3 vertex must differ. But this would imply that upon placing z together with some degree-2 vertices, we can only satisfy one of d_m^N(x,z) = d_m^{N'}(x,z) or d_m^N(y,z) = d_m^{N'}(y,z). This contradicts the assumption that D_m(N) = D_m(N'). Therefore we must have N = N', and that level-2 networks on X for |X| = 3 are reconstructible from their shortest distances.
5. Reconstructibility of level-2 networks from their multisets of distances
In the last two sections, we showed that level-1 networks are reconstructible from their shortest distances, level-k networks for k ≥ 2 are in general not reconstructible from their shortest distances, and level-k networks for k ≥ 3 are in general not reconstructible from their multisets of distances. In this section, we investigate the remaining case, and show that level-2 networks are reconstructible from their multisets of distances. The main theorem is the following.
Theorem 5.1. Level-2 networks are reconstructible from their multisets of distances.
The key ideas in proving the theorem are as follows. We first identify and reduce all cherries of the network. To identify cherries we observe that two leaves x and y form a cherry if and only if d(x,y) = {2}. To reduce a cherry we replace it by a new leaf z and adjust the distance matrix accordingly, as done for the level-1 networks in the proof of Theorem 4.2. Next, we identify all leaves that are not contained in blobs, delete those leaves, and adjust the distance matrix accordingly. We show that each leaf that is deleted in this manner can be reattached to the reduced network in a unique fashion. After applying these two reductions, two chains are adjacent if and only if they are contained in the same blob. Using this observation, we then show that it is possible to identify pendant blobs, replace them by a new leaf, and adjust the distance matrix accordingly. Continuing in this fashion, we eventually reach the situation when the reduced network contains exactly one blob. We show that networks on single blobs are reconstructible from their multisets of distances, at which point it follows that simply reversing the
We start with the two easy cases, when the network contains a cherry or a single blob.
Observation 5.2. Let N be a level-2 network on X and suppose that leaves x and y form a cherry in N. Upon replacing the cherry by a leaf z, we obtain a network N' on X' = X ∪ {z} − {x,y} such that the multisets of distances for N' contain the elements
d_{N'}(a, b) = d_N(a, b)        if a, b ∈ X − {x, y},
d_{N'}(a, b) = d_N(a, x) − 1    if a ∈ X − {x, y} and b = z.
One may obtain N from N' by replacing the leaf z by a cherry {x,y}.
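Observation 5.2 can be sketched on multisets stored as `collections.Counter` objects, where d_N(a, x) − 1 means subtracting 1 from every element of the multiset. The function name and the example (the multisets of a hypothetical level-1 network with a triangle blob carrying leaves a1, a2 and a cherry {c, d}) are assumptions made for illustration.

```python
from collections import Counter


def reduce_cherry_multisets(d, x, y, z):
    """Matrix of multisets of N' obtained by replacing the cherry {x, y} by leaf z."""
    rest = [a for a in d if a not in (x, y)]
    reduced = {a: {b: d[a][b] for b in rest} for a in rest}
    reduced[z] = {z: Counter({0: 1})}
    for a in rest:
        # d_{N'}(a, z) = d_N(a, x) - 1, applied to every path length.
        shifted = Counter({length - 1: mult for length, mult in d[a][x].items()})
        reduced[a][z] = reduced[z][a] = shifted
    return reduced


# Multisets of distances of the hypothetical example network.
PAIRS = {("a1", "a2"): Counter({3: 1, 4: 1}),
         ("a1", "c"): Counter({4: 1, 5: 1}), ("a1", "d"): Counter({4: 1, 5: 1}),
         ("a2", "c"): Counter({4: 1, 5: 1}), ("a2", "d"): Counter({4: 1, 5: 1}),
         ("c", "d"): Counter({2: 1})}
d = {leaf: {leaf: Counter({0: 1})} for leaf in ("a1", "a2", "c", "d")}
for (a, b), multiset in PAIRS.items():
    d[a][b] = d[b][a] = multiset

reduced = reduce_cherry_multisets(d, "c", "d", "z")
print(dict(reduced["a1"]["z"]))
```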
Lemma 5.3. Level-2 networks containing a single blob are reconstructible from their shortest distances.
Proof. Let N be a level-2 network containing a single blob. Assume without loss of generality that N contains no cherries, as we can recognise them from the shortest distances and reduce them by Observation 5.2. If N is a level-1 blob then we may reconstruct it from shortest distances by Theorem 4.2. If N is a level-2 blob then the blob must contain at least two chains since it has no parallel edges, and at most three chains. Noting that chains can be identified from the shortest distances, the placement of the chains on the blob sides can be done by matching the end-leaves of chains that have shortest distance 4.
5.1. Leaves not contained in blobs
Lemma 5.4. Let N be a level-2 network on X where |X| ≥ 3. A leaf x is not contained in a blob if and only if there exists a unique partition Y ∪ Z of X − {x} such that Y, Z ≠ ∅ and d_m(y,z) = d_m(x,y) + d_m(x,z) − 2 for all y ∈ Y and z ∈ Z.
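The criterion of Lemma 5.4 can be checked by brute force on small examples. The sketch below enumerates all unordered partitions of X − {x} into two non-empty parts and counts those that satisfy the equation; it is exponential in |X| and is meant only as an illustration. The function names and the example matrix (a hypothetical network with a triangle blob carrying leaves a1, a2 and two further leaves c, d outside the blob) are assumptions.

```python
from itertools import combinations


def valid_partitions(dm, x):
    """All partitions {Y, Z} of X - {x} into non-empty parts with
    d_m(y, z) = d_m(x, y) + d_m(x, z) - 2 for all y in Y and z in Z."""
    others = sorted(leaf for leaf in dm if leaf != x)
    anchor, rest = others[0], others[1:]
    found = []
    # Fixing the anchor in Y counts each unordered partition exactly once;
    # taking fewer than len(rest) extra leaves keeps Z non-empty.
    for size in range(len(rest)):
        for extra in combinations(rest, size):
            Y = {anchor, *extra}
            Z = set(others) - Y
            if all(dm[y][z] == dm[x][y] + dm[x][z] - 2 for y in Y for z in Z):
                found.append((Y, Z))
    return found


def leaf_outside_blobs(dm, x):
    """Lemma 5.4 test: x is not contained in a blob iff the partition is unique."""
    return len(valid_partitions(dm, x)) == 1


# Shortest distances of the hypothetical example network (dict of dicts).
PAIRS = {("a1", "a2"): 3, ("a1", "c"): 4, ("a1", "d"): 4,
         ("a2", "c"): 4, ("a2", "d"): 4, ("c", "d"): 2}
dm = {leaf: {leaf: 0} for leaf in ("a1", "a2", "c", "d")}
for (a, b), dist in PAIRS.items():
    dm[a][b] = dm[b][a] = dist

print({leaf: leaf_outside_blobs(dm, leaf) for leaf in sorted(dm)})
```

In the example, the only valid partition for the leaf c is {a1, a2} against {d}, while no partition works for a1 or a2, whose neighbours lie on the triangle.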
Proof. Suppose first that a leaf x is not contained in a blob. Let p_x denote the neighbour of x, and let p, q denote the two neighbours of p_x that are not x. Observe that every leaf in X − {x} can be reached from p_x via one of the cut-edges p_x p or p_x q. Let Y and Z denote the sets of all leaves that can be reached from p_x via the cut-edges p_x p and p_x q, respectively. Note that a shortest path between some y ∈ Y and some z ∈ Z passes through the edges p_x p and p_x q. Then by observing that the shortest path from x to y and the shortest path from x to z use the same edges as the shortest path from y to z, bar the use of the edge incident to x twice, we obtain the equation d_m(y,z) = d_m(x,y) + d_m(x,z) − 2 for all y ∈ Y and z ∈ Z.
We now show that such a partition is unique. We claim that all leaves that can be reached from p_x via the edge p_x p must be contained in the same set in the partition. Let y_1 and y_2 be two leaves that can be reached from p_x via the edge p_x p, and suppose for a contradiction that they are placed in different sets of the partition. Then,
d_m(x, y_1) + d_m(x, y_2) − 2 = d_m(p, y_1) + d_m(p, y_2) + 2
> d_m(p, y_1) + d_m(p, y_2)
≥ d_m(y_1, y_2),
where the final inequality is the triangle inequality. Hence y_1 and y_2 must be contained in the same set of the partition; since y_1 and y_2 were chosen arbitrarily, all leaves that can be reached from p_x via the edge p_x p must be contained in the same set in the partition. Similarly, all leaves that can be reached from p_x via the edge p_x q must be contained in the same set in the partition. Observe that all leaves in X − {x} can be reached from p_x via the edge p_x p or via the edge p_x q. Since neither set of the partition can be empty, it follows then that the partition must be unique, with Y and Z containing all leaves that can be reached from p_x via p_x p and p_x q, respectively.
To prove the other direction, we show that if a leaf $x$ is contained in a blob $B$, then there is no such partition that satisfies the given equation. Let $p_x$ denote the neighbour of $x$. We first show that for leaves $y, z \in X - \{x\}$, if all shortest paths between $y$ and $z$ do not contain the vertex $p_x$, then the equation is not satisfied by $y$ and $z$. Let $p_y$ and $p_z$ denote the vertices on $B$ that are closest to the leaves $y$ and $z$ respectively. Note that it is possible to have $p_y = p_z$; this is the case where all shortest paths between $y$ and $z$ do not pass through $B$. Then the following equations hold:
\[
\begin{aligned}
d_m(x,y) &= 1 + d_m(p_x,p_y) + d_m(p_y,y), \\
d_m(x,z) &= 1 + d_m(p_x,p_z) + d_m(p_z,z).
\end{aligned}
\]
We now distinguish two cases.
1. If $p_y \neq p_z$, then by the triangle inequality and as all shortest paths between $y$ and $z$ do not contain the vertex $p_x$, we must have that
\[
d_m(p_y,p_z) < d_m(p_x,p_y) + d_m(p_x,p_z). \tag{1}
\]
It follows that
\[
\begin{aligned}
d_m(y,z) &= d_m(y,p_y) + d_m(p_y,p_z) + d_m(p_z,z) \\
&= (d_m(x,y) - d_m(p_x,p_y) - 1) + d_m(p_y,p_z) + (d_m(x,z) - d_m(p_x,p_z) - 1) \\
&= d_m(x,y) + d_m(x,z) - 2 + d_m(p_y,p_z) - (d_m(p_x,p_y) + d_m(p_x,p_z)) \\
&< d_m(x,y) + d_m(x,z) - 2,
\end{aligned}
\]
where the final inequality follows from Inequality (1).
2. If $p_y = p_z$, then let $p$ denote the neighbour of $p_y$ that is not on the blob $B$. Then
\[
\begin{aligned}
d_m(y,z) &\leq d_m(y,p_y) + d_m(z,p_y) - 2 d_m(p_y,p) \\
&= (d_m(x,y) - d_m(p_x,p_y) - 1) + (d_m(x,z) - d_m(p_x,p_y) - 1) - 2 \\
&= d_m(x,y) + d_m(x,z) - 2 - 2 d_m(p_x,p_y) - 2 \\
&< d_m(x,y) + d_m(x,z) - 2,
\end{aligned}
\]
where the first inequality follows since the shortest path between $y$ and $z$ may not pass through $p$ (e.g., if $p$ is a vertex on a blob), and the final inequality follows as $d_m(p_x,p_y) \geq 1$ and $d_m(p_y,p) = 1$.
It remains to show that for any partition $Y \cup Z$ of $X - \{x\}$ where $Y, Z \neq \emptyset$, there exists a leaf pair $y \in Y$ and $z \in Z$ such that no shortest path between $y$ and $z$ uses $p_x$.
Suppose first that $B$ is a level-1 blob. Since our network contains no parallel edges, $B$ must be incident to at least two cut-edges in addition to the edge $p_x x$. If two leaves that can be reached from $B$ via the same cut-edge are placed in different sets of the partition, then we are done as no shortest path between these leaves uses $p_x$; therefore we may assume that leaves that can be reached from $B$ via the same cut-edge are placed in the same set in the partition. Since $Y$ and $Z$ are both non-empty, there must exist two cut-edges $e_1, e_2$ (excluding $p_x x$) whose endpoints form an edge of $B$, such that there exists a leaf that can be reached from $B$ via $e_1$ and a leaf that can be reached from $B$ via $e_2$ for which the two leaves lie in different sets of the partition. Every shortest path between these two leaves passes through the edge connecting the endpoints of $e_1$ and $e_2$ and therefore does not use $p_x$. Therefore we are done.
Now suppose that $B$ is a level-2 blob. For the same reason as in the level-1 case (see proof of Theorem 4.2), if there are two leaves that can be reached from $B$ via the same cut-edge that are placed in different sets of the partition, then we are done; therefore we may assume that leaves that can be reached from $B$ via the same cut-edge are placed in the same set in the partition. Since $Y$ and $Z$ are both non-empty, it follows that there exist two cut-edges $e_1, e_2$ incident to $B$, such that leaves $y, z$ can be reached from $B$ via $e_1, e_2$, respectively, for which $y \in Y$ and $z \in Z$. There must exist a pair of such cut-edges such that all shortest paths between their endpoints on $B$ do not contain $p_x$, since there exist enough cut-edges to ensure there are no parallel edges in $B$. Given such a pair of cut-edges, take one leaf that can be reached from $B$ via the first cut-edge and take another leaf that can be reached from $B$ via the other cut-edge. Then no shortest path between this pair of leaves uses $p_x$, and thus we are done.
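The criterion of Lemma 5.4 can be tested directly on a matrix of shortest distances. The following is a minimal sketch, not taken from the paper: the function name `induces_partition` and the dictionary encoding of $d_m$ are assumptions made for illustration. It uses the observation that two leaves violating the equation must lie on the same side of any admissible partition, so a unique admissible partition exists exactly when the "violation" relation has two connected components.

```python
from itertools import combinations


def induces_partition(dm, leaves, x):
    """Return (Y, Z) if leaf x induces a unique partition as in Lemma 5.4,
    otherwise None.

    dm is a hypothetical encoding: it maps frozenset({u, v}) to the
    shortest distance between the leaves u and v.
    """
    others = [v for v in leaves if v != x]

    def d(u, v):
        return dm[frozenset((u, v))]

    # Union-find over X - {x}: a pair that violates the equation can never
    # be split across Y and Z, so it is merged into one group.
    parent = {v: v for v in others}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for y, z in combinations(others, 2):
        if d(y, z) != d(x, y) + d(x, z) - 2:
            parent[find(y)] = find(z)

    groups = {}
    for v in others:
        groups.setdefault(find(v), set()).add(v)

    # One group: no admissible partition. Three or more: not unique.
    if len(groups) == 2:
        Y, Z = groups.values()
        return Y, Z
    return None


# Quartet tree with cherries {x, y} and {z, w}: x is not in a blob.
quartet = {frozenset(p): v for p, v in {
    ("x", "y"): 2, ("x", "z"): 3, ("x", "w"): 3,
    ("y", "z"): 3, ("y", "w"): 3, ("z", "w"): 2}.items()}
# A 3-cycle with one leaf per cycle vertex: every leaf is in a blob.
triangle = {frozenset(p): 3 for p in [("a", "b"), ("a", "c"), ("b", "c")]}
```

On the quartet tree the leaf x induces the partition {y} and {z, w}; on the 3-cycle no leaf induces a partition, in line with the lemma.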
Lemma 5.4 does not hold in general for networks of level higher than 2. An example is shown in Fig. 5.
Fig. 5. A level-3 network on $X = \{x, y, z\}$ where all of its leaves are contained in a blob. $Y = \{y\}$ and $Z = \{z\}$ is a partition of $X - \{x\}$ such that $Y, Z \neq \emptyset$ and $d_m(y,z) = d_m(x,y) + d_m(x,z) - 2$ for all $y \in Y$ and $z \in Z$. Observe that this holds in general for level-$k$ networks where $k \geq 3$ by replacing the level-3 blob by an arbitrary level-$k$ blob.
We now show that after identifying a leaf that is not contained in a blob, we can delete it from the network and adjust the distance matrix accordingly. We also show that upon reconstructing the reduced network from the modified distance matrix, there is a unique cut-edge to which we may reattach the deleted leaf. Reattaching a leaf $x$ to a cut-edge is the action of subdividing the cut-edge by a vertex $p_x$, and adding an edge $p_x x$. In the setting of Lemma 5.4, we say that the unique partition $Y \cup Z$ is induced by the leaf $x$.
Lemma 5.5. Let $N$ be a level-2 network on $X$ where $|X| \geq 3$, and let $x$ be a leaf that is not contained in a blob. Let $Y \cup Z$ denote the unique partition of $X' = X - \{x\}$ that is induced by $x$. Then upon deleting the leaf $x$, we obtain a network $N'$ on $X'$ such that the multisets of distances for $N'$ contain the elements
\[
d_{N'}(y,z) =
\begin{cases}
d_N(y,z) & \text{if } y, z \in Y \text{ or } y, z \in Z, \\
d_N(y,z) - 1 & \text{if } y \in Y, z \in Z \text{ or } z \in Y, y \in Z.
\end{cases}
\]
In addition, there is only one edge location in $N'$ where $x$ can be reattached to obtain a network with the same multisets of distances as $N$. In particular, this network is isomorphic to $N$.
Proof. Let $p_x$ be the neighbour of $x$ in $N$, and let $p$ and $q$ be the other neighbours of $p_x$ that are not $x$. As shown in the proof of Lemma 5.4, the sets $Y$ and $Z$ correspond to the leaves that can be reached from $p_x$ via $p_x p$ and via $p_x q$, respectively. Upon deleting $x$ from $N$, we note that $p_x$ becomes a vertex of degree 2 and is therefore suppressed in the resulting subgraph. Then all paths in $N$ that used the edge $p_x p$ and the edge $p_x q$ have their length decreased by 1 in $N'$; all paths in $N$ that did not use the edges $p_x p$ and $p_x q$ are unaffected by this vertex suppression. Observe that any path between a leaf in $Y$ and a leaf in $Z$ uses the edges $p_x p, p_x q$ in $N$. Furthermore, any path between two leaves in $Y$ or any path between two leaves in $Z$ did not use the edges $p_x p, p_x q$ in $N$. Therefore the multisets of distances of $N'$ can be obtained from the multisets of distances of $N$ as stated in the lemma.
We now prove the second statement, namely that $N'$ contains only one edge where $x$ can be reattached to, so as to obtain a network with the same multisets of distances as $N$. By Lemma 5.4, we know that $x$ is not in a blob, and that $x$ induces a partition $Y \cup Z$ of $X'$. This implies that $x$ must be reattached to $N'$ at a cut-edge that induces the partition $Y \cup Z$. We now show that there is only one such cut-edge in $N'$ if we are to obtain a network with the same multisets of distances as $N$ upon reattaching $x$. If there are two cut-edges $e_1, e_2$ in $N'$ that induce the same required partition $Y \cup Z$, observe that any path from $e_1$ to $e_2$ must consist only of level-2 blobs that are incident to exactly two cut-edges. Note that level-1 blobs cannot be included here as otherwise we would produce parallel edges. Now take any leaf $y \in X - \{x\}$, and let $N_1$ and $N_2$ denote the networks obtained by attaching $x$ to $e_1$ and $e_2$ respectively. Because of the level-2 blobs between $e_1$ and $e_2$, we have that $d_m^{N_1}(x,y) \neq d_m^{N_2}(x,y)$. But we know that there must exist one cut-edge $e$ in $N'$ to which we can attach $x$ to obtain $N$. We locate this edge $e$ by finding one that induces the correct partition and satisfies the equation $d_m^{N_e}(x,y) = d_m^{N}(x,y)$, where $N_e$ denotes the network obtained by attaching $x$ to $e$. This proves the claim that $x$ can be added back to $N'$ via a unique edge to obtain a network with the same multisets of distances as $N$. Since there is a unique edge where $x$ can be attached to in order to obtain a network with the same multisets of distances as $N$, the network obtained this way must be isomorphic to $N$.
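The distance update of Lemma 5.5 amounts to shifting every multiset for a pair that is split by the partition. A minimal sketch follows, assuming (for illustration only) that multisets of distances are stored as `collections.Counter` objects keyed by unordered leaf pairs; the name `delete_leaf` is hypothetical.

```python
from collections import Counter


def delete_leaf(dist, Y, Z):
    """Multisets of distances of N' from those of N, following Lemma 5.5.

    dist maps frozenset({u, v}) to a Counter of path lengths in N, for
    pairs of leaves in X - {x}; (Y, Z) is the partition induced by x.
    """
    new_dist = {}
    for pair, lengths in dist.items():
        u, v = tuple(pair)
        if not {u, v} <= (Y | Z):
            raise ValueError("pair contains a leaf outside Y and Z")
        # Paths between Y and Z lose one edge when p_x is suppressed;
        # paths inside Y or inside Z are unchanged.
        shift = 1 if (u in Y) != (v in Y) else 0
        new_dist[pair] = Counter(
            {length - shift: mult for length, mult in lengths.items()})
    return new_dist


# Quartet tree with cherries {x, y} and {z, w}, after x has been identified
# as a leaf outside every blob with induced partition {y} | {z, w}.
dist_N = {
    frozenset(("y", "z")): Counter([3]),
    frozenset(("y", "w")): Counter([3]),
    frozenset(("z", "w")): Counter([2]),
}
dist_N_prime = delete_leaf(dist_N, {"y"}, {"z", "w"})
```

Reattaching x afterwards still requires the edge search described in the proof; the sketch covers only the deletion step.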
5.2. Pendant blobs
For the remainder of this section, we will restrict to level-2 networks with at least two blobs and in which all leaves are contained in blobs. We can do this by Observation 5.2 and Lemmas 5.3, 5.4, and 5.5.
5.2.1. Pendant level-1 blobs
Lemma 5.6. Let $N$ be a level-2 network on $X$. A chain $(a_1, \ldots, a_k)$ with $k \geq 2$ is contained in a pendant level-1 blob if and only if $d(a_1,a_k) = \{4^1, (k+1)^1\}$.
Proof. Suppose first that a chain $(a_1, \ldots, a_k)$ with $k \geq 2$ is contained in a pendant level-1 blob $B$. As there is only one non-trivial cut-edge incident to $B$, this chain is the only chain that is contained in $B$. It is then clear that we must have $d(a_1,a_k) = \{4^1, (k+1)^1\}$.
Now suppose that there exists a chain $(a_1, \ldots, a_k)$ with $k \geq 2$ such that $d(a_1,a_k) = \{4^1, (k+1)^1\}$. Clearly the distance $k+1$ corresponds to the path between $a_1$ and $a_k$ that passes through the neighbours of $a_i$ for $i \in [k]$. Therefore we examine the path between $a_1$ and $a_k$ that does not pass through the neighbours of $a_{i+1}$ for $i \in [k-2]$. Note first that the chain cannot be contained in a non-pendant level-1 blob, as otherwise this path between $a_1$ and $a_k$ would pass through at least two vertices that are incident to non-trivial cut-edges. In this case, the length of the path between $a_1$ and $a_k$ would be at least 5. The chain also cannot be contained in a level-2 blob, as otherwise the set $d(a_1,a_k)$ would contain at least 3 elements. Therefore the chain must be contained in a pendant level-1 blob.
Lemma 5.7. Let $N$ be a level-2 network on $X$ in which $(a_1, \ldots, a_k)$ is a chain that is contained in a pendant level-1 blob. Let $N'$ be the network on $X' = X \cup \{z\} - \{a_1, \ldots, a_k\}$ obtained from $N$ by replacing the pendant blob by a leaf $z$. For every $x \in X' - \{z\}$, we can uniquely partition the multiset of distances $d_N(x,a_1)$ into two equal sized sets $A$ and $B$ such that $A - 2 = B - (k+1)$. Then the multisets of distances of $N'$ contain the elements
\[
d_{N'}(x,y) =
\begin{cases}
d_N(x,y) & \text{if } x, y \in X' - \{z\}, \\
A - 2 & \text{if } y = z.
\end{cases}
\]
Proof. We first prove the claim that for every $x \in X' - \{z\}$, we can uniquely partition the multiset of distances $d(x,a_1)$ into two equal sized sets $A$ and $B$ such that $A - 2 = B - (k+1)$. As usual, let $p_i$ denote the neighbour of $a_i$ for $i \in [k]$, and let $q$ denote the neighbour of $p_1$ that is neither $a_1$ nor $p_2$. Note that $k \geq 2$ since otherwise there would be parallel edges. Let $x \in X' - \{z\}$. Then any path from $x$ to $a_1$ consists of a path from $x$ to $q$ and a path from $q$ to $a_1$. There are two possible paths from $q$ to $a_1$: one is of length 2 and uses the edges $qp_1, p_1a_1$; the other is of length $k+1$ and uses the edges $qp_k, p_kp_{k-1}, \ldots, p_2p_1, p_1a_1$. Therefore every path from $x$ to $q$ yields two paths from $x$ to $a_1$, for which one of the paths is longer than the other by a length of $k-1$. This implies that the size of the multiset $d(x,a_1)$ is even, since every path from $x$ to $a_1$ can be matched to another path from $x$ to $a_1$ that shares the same part of the path between $x$ and $q$. Now take the smallest element $d \in d(x,a_1)$. By the argument presented above, there must exist a corresponding element $d + k - 1 \in d(x,a_1)$. We place $d$ in set $A$ and we place $d + k - 1$ in set $B$, remove both elements from $d(x,a_1)$ and recurse. By continuing this for the smallest element in $d(x,a_1)$ at each step, this partitions the multiset into a bipartition $d(x,a_1) = A \cup B$ where $|A| = |B| = |d(x,a_1)|/2$, such that $A + (k-1) = B$. It follows from iteratively adding the smallest element from $d(x,a_1)$ to $A$ that this bipartition is unique. This proves the claim.
To prove the second part of the lemma, first observe that any path between a leaf $x \in X' - \{z\}$ and $z$ in the network $N'$ corresponds to a path between $x$ and $q$ in $N$. Now the multiset of distances between $x$ and $q$ in $N$ can be obtained by finding the multiset of distances between $x$ and $a_1$ that used the edges $qp_1, p_1a_1$, and subtracting 2 from each element. This is precisely the set $A - 2$ that we have found above. For any other leaf $y \in X' - \{z\}$, we have that all paths between $x$ and $y$ are unaffected by the replacement of the blob by $z$, as the blob is pendant in $N$. Therefore $d(x,y)$ remains unchanged for $x, y \in X' - \{z\}$.
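The greedy pairing used in the proof of Lemma 5.7 is short enough to write out. The sketch below is illustrative only: the function name `split_pendant_level1` and the list-based encoding of the multiset $d(x,a_1)$ are assumptions, not notation from the paper. It repeatedly moves the smallest remaining element $d$ to $A$ and its partner $d + k - 1$ to $B$, and fails loudly if a partner is missing.

```python
from collections import Counter


def split_pendant_level1(dists, k):
    """Partition the multiset d(x, a_1) into (A, B) with B = A + (k - 1)."""
    remaining = Counter(dists)
    A, B = [], []
    while remaining:
        d = min(remaining)          # smallest element must go to A
        partner = d + (k - 1)       # matching path around the other side
        remaining[d] -= 1
        if remaining[d] == 0:
            del remaining[d]
        if remaining.get(partner, 0) == 0:
            raise ValueError(f"no partner {partner} for element {d}")
        remaining[partner] -= 1
        if remaining[partner] == 0:
            del remaining[partner]
        A.append(d)
        B.append(partner)
    return A, B


# Hypothetical input: two paths from x to q of lengths 3 and 5, and a
# chain of length k = 4, so d(x, a_1) = {3+2, 5+2, 3+5, 5+5}.
A, B = split_pendant_level1([5, 7, 8, 10], k=4)
dist_to_z = [d - 2 for d in A]      # the multiset A - 2 assigned to d_{N'}(x, z)
```

Here `A` is `[5, 7]`, `B` is `[8, 10]`, and the multiset assigned to the pair (x, z) in the reduced network is `[3, 5]`.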
It is again easy to reconstruct the blob after reconstructing the reduced network, since
Fig. 6. A pendant level-2 blob of the form $(k,0,0,0)$ containing the chain $(a_1, \ldots, a_k)$ (left) and a pendant level-2 blob of the form $(k,\ell,0,0)$ containing the chains $(a_1, \ldots, a_k)$ and $(b_1, \ldots, b_\ell)$ (right). The edges labelled $f$ denote the non-trivial cut-edges in both networks.
5.2.2. Pendant level-2 blobs
We adopt the following notation for pendant level-2 blobs. Let $B$ be a pendant level-2 blob, and let $a, b, c, d$ denote the four chains contained in $B$ of lengths $k, \ell, m, n \geq 0$ such that chains $c$ and $d$ are on the same side of $B$ as the non-trivial cut-edge. Then we say that $B$ is of the form $(k,\ell,m,n)$. For ease of notation, a side without leaves is seen as a length-0 chain. See Fig. 6 for pendant level-2 blobs of the forms $(k,0,0,0)$ and $(k,\ell,0,0)$.
Lemma 5.8. A level-2 network $N$ contains a pendant level-2 blob of the form $(k,0,0,0)$ for $k \geq 2$ with the chain $(a_1, \ldots, a_k)$ if and only if $d(a_1,a_k) = \{5^1, 6^1, (k+1)^1\}$.
Proof. Suppose first that $N$ contains a pendant level-2 blob $B$ of the form $(k,0,0,0)$. Let $e$ denote the non-trivial cut-edge that is incident to $B$. Then the paths from $a_1$ to $a_k$ that use the side of $B$ without $e$ and without the chain, the side of $B$ with $e$, and the side of $B$ with the chain have lengths 5, 6, and $k+1$ respectively.
Suppose now that there exists a chain $(a_1, \ldots, a_k)$ where $k \geq 2$ such that $d(a_1,a_k) = \{5^1, 6^1, (k+1)^1\}$. First, since $|d(a_1,a_k)| > 2$, we note that the chain $(a_1, \ldots, a_k)$ must be contained in a level-2 blob. Consider a level-2 blob $B$ that contains the chain $(a_1, \ldots, a_k)$ on one of its sides, and suppose that there is a single non-trivial cut-edge $e$ on another one of its sides. There must be at least one such edge $e$ because otherwise there would be parallel edges. Currently we have that $d(a_1,a_k) = \{5^1, 6^1, (k+1)^1\}$: adding more cut-edges (trivial or non-trivial) to the sides of $B$ would change the set of distances. Since $B$ is incident to exactly one non-trivial cut-edge, it is a level-2 pendant blob.
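Lemmas 5.6 and 5.8 both reduce to comparing one multiset against a fixed pattern, which the following sketch makes explicit. The function name `pendant_blob_type` and the returned labels are hypothetical; the multiset $d(a_1,a_k)$ is assumed to be given as a list of path lengths, and the network is assumed to satisfy the standing assumptions of this section.

```python
from collections import Counter


def pendant_blob_type(d_a1_ak, k):
    """Classify a chain (a_1, ..., a_k), k >= 2, from the multiset d(a_1, a_k)."""
    if k < 2:
        raise ValueError("Lemmas 5.6 and 5.8 are stated for k >= 2")
    observed = Counter(d_a1_ak)
    # Counter([...]) merges equal values, so k + 1 = 4 or k + 1 = 5
    # correctly yields multiplicity 2.
    if observed == Counter([4, k + 1]):
        return "pendant level-1"           # Lemma 5.6
    if observed == Counter([5, 6, k + 1]):
        return "pendant level-2 (k,0,0,0)"  # Lemma 5.8
    return None
```

For example, with $k = 2$ the multiset {3, 4} is reported as a pendant level-1 blob and {3, 5, 6} as a pendant level-2 blob of the form $(2,0,0,0)$.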
Lemma 5.9. A level-2 network $N$ contains a pendant level-2 blob of the form $(1,0,0,0)$ containing the leaf $a$ if and only if $d_m(a,x) \geq 6$ for all $x \in X - \{a\}$ and for any two leaves $y, z \in X - \{a\}$, $d_m(a,y) + d_m(a,z) - d_m(y,z) \geq 8$.
Proof. Suppose first that a pendant level-2 blob $B$ contains only the leaf $a$. Let $uv$ denote the non-trivial cut-edge incident to $B$, where $u$ is the vertex that is on $B$. Now, the shortest distance from $a$ to $u$ is exactly 3. Furthermore, the shortest distance from $u$ to a leaf $x$ that is not $a$ is at least 3, since such a path must contain the edge $uv$, an edge of another blob (since all leaves are assumed to be contained in blobs), and the edge incident to $x$. Therefore $d_m(a,x) \geq 6$ for all $x \in X - \{a\}$. To prove the second statement, let $y, z \in X - \{a\}$. Then by the triangle inequality, we have
\[
d_m(a,y) + d_m(a,z) - d_m(y,z) = d_m(v,y) + d_m(v,z) - d_m(y,z) + 8 \geq 8.
\]
Now suppose that $d_m(a,x) \geq 6$ for all $x \in X - \{a\}$ and for any two leaves $y, z \in X - \{a\}$, we have $d_m(a,y) + d_m(a,z) - d_m(y,z) \geq 8$. The first condition implies that $(a)$ is a maximal chain. Suppose first that $a$ was contained in a level-1 blob $B$. Note that $B$ cannot be pendant as otherwise the network would have parallel edges. Let $p_a$ denote the neighbour of $a$ (a vertex of $B$), and let $p_y, p_z$ denote the two neighbours of $p_a$ on $B$. The vertices $p_y$ and $p_z$ are necessarily incident to non-trivial cut-edges, as otherwise $a$ would be contained in a chain, in which case the condition $d_m(a,x) \geq 6$ would be violated for some leaf $x$ in the chain. Now let $y$ and $z$ denote any leaves in $X - \{a\}$ that can be reached from $B$ via the cut-edges incident to $p_y$ and $p_z$ respectively. Then we have that $d_m(a,y) + d_m(a,z) - d_m(y,z) = 2$ if a shortest path between $p_y$ and $p_z$ passes the vertex $p_a$, and we have $d_m(a,y) + d_m(a,z) - d_m(y,z) = 3$ otherwise. This contradicts our second condition, and therefore we may assume that the leaf $a$ is contained in a level-2 blob $B$. Suppose that $B$ is a non-pendant blob, in other words, that there are at least two non-trivial cut-edges incident to $B$. Take two non-trivial cut-edges that are closest to $a$, and take any two leaves $y$ and $z$ that can be reached from $B$ via these cut-edges. The shortest distance from $a$ to the endpoints of these cut-edges on $B$ is at most 3. Therefore we have $d_m(a,y) + d_m(a,z) - d_m(y,z) \leq 6$, which contradicts our second condition. Therefore we may assume that the leaf $a$ is contained in a pendant level-2 blob $B$. But aside from the leaf $a$ and the single non-trivial cut-edge, no other cut-edges can be incident to $B$. Indeed, having another leaf that is contained in $B$ violates the first condition, and having another non-trivial cut-edge contradicts the fact that $B$ was pendant. Therefore $B$ is a pendant level-2 blob of the form $(1,0,0,0)$ that contains a single leaf $a$.
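The two conditions of Lemma 5.9 can be checked in one pass over the shortest distances. The sketch below assumes the same hypothetical dictionary encoding of $d_m$ as a map from unordered leaf pairs to integers; the function name `is_single_leaf_pendant_level2` is not from the paper, and the check is only meaningful under the standing assumptions of this section (at least two blobs, all leaves in blobs).

```python
from itertools import combinations


def is_single_leaf_pendant_level2(dm, leaves, a):
    """Check the two conditions of Lemma 5.9 for the leaf a."""
    others = [v for v in leaves if v != a]

    def d(u, v):
        return dm[frozenset((u, v))]

    # Condition 1: every other leaf is at shortest distance at least 6.
    if any(d(a, x) < 6 for x in others):
        return False
    # Condition 2: d_m(a,y) + d_m(a,z) - d_m(y,z) >= 8 for all pairs y, z.
    return all(d(a, y) + d(a, z) - d(y, z) >= 8
               for y, z in combinations(others, 2))


# Hypothetical network: a pendant (1,0,0,0) blob containing a, joined by a
# cut-edge uv to a 3-cycle on v that carries the leaves y and z.
dm_pendant = {frozenset(p): v for p, v in {
    ("a", "y"): 6, ("a", "z"): 6, ("y", "z"): 3}.items()}
# Three leaves on a single 3-cycle: all shortest distances equal 3.
dm_cycle = {frozenset(p): 3 for p in [("a", "y"), ("a", "z"), ("y", "z")]}
```

In the first example the second condition evaluates to 6 + 6 - 3 = 9, which agrees with the identity $d_m(v,y) + d_m(v,z) - d_m(y,z) + 8$ from the proof.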
Lemma 5.10. Let $N$ be a level-2 network on $X$ containing a pendant level-2 blob of the form $(k,0,0,0)$ for $k \geq 1$ with the chain $(a_1, \ldots, a_k)$. Then we can replace the pendant blob by a leaf $z$ to obtain a network $N'$ on $X' = X \cup \{z\} - \{a_1, \ldots, a_k\}$. For every $x \in X' - \{z\}$, we can uniquely partition the multiset of distances $d(x,a_1)$ into four equal sized sets $A, B, C, D$ such that $A - 3 = B - 4 = C - (k+2) = D - (k+3)$. Then the multisets of distances of $N'$ contain the elements
\[
d_{N'}(x,y) =
\begin{cases}
d_N(x,y) & \text{if } x, y \in X' - \{z\}, \\
A - 3 & \text{if } y = z.
\end{cases}
\]
Proof. We first show that the partition of $d(x,a_1)$ exists and that it is unique. Let $B$ denote the pendant blob, and let $q$ denote the vertex of $B$ that is an endpoint of a non-trivial cut-edge. Let $x \in X' - \{z\}$. Every path from $x$ to $a_1$ consists of a path from $x$ to $q$ and a path from $q$ to $a_1$. There are four possible paths from $q$ to $a_1$ of lengths $3$, $4$, $k+2$, and $k+3$. By an argument analogous to that used in the proof of Lemma 5.7, there is a unique partition of $d(x,a_1)$ into four equal sized sets $A, B, C, D$ such that $A - 3 = B - 4 = C - (k+2) = D - (k+3)$.
Upon replacing the pendant blob $B$ by a leaf $z$, we note that the multiset of distances between a leaf $x \in X' - \{z\}$ and $z$ in $N'$ is equivalent to the multiset of distances between $x$ and $q$ in $N$. This multiset of distances is precisely the set $A - 3$. Let $y \in X' - \{z\}$ be another leaf that is not $x$. Then all paths between $x$ and $y$ in $N$ are unaffected after replacing $B$ by a leaf $z$; therefore $d_{N'}(x,y) = d_N(x,y)$.
Pendant level-2 blobs with at least two chains
Lemma 5.11. A level-2 network $N$ on $X$ contains a pendant level-2 blob of the form $(k,\ell,0,0)$ with chains $a = (a_1, \ldots, a_k)$ and $b = (b_1, \ldots, b_\ell)$ with $k, \ell \geq 1$ if and only if $a$ and $b$ are adjacent twice, and for all $c \in a \cup b$, we have $d_m(c,x) \geq 6$ for all $x \in X - (a \cup b)$ and $d_m(c,y) + d_m(c,z) - d_m(y,z) \geq 8$ for any two leaves $y, z \in X - (a \cup b)$.
Proof. One direction follows by an argument analogous to that used in the proof of Lemma 5.9.
To show the other direction, suppose that $a$ and $b$ are adjacent twice, and for all $c \in a \cup b$, we have $d_m(c,x) \geq 6$ for all $x \in X - (a \cup b)$ and $d_m(c,y) + d_m(c,z) - d_m(y,z) \geq 8$ for any two leaves $y, z \in X - (a \cup b)$. Since $a$ and $b$ are adjacent twice, either $a$ and $b$ are contained in the same level-1 blob such that the cycle of the blob is $u p_1 p_2 \ldots p_k v q_1 q_2 \ldots q_\ell u$ where $p_i$ and $q_j$ denote the neighbours of $a_i$ and $b_j$ for $i \in [k], j \in [\ell]$, respectively, and $u$ and $v$ are incident to non-trivial cut-edges, or $a$ and $b$ are contained in the same level-2 blob $B$ in which $a$ and $b$ are on two different sides of $B$ and there are no other vertices that subdivide these two sides of $B$ (see Fig. 7).
In the first case, let $B$ denote the level-1 blob. We take leaves $y$ and $z$ that can be reached from $B$ via the two non-trivial cut-edges. Without loss of generality, assume that $k \leq \ell$. Then the shortest path from $y$ to $z$ must pass through the neighbours of $a_i$ for all $i \in [k]$. But then for any $c \in a$, we have that
\[
d_m(c,y) + d_m(c,z) - d_m(y,z) = 2,
\]
which contradicts our original assumption.
In the second case, let $B$ denote the level-2 blob and let $e$ denote the side of $B$ that contains neither $a$ nor $b$. Since the network contains at least two blobs, the side $e$ must be incident to at least one non-trivial cut-edge. Suppose for a contradiction that there are at least two cut-edges incident to the side $e$. Let $p$ and $q$ denote the vertices on side $e$ such that if $k \geq 2$ then they have shortest distance 3 and 4 from $a_1$, respectively, and if $k = 1$ then they have shortest distance 3 and at most 4 from $a_1$, respectively. Note that the cut-edges incident to $p$ and $q$ must be non-trivial, since a trivial cut-edge at $p$ or $q$
Fig. 7. The two possibilities for when two chains $a = (a_1, \ldots, a_k)$ and $b = (b_1, \ldots, b_\ell)$ are adjacent twice and they are not contained in a pendant level-2 blob, as in the proof of Lemma 5.11. A level-1 blob (left) and a non-pendant level-2 blob (right). The dashed edges in both networks represent paths that are not trivial cut-edges from the blob to the leaves $y$ and $z$. In the non-pendant level-2 blob, there could be additional cut-edges on the side not containing the chains $a$ and $b$.
would contradict our assumption that for any leaf $x \in X - (a \cup b)$, we have $d_m(a_1,x) \geq 6$. Let $y$ and $z$ denote leaves that can be reached from $B$ via the cut-edges incident to $p$ and $q$, respectively. Then
\[
\begin{aligned}
d_m(a_1,y) + d_m(a_1,z) - d_m(y,z) &\leq 3 + d_m(p,y) + 4 + d_m(q,z) - d_m(y,z) \\
&= 7 - d_m(p,q) \leq 6,
\end{aligned}
\]
where the final inequality follows as $d_m(p,q) > 0$. This is a contradiction. Therefore there is exactly one cut-edge that is incident to the side $e$, from which it follows that $a$ and $b$ are the only chains contained in a pendant level-2 blob of the form $(k,\ell,0,0)$.
Lemma 5.12. Let $N$ be a level-2 network on $X$ that contains a pendant level-2 blob of the form $(k,\ell,0,0)$ with chains $a = (a_1, \ldots, a_k)$ and $b = (b_1, \ldots, b_\ell)$. Then we can replace the pendant blob by a leaf $z$ to obtain a network $N'$ on $X' = X \cup \{z\} - (a \cup b)$. For every $x \in X' - \{z\}$, we can uniquely partition the multiset of distances $d(x,a_1)$ into four equal sized sets $A, B, C, D$ such that $A - 3 = B - (\ell + 4) = C - (k+2) = D - (k + \ell + 3)$. Then the multisets of distances of $N'$ contain the elements
\[
d_{N'}(x,y) =
\begin{cases}
d_N(x,y) & \text{if } x, y \in X' - \{z\}, \\
A - 3 & \text{if } y = z.
\end{cases}
\]
Table 1
The number of green edges between two adjacent chains a = (a_1, ..., a_k) and b = (b_1, ..., b_ℓ) for different values of k and ℓ.

          ℓ = 1               ℓ = 2                     ℓ > 2
k = 1     m_A(5)              m_{A+B}(5) − 1            m_{A+B}(5)
k = 2     m_{A+C}(5) − 1      m_{A+B+C+D}(5) − 2        m_{A+B+C+D}(5) − 1
k > 2     m_{A+C}(5)          m_{A+B+C+D}(5) − 1        m_{A+B+C+D}(5)
Chain-Adjacency Graphs. We have now dealt with pendant level-2 blobs of the forms $(k,0,0,0)$ (Lemmas 5.8 and 5.9) and $(k,\ell,0,0)$ (Lemma 5.11). For the remaining four cases (ignoring symmetric cases) left to examine, $(k,0,m,0)$; $(k,0,m,n)$; $(k,\ell,m,0)$; and $(k,\ell,m,n)$, we employ the following graph.
Definition 5.13. A chain-adjacency graph (CAG) has a vertex for each chain, and between two vertices,
• we insert a red edge if the chains are adjacent once and two red edges if the chains are adjacent twice; and
• if the two chains are adjacent once, we insert a green edge for each length-5 path between endpoints of the chains (one per chain) that does not contain any edges of the two chains.
The condition for joining two vertices on the CAG via a green edge can indeed be verified from the multisets of distances. Let $a = (a_1, \ldots, a_k)$ and $b = (b_1, \ldots, b_\ell)$ denote two chains that are adjacent once, and suppose without loss of generality that $d_m(a_1,b_1) = 4$. To count the number of green edges between $a$ and $b$, we fall into the 9 cases shown in Table 1. This number is obtained by taking the multiplicity of 5's in the multiset of distances between a pair of endpoints, minus the number of length-5 paths that pass through edges of the chains. Let $(A, m_A) = d(a_1,b_1)$; $(B, m_B) = d(a_1,b_\ell)$; $(C, m_C) = d(a_k,b_1)$; $(D, m_D) = d(a_k,b_\ell)$.
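The nine entries of Table 1 follow one pattern: sum the multiplicity of 5 over the endpoint pairs that are distinct, then subtract one for each chain of length exactly 2. The sketch below encodes this reading of the table; the compact rule is our restatement rather than a formula from the paper, and the name `green_edge_count` and the `Counter` encoding of the multisets are assumptions.

```python
from collections import Counter


def green_edge_count(k, l, A, B, C, D):
    """Number of green edges between chains a and b, following Table 1.

    A, B, C, D are Counters for d(a_1,b_1), d(a_1,b_l), d(a_k,b_1) and
    d(a_k,b_l); k and l are the chain lengths (both at least 1).
    """
    fives = A[5]
    if l >= 2:                # b_l differs from b_1
        fives += B[5]
    if k >= 2:                # a_k differs from a_1
        fives += C[5]
    if k >= 2 and l >= 2:
        fives += D[5]
    # A chain of length exactly 2 contributes one length-5 path that runs
    # through the chain itself, which must not be counted.
    return fives - int(k == 2) - int(l == 2)
```

Checking the rule against the table: for $k = 2, \ell > 2$ it returns $m_{A+B+C+D}(5) - 1$, and for $k = 1, \ell = 1$ it returns $m_A(5)$.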
We only insert green edges between chains that are adjacent, rather than between all chains that are distance-5 apart, to ensure that chains contained in different blobs are not connected in the CAG. Since we may assume that all leaves are contained in blobs, we note that two chains are adjacent and in the same blob if and only if they are connected by a red edge in the CAG. Note that there may be multiple edges between two vertices in a CAG (see Fig. 8). We now show how we can use the CAG to distinguish the configurations of pendant blobs from non-pendant blobs, and how it can be used to distinguish the remaining level-2 pendant blob structures.
Observe that every edge in the CAG corresponds to a distinct distance-4 or distance-5 path between a pair of chain endpoints. We say that this path in the network is covered by the edge of the CAG. In particular, we also say that the edges of the path of the
Fig. 8. Panels: (a) $(k,0,m,0)$. (b) $(k,\ell,m,0)$. (c) $(k,0,m,n)$. (d) $(k,\ell,m,n)$. Each subfigure shows a pendant level-2 blob together with its CAG directly below it. On each blob, $f$ denotes the non-trivial cut-edge. Each of the leaves $a, b, c, d$ can be replaced by a longer chain whilst keeping the same CAG. By Theorem 5.14, we have that the network contains one of the four pendant blobs if and only if the CAG (which can be obtained from the multisets of distances) is exactly the one in the same subfigure. In the CAG, the dashed lines represent the red edges and the solid lines represent the green edges. In (c), the green edge $cd$ in the CAG covers the dotted path between $c$ and $d$. (For interpretation of the colours in the figure, the reader is referred to the web version of this article.)
covered by more than one edge of the CAG. See Fig. 8(c) for an example of a distance-5
Theorem 5.14. (See Fig. 8.) Let $N$ be a level-2 network on $X$ with at least two blobs, in which all leaves are contained in blobs and no pendant blob is of the form $(k,0,0,0)$ or $(k,\ell,0,0)$. For $k, \ell, m, n \geq 1$, $N$ contains a pendant level-2 blob of the form
• $(k,0,m,0)$ if and only if there exist vertices $a$ and $c$ which form a blob in the CAG with 1 red edge and 2 green edges between them.
• $(k,\ell,m,0)$ if and only if there exist vertices $a$, $b$, and $c$ which form a blob in the CAG, where $a$ and $b$ are connected by 2 red edges and the other two pairs are connected by 1 red edge and 1 green edge.
• $(k,0,m,n)$ if and only if there exist vertices $a$, $c$, and $d$ which form a blob in the CAG, where every pair of vertices is connected by 1 red edge and 1 green edge.
• $(k,\ell,m,n)$ if and only if there exist vertices $a$, $b$, $c$, and $d$ which form a blob in the CAG, where every pair of vertices is connected by 1 red edge, and $a$ and $b$ are connected by an additional red edge.
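The four patterns of Theorem 5.14 can be matched against one blob of the CAG by comparing edge-count labels. The sketch below is illustrative only: the function name `classify_cag_blob` and the encoding (a dictionary from unordered vertex pairs to `(red, green)` counts, with unconnected pairs omitted) are assumptions. For the four-vertex pattern only the red counts are compared, since the statement above does not specify green edges for that case.

```python
from collections import Counter


def classify_cag_blob(edges):
    """Match one CAG blob against the four patterns of Theorem 5.14.

    edges maps frozenset({u, v}) to a (red, green) pair of edge counts.
    Returns the form of the pendant level-2 blob, or None if no pattern fits.
    """
    vertices = set().union(*edges)
    n = len(vertices)
    labels = Counter(edges.values())
    complete = len(edges) == n * (n - 1) // 2   # every pair is connected

    if n == 2 and labels == Counter({(1, 2): 1}):
        return "(k,0,m,0)"
    if n == 3 and complete and labels == Counter({(2, 0): 1, (1, 1): 2}):
        return "(k,l,m,0)"
    if n == 3 and complete and labels == Counter({(1, 1): 3}):
        return "(k,0,m,n)"
    reds = sorted(red for red, _ in edges.values())
    if n == 4 and complete and reds == [1, 1, 1, 1, 1, 2]:
        return "(k,l,m,n)"
    return None


# The CAG of Fig. 8(a): vertices a and c with 1 red and 2 green edges.
cag_a = {frozenset("ac"): (1, 2)}
# The CAG of Fig. 8(b): a and b adjacent twice, c adjacent once to each.
cag_b = {frozenset("ab"): (2, 0), frozenset("ac"): (1, 1), frozenset("bc"): (1, 1)}
```

Because each pattern on two or three vertices is determined up to relabelling by its multiset of edge labels, no explicit isomorphism search is needed in this sketch.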
Proof. All other possible pendant level-2 blobs are of the form $(k,0,0,0)$ or of the form $(k,\ell,0,0)$. The CAG of the blob of the form $(k,0,0,0)$ is the singleton graph; the CAG of the blob of the form $(k,\ell,0,0)$ is two vertices connected by 2 red edges. The CAG for either of these two pendant blobs is not the same as any of the CAGs for the four pendant blobs that we investigate here. Therefore we may distinguish the CAGs of the pendant level-2 blobs from one another.
Now we consider non-pendant level-2 blobs. First, if the blob contains no leaves then the CAG of such a blob is empty, so we are done. Hence, suppose that some non-pendant level-2 blob $B$ contains some leaves. Observe that $B$ can be obtained by introducing non-trivial cut-edges to one of the six possible level-2 pendant blobs.
Suppose first that $B$ can be obtained by introducing non-trivial cut-edges to a pendant blob of the form $(k,0,0,0)$. Then, $B$ contains one or more chains on one side of the blob, and the possible CAGs would be a path (or disjoint paths) of red edges that connect adjacent chains, or if it contains a green edge, two vertices that are connected by 1 red and 1 green edge. However, none of these CAGs correspond to that of the four pendant blobs we consider here.
Now suppose that $B$ can be obtained by introducing non-trivial cut-edges to a pendant blob of the form $(k,\ell,0,0)$. Then, $B$ contains one or more chains on two sides of the blob, and at least one non-trivial cut-edge on the third side. None of the edges in the CAG of $B$ will cover an edge of this third side, since all paths between chain endpoints that use this side will be of length at least 6. Therefore the only possible CAGs we can get on $B$ are a cycle or a path (or paths) of red edges, or two vertices connected by 1 red and 1 green edge.
Suppose now that $B$ can be obtained by introducing non-trivial cut-edges to one of the four remaining level-2 pendant blobs. Upon introducing non-trivial cut-edges to the