On cherry-picking and network containment



Janssen, Remie; Murakami, Yukihiro

DOI

10.1016/j.tcs.2020.12.031

Publication date

2021

Document Version

Final published version

Published in

Theoretical Computer Science

Citation (APA)

Janssen, R., & Murakami, Y. (2021). On cherry-picking and network containment. Theoretical Computer Science, 856, 121-150. https://doi.org/10.1016/j.tcs.2020.12.031

Important note

To cite this publication, please use the final published version (if applicable).

Please check the document version above.

Copyright

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Takedown policy

Please contact us and provide details if you believe this document breaches copyrights. We will remove access to the work immediately and investigate your claim.

This work is downloaded from Delft University of Technology.


Contents lists available at ScienceDirect

Theoretical Computer Science

www.elsevier.com/locate/tcs

On cherry-picking and network containment

Remie Janssen¹, Yukihiro Murakami¹

Delft Institute of Applied Mathematics, Delft University of Technology, Van Mourik Broekmanweg 6, 2628 XE, Delft, the Netherlands

Article info

Article history: Received 8 May 2020; received in revised form 5 October 2020; accepted 15 December 2020; available online 4 January 2021

Communicated by T. Calamoneri

Keywords: Phylogenetic networks; Network containment; Cherry-picking sequences; Tree containment; Linear-time algorithms; Cherry-picking networks

Abstract

Phylogenetic networks are used to represent evolutionary scenarios in biology and linguistics. To find the most probable scenario, it may be necessary to compare candidate networks. In particular, one needs to distinguish different networks and determine whether one network is contained in another. In this paper, we introduce cherry-picking networks, a class of networks that can be reduced by a so-called cherry-picking sequence. We then show how to compare such networks using their sequences. We characterize reconstructible cherry-picking networks, which are the networks that are uniquely determined by the sequences that reduce them, making them distinguishable. Furthermore, we show that a cherry-picking network is contained in another cherry-picking network if a sequence for the latter network reduces the former network, provided both networks can be reconstructed from their sequences in a similar way (i.e., they are in the same reconstructible class). Lastly, we show that the converse of the above statement holds for tree-child networks, thereby showing that Network Containment, the problem of checking whether a network is contained in another, can be solved by computing cherry-picking sequences in linear time for tree-child networks.

© 2020 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

1. Introduction

In the study of evolutionary histories of (biological) species and languages, directed graphs are used to represent different evolutionary scenarios. These graphs are most often assumed to be phylogenetic trees, that is, rooted trees whose leaves are labeled with the studied set of species. It is increasingly common to change this assumption and allow for (undirected) cycles in the scenarios as well. These directed graphs, called phylogenetic networks, then allow for the representation of reticulate evolutionary events such as hybridization and horizontal gene transfer [20,1].

Reconstruction of evolutionary histories is most often based on genetic data (i.e., DNA sequences). Although the evolutionary history of a set of species may be reticulate, small stretches of DNA (e.g., pieces of DNA coding for protein domains) still evolve mostly tree-like. The network representing the species' evolution must then contain the trees for such pieces of DNA. This leads to the following mathematical problem. For a given network N and a tree T on the same set of species, decide whether N contains T.

This problem, called Tree Containment, is NP-complete for general rooted phylogenetic networks [18]. Many have overcome this computational challenge by considering inputs of topologically restricted networks. It was shown initially that Tree Containment can be solved in polynomial time for non-binary normal networks, binary tree-child networks, and binary level-k networks [13]. Stronger results have been proven for genetically stable networks (quadratic time: Gambette et al. [8]), binary nearly-stable networks (linear time: Gambette et al. [9]), and for binary tree-child networks (linear time: Gunawan [10], Gambette et al. [9]).

This is an extended version of the conference paper Janssen and Murakami [16].

* Corresponding author.

E-mail addresses: R.Janssen-2@tudelft.nl (R. Janssen), Y.Murakami@tudelft.nl (Y. Murakami).

¹ Research funded by the Netherlands Organization for Scientific Research (NWO), with the Vidi grant 639.072.602.

https://doi.org/10.1016/j.tcs.2020.12.031

0304-3975/© 2020 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

From a biological and a computational perspective, there is no reason to restrict to inputs of a tree and a network. Indeed, while small stretches of DNA evolve tree-like, it is possible for larger parts of the genome to evolve as a network. In such instances, it is of great interest to consider a more general version of Tree Containment introduced in [16] called Network Containment: for given networks N and N′ on the same set of species, decide whether N contains N′.

Computationally, it is natural to wonder whether we can also solve Network Containment efficiently in all network classes where Tree Containment can be solved efficiently. This question was first posed in the conference paper [16], of which this paper is an extended version. In that conference paper, the focus was restricted to semi-binary tree-child networks and all proofs were omitted. In this paper, we do include all proofs, and we broaden the scope to include non-binary networks in the class of so-called cherry-picking networks, which contains the class of tree-child networks.

Like in our conference paper, we study the Network Containment problem in the light of cherry-picking sequences. These sequences were developed to tackle variations on the Hybridization problem, of finding a "simple" network that contains a given set of trees [12,5,19,3]. Two leaves of a tree form a cherry if they share a common parent; by successively 'picking' cherries (removing one of the leaves in a cherry) from the set of input trees, we obtain a sequence of cherries that ultimately reduces each input tree to a tree on a single leaf. This sequence of cherries then corresponds to some network that contains the set of all input trees.
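The following Python sketch illustrates this picking process on a single rooted tree. It is a minimal illustration under our own assumptions, not the procedure of [12,5,19,3]: the tree is stored as a hypothetical child-to-parent dictionary whose root (named `rho` here) has outdegree 1, the cherry whose labels sort first is picked at each step, and each pick is recorded as an ordered pair (x, y) in which x is the leaf that is removed.

```python
from collections import defaultdict


def cherry_picking_sequence(parent):
    """Pick cherries from a rooted tree until a single leaf is left.

    Assumed input (hypothetical format): `parent` maps every non-root node
    to its parent; the root has outdegree 1, as in a phylogenetic tree.
    """
    parent = dict(parent)  # work on a copy
    seq = []
    while True:
        children = defaultdict(list)
        for child, p in parent.items():
            children[p].append(child)
        leaves = sorted(c for c in parent if c not in children)
        if len(leaves) <= 1:
            return seq
        # A cherry: two leaves with a common parent (smallest labels first).
        x, y = next((a, b) for a in leaves for b in leaves
                    if a != b and parent[a] == parent[b])
        seq.append((x, y))
        p = parent.pop(x)  # pick the cherry: delete the leaf x
        remaining = [c for c, q in parent.items() if q == p]
        if len(remaining) == 1 and p in parent:
            # p now has indegree 1 and outdegree 1: suppress it
            parent[remaining[0]] = parent.pop(p)


tree = {"u": "rho", "v": "u", "c": "u", "a": "v", "b": "v"}  # the tree ((a,b),c)
print(cherry_picking_sequence(tree))  # [('a', 'b'), ('b', 'c')]
```

Ties are broken by label order only to make the output deterministic; any cherry could be picked at each step.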

As cherry-picking sequences were introduced to study Hybridization, actions of cherry-picking sequences were defined only on trees, and not on networks. In this paper, we fill this gap by defining the action of a cherry-picking sequence on a (non-binary) network. This leads to the introduction of the class of cherry-picking networks: networks that can be reduced to a single leaf by a cherry-picking sequence. We briefly mentioned this class in the discussion of the previous version of this paper [15]. The class for binary networks was also introduced independently by Erdős et al. [6] (where it was called the class of orchard networks). One of the main results of this paper, that cherry-picking networks may be reduced in any order (Theorem 1), was also shown by Erdős et al. [6]. The focus of their paper and this paper is rather different (they focus on reconstructing networks from so-called ancestral profiles), and therefore both papers have independently come across the same network class via different approaches.

We investigate the correspondence between cherry-picking networks and the sequences that reduce them, and ultimately show that within a particular reconstructible class, cherry-picking networks are characterized uniquely by their smallest cherry-picking sequences (Theorem 2). These reconstructible classes are based on several constructions used to obtain a network from a cherry-picking sequence. Although Linz and Semple [19] and Döcker et al. [5] mention how one can construct some network from a cherry-picking sequence, no further characterizations for the type of networks that can be obtained from such sequences have been investigated. Here, we reintroduce these constructions as a way to reverse the reduction. As multiple different networks can be reduced by the same cherry-picking sequence, this reversal cannot be unique. Hence, we investigate several constructions corresponding to the different reversals of the reductions. Each construction then leads to a reconstructible class of networks, for which we can prove relations between containment and reduction by cherry-picking sequences.

These relations depend on a careful definition of what it means for a network to be a subnetwork of and to be contained in another network. Roughly speaking, a network N′ is a subnetwork of another network N if N′ can be obtained from N by deleting reticulation edges and suppressing degree-2 vertices. A network N′ is contained in another network N if N′ can be obtained from N by deleting reticulation edges, suppressing degree-2 vertices, and contracting edges. We show that within particular classes of cherry-picking networks, if a sequence for a network N reduces another network N′, then N′ is contained in N (Lemma 15). Unfortunately the converse does not hold (Theorem 4), unless the two input networks are tree-child.

It turns out that the class of tree-child networks is contained in the class of cherry-picking networks, as each tree-child network has a special type of cherry-picking sequence (a tree-child sequence) that reduces it. We examine how these sequences can be used to solve Network Containment for tree-child networks. In particular, within some cherry-picking network classes, we show that a tree-child sequence for a tree-child network N reduces another tree-child network N′ if and only if N′ is contained in N (Theorem 5). Following this, we provide a linear-time algorithm for Network Containment for inputs of tree-child networks (Algorithm 6).

Structure of the paper. In Section 2, we recall all relevant definitions and outline how to construct networks from cherry-picking sequences. In Section 3, we investigate properties of cherry-picking networks, and show that the order in which cherries are picked does not matter (Theorem 1). Furthermore, we show that networks are uniquely characterized by a particular minimal cherry-picking sequence that reduces them, given an order on the set of species. In Section 4, we show that if a sequence for a network N reduces another network N′, then N′ is contained in N (Lemma 14). We also give a counter-example showing why the converse does not hold (Theorem 4). In Section 5, we restrict our attention to tree-child networks. We show that a sequence for a tree-child network N reduces another network N′ if and only if N′ is contained in N (Theorem 5). In Section 6, we use this characterization in an algorithm for Network Containment for tree-child networks, and show that its running time is linear. We also show that, by defining an ordering on the leaves, it is possible to check whether two cherry-picking networks are isomorphic in polynomial time (Theorem 8). In Section 7, we conclude with open problems and future directions for the use of cherry-picking strategies.

2. Preliminaries

Definition 1. A phylogenetic network $N$ on a non-empty taxa set $X$ is a directed acyclic multigraph with one root (the outdegree-1 source), a set $L(N)$ of leaves (indegree-1 sinks) bijectively labeled with $X$, and all other nodes are either tree nodes (indegree-1, outdegree at least 2) or reticulations (indegree at least 2, outdegree-1).

A phylogenetic network is semi-binary if each tree node has outdegree 2, and it is binary if it is semi-binary and each reticulation has indegree 2. A phylogenetic tree is a phylogenetic network with no reticulations.
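As a small illustration of Definition 1 and the degree conditions above, the Python sketch below classifies the nodes of a network by their in- and outdegree and tests the semi-binary and binary conditions. It assumes (our choice, not the paper's) that a network is given as a list of (tail, head) tuples, so that parallel edges appear as repeated tuples; acyclicity and the leaf labelling are not checked, and the function names are hypothetical.

```python
from collections import Counter


def node_types(edges):
    """Classify the nodes of a network given as a list of (tail, head) edges."""
    indeg = Counter(h for _, h in edges)
    outdeg = Counter(t for t, _ in edges)
    types = {}
    for v in set(indeg) | set(outdeg):
        i, o = indeg[v], outdeg[v]
        if i == 0 and o == 1:
            types[v] = "root"
        elif i == 1 and o == 0:
            types[v] = "leaf"
        elif i == 1 and o >= 2:
            types[v] = "tree"
        elif i >= 2 and o == 1:
            types[v] = "reticulation"
        else:
            raise ValueError(f"{v!r}: indegree {i}, outdegree {o}")
    if list(types.values()).count("root") != 1:
        raise ValueError("a network has exactly one root")
    return types


def is_semi_binary(edges):
    """Every tree node has outdegree 2."""
    outdeg = Counter(t for t, _ in edges)
    return all(outdeg[v] == 2 for v, t in node_types(edges).items() if t == "tree")


def is_binary(edges):
    """Semi-binary, and every reticulation has indegree 2."""
    indeg = Counter(h for _, h in edges)
    return is_semi_binary(edges) and all(
        indeg[v] == 2 for v, t in node_types(edges).items() if t == "reticulation")


# Reticulation r has indegree 3: semi-binary but not binary.
example = [("rho", "t1"), ("t1", "t2"), ("t1", "r"), ("t2", "r"),
           ("t2", "t3"), ("t3", "r"), ("t3", "a"), ("r", "b")]
print(is_semi_binary(example), is_binary(example))  # True False
```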

Note that phylogenetic networks are generally defined as simple directed acyclic graphs to avoid parallel edges. We allow for parallel edges in some of our networks later on in the paper, so we define them as multigraphs.

In the rest of this paper, we drop the 'phylogenetic' term as each network in this paper is a phylogenetic network. To make the assumptions on the degrees of the network nodes clear, we always mention in the statement of a claim whether a network has to be binary, semi-binary, or there are no assumptions on the degrees. In the last case, we call the network non-binary even though it may be semi-binary or even binary. The following definition gives us a way of relating two networks that are not of the same nature.

Definition 2. Let $N$ and $N'$ be non-binary networks on $X$. Then $N'$ is a refinement of $N$ if $N$ can be obtained from $N'$ by contracting some edges.

The counterpart to contracting edges is refining vertices (the term splitting vertices is more common in graph theory; however, we use refining to stay in line with the refinement definition for networks). Let N be a network, and let w be a tree vertex of degree at least 4 in N. We refine w by replacing it by an edge uv where the parent of w in N is a parent of u in the new network, some of the outgoing edges of w in N are incident to u, and the rest of the outgoing edges of w are incident to v. Similarly, if r is a reticulation of degree at least 4 in N, we refine r by replacing it by an edge uv where the child of r in N is a child of v in the new network, some of the incoming edges of r in N are incident to u, and the rest of the incoming edges of r are incident to v. Observe that refining a tree vertex returns a network with one extra tree vertex; refining a reticulation returns a network with one extra reticulation. Contracting the newly introduced edge upon refining a vertex returns the original network.
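The refinement of a tree vertex, and the contraction that undoes it, can be sketched in Python as follows. This is an illustration under stated assumptions: networks are lists of (tail, head) tuples, the new vertex names `u` and `v` are hypothetical and assumed to be unused, and only tree vertices are handled (refining a reticulation is symmetric). The check that u keeps at least one child of w and v receives at least two is our addition; it ensures both new vertices are tree vertices.

```python
def refine_tree_vertex(edges, w, to_u, u="u", v="v"):
    """Replace tree vertex w by a new edge (u, v).

    The parent of w becomes the parent of u; outgoing edges of w whose head
    is in `to_u` start at u, all other outgoing edges of w start at v.
    """
    out = [h for t, h in edges if t == w]
    n_u = sum(h in to_u for h in out)
    if n_u < 1 or len(out) - n_u < 2:
        raise ValueError("choose at least one child for u and two for v")
    refined = []
    for t, h in edges:
        if h == w:
            refined.append((t, u))  # incoming edge of w now enters u
        elif t == w:
            refined.append((u if h in to_u else v, h))
        else:
            refined.append((t, h))
    return refined + [(u, v)]


def contract(edges, a, b):
    """Contract one copy of the edge (a, b) by merging b into a."""
    rest = list(edges)
    rest.remove((a, b))
    return [(a if t == b else t, a if h == b else h) for t, h in rest]


star = [("rho", "w"), ("w", "a"), ("w", "b"), ("w", "c")]
refined = refine_tree_vertex(star, "w", {"a"})
print(sorted(contract(refined, "u", "v")))
# [('rho', 'u'), ('u', 'a'), ('u', 'b'), ('u', 'c')]
```

Up to renaming u back to w, the contraction returns the original network, as stated above.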

An edge feeding into a reticulation is called a reticulation edge. Given an edge $uv$ in $N$, we say that $u$ is a parent of $v$ and that $v$ is a child of $u$. The node $u$ is above $v$, or $v$ is below $u$, if there is a directed path from $u$ to $v$ in $N$. We also call $u$ and $v$ the tail and head of the edge $uv$, respectively. We say that $u$ is a lowest common ancestor (LCA) of two vertices $x$ and $y$ if $u \notin \{x, y\}$, $u$ is above $x$ and $y$, and no other vertex below $u$ has this property. Within trees, the LCA of any two vertices is unique; this property does not hold in networks.

We say that an edge is binary if the head of the edge is a degree-3 vertex. We call an edge $uv$ an rr-edge if $u$ and $v$ are both reticulations, a tr-edge if $u$ is a tree vertex and $v$ a reticulation, an rt-edge if $u$ is a reticulation and $v$ a tree vertex, and a tt-edge if $u$ and $v$ are both tree vertices. We call a directed path $u_1 u_2 \dots u_n$ an rr-path if $u_i$ is a reticulation for all $i \in [n] = \{1, \dots, n\}$, an rtr-path if $u_1$ and $u_n$ are reticulations and $u_i$ is a tree vertex for at least one $2 \le i \le n-1$, a trt-path if $u_1$ and $u_n$ are tree vertices and $u_i$ is a reticulation for at least one $2 \le i \le n-1$, and a tt-path if $u_i$ is a tree vertex for all $i \in [n]$.

A network N is stack-free if N contains no rr-edges. A network N is tree-child if it is stack-free and every tree node in N is a parent of a tree node or a leaf. This property inherently implies that every vertex in a tree-child network has a path to a leaf consisting only of tree vertices. The reticulation number of a network is the total number of reticulation edges minus the total number of reticulations.
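These edge-based notions can be checked mechanically. The Python sketch below assumes networks are lists of (tail, head) tuples and uses hypothetical function names; it tests stack-freeness, the tree-child property and the reticulation number, relying on the fact that in a network a non-root vertex of indegree 1 is a tree node or a leaf.

```python
from collections import Counter

# Assumed representation (ours): a network is a list of (tail, head) edges.


def _degrees(edges):
    return Counter(h for _, h in edges), Counter(t for t, _ in edges)


def is_stack_free(edges):
    """True if there is no rr-edge (tail and head both reticulations)."""
    indeg, _ = _degrees(edges)
    return not any(indeg[u] >= 2 and indeg[v] >= 2 for u, v in edges)


def is_tree_child(edges):
    """Stack-free, and every tree node has a child that is a tree node or a leaf."""
    indeg, outdeg = _degrees(edges)
    tree_nodes = {t for t, _ in edges if indeg[t] == 1 and outdeg[t] >= 2}
    # In a network, a child with indegree 1 is a tree node or a leaf.
    return is_stack_free(edges) and all(
        any(indeg[h] == 1 for t, h in edges if t == u) for u in tree_nodes)


def reticulation_number(edges):
    """Number of reticulation edges minus number of reticulations."""
    indeg, _ = _degrees(edges)
    reticulations = [v for v in indeg if indeg[v] >= 2]
    return sum(indeg[v] for v in reticulations) - len(reticulations)


# t1 and t2 both have only reticulation children, so this is not tree-child.
example = [("rho", "s"), ("s", "t1"), ("s", "t2"), ("t1", "r1"), ("t2", "r1"),
           ("t1", "r2"), ("t2", "r2"), ("r1", "x"), ("r2", "y")]
print(is_stack_free(example), is_tree_child(example), reticulation_number(example))
# True False 2
```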

Finally, we introduce the idea of adding vertices to a network. Let x be a leaf in a network with a parent vertex p. Adding a vertex q directly above x is the action of deleting the edge px and adding two edges pq and qx. Upon adding a vertex to a network, the graph is no longer a network since there is a vertex of indegree-1 and outdegree-1. We note here that adding a vertex to a network is a technical operation that is always succeeded by another graph operation, in which we add an edge incident to the recently added indegree-1 and outdegree-1 vertex. While the intermediate graphs are sometimes not networks, this ensures that the resulting graph obtained from our graph operations is a network.

2.1. Cherry-picking sequences

In this subsection, we introduce cherry-picking sequences and their action on networks. This starts with definitions of specific structures within the networks called cherries and reticulated cherries. We define what it means to reduce such structures from networks, and show that reversing such reductions (called adding pairs to networks) can be done in many ways. We show that these additions can be applied to some sequence of ordered pairs of leaves to obtain a network. We impose conditions on the sequences to ensure that these additions are well-defined, and, in doing so, we formally define cherry-picking sequences and cherry-picking networks. See Fig. 2 for an illustration of the terms defined in this subsection.


2.1.1. Reducible pairs

Definition 3. Let $(x, y)$ be an ordered pair of leaves in a non-binary network $N$, and let $p_x, p_y$ denote the parents of $x, y$, respectively. We call $(x, y)$ a cherry if $p_x = p_y$, that is, if $x$ and $y$ share a common parent. We call $(x, y)$ a reticulated cherry if $p_x$ is a reticulation, $p_y$ is a tree vertex, and $p_y$ is a parent of $p_x$. If $(x, y)$ is a cherry or a reticulated cherry in $N$, we say $(x, y)$ is a reducible pair.
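Definition 3 translates directly into a test on the parents of the two leaves. This Python sketch assumes networks stored as lists of (tail, head) tuples (our representation, not the paper's) and uses hypothetical function names. Note that the order of the pair matters: in the example, (c, a) is a reticulated cherry while (a, c) is not a reducible pair.

```python
from collections import Counter


def parent_of_leaf(edges, leaf):
    (p,) = [t for t, h in edges if h == leaf]  # a leaf has exactly one parent
    return p


def pair_type(edges, x, y):
    """Return 'cherry', 'reticulated cherry', or None for the ordered pair (x, y)."""
    indeg = Counter(h for _, h in edges)
    outdeg = Counter(t for t, _ in edges)
    px, py = parent_of_leaf(edges, x), parent_of_leaf(edges, y)
    if px == py:
        return "cherry"
    px_is_reticulation = indeg[px] >= 2
    py_is_tree_vertex = indeg[py] == 1 and outdeg[py] >= 2
    if px_is_reticulation and py_is_tree_vertex and (py, px) in edges:
        return "reticulated cherry"
    return None


# Binary tree-child example: reticulation r (parents t1, t2) above leaf c.
example = [("rho", "s"), ("s", "t1"), ("s", "t2"), ("t1", "r"), ("t1", "a"),
           ("t2", "r"), ("t2", "b"), ("r", "c")]
print(pair_type(example, "c", "a"), pair_type(example, "a", "c"))
# reticulated cherry None
```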

We may reduce cherries and reticulated cherries from a network to obtain a network of smaller size.

Definition 4. Let $N$ be a network and let $(x, y)$ be an ordered pair of leaves. Reducing $(x, y)$ in $N$ is the action of

• deleting $x$ and suppressing degree-2 nodes in $N$, if $(x, y)$ is a cherry in $N$;

• deleting the reticulation edge between the parents of $x$ and $y$ and subsequently suppressing degree-2 nodes, if $(x, y)$ is a reticulated cherry in $N$;

• doing nothing to $N$ otherwise.

In all cases, the resulting network is denoted $N_{(x,y)}$. We sometimes refer to this as picking a cherry or pair $(x, y)$ from $N$. We say that an ordered pair $(x, y)$ affects $N$ if $N \neq N_{(x,y)}$.

Given a network $N$ and a sequence of ordered pairs $S$, we denote by $N_S$ the network obtained by repeatedly reducing $N$ with each element of $S$ in order. We say that $S$ reduces $N$ if $N_S$ is a network with a single leaf (for any leaf in $N$), a root, and no other vertices. In particular, we call these networks single-leaf networks. We say that a sequence of ordered pairs $S$ affects $N$ if every element of $S$ affects $N$ when applied successively.
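A minimal Python sketch of Definition 4, under the same kind of assumptions (networks as lists of (tail, head) tuples, hypothetical function names, root named `rho` in the example): `reduce_pair` computes $N_{(x,y)}$, `reduce_sequence` computes $N_S$, and `reduces` tests whether $N_S$ is a single-leaf network. Pairs that are not reducible, including pairs whose entries are no longer leaves, leave the network unchanged.

```python
from collections import Counter

# Assumed representation (not the paper's): a network is a list of
# (tail, head) edges; repeated tuples are parallel edges.


def _suppress(edges):
    """Repeatedly suppress vertices with indegree 1 and outdegree 1."""
    while True:
        indeg = Counter(h for _, h in edges)
        outdeg = Counter(t for t, _ in edges)
        v = next((w for w in outdeg if indeg[w] == 1 and outdeg[w] == 1), None)
        if v is None:
            return edges
        (a,) = [t for t, h in edges if h == v]
        (b,) = [h for t, h in edges if t == v]
        edges = [e for e in edges if v not in e] + [(a, b)]


def reduce_pair(edges, x, y):
    """Return N_(x,y); N is returned unchanged if (x, y) is not reducible."""
    edges = list(edges)
    leaves = {h for _, h in edges} - {t for t, _ in edges}
    if x not in leaves or y not in leaves:
        return edges
    indeg = Counter(h for _, h in edges)
    outdeg = Counter(t for t, _ in edges)
    (px,) = [t for t, h in edges if h == x]
    (py,) = [t for t, h in edges if h == y]
    if px == py:
        edges.remove((px, x))  # cherry: delete the leaf x
    elif (indeg[px] >= 2 and indeg[py] == 1 and outdeg[py] >= 2
          and (py, px) in edges):
        edges.remove((py, px))  # reticulated cherry: delete the reticulation edge
    else:
        return edges  # not a reducible pair: do nothing
    return _suppress(edges)


def reduce_sequence(edges, seq):
    """N_S: reduce N by the pairs of S in order."""
    for x, y in seq:
        edges = reduce_pair(edges, x, y)
    return edges


def reduces(edges, seq):
    """True if N_S is a single-leaf network (one edge, root to leaf)."""
    return len(reduce_sequence(edges, seq)) == 1


# Binary tree-child example: reticulation r (parents t1, t2) above leaf c.
B = [("rho", "s"), ("s", "t1"), ("s", "t2"), ("t1", "r"), ("t1", "a"),
     ("t2", "r"), ("t2", "b"), ("r", "c")]
print(reduce_sequence(B, [("c", "a"), ("c", "b"), ("b", "a")]))  # [('rho', 'a')]
```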

2.1.2. Adding pairs to networks

As each reduction makes a simple change to a network, it is natural to attempt to reverse this change. Such reversals can be done by adding a leaf to obtain a new cherry in the network, or by adding a reticulation edge to create a new reticulated cherry. If the reduction involved the pair $(x, y)$, then we call the reverse action adding $(x, y)$ to the network. Since we allow for non-binary networks, it is possible to reduce reticulated cherries with a multi-reticulation (a reticulation with indegree at least 3). Because of this, there may not be a unique way to add the reticulation edge back: we have the option of choosing an existing reticulation vertex or a newly created reticulation vertex as the head of this reticulation edge.

A similar observation can be made for tree nodes. Just like multi-reticulations, reductions may pick cherries or reticulated cherries that contain multifurcations (tree nodes of outdegree more than 2). Here, we have the option of choosing an existing tree vertex or a newly created tree vertex as the tail of the inserted edge.

With this in mind, there are 6 ways of adding $(x, y)$ to a network: 2 ways of adding cherries and 4 ways of adding reticulated cherries.

Definition 5. Let $N$ be a non-binary network with a reducible pair $(x, y)$. Let $p_x$ and $p_y$ denote the parents of $x$ and $y$ in $N_{(x,y)}$, respectively (note that $x$ and $p_x$ may not be nodes in $N_{(x,y)}$ if $(x, y)$ is a cherry in $N$). Then we may add $(x, y)$ to $N_{(x,y)}$ to obtain $N$ by using one of the following six constructions (see Fig. 1):

1. If $x$ is not a leaf in $N_{(x,y)}$ (i.e., if $(x, y)$ is a cherry in $N$), then add a labeled node $x$, add a node $q$ directly above $y$, and add an edge $qx$.

(a) Do not contract any edges; or

(b) If $p_y$ is a tree node, then contract $p_y q$.

2. If $x$ is a leaf in $N_{(x,y)}$ (i.e., if $(x, y)$ is a reticulated cherry in $N$), then add nodes $p, q$ directly above $x, y$, respectively, and add an edge $qp$.

(a) Do not contract any edges;

(b) If $p_x$ is a reticulation, then contract $p_x p$;

(c) If $p_y$ is a tree vertex, then contract $p_y q$; or

(d) If $p_x$ is a reticulation, contract $p_x p$; and, if $p_y$ is a tree vertex, contract $p_y q$.

Since all tree vertices have indegree-1 and all reticulations have outdegree-1, there are no other ways of adding a reducible pair to a network than the six ways mentioned above. Note that the constructions 1b, 2b, 2c, and 2d may only be used if the 'if' conditions are satisfied. Also note that the above actions are only well-defined if $y$ is a leaf in $N_{(x,y)}$. In the setting of Definition 5, this is not an issue: since we assume that $(x, y)$ is a reducible pair of $N$, it is indeed the case that $y$ is a leaf in $N_{(x,y)}$.
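 
To illustrate Definition 5, the following Python sketch adds pairs to a network using only constructions 1a and 2a, which never contract edges, iterating backwards over a sequence of ordered pairs as described in the next paragraph of the text. Networks are lists of (tail, head) tuples, new vertices receive hypothetical names v1, v2, and so on, and the root is named `rho`; these are assumptions of the sketch, not the paper's notation.

```python
from itertools import count


def _insert_above(edges, leaf, name):
    """Add vertex `name` directly above `leaf`: p->leaf becomes p->name->leaf."""
    (p,) = [t for t, h in edges if h == leaf]
    edges.remove((p, leaf))
    edges.extend([(p, name), (name, leaf)])


def network_from_sequence_1a_2a(seq, root="rho"):
    """Add the pairs of a non-empty sequence back to front with constructions 1a/2a."""
    fresh = (f"v{i}" for i in count(1))  # hypothetical names for new vertices
    edges = [(root, seq[-1][1])]         # single-leaf network on the last y
    for x, y in reversed(seq):
        if x == y:
            raise ValueError("pairs must consist of distinct leaves")
        leaves = {h for _, h in edges} - {t for t, _ in edges}
        if y not in leaves:
            raise ValueError(f"{y!r} is not a leaf yet, so the pair cannot be added")
        if x not in leaves:              # construction 1a: new cherry (x, y)
            q = next(fresh)
            _insert_above(edges, y, q)
            edges.append((q, x))
        else:                            # construction 2a: new reticulated cherry (x, y)
            p, q = next(fresh), next(fresh)
            _insert_above(edges, x, p)
            _insert_above(edges, y, q)
            edges.append((q, p))
    return edges


print(network_from_sequence_1a_2a([("c", "a"), ("c", "b"), ("b", "a")]))
```

For the sequence (c, a)(c, b)(b, a) this produces a binary network whose only reticulation, v3, lies above c and has parents v2 and v4. A sequence such as (a, b)(c, d) is rejected, because b is not yet a leaf when (a, b) is added.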


Fig. 1. The six different constructions for adding a reducible pair $(x, y)$ to a network $N_{(x,y)}$, as in Definition 5. The left and the right boxes show how cherries and reticulated cherries can be added, respectively: the top subgraphs show the part of the network $N_{(x,y)}$ that will be changed by adding $(x, y)$; the bottom subgraphs show the corresponding parts of $N$ for the different constructions. There are more cases, for example when adding a cherry (left box) and the parent of $y$ is a reticulation. In such cases, however, the constructions are the same for both 1a and 1b. Similarly, there are more cases for when adding a reticulated cherry. We only depict these examples, as they showcase each of the different constructions.

Fig. 2. A binary cherry-picking network $N$ reduced to leaf 4 by the CPS $S$. The reduction is shown as a sequence of networks $N_{S[:i]}$ for $i = 0, 1, \dots, 5$ from left to right, in which an element of $S$ is applied to the network successively. This sequence is minimal for the network, as every element of the sequence reduces either a cherry or a reticulated cherry of the network. An example of a cherry (3, 4) can be seen in the network $N_{S[:2]}$, and a reticulated cherry (2, 1) can be seen in the network $N$. The reduction of both reducible pairs is carried out as in Definition 4. Observe that this sequence is not a tree-child sequence, as the element 2 appears as a first coordinate in $S_1$ and as a second coordinate in $S_2$. Constructing a network from a sequence $S$ in the class (1a, 2a) can be seen by moving through the six networks in reverse order (from right to left).

On the other hand, if we were to start with any sequence of ordered pairs and sought to construct a network by successively adding ordered pairs backwards through the sequence, the story would be a little different. That is, we may come across a case where, upon trying to add a reducible pair $(x, y)$ to a network, $y$ does not already exist in the network as a leaf. Let $S = S_1 S_2 \dots S_{|S|} = (x_1, y_1)(x_2, y_2) \dots (x_{|S|}, y_{|S|})$ be a sequence of ordered pairs. Starting with a network on a single leaf $y_{|S|}$, we may iteratively add $S_i$ to the network for $i = |S|, |S|-1, \dots, 1$ (i.e., backwards through the sequence $S$), choosing a suitable construction for each ordered pair, to obtain some network. We call this a network obtained from $S$. Now, if $y_i$ was not a leaf in the network when adding $S_i$, then such a construction would not be well-defined. Fortunately, we can fix this by imposing a simple condition on the sequences. This motivates the following definition.

Definition 6. A cherry-picking sequence (CPS) on a set X is a sequence of ordered pairs on distinct elements from X, such that the second coordinate of each ordered pair occurs as a first coordinate in some ordered pair in the rest of the sequence, or as the second coordinate of the last pair.
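Definition 6 can be checked in a single backward pass over the sequence, keeping the set of first coordinates of the pairs that come later. The Python sketch below is one way to do this (the function name is hypothetical); it does not check that the elements belong to a given set X.

```python
def is_cps(seq):
    """Check the condition of Definition 6 for a list of ordered pairs (x, y)."""
    if not seq:
        return True  # the empty sequence
    last_y = seq[-1][1]
    later_firsts = set()  # first coordinates of the pairs after the current one
    for x, y in reversed(seq):
        if x == y:
            return False  # pairs must be on distinct elements
        if y != last_y and y not in later_firsts:
            return False
        later_firsts.add(x)
    return True


print(is_cps([("c", "a"), ("c", "b"), ("b", "a")]), is_cps([("a", "b"), ("c", "d")]))
# True False
```

In the second sequence, b never appears as a first coordinate after (a, b) and is not the second coordinate of the last pair, so a network cannot be built from it.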

Returning to the example that we had before, we observe that if $S$ is a CPS, then $y_i$ must already have been a leaf in the network when adding $S_i = (x_i, y_i)$. By definition of CPSs, $y_i$ appears as a first coordinate in some ordered pair that appears after $S_i$, or $y_i$ appears as the second coordinate of the final ordered pair. Since the pairs after $S_i$ are added before $S_i$, adding a pair $(x_j, y_j)$ always leaves $x_j$ as a leaf, and no construction removes a leaf, the network contains the leaf $y_i$ in both cases when adding $S_i = (x_i, y_i)$. Therefore, this construction is well-defined, and we can always obtain a network from a CPS. This brings us to the definition of a cherry-picking network.

Definition 7. A network on X is a cherry-picking network (CPN) if it can be obtained from some CPS S. Equivalently, a CPN is a network that can be reduced by some CPS.

See Fig. 2 for an example of a CPN with a CPS that reduces it. See Fig. 3 for examples of networks that are not CPNs. Note that single-leaf networks are also CPNs, since these can be reduced by the empty CPS. By definition, a CPN with at least two leaves contains either a cherry or a reticulated cherry. Intuitively, reducing these structures returns a network of smaller size that is a CPN; we may repeatedly reduce cherries and reticulated cherries until the network has been reduced.

Fig. 3. A CPN $N_1$ and two non-CPN networks $N_2$ and $N_3$. We know from Fig. 2 that $N_1$ is a CPN. $N_2$ contains a cherry (2, 3); upon reducing (2, 3), the resulting network does not contain a reducible pair because the 'crown' structure (in gray) prevents both cherries and reticulated cherries. $N_3$ contains a reticulated cherry (3, 4); upon reducing (3, 4), the resulting network does not contain a reducible pair because the gray structure prevents 2 from being picked as a cherry or a reticulated cherry.

A subsequence of a CPS refers to any sequence of ordered pairs that can be obtained by deleting some elements from the CPS. Note that a subsequence need not be a CPS. In what follows, we will often have to reduce a network by a subsequence of a CPS. These subsequences are most often the initial parts of the sequence, and hence we introduce notation for them. Let $S = (x_1, y_1)(x_2, y_2) \dots (x_n, y_n)$ be a CPS. For $i \in [n]$, we use the following notations to denote some subsequences of $S$. The $i$th ordered pair of $S$ is $S_i = (x_i, y_i)$. The first $i$ ordered pairs in $S$ are denoted by $S[:i] = (x_1, y_1) \dots (x_i, y_i)$. The subsequence of $S$ without the first $i$ ordered pairs is denoted by $S[i+1:] = (x_{i+1}, y_{i+1})(x_{i+2}, y_{i+2}) \dots (x_n, y_n)$. We let $S[:0]$ denote the empty sequence.
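Because the text counts pairs from 1, its slice notation lines up with Python's in one place and not in the other: $S[:i]$ is exactly Python's `S[:i]`, but the text's $S[i+1:]$ (the sequence without its first $i$ pairs) is Python's `S[i:]`. The helper names in this short sketch are hypothetical.

```python
def pair(S, i):
    """S_i in the text: the i-th ordered pair, counted from 1."""
    return S[i - 1]


def prefix(S, i):
    """S[:i] in the text; prefix(S, 0) is the empty sequence S[:0]."""
    return S[:i]


def drop_first(S, i):
    """S[i+1:] in the text: S without its first i ordered pairs."""
    return S[i:]


S = [("c", "a"), ("c", "b"), ("b", "a")]
print(pair(S, 1), prefix(S, 2), drop_first(S, 2))
# ('c', 'a') [('c', 'a'), ('c', 'b')] [('b', 'a')]
```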

A CPS $S$ is minimal for a CPN $N$ if $S$ reduces $N$ and each ordered pair $S_i$ of $S$ affects $N_{S[:i-1]}$ for all $i \in [|S|]$. In other words, $N_S$ is a network on a single leaf, and $N_{S[:i-1]} \neq N_{S[:i]}$ for all $i \in [|S|]$. We often write a CPS of/for a network $N$ to refer to a minimal CPS for $N$.

A partial CPS $S'$ of length $i$ is a sequence of ordered pairs such that there exists a CPS $S$ where $S[:i] = S'$. If $S$ and $S'$ are partial CPSs and $N$ is a non-binary network, then applying $S$ and then $S'$ is the same as appending $S'$ to $S$, denoted $SS'$, and applying the whole sequence. In notation, we write $(N_S)_{S'} = N_{(SS')}$, and hence we denote this network without brackets as $N_{SS'}$.

Observation 1. Let $N$ be a non-binary CPN that can be reduced by a CPS $S$. Then the network $N_{S[:i]}$ is a CPN for all $i = 1, \dots, |S|$.

By choosing a suitable construction, we may obtain a CPN from any of its minimal CPSs.

Observation 2. Every non-binary CPN can be obtained from a minimal CPS that reduces it.

2.2. CPN classes

Using different combinations of the six constructions from Definition 5 can yield different CPNs from the same CPS. These differences could be due to the nature of the network vertex degrees (binary, semi-binary, non-binary) or due to their topological features (stack-free, tree-child). One way of categorizing these CPNs is to choose and stay consistent with one particular construction for adding cherries and reticulated cherries to networks. That is, we construct networks from CPSs with a chosen construction A to add cherries and a chosen construction B to add reticulated cherries.

There are two motivations to do so. Firstly, this categorizes CPNs into classes defined by their topological restrictions. We may specify classes of CPNs that contain only binary networks, those that contain only semi-binary networks, those without stacks, and many more. Secondly, and more importantly, we can introduce some notion of a correspondence between CPNs and minimal CPSs that reduce them. Within some CPN classes, it turns out that if a sequence is minimal for two networks, then the networks must be isomorphic.

Definition 8. Let A and B be a cherry construction and a reticulated cherry construction, respectively. We let (A, B) denote the class of all CPNs that can be obtained from CPSs by using the suitable constructions A or B.

Within the CPN class (A, B), we say that we use the (A, B)-construction to obtain CPNs from CPSs. We write $N \in (A, B)$ or say that $N$ is an (A, B)-CPN to mean that $N$ is a CPN in the CPN class (A, B).

Since there are two cherry constructions and four reticulated cherry constructions, there are in total eight CPN classes. For example, the CPN class (1a, 2a) contains all binary CPNs (see Fig. 2 for an example of obtaining a (1a, 2a)-class CPN from a CPS). We note that it is possible to obtain the same CPN from the same CPS within different CPN classes. Indeed, a CPS corresponding to a tree will give the same network in all the CPN classes that use the 1a cherry construction, and the same can be said for CPN classes that use the 1b cherry construction. On the other hand, there do exist CPSs that return distinct networks amongst the different CPN classes; an example of such a CPS is given in Fig. 4.

Fig. 4. A CPS S and the unique networks obtained from it within the eight CPN classes. Observe that the eight networks are distinct.

Suppose that we are given a CPN $N$ within a CPN class (A, B), and let $S$ be a minimal CPS that reduces $N$. To form some notion of correspondence between CPNs and the sequences that reduce them, we pose the following question: is it always the case that applying the (A, B) construction on $S$ returns the network $N$? It turns out that this is true only for half of the CPN classes. We start by defining what it means for a CPN class to be reconstructible.

Definition 9. A CPN class $C = (A, B)$ is called reconstructible if for any two networks $N, N' \in C$ with a common minimal CPS, we have that $N$ and $N'$ are isomorphic.

Since the construction is fixed, each CPS gives rise to a unique network within each of the CPN classes. Then if two distinct networks N and N′ have a common minimal CPS S, at most one of these networks, say N, can be constructed from the sequence. This means that although S is a minimal CPS of N′, it cannot be used to construct the network N′. Note, however, that there does exist some minimal CPS of N′ which can be used to construct N′. Reconstructible CPN classes have the nice property that for a given CPN N, any minimal CPS for N can be used to construct N.

Lemma 1. Let $(A, B) \in \{(1a, 2a); (1a, 2b); (1b, 2c); (1b, 2d)\}$. Let $N$ be a CPN in the (A, B)-class, and let $(x, y)$ be a reducible pair in $N$. Then adding $(x, y)$ to $N_{(x,y)}$ using the (A, B) construction results in $N$.

Proof. Observe that these four classes are characterized by the following properties. The networks in (1a, 2a) are binary; the networks in (1a, 2b) do not contain rr-edges; the networks in (1b, 2c) do not contain tt-edges; and the networks in (1b, 2d) contain neither rr-edges nor tt-edges. Since $N$ is a network of one of these classes, $N$ must also have these properties. Furthermore, the network $N_{(x,y)}$ also has these properties, since deleting edges and potentially suppressing vertices does not create new vertices that could subdivide existing rr-edges or tt-edges. This means that upon inserting an edge into the network (as a result of adding a cherry or a reticulated cherry $(x, y)$), one should either do nothing; contract all rr-edges; contract all tt-edges; or contract all rr-edges and tt-edges, depending on which class of CPNs is being considered. Note that these contractions, should they occur, only involve vertices that have just been added as a result of adding the reducible pair, since $N_{(x,y)}$ also has the properties. This is precisely what happens when we add $(x, y)$ back to the network $N_{(x,y)}$ using the respective constructions, and it follows immediately that the constructions defined in these CPN classes return the original network $N$.
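The properties listed in the proof above can be tested directly. This Python sketch (the edge-list representation and the names are our assumptions) reports, for each of the four classes of Lemma 1, whether a network has the corresponding property. It checks only this necessary condition: it does not test whether the network is a CPN at all.

```python
from collections import Counter


def reconstructible_class_properties(edges):
    """Check, per class of Lemma 1, the property listed in its proof."""
    indeg = Counter(h for _, h in edges)
    outdeg = Counter(t for t, _ in edges)

    def is_reticulation(v):
        return indeg[v] >= 2

    def is_tree(v):
        return indeg[v] == 1 and outdeg[v] >= 2

    has_rr = any(is_reticulation(u) and is_reticulation(v) for u, v in edges)
    has_tt = any(is_tree(u) and is_tree(v) for u, v in edges)
    nodes = set(indeg) | set(outdeg)
    binary = all(outdeg[v] == 2 for v in nodes if is_tree(v)) and all(
        indeg[v] == 2 for v in nodes if is_reticulation(v))
    return {
        ("1a", "2a"): binary,
        ("1a", "2b"): not has_rr,
        ("1b", "2c"): not has_tt,
        ("1b", "2d"): not has_rr and not has_tt,
    }


# Binary, stack-free, but with tt-edges such as (s, t1).
example = [("rho", "s"), ("s", "t1"), ("s", "t2"), ("t1", "r"), ("t1", "a"),
           ("t2", "r"), ("t2", "b"), ("r", "c")]
print(reconstructible_class_properties(example))
# {('1a', '2a'): True, ('1a', '2b'): True, ('1b', '2c'): False, ('1b', '2d'): False}
```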



Lemma 1 states that four of the eight CPN classes have the property that adding back a reduced pair to the network returns the original network. By applying this lemma in the construction of a network from a CPS, we obtain the following corollary.


Fig. 5. Two distinct networks $N$ and $N'$ that can be reduced by the same minimal CPS for the (1a, 2c), (1a, 2d), (1b, 2a), (1b, 2b)-classes. The networks obtained by using the respective constructions are $N$. This means that given a network and a minimal sequence that reduces it, the sequence cannot always be used to construct the original network (let $N'$ be the original network in these four cases).

Fig. 6. Reducing a reticulated cherry $(x, y)$ and adding it back using the construction (1a, 2c) can return a different network.

Corollary 1. Let $(A, B) \in \{(1a, 2a); (1a, 2b); (1b, 2c); (1b, 2d)\}$. Then $(A, B)$ is reconstructible.
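The four classes in Corollary 1 share a simple pattern that can be read off from Definition 5: the cherry construction and the reticulated cherry construction make the same choice about contracting the edge $p_y q$ on the side of $y$ (1a, 2a and 2b never contract it; 1b, 2c and 2d do, when $p_y$ is a tree vertex). This pattern is our own reading of the lists in Lemma 1 and Fig. 5, not a statement from the text; the short Python sketch below merely enumerates the eight classes and confirms that the pattern singles out exactly those four.

```python
from itertools import product

CHERRY = ("1a", "1b")
RETICULATED = ("2a", "2b", "2c", "2d")
# Our reading of Definition 5: constructions that contract the edge p_y q.
CONTRACTS_PY_Q = {"1b", "2c", "2d"}

classes = list(product(CHERRY, RETICULATED))
reconstructible = [(a, b) for a, b in classes
                   if (a in CONTRACTS_PY_Q) == (b in CONTRACTS_PY_Q)]
print(len(classes), reconstructible)
# 8 [('1a', '2a'), ('1a', '2b'), ('1b', '2c'), ('1b', '2d')]
```

The remaining four classes, (1a, 2c), (1a, 2d), (1b, 2a) and (1b, 2b), are exactly those of Fig. 5.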

To show that the above lemma and the corollary do not hold for the other four CPN classes, we present two networks with a common minimal CPS for each of these CPN classes in Fig. 5. Unlike their reconstructible counterparts, the networks in these four classes can contain both tt-edges and rr-edges whilst also containing multifurcations and multi-reticulations. When constructing networks from CPSs, this allows for a mixture of choosing to contract some tt-edges and some rr-edges, but not all. This can make adding reticulated cherries problematic. Take the (1a, 2c) class for example. Since there can exist tt-edges that are binary, we may, in particular, assume that a network in the class contains a reticulated cherry $(x, y)$ where the parent of $y$ is the head of a binary tt-edge $e$. But this means that upon reducing $(x, y)$ and adding back the reticulated cherry using the 2c construction, we essentially contract this tt-edge, which returns a different network (see Fig. 6).

2.2.1. Refinement of constructed networks

The six constructions that were introduced in Definition 5 can be rephrased as follows. When adding $(x,y)$ to a network, check if $x$ is a leaf in the network. If $x$ is not a leaf in the network, then add a labeled leaf $x$ and an edge from the (newly added) parent of $y$ to $x$ (add a cherry). If $x$ is a leaf in the network, then add an edge between the newly added parents of $y$ and $x$ (add a reticulated cherry). Decide whether or not to contract some of the edges incident to the parent of $x$ and edges incident to the parent of $y$. This means that, given some CPS $S$, the binary network $N$ in the $(1a,2a)$-class constructed from $S$ is a refinement of all networks that can be constructed from $S$ using any combination of the constructions. This gives the following observation.
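The rephrased construction is mechanical enough to sketch in a few lines. The Python code below is a minimal sketch under our own assumptions, not the authors' implementation: it builds the binary $(1a,2a)$ network of a CPS by adding the pairs back from last to first without contracting any edges. A network is stored as a set of directed edges `(parent, child)` with a root called `"root"` of outdegree 1; the internal vertex names `v0`, `v1`, ... are hypothetical, and the input is assumed to be a valid CPS, so every second coordinate is already a leaf when its pair is added back.

```python
from collections import Counter
from itertools import count


def construct_binary_network(cps):
    """Construct the binary network of a CPS (no edges are contracted).

    Pairs are added back from last to first, starting from a single leaf:
    the second coordinate of the last pair.
    """
    names = count()
    last_leaf = cps[-1][1]
    edges = {("root", last_leaf)}  # directed edges (parent, child)
    leaves = {last_leaf}

    def subdivide_pendant_edge(leaf):
        # Replace the unique edge u -> leaf by u -> p -> leaf and return p.
        u = next(a for a, b in edges if b == leaf)
        p = f"v{next(names)}"
        edges.remove((u, leaf))
        edges.update({(u, p), (p, leaf)})
        return p

    for x, y in reversed(cps):
        p_y = subdivide_pendant_edge(y)
        if x not in leaves:
            # x is not yet a leaf: add the cherry (x, y).
            edges.add((p_y, x))
            leaves.add(x)
        else:
            # x is already a leaf: add the reticulated cherry (x, y).
            p_x = subdivide_pendant_edge(x)
            edges.add((p_y, p_x))
    return edges, leaves


def reticulation_number(edges):
    """Sum of (indegree - 1) over all vertices with indegree at least 2."""
    indegree = Counter(child for _, child in edges)
    return sum(d - 1 for d in indegree.values() if d > 1)


S = [(1, 2), (3, 2), (3, 4), (4, 5), (2, 5)]
edges, leaves = construct_binary_network(S)
print(sorted(leaves), reticulation_number(edges), len(edges))  # [1, 2, 3, 4, 5] 1 12
```

On the sequence $(1,2)(3,2)(3,4)(4,5)(2,5)$ (the CPS of Fig. 7) the sketch adds four leaves and one reticulation. Because no vertex is ever merged, every other construction can be read as contracting some of the edges this sketch creates.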

Lemma 2. Given a non-binary CPN $N$ and a minimal CPS $S$ for $N$, there exists a binary refinement $N_b$ of $N$ such that $S$ is a minimal CPS for $N_b$.

Proof. The unique binary network $N_b$ obtained by using construction $(1a,2a)$ on $S$ is a refinement of $N$, and $S$ is a minimal CPS for $N_b$ by definition of this network.



Finally, the following lemma shows how general refinements of CPNs (not necessarily binary) are related to the CPNs.

Lemma 3. Let $N_r$ be a CPN that is a refinement of a non-binary network $N$. Then $N$ is a CPN, and every minimal CPS of $N_r$ is also a minimal CPS of $N$.

Proof. We prove this statement by induction on $|N|$, the number of edges in $N$. For the base case, take the single-leaf network. So suppose that the claim is true for every network of size at most $|N|-1$.

Let $S$ be a minimal CPS of $N_r$, and let $S_1 = (x,y)$ be the first element of $S$. Since $N_r$ can be obtained from $N$ by refining vertices, it must be the case that $S_1$ is also a reducible pair in $N$. Furthermore, if $S_1$ is a cherry in $N_r$ then $S_1$ is also a cherry in $N$; if $S_1$ is a reticulated cherry in $N_r$ then $S_1$ is also a reticulated cherry in $N$. Now it is easy to see that $(N_r)_{S_1}$ is a refinement of $N_{S_1}$. Note that $|N_{S_1}| < |N|$, since every reduction reduces the size of the network. The network $(N_r)_{S_1}$ is a CPN by Observation 1. By the induction hypothesis, $N_{S_1}$ is a CPN and every minimal CPS of $(N_r)_{S_1}$ is also a minimal CPS of $N_{S_1}$. In particular, $S[2:]$ is a minimal CPS of $N_{S_1}$. It then follows that $S$ is a minimal CPS of $N$.



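Note that the converse of Lemma 3 does not hold in general. Consider the tree $T$ on three leaves $\{x,y,z\}$ that all share a common parent (the claw graph with a root). Let $T_r$ be a binary refinement of $T$ in which $x$ and $y$ form a cherry. Then the CPS $(y,z)(x,z)$ is minimal for $T$ but not for $T_r$; in fact, since $(y,z)$ is not a cherry of $T_r$, the sequence does not reduce $T_r$ at all.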
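The counterexample can be checked mechanically. The Python sketch below is restricted to trees, where the only reducible pairs are cherries, and follows the convention that applying a pair which is not reducible leaves the tree unchanged. A tree is stored as a child-to-parent dictionary; the names `"rho"` (a root above the claw centre), `"p"` and `"q"` are hypothetical labels for the internal vertices.

```python
def apply_sequence(parent, seq):
    """Apply a sequence of pairs to a tree given as a child -> parent map.

    A pair (a, b) is reducible iff a and b are distinct leaves with the same
    parent; reducing it deletes a and suppresses the parent if it is left
    with a single child. Pairs that are not reducible change nothing.
    """
    parent = dict(parent)
    for a, b in seq:
        leaves = set(parent) - set(parent.values())
        if a != b and {a, b} <= leaves and parent[a] == parent[b]:
            p = parent.pop(a)
            children = [c for c, q in parent.items() if q == p]
            if len(children) == 1 and p in parent:
                # p now has indegree 1 and outdegree 1: suppress it.
                parent[children[0]] = parent.pop(p)
    return parent


T = {"p": "rho", "x": "p", "y": "p", "z": "p"}              # the claw
T_r = {"p": "rho", "q": "p", "x": "q", "y": "q", "z": "p"}  # x, y form a cherry
S = [("y", "z"), ("x", "z")]

print(apply_sequence(T, S))    # {'z': 'rho'}: T is reduced to a single leaf
print(apply_sequence(T_r, S))  # T_r is returned unchanged
```

For $T_r$ a sequence that does work is, for instance, $(x,y)(y,z)$.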

3. Properties of cherry-picking networks

In this section, we investigate properties of cherry-picking networks. First, we continue where we left off in the previous section: we inspect the relation between CPSs and CPNs. This includes the reticulation number defined by a CPS, changes in the sets of reducible pairs that are ready for picking after picking a pair, and the order in which we can reduce a network. The last of these allows us to consider distinguishability of two CPNs by their CPSs. Then, we use this to investigate the relation between embedded networks of a CPN and its CPSs.

3.1. Why CPNs are nice: order does not matter

Lemma 4. Let $S$ be a minimal-length sequence of ordered pairs of leaves that reduces a non-binary network $N$. Then $S$ is a CPS. Furthermore, $|S| = n + r - 1$, where $n$ and $r$ denote the number of leaves and the reticulation number of $N$, respectively.

Proof. Suppose for a contradiction that $S$ is not a CPS. Then there is an $i < |S|$ with $S_i = (x,y)$ such that $y$ is neither a first coordinate in any of the elements of $S[i+1:]$ nor the second coordinate of $S_{|S|}$. This means $y$ cannot be a leaf in $N_{S[:i-1]}$ (if it were, then $S$ would not reduce $N$). This implies $N_{S[:i-1]} = N_{S[:i]}$, and there is a shorter sequence $S[:i-1]\,S[i+1:]$ that reduces $N$, a contradiction. We conclude that $S$ is a CPS.

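We now prove the second part of the lemma. We first construct a binary network $M$ from $S$ using the $(1a,2a)$ construction. Let $S_i = (x,y)$. Upon constructing $M_{S[:i-1]}$ from $M_{S[:i]}$, a new leaf $x$ is added if $x$ is not a leaf in $M_{S[:i]}$, and a reticulation is added otherwise. Since the construction starts from a single leaf and each of the $|S|$ steps adds either one leaf or one reticulation, $M$ has $1 + \ell$ leaves and reticulation number $|S| - \ell$, where $\ell$ is the number of leaf-adding steps. By Lemma 2, $M$ is a binary refinement of $N$, and therefore $N$ has the same leaf set and the same reticulation number as $M$. Since $S$ is a minimal CPS for $N$, it follows that $|S| = (n-1) + r = n + r - 1$.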
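As a worked instance of this count, take the sequence $S = (1,2)(3,2)(3,4)(4,5)(2,5)$ of Fig. 7 and read it backwards, as in the $(1a,2a)$ construction, starting from the single leaf 5. Each step adds exactly one leaf or one reticulation:

```latex
\begin{gathered}
S_5=(2,5):\ \text{new leaf } 2,\qquad S_4=(4,5):\ \text{new leaf } 4,\qquad S_3=(3,4):\ \text{new leaf } 3,\\
S_2=(3,2):\ \text{new reticulation},\qquad S_1=(1,2):\ \text{new leaf } 1,\\
|S| = 5 = \underbrace{(5-1)}_{n-1} + \underbrace{1}_{r} = n + r - 1.
\end{gathered}
```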



Definition 10. Let $N$ be a non-binary network. Denote with $\mathcal{C}_c(N)$ the set of cherries of $N$, and with $\mathcal{C}_r(N)$ the set of reticulated cherries of $N$. The set of all reducible pairs is denoted $\mathcal{C}(N) = \mathcal{C}_c(N) \cup \mathcal{C}_r(N)$.
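To make Definition 10 concrete, here is a Python sketch that computes $\mathcal{C}_c(N)$ and $\mathcal{C}_r(N)$ from a set of directed edges `(parent, child)`. It assumes binary-style definitions: $(x,y)$ is a cherry if $x \neq y$ are leaves with a common parent, and a reticulated cherry if the parent $p_x$ of $x$ is a reticulation and the parent $p_y$ of $y$ is a tree vertex that is also a parent of $p_x$. The example network, with hypothetical internal vertex names, is the binary network we obtain from the CPS of Fig. 7 with the $(1a,2a)$ construction; its reducible pairs agree with those listed in the caption of Fig. 7.

```python
from collections import defaultdict

# Hypothetical encoding of the network of Fig. 7 (internal names are ours).
FIG7_EDGES = {
    ("root", "v0"), ("v0", "v1"), ("v1", 5), ("v0", "v3"), ("v3", "v5"),
    ("v5", 2), ("v5", 1), ("v1", "v2"), ("v2", 4), ("v2", "v4"),
    ("v4", 3), ("v3", "v4"),
}
FIG7_LEAVES = {1, 2, 3, 4, 5}


def parents_and_children(edges):
    parents, children = defaultdict(set), defaultdict(set)
    for u, v in edges:
        children[u].add(v)
        parents[v].add(u)
    return parents, children


def cherries(edges, leaves):
    """C_c(N): ordered pairs (x, y) of distinct leaves sharing a parent."""
    parents, children = parents_and_children(edges)
    pairs = set()
    for x in leaves:
        (p,) = parents[x]  # a leaf has exactly one parent
        pairs |= {(x, y) for y in children[p] if y != x and y in leaves}
    return pairs


def reticulated_cherries(edges, leaves):
    """C_r(N): p_x is a reticulation, p_y a tree vertex and a parent of p_x."""
    parents, children = parents_and_children(edges)
    pairs = set()
    for x in leaves:
        (p_x,) = parents[x]
        if len(parents[p_x]) < 2:
            continue  # p_x is not a reticulation
        for p_y in parents[p_x]:
            if len(parents[p_y]) == 1:  # p_y is a tree vertex
                pairs |= {(x, y) for y in children[p_y] if y in leaves}
    return pairs


def reducible_pairs(edges, leaves):
    """C(N): the union of C_c(N) and C_r(N)."""
    return cherries(edges, leaves) | reticulated_cherries(edges, leaves)


print(sorted(cherries(FIG7_EDGES, FIG7_LEAVES)))              # [(1, 2), (2, 1)]
print(sorted(reticulated_cherries(FIG7_EDGES, FIG7_LEAVES)))  # [(3, 4)]
```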

The following lemma states that all new reducible pairs after picking a pair $(x,y)$ must involve either $x$ or $y$.

Lemma 5. Let $N$ be a non-binary network on a taxa set $X$, and let $(x,y)$ be a reducible pair of $N$. Then we have the following inclusion:
$$\mathcal{C}(N_{(x,y)}) \setminus \mathcal{C}(N) \subseteq (\{x,y\} \times X) \cup (X \times \{x,y\}).$$

Proof. Note that the left-hand side of the inclusion represents the reducible pairs in $N_{(x,y)}$ that were not present in $N$.

Suppose, for contradiction, that this set contains a pair $(z,w)$ not involving $x$ or $y$. Then this pair is not reducible in $N$, but it is reducible in $N_{(x,y)}$. Adding the pair $(x,y)$ back into $N_{(x,y)}$ may only subdivide the pendant edges leading to $x$ and $y$. This implies that this action will not change the fact that $z$ and $w$ form a reducible pair. Therefore, $(z,w)$ is a reducible pair in $N$ as well, a contradiction. Hence, all new cherries and reticulated cherries of $N_{(x,y)}$ involve $x$ or $y$.
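A worked instance: take the network $N$ of Fig. 7 with $X = \{1,\dots,5\}$, assuming it is the binary network obtained from its CPS by the $(1a,2a)$ construction, and pick the cherry $(x,y) = (1,2)$. The only new reducible pair is the reticulated cherry $(3,2)$, which indeed involves $y = 2$:

```latex
\begin{aligned}
\mathcal{C}(N) &= \{(1,2),(2,1),(3,4)\}, \qquad
\mathcal{C}(N_{(1,2)}) = \{(3,2),(3,4)\},\\
\mathcal{C}(N_{(1,2)}) \setminus \mathcal{C}(N) &= \{(3,2)\}
  \subseteq (\{1,2\}\times X) \cup (X \times \{1,2\}).
\end{aligned}
```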



We also have similar inclusions for the reducible pairs in the original network that are not reducible pairs in the new network. Roughly speaking, the following lemma states that reducing a network by the element $(x,y)$ preserves the other reducible pairs.

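Lemma 6. Let $N$ be a network on $X$, and $(x,y)$ a reducible pair of $N$. Then, if $N$ is non-binary, we have the inclusion
$$\mathcal{C}(N) \setminus \mathcal{C}(N_{(x,y)}) \subseteq (\{x\} \times X) \cup (X \times \{x\}),$$
and in particular
$$\mathcal{C}_c(N) \setminus \mathcal{C}_c(N_{(x,y)}) \subseteq (\{x\} \times X) \cup (X \times \{x\}), \qquad \mathcal{C}_r(N) \setminus \mathcal{C}_r(N_{(x,y)}) \subseteq \{x\} \times X.$$
If $N$ is semi-binary, the inclusions can be sharpened to
$$\mathcal{C}(N) \setminus \mathcal{C}(N_{(x,y)}) \subseteq \{(x,y),(y,x)\},$$
with
$$\mathcal{C}_c(N) \setminus \mathcal{C}_c(N_{(x,y)}) \subseteq \{(x,y),(y,x)\} \quad \text{and} \quad \mathcal{C}_r(N) \setminus \mathcal{C}_r(N_{(x,y)}) \subseteq \{(x,y)\}.$$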

Proof. Let $N$ be a non-binary network and let $(x,y)$ be a reducible pair of $N$. Let $(z,w)$ be a reducible pair in $N$ where $z \neq x$ and $w \neq x$. We claim that $(z,w)$ must also be a reducible pair in $N_{(x,y)}$. Suppose for a contradiction that it was not. Adding the pair $(x,y)$ back to $N_{(x,y)}$ may only subdivide the pendant edges leading to $x$ and $y$; this action will not change the fact that $(z,w)$ does not form a reducible pair. But this means that $(z,w)$ is not a reducible pair in $N$, which is a contradiction. Therefore, $(z,w)$ must also be a reducible pair in $N_{(x,y)}$. It follows that every reducible pair in $N$ that is not a reducible pair in $N_{(x,y)}$ must contain the element $x$. Since semi-binary networks are a special case of non-binary networks, this fact also holds for semi-binary networks.

If $(x,y)$ is a cherry in $N$, then, since the network is non-binary, it is possible for $x$ to form a cherry with another leaf, say $z$. Then $(x,z)$ and $(z,x)$ are both reducible pairs in $N$, while they are not reducible pairs in $N_{(x,y)}$ since $x$ is not a leaf in $N_{(x,y)}$. On the other hand, if $(x,y)$ is a reticulated cherry in $N$, then $x$ may only appear as a first coordinate in a reducible pair of $N$. Therefore the inclusions for non-binary networks follow.

Suppose now that $N$ is semi-binary. As stated in the first paragraph of this proof, every reducible pair in $N$ that is not a reducible pair in $N_{(x,y)}$ must contain the element $x$. If $(x,y)$ is a cherry in $N$, then $x$ may only be in a cherry (and reducible pair) with the leaf $y$. If $(x,y)$ is a reticulated cherry in $N$, then the only reticulated cherry involving the parent $p_y$ of $y$ is $(x,y)$. All other reticulated cherries that contain $x$ do not involve $p_y$, as $p_y$ has outdegree 2; this implies that every reticulated cherry in $N$ other than $(x,y)$ that involves $x$ (as the first coordinate) is still a reducible pair in $N_{(x,y)}$. The inclusions for semi-binary networks then follow.
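Continuing with the network $N$ of Fig. 7 (under the same assumption that it is the binary $(1a,2a)$ network of its CPS), reducing the cherry $(1,2)$ loses exactly the pairs that contain $x = 1$. Subsequently reducing the reticulated cherry $(3,2)$ in $N' = N_{(1,2)}$ loses only $(3,2)$ itself; the pair $(3,4)$ survives, turning from a reticulated cherry into a cherry. Both differences also lie in the sharpened set $\{(x,y),(y,x)\}$ for $\mathcal{C}$:

```latex
\begin{aligned}
\mathcal{C}(N) \setminus \mathcal{C}(N_{(1,2)}) &= \{(1,2),(2,1)\}
  \subseteq (\{1\}\times X) \cup (X\times\{1\}),\\
\mathcal{C}(N') \setminus \mathcal{C}(N'_{(3,2)})
  &= \{(3,2),(3,4)\} \setminus \{(3,4),(4,3)\} = \{(3,2)\}
  \subseteq \{3\}\times X.
\end{aligned}
```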



We now start our investigation into the order in which pairs can be reduced. We start with a lemma that implies that a cherry on two leaves $x$ and $y$ can be reduced either as $(x,y)$ or as $(y,x)$. Then we show that reducing an arbitrary pair in a CPN gives a new CPN.

Lemma 7. Let $S$ be a minimal CPS for a non-binary CPN $N$ and suppose $S_i = (x,y)$ reduces a cherry when applying the sequence. Let $z$ and $w$ be distinct leaves (not necessarily different from $x$ and $y$) that have a common parent, equal to the parent of $x$ and $y$. Let $S'$ be the sequence $S[i+1:]$ where each occurrence of $z$ is replaced by $x$. Then $S[:i-1]\,(z,w)\,S'$ is a minimal CPS for $N$.

Proof. Because $(x,y)$ forms a cherry in $N' = N_{S[:i-1]}$, and $x$ and $y$ share their parent with $z$ and $w$, the reduced network $N'_{(x,y)}$ is equal to the network $N'_{(z,w)}$ when $z$ is replaced by $x$. Hence, if we switch the roles of $x$ and $z$ in the remaining part of the sequence, the result after reduction by both sequences is the same modulo the $x \leftrightarrow z$ replacement.
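For a small instance of Lemma 7, let $N$ be the claw on the leaves $\{a,b,c\}$ (one tree vertex with three leaf children) and take the minimal CPS $S = (a,b)(b,c)$, so that $i = 1$, $(x,y) = (a,b)$ and $S[:i-1]$ is empty. Choosing $(z,w) = (c,b)$, the lemma replaces every occurrence of $c$ in $S[2:] = (b,c)$ by $a$:

```latex
S' = (b,a), \qquad S[:0]\,(c,b)\,S' = (c,b)(b,a).
```

Indeed, picking $(c,b)$ removes $c$ and leaves the cherry on $\{a,b\}$, after which $(b,a)$ reduces the network to the single leaf $a$.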



Lemma 8. Let $N$ be a non-binary CPN that can be reduced by a minimal CPS $S = S_1, S_2, \ldots, S_{|S|}$ such that $S_2 \in \mathcal{C}(N)$. Then $N_{S_2}$ is a CPN.

Proof. Note that $S_1, S_2 \in \mathcal{C}(N)$ by assumption. We distinguish several cases and prove in every case that $N_{S_2}$ is a CPN.

The leaves in $S_1$ and $S_2$ are the same. Then either $S_1 = S_2$, or $S_1 = (x,y)$ and $S_2 = (y,x)$ for some pair of leaves $x, y$.

In the first case $N_{S_2} = N_{S_1}$, which is a CPN. In the second case, as $(x,y)$ and $(y,x)$ are both present in $N$, $(x,y)$ must be a cherry in $N$. This means that $N_{S_1S_2} = N_{S_1}$, and thus $S$ is not a minimal CPS for $N$. This case is not possible.

Let $S_1 := (x,y)$.

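The pairs $S_1$ and $S_2$ have exactly one leaf in common.

– $S_2 = (x,z)$.
The common leaf $x$ is below the reticulation common to the two reticulated cherries. (Both pairs must be reticulated cherries: if they were cherries, reducing $S_1$ would remove $x$, so that $N_{S_1S_2} = N_{S_1}$, contradicting the minimality of $S$.) Applying $S_1$ and $S_2$ in either order removes these two reticulation edges, so clearly $N_{S_1S_2} = N_{S_2S_1}$. By Observation 1, $N_{S_1S_2}$ is a CPN. This implies that $N_{S_2S_1}$ is a CPN and, therefore, that $N_{S_2}$ is also a CPN.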

– $S_2 = (z,x)$.
Observe first that $(x,y)$ cannot form a reticulated cherry, as otherwise the first coordinate of every reducible pair that involves $x$ is $x$, which contradicts our assumption that $S_2 = (z,x) \in \mathcal{C}(N)$. Therefore $(x,y)$ must be a cherry. Then the network $N_{S_1} = N_{(x,y)}$ does not have the leaf $x$, which implies that $S_2 = (z,x)$ is not a reducible pair of $N_{S_1}$. This contradicts the fact that $S$ was a minimal CPS for $N$, and therefore this case is not possible.

– $S_2 = (y,z)$.

The two possibilities for this case are either that $x$, $y$, $z$ all share the same parent, or that $(x,y)$ form a reticulated cherry. In the first case, $N_{S_1S_2}$ is obtained from $N$ by deleting the leaves $x$ and $y$ and suppressing all degree-2 vertices. We obtain the same CPN by picking the cherries $S_2 = (y,z)$ and $(x,z)$ in succession, that is, $N_{S_2(x,z)} = N_{S_1S_2}$. This implies that $N_{S_2}$ is also a CPN. A similar argument can be made for the reticulated cherry case: it is easy to see that $N_{S_2(x,z)} = N_{S_1S_2}$.

– $S_2 = (z,y)$.
This is the case where either $y$ and $z$ share a common parent, or $(z,y)$ forms a reticulated cherry. In both of these cases, the leaf $x$ could share a common parent with $y$, or $(x,y)$ could be a reticulated cherry (there are 4 possible cases in total). In all cases, reducing $N$ by $S_1$ first or by $S_2$ first makes no real difference, and so $N_{S_1S_2} = N_{S_2S_1}$. For the same reason as before, $N_{S_2}$ is a CPN.

The pairs $S_1$ and $S_2$ have no leaf in common. Then $S_1$ and $S_2$ independently remove edges in $N$, regardless of the order in which they are applied. Hence we get $N_{S_1S_2} = N_{S_2S_1}$ and, for the same reason as before, $N_{S_2}$ is a CPN. In all cases, we have concluded that $N_{S_2}$ is a CPN, so the result follows.



Lemma 9. Let $N$ be a non-binary network, and $(x,y) \in \mathcal{C}(N)$. Then there exists a minimal CPS $S$ for $N$ such that $S_i = (x,y)$ or $S_i = (y,x)$ for some $i$, and $(x,y)$ is reducible until that point, i.e., $(x,y) \in \mathcal{C}(N_{S[:j]})$ for all $j < i$.

Proof. Let $S$ be a minimal CPS for $N$. If $S$ contains $(x,y)$ or $(y,x)$ as $S_i$, and $(x,y)$ is a reducible pair in $N_{S[:j]}$ for all $j < i$, we are done, so assume that this is not the case. Let $i > 0$ be minimal such that $(x,y) \notin \mathcal{C}(N_{S[:i]})$. Then, by Lemma 6, $S_i = (x,z)$ or $S_i = (y,z)$ for some $z \neq x, y$. Because $(x,y) \in \mathcal{C}(N_{S[:i-1]})$, $(x,y) \notin \mathcal{C}(N_{S[:i]})$, and the second element of $S_i$ is $z$, $(y,z)$ must form a cherry in $N_{S[:i-1]}$.

First, suppose that $(x,y)$ forms a reticulated cherry in $N_{S[:i-1]}$, and that $S_i = (x,z)$. In that case, $N_{S[:i]} = N_{S[:i-1](x,y)}$, so replacing $S_i$ with $(x,y)$ in $S$ gives a new minimal CPS for $N$ that contains $(x,y)$. Next, suppose that $(x,y)$ forms a reticulated cherry in $N_{S[:i-1]}$, and that $S_i = (y,z)$. Then, upon switching the roles of $y$ and $z$, we have $N_{S[:i]} = N_{S[:i-1](z,y)}$. Letting $S'$ denote the sequence $S[i+1:]$ where each occurrence of $z$ is replaced by $y$, we obtain a minimal CPS $S^{\mathrm{new}} = S[:i-1]\,(z,y)\,S'$ for $N$. In this sequence, the minimal value $k > 0$ for which $(x,y) \notin \mathcal{C}(N_{S^{\mathrm{new}}[:k]})$ satisfies $k > i$. We may repeat this until we enter the first case; such a process must terminate as the length of $S$ is finite. On the other hand, if $(x,y)$ forms a cherry in $N_{S[:i-1]}$, then $x$, $y$ and $z$ share a common parent. Therefore, by Lemma 7, there is a minimal CPS for $N$ that starts with $S[:i-1]\,(x,y)$.



Proposition 1. Let $N$ be a non-binary CPN with $c \in \mathcal{C}(N)$. Then $N_c$ is a CPN. That is, there exists a CPS $S'$ such that $c\,S'$ is a CPS reducing $N$.

Proof. Let $c = (x,y)$. By Lemma 9, there is a CPS $S$ for $N$ that contains either $(x,y)$ or $(y,x)$, and $(x,y)$ is reducible until that point in the sequence. If $S_1 = (x,y)$, then set $S' := S[2:]$ and we are done. Now suppose $S_1$ is not equal to $(x,y)$. Note that there must be a smallest $i \geq 1$ with $S_i = (x,y)$ or $S_i = (y,x)$.

Suppose $S_i = (x,y)$. Recall that we have $(x,y) \in \mathcal{C}(N_{S[:j]})$ for all $j < i$. Hence, by applying Lemma 8 $i$ times, $N_{(x,y)}$ is a CPN.

Suppose $S_i = (y,x)$. Again, we have $(x,y) \in \mathcal{C}(N_{S[:j]})$ for all $j < i$. Hence, $N_{S[:i-1]}$ has both reducible pairs $(x,y)$ and $(y,x)$, and it must contain the cherry $(x,y)$. By Lemma 7 there is a CPS of $N$ starting with $S[:i-1]\,(x,y)$. Redefining $S$ as this sequence, we are in the previous case and thus $N_{(x,y)}$ is a CPN. We conclude that $N_c$ is a CPN.



The following theorem is a corollary of the previous proposition. It essentially states that a network can be cherry-picked in any order.

Theorem 1. Let $N$ be a non-binary CPN, and $S$ a partial CPS. If in each step of the reduction of $N$ by $S$ the network is changed, then there exists a minimal CPS $S'$ starting with $S$ that reduces $N$.
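Theorem 1 can be probed experimentally. The Python sketch below repeatedly picks a uniformly random reducible pair until a single leaf remains; by Theorem 1 and Lemma 4, every run on a CPN should produce a minimal CPS, here of length $n + r - 1 = 5$. It is a sketch under our own assumptions: the network of Fig. 7 is encoded as a set of directed edges with hypothetical internal vertex names (taking it to be the binary network built from its CPS), cherries and reticulated cherries follow the binary-style definitions, and we assume that suppressing vertices never creates parallel edges, which holds for this example.

```python
import random
from collections import defaultdict

# Hypothetical encoding of the network of Fig. 7 (internal names are ours).
EDGES = {
    ("root", "v0"), ("v0", "v1"), ("v1", 5), ("v0", "v3"), ("v3", "v5"),
    ("v5", 2), ("v5", 1), ("v1", "v2"), ("v2", 4), ("v2", "v4"),
    ("v4", 3), ("v3", "v4"),
}
LEAVES = frozenset({1, 2, 3, 4, 5})


def adjacency(edges):
    parents, children = defaultdict(set), defaultdict(set)
    for u, v in edges:
        children[u].add(v)
        parents[v].add(u)
    return parents, children


def reducible_pairs(edges, leaves):
    """Cherries and reticulated cherries (binary-style definitions)."""
    parents, children = adjacency(edges)
    pairs = set()
    for x in leaves:
        (p_x,) = parents[x]
        pairs |= {(x, y) for y in children[p_x] if y != x and y in leaves}
        if len(parents[p_x]) >= 2:  # p_x is a reticulation
            for p_y in parents[p_x]:
                if len(parents[p_y]) == 1:  # p_y is a tree vertex
                    pairs |= {(x, y) for y in children[p_y] if y in leaves}
    return pairs


def suppress(edges):
    """Repeatedly suppress vertices with indegree 1 and outdegree 1."""
    edges = set(edges)
    while True:
        parents, children = adjacency(edges)
        v = next((v for v in list(children)
                  if len(parents[v]) == 1 and len(children[v]) == 1), None)
        if v is None:
            return edges
        (u,), (w,) = parents[v], children[v]
        edges -= {(u, v), (v, w)}
        edges.add((u, w))


def reduce_pair(edges, leaves, pair):
    x, y = pair
    parents, _ = adjacency(edges)
    (p_x,), (p_y,) = parents[x], parents[y]
    if p_x == p_y:  # cherry: delete the leaf x
        return suppress(edges - {(p_x, x)}), leaves - {x}
    # reticulated cherry: delete the reticulation edge p_y -> p_x
    return suppress(edges - {(p_y, p_x)}), leaves


def random_cps(edges, leaves, rng):
    """Pick reducible pairs in a random order until one leaf is left."""
    sequence = []
    while len(edges) > 1:
        pairs = sorted(reducible_pairs(edges, leaves))
        if not pairs:
            raise ValueError("no reducible pair: not a CPN")
        pair = rng.choice(pairs)
        sequence.append(pair)
        edges, leaves = reduce_pair(edges, leaves, pair)
    return sequence


rng = random.Random(0)
lengths = {len(random_cps(EDGES, LEAVES, rng)) for _ in range(100)}
print(lengths)  # {5}
```

For a binary network this agrees with a simple count: each cherry reduction removes one leaf and each reticulated cherry reduction removes one reticulation, so every run has the same length.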

3.2. Distinguishability

By Theorem 1, any order of picking reducible pairs gives a minimal CPS for a CPN. This inherently implies that for a given CPN, there could be many CPSs that reduce it. However, given a class $(A,B)$, every CPS uniquely constructs a CPN in that class by Lemma 2.

Remark 1. Within a CPN class, exactly one CPN can be constructed for each CPS. On the other hand, a CPN can have more than one minimal CPS that reduces it.

While this remark holds true for all eight of the CPN classes, only the classes that are reconstructible are interesting to examine. The aim of this subsection is to set up a notion of distinguishability of CPNs using their minimal CPSs. That is, we would like to encode each CPN by designating one of its minimal CPSs as its representative, such that the sequence can be used to reconstruct the CPN. Since, within CPN classes that are not reconstructible, there could be more than one CPN that can be reduced by the same minimal CPS, it makes no sense to consider these classes. Therefore, we define a notion of distinguishability only for the classes that are reconstructible.

Fig. 7. The smallest CPS for this network is $(1,2)(3,2)(3,4)(4,5)(2,5)$. Initially we have the choice of picking either $(1,2)$, $(2,1)$, or $(3,4)$. For the smallest CPS we pick the smallest reducible pair $(1,2)$.

By Remark 1, each network within a reconstructible CPN class can have many minimal CPSs that reduce it. To choose a representative from these minimal CPSs, we introduce an ordering on the CPSs. Doing so allows us to prescribe a unique smallest CPS to each CPN. So let us take an arbitrary ordering on the leaves, and let us define a lexicographical ordering on the reducible pairs as follows. We say that $(a,b) < (c,d)$ if and only if $a < c$, or $a = c$ and $b < d$. We naturally extend this ordering to minimal CPSs. Let $S$ and $S'$ be CPSs such that $|S| \neq |S'|$. If $|S| < |S'|$, then $S < S'$; this ensures that the smallest CPS is minimal. Now suppose $|S| = |S'|$ and let $i$ be the smallest index such that $S_i \neq S'_i$. If no such $i$ exists, then $S = S'$; otherwise, $S < S'$ if and only if $S_i < S'_i$.
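This ordering coincides with how Python compares tuples, which suggests a compact sketch: sort by the key (length, sequence of pairs). The leaf labels are assumed to be integers ordered in the usual way, standing in for the arbitrary ordering on the leaves. The three candidates below are minimal CPSs of the network of Fig. 7, assuming it is the binary network built from its CPS.

```python
def cps_key(sequence):
    """Sort key realising the order on CPSs defined above.

    Shorter sequences are smaller. Sequences of equal length are compared at
    the first index where they differ, using the lexicographic order on
    pairs; Python compares tuples in exactly this way.
    """
    return (len(sequence), tuple(sequence))


candidates = [
    [(2, 1), (3, 1), (3, 4), (4, 5), (1, 5)],
    [(1, 2), (3, 4), (3, 2), (4, 5), (2, 5)],
    [(1, 2), (3, 2), (3, 4), (4, 5), (2, 5)],
]
print(min(candidates, key=cps_key))  # [(1, 2), (3, 2), (3, 4), (4, 5), (2, 5)]
```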

By Theorem 1, we may pick the reducible pairs of a CPN in any order. We define a smallest CPS as one that is obtained by picking the smallest reducible pair at each iteration (see Fig. 7). Such a sequence is naturally a minimal CPS. By the following theorem, distinguishing two CPNs of the same reconstructible class comes down to finding their smallest CPSs and checking whether these are the same.

Theorem 2. Suppose we are given an ordering on the taxa set $X$. Every CPN on $X$ has a unique smallest CPS. In particular, within a reconstructible CPN class, these CPSs can be used to reconstruct the CPN. Every CPS can be used to construct a unique CPN within each of the eight CPN classes.

Proof. Let $N$ be a CPN on $X$. Since we have a total ordering on the reducible pairs of $N$, if a smallest CPS exists then it is unique. Furthermore, a smallest CPS exists: simply pick a smallest reducible pair at every iteration. Therefore every CPN on $X$ has a unique smallest CPS. Within reconstructible CPN classes, no two networks have the same minimal CPSs. It then follows that a smallest CPS for a network can be used to construct said network.

By Remark 1, every CPS gives rise to a unique CPN.



The following corollary is a direct consequence of Theorem 2.

Corollary 2. Suppose we are given an ordering on the taxa set $X$. Within a reconstructible CPN class, two CPNs on $X$ are isomorphic if and only if they have the same smallest CPS.

This leads to a polynomial-time algorithm for checking whether two CPNs of a reconstructible CPN class on the same set of taxa are isomorphic, which we describe in Section 6.
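The idea behind Corollary 2 can be sketched in Python: compute each network's smallest CPS greedily (Theorem 1 guarantees the greedy choice never gets stuck on a CPN) and compare the results. This is an illustration under our own assumptions, not the algorithm described in Section 6. Networks are sets of directed edges with a root `"root"`, integer leaf labels are ordered as integers, cherries and reticulated cherries follow the binary-style definitions, suppression is assumed never to create parallel edges, and both networks are assumed to lie in the same reconstructible class. The encoding of Fig. 7 uses hypothetical internal names and takes it to be the binary network built from its CPS; on it, the sketch reproduces the smallest CPS stated in the caption.

```python
from collections import defaultdict


def adjacency(edges):
    parents, children = defaultdict(set), defaultdict(set)
    for u, v in edges:
        children[u].add(v)
        parents[v].add(u)
    return parents, children


def reducible_pairs(edges, leaves):
    """Cherries and reticulated cherries (binary-style definitions)."""
    parents, children = adjacency(edges)
    pairs = set()
    for x in leaves:
        (p_x,) = parents[x]
        pairs |= {(x, y) for y in children[p_x] if y != x and y in leaves}
        if len(parents[p_x]) >= 2:  # p_x is a reticulation
            for p_y in parents[p_x]:
                if len(parents[p_y]) == 1:  # p_y is a tree vertex
                    pairs |= {(x, y) for y in children[p_y] if y in leaves}
    return pairs


def suppress(edges):
    """Repeatedly suppress vertices with indegree 1 and outdegree 1."""
    edges = set(edges)
    while True:
        parents, children = adjacency(edges)
        v = next((v for v in list(children)
                  if len(parents[v]) == 1 and len(children[v]) == 1), None)
        if v is None:
            return edges
        (u,), (w,) = parents[v], children[v]
        edges -= {(u, v), (v, w)}
        edges.add((u, w))


def reduce_pair(edges, leaves, pair):
    x, y = pair
    parents, _ = adjacency(edges)
    (p_x,), (p_y,) = parents[x], parents[y]
    if p_x == p_y:  # cherry: delete the leaf x
        return suppress(edges - {(p_x, x)}), leaves - {x}
    # reticulated cherry: delete the reticulation edge p_y -> p_x
    return suppress(edges - {(p_y, p_x)}), leaves


def smallest_cps(edges, leaves):
    """Pick the smallest reducible pair at every iteration."""
    edges, leaves, sequence = set(edges), set(leaves), []
    while len(edges) > 1:
        pairs = reducible_pairs(edges, leaves)
        if not pairs:
            raise ValueError("no reducible pair: not a CPN")
        pair = min(pairs)
        sequence.append(pair)
        edges, leaves = reduce_pair(edges, leaves, pair)
    return sequence


def same_smallest_cps(edges_a, leaves_a, edges_b, leaves_b):
    """Isomorphism test of Corollary 2 (same reconstructible class assumed)."""
    return smallest_cps(edges_a, leaves_a) == smallest_cps(edges_b, leaves_b)


N = {("root", "v0"), ("v0", "v1"), ("v1", 5), ("v0", "v3"), ("v3", "v5"),
     ("v5", 2), ("v5", 1), ("v1", "v2"), ("v2", 4), ("v2", "v4"),
     ("v4", 3), ("v3", "v4")}
# The same network with renamed internal vertices.
N_renamed = {tuple(f"w{v[1:]}" if isinstance(v, str) and v != "root" else v
                   for v in e) for e in N}
# A tree on the same taxa, which is not isomorphic to N.
T = {("root", "a"), ("a", "b"), ("a", "c"), ("b", 1), ("b", 2),
     ("c", 3), ("c", "d"), ("d", 4), ("d", 5)}
X = {1, 2, 3, 4, 5}

print(smallest_cps(N, X))  # [(1, 2), (3, 2), (3, 4), (4, 5), (2, 5)]
print(same_smallest_cps(N, X, N_renamed, X), same_smallest_cps(N, X, T, X))  # True False
```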

4. Reduction and containment

In this section, we prove that, within reconstructible CPN classes, the reduction of a network by a CPS for another network implies 'containment' of the former in the latter. We also show that the converse does not always hold: containment of a network $N'$ in another network $N$ does not imply that there exists a minimal CPS of $N$ that reduces $N'$.

4.1. Reduction implies containment

We first formally define what it means for a network to contain another network.

Definition 11. Let $N$ be a non-binary network on the set of taxa $X$. A non-binary network $N'$ on $X' \subseteq X$ is a subnetwork of $N$ if $N'$ can be obtained from $N$ by deleting reticulation edges, and then cleaning up w.r.t. $X'$, i.e., applying the following changes until a network on $X'$ is obtained:

– removing outdegree-0 nodes not labeled by $X'$, together with their incoming edges;

– suppressing all degree-2 nodes.
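Since Definition 11 is procedural, it can be sketched directly. The Python function below deletes a given set of reticulation edges and then cleans up with respect to $X'$, repeating the two operations until neither applies. It is a sketch under our own assumptions: networks are sets of directed edges `(parent, child)`, "degree-2 nodes" is read as vertices of indegree 1 and outdegree 1, the function does not check that the deleted edges really are reticulation edges, and the encoding of the network of Fig. 7 uses hypothetical internal names (taking it to be the binary network built from its CPS).

```python
from collections import defaultdict


def clean_up(edges, keep):
    """Clean up w.r.t. the taxa in `keep`, following Definition 11."""
    edges = set(edges)
    while True:
        heads = {v for _, v in edges}
        tails = {u for u, _ in edges}
        # Outdegree-0 nodes not labelled by X' go, with their incoming edges.
        dead = {v for v in heads - tails if v not in keep}
        if dead:
            edges = {(u, v) for u, v in edges if v not in dead}
            continue
        parents, children = defaultdict(set), defaultdict(set)
        for u, v in edges:
            children[u].add(v)
            parents[v].add(u)
        v = next((v for v in list(children)
                  if len(parents[v]) == 1 and len(children[v]) == 1), None)
        if v is None:
            return edges
        (u,), (w,) = parents[v], children[v]  # suppress the degree-2 node v
        edges -= {(u, v), (v, w)}
        edges.add((u, w))


def subnetwork(edges, deleted_reticulation_edges, keep):
    """Delete the given reticulation edges, then clean up w.r.t. `keep`."""
    return clean_up(set(edges) - set(deleted_reticulation_edges), keep)


N = {("root", "v0"), ("v0", "v1"), ("v1", 5), ("v0", "v3"), ("v3", "v5"),
     ("v5", 2), ("v5", 1), ("v1", "v2"), ("v2", 4), ("v2", "v4"),
     ("v4", 3), ("v3", "v4")}

tree = subnetwork(N, {("v3", "v4")}, keep={1, 2, 3, 4, 5})
small = subnetwork(N, {("v2", "v4")}, keep={1, 2, 3})
print(len(tree), len(small))  # 9 5
```

The first call yields a tree on all five taxa; the second restricts to $X' = \{1,2,3\}$ and yields a tree in which 1 and 2 form a cherry.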
