Maximum parsimony distance on phylogenetic trees

A linear kernel and constant factor approximation algorithm

Jones, Mark; Kelk, Steven; Stougie, Leen

DOI

10.1016/j.jcss.2020.10.003

Publication date

2021

Document Version

Final published version

Published in

Journal of Computer and System Sciences

Citation (APA)

Jones, M., Kelk, S., & Stougie, L. (2021). Maximum parsimony distance on phylogenetic trees: A linear

kernel and constant factor approximation algorithm. Journal of Computer and System Sciences, 117,

165-181. https://doi.org/10.1016/j.jcss.2020.10.003

Maximum parsimony distance on phylogenetic trees: A linear kernel and constant factor approximation algorithm

Mark Jones a,b,∗, Steven Kelk c, Leen Stougie b,d,e

a Delft Institute of Applied Mathematics, Delft University of Technology, Van Mourik Broekmanweg 6, 2628 XE, Delft, the Netherlands
b Centrum Wiskunde & Informatica (CWI), 1098 XG Amsterdam, the Netherlands
c Department of Data Science and Knowledge Engineering (DKE), Maastricht University, 6200 MD Maastricht, the Netherlands
d Vrije Universiteit Amsterdam, 1081 HV Amsterdam, the Netherlands
e INRIA-Erable, France

Article info

Article history:
Received 7 April 2020
Received in revised form 23 October 2020
Accepted 26 October 2020
Available online 7 December 2020

Keywords: Phylogenetics; Maximum parsimony; Fixed parameter tractability; Maximum agreement forest

Abstract

Maximum parsimony distance is a measure used to quantify the dissimilarity of two unrooted phylogenetic trees. It is NP-hard to compute, and very few positive algorithmic results are known due to its complex combinatorial structure. Here we address this shortcoming by showing that the problem is fixed parameter tractable. We do this by establishing a linear kernel, i.e., that after applying certain reduction rules the resulting instance has size that is bounded by a linear function of the distance. As powerful corollaries to this result we prove that the problem permits a polynomial-time constant-factor approximation algorithm; that the treewidth of a natural auxiliary graph structure encountered in phylogenetics is bounded by a function of the distance; and that the distance is within a constant factor of the size of a maximum agreement forest of the two trees, a well studied object in phylogenetics.

1. Introduction

Phylogenetics is the science of inferring and comparing trees (or more generally, graphs) that represent the evolutionary history of a set of species [34]. In this article we focus on trees. The inference problem has been comprehensively studied: given only data about the species in $X$ (such as DNA data) construct a phylogenetic tree which optimizes a particular objective function [17,40]. Informally, a phylogenetic tree is simply a tree whose leaves are bijectively labelled by $X$. Due to different objective functions, multiple optima and the phenomenon that certain genomes are the result of several evolutionary paths (rather than just one) we are often confronted with multiple “good” phylogenetic trees [32]. In such cases we wish to formally quantify how dissimilar these trees really are. This leads naturally to the problem of defining and computing the distance between phylogenetic trees [36]. Many such distances have been proposed, some of which can be computed in polynomial time, such as Robinson-Foulds (RF) distance [33], and some of which are NP-hard, such as Subtree Prune and Regraft (SPR) distance [9] or Tree Bisection and Reconnection (TBR) distance [1].

Interestingly, distances are not only relevant as a numerical quantification of difference: they also appear in constructive methods for the inference of phylogenetic networks [20], which generalise trees to graphs, and phylogenetic supertrees, which seek to merge multiple trees into a single summary tree [42]. In recent decades NP-hard phylogenetic distances have attracted quite some attention from the discrete optimization and parameterized complexity communities, see e.g. [12,16].

* Corresponding author at: Delft Institute of Applied Mathematics, Delft University of Technology, Van Mourik Broekmanweg 6, 2628 XE, Delft, the Netherlands. E-mail address: M.E.L.Jones@tudelft.nl (M. Jones).

In this article we focus on a relatively new distance measure, maximum parsimony distance, henceforth denoted $d_{MP}$. Let $T_1$ and $T_2$ be two unrooted (i.e. undirected) binary phylogenetic trees, with the same set of leaf labels $X$. Consider an arbitrary assignment of colours (“states”) to $X$; we call such an assignment a character. The parsimony score of $T_1$ with respect to the character is the minimum number of bichromatic edges in $T_1$, ranging over all possible colourings of the internal vertices of $T_1$. The parsimony distance of $T_1$ and $T_2$ is the maximum absolute difference between parsimony scores of $T_1$ and $T_2$, ranging over all characters [18,31].

The distance has several attractive properties; it is a metric, and (unlike e.g. RF distance) it is not confounded by the influence of horizontal evolutionary events [18]. Furthermore, the concept of parsimony, which lies at the heart of $d_{MP}$, is fundamental in phylogenetics since it articulates the idea that explanations of evolutionary history should be no more complex than necessary. Alongside its historical significance for applied phylogenetics [17], the study of character-based parsimony has given rise to many beautiful combinatorial and algorithmic results; we refer to e.g. [37,29,38,2,30] for overviews. Unfortunately, it is NP-hard to compute $d_{MP}$ [22]. A simple exponential-time algorithm is known [26], which runs in time $O(\phi^n \cdot \mathrm{poly}(n))$, where $|X| = n$ and $\phi \approx 1.618$ is the golden ratio, but beyond this few positive results are known. This is frustrating and surprising, since a number of results link $d_{MP}$ to the well-studied TBR distance, henceforth denoted $d_{TBR}$. Namely, it has been proven that $d_{MP}$ is a lower bound on $d_{TBR}$ [18], which, informally, asks for the minimum number of topological rearrangement operations to transform one tree into the other; an empirical study has suggested that in practice the distances are often very close [23]. Also, $d_{MP}$ has been used to prove the tightness of the best-known kernelization results for $d_{TBR}$ [24,25]. What, exactly, is the relationship between $d_{MP}$ and $d_{TBR}$? This is a pertinent question, which transcends the specifics of TBR distance because, crucially, $d_{TBR}$ can be characterized using the powerful maximum agreement forest abstraction.

Distances based on agreement forests have been intensively and successfully studied in recent years, as the use of the agreement forest abstraction almost always yields fixed parameter tractability and constant-factor approximation algorithms [10], many of which are effective in practice. We refer to [41,39,14,35] for recent overviews of the agreement forest literature, and books such as [15] for an introduction to fixed parameter tractability. In particular, $d_{TBR}$ can be computed in $O(3^{d_{TBR}} \cdot \mathrm{poly}(n))$ time [13], permits a polynomial-time 3-approximation algorithm, and a kernel of size $11 d_{TBR} - 9$ [25]. In contrast, prior to this paper very little was known about $d_{MP}$: nothing was known about the approximability of $d_{MP}$; it was not known whether it is fixed parameter tractable (where $d_{MP}$ is the parameter); and, while, as mentioned above, it is known that $d_{MP} \le d_{TBR}$, it remained unclear how much smaller $d_{MP}$ can be than $d_{TBR}$ in the worst case. Despite promising partial results it even remained unclear whether questions such as “Is $d_{MP} \ge k$?” can be solved in polynomial time when $k$ is a constant [8,23]. This is another important difference with distances such as $d_{TBR}$, where corresponding questions are trivially polynomial time solvable for fixed $k$. The apparent extra complexity of $d_{MP}$ seems to stem from the unusual max-min definition of the problem, and the fact that unlike $d_{TBR}$, which is based on topological rearrangements of subtrees, $d_{MP}$ is based only on characters.

In this article we take a significant step forward in understanding the deeper complexity of $d_{MP}$ and resolve all of the above questions. Our central result is that we prove that two common polynomial-time reduction rules encountered in phylogenetics, the subtree and chain reductions [1], are sufficient to produce a linear kernel for $d_{MP}$. This means that, after exhaustive application of these rules, which preserve $d_{MP}$, the reduced trees will have at most $\alpha \cdot (d_{MP} + 1)$ leaves, with $\alpha = 560$. The fixed parameter tractability of computing $d_{MP}$ (parameterized by itself) then follows, by solving the kernel using the exact algorithm from [26]. The fact that the reduction rules preserve $d_{MP}$ was already known [23]. However, proving the bound on the size of the reduced trees requires rather involved combinatorial arguments, which have a very different flavour to the arguments typically encountered in the maximum agreement forest literature. The main goal of this article is to present these arguments as clearly as possible, rather than to optimize the resulting constants.

The kernel confirms that questions such as “Is $d_{MP} \ge k$?” can, indeed, be solved in polynomial time: it is striking that here the proof of fixed parameter tractability has preceded the weaker result of polynomial-time solvability for fixed $k$.

Next, by producing a modified, constructive version of the bounding argument underpinning the kernelization, we are able to demonstrate a polynomial-time $\alpha(1 + 1/r)$-factor approximation algorithm for computation of $d_{MP}$ for any constant $r$, placing the problem in APX.

A number of other powerful corollaries result from the kernelization. We leverage the fact that the reduction rules also preserve $d_{TBR}$, to show that $1 \le \frac{d_{TBR}}{d_{MP}} \le 2\alpha$, which limits how much smaller $d_{MP}$ can be than $d_{TBR}$. Subsequently, we show that the treewidth of an auxiliary graph structure known as the display graph [11] is bounded by a linear function of $d_{MP}$, resolving an open question posed several times [28,23]. The treewidth bound, and the existence of a non-trivial approximation algorithm for $d_{MP}$, were specified as sufficient conditions for proving the fixed parameter tractability of $d_{MP}$ via Courcelle’s Theorem [23]; our linear kernel implies them. Summarising, our central result shows how kernelization can open the gateway to a host of strong auxiliary results and bypass intermediate steps in the algorithm design process.

The structure of the paper is as follows. In Section 2 we give formal definitions and insightful preliminary results. In Section 3 we prove our main result: the linear kernel. The section starts with Subsection 3.1, which gives a high-level overview of how a sequence of lemmas and theorems leads to the kernel, whereas in the rest of the section these lemmas and theorems are proved. Interesting corollaries of the existence of a linear kernel are derived in Section 4: a constant approximation algorithm in Section 4.1; a bound on the ratio between $d_{MP}$ and $d_{TBR}$ in Section 4.2; a bound on the treewidth of the so-called display graph in terms of $d_{MP}$ in Section 4.3. Section 5 concludes with some directions for future research.

Fig. 1. Two unrooted binary phylogenetic trees $T_1$, $T_2$ on $X = \{a, \ldots, g\}$. Solid edges are monochromatic and dashed edges are bichromatic under an optimal extension for the character $\chi : X \to \{\text{red}, \text{blue}\}$, where $\chi(a) = \chi(b) = \chi(c) = \text{red}$, $\chi(d) = \chi(e) = \chi(f) = \chi(g) = \text{blue}$. As there is one bichromatic edge in $T_1$ and two in $T_2$, we have that $l_\chi(T_1) = 1$, $l_\chi(T_2) = 2$, proving that $d_{MP}(T_1, T_2) \ge |1 - 2| = 1$. In fact, it can be verified that no character can cause the parsimony scores of these two trees to differ by more, so $d_{MP}(T_1, T_2) = 1$. We will show in Section 4.2 that $d_{TBR}(T_1, T_2) = 2$, because a maximum agreement forest of these two trees contains three blocks [23]. (For interpretation of the colours in the figure(s), the reader is referred to the web version of this article.)

2. Definitions and preliminaries

An unrooted binary phylogenetic tree on a set of species (or taxa) $X$ is an undirected tree in which all internal vertices have degree 3, and the degree-1 vertices (the leaves) are bijectively labelled with elements from $X$. For brevity we will refer to unrooted binary phylogenetic trees as phylogenetic trees, or even shorter, trees. See Fig. 1 for an example.

Given a set $S \subseteq X$ and a tree $T$ on $X$, we denote by $T[S]$ the spanning subtree on $S$ in $T$, that is, the minimal connected subgraph $T'$ of $T$ such that $T'$ contains every element of $S$. The induced subtree $T|_S$ by $S$ in $T$ is the tree derived from $T[S]$ by suppressing any vertices of degree 2.

Given a subset $S \subseteq X$ and a tree $T$ on $X$, we say that $S$ has degree $d$ in $T$ if there are exactly $d$ edges $uv$ in $T$ for which $u$ is in $T[S]$ and $v$ is not; in other words, $d$ is the number of edges separating $T[S]$ from the rest of $T$. We call these edges pending edges of $S$ in $T$.

For two disjoint subsets $S_1, S_2 \subseteq X$, we say $S_1$ and $S_2$ are spanning-disjoint in $T$ if the spanning subtrees $T[S_1]$ and $T[S_2]$ are edge-disjoint. (Observe that as $T$ is binary, this also implies that $T[S_1]$ and $T[S_2]$ are vertex-disjoint.) Similarly, we say a collection $S_1, \ldots, S_m$ of subsets of $X$ are spanning-disjoint in $T$ if $S_i, S_j$ are spanning-disjoint in $T$ for any $i \ne j$.

2.1. Characters and parsimony

A character on $X$ is a function $\chi : X \to C$, where $C$ is a set of states. In this paper there is no limit on the size of $C$, in contrast to some contexts where $|C|$ is assumed to be quite small (for example, in genetic data the nucleobases A, C, G, T). Think of the states as colours, say $\{1, 2, \ldots, t\} =: [t]$.

For a given character $\chi$ and tree $T$ on $X$, the parsimony score measures how well $T$ fits $\chi$. It is defined in the following way. Call a colouring $\phi : V(T) \to [t]$ an extension of $\chi$ to $T$ if $\phi(x) = \chi(x)$ for all $x \in X$. Denote by $\Delta_T(\phi)$ the number of bichromatic edges $uv$ in $T$, i.e. for which $\phi(u) \ne \phi(v)$. We usually omit subscript $T$ when the tree is clear from context. The parsimony score for $T$ with respect to $\chi$ is defined as

$$l_\chi(T) = \min_{\phi} \Delta_T(\phi),$$

where the minimum is taken over all possible extensions $\phi$ of $\chi$ to $T$. An extension $\phi$ that achieves this bound is called an optimal extension of $\chi$ to $T$. An optimal extension, and thus the parsimony score, can be easily computed in polynomial time using dynamic programming or e.g. Fitch’s algorithm [19].

Observe that for any $T$ and $\chi$, the parsimony score for $T$ with respect to $\chi$ is at least $|\chi(X)| - 1$, i.e. the number of colours assigned by $\chi$ minus 1. If $l_\chi(T)$ is exactly $|\chi(X)| - 1$, we say that $T$ is a perfect phylogeny for $\chi$. For trees $T_1, T_2$ and a character $\chi$ on $X$, the parsimony distance with respect to $\chi$ is defined as $d_{MP_\chi}(T_1, T_2) = |l_\chi(T_1) - l_\chi(T_2)|$.

Now we are ready to define the maximum parsimony distance between two trees (see also Fig. 1). For two trees $T_1, T_2$ on $X$, the maximum parsimony distance is defined as

$$d_{MP}(T_1, T_2) = \max_{\chi} d_{MP_\chi}(T_1, T_2),$$

where the maximum is taken over all possible characters $\chi$ on $X$ [18,31]. Equivalently, we may write it as

$$d_{MP}(T_1, T_2) = \max_{\chi} \left| \Delta_{T_1}(\phi_1) - \Delta_{T_2}(\phi_2) \right|,$$

where $\phi_1$ is an optimal extension of $\chi$ to $T_1$, and $\phi_2$ an optimal extension of $\chi$ to $T_2$. This measure satisfies the properties of a distance metric on the space of unrooted binary phylogenetic trees [18,31]. For two trees on $n$ taxa it is known that $d_{MP}$ is at most $n - 2\sqrt{n} + 1$ [18]. A weaker bound of $n - 1$ is easily obtained by observing that the parsimony score of a character on a tree is at least 0 and at most $n - 1$.
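To make the max-over-characters definition concrete, here is a minimal brute-force sketch in Python. It enumerates every character on a tiny taxon set and is exponential in $|X|$, so it is only a toy check and not the algorithm of [26]. The tree encoding and the names `parsimony_score` and `mp_distance` are illustrative assumptions.

```python
from itertools import product


def parsimony_score(adj, chi):
    """Fitch-style parsimony score; adj is an adjacency dict, chi maps leaves to states."""
    root = next(v for v in adj if len(adj[v]) > 1)

    def visit(v, parent):
        if len(adj[v]) == 1:
            return {chi[v]}, 0
        states, cost = None, 0
        for w in adj[v]:
            if w == parent:
                continue
            s, c = visit(w, v)
            cost += c
            if states is None:
                states = s
            elif states & s:
                states = states & s
            else:
                states, cost = states | s, cost + 1
        return states, cost

    return visit(root, None)[1]


def mp_distance(adj1, adj2):
    """Brute-force d_MP: maximise |l_chi(T1) - l_chi(T2)| over all characters."""
    leaves = sorted(v for v in adj1 if len(adj1[v]) == 1)
    best = 0
    # Up to renaming of states, |X| states suffice to represent every character.
    for states in product(range(len(leaves)), repeat=len(leaves)):
        chi = dict(zip(leaves, states))
        diff = abs(parsimony_score(adj1, chi) - parsimony_score(adj2, chi))
        best = max(best, diff)
    return best


# Two hypothetical quartet trees: T1 has split ab|cd, T2 has split ac|bd.
T1 = {"a": ["u"], "b": ["u"], "c": ["v"], "d": ["v"],
      "u": ["a", "b", "v"], "v": ["c", "d", "u"]}
T2 = {"a": ["u"], "c": ["u"], "b": ["v"], "d": ["v"],
      "u": ["a", "c", "v"], "v": ["b", "d", "u"]}
print(mp_distance(T1, T2))  # 1
```

For $n = 4$ the result agrees with the upper bound $n - 2\sqrt{n} + 1 = 1$ quoted above.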

Given a tree $T$ on $X$ and a colouring $\phi : V(T) \to [t]$, the forest induced by $\phi$ is derived from $T$ by deleting every bichromatic edge under $\phi$. Observe that the number of connected components in the forest induced by $\phi$ is exactly $\Delta(\phi) + 1$.

Lemma 1. If $\chi : X \to [t]$ is a character with $S_i = \chi^{-1}(i) \ne \emptyset$ (i.e. at least one taxon is coloured $i$) for each $i \in [t]$, and $T$ is a tree on $X$, then

$$l_\chi(T) \ge t - 1,$$

with equality if and only if $S_1, \ldots, S_t$ are spanning-disjoint in $T$.

Proof. To see that $l_\chi(T) \ge t - 1$, consider an optimal extension $\phi$ of $\chi$ to $T$, and let $F$ be the forest induced by $\phi$. As each connected component in $F$ is monochromatically coloured by $\phi$, there must be at least $t$ connected components, and thus $\Delta(\phi) \ge t - 1$, which implies $l_\chi(T) \ge t - 1$.

Now suppose that $S_1, \ldots, S_t$ are spanning-disjoint in $T$. Then construct an extension $\phi$ of $\chi$ to $T$ by first setting $\phi(u) = i$ for every vertex $u$ in $T[S_i]$, for each $i \in [t]$. (As the spanning trees are edge-disjoint and thus vertex-disjoint in $T$, this is well-defined.) For any remaining unassigned vertex $v$, if $v$ has a neighbour $u$ for which $\phi(u)$ is defined, then set $\phi(v) = \phi(u)$. Repeat this process until every vertex is assigned a colour by $\phi$. Now observe that by construction, the vertices assigned colour $i$ by $\phi$ form a connected subtree for each $i \in [t]$. Thus the forest induced by $\phi$ has exactly $t$ connected components, and so $\Delta(\phi) = t - 1$.

Finally, suppose $l_\chi(T) = t - 1$, and let $\phi$ be an optimal extension of $\chi$. Then the forest $F$ induced by $\phi$ has exactly $t$ connected components, which implies by the pigeonhole principle that each $S_i$ is a subset of one connected component in $F$. Then as each $S_i$ is contained within a different connected component of $F$, the spanning trees $T[S_i]$ are also contained within these components, and so $S_1, \ldots, S_t$ are spanning-disjoint.

2.2. Parameterized complexity and kernelization

A parameterized problem is a problem for which the inputs are of the form $(x, k)$, where $k$ is a non-negative integer, called the parameter. A parameterized problem is fixed-parameter tractable (FPT) if there exists an algorithm that solves any instance $(x, k)$ in $f(k) \cdot |x|^{O(1)}$ time, where $f()$ is a computable function depending only on $k$. A parameterized problem has a kernel of size $g(k)$, where $g()$ is a computable function depending only on $k$, if there exists a polynomial-time algorithm transforming any instance $(x, k)$ into an equivalent instance $(x', k')$, with $|x'|, k' \le g(k)$. If $g(k)$ is a polynomial in $k$ then we call this a polynomial kernel; if $g(k) = O(k)$ then it is a linear kernel. It is well-known that a parameterized problem is fixed-parameter tractable if and only if it has a (not necessarily polynomial) kernel. For more information, we refer the reader to [15].

For a maximization problem $\Pi$ and $\rho \ge 1$, we say $\Pi$ has a constant factor approximation with approximation ratio $\rho$ if there exists a polynomial-time algorithm such that for any instance $\pi$ of $\Pi$, the following inequalities hold, where $\mathrm{opt}(\pi)$ denotes the maximum value of a solution to $\pi$, and $\mathrm{alg}(\pi)$ denotes the value of the solution to $\pi$ returned by the algorithm:

$$1 \le \frac{\mathrm{opt}(\pi)}{\mathrm{alg}(\pi)} \le \rho.$$

In this paper we study the following maximization problem:

Maximum Parsimony Distance (dmp)
Input: Two trees $T_1, T_2$ on a set of taxa $X$.
Output: A character $\chi$ on $X$ that maximizes $|l_\chi(T_1) - l_\chi(T_2)|$.

3. Kernel bound

3.1. Overview

In this section we give an overview of the constituent parts of our kernelization result, and how they fit together. The first step is to apply two reduction rules, the Cherry rule and the Chain rule, described in the next section. These rules correspond roughly to reduction rules that often appear in papers on computational phylogenetics. The correctness of these rules was proved in [23]; our contribution is to show that the exhaustive application of these rules grants a linear kernel, as stated in the following theorem.

Theorem 1. There exists a constant $\alpha$ ($\alpha = 560$) for which the following holds. Let $(T_1, T_2)$ be a pair of binary unrooted phylogenetic trees on $X$ that are irreducible under Reduction Rules 1 and 2. Then if $|X| \ge \alpha k$, it holds that $d_{MP}(T_1, T_2) \ge k$, and we can find a witnessing character, i.e. a character $\chi$ yielding $d_{MP_\chi}(T_1, T_2) \ge k$, in polynomial time.

This theorem, together with the correctness of the reduction rules as proved in [23], immediately implies a linear kernel for dmp.

To show how we prove the theorem, we will need to introduce some terminology as we go.

A quartet $Q$ is any set of 4 elements in $X$. If $T_1|_Q \ne T_2|_Q$, we say that $Q$ is a conflicting quartet for $(T_1, T_2)$.

As a crucial step we prove that for any $S$ large enough with respect to the degree of $S$ in both $T_1$ and $T_2$, either there exists a conflicting quartet or one of the reduction rules applies.
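The definition of a conflicting quartet can be tested directly. The sketch below is one possible way to do so in Python, assuming trees are given as adjacency dictionaries: it determines the split that a binary tree induces on four leaves via path lengths (the pairing with the smallest sum of within-pair distances) and compares the two trees. The function names and the example trees are hypothetical.

```python
from collections import deque


def distances_from(adj, src):
    """Breadth-first search: number of edges from src to every vertex."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def quartet_split(adj, quartet):
    """Split induced on four leaves, as a frozenset of two frozenset pairs."""
    a, b, c, d = quartet
    dist = {x: distances_from(adj, x) for x in quartet}
    pairings = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
    # In a binary tree exactly one pairing has vertex-disjoint connecting paths,
    # and that pairing minimises the sum of the two within-pair distances.
    best = min(pairings, key=lambda p: dist[p[0][0]][p[0][1]] + dist[p[1][0]][p[1][1]])
    return frozenset(frozenset(pair) for pair in best)


def is_conflicting(adj1, adj2, quartet):
    """True if the two trees induce different splits on the quartet."""
    return quartet_split(adj1, quartet) != quartet_split(adj2, quartet)


# Hypothetical five-leaf caterpillars: T1 = ((a,b),c,(d,e)), T2 = ((a,b),d,(c,e)).
T1 = {"a": ["u"], "b": ["u"], "c": ["v"], "d": ["w"], "e": ["w"],
      "u": ["a", "b", "v"], "v": ["u", "c", "w"], "w": ["v", "d", "e"]}
T2 = {"a": ["u"], "b": ["u"], "d": ["v"], "c": ["w"], "e": ["w"],
      "u": ["a", "b", "v"], "v": ["u", "d", "w"], "w": ["v", "c", "e"]}
print(is_conflicting(T1, T2, ("a", "b", "c", "d")))  # False: both induce ab|cd
print(is_conflicting(T1, T2, ("a", "c", "d", "e")))  # True: ac|de versus ad|ce
```

Scanning all 4-subsets of a set $S$ with this test is one simple polynomial-time way to search for a conflicting quartet inside $S$.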

Lemma 2. Let $S$ be a subset of $X$ with $d_1$ the degree of $S$ in $T_1$, and $d_2$ the degree of $S$ in $T_2$. If $|S| > 9(d_1 + d_2) - 12$, then either $T_1|_S \ne T_2|_S$ or one of Reduction Rules 1 or 2 applies to $(T_1, T_2)$. In particular if $(T_1, T_2)$ is irreducible under Rules 1 or 2 and $|S| \ge 9(d_1 + d_2) - 11$, then there exists a conflicting quartet $Q \subseteq S$, and such a quartet can be found in polynomial time.

The next result implies that if we have a large enough number of conflicting quartets that are also spanning-disjoint in both $T_1$ and $T_2$, then we are done. While it is intuitively clear that such quartets can be leveraged to create a high parsimony score in one tree, some care has to be taken to keep the parsimony score low in the other tree.

Lemma 3. Let $\mathcal{Q} = \{Q_1, \ldots, Q_k\}$ be a set of conflicting quartets for $T_1, T_2$, such that $Q_1, \ldots, Q_k$ are spanning-disjoint in $T_1$ and in $T_2$. Then $d_{MP}(T_1, T_2) \ge k$, and we can find a witnessing character in polynomial time.

In combination, Lemmas 2 and 3 allow us to show that $d_{MP}(T_1, T_2) \ge k$ provided that we can find at least $k$ sets $S_1, \ldots, S_k$ that are spanning-disjoint in both trees and satisfy the conditions of Lemma 2.

We will find $k$ such sets as part of the construction of a character that witnesses $d_{MP}(T_1, T_2) \ge k$, for any reduced instance with $|X| \ge \alpha k$. In order to construct this character, we first create a partition of $X$ into large subsets, as described by the following lemma.

Lemma 4. Suppose that $|X| \ge 2ct$ for some integers $c$ and $t$, and let $T_1$ be a phylogenetic tree on $X$. Then in polynomial time we can construct a partition $S_1, \ldots, S_t$ of $X$ with $S_1, \ldots, S_t$ spanning-disjoint in $T_1$, such that $|S_i| \ge c$ for each $i$.

We note that there is a one-to-one correspondence between partitions and characters on $X$, in the following sense. Given a partition $S_1, \ldots, S_t$ of $X$, we may define a character $\chi : X \to [t]$ such that $\chi(x) = i$ if $x \in S_i$, for each $i \in [t]$. Call such a character the character defined by $S_1, \ldots, S_t$.

Thus let us consider the character $\chi$ on $X$ defined by the partition described by Lemma 4. Since $S_1, \ldots, S_t$ are spanning-disjoint in $T_1$, Lemma 1 tells us that the parsimony score of $T_1$ with respect to $\chi$ is exactly $t - 1$.

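Lemma 5. Let $\chi$ be the character defined by the partition $S_1, \ldots, S_t$ where $S_1, \ldots, S_t$ are spanning-disjoint in $T_1$, let $d_1, d_2$ be positive integers such that $d_1 d_2 - d_1 - d_2 > 0$, and assume

$$t \ge \frac{2 d_1 d_2 + d_1}{d_1 d_2 - d_1 - d_2}\, k.$$

Then either $d_{MP_\chi}(T_1, T_2) \ge k$, or in polynomial time we can find a set of indices $i_1, \ldots, i_{k'}$ with $k' \ge k$ such that:

• $S_{i_1}, \ldots, S_{i_{k'}}$ are spanning-disjoint in $T_2$ (as well as in $T_1$);
• $S_{i_j}$ has degree at most $d_1$ in $T_1$ for each $j \in [k']$; and
• $S_{i_j}$ has degree at most $d_2$ in $T_2$ for each $j \in [k']$.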

We will prove Theorem 1 by combining these results in the following way. Fix integers $d_1, d_2$ to be determined later. Assume $(T_1, T_2)$ is irreducible under Reduction Rules 1 and 2, and assume that

$$|X| \ge 2ct,$$

where $c = 9(d_1 + d_2) - 11$ and $t \ge \frac{2 d_1 d_2 + d_1}{d_1 d_2 - d_1 - d_2}\, k$ (this holds if $|X| \ge \alpha k$).

By Lemma 4, there exists a partition $S_1, \ldots, S_t$ of $X$ with $S_1, \ldots, S_t$ spanning-disjoint in $T_1$ and $|S_i| \ge c$ for each $i \in [t]$. Let $\chi$ be the character defined by $S_1, \ldots, S_t$. By Lemma 5, either $d_{MP_\chi}(T_1, T_2) \ge k$, in which case we are done, or we get a set of indices $i_1, \ldots, i_{k'}$ with $k' \ge k$ such that $S_{i_1}, \ldots, S_{i_{k'}}$ are spanning-disjoint in $T_2$ (as well as in $T_1$), each $S_{i_j}$ has degree at most $d_1$ in $T_1$, and each $S_{i_j}$ has degree at most $d_2$ in $T_2$. But then each $S_{i_j}$ satisfies the conditions of Lemma 2, and therefore for each $j \in [k']$ there exists a conflicting quartet $Q_j \subseteq S_{i_j}$. Moreover, as $S_{i_1}, \ldots, S_{i_{k'}}$ are spanning-disjoint in $T_1$ and $T_2$, the quartets $Q_1, \ldots, Q_{k'}$ are also spanning-disjoint in $T_1$ and $T_2$. Then Lemma 3 implies that $d_{MP}(T_1, T_2) \ge k' \ge k$.

By setting $d_1 = 4$ and $d_2 = 5$, we get that $\alpha = 560$, giving the desired bound.
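The arithmetic behind $\alpha = 560$ can be checked mechanically. The short Python sketch below (the helper name `kernel_constant` is an assumption for illustration) evaluates $2c \cdot t/k$ with $c = 9(d_1 + d_2) - 11$ and $t/k = (2d_1d_2 + d_1)/(d_1d_2 - d_1 - d_2)$. For $d_1 = 4$, $d_2 = 5$ this gives $c = 70$ and $t = 4k$, hence $2ct = 560k$.

```python
from fractions import Fraction


def kernel_constant(d1, d2):
    """Return 2 * c * (t / k) for the given degree bounds d1 and d2."""
    denom = d1 * d2 - d1 - d2
    if denom <= 0:
        raise ValueError("Lemma 5 requires d1*d2 - d1 - d2 > 0")
    c = 9 * (d1 + d2) - 11                       # block size needed for Lemma 2
    t_per_k = Fraction(2 * d1 * d2 + d1, denom)  # blocks per unit of k (Lemma 5)
    return 2 * c * t_per_k                       # leaves per unit of k (Lemma 4)


print(kernel_constant(4, 5))  # 560
```

Exact rational arithmetic is used so that non-integer ratios $t/k$ for other choices of $d_1, d_2$ are reported without rounding; in that case $t$ itself would still have to be rounded up to an integer.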

In the next subsections we prove each of these lemmas, and then the main theorem, in turn.

3.2. Reduction rules

We begin by stating the reduction rules for our kernelization result. In what follows, a pair $(x, y)$ with $x, y \in X$ is a cherry in a tree $T$ if there exists an internal vertex $u$ in $T$ adjacent to both $x$ and $y$. A cherry is also sometimes known in the literature as a sibling-pair. A sequence of leaves $x_1, \ldots, x_r \in X$ is a chain in $T$ if there exists a path of internal vertices $p_1, \ldots, p_r$ (possibly with $p_1 = p_2$ and possibly with $p_{r-1} = p_r$), such that for each $i \in [r]$, $p_i$ is the internal vertex adjacent to $x_i$. We call $r$ the length of this chain.

Reduction Rule 1. [Cherry reduction rule] If there exist $x, y \in X$ such that $(x, y)$ is a cherry in each of $T_1, T_2$, then replace $(T_1, T_2)$ with $(T_1|_{X \setminus \{x\}}, T_2|_{X \setminus \{x\}})$.

Reduction Rule 2. [Chain reduction rule] If there exists a sequence of leaves $x_1, \ldots, x_r \in X$ such that $x_1, \ldots, x_r$ is a chain in both $T_1$ and $T_2$, and $r \ge 5$, then replace $(T_1, T_2)$ with $(T_1|_{X \setminus \{x_5, \ldots, x_r\}}, T_2|_{X \setminus \{x_5, \ldots, x_r\}})$ (thus, the common chain is reduced to length 4).

The correctness of these rules (in the sense that they preserve $d_{MP}$) was previously proved in [23].

Theorem 2. Let $(T_1', T_2')$ be an instance of dmp derived from $(T_1, T_2)$ by an application of Reduction Rules 1 or 2. Then

$$d_{MP}(T_1, T_2) = d_{MP}(T_1', T_2').$$

Correctness of the chain reduction rule follows from Theorem 3.1 in [23]. Correctness of the cherry reduction rule follows as a subcase of Theorem 4.1 in [23].

Our main contribution is to show that if an instance is reduced by these rules then its size is bounded by a linear function of $d_{MP}$.

3.3. Small degree sets

In this section we prove Lemma 2.

Lemma 2. Let $S$ be a subset of $X$ with $d_1$ the degree of $S$ in $T_1$, and $d_2$ the degree of $S$ in $T_2$. If $|S| > 9(d_1 + d_2) - 12$, then either $T_1|_S \ne T_2|_S$ or one of Reduction Rules 1 or 2 applies to $(T_1, T_2)$. In particular if $(T_1, T_2)$ is irreducible under Rules 1 or 2 and $|S| \ge 9(d_1 + d_2) - 11$, then there exists a conflicting quartet $Q \subseteq S$, and such a quartet can be found in polynomial time.

Proof. Since unrooted binary trees are characterized by their quartets [34, Theorem 6.3.5(iii)], the last statement of the lemma follows directly.

We will show that if $T_1|_S = T_2|_S$ and neither of the reduction rules applies to $(T_1, T_2)$, then $|S| \le 9(d_1 + d_2) - 12$. This implies the main claim of the lemma. Let us denote $T|_S = T_1|_S = T_2|_S$.

Consider the backbone graph of $T|_S$ obtained by deleting all leaves (see Fig. 2 for an example). Let $P_C$ be the set of nodes having degree 1 on the backbone, which we refer to as parents of a cherry in $T|_S$. Let $P_L$ be the set of nodes having degree 2 on the backbone, which we refer to as parents of a leaf of $T|_S$. All remaining vertices on the backbone have degree 3. Thus $|S|$, the total number of leaves of $T|_S$, is $2|P_C| + |P_L|$. We call the path between any two odd degree vertices on the backbone, having internal nodes only in $P_L$, a side of the backbone.

First notice that for each cherry in $T|_S$, there must exist in $T_1[S]$, the spanning tree on $S$ in $T_1$, or in $T_2[S]$ a node, incident to a pending edge of $S$, between at least one of its two leaves and its corresponding node in $P_C$. Otherwise Reduction Rule 1 can be applied. In particular this implies that $|P_C| \le d_1 + d_2$.

Thus at least $|P_C|$ of the $d_1 + d_2$ pending edges must be used for “cutting” the cherries, each of them cutting 1 leaf of a cherry. Let us choose one such leaf from each cherry, and call these the cut-leaves.

After removing cut-leaves, every node in $P_C$ and $P_L$ is now the parent of 1 leaf in $T|_S$. Every side of the backbone contains at most 4 vertices in $P_C$ and $P_L$, unless $T_1[S]$ or $T_2[S]$ has a node of a pending edge of $S$ or a node adjacent to a node of a pending edge on that side. We show that every such pending edge on a side may increase the number of $P_L$-nodes on that side by at most 5 (see Fig. 2). Indeed, suppose a side of the backbone has in total $d$ pending edges of $S$ in both $T_1$ and $T_2$, but more than $4 + 5d$ nodes in $P_L$, i.e. at least $5(d + 1)$. Then $T|_S$ contains a chain of length $5(d + 1)$, which we can split up into $d + 1$ chains of length 5. Clearly at least one of these chains has no pending edge in either $T_1$ or $T_2$, and so $T_1, T_2$ have a common chain of length 5, a contradiction.

Fig. 2. Example illustration of the backbone of $T|_S = T_1|_S = T_2|_S$ within $T_1$ and $T_2$, where $S = \{s_1, \ldots, s_{29}\}$. Edges and vertices of the backbone are in bold. Observe that $T|_S$ has the chain $s_1, \ldots, s_9$, but $(T_1, T_2)$ do not have a common chain of length greater than 4, as the leaf $s_5$ has a sibling $a$ in $T_2$.

Thus the total number of nodes from $P_C$ and $P_L$ on a side is at most five times the number of pending edges of $S$ (in $T_1[S]$ or $T_2[S]$) on that side, plus 4. Otherwise Reduction Rule 2 can be applied. Given that we already used $|P_C|$ pending edges for cutting the cherries, we have $d_1 + d_2 - |P_C|$ pending edges left to be distributed over the sides.

The number of sides on the backbone is the number of edges in an unrooted binary tree with $|P_C|$ leaves, which is $2|P_C| - 3$. Therefore the total number of leaves of $T|_S$ is

$$|S| = 2|P_C| + |P_L| \le |P_C| + 4(2|P_C| - 3) + 5(d_1 + d_2 - |P_C|) \le 4|P_C| + 5(d_1 + d_2) - 12.$$

Clearly, this attains its largest value if $|P_C| = d_1 + d_2$, in which case $|S| \le 9(d_1 + d_2) - 12$, as was to be proven.

3.4. Combining conflicting quartets

Lemma 3. Let $\mathcal{Q} = \{Q_1, \ldots, Q_k\}$ be a set of conflicting quartets for $T_1, T_2$, such that $Q_1, \ldots, Q_k$ are spanning-disjoint in $T_1$ and in $T_2$. Then $d_{MP}(T_1, T_2) \ge k$, and we can find a witnessing character in polynomial time.

Proof. For a quartet $Q$ and tree $T$, we say that $T|_Q = ab|cd$ if $Q = \{a, b, c, d\}$ and in $T$ the path between $a$ and $b$ is edge-disjoint from the path between $c$ and $d$. Without loss of generality, we may assume $Q_i = \{a_i, b_i, c_i, d_i\}$, $T_1|_{Q_i} = a_i b_i | c_i d_i$ and $T_2|_{Q_i} = a_i c_i | b_i d_i$ for each $i \in [k]$.

We will show how to build a character $\chi$ with two states, such that $l_\chi(T_1) \le k$, and $l_\chi(T_2) \ge 2k$. This shows that $d_{MP_\chi}(T_1, T_2) \ge k$, as required.

The idea is to construct $\chi$ in such a way that, for each quartet $Q_i$, $\chi(a_i) = \chi(b_i) \ne \chi(c_i) = \chi(d_i)$. This will ensure that $l_\chi(T_2)$ is at least $2k$, as $T_2$ will have at least $2k$ edge-disjoint paths (from $a_i$ to $c_i$ and from $b_i$ to $d_i$, for each $i \in [k]$) that each require at least one change in state along some edge.

For each $Q_i$, let $e_{Q_i}$ denote an edge in $T_1$ such that in $T_1[Q_i]$, $e_{Q_i}$ is on the path that separates $\{a_i, b_i\}$ from $\{c_i, d_i\}$.

Now we construct a function $\phi : V(T_1) \to \{\text{red}, \text{blue}\}$ as follows. Start by choosing an arbitrary leaf in $T_1$, say without loss of generality $a_1$, and set $\phi(a_1) = \text{red}$. Now proceed as follows. For any edge $uv$ in $T_1$ such that $\phi(u)$ is defined but $\phi(v)$ is not, we set $\phi(v) = \phi(u)$, unless $uv = e_{Q_i}$ for some $i$. In that case, we set $\phi(v) = \text{blue}$ if $\phi(u) = \text{red}$, and set $\phi(v) = \text{red}$ otherwise.

Now we can let $\chi$ be the restriction of $\phi$ to $X$. By construction, $\phi$ is an extension of $\chi$ to $T_1$ and $\Delta(\phi) = |\{e_{Q_i} : i \in [k]\}| = k$. This is enough to show that $l_\chi(T_1) \le k$.

We now show that $\chi(a_i) = \chi(b_i) \ne \chi(c_i) = \chi(d_i)$, for each $i \in [k]$. To see this, consider the spanning tree $T_1[Q_i]$. By construction, $T_1[Q_i]$ contains the edge $e_{Q_i}$ and $e_{Q_i}$ separates $\{a_i, b_i\}$ from $\{c_i, d_i\}$. Let $u_i, v_i$ be the vertices of $e_{Q_i}$, with $u_i$ the vertex closer to $a_i$ and $b_i$. Note that $T_1[Q_i]$ cannot contain $e_{Q_j}$ for any $j \ne i$, as $T_1[Q_i]$ and $T_1[Q_j]$ are edge-disjoint. It follows that $u_i, a_i, b_i$ are all assigned the same value by $\phi$ and $v_i, c_i, d_i$ are assigned the opposite value. Thus by definition of $\chi$, we have $\chi(a_i) = \chi(b_i) = \phi(u_i) \ne \phi(v_i) = \chi(c_i) = \chi(d_i)$.

It remains to observe that as $Q_1, \ldots, Q_k$ are spanning-disjoint in $T_2$, the $a_i$-$c_i$ and $b_i$-$d_i$ paths in $T_2$ are pairwise edge-disjoint for all $i \in [k]$. Then as $\chi(a_i) \ne \chi(c_i)$ and $\chi(b_i) \ne \chi(d_i)$, there exist at least $2k$ edges $uv$ in $T_2$ with $\phi_2(u) \ne \phi_2(v)$, for any extension $\phi_2$ of $\chi$ to $T_2$. It follows that $l_\chi(T_2) \ge 2k$, and so $d_{MP}(T_1, T_2) \ge d_{MP_\chi}(T_1, T_2) = |l_\chi(T_1) - l_\chi(T_2)| \ge 2k - k = k$.

Since each edge is processed at most once in the construction of $\chi$, it is clear that this construction takes polynomial time.

3.5. Constructing an initial partition

In this section we prove Lemma 4.

Lemma 4. Suppose that $|X| \ge 2ct$ for some integers $c$ and $t$, and let $T_1$ be a phylogenetic tree on $X$. Then in polynomial time we can construct a partition $S_1, \ldots, S_t$ of $X$ with $S_1, \ldots, S_t$ spanning-disjoint in $T_1$, such that $|S_i| \ge c$ for each $i$.

Proof. We prove the claim by induction on $t$. For the base case, if $t = 1$ then we may let $S_1 = X$, and we have the desired partition.

For the inductive step, assume $|X| \ge 2ct$ and that the claim is true for smaller values of $t$. We first fix an arbitrary rooting on $T_1$. That is, choose an arbitrary edge $e$ in $T_1$ and subdivide it with a new (temporary) vertex $r$, then orient all edges in $T_1$ away from $r$. Under this rooting, let $u$ be a lowest vertex in $T_1$ for which $u$ has at least $c$ descendants in $X$. Let $S_t \subseteq X$ be the set of these descendants. Note that since $T_1$ is binary, $|S_t| < 2c$, as otherwise one of the two children of $u$ would be a lower vertex with at least $c$ descendants.

Now consider the induced subtree $T_1|X'$, where $X' = X \setminus S_t$. As $|S_t| < 2c$, we have $|X'| \ge 2c(t-1)$. Then by the inductive hypothesis, we can construct a partition $S_1, \ldots, S_{t-1}$ of $X'$ with $S_1, \ldots, S_{t-1}$ spanning-disjoint in $T_1|X'$, such that $|S_i| \ge c$ for each $i$. By construction it is clear that $S_t$ is spanning-disjoint in $T_1$ from $S_1, \ldots, S_{t-1}$. Thus $S_1, \ldots, S_t$ is the desired partition.

As the construction of $S_t$ can be done in polynomial time and this process is repeated $t \le |X|$ times, the entire process takes polynomial time.
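The inductive proof can be unrolled into a single bottom-up pass, sketched below in Python. This is a rephrasing under stated assumptions rather than the authors' procedure: the tree is rooted once (by subdividing the edge at an arbitrary leaf) instead of being re-rooted in each induced subtree, the input is an adjacency dictionary of an unrooted binary tree, $c \ge 1$, and the name `partition_leaves` and the label `"_root"` are hypothetical.

```python
def partition_leaves(adj, leaves, c, t):
    """Split `leaves` into t blocks of size >= c (sketch of Lemma 4).

    adj: dict mapping each vertex of an unrooted binary tree to its neighbours.
    """
    leaves = set(leaves)
    if len(leaves) < 2 * c * t:
        raise ValueError("Lemma 4 assumes |X| >= 2ct")
    # Root the tree: subdivide the edge at an arbitrary leaf with a new vertex.
    first = next(iter(leaves))
    nbr = adj[first][0]
    root = "_root"  # hypothetical label, assumed not to clash with a vertex
    tree = {u: list(vs) for u, vs in adj.items()}
    tree[first] = [root]
    tree[nbr] = [root if v == first else v for v in tree[nbr]]
    tree[root] = [first, nbr]
    # Depth-first traversal; every parent appears before its children.
    parent, children, order, stack = {root: None}, {}, [], [root]
    while stack:
        u = stack.pop()
        order.append(u)
        children[u] = [v for v in tree[u] if v != parent[u]]
        for v in children[u]:
            parent[v] = u
            stack.append(v)
    # Bottom-up pass: cut off a block at each lowest vertex with >= c
    # unassigned leaf descendants, until t - 1 blocks have been cut.
    pending, blocks = {}, []
    for u in reversed(order):
        if u in leaves:
            got = [u]
        else:
            got = [x for v in children[u] for x in pending.pop(v)]
        if len(got) >= c and len(blocks) < t - 1:
            blocks.append(set(got))
            got = []
        pending[u] = got
    blocks.append(set(pending[root]))  # remaining leaves form the last block
    return blocks
```

Every block that is cut off has fewer than $2c$ leaves, because both children of the cutting vertex hold fewer than $c$ unassigned leaves. After $t-1$ cuts more than $2c$ leaves therefore remain for the final block, mirroring the base case $S_1 = X$.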

3.6. Well-behaved sets

In this section we prove Lemma 5. We start with an observation:

Observation 1. For any (not necessarily binary) unrooted tree $T$ with $n$ vertices, and any integer $d \ge 1$, the number of vertices in $T$ with degree strictly greater than $d$ is at most $n/d$.¹


Proof. For each vertex $v$ in $T$ let $d(v)$ denote the degree of $v$. Recall that an unrooted tree with $n$ vertices has exactly $n - 1$ edges. It follows that

$$\sum_{v \in V(T)} d(v) = 2|E(T)| = 2n - 2.$$

Now suppose that $T$ has $m > n/d$ vertices with degree strictly greater than $d$, i.e. at least $d + 1$. The remaining $n - m$ vertices all have degree at least 1, from which it follows that

$$\sum_{v \in V(T)} d(v) \ge m(d + 1) + n - m = md + n \ge (n/d)d + n = 2n,$$

a contradiction.
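As a sanity check of Observation 1 (not a substitute for the proof), the bound can be tested on small trees. The sketch below assumes trees are stored as adjacency dictionaries; `random_tree` and `high_degree_count` are hypothetical helper names, and the random-attachment generator is just one convenient way to produce test trees.

```python
import random

def high_degree_count(adj, d):
    """Number of vertices whose degree is strictly greater than d."""
    return sum(1 for u in adj if len(adj[u]) > d)

def random_tree(n, rng):
    """Build a tree on vertices 0..n-1 by attaching each new vertex
    to a uniformly chosen earlier vertex."""
    adj = {0: []}
    for v in range(1, n):
        u = rng.randrange(v)
        adj[u].append(v)
        adj[v] = [u]
    return adj

def satisfies_observation(adj, d):
    """Check that at most n/d vertices have degree greater than d."""
    return high_degree_count(adj, d) * d <= len(adj)
```

The comparison is written as `count * d <= n` so that it stays in integer arithmetic.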

Lemma 5. Let $\chi$ be the character defined by the partition $S_1, \ldots, S_t$ where $S_1, \ldots, S_t$ are spanning-disjoint in $T_1$, let $d_1, d_2$ be positive integers such that $d_1 d_2 - d_1 - d_2 > 0$, and assume

$$t \ge \frac{2 d_1 d_2 + d_1}{d_1 d_2 - d_1 - d_2}\, k.$$

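Then either $d_{MP_\chi}(T_1, T_2) \ge k$, or in polynomial time we can find a set of indices $i_1, \ldots, i_{k'}$ with $k' \ge k$ such that:

• $S_{i_1}, \ldots, S_{i_{k'}}$ are spanning-disjoint in $T_2$ (as well as in $T_1$);

• $S_{i_j}$ has degree at most $d_1$ in $T_1$ for each $j \in [k']$; and

• $S_{i_j}$ has degree at most $d_2$ in $T_2$ for each $j \in [k']$.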

Proof. By Lemma 1, $l_\chi(T_1) = t - 1$. If $l_\chi(T_2) \ge t + k - 1$, then $d_{MP_\chi}(T_1, T_2) \ge k$ as required. So we may assume that $l_\chi(T_2) \le t + k - 2$. Let $\delta = l_\chi(T_2) - l_\chi(T_1)$, and observe that $0 \le \delta \le k - 1$.

We now construct a partition $P_1, \ldots, P_s$ of $X$ which is spanning-disjoint in $T_2$ (see Fig. 3 for an illustration). Let $\phi_2$ be an optimal extension of $\chi$ to $T_2$. As $l_\chi(T_2) = l_\chi(T_1) + \delta = t + \delta - 1$, the forest induced by $\phi_2$ has exactly $s$ monochromatic connected components, where $s = t + \delta$. Let $P_1, \ldots, P_s$ be the partition of $X$ formed by taking the intersection of $X$ with the vertex set of each tree in this forest. Observe that by construction $P_1, \ldots, P_s$ are spanning-disjoint in $T_2$, and that furthermore each $P_j$ is a subset of $S_i$ for some $i \in [t]$ (as each element of $P_j$ is assigned the same value by $\phi_2$, and thus by $\chi$).
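The step from an extension to the partition $P_1, \ldots, P_s$ can be sketched as follows. The code assumes the extension is already available as a dictionary `phi` on all vertices of the tree; computing an optimal extension (for instance with Fitch's algorithm) is outside this sketch, and `monochromatic_leaf_blocks` is a hypothetical name.

```python
def monochromatic_leaf_blocks(adj, leaves, phi):
    """Intersect the leaf set with each monochromatic component of phi.

    adj: dict mapping each vertex of a tree to a list of its neighbours.
    phi: dict assigning a state to every vertex (an extension of the character).
    """
    leaves = set(leaves)
    seen, blocks = set(), []
    for source in adj:
        if source in seen:
            continue
        seen.add(source)
        stack, block = [source], set()
        while stack:
            u = stack.pop()
            if u in leaves:
                block.add(u)
            # Only walk along edges on which phi does not change state.
            for v in adj[u]:
                if v not in seen and phi[v] == phi[u]:
                    seen.add(v)
                    stack.append(v)
        if block:  # leaf-free components are skipped; none occur if phi is optimal
            blocks.append(block)
    return blocks
```

For an extension with $m$ changes the forest has $m + 1$ components, which is the counting used above to obtain $s = t + \delta$.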

Now let $\mathcal{I} \subseteq [t]$ denote the set of indices $i$ in $[t]$ such that

• $S_i = P_j$ for some $j \in [s]$;

• $S_i$ has degree at most $d_1$ in $T_1$; and

• $S_i$ has degree at most $d_2$ in $T_2$.

Note that since $P_1, \ldots, P_s$ are spanning-disjoint in $T_2$, the sets $\{S_i : i \in \mathcal{I}\}$ are also spanning-disjoint in $T_2$. Notice that it is sufficient to prove that $|\mathcal{I}| \ge k$, whence any subset of $k$ indices from $\mathcal{I}$ satisfies the lemma. We will prove this by providing upper bounds on the number of indices in $[t]$ that do not satisfy the conditions of $\mathcal{I}$.

Let $\mathcal{I}_0$ denote the set of indices $i \in [t]$ such that $P_j \neq S_i$ for any $j \in [s]$. We first claim that $|\mathcal{I}_0| \le \delta$. Indeed, since every $P_j$ is a subset of some $S_i$, and $S_1, \ldots, S_t$ and $P_1, \ldots, P_s$ are both partitions of $X$, we have that for every $i \in \mathcal{I}_0$ there exist at least two distinct indices $j, j' \in [s]$ for which $P_j, P_{j'} \subset S_i$. Hence,

$$s \ge 2|\mathcal{I}_0| + |[t] \setminus \mathcal{I}_0| = t + |\mathcal{I}_0|.$$

Therefore if $|\mathcal{I}_0| > \delta$ then $s > t + \delta$, contradicting the definition of $s$. Thus, we have $|\mathcal{I}_0| \le \delta$.

Next, let $\mathcal{I}_{>d_1}$ denote the set of indices $i \in [t]$ for which $S_i$ has degree greater than $d_1$ in $T_1$. We will show that $|\mathcal{I}_{>d_1}| \le t/d_1$. For each $i \in [t]$, compress the spanning subtree $T_1[S_i]$ to a single vertex, and observe that the degree of this vertex is equal to the degree of $S_i$ in $T_1$. Any vertex $u$ which is not part of any $T_1[S_i]$ is merged with one of its neighbours. Note that this merging process can only increase the degrees of the remaining vertices. Call the resulting tree $T_1'$. See Fig. 4.

$T_1'$ has $t$ vertices, each of them corresponding to a subset $S_i$, and having degree at least the degree of the corresponding $S_i$ in $T_1$. Now by Observation 1, there are at most $t/d_1$ vertices in $T_1'$ with degree greater than $d_1$. It follows that there are at most $t/d_1$ values of $i \in [t]$ for which $S_i$ has degree greater than $d_1$ in $T_1$, and thus $|\mathcal{I}_{>d_1}| \le t/d_1$, as we wanted to show.

Similarly let $\mathcal{J}_{>d_2}$ denote the set of indices $j \in [s]$ for which $P_j$ has degree greater than $d_2$ in $T_2$. By similar arguments as used for $\mathcal{I}_{>d_1}$ above, we can show that $|\mathcal{J}_{>d_2}| \le s/d_2$.

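Notice that for any $i \in [t]$, if $i$ is not in $\mathcal{I}$, then either $i \in \mathcal{I}_0$, or $i \in \mathcal{I}_{>d_1}$, or there exists $j \in \mathcal{J}_{>d_2}$ such that $S_i = P_j$. We therefore have that $|\mathcal{I}| \ge t - |\mathcal{I}_0| - |\mathcal{I}_{>d_1}| - |\mathcal{J}_{>d_2}|$.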


Fig. 3. Illustration of the construction of partition $P_1, P_2, P_3, P_4, P_5$ from $S_1, S_2, S_3$. Solid edges are monochromatic and dashed edges are bichromatic under an optimal extension for $\chi$, where $\chi$ is the character induced by $S_1, S_2, S_3$.

Fig. 4. Illustration of the construction of auxiliary tree $T_1'$, given a partition of $X$ with $S_1 = \{a, b, c\}$, $S_2 = \{d, e, f\}$, $S_3 = \{g, h, i\}$, $S_4 = \{j, k\}$, $S_5 = \{l, m\}$. Note that the internal vertex labelled $u$ is not part of $T_1[S_i]$ for any $i$, so we merge it with an arbitrary adjacent vertex. In this case we merge $u$ into


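Now, using that $t \ge \frac{2 d_1 d_2 + d_1}{d_1 d_2 - d_1 - d_2}\, k$, $s = t + \delta$ and $\delta \le k - 1$, we have:

$$
\begin{aligned}
|\mathcal{I}| &\ge t - |\mathcal{I}_0| - |\mathcal{I}_{>d_1}| - |\mathcal{J}_{>d_2}| \\
&\ge t - \delta - t/d_1 - s/d_2 \\
&= t - \delta - t/d_1 - (t + \delta)/d_2 \\
&= \frac{d_1 d_2 t - d_1 d_2 \delta - d_2 t - d_1 t - d_1 \delta}{d_1 d_2} \\
&= \frac{(d_1 d_2 - d_1 - d_2)\,t - (d_1 d_2 + d_1)\,\delta}{d_1 d_2} \\
&\ge \frac{(d_1 d_2 - d_1 - d_2)\,t - (d_1 d_2 + d_1)(k - 1)}{d_1 d_2} \\
&\ge \frac{(2 d_1 d_2 + d_1)\,k - (d_1 d_2 + d_1)(k - 1)}{d_1 d_2} \\
&= \frac{d_1 d_2 k + d_1 d_2 + d_1}{d_1 d_2} \\
&> \frac{d_1 d_2 k}{d_1 d_2} = k,
\end{aligned}
$$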

as we needed to prove. To see that $\mathcal{I}$ can be constructed in polynomial time, it suffices to observe that the partition $P_1, \ldots, P_s$ can be constructed in polynomial time (as $\phi_2$ can be found in polynomial time), and after this each $S_i$ can be checked for membership in $\mathcal{I}$ in polynomial time.
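The counting bound in this proof can be checked numerically with exact rational arithmetic. The sketch below is only a sanity check under stated assumptions: `lower_bound_on_I` evaluates the expression $t - \delta - t/d_1 - (t+\delta)/d_2$, and `smallest_t` rounds the threshold on $t$ up to an integer; both names are hypothetical.

```python
from fractions import Fraction
from math import ceil

def lower_bound_on_I(d1, d2, t, delta):
    """Evaluate t - delta - t/d1 - s/d2 with s = t + delta, exactly."""
    s = t + delta
    return t - delta - Fraction(t, d1) - Fraction(s, d2)

def smallest_t(d1, d2, k):
    """Smallest integer t with t >= (2*d1*d2 + d1) / (d1*d2 - d1 - d2) * k."""
    denom = d1 * d2 - d1 - d2
    if denom <= 0:
        raise ValueError("the lemma requires d1*d2 - d1 - d2 > 0")
    return ceil(Fraction((2 * d1 * d2 + d1) * k, denom))
```

For example, with $d_1 = 4$, $d_2 = 5$, $k = 1$ and $\delta = 0$ the threshold is $t = 4$ and the expression evaluates to $11/5$, which exceeds $k$ as the derivation predicts.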

3.7. Proof of Theorem 1

Lemma 6. Let $d_1, d_2$ be positive integers such that $d_1 d_2 - d_1 - d_2 > 0$. Let $(T_1, T_2)$ be a pair of binary unrooted phylogenetic trees on $X$ that are irreducible under Reduction Rules 1 and 2.

Then if $|X| \ge 2ct$, where $c = 9(d_1 + d_2) - 11$ and $t = \frac{2 d_1 d_2 + d_1}{d_1 d_2 - d_1 - d_2}\, k$, it holds that $d_{MP}(T_1, T_2) \ge k$, and we can find a witnessing character in polynomial time.

Proof. By Lemma 4, there exists a partition $S_1, \ldots, S_t$ of $X$, all spanning-disjoint in $T_1$, and with $|S_i| \ge c$ for all $i \in [t]$. Let $\chi$ be the character defined by $S_1, \ldots, S_t$. If $\chi$ is a witness to $d_{MP}(T_1, T_2) \ge k$, then we may return $\chi$ and we are done.

Otherwise, we may apply Lemma 5 to find indices $i_1, \ldots, i_k$ such that:

• $S_{i_1}, \ldots, S_{i_k}$ are all spanning-disjoint in $T_2$ (as well as in $T_1$);

• each $S_{i_j}$ has degree at most $d_1$ in $T_1$; and

• each $S_{i_j}$ has degree at most $d_2$ in $T_2$.

Now for each $S_{i_j}$, we have that $S_{i_j}$ has degree $d_1^j \le d_1$ in $T_1$ and $d_2^j \le d_2$ in $T_2$, and that

$$|S_{i_j}| \ge c = 9(d_1 + d_2) - 11 \ge 9(d_1^j + d_2^j) - 11,$$

and also that $(T_1, T_2)$ is irreducible under Rules 1 and 2. Thus we may apply Lemma 2 to find a conflicting quartet $Q_j \subseteq S_{i_j}$ for each $i_j$.

Finally, as $S_{i_1}, \ldots, S_{i_k}$ are spanning-disjoint in both $T_1$ and $T_2$, and as each $Q_j$ is a subset of $S_{i_j}$, we have that $Q_1, \ldots, Q_k$ are also spanning-disjoint in both $T_1$ and $T_2$. Therefore we may apply Lemma 3 to find a witnessing character for $d_{MP}(T_1, T_2) \ge k$. As each step of this process takes polynomial time, the construction of a witnessing character takes polynomial time.
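To see how the leaf-count threshold $2ct$ of Lemma 6 depends on the choice of $d_1$ and $d_2$, the factor $2ct/k$ can be tabulated exactly. The sketch below is illustrative: `kernel_factor` and `t_per_k` are hypothetical names, and the pair $d_1 = 4$, $d_2 = 5$ is shown only because it evaluates to 560; this passage does not state which pair is used for the constant $\alpha$.

```python
from fractions import Fraction

def t_per_k(d1, d2):
    """Return t/k = (2*d1*d2 + d1) / (d1*d2 - d1 - d2) as an exact fraction."""
    denom = d1 * d2 - d1 - d2
    if denom <= 0:
        raise ValueError("Lemma 6 requires d1*d2 - d1 - d2 > 0")
    return Fraction(2 * d1 * d2 + d1, denom)

def kernel_factor(d1, d2):
    """Return 2*c*t/k for c = 9*(d1 + d2) - 11 and t as in Lemma 6."""
    c = 9 * (d1 + d2) - 11
    return 2 * c * t_per_k(d1, d2)
```

With $d_1 = 4$ and $d_2 = 5$ we get $c = 70$ and $t = 4k$, so $2ct = 560k$, and $t$ is an integer for every $k$ as Lemma 4 requires.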

It remains to complete the proof of Theorem 1.

Theorem 1. There exists a constant $\alpha$ ($\alpha = 560$) for which the following holds. Let $(T_1, T_2)$ be a pair of binary unrooted phylogenetic trees on $X$ that are irreducible under Reduction Rules 1 and 2.

Then if $|X| \ge \alpha k$, it holds that $d_{MP}(T_1, T_2) \ge k$, and we can find a witnessing character, i.e. a character $\chi$ yielding $d_{MP_\chi}(T_1, T_2) \ge k$, in polynomial time.