The role of decision tree representation in regression problems – An evolutionary perspective

Marcin Czajkowski∗, Marek Kretowski

Faculty of Computer Science, Bialystok University of Technology, Wiejska 45a, 15-351 Bialystok, Poland

Article info

Article history:
Received 26 September 2015
Received in revised form 20 June 2016
Accepted 2 July 2016
Available online 16 July 2016

Keywords:
Evolutionary algorithms
Data mining
Regression trees
Self-adaptable representation

Abstract

A regression tree is a type of decision tree that can be applied to solve regression problems. One of its characteristics is that it may have at least four different node representations; internal nodes can be associated with univariate or oblique tests, whereas the leaves can be linked with simple constant predictions or multivariate regression models. The objective of this paper is to demonstrate the impact of particular representations on the induced decision trees. As it is difficult, if not impossible, to choose the best representation for a particular problem in advance, the issue is investigated using a new evolutionary algorithm for decision tree induction with a structure that can self-adapt to the currently analyzed data. The proposed solution allows different leaf and internal node representations within a single tree.

Experiments performed using artificial and real-life datasets show the importance of tree representation in terms of error minimization and tree size. In addition, the presented solution managed to outperform popular tree inducers with defined homogeneous representations.

© 2016 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.asoc.2016.07.007

1. Introduction

Data mining [18] can reveal important and insightful information hidden in data. However, appropriate tools and algorithms are required to effectively identify correlations and patterns within the data. Decision trees [24,40] represent one of the main techniques for discriminant analysis and prediction in knowledge discovery. The success of tree-based approaches can be explained by their ease of application, fast operation, and effectiveness. Furthermore, the hierarchical tree structure, in which appropriate tests from consecutive nodes are sequentially applied, closely resembles a human way of decision making. All this makes decision trees easy to understand, even for inexperienced analysts. Despite 50 years of research on decision trees, many problems still remain [30], such as searching only for a locally optimal split in the internal nodes, choosing an appropriate pruning criterion, analyzing cost-sensitive data efficiently, or performing multi-objective optimization. To help resolve some of these problems, evolutionary computation (EC) has been applied to decision tree induction [2]. The strength of this approach lies in the global search for splits and predictions. It results in higher accuracy and smaller output trees compared to popular greedy decision tree inducers.

∗ Corresponding author. E-mail address: m.czajkowski@pb.edu.pl (M. Czajkowski).

Finding an appropriate representation of the predictor before actual learning is a difficult task for many data mining algorithms. Often, the algorithm structure must be pre-defined and fixed during its life-cycle, which is a major barrier in developing intelligent artificial systems. This problem is well known [20] in artificial neural networks, where the topology and the number of neurons are unknown; in support vector machines, with their different types of kernels; and in decision trees, where there is a need to select the type of node representation. One solution is to automatically adapt the structure of the algorithm to the analyzed problem during the learning phase, which can be accomplished using the evolutionary approach [27,33]. This approach has also been applied to classification trees [29,26], where a mixed test representation in the internal nodes is possible.

In this paper, we want to investigate the role of regression tree representation and its impact on predictive accuracy and induced tree size, as it has not been sufficiently explored. Using artificially generated datasets, we will reveal the pros and cons of trees with different representation types, focusing mainly on evolutionarily induced trees for regression problems [2]. Differences in the representation of regression trees [30] can occur in two places: in the tests in the internal nodes and in the predictions in the leaves. For real-life problems, it is difficult to say which kind of decision tree (univariate, oblique, regression, model) should be used. It is often almost impossible to choose the best representation in advance. On top of that, for many problems a heterogeneous node representation is required within the same tree. This is why we also study a specialized evolutionary algorithm (EA) called the Mixed Global Model Tree (mGMT). It induces a decision tree that, we believe, self-adapts its structure to the currently analyzed data. The output tree may have different internal node and leaf representations, and for a given dataset it may be as good as or even better than any tree with a strict representation.

The paper is organized as follows. The next section provides a brief background on regression trees. Section 3 describes the proposed extension for evolutionary inducers with homogeneous representations. All experiments are presented in Section 4, and the last section comprises the conclusion and suggestions for future work.

2. Decision trees

We may find different variants of decision trees in the literature [30]. They can be grouped according to the type of problem they are applied to, the way they are induced, or the type of structure.

In classification trees, a class label is assigned to each leaf. Usually, it is the majority class of all training instances that reach that particular leaf. In this paper, we focus on regression trees, which may be considered variants of decision trees designed to approximate real-valued functions instead of being used for classification tasks. Although regression trees are not as popular as classification trees, they are highly competitive with different machine learning algorithms [35] and are often applied to many real-life problems [16,28].

In the case of the simplest regression tree, each leaf contains a constant value, usually an average value of the target attribute. A model tree can be seen as an extension of the typical regression tree [46,31]. The constant value in each leaf of the regression tree is replaced in the model tree by a linear (or nonlinear) regression function. To predict the target value, the new tested instance is followed down the tree from the root node to a leaf, using its attribute values to make routing decisions at each internal node. Next, the predicted value for the new instance is evaluated based on the regression model in the leaf. Examples of predicted values of classification, regression, and model trees are given in Fig. 1. The gray-level color of each region represents a different class label (for a classification tree), and the height corresponds to the value of the prediction function (regression and model trees).
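To make the routing concrete, below is a minimal sketch of model-tree prediction; the `Node` class and its attributes are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

class Node:
    """Illustrative tree node: internal nodes hold a test, leaves hold a model."""
    def __init__(self, test=None, children=None, model=None):
        self.test = test          # callable mapping an instance to a branch index
        self.children = children  # list of child nodes (None for leaves)
        self.model = model        # (beta0, betas) of the leaf's linear model

def predict(node, x):
    # Follow the instance down from the root, using the tests as routing decisions.
    while node.children:
        node = node.children[node.test(x)]
    beta0, betas = node.model
    return beta0 + float(np.dot(betas, x))  # evaluate the leaf's regression model

# Example: a single univariate test on x[0] < 2 with two linear leaf models.
leaf_lo = Node(model=(0.0, np.array([1.0, 0.0])))
leaf_hi = Node(model=(1.0, np.array([0.5, 0.0])))
root = Node(test=lambda x: 0 if x[0] < 2 else 1, children=[leaf_lo, leaf_hi])
print(predict(root, np.array([3.0, 7.0])))  # 1.0 + 0.5*3.0 = 2.5
```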

Most decision trees partition the feature space with axis-parallel decision borders [44]. This type of tree is called univariate because each split in a non-terminal node involves a single feature. For continuous-valued features, inequality tests with binary outcomes are usually applied, and for nominal features mutually exclusive groups of feature values are associated with the outcomes. When more than one feature is taken into account to build a test in an internal node, we deal with multivariate decision trees [8]. The most common form of such a test is an oblique split, which is based on a linear combination of features. A decision tree that applies only oblique tests is often called oblique or linear, whereas heterogeneous trees with univariate, linear, and other multivariate (e.g., instance-based) tests are called mixed trees [29]. Fig. 2 shows an example of univariate and oblique decision trees. We can observe that if decision borders are not axis-parallel, then using only univariate tests may lead to an overcomplicated classifier. This kind of situation is known as a 'staircase effect' [8] and can be avoided by applying more sophisticated multivariate tests. While oblique trees are generally smaller, their tests are usually more difficult to interpret. It should be emphasized that the computational complexity of multivariate tree induction is significantly higher than that of univariate tree induction [3].

The role of tree representation has so far been discussed mainly in terms of classification problems. The studies [25,8] show that univariate inducers return larger trees than multivariate ones, and they are often less accurate. However, multivariate trees are difficult to understand and interpret, and the tree induction is significantly slower. Therefore, drawing a general conclusion is risky, as the most important factors are the characteristics of the particular dataset [25]. To the best of our knowledge, there is no detailed report that refers to the role of representation in regression trees. It could be expected that univariate and multivariate regression trees behave similarly to the classification ones. However, there is still an open question about the influence of the leaves' representation on the tree performance.

The paper focuses on evolutionarily induced regression trees; therefore, to go further, we must briefly describe the process of creating a decision tree from the training set. The two most popular concepts for decision tree induction are the top-down and global approaches. The first is based on a greedy procedure known as recursive partitioning [39]. In the top-down approach, the induction algorithm starts from the root node, where the locally optimal split is searched according to the given optimality measure. Next, the training instances are redirected to the newly created nodes, and this process is repeated for each node until a stopping condition is met. Additionally, post-pruning [15] is usually applied after the induction to avoid the problem of over-fitting the training data.

One of the most popular representatives of top-down induced univariate regression trees is the solution proposed by Breiman et al. called Classification And Regression Tree (CART) [7]. The algorithm searches for a locally optimal split that minimizes the sum of squared residuals and builds a piecewise constant model with each terminal node fitted with the training sample mean. Other solutions have managed to improve the prediction accuracy by replacing single values in the leaves with more advanced models. The M5 system [46] induces a tree that contains multiple linear models in the leaves.

Fig. 1. An illustration of predicted values of the classification, regression, and model trees.


Fig. 2. An example of oblique and univariate decision trees.

A solution called Stepwise Model Tree Induction (SMOTI) [31] can be viewed as an oblique model tree, as the regression models are placed not only in the leaves but also in the upper parts of the tree. All aforementioned methods induce trees with the greedy strategy, which is fast and generally efficient but often produces only locally optimal solutions.
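To ground the greedy strategy discussed here, below is a minimal sketch of the locally optimal split search used by CART-style inducers: for a single continuous attribute, the threshold minimizing the sum of squared residuals around the two branch means is selected. This is a toy illustration, not the original CART code.

```python
import numpy as np

def best_split(x, y):
    """Return (SSE, threshold) of the best binary split on one continuous attribute."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best_sse, best_thr = np.inf, None
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # identical values cannot be separated by a threshold
        left, right = ys[:i], ys[i:]
        # Sum of squared residuals around each branch's mean prediction.
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse, best_thr = sse, (xs[i - 1] + xs[i]) / 2  # midpoint threshold
    return best_sse, best_thr
```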

The global approach to decision tree induction limits the negative effects of locally optimal decisions. It tries to simultaneously search for the tree structure, the tests in the internal nodes, and the models in the leaves. This process is obviously much more computationally complex but can reveal hidden regularities that are often undetectable by greedy methods. The global induction is mainly represented by systems based on an evolutionary approach [2,4]; however, there are solutions that apply, for example, ant colony optimization [36,6].

In the literature, there are relatively fewer evolutionary approaches for regression and model trees than for classification trees. Popular representatives of EA-based univariate regression trees are the TARGET solution [17], which evolves a CART-like regression tree with basic genetic operators, and the uGRT algorithm [11], which introduces specialized variants of mutation and crossover. A strongly typed GP (Genetic Programming) approach called STGP was also proposed [21] for univariate regression tree induction. There are also globally induced systems that evolve univariate model trees, such as the E-Motion tree [1], which implements standard 1-point crossover and two different mutation strategies, and the GMT system [12], which incorporates knowledge about the induction problem for the global model tree into the evolutionary search. There are also preliminary studies on oblique trees called oGMT [10]. In the literature, we may also find a GP approach that evolves model trees with nonlinear regression models in the leaves, called GPMCC [38]. It is composed of GP to evolve the structure of the model trees and GA to evolve polynomial expressions (GASOPE) [37].


Fig. 3. The mGMT process diagram.


3. Mixed Global Model Tree

This paper focuses on the representation of globally induced regression and model trees and its influence on the output tree. In this section, we propose an extension of the GMT and GRT systems [12], called the Mixed Global Model Tree (mGMT), to better understand the underlying process behind the selection of the representation. With evolutionary tree induction, we are able not only to search for an optimal tree structure, tests in internal nodes, or models in the leaves, but also to self-adapt the tree representation. The general structure of the algorithm follows a typical EA framework [32] with an unstructured population and generational selection. It can be treated as a unified framework for both univariate and oblique tests in the internal nodes and for regression and model leaves. The mGMT does not require setting the tree representation in advance because the EA validates different variants of the representations not only on the tree level but also on the node level and may induce a heterogeneous tree that we call a mixed tree. A description of the proposed approach is given, especially with respect to issues that are specific to mixed trees.

The process diagram of the mGMT algorithm is illustrated in Fig. 3. The proposed solution evolves the regression and model trees in their actual forms. The candidate solutions that constitute the population are initialized with the semi-random greedy strategy and are evaluated using the multi-objective weight-formula fitness function. If the convergence criterion is not satisfied, a linear ranking selection is performed together with the elitist strategy. Next, genetic operators are applied, including different variants of specialized mutations and crossovers. After the evolution process is finished, the best individual found by the EA is smoothed. Each element of the mGMT solution is discussed in detail in the following sections.

3.1. Representation

A mixed regression tree is a complex structure in which the number and the type of nodes, and even the number of test outcomes, are not known in advance for a given learning set.

Fig. 4. An example representation of the mGMT individual.

Therefore, the candidate solutions that constitute the population are not encoded and are represented in their actual form (see Fig. 4).

There are three possible test types in the internal nodes: two univariate and one multivariate. In the case of univariate tests, a test representation concerns only one attribute and depends on the considered attribute type. For continuous-valued features, typical inequality tests with two outcomes are used. For nominal attributes, at least one attribute value is associated with each branch starting in the node, which means that an internal disjunction is implemented. Only binary or continuous-valued attributes are used to construct the oblique split. The feature space can be divided into two regions by a hyperplane:

H(w, θ) = {x : ⟨w, x⟩ = θ},   (1)

where x is a vector of feature values (objects), w = [w1, ..., wP] is a weight vector, θ is a threshold, ⟨w, x⟩ represents an inner product, and P is the number of independent variables. Each hyperplane is represented by a fixed-size (P+1)-dimensional table of real numbers corresponding to the weight vector w and the threshold θ.

In each leaf of the mGMT system, a multiple linear model can be constructed using the standard regression technique. It is calculated only for objects associated with that node. A dependent variable y is explained by a linear combination of multiple independent variables x1, x2, ..., xP:

y = β0 + β1 ∗ x1 + β2 ∗ x2 + ... + βP ∗ xP,   (2)

where β0, ..., βP are fixed coefficients that minimize the sum of the squared residuals of the model. If all βi (0 < i ≤ P) are equal to 0, the leaf node is a regression node with a constant prediction equal to β0. If only one βi ≠ 0, we deal with simple linear regression; otherwise, the leaf contains a multivariate linear regression model.
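A minimal sketch of fitting such a leaf model by ordinary least squares on the objects associated with the leaf (a numpy illustration under the notation of Eq. (2), not the authors' code):

```python
import numpy as np

def fit_leaf_model(X, y):
    """Fit y = beta0 + beta1*x1 + ... + betaP*xP on the leaf's instances."""
    A = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend ones so beta0 is fitted too
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # minimizes the sum of squared residuals
    return beta[0], beta[1:]                      # (beta0, [beta1, ..., betaP])

def leaf_predict(beta0, betas, x):
    return beta0 + float(np.dot(betas, x))        # degenerates to beta0 if all betas are 0
```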

3.2. Initialization

Each initial individual in the population is created with the classical top-down approach that resembles the M5 solution [46]. The initial population of mGMT is heterogeneous and is composed of five types of standard regression trees with different representations (four homogeneous and one heterogeneous): a univariate regression tree; an oblique regression tree; a univariate model tree; an oblique model tree; and a mixed tree that contains different kinds of tests in the internal nodes (univariate and oblique) and different types of leaves (regression and model). In mixed trees, before each step of recursive partitioning, the type of node is selected randomly and an appropriate test or model is generated.


Fig. 5. Hyperplane initialization based on a randomly chosen 'long dipole' (left) and an example illustrating how the oblique test is created (right).

The importance of such a heterogeneous initial population lies in its diversity. The recursive partitioning is finished when the dependent value is predicted for all training objects in the node or the number of instances in the node is small (default: five instances). Each initial individual is created based on a semi-random subsample of the original training data (default: 10% of the data) to keep the balance between exploration and exploitation. To ensure that the subsample contains objects with various values of the predicted attribute, the training data is sorted by the predicted value and split into a fixed number of equal-size folds (default: 10). From these folds, an equal number of objects is randomly chosen and placed into the subsample. Tests in non-terminal nodes are calculated from a random subset of attributes (default: 50%).
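The subsampling step can be sketched as follows (parameter defaults mirror the text; the function name and interface are illustrative):

```python
import numpy as np

def stratified_subsample(X, y, n_folds=10, fraction=0.10, rng=None):
    """Draw a subsample covering the whole range of the predicted attribute."""
    rng = rng or np.random.default_rng()
    order = np.argsort(y)                   # sort the training data by the target
    folds = np.array_split(order, n_folds)  # equal-size folds over the sorted range
    per_fold = max(1, int(fraction * len(y) / n_folds))
    picked = np.concatenate([
        rng.choice(fold, size=min(per_fold, len(fold)), replace=False)
        for fold in folds
    ])
    return X[picked], y[picked]
```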

In the case of the univariate internal nodes, one of three memetic search strategies [12] that involve employing locally optimized tests is chosen:

• Least Squares (LS): the test in the internal node is chosen according to the node impurity measured by the sum of the squared residuals.

• Least Absolute Deviation (LAD): the test reduces the sum of the absolute deviations. It is more robust and has greater resistance to outlying values than LS.

• Dipolar: the test is constructed according to the 'long dipole' [12] strategy (see the sketch after this list). At first, an instance that will constitute the dipole is randomly selected from the set of instances in the current node. The rest of the feature vectors are sorted in decreasing order according to the difference between their dependent variable values and that of the selected instance. The second instance that constitutes the dipole should have a considerably different value of the dependent variable. To find it, we applied a mechanism similar to ranking linear selection [32]. Finally, the test that splits the dipole is constructed based on a randomly selected attribute, where the boundary threshold is defined as the midpoint between the pair that constitutes the dipole.
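A sketch of the 'long dipole' test construction is given below; the rank-based draw of the second dipole end is simplified to linear rank weights, so this approximates the mechanism from [12] rather than reproducing it exactly.

```python
import numpy as np

def dipolar_test(X, y, rng=None):
    """Return (attribute index, threshold) of a univariate dipolar test."""
    rng = rng or np.random.default_rng()
    i = rng.integers(len(y))                 # first dipole end, chosen at random
    order = np.argsort(-np.abs(y - y[i]))    # sort by decreasing target difference
    weights = np.arange(len(order), 0, -1.0) # larger difference -> larger weight
    j = order[rng.choice(len(order), p=weights / weights.sum())]
    attr = rng.integers(X.shape[1])          # a random attribute defines the split
    threshold = (X[i, attr] + X[j, attr]) / 2.0  # midpoint between the dipole ends
    return attr, threshold
```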

The search strategy used to find splits in the internal nodes is different for the oblique tests. An effective test in a non-terminal node is searched only using the dipolar strategy. Fig. 5 (left) illustrates the hyperplane initialization based on a randomly chosen 'long dipole'. The hyperplane Hij(w, θ) splits the dipole (xi, xj) in such a way that the two feature vectors xi and xj are situated on the opposite sides of the dividing hyperplane:

(⟨w, xi⟩ − θ) ∗ (⟨w, xj⟩ − θ) < 0.   (3)

The hyperplane parameters are as follows: w = xi − xj and θ = δ ∗ ⟨w, xi⟩ + (1 − δ) ∗ ⟨w, xj⟩, where δ ∈ (0, 1) is a randomly drawn coefficient that determines the position of the hyperplane between the opposite ends of the dipole. Hij(w, θ) is perpendicular to the segment connecting the dipole ends.

To provide a numeric example illustrating how an oblique test is created, let us imagine the two-dimensional space illustrated in Fig. 5 (right). After selecting two random dipole ends with Cartesian coordinates A(1, 1) and B(5, 3), and coefficient δ = 0.5, the splitting hyperplane H has parameters w = [5 − 1, 3 − 1] = [4, 2] and θ = 0.5 ∗ (4 ∗ 1 + 2 ∗ 1) + 0.5 ∗ (4 ∗ 5 + 2 ∗ 3) = 16. Therefore, the hyperplane HAB is the line described as y = −2 ∗ x + 8. To perform a split, we simply check on which side of the hyperplane H all instances from the internal node are positioned. Let us consider point C(1.5, 2.5). By applying it to the hyperplane equation (4 ∗ 1.5 + 2 ∗ 2.5), we see that the score 11 is smaller than the value of θ. Using a different point, for example D(3.5, 4.5), would result in the value 23, which means that point D lies on the opposite side of the hyperplane to point C. For this particular example, the parameter δ equals 0.5; therefore, the hyperplane HAB intersects the midpoint between the dipole ends A and B. However, if we change the parameter to δ = 0.1, then the hyperplane denoted as HAB shifts towards point A. We can observe that for this hyperplane H, point C and point D lie on the same side, and thus both instances would be directed after the split to the same sub-node.
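The δ = 0.5 case of this example can be checked with a few lines of code:

```python
import numpy as np

A, B = np.array([1.0, 1.0]), np.array([5.0, 3.0])
w = B - A                                        # w = [4, 2]
delta = 0.5
theta = delta * (w @ A) + (1 - delta) * (w @ B)  # 0.5*6 + 0.5*26 = 16.0

def side(p):
    return np.sign(w @ p - theta)    # -1 and +1 denote opposite sides

print(theta)                         # 16.0
print(side(np.array([1.5, 2.5])))    # C: 4*1.5 + 2*2.5 = 11 < 16 -> -1.0
print(side(np.array([3.5, 4.5])))    # D: 4*3.5 + 2*4.5 = 23 > 16 -> +1.0
```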

3.3. Goodness of fit

The evolutionary search process is very sensitive to the proper definition of the fitness function. In the context of regression trees, a direct minimization of the prediction error measured on the learning set usually leads to the over-fitting problem. In typical top-down induction of decision trees [39], this problem is partially mitigated by defining a stopping condition and by applying post-pruning [15]. In the case of the evolutionary approach, a multi-objective function is required to minimize the prediction error and the tree complexity at the same time.

In our approach, the Bayesian information criterion (BIC) [41] is used as a fitness function. It was shown that this criterion works well with regression and model trees [17,12] and outperforms other popular approaches. BIC is given by:

FitBIC(T) = −2 ∗ ln(L(T)) + ln(n) ∗ k(T),   (4)

where L(T) is the maximum of the likelihood function of the tree T, n is the number of observations in the data, and k(T) is the number of model parameters in the tree. The log-likelihood function ln(L(T)) is typical for regression models and can be expressed as:

ln(L(T)) = −0.5 ∗ n ∗ [ln(2π) + ln(SSe(T)/n) + 1],   (5)

where SSe(T) is the sum of squared residuals of the tree T. The term k(T) can also be viewed as a penalty for over-parametrization.

The proposed mixed tree representation requires defining a new penalty for the tree over-parametrization. It is rather obvious that in internal nodes an oblique split based on a few features is more complex than a univariate test. The same applies to the different leaf representations. As a consequence, the tree complexity k(T) should reflect not only the tree size but also the complexity of the tests in internal nodes and the models in the leaves. However, it is not easy to arbitrarily set the importance of the different measures because it often depends on the dataset being analyzed. In such a situation, the tree complexity k(T) is defined as:

k(T) = α1 ∗ Q(T) + α2 ∗ O(T) + α3 ∗ W(T),   (6)

where Q(T) is the number of nodes in the model tree T; O(T) is equal to the sum of the numbers of non-zero weights in the hyperplanes in the internal nodes; and W(T) is the sum of the numbers of attributes in the linear models in the leaves. Default values of the parameters are α1 = 2.0, α2 = 1.0, and α3 = 1.0; however, further research to determine their values is needed. If the i-th internal node Ti is univariate, the value of O(Ti) equals 1. If the j-th leaf contains a constant value, then W(Tj) equals zero because there are no attributes in the linear model. Otherwise, O(Ti) and W(Tj) equal the number of attributes used to build the test in internal node i or the model in leaf j, respectively.

The flexibility of the fitness function allows its simple configuration based on additional knowledge or user preferences. For example, if users know the basic relationships in the data or want to limit tree representations to the desired ones, the fitness function can assign a high value to α2 or α3 or both.
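Putting Eqs. (4)–(6) together, the fitness of a candidate tree can be sketched as follows (lower values are better, since BIC is minimized; the argument names are illustrative):

```python
import numpy as np

def fitness_bic(sse, n, n_nodes, n_test_weights, n_model_attrs,
                a1=2.0, a2=1.0, a3=1.0):
    """BIC-based fitness of a tree; lower is better."""
    # Eq. (5): log-likelihood from the tree's sum of squared residuals.
    log_lik = -0.5 * n * (np.log(2 * np.pi) + np.log(sse / n) + 1)
    # Eq. (6): complexity mixes tree size, hyperplane weights, and model attributes.
    k = a1 * n_nodes + a2 * n_test_weights + a3 * n_model_attrs
    # Eq. (4): BIC = -2 ln L(T) + ln(n) k(T).
    return -2.0 * log_lik + np.log(n) * k
```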

3.4. Genetic operators

To maintain genetic diversity, the mGMT algorithm applies two specialized genetic operators corresponding to classical mutation and crossover. In globally induced trees with strict representations, there are several variants of the operators [11,12]; however, their availability mainly depends on the representation type. Both operators are applied with a given probability and influence the tree structure, the tests in non-terminal nodes, and optionally the models in the leaves. After any successful mutation or crossover, it is usually necessary to relocate learning vectors between the parts of the tree rooted in the altered node. This can cause pruning of certain parts of the tree that do not contain any learning vectors. In addition, the corresponding models in the affected individual's leaves are recalculated. For performance reasons, the coefficients in the existing linear models are recalculated to fit a randomly selected sample of the actual data (no more than 50 instances) in the corresponding leaves.

Each crossover begins with randomly selecting two individuals from the population that will be affected. Next, the crossover points in both individuals are determined. We have adapted all variants proposed in the univariate tree inducer [12] to work with the mixed representation, as visualized in Fig. 6:

(a) exchange subtrees: exchanges subtrees starting in randomly selected nodes;
(b) exchange branches: exchanges branches that start from selected nodes in random order;
(c) exchange tests: recombines the tests (univariate nominal, univariate continuous-valued, and oblique) associated with randomly selected internal nodes;
(d) with best: crossover with the best individual;
(e) asymmetric: duplicates subtrees with small mean absolute errors and replaces nodes with high errors.

Fig. 6. Visualization of crossovers, from top left to bottom right: (a) exchange subtrees, (b) exchange branches, (c) exchange tests, (d) with best, and (e) asymmetric.

Selected nodes for the recombination must have the same number of outputs; however, they may have different representations. This way, crossovers shift not only the tree structure but also the nodes' representations. In the variants (d) with best and (e) asymmetric, an additional mechanism is applied to decide which node will be affected. The algorithm ranks all tree nodes in both individuals according to their absolute error divided by the number of instances in the node. The probability of selecting nodes is proportional to the rank in a linear way. The nodes with a small average error per instance are more likely to be donors, whereas the weak nodes (with a high average error per instance) are more likely to be replaced by the donors from the second individual (and have a higher probability of becoming receivers).

The mutation of an individual starts with the selection of a node type (equal probability of selecting a leaf or an internal node). Next, a ranked list of nodes of the selected type for this individual is created. Depending on the type of node, the ranking takes into account the location for internal nodes (nodes in the lower parts of the tree are mutated with higher probability) and the prediction error of the node (nodes with a higher error per instance are more likely to be mutated). Finally, a mechanism analogous to ranking linear selection [32] is applied to decide which node in the individual will be affected. Depending on the node's representation, different variants of operators are available in internal nodes:

• prune: changes an internal node to a leaf (acts like a pruning procedure);
• parent with child (branches): replaces a parent node with a randomly selected child node (internal pruning);
• parent with child (tests): exchanges tests between the parent and a randomly selected child node;
• new dipolar test: the test in the affected node is replaced by a new one selected using the dipolar strategy;
• new memetic test: the test in the node is replaced by one generated with one of the optimality strategies proposed in Section 3.2;
• modify test: shifts the hyperplane or sets random weights (oblique test); shifts the threshold (univariate test on a continuous attribute) or re-groups nominal attribute values by adding/merging branches or moving values between them;
• recalculate models: recursively recalculates the linear models using all the instances in the corresponding leaves;

and in the leaves:

• dipolar expand: transforms a leaf into an internal node with a new dipolar test (of random type);
• memetic expand: transforms a leaf into an internal node with a new test selected by one of the optimality strategies;
• change model: extends/simplifies/changes the linear model in the leaf by adding/removing/replacing a randomly chosen attribute or removing the least significant one.

For a more detailed description of the mutation variants, please refer to [12].

In addition, we propose a new mechanism called Switch that assures the diversity of node representations within the population. It is embedded in the specified variants of the mutation (prune, expand, and new test) that require finding new tests in the internal nodes or models in the leaves. With an assigned probability, the Switch mechanism changes the initial representation of the selected nodes:

• the test in the internal node, when calculating a new test with the same number of outputs:
– with the change from univariate to oblique (internal nodes), the newly calculated hyperplane involves the attribute from the univariate test;

– with the change from oblique to univariate (internal nodes), a new univariate test is based on a randomly selected attribute from the oblique test;

• newly created nodes, which inherit their representation from the initial representation:
– leaves flip representation from a regression constant value to a linear regression model (or vice versa) when pruning internal nodes;
– internal nodes flip representation from an oblique test to a univariate one (or vice versa) when expanding the leaves.

In the rest of the mutation variants, the Switch mechanism is not applied. Preserving the representation in, for example, the modify test or change model variants allows exploring the neighborhood of current solutions rather than starting the search from a new place.
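The core of the Switch mechanism reduces to a probabilistic flip of the node's representation whenever a new test or model has to be created (a minimal sketch; the representation labels are illustrative):

```python
import random

FLIP = {'univariate': 'oblique', 'oblique': 'univariate',
        'constant': 'model', 'model': 'constant'}

def maybe_switch(representation, p_switch):
    """Flip the node representation with probability p_switch (0.5 = random choice)."""
    if random.random() < p_switch:
        return FLIP[representation]
    return representation
```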

3.5. Selection, termination condition, and smoothing

Ranking linear selection is applied as the selection mechanism. In each generation, the single individual with the highest value of the fitness function in the current population is copied to the next one (elitist strategy). Evolution terminates when the fitness of the best individual in the population has not improved during a fixed number of generations (default: 1000). In the case of slow convergence, a maximum number of generations is also specified (default value: 10,000) to limit the computation time.
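A minimal sketch of one generation step with ranking linear selection and the elitist copy, assuming the fitness (BIC) is minimized:

```python
import numpy as np

def next_generation(population, fitness_values, rng=None):
    """Select a new population: elitist copy plus rank-proportional draws."""
    rng = rng or np.random.default_rng()
    order = np.argsort(fitness_values)             # best (lowest BIC) first
    weights = np.arange(len(population), 0, -1.0)  # selection probability linear in rank
    probs = weights / weights.sum()
    drawn = rng.choice(len(population), size=len(population) - 1, p=probs)
    # The best individual is always copied to the next generation (elitism).
    return [population[order[0]]] + [population[order[i]] for i in drawn]
```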

The mGMT system uses a form of smoothing that was initially introduced in the M5 algorithm [46] for a univariate model tree. As in the basic GMT solution [12], the smoothing is applied only to the best individual returned by the EA when the evolutionary induction is finished. The role of the smoothing is to reduce sharp discontinuities that occur between adjacent linear models in the leaves. For every internal node of the tree, the smoothing algorithm generates an additional linear model that is constituted from the features that occur along the path from the leaf to the node. This way, each tested instance is predicted not only by a single model at the proper leaf but also by the different linear models generated for each of the internal nodes up to the root node. Due to the oblique splits that may appear in a tree induced by the mGMT system, we have updated the smoothing algorithm to use all attributes that constitute the tests in the internal nodes.
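The blending step can be sketched with the smoothing rule of Quinlan's M5; the constant k = 15 is M5's default and an assumption here, since the paper only outlines the mechanism:

```python
def smooth_prediction(path_models, path_counts, x, k=15.0):
    """Blend the leaf prediction with the models met on the way up to the root.

    path_models: leaf-to-root list of callables, one linear model per node
    path_counts: number of training instances that reached each of those nodes
    """
    pred = path_models[0](x)  # raw prediction of the leaf model
    for model, n in zip(path_models[1:], path_counts[:-1]):
        pred = (n * pred + k * model(x)) / (n + k)  # M5-style smoothing step
    return pred
```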

4. Experimental validation

To verify the role of tree representations, we have performed experiments on both artificial and real-life datasets. In the first section below, the impact of the tree representation is assessed using four algorithms with different homogeneous representations and the proposed mGMT inducer. Next, the mGMT solution is compared with the results from paper [23], which covers experiments with popular tree inducers on publicly available datasets. Finally, the prediction performance of the proposed solution is tested on a larger group of publicly available datasets.

In all experiments reported in this section, a default set of parameters for all algorithms is used on all tested datasets. Results presented in the paper correspond to averages of 50 runs.

4.1. Role of the tree representation

In this section, five types of tree representations are analyzed:

• univariate Global Regression Tree (denoted as uGRT), which has axis-parallel decision borders and simple constant predictions in the leaves;
• univariate Global Model Tree (uGMT), which has axis-parallel decision borders and multivariate linear regression models in the leaves;
• oblique Global Regression Tree (oGRT), which constructs oblique splits on binary or continuous-valued attributes in the internal nodes;
• oblique Global Model Tree (oGMT) – the most complex tree representation (oblique splits and multivariate linear regression models);
• mixed Global Model Tree (mGMT), which self-adapts the tree representation to the currently analyzed data.

The first four algorithms are based on the existing solutions [10–12], and the proposed mGMT algorithm can be treated as their extension and unification.

The impact of representation on the tree performance is tested on two sets of artificially generated datasets:

• armchair – variants of the dataset proposed in [11] that require at least four leaves and three splits;
• noisy – datasets with various data distributions and additional noise.

Table 1
Default parameters of uGRT, uGMT, oGRT, oGMT, and mGMT.

Parameter                                            Value
Population size                                      50 individuals
Crossover rate                                       20% assigned to the tree
Mutation rate                                        80% assigned to the tree
Elitism rate                                         2% of the population (1 individual)
Maximum number of generations without improvement    1000
Max total number of generations                      10,000

All artificial datasets have analytically defined decision borders that fit particular tree representations: univariate regression (UR), univariate model (UM), oblique regression (OR), oblique model (OM), and mixed (MIX). Each set contains 1000 instances, where 33% of the instances constitute the training set and the rest constitute the testing set. A visualization and description of the artificial datasets are included in the Appendix.

4.1.1. Parameter tuning

Parameter tuning for EAs is a difficult task. Fortunately, all important EA parameters (e.g., population size, the probabilities of mutation and crossover, etc.) and the decision tree parameters (maximum size, minimum number of objects to make a split) were experimentally validated and tuned in previous papers for trees with homogeneous representations [12]. Those general settings should also work well with the mixed regression trees; therefore, they can be treated as defaults. The main parameters for all algorithms are given in Table 1, and the probabilities of selecting mutation operator variants are shown in Table 2 (the probability of selecting each crossover variant is equal to 20%). This way, only the role of the Switch mechanism, which is embedded in different variants of mutation operators and directly switches the node representation (for example, from univariate to oblique in the internal node and from constant prediction to multivariate linear regression model in the leaf), needs to be investigated.

Parameter tuning was performed on the armchair dataset (version AMix1) according to the guidelines proposed in [14]. Four different Switch mechanism values that correspond to the probability of a node representation change were tested: 0.0, 0.1, 0.25, and 0.5. The impact of this setting on the proposed mGMT solution and on the rest of the tree inducers with a homogeneous initial population was checked. For example, when the uGRT algorithm is evaluated and the Switch mechanism is enabled, the representation of mutated nodes can change with the assigned probability. This way, the algorithm can have a mixed representation and is able to have oblique splits or multivariate regression models in the leaves. Figs. 7 and 8 show the tree error (RMSE) of the best individual during the learning phase performed on the training set for all five algorithms: uGRT, uGMT, oGRT, oGMT, and mGMT.

Table 2
Probability (%) of selecting a single variant of the mutation operator in uGRT, uGMT, oGRT, oGMT, and mGMT.

Mutation operator             uGRT & oGRT    uGMT, oGMT & mGMT
prune                         30             20
parent with son (branches)    5              5
parent with son (tests)       2.5            2.5
new dipolar test              10             10
new memetic test              2.5            2.5
modify test                   15             15
recalculate models            2.5            2.5
dipolar expand                30             20
memetic expand                2.5            2.5
change model                  0              20


Fig. 7. Impact of the Switch mechanism on the best individual for the uGRT, uGMT, oGRT, and oGMT inducers on the armchair AMix1 dataset.


One can observe that the impact of the Switch mechanism is especially visible for the algorithms with homogeneous initial populations. In Fig. 7, enabling the Switch is the only way to find optimal solutions for the uGRT, oGRT, and uGMT algorithms. When the Switch is set to 0.5, which corresponds to random representation selection, the inducers have the fastest convergence. In the oGMT algorithm, which is capable of finding the optimal solution on its own, the application of the Switch mechanism shortens the inducer's convergence time. A statistical analysis of the results using the Friedman test and the corresponding Dunn's multiple comparison test (significance level equal to 0.05), as recommended by Demsar [13], showed that there exist significant differences between the Switch parameter settings for all four algorithms with strict representations. The performed experiments showed that the optimal Switch setting for the inducers with homogeneous representations is 0.5, which equals a random representation of each newly created node.


Fig. 8. Impact of the Switch mechanism on the best individual for the mGMT inducer on the armchair AMix1 dataset.


The mGMT results visualized in Fig. 8 show that there are no big differences between the runs with various Switch settings. This can be explained by the construction of the initial population of the algorithm, which is composed of five types of representations. The individual representations can be successfully combined with the crossover operators. However, we can observe a slight improvement in the algorithm's convergence to the optimal solution when the Switch mechanism is enabled.

4.1.2. Comparison of representations

To show the impact of tree representation, five inducers were tested on two groups of datasets, armchair and noisy (each with six variants), described in the Appendix. Four metrics were collected and illustrated:

• Root Mean Squared Error (RMSE) calculated on the testing set (Fig. 9);
• average number of leaves in the tree (Fig. 10);
• average number of attributes in the tests in the internal nodes (Fig. 11); univariate inducers are not shown, as their number of test attributes always equals the number of internal nodes (the tree size decreased by 1);
• average number of attributes in the regression models in the leaves (Fig. 12); regression inducers are not shown, as there are no models in the leaves and therefore the average number of attributes is always equal to zero.

All four figures should be analyzed at the same time to understand how each global inducer works.

The artificial datasets were designed to be solved by one of the tested systems, and the abbreviations of the datasets reveal which inducer is most appropriate to use. In general, all inducers with the appropriate individual representation managed to successfully induce the defined tree. However, when the representation does not fit the specifics of the dataset, being either too simple (univariate split, regression leaf) or too advanced (oblique split, model in the leaf), the evolutionary inducers with homogeneous representations sometimes have difficulty finding an optimal solution. In contrast to the four global inducers with defined representations (uGRT, oGRT, uGMT, and oGMT), the mGMT system has a flexible representation. The results presented in Figs. 9–12 show that mGMT successfully adapts the tree structure to the specifics of each artificially generated dataset. In the datasets denoted as UR, UM, OR, and OM, the mGMT system managed to keep up with the algorithms whose structure fitted the characteristics of the datasets. As for the Mix dataset variants, mGMT managed to outperform the rest of the tree inducers.

There are at least two reasons why the systems with strict representations of the individuals have difficulty with some variants of the datasets. The first reason is the limitation in the individuals' representation. Non-axis-parallel decision borders can easily be handled by the oGRT or oGMT algorithms, whereas the application of univariate splits may cause the 'staircase effect' [8]. The problem is similar for the regression trees applied to the UM and OM datasets, which require regression models in the leaves. To overcome these restrictions in the representation, the trees (uGRT, uGMT, oGRT) increase their sizes; however, the limitation still exists. The large size of the induced tree not only affects its clarity but may cause over-fitting to the training data and thus a larger prediction error. Let us explain this for the different variants of the armchair dataset described in the Appendix:

Fig. 9. Root Mean Squared Error (RMSE) of the algorithms on 12 artificial datasets described in the Appendix. Tested algorithms: univariate Global Regression Tree (uGRT), oblique Global Regression Tree (oGRT), univariate Global Model Tree (uGMT), oblique Global Model Tree (oGMT), and mixed Global Model Tree (mGMT). For illustrative purposes, the values of the RMSE error for the noisy datasets have been rescaled.


Fig. 10. Average number of leaves in the tree for different GMT variants. The defined bars represent the reference values, equal to the optimal numbers of leaves for the datasets.

• AUR – can be perfectly predicted by univariate regression trees. All aforementioned inducers are capable of finding decision trees with small RMSE (Fig. 9), four leaves (Fig. 10), three univariate splits (Fig. 11), and no regression models in the leaves (Fig. 12). Even the oGMT system managed to find the decision borders despite its advanced node representation of the individuals. A univariate split is just a special case of an oblique split, and a constant value is just a special case of a regression model.

• AUM – can be perfectly predicted by univariate model trees. This dataset is difficult for the uGRT and oGRT systems because they induce only regression trees. For these systems, we can observe a much higher error rate (RMSE) and trees that are 2–3 times larger. It is typical for the regression trees to reduce the tree error by adding many leaves with a small number of instances. In addition, the oGRT inducer applied unnecessary oblique splits in order to minimize the RMSE. The rest of the algorithms had no problem with this dataset and induced trees with four leaves, three univariate splits, and usually perfect regression models in the leaves.

• AOR – can be perfectly predicted by oblique regression trees. The application of the algorithms with univariate tests (uGRT and uGMT) to a dataset with non-axis-parallel decision borders led to their approximation by a very complicated stair-like structure.

• AOM, AMix1, and AMix2 – can be perfectly predicted only by the inducers with the most advanced tree representation (oblique splits and models in the leaves). Therefore, it is not surprising that the uGRT, oGRT, and uGMT algorithms induce overgrown decision trees. It is worth noting that of those three systems, the largest trees are induced by the system with the most limited representation of the individuals, the uGRT.

The second issue is the large search space of the inducers with advanced tree representations, which requires extensive calculations to find a good solution. This can be observed especially for the trees with oblique splits.

Fig. 11. The sum of the average number of attributes used in the internal node tests for different GMT variants. The defined bars are equal to the optimal numbers of attributes in the internal node tests.


Fig. 12. The sum of the average number of attributes that constitute the leaves' models for different GMT variants. When the induced tree has only regression leaves, no value appears on the chart, as for the AUR or NUR datasets. The defined bars are equal to the optimal numbers of attributes in the leaves' models.

Theoretically, the oGMT system should be able to find optimal decisions in all datasets, as it induces trees with the most complex representation. However, we can observe that the trees induced by the oGRT and oGMT systems do not always have an optimal structure (even if they are capable of finding it). For the simplest datasets like AUR, the inducers with oblique splits need significantly more time than the uGRT solution (which finds optimal decisions almost instantly). This situation is illustrated in Fig. 13. Although the mGMT system needed additional iterations to set the appropriate tree representation, it still outperforms oGRT and oGMT. In Fig. 13, we can see that the largest number of iterations is required by the inducers with oblique splits in the internal nodes. The oGRT and oGMT systems needed significantly more iterations than uGRT but managed to successfully reduce the prediction error calculated on the training set to zero. It can be seen that the oGMT inducer did not find the optimal tree size in all 50 runs. For a few runs, the oGMT algorithm needed over 10,000 iterations, but additional experiments showed that it is capable of finding optimal trees.

ForthesimplestdatasetslikeAUR,theinducerswithobliquesplits needsignificantlymoretimethantheuGRTsolution(whichfinds optimaldecisionsalmostinstantly).Thissituationisillustratedin Fig.13.AlthoughthemGMTsystemneededadditionaliterationsto settheappropriatetreerepresentation,itstilloutperformsoGRT andoGMT.InFig.13,wecanseethatthelargestnumberofitera- tionsisrequiredbytheinducerswithobliquesplitsintheinternal nodes.TheoGRTandoGMTsystemsneededsignificantlymoreiter- ationsthanuGRTbutmanagedtosuccessfullyreducetheprediction errorcalculatedonthetrainingsettozero.Itcanbeseenthatthe oGMTinducerdidnotfindtheoptimaltreesizeforall50runs.For afewruns,theoGMTalgorithmneededover10,000iterations,but additionalexperimentsshowedthatitiscapableoffindingoptimal

Fig. 13. Influence of the tree representation on the performance of the best individual on the AUR training set for 5 inducers.

In addition, the loop time for the global inducers differs significantly, as different variants of mutation operators are applied. The average loop times (in seconds) calculated over all iterations on all artificial datasets are shown in Table 3.

All observations made for the armchair datasets are also confirmed for the noisy datasets. The mGMT solution managed to find all defined splits and models despite the noise and the different data distributions. From the dataset visualization included in the Appendix, it can be seen that finding appropriate decision borders is not an easy task. The oGMT usually kept up with mGMT because the defined decision tree was smaller (two internal nodes and three leaves).

From the performed experiments, we can observe that every inducer with a strict tree representation has its pros and cons. The systems for univariate regression trees are very fast and generate simple tests in internal nodes; however, the tree error and size are usually large. Oblique regression trees are slightly smaller and more accurate, but the search for the splitting rules is much more computationally demanding, and the simplicity of the output tree is lost. The results generally confirm what is observed for the univariate and oblique classification trees. Currently, the most popular trees for regression problems are univariate model trees. From the results, we see that they offer a good trade-off between tree complexity and prediction performance; the induced trees are accurate and relatively small. Theoretically, if the computational complexity of the algorithm were not an issue, the oblique model trees should be as good as all aforementioned algorithms in terms of prediction power. Unfortunately, the induction time and the complexity of the solution often hinder the practical application of this inducer, especially for large datasets.

If we knew the characteristics of the dataset, we could pre-select the inducer with the most appropriate representation. However, this is often not the case; therefore, it may be better to consider systems that can adapt their representation to the data.

Table 3
Average single loop times (in seconds) over all iterations of all datasets for different systems.

Algorithm       uGRT     uGMT     oGRT     oGMT     mGMT
Average time    0.0013   0.0036   0.0017   0.0043   0.0024
± (st. dev.)    0.0002   0.0004   0.0005   0.0010   0.0003


4.2. mGMT vs. popular tree approaches

In this set of experiments, we compared the proposed mGMT inducer with different popular tree approaches. In order to make a proper comparison with the state of the art and the latest algorithms in the literature, we selected the benchmark datasets also used in [23]. We precisely followed the preprocessing and the experimental procedure in [23] to make the comparison to the results of that paper as accurate as possible. Two popular synthetic datasets and two real-life datasets from the well-known UCI Machine Learning Repository [5] were used:

• Fried – artificial dataset proposed by Friedman [25], containing ten independent continuous attributes uniformly distributed in the interval [0, 1]. The value of the output variable is obtained with the equation:

y = 10 ∗ sin(π ∗ x1 ∗ x2) + 20 ∗ (x3 − 0.5)^2 + 10 ∗ x4 + 5 ∗ x5 + ε, where the noise ε is drawn from N(0, 1);

• 3DSin – artificial dataset containing two continuous predictor attributes uniformly distributed in the interval [−3, 3], with the output defined as:

y = 3 ∗ sin(x1) ∗ sin(x2).
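Both benchmarks can be generated in a few lines (a sketch; the random seed is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)

def fried(n):
    X = rng.uniform(0.0, 1.0, size=(n, 10))   # ten uniform attributes in [0, 1]
    y = (10 * np.sin(np.pi * X[:, 0] * X[:, 1]) + 20 * (X[:, 2] - 0.5) ** 2
         + 10 * X[:, 3] + 5 * X[:, 4] + rng.normal(0.0, 1.0, n))  # N(0, 1) noise
    return X, y

def sin3d(n):
    X = rng.uniform(-3.0, 3.0, size=(n, 2))   # two uniform attributes in [-3, 3]
    return X, 3 * np.sin(X[:, 0]) * np.sin(X[:, 1])
```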

The results of the algorithms marked with an asterisk in Table 4 are recalled from [23]; we performed additional experiments using WEKA software [19] on two additional tree inducers and included the results for our mGMT system:

• Hinge – algorithm [23] based on hinging hyperplanes identified by a fuzzy clustering algorithm;
• FRT – fuzzy regression tree;
• FMID – fuzzy model identification;
• CART – state-of-the-art univariate regression tree proposed by Breiman et al. [7];
• REPTree (RT) – popular top-down inducer that builds a univariate regression tree using variance and prunes it using reduced-error pruning (with backfitting);
• M5 – state-of-the-art univariate model tree inducer proposed by Quinlan [46];
• mGMT – the proposed global tree inducer with mixed representation.

The performance of the models is measured by the RMSE, a well-known regression performance estimator. Testing was performed with 10-fold cross-validation, and 50 runs were performed for the algorithms tested by the authors. We have also included information about the algorithms' standard deviations (unfortunately, [23] does not include this information). The results shown in Table 4 indicate that the mGMT solution can successfully compete with popular decision tree inducers.

As the mean values are not presented in the research [23], we have performed Friedman tests (significance level equal to 0.05) using the RMSE error values on two groups (see the sketch below):

• mGMT vs. Hinge, FRT, FMID, and CART;
• mGMT vs. uGRT, uGMT, oGRT, oGMT, RT, and M5.
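As an illustration, the second comparison can be reproduced approximately with SciPy using the mean RMSE values from Table 4 (the paper's exact test statistic may differ, since it was computed on the underlying error values):

```python
from scipy.stats import friedmanchisquare

# Per-dataset mean RMSE (Fried, 3DSin, Abalone, Kinman) taken from Table 4.
rmse = {
    'mGMT': [0.67, 0.15, 2.13, 0.14], 'uGRT': [3.66, 0.53, 2.55, 0.21],
    'uGMT': [0.66, 0.15, 2.19, 0.16], 'oGRT': [3.41, 0.62, 2.50, 0.19],
    'oGMT': [1.13, 0.15, 2.21, 0.17], 'RT':   [2.25, 0.60, 2.33, 0.19],
    'M5':   [1.81, 0.23, 2.12, 0.16],
}
stat, p = friedmanchisquare(*rmse.values())
print(p)  # p < 0.05 indicates significant differences among the algorithms
```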

Table 4
Comparison of RMSE results of different algorithms. Algorithms marked with * were tested in [23] and their results are recalled. Results also include the standard deviation of the RMSE and the number of leaves in the tree. The smallest RMSE and size results for each dataset are bolded.

Algorithm   Metric   Fried          3DSin          Abalone       Kinman
Hinge*      RMSE     0.92           0.18           4.1           0.16
            Leaves   8              11             8             6
CART*       RMSE     2.12           0.17           2.87          0.23
            Leaves   495.6          323.1          664.8         453.9
FMID*       RMSE     2.41           0.31           2.19          0.20
            Leaves   12             12             12            12
FRT*        RMSE     0.70           0.18           2.19          0.15
            Leaves   15             12             4             20
RT          RMSE     2.25±0.10      0.6±0.01       2.33±0.13     0.19±0.01
            Leaves   445.7±37.6     724.2±30.1     168.8±33.7    720.8±78.1
M5          RMSE     1.81±0.09      0.23±0.01      2.12±0.14     0.16±0.01
            Leaves   52.5±13.5      197.3±11.8     8.59±3.2      109.7±18.0
mGMT        RMSE     0.67±0.01      0.15±0.003     2.13±0.08     0.14±0.001
            Leaves   14.9±2.2       53.6±8.9       2.1±0.7       6.4±1.3
uGRT        RMSE     3.66±0.09      0.53±0.04      2.55±0.03     0.21±0.007
            Leaves   11.5±0.8       40.0±0.54      4.4±0.34      11.4±1.2
uGMT        RMSE     0.66±0.01      0.15±0.003     2.19±0.001    0.16±0.002
            Leaves   16.4±0.43      56.3±1.9       2.1±0.03      8.6±0.6
oGRT        RMSE     3.41±0.05      0.62±0.008     2.50±0.10     0.19±0.01
            Leaves   5.7±0.03       22.5±1.3       3.4±0.05      6.6±0.2
oGMT        RMSE     1.13±0.02      0.15±0.01      2.21±0.05     0.17±0.001
            Leaves   6.6±0.4        44.7±1.9       2.1±0.09      4.4±0.2


Fig. 14. An example of a tree induced by mGMT for the Kinman dataset.

For the first group, the Friedman test showed significant statistical differences between the algorithms (P value = 0.0109, F-statistic = 10.62); however, Dunn's multiple comparison test did not show any significant differences in rank sum, which may be caused by the small sample size (only four values for five algorithms). For the second group, the Friedman test also showed significant statistical differences between the algorithms (P value < 0.0001, F-statistic = 194.6). A corresponding Dunn's multiple comparison test showed significant differences in rank sum between mGMT and all algorithms except uGMT. It should also be noted that mGMT managed to induce much smaller trees, often an order of magnitude smaller than the tested counterparts. The relatively higher number of leaves for the mGMT inducer on the 3DSin and Fried datasets can be explained by the high non-linearity of these datasets. As mGMT applies multivariate linear regression functions in the leaves, it requires more splits to fit the non-linear characteristics of the datasets.

The cost of finding possibly new hidden regularities is the tree induction time. It is well known that EAs are slower in comparison to greedy solutions, and the mGMT is no exception. The efficiency comparison between mGMT and both tested greedy inducers showed that the proposed solution is significantly slower (verified with the Friedman test, P value < 0.0001) than both algorithms: M5 and RT. The mGMT tree induction time was smaller than that of the GMT solution [12] (Table 3) and took, depending on the dataset, from several seconds to a few minutes on a regular PC. However, the process of evolutionary induction is progressive; therefore, intermediate solutions from prematurely aborted runs may also yield high-quality results. In addition, EAs are naturally prone to parallelism; therefore, the efficiency problem can be partially mitigated.

In Fig. 14, we present one of the trees induced by mGMT for the Kinman dataset. For this particular real-life dataset, all induced trees contained oblique and univariate splits and almost always multivariate linear regressions in the leaves. This may suggest that this mixed representation is the most suitable one for this particular dataset and may reveal new relationships and information hidden in the data. The output tree is much smaller and has the smallest prediction error, especially when compared to the results of state-of-the-art solutions like CART and M5. However, it should be noted that in the case of mixed, oblique, or model trees, the size of the tree is not an accurate reflection of its complexity. Trees with a more advanced representation are usually smaller, which is why the M5 algorithm induces much smaller trees than CART. Therefore, even a very small tree induced by mGMT, but with complex oblique splits and models in the leaves, can be less comprehensible than, for example, a larger univariate regression tree. In an extreme scenario, the proposed solution can be as complex as the trees induced by the oGMT system or as simple as the ones induced by the uGRT algorithm. However, mGMT is capable of adjusting the representation of the nodes to automatically fit the analyzed data, which is not possible in the competitive solutions that have only a homogeneous tree representation. Although the trade-off between comprehensibility and prediction performance still exists in mGMT, it can easily be adjusted to user preferences through the parameters of the fitness function of the mGMT algorithm.

4.3. Overall prediction performance of mGMT

In the last step of the experiments, we compared the prediction performance of the mGMT inducer with that of other popular systems on multiple datasets. Tests were performed with WEKA software [19] using the collection of benchmark regression datasets provided by Louis Torgo [45]. From this package of 30 datasets (available on the WEKA page), we selected only those with a minimum of 1000 instances, described in Table 5. We decided that datasets with, for example, 43 instances and two variables are not the best for validation. The datasets have been processed by WEKA's supervised NominalToBinary filter, which converts nominal attributes into binary numeric attributes, and the unsupervised ReplaceMissingValues filter, which replaces missing values with the attributes' means and modes.

Table 5
Dataset characteristics: name, number of numeric attributes (Num), number of nominal attributes (Nom), and the number of instances.

ID   Name             Num   Nom   Instances
1    2dplanes         10    0     40,768
2    abalone          7     1     4177
3    ailerons         40    0     13,750
4    bank32nh         32    0     8192
5    bank8FM          8     0     8192
6    calhousing       8     0     20,640
7    cpuact           21    0     8192
8    cpusmall         12    0     8192
9    deltaailerons    5     0     7129
10   deltaelevators   6     0     7129
11   elevators        18    0     8752
12   fried            10    0     40,768
13   house16H         16    0     22,784
14   house8L          8     0     22,784
15   kin8nm           8     0     8192
16   mv               7     3     40,768
17   pol              48    0     15,000
18   puma32H          32    0     8192
19   puma8NH          8     0     8192
