The role of decision tree representation in regression problems – An evolutionary perspective
Marcin Czajkowski∗, Marek Kretowski
Faculty of Computer Science, Bialystok University of Technology, Wiejska 45a, 15-351 Bialystok, Poland
Article history:
Received 26 September 2015; received in revised form 20 June 2016; accepted 2 July 2016; available online 16 July 2016.
Keywords:
Evolutionary algorithms
Data mining
Regression trees
Self-adaptable representation
Abstract
A regression tree is a type of decision tree that can be applied to solve regression problems. One of its characteristics is that it may have at least four different node representations: internal nodes can be associated with univariate or oblique tests, whereas the leaves can be linked with simple constant predictions or multivariate regression models. The objective of this paper is to demonstrate the impact of particular representations on the induced decision trees. As it is difficult, if not impossible, to choose the best representation for a particular problem in advance, the issue is investigated using a new evolutionary algorithm for decision tree induction with a structure that can self-adapt to the currently analyzed data. The proposed solution allows different leaf and internal node representations within a single tree. Experiments performed using artificial and real-life datasets show the importance of tree representation in terms of error minimization and tree size. In addition, the presented solution managed to outperform popular tree inducers with defined homogeneous representations.
© 2016 Elsevier B.V. All rights reserved.
1. Introduction
Data mining [18] can reveal important and insightful information hidden in data. However, appropriate tools and algorithms are required to effectively identify correlations and patterns within the data. Decision trees [24,40] represent one of the main techniques for discriminant analysis and prediction in knowledge discovery. The success of tree-based approaches can be explained by their ease of application, fast operation, and effectiveness. Furthermore, the hierarchical tree structure, in which appropriate tests from consecutive nodes are sequentially applied, closely resembles the human way of decision making. All this makes decision trees easy to understand, even for inexperienced analysts. Despite 50 years of research on decision trees, many problems still remain [30], such as the search for only locally optimal splits in internal nodes, the choice of an appropriate pruning criterion, the efficient analysis of cost-sensitive data, and multi-objective optimization. To help resolve some of these problems, evolutionary computation (EC) has been applied to decision tree induction [2]. The strength of this approach lies in the global search for splits and predictions. It results in higher accuracy and smaller output trees compared to popular greedy decision tree inducers.
∗ Corresponding author.
E-mail address: m.czajkowski@pb.edu.pl (M. Czajkowski).
Finding an appropriate representation of the predictor before actual learning is a difficult task for many data mining algorithms. Often, the algorithm structure must be pre-defined and fixed during its life-cycle, which is a major barrier in developing intelligent artificial systems. This problem is well known [20] in artificial neural networks, where the topology and the number of neurons are unknown; in support vector machines, with their different types of kernels; and in decision trees, where there is a need to select the type of node representation. One solution is to automatically adapt the structure of the algorithm to the analyzed problem during the learning phase, which can be accomplished using the evolutionary approach [27,33]. This approach has also been applied to classification trees [29,26], where a mixed test representation in the internal nodes is possible.
In this paper, we want to investigate the role of regression tree representation and its impact on predictive accuracy and induced tree size, as it has not been sufficiently explored. Using artificially generated datasets, we will reveal the pros and cons of trees with different representation types, focusing mainly on evolutionarily induced trees for regression problems [2]. Differences in the representation of regression trees [30] can occur in two places: in the tests in the internal nodes and in the predictions in the leaves. For real-life problems, it is difficult to say which kind of decision tree (univariate, oblique, regression, model) should be used. It is often almost impossible to choose the best representation in advance. To top it all, for many problems a heterogeneous node representation is required within the same tree. This is why we also study a specialized evolutionary algorithm (EA) called the Mixed Global Model Tree (mGMT). It induces a decision tree that, we believe, self-adapts its structure to the currently analyzed data. The output tree may have different internal node and leaf representations, and for a given dataset it may be as good as or even better than any tree with a strict representation.

http://dx.doi.org/10.1016/j.asoc.2016.07.007
The paper is organized as follows. The next section provides a brief background on regression trees. Section 3 describes the proposed extension for evolutionary inducers with homogeneous representations. All experiments are presented in Section 4, and the last section comprises the conclusion and suggestions for future work.
2. Decision trees
We may find different variants of decision trees in the literature [30]. They can be grouped according to the type of problem they are applied to, the way they are induced, or the type of structure.
In classification trees, a class label is assigned to each leaf. Usually, it is the majority class of all training instances that reach that particular leaf. In this paper, we focus on regression trees, which may be considered variants of decision trees designed to approximate real-valued functions instead of being used for classification tasks. Although regression trees are not as popular as classification trees, they are highly competitive with different machine learning algorithms [35] and are often applied to many real-life problems [16,28].
In the case of the simplest regression tree, each leaf contains a constant value, usually the average value of the target attribute.
A model tree can be seen as an extension of the typical regression tree [46,31]. The constant value in each leaf of the regression tree is replaced in the model tree by a linear (or nonlinear) regression function. To predict the target value, the new tested instance is followed down the tree from the root node to a leaf, using its attribute values to make routing decisions at each internal node. Next, the predicted value for the new instance is evaluated based on the regression model in the leaf. Examples of predicted values of classification, regression, and model trees are given in Fig. 1. The gray-level color of each region represents a different class label (for a classification tree), and the height corresponds to the value of the prediction function (regression and model trees).
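The contrast between the two leaf types described above can be sketched in a few lines. This is a hypothetical mini-implementation for illustration only; the function names `regression_leaf` and `model_leaf` are ours, not from the paper:

```python
def regression_leaf(train_targets):
    """Regression-tree leaf: a constant prediction, the mean of the
    training targets that reached this leaf."""
    mean = sum(train_targets) / len(train_targets)
    return lambda x: mean

def model_leaf(beta0, betas):
    """Model-tree leaf: a linear model beta0 + sum_i beta_i * x_i
    of the instance's attribute values."""
    return lambda x: beta0 + sum(b * xi for b, xi in zip(betas, x))

r = regression_leaf([2.0, 4.0, 6.0])   # predicts 4.0 for any input
m = model_leaf(1.0, [0.5, -1.0])       # prediction varies with the input
```

The regression leaf ignores the instance entirely, while the model leaf produces a different value for each instance, which is what allows model trees to be smaller for smoothly varying targets.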
Most decision trees partition the feature space with axis-parallel decision borders [44]. This type of tree is called univariate because each split in a non-terminal node involves a single feature. For continuous-valued features, inequality tests with binary outcomes are usually applied, and for nominal features, mutually exclusive groups of feature values are associated with the outcomes. When more than one feature is taken into account to build a test in an internal node, we deal with multivariate decision trees [8]. The most common form of such a test is an oblique split, which is based on a linear combination of features. A decision tree that applies only oblique tests is often called oblique or linear, whereas heterogeneous trees with univariate, linear, and other multivariate (e.g., instance-based) tests are called mixed trees [29]. Fig. 2 shows an example of univariate and oblique decision trees. We can observe that if decision borders are not axis-parallel, then using only univariate tests may lead to an overcomplicated classifier. This kind of situation is known as the 'staircase effect' [8] and can be avoided by applying more sophisticated multivariate tests. While oblique trees are generally smaller, their tests are usually more difficult to interpret. It should be emphasized that the computational complexity of multivariate tree induction is significantly higher than that of univariate tree induction [3].
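The two test types differ only in how many features they inspect, which a minimal sketch makes concrete (hypothetical helper names, not the authors' code). A diagonal border such as x1 ≤ x2 needs a single oblique test with w = (1, −1) and threshold 0, whereas univariate tests can only approximate it with a staircase:

```python
def univariate_test(x, feature, threshold):
    """Axis-parallel split: inspects a single feature of instance x."""
    return x[feature] <= threshold

def oblique_test(x, w, theta):
    """Oblique split: tests the linear combination <w, x> <= theta."""
    return sum(wi * xi for wi, xi in zip(w, x)) <= theta

# The diagonal border x1 <= x2 as one oblique test: w = (1, -1), theta = 0.
on_lower_side = oblique_test((2.0, 3.0), (1.0, -1.0), 0.0)   # 2 - 3 <= 0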
The role of tree representation has so far been discussed mainly in terms of classification problems. Studies [25,8] show that univariate inducers return larger trees than multivariate ones, and they are often less accurate. However, multivariate trees are difficult to understand and interpret, and their induction is significantly slower. Therefore, making a general conclusion is risky, as the most important factors are the characteristics of the particular dataset [25]. To the best of our knowledge, there is no detailed report that refers to the role of representation in regression trees. It could be expected that univariate and multivariate regression trees should behave similarly to the classification ones. However, there is still an open question about the influence of the leaves' representation on the tree performance. This paper focuses on evolutionarily induced regression trees; therefore, to go further, we must briefly describe the process of creating a decision tree from the training set. The two most popular concepts for decision tree induction are the top-down and global approaches. The first is based on a greedy procedure known as recursive partitioning [39]. In the top-down approach, the induction algorithm starts from the root node, where the locally optimal split is searched for according to the given optimality measure. Next, the training instances are redirected to the newly created nodes, and this process is repeated for each node until a stopping condition is met. Additionally, post-pruning [15] is usually applied after the induction to avoid the problem of over-fitting the training data.
Fig. 1. An illustration of predicted values of the classification, regression, and model trees.

Fig. 2. An example of oblique and univariate decision trees.

One of the most popular representatives of top-down induced univariate regression trees is the solution proposed by Breiman et al. called Classification And Regression Tree (CART) [7]. The algorithm searches for a locally optimal split that minimizes the sum of squared residuals and builds a piecewise constant model, with each terminal node fitted with the training sample mean. Other solutions have managed to improve the prediction accuracy by replacing single values in the leaves with more advanced models. The M5 system [46] induces a tree that contains multiple linear models in the leaves. A solution called Stepwise Model Tree Induction (SMOTI) [31] can be viewed as an oblique model tree, as the regression models are placed not only in the leaves but also in the upper parts of the tree. All aforementioned methods induce trees with the greedy strategy, which is fast and generally efficient but often produces only locally optimal solutions.
The global approach to decision tree induction limits the negative effects of locally optimal decisions. It tries to simultaneously search for the tree structure, the tests in the internal nodes, and the models in the leaves. This process is obviously much more computationally complex but can reveal hidden regularities that are often undetectable by greedy methods. Global induction is mainly represented by systems based on an evolutionary approach [2,4]; however, there are solutions that apply, for example, ant colony optimization [36,6].
In the literature, there are relatively fewer evolutionary approaches for regression and model trees than for classification trees. Popular representatives of EA-based univariate regression trees are the TARGET solution [17], which evolves a CART-like regression tree with basic genetic operators, and the uGRT algorithm [11], which introduces specialized variants of mutation and crossover. A strongly typed GP (Genetic Programming) approach called STGP was also proposed [21] for univariate regression tree induction. There are also globally induced systems that evolve univariate model trees, such as the E-Motion tree [1], which implements standard 1-point crossover and two different mutation strategies, and the GMT system [12], which incorporates knowledge about the induction problem for the global model tree into the evolutionary search. There are also preliminary studies on oblique trees called oGMT [10]. In the literature, we may also find the GP approach that evolves model trees with nonlinear regression models in the leaves, called GPMCC [38]. It is composed of GP to evolve the structure of the model trees and GA to evolve polynomial expressions (GASOPE) [37].

Fig. 3. The mGMT process diagram.
3. Mixed Global Model Tree
This paper focuses on the representation of globally induced regression and model trees and its influence on the output tree. In this section, we propose an extension of the GMT and GRT systems [12], called the Mixed Global Model Tree (mGMT), to better understand the underlying process behind the selection of the representation. With evolutionary tree induction, we are able not only to search for an optimal tree structure, tests in internal nodes, or models in the leaves, but also to self-adapt the tree representation. The general structure of the algorithm follows a typical EA framework [32] with an unstructured population and generational selection. It can be treated as a unified framework for both univariate and oblique tests in the internal nodes and both regression and model leaves. The mGMT does not require setting the tree representation in advance, because the EA validates different variants of the representations not only at the tree level but also at the node level and may induce a heterogeneous tree that we call a mixed tree. A description of the proposed approach is given, especially with respect to issues that are specific to mixed trees.
The process diagram of the mGMT algorithm is illustrated in Fig. 3. The proposed solution evolves the regression and model trees in their actual forms. The candidate solutions that constitute the population are initialized with a semi-random greedy strategy and are evaluated using a multi-objective weight-formula fitness function. If the convergence criterion is not satisfied, linear ranking selection is performed together with the elitist strategy. Next, genetic operators are applied, including different variants of specialized mutations and crossovers. After the evolution process is finished, the best individual found by the EA is smoothed. Each element of the mGMT solution is discussed in detail in the following sections.
3.1. Representation
A mixed regression tree is a complex structure in which the number and the type of nodes and even the number of test outcomes are not known in advance for a given learning set. Therefore, the candidate solutions that constitute the population are not encoded and are represented in their actual form (see Fig. 4).

Fig. 4. An example representation of the mGMT individual.
There are three possible test types in the internal nodes: two univariate and one multivariate. In the case of univariate tests, the test representation concerns only one attribute and depends on the considered attribute type. For continuous-valued features, typical inequality tests with two outcomes are used. For nominal attributes, at least one attribute value is associated with each branch starting in the node, which means that an internal disjunction is implemented. Only binary or continuous-valued attributes are used to construct the oblique split. The feature space can be divided into two regions by a hyperplane:
H(w, θ) = {x : ⟨w, x⟩ = θ},  (1)

where x is a vector of feature values (an object), w = [w1, . . ., wP] is a weight vector, θ is a threshold, ⟨w, x⟩ represents an inner product, and P is the number of independent variables. Each hyperplane is represented by a fixed-size (P + 1)-dimensional table of real numbers corresponding to the weight vector w and the threshold θ.
In each leaf of the mGMT system, a multiple linear model can be constructed using the standard regression technique. It is calculated only for the objects associated with that node. A dependent variable y is explained by a linear combination of multiple independent variables x1, x2, . . ., xP:
y = β0 + β1 · x1 + β2 · x2 + . . . + βP · xP,  (2)

where β0, . . ., βP are fixed coefficients that minimize the sum of the squared residuals of the model. If all βi (0 < i ≤ P) are equal to 0, the leaf node is a regression node with a constant prediction equal to β0. If only one βi ≠ 0, we deal with simple linear regression; otherwise, the leaf contains a multivariate linear regression model.
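For P = 1, Eq. (2) reduces to simple linear regression, which has a closed-form least-squares solution. A minimal pure-Python sketch (our own naming, not the system's code) shows how a leaf model collapses to the constant case when the predictor carries no variance:

```python
def fit_simple_linear(xs, ys):
    """Least-squares fit of y = b0 + b1*x (Eq. (2) with P = 1).
    If the predictor has zero variance, falls back to b1 = 0,
    i.e. a constant (regression-leaf) prediction b0 = mean(ys)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return my, 0.0
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b1 * mx, b1
```

For the points (0, 1), (1, 3), (2, 5) the fit recovers y = 1 + 2x exactly, since the data is perfectly linear.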
3.2. Initialization
Each initial individual in the population is created with the classical top-down approach that resembles the M5 solution [46]. The initial population of mGMT is heterogeneous and is composed of five types of standard regression trees with different representations (four homogeneous and one heterogeneous): a univariate regression tree; an oblique regression tree; a univariate model tree; an oblique model tree; and a mixed tree that contains different kinds of tests in the internal nodes (univariate and oblique) and different types of leaves (regression and model). In mixed trees, before each step of recursive partitioning, the type of node is selected randomly and an appropriate test or model is generated. The importance of such a heterogeneous initial population lies in its diversity.

Fig. 5. Hyperplane initialization based on a randomly chosen 'long dipole' (left) and an example illustrating how the oblique test is created (right).
The recursive partitioning is finished when the dependent value is predicted for all training objects in the node or the number of instances in the node is small (default: five instances). Each initial individual is created based on a semi-random subsample of the original training data (default: 10% of the data) to keep the balance between exploration and exploitation. To ensure that the subsample contains objects with various values of the predicted attribute, the training data is sorted by the predicted value and split into a fixed number of equal-size folds (default: 10). From these folds, an equal number of objects is randomly chosen and placed into the subsample. Tests in non-terminal nodes are calculated from a random subset of attributes (default: 50%).
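The subsampling step above (sort by target, split into folds, draw equally from each) can be sketched as follows. This is a simplified illustration under our own assumptions: the function name, the seeded `random.Random`, and the exact per-fold draw count are ours; the defaults (10% sample, 10 folds) come from the text:

```python
import random

def stratified_subsample(data, target, frac=0.1, folds=10, rng=None):
    """Draw a target-stratified subsample: sort by the predicted value,
    split into equal-size folds, and pick the same number of objects
    at random from each fold."""
    rng = rng or random.Random(0)
    ordered = sorted(data, key=target)
    fold_size = len(ordered) // folds
    per_fold = max(1, int(len(ordered) * frac) // folds)
    sample = []
    for f in range(folds):
        fold = ordered[f * fold_size:(f + 1) * fold_size]
        sample.extend(rng.sample(fold, min(per_fold, len(fold))))
    return sample
```

Because one object is drawn per fold, the subsample is guaranteed to span the full range of the predicted attribute, which is the stated purpose of the mechanism.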
In the case of the univariate internal nodes, one of three memetic search strategies [12] that involve employing locally optimized tests is chosen:
• Least Squares (LS): the test in the internal node is chosen according to the node impurity measured by the sum of the squared residuals.
• Least Absolute Deviation (LAD): the test reduces the sum of the absolute deviations. It is more robust and has greater resistance to outlying values than LS.
• Dipolar: the test is constructed according to the 'long dipole' [12] strategy. At first, an instance that will constitute the dipole is randomly selected from the set of instances in the current node. The rest of the feature vectors are sorted in decreasing order according to the difference between their dependent-variable values and that of the selected instance. The second instance that constitutes the dipole should have a much different value of the dependent variable. To find it, we applied a mechanism similar to ranking linear selection [32]. Finally, the test that splits the dipole is constructed based on a randomly selected attribute, where the boundary threshold is defined as the midpoint between the pair that constitutes the dipole.
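The dipolar strategy above can be sketched in a few lines. Note one deliberate simplification: instead of the rank-based draw of the second dipole end described in the text, this sketch simply takes the instance whose target value differs most; names and structure are ours:

```python
import random

def dipolar_test(instances, targets, rng=None):
    """Sketch of the 'long dipole' univariate test: pick a random instance,
    pair it with the instance whose target differs most (simplification of
    the rank-based draw), then split a random attribute at the midpoint
    between the pair."""
    rng = rng or random.Random(0)
    i = rng.randrange(len(instances))
    j = max(range(len(instances)), key=lambda k: abs(targets[k] - targets[i]))
    attr = rng.randrange(len(instances[0]))
    threshold = (instances[i][attr] + instances[j][attr]) / 2.0
    return attr, threshold
```

With two one-attribute instances at 0.0 and 10.0, the returned test splits at 5.0 regardless of which end of the dipole is drawn first.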
The search strategy used to find splits in the internal nodes is different for the oblique tests. An effective test in a non-terminal node is searched for using only the dipolar strategy. Fig. 5 (left) illustrates the hyperplane initialization based on a randomly chosen 'long dipole'. The hyperplane Hij(w, θ) splits the dipole (xi, xj) in such a way that the two feature vectors xi and xj are situated on the opposite sides of the dividing hyperplane:
(⟨w, xi⟩ − θ) · (⟨w, xj⟩ − θ) < 0.  (3)

The hyperplane parameters are as follows: w = xi − xj and θ = δ · ⟨w, xi⟩ + (1 − δ) · ⟨w, xj⟩, where δ ∈ (0, 1) is a randomly drawn coefficient that determines the distance from the opposite ends of the dipole. Hij(w, θ) is perpendicular to the segment connecting the dipole ends.
To provide a numeric example illustrating how an oblique test is created, let us imagine the two-dimensional space illustrated in Fig. 5 (right). After the selection of two randomly chosen dipole ends with Cartesian coordinates A(1, 1) and B(5, 3), and the coefficient δ = 0.5, the splitting hyperplane H parameters are: w = [5 − 1, 3 − 1] = [4, 2] and θ = 0.5 · ⟨w, B⟩ + 0.5 · ⟨w, A⟩ = 0.5 · 26 + 0.5 · 6 = 16. Therefore, the hyperplane HAB is a line described as y = −2x + 8. To perform a split, we simply check on which side of the hyperplane H all instances from the internal node are positioned. Let us consider point C(1.5, 2.5). By applying it to the hyperplane equation (1.5 · 4 + 2.5 · 2), we see that the score 11 is smaller than the value of θ. Using a different point, for example D(3.5, 4.5), would result in the value 23, which means that point D lies on the opposite side of the hyperplane to point C. For this particular example, the parameter δ equals 0.5; therefore, the hyperplane w intersects the midpoint between the dipole ends A and B. However, if we change the parameter to δ = 0.1, the hyperplane, denoted as H′AB, shifts towards point A. We can observe that for this hyperplane H′ points C and D lie on the same side, and thus both instances would be directed after the split to the same sub-node.
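The arithmetic of the worked example can be checked directly (a few lines of our own, following the quantities defined above with dipole (xi, xj) = (B, A)):

```python
def inner(w, x):
    """Inner product <w, x>."""
    return sum(wi * xi for wi, xi in zip(w, x))

A, B = (1.0, 1.0), (5.0, 3.0)
w = (B[0] - A[0], B[1] - A[1])                      # w = (4, 2)
delta = 0.5
theta = delta * inner(w, B) + (1 - delta) * inner(w, A)   # 0.5*26 + 0.5*6 = 16

side_C = inner(w, (1.5, 2.5))   # 11, below theta
side_D = inner(w, (3.5, 4.5))   # 23, above theta
```

The dipole condition of Eq. (3) holds for A and B, and C and D indeed fall on opposite sides of the δ = 0.5 hyperplane; re-running with delta = 0.1 lowers theta to 8, leaving both C and D on the same side, as stated in the text.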
3.3. Goodness of fit
The evolutionary search process is very sensitive to the proper definition of the fitness function. In the context of regression trees, a direct minimization of the prediction error measured on the learning set usually leads to the over-fitting problem. In typical top-down induction of decision trees [39], this problem is partially mitigated by defining a stopping condition and by applying post-pruning [15]. In the case of the evolutionary approach, a multi-objective function is required to minimize the prediction error and the tree complexity at the same time.
In our approach, the Bayesian information criterion (BIC) [41] is used as the fitness function. It was shown that this criterion works well with regression and model trees [17,12] and outperforms other popular approaches. BIC is given by:
Fit_BIC(T) = −2 · ln(L(T)) + ln(n) · k(T),  (4)

where L(T) is the maximum of the likelihood function of the tree T, n is the number of observations in the data, and k(T) is the number of model parameters in the tree. The log-likelihood ln(L(T)) is typical for regression models and can be expressed as:

ln(L(T)) = −0.5 · n · [ln(2π) + ln(SSe(T)/n) + 1],  (5)

where SSe(T) is the sum of squared residuals of the tree T. The term k(T) can also be viewed as a penalty for over-parametrization.
The proposed mixed-tree representation requires defining a new penalty for tree over-parametrization. It is rather obvious that, in internal nodes, an oblique split based on a few features is more complex than a univariate test. The same applies to the different leaf representations. As a consequence, the tree complexity k(T) should reflect not only the tree size but also the complexity of the tests in internal nodes and the models in the leaves. However, it is not easy to arbitrarily set the importance of the different measures, because it often depends on the dataset being analyzed. In such a situation, the tree complexity k(T) is defined as:

k(T) = α1 · Q(T) + α2 · O(T) + α3 · W(T),  (6)

where Q(T) is the number of nodes in the model tree T; O(T) is equal to the sum of the numbers of non-zero weights in the hyperplanes in the internal nodes; and W(T) is the sum of the numbers of attributes in the linear models in the leaves. Default values of the parameters are α1 = 2.0, α2 = 1.0, and α3 = 1.0; however, further research to determine their values is needed. If the i-th internal node Ti is univariate, the value of O(Ti) equals 1. If the j-th leaf contains a constant value, then the parameter W(Tj) equals zero, because there are no attributes in the linear model. Otherwise, the values of O(Ti) and W(Tj) equal the number of attributes used to build the test in internal node i or the model in leaf j.
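Putting Eqs. (4)-(6) together, the fitness of a tree can be computed from its residuals and its complexity counts. A direct sketch (function name and argument packaging are ours; the formulas and default α values are from the text):

```python
import math

def fitness_bic(sse, n, num_nodes, oblique_weights, leaf_attrs,
                a1=2.0, a2=1.0, a3=1.0):
    """BIC fitness of a tree: Eq. (4) with the log-likelihood of Eq. (5)
    and the mixed-tree complexity penalty k(T) of Eq. (6).
    sse             -- sum of squared residuals SSe(T)
    n               -- number of observations
    num_nodes       -- Q(T), nodes in the tree
    oblique_weights -- O(T), non-zero hyperplane weights (1 per univariate test)
    leaf_attrs      -- W(T), attributes in the leaf linear models
    """
    log_lik = -0.5 * n * (math.log(2 * math.pi) + math.log(sse / n) + 1)
    k = a1 * num_nodes + a2 * oblique_weights + a3 * leaf_attrs
    return -2 * log_lik + math.log(n) * k
```

Lower values are better: shrinking SSe(T) lowers the first term, while every extra node, hyperplane weight, or leaf attribute raises the ln(n) · k(T) penalty.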
The flexibility of the fitness function allows its simple configuration based on additional knowledge or user preferences. For example, if users know the basic relationships in the data or want to limit the tree representations to the desired ones, the fitness function can assign a high value to α2 or α3 or both.
3.4. Genetic operators
To maintain genetic diversity, the mGMT algorithm applies two specialized genetic operators corresponding to classical mutation and crossover. In globally induced trees with strict representations, there are several variants of the operators [11,12]; however, their availability mainly depends on the representation type. Both operators are applied with a given probability and influence the tree structure, the tests in non-terminal nodes, and optionally the models in the leaves. After any successful mutation or crossover, it is usually necessary to relocate learning vectors between the parts of the tree rooted in the altered node. This can cause pruning of certain parts of the tree that do not contain any learning vectors. In addition, the corresponding models in the affected individual leaves are recalculated. For performance reasons, the coefficients in the existing linear models are recalculated to fit a randomly selected sample of the actual data (no more than 50 instances) in the corresponding leaves.
Each crossover begins with randomly selecting two individuals from the population that will be affected. Next, the crossover points in both individuals are determined. We have adapted all variants proposed in the univariate tree inducer [12] to work with the mixed representation, as visualized in Fig. 6:
(a) exchange subtrees: exchanges subtrees starting in randomly selected nodes;
(b) exchange branches: exchanges branches that start from selected nodes in random order;
(c) exchange tests: recombines the tests (univariate nominal, univariate continuous-valued, and oblique) associated with randomly selected internal nodes;
(d) with best: crossover with the best individual;
(e) asymmetric: duplicates subtrees with small mean absolute errors and replaces nodes with high errors.
Selected nodes for the recombination must have the same number of outputs; however, they may have different representations. This way, crossovers shift not only the tree structure but also the nodes' representations. In the variants (d) with best and (e) asymmetric, an additional mechanism is applied to decide which node will be affected. The algorithm ranks all tree nodes in both individuals according to their absolute error divided by the number of instances in the node. The probability of selecting nodes is proportional to the rank in a linear way. The nodes with a small average error per instance are more likely to be donors, whereas the weak nodes (with a high average error per instance) are more likely to be replaced by the donors from the second individual (and have a higher probability of becoming receivers).
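The rank-based donor selection described above can be sketched as follows. The exact normalization 2(n − rank)/(n(n + 1)) is our assumption of a standard linear-ranking scheme, not a formula taken from the paper:

```python
def linear_rank_probabilities(errors):
    """Linear ranking over per-instance node errors: the node with the
    smallest error gets the highest donor probability, decreasing
    linearly with rank. Probabilities sum to 1."""
    n = len(errors)
    order = sorted(range(n), key=lambda i: errors[i])
    probs = [0.0] * n
    for rank, i in enumerate(order):           # rank 0 = best (lowest error)
        probs[i] = 2.0 * (n - rank) / (n * (n + 1))
    return probs
```

Reversing the ranking (largest error first) would give the receiver probabilities, so the same routine covers both roles in variants (d) and (e).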
The mutation of an individual starts with the selection of a node type (equal probability of selecting a leaf or an internal node). Next, a ranked list of nodes of the selected type for this individual is created. Depending on the type of node, the ranking takes into account the location of internal nodes (nodes in the lower parts of the tree are mutated with higher probability) and the prediction error of the node (nodes with a higher error per instance are more likely to be mutated). Finally, a mechanism analogous to ranking linear selection [32] is applied to decide which node in the individual will be affected. Depending on the node's representation, different variants of the operators are available in internal nodes:
• prune: changes an internal node to a leaf (acts like a pruning procedure);
• parent with child (branches): replaces a parent node with a randomly selected child node (internal pruning);
• parent with child (tests): exchanges tests between a parent and a randomly selected child node;
• new dipolar test: the test in the affected node is reinitialized by a new one selected using the dipolar strategy;
• new memetic test: the test in the node is reinitialized by one of the optimality strategies proposed in Section 3.2;
• modify test: shifts the hyperplane or sets random weights (oblique test); shifts the threshold (univariate test on a continuous attribute) or re-groups nominal attribute values by adding/merging branches or moving values between them;
• recalculate models: recursively recalculates the linear models using all the instances in the corresponding leaves;
and in the leaves:
• dipolar expand: transforms a leaf into an internal node with a new dipolar test (of random type);
• memetic expand: transforms a leaf into an internal node with a new test selected by one of the optimality strategies;
• change model: extends/simplifies/changes the linear model in the leaf by adding/removing/replacing a randomly chosen attribute or removing the least significant one.
For a more detailed description of the mutation variants, please refer to [12].
In addition, we propose a new mechanism called Switch that assures the diversity of node representations within the population. It is embedded in the specified variants of the mutation (prune, expand, and new test) that require finding new tests in the internal nodes or models in the leaves. The Switch mechanism, with an assigned probability, changes the initial representation of the selected nodes:
• for the test in an internal node, when calculating a new test with the same number of outputs:
– with the change from univariate to oblique (internal nodes), the newly calculated hyperplane involves the attribute from the univariate test;
– with the change from oblique to univariate (internal nodes), the new univariate test is based on a randomly selected attribute from the oblique test;
• for newly created nodes that would otherwise inherit their representation from the initial representation:
– leaves flip representation from a regression constant value to a linear regression model (or vice versa) when pruning internal nodes;
– internal nodes flip representation from an oblique test to a univariate one (or vice versa) when expanding leaves.

Fig. 6. Visualization of crossovers, from top left to bottom right: (a) exchange subtrees, (b) exchange branches, (c) exchange tests, (d) with best, and (e) asymmetric.
In the rest of the mutation variants, the Switch mechanism is not applied. Preserving the representation in, for example, the modify test or change model variants allows exploring the neighborhood of the current solution rather than starting the search from a new place.
3.5. Selection, termination condition, and smoothing
Ranking linear selection is applied as the selection mechanism. In each generation, the single individual with the highest value of the fitness function in the current population is copied to the next one (elitist strategy). Evolution terminates when the fitness of the best individual in the population has not improved during a fixed number of generations (default: 1000). In the case of slow convergence, a maximum number of generations is also specified (default value: 10,000) to limit the computation time.
The mGMT system uses a form of smoothing that was initially introduced in the M5 algorithm [46] for the univariate model tree. As in the basic GMT solution [12], the smoothing is applied only to the best individual returned by the EA when the evolutionary induction is finished. The role of the smoothing is to reduce sharp discontinuities that occur between adjacent linear models in the leaves. For every internal node of the tree, the smoothing algorithm generates an additional linear model that is constituted from the features that occur along the path from the leaf to the node. This way, each tested instance is predicted not only by a single model at the proper leaf but also by the different linear models generated for each of the internal nodes up to the root node. Due to the oblique splits that may appear in a tree induced by the mGMT system, we have updated the smoothing algorithm to use all attributes that constitute the tests in the internal nodes.
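The path-wise blending of leaf and internal-node predictions can be sketched with the smoothing rule used in the M5 literature, p' = (n·p + k·q)/(n + k), where p is the prediction passed up from below, q the current node's own model prediction, n the number of training instances below that node, and k a smoothing constant (15 in M5). This is an illustration of the general mechanism, not the authors' exact update:

```python
def smooth(leaf_prediction, path_models, k=15.0):
    """M5-style smoothing along the path from a leaf to the root.
    path_models is a list of (q, n) pairs ordered leaf -> root:
    q = the internal node's own model prediction for the instance,
    n = number of training instances below that node."""
    p = leaf_prediction
    for q, n in path_models:
        p = (n * p + k * q) / (n + k)
    return p
```

When the leaf and its ancestors agree, smoothing is a no-op; when an ancestor's model disagrees, the leaf prediction is pulled towards it, more strongly for nodes covering few training instances.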
4. Experimental validation
To verify the role of tree representations, we have performed experiments on both artificial and real-life datasets. In the first section below, the impact of the tree representation is assessed using four algorithms with different homogeneous representations and the proposed mGMT inducer. Next, the mGMT solution is compared with the results from paper [23] that cover experiments with popular tree inducers on publicly available datasets. Finally, the prediction performance of the proposed solution is tested on a larger group of publicly available datasets.
In all experiments reported in this section, a default set of parameters for all algorithms is used on all tested datasets. The results presented in the paper correspond to averages of 50 runs.
4.1. Role of the tree representation
In this section, five types of tree representations are analyzed:
• univariate Global Regression Tree (denoted as uGRT), which has axis-parallel decision borders and simple constant predictions in the leaves;
• univariate Global Model Tree (uGMT), which has axis-parallel decision borders and multivariate linear regression models in the leaves;
• oblique Global Regression Tree (oGRT), which constructs oblique splits on binary or continuous-valued attributes in the internal nodes;
• oblique Global Model Tree (oGMT) – the most complex tree representation (oblique splits and multivariate linear regression models);
• mixed Global Model Tree (mGMT), which self-adapts the tree representation to the currently analyzed data.
The first four algorithms are based on the existing solutions [10–12], and the proposed mGMT algorithm can be treated as their extension and unification.
The impact of representation on the tree performance is tested on two sets of artificially generated datasets:
• armchair – variants of the dataset proposed in [11] that require at least four leaves and three splits;
• noisy – datasets with various data distributions and additional noise.
Table 1
Default parameters of uGRT, uGMT, oGRT, oGMT, and mGMT.

Parameter                                          | Value
Population size                                    | 50 individuals
Crossover rate                                     | 20% (assigned to the tree)
Mutation rate                                      | 80% (assigned to the tree)
Elitism rate                                       | 2% of the population (1 individual)
Maximum number of generations without improvement  | 1000
Maximum total number of generations                | 10,000
All artificial datasets have analytically defined decision borders that fit particular tree representations: univariate regression (UR), univariate model (UM), oblique regression (OR), oblique model (OM), and mixed (MIX). Each set contains 1000 instances, where 33% of the instances constitute the training set and the rest constitute the testing set. A visualization and description of the artificial datasets are included in the Appendix.
4.1.1. Parameter tuning
Parameter tuning for EAs is a difficult task. Fortunately, all important EA parameters (e.g., population size, the probabilities of mutation and crossover, etc.) and the decision tree parameters (maximum size, minimum number of objects to make a split) were experimentally validated and tuned in previous papers for trees with homogeneous representations [12]. Those general settings should also work well with the mixed regression trees; therefore, they can be treated as defaults. The main parameters for all algorithms are given in Table 1, and the probabilities of selecting mutation operator variants are shown in Table 2 (the probability of selecting each crossover variant is equal to 20%). This way, only the role of the Switch mechanism, which is embedded in different variants of the mutation operators and directly switches the node representation (for example, from univariate to oblique in an internal node and from a constant prediction to a multivariate linear regression model in a leaf), needs to be investigated.
Parameter tuning was performed on the armchair dataset (version AMix1) according to the guidelines proposed in [14]. Four different Switch mechanism values that correspond to the probability of a node representation change were tested: 0.0, 0.1, 0.25, and 0.5. The impact of this setting on the proposed mGMT solution and on the rest of the tree inducers with a homogeneous initial population was checked. For example, when the uGRT algorithm is evaluated and the Switch mechanism is enabled, the representation of mutated nodes can change with the assigned probability. This way, the algorithm can have a mixed representation and is able to have oblique splits or multivariate regression models in the leaves. Figs. 7 and 8 show the tree error (RMSE) of the best
Table 2
Probability (%) of selecting a single variant of the mutation operator in uGRT, uGMT, oGRT, oGMT, and mGMT.

Mutation operator             uGRT & oGRT   uGMT, oGMT & mGMT
prune                         30            20
parent with son (branches)    5             5
parent with son (tests)       2.5           2.5
new dipolar test              10            10
new memetic test              2.5           2.5
modify test                   15            15
recalculate models            2.5           2.5
dipolar expand                30            20
memetic expand                2.5           2.5
change model                  0             20
Fig. 7. Impact of the Switch mechanism on the best individual for the uGRT, uGMT, oGRT, and oGMT inducers on the armchair AMix1 dataset.
individual during the learning phase performed on the training set for all five algorithms: uGRT, uGMT, oGRT, oGMT, and mGMT.
One can observe that the impact of the Switch mechanism is especially visible for the algorithms with homogeneous initial populations. In Fig. 7, enabling the Switch is the only way to find optimal solutions for the uGRT, oGRT, and uGMT algorithms. When the Switch is set to 0.5, which equals random representation selection, the inducers have the fastest convergence. In the oGMT algorithm, which is capable of finding the optimal solution on its own, the application of the Switch mechanism shortens the inducer's convergence time. A statistical analysis of the results using the Friedman test and the corresponding Dunn's multiple comparison test (significance level equal to 0.05), as recommended by Demsar [13], showed that there exist significant differences between the Switch parameter settings for all four algorithms with strict representations. The performed experiments showed that the optimal
Fig. 8. Impact of the Switch mechanism on the best individual for the mGMT inducer on the armchair AMix1 dataset.
Switch setting for the inducers with a homogeneous representation is 0.5, which equals a random representation of the newly created node.
The mGMT results visualized in Fig. 8 show that there are no big differences between the algorithms with various Switch settings. This can be explained by the construction of the initial population of the algorithm, which is composed of five types of representations. The individual representations can be successfully combined with the crossover operators. However, we can observe a slight improvement in the algorithm's convergence to the optimal solution when the Switch mechanism is enabled.
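The Switch mechanism described above can be sketched as a small wrapper around the mutation step. The node dictionaries and representation names below are illustrative assumptions, not the authors' implementation:

```python
import random

# Representations a node may take; names are our illustrative assumptions.
INTERNAL_REPS = ["univariate", "oblique"]
LEAF_REPS = ["constant", "model"]

def mutate_node(node, switch_prob, rng=random):
    """Sketch of the Switch mechanism: with probability `switch_prob`
    the mutated node changes its representation, e.g. a univariate test
    becomes oblique, or a constant-prediction leaf becomes a model leaf."""
    if rng.random() < switch_prob:
        reps = LEAF_REPS if node["is_leaf"] else INTERNAL_REPS
        node["rep"] = next(r for r in reps if r != node["rep"])
    # ...one of the regular mutation variants from Table 2 is applied here...
    return node
```

With `switch_prob = 0.5` every newly mutated node gets a random representation, which matches the fastest-converging setting reported for the homogeneous inducers.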
4.1.2. Comparison of representations
To show the impact of tree representation, five inducers were tested on two groups of datasets, armchair and noisy (each with six variants), described in the Appendix. Four metrics were collected and illustrated:
• Root Mean Squared Error (RMSE) calculated on the testing set (Fig. 9);
• average number of leaves in the tree (Fig. 10);
• average number of attributes in the tests in the internal nodes (Fig. 11). Univariate inducers are not shown, as their average number of tests is always equal to their size decreased by 1;
• average number of attributes in the regression models in the leaves (Fig. 12). Regression inducers are not shown, as there are no models in the leaves; therefore, the average number of attributes is always equal to zero.
All four figures should be analyzed at the same time to understand how each global inducer works.
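For reference, the RMSE reported in Fig. 9 can be computed as follows (a minimal sketch):

```python
import math

def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the mean squared residual."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
```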
Artificial datasets were designed to be solved by one of the tested systems, and the abbreviations of the datasets reveal which inducer is most appropriate to use. In general, all inducers with the appropriate individual representation managed to successfully induce the defined tree. However, when the representation does not fit the specifics of the dataset, it is either too simple (univariate split, regression leaf) or too advanced (oblique split, model in the leaf), and the evolutionary inducers with homogeneous representations sometimes have difficulty finding an optimal solution. In contrast to the four global inducers with defined representations (uGRT, oGRT, uGMT, and oGMT), the mGMT system has a flexible representation. The results presented in Figs. 9–12 show that mGMT successfully adapts the tree structure to the specifics of each artificially generated dataset. In the datasets denoted as UR, UM, OR, and OM, the mGMT system managed to keep up with the algorithms whose structure fitted the characteristics of the datasets. As for the Mix dataset variants, mGMT managed to outperform the rest of the tree inducers.
There are at least two reasons why the systems with strict representations of the individuals have difficulty with some variants of the datasets. The first reason is the limitation in the individuals' representation. The non-axis-parallel decision borders can easily be handled by the oGRT or oGMT algorithms, whereas the application of univariate splits may cause the 'staircase effect' [8]. The problem is similar for the regression trees applied to the UM and OM datasets, which require regression models in the leaves. To overcome these restrictions in the representation, the inducers (uGRT, uGMT, oGRT) increase their tree sizes; however, the limitation still exists. The large size of the induced tree influences not only its clarity but may also cause overfitting to the training data and thus a larger prediction error. Let us explain this for the different variants of the armchair dataset described in the Appendix:
Fig. 9. Root Mean Squared Error (RMSE) of the algorithms on 12 artificial datasets described in the Appendix. Tested algorithms: univariate Global Regression Tree (uGRT), oblique Global Regression Tree (oGRT), univariate Global Model Tree (uGMT), oblique Global Model Tree (oGMT), and mixed Global Model Tree (mGMT). For illustrative purposes, the values of the RMSE error for the noisy datasets have been rescaled.
Fig. 10. Average number of leaves in the tree for different GMT variants. The defined bars represent the reference values that are equal to the optimal numbers of leaves for the datasets.
• AUR – can be perfectly predicted by univariate regression trees. All aforementioned inducers are capable of finding decision trees with a small RMSE (Fig. 9), four leaves (Fig. 10), three univariate splits (Fig. 11), and no regression models in the leaves (Fig. 12). Even the oGMT system managed to find the decision borders despite its advanced node representation of the individuals. A univariate split is just a special case of an oblique split, and a constant value is just a special case of a regression model.
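The observation that a univariate split is a special case of an oblique split can be made concrete: an oblique test w·x < t reduces to a univariate test when exactly one weight is non-zero. A sketch with illustrative names:

```python
def oblique_test(weights, threshold, x):
    """Oblique split: instance goes left when the weighted sum w.x < threshold."""
    return sum(w * v for w, v in zip(weights, x)) < threshold

def univariate_test(attr, threshold, x):
    """Univariate split x[attr] < threshold, expressed as an oblique test
    whose weight vector has a single non-zero entry."""
    weights = [0.0] * len(x)
    weights[attr] = 1.0
    return oblique_test(weights, threshold, x)
```

Similarly, a constant leaf prediction is a linear model whose coefficients are all zero, which is why oGMT can in principle represent any of the other four tree types.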
• AUM – can be perfectly predicted by univariate model trees. This dataset is difficult for the uGRT and oGRT systems because they induce only regression trees. For these systems, we can observe a much higher error rate (RMSE) and trees that are 2–3 times larger. It is typical for the regression trees to reduce the tree error by adding many leaves with a small number of instances. In addition, the oGRT inducer applied unnecessary oblique splits in order to minimize the RMSE. The rest of the algorithms had no problem with this dataset and induced trees with four leaves, three univariate splits, and usually perfect regression models in the leaves.
• AOR – can be perfectly predicted by oblique regression trees. The application of the algorithms with univariate tests (uGRT and uGMT) to the dataset with non-axis-parallel decision borders led to their approximation by a very complicated stair-like structure.
• AOM, AMix1, and AMix2 – can be perfectly predicted only by the inducers with the most advanced tree representation (oblique splits and models in the leaves). Therefore, it is not surprising that the uGRT, oGRT, and uGMT algorithms induce overgrown decision trees. It is worth noting that of those three systems, the largest trees are induced by the system that has the most limitations in the representation of the individuals – the uGRT.
The second issue is the large search space of the inducers with an advanced tree representation, which requires extensive calculations to find a good solution. This can be observed especially for the trees
Fig. 11. The sum of the average number of attributes used in the internal node tests for different GMT variants. The defined bars are equal to the optimal numbers of attributes in the internal node tests.
Fig. 12. The sum of the average number of attributes that constitute the leaves' models for different GMT variants. When the induced tree has only regression leaves, no value appears on the chart, as for the AUR or NUR datasets. The defined bars are equal to the optimal numbers of attributes in the leaves' models.
with oblique splits. Theoretically, the oGMT system should be able to find optimal decisions on all datasets, as it induces trees with the most complex representation. However, we can observe that the trees induced by the oGRT and oGMT systems do not always have an optimal structure (even if they are capable of finding it). For the simplest datasets like AUR, the inducers with oblique splits need significantly more time than the uGRT solution (which finds optimal decisions almost instantly). This situation is illustrated in Fig. 13. Although the mGMT system needed additional iterations to settle on the appropriate tree representation, it still outperforms oGRT and oGMT. In Fig. 13, we can see that the largest number of iterations is required by the inducers with oblique splits in the internal nodes. The oGRT and oGMT systems needed significantly more iterations than uGRT but managed to successfully reduce the prediction error calculated on the training set to zero. It can be seen that the oGMT inducer did not find the optimal tree size in all 50 runs. For a few runs, the oGMT algorithm needed over 10,000 iterations, but additional experiments showed that it is capable of finding optimal
Fig. 13. Influence of the tree representation on the performance of the best individual on the AUR training set for the 5 inducers.
trees. In addition, the loop time for the global inducers differs significantly, as different variants of the mutation operators are applied. The average loop times (in seconds) calculated over all iterations on all artificial datasets are shown in Table 3.
All observations made for the armchair datasets are also confirmed for the noisy datasets. The mGMT solution managed to find all defined splits and models despite the noise and the different data distributions. From the dataset visualization included in the Appendix, it can be seen that finding appropriate decision borders is not an easy task. The oGMT usually kept up with mGMT because the decision tree was smaller (the defined tree has two internal nodes and three leaves).
From the performed experiments, we can observe that every inducer with a strict tree representation has its pros and cons. The systems for univariate regression trees are very fast and generate simple tests in the internal nodes; however, the tree error and size are usually large. Oblique regression trees are slightly smaller and more accurate, but the search for the splitting rules is much more computationally demanding, and the simplicity of the output tree is lost. The results generally confirm what is observed for univariate and oblique classification trees. Currently, the most popular trees for regression problems are univariate model trees. From the results, we see that they offer a good trade-off between tree complexity and prediction performance; induced trees are accurate and relatively small. Theoretically, if the computational complexity of the algorithm were not an issue, oblique model trees should be at least as good as all the aforementioned algorithms in terms of prediction power. Unfortunately, the induction time and the complexity of the solution often hinder the practical application of the inducer, especially for large datasets.
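The four representations compared above can be summarized in a single prediction routine. The dictionary-based tree below is a minimal sketch; all field names are illustrative assumptions, not the authors' implementation:

```python
def predict(node, x):
    """Route an instance x down a tree whose nodes may use any of the
    four representations: univariate/oblique internal tests and
    constant/model leaves (names are illustrative)."""
    if node["kind"] == "constant_leaf":            # constant prediction
        return node["value"]
    if node["kind"] == "model_leaf":               # multivariate linear model
        return node["intercept"] + sum(w * v for w, v in zip(node["coef"], x))
    if node["kind"] == "univariate":               # test on a single attribute
        go_left = x[node["attr"]] < node["thr"]
    else:                                          # "oblique": weighted-sum test
        go_left = sum(w * v for w, v in zip(node["weights"], x)) < node["thr"]
    return predict(node["left"] if go_left else node["right"], x)
```

A homogeneous inducer restricts every node to one `kind`, while a mixed tree like mGMT's may combine all four within a single individual.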
If we knew the characteristics of the dataset, we could pre-select the inducer with the most appropriate representation. However, this is often not the case; therefore, it may be better to consider self-adaptable systems.

Table 3
Average single loop times (in seconds) over all iterations on all datasets for the different systems.

Algorithm      uGRT     uGMT     oGRT     oGMT     mGMT
Average time   0.0013   0.0036   0.0017   0.0043   0.0024
± (stdev)      0.0002   0.0004   0.0005   0.0010   0.0003
4.2. mGMT vs. popular tree approaches
In this set of experiments, we compared the proposed mGMT inducer with different popular tree approaches. In order to make a proper comparison with the state of the art and the latest algorithms in the literature, we selected the benchmark datasets also used in [23]. We precisely followed the preprocessing and the experimental procedure of [23] to make the comparison to the results of that paper as accurate as possible. Two popular synthetic datasets and two real-life datasets from the well-known UCI Machine Learning Repository [5] were used:
• Fried – artificial dataset proposed by Friedman [25] containing ten independent continuous attributes uniformly distributed in the interval [0, 1]. The value of the output variable is obtained with the equation:

y = 10 · sin(π · x1 · x2) + 20 · (x3 − 0.5)² + 10 · x4 + 5 · x5 + ε, where ε ∼ N(0, 1);
• 3DSin – artificial dataset containing two continuous predictor attributes uniformly distributed in the interval [−3, 3], with the output defined as

y = 3 · sin(x1) · sin(x2);
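The two synthetic targets can be reproduced directly from the formulas above; the sketch below omits the N(0, 1) noise term of the Fried dataset:

```python
import math
import random

def fried_target(x):
    """Friedman's function for a ten-attribute instance x, attributes
    uniform on [0, 1] (the N(0, 1) noise term is omitted here)."""
    return (10 * math.sin(math.pi * x[0] * x[1])
            + 20 * (x[2] - 0.5) ** 2 + 10 * x[3] + 5 * x[4])

def sin3d_target(x1, x2):
    """3DSin function for x1, x2 uniform on [-3, 3]."""
    return 3 * math.sin(x1) * math.sin(x2)

# Example: one noiseless Fried instance.
x = [random.random() for _ in range(10)]
y = fried_target(x)
```

Note that in the Fried dataset only the first five attributes influence the output; the remaining five are pure noise, which is part of what makes the benchmark challenging.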
We additionally ran experiments using WEKA software [19] on two additional tree inducers and included the results for our mGMT system. The compared algorithms are:
• Hinge – algorithm [23] based on hinging hyperplanes identified by a fuzzy clustering algorithm;
• FRT – fuzzy regression tree;
• FMID – fuzzy model identification;
• CART – state-of-the-art univariate regression tree proposed by Breiman et al. [7];
• REPTree (RT) – popular top-down inducer that builds a univariate regression tree using variance and prunes it using reduced-error pruning (with backfitting);
• M5 – state-of-the-art univariate model tree inducer proposed by Quinlan [46];
• mGMT – proposed global tree inducer with mixed representation.
The performance of the models is measured by the root mean squared error (RMSE), a well-known regression performance estimator. Testing was performed with 10-fold cross-validation, and 50 runs were performed for the algorithms tested by the authors. We have also included information about the algorithms' standard deviations (unfortunately, [23] does not include this information). The results shown in Table 4 indicate that the mGMT solution can successfully compete with popular decision tree inducers.
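The 10-fold cross-validation protocol used in the experiments can be sketched as follows; `fit` and the constant predictor in the usage note are illustrative placeholders, not any of the tested inducers:

```python
import math
import random

def cross_val_rmse(data, fit, n_folds=10, seed=0):
    """Sketch of k-fold cross-validation: `data` is a list of (x, y)
    pairs and `fit(train)` returns a predictor f(x) -> y.  The RMSE is
    computed on each held-out fold and the fold scores are averaged."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    folds = [rows[i::n_folds] for i in range(n_folds)]
    scores = []
    for i in range(n_folds):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        predict = fit(train)
        sq = [(y - predict(x)) ** 2 for x, y in folds[i]]
        scores.append(math.sqrt(sum(sq) / len(sq)))
    return sum(scores) / n_folds
```

For example, a trivial predictor that always outputs 0 on targets equal to 1 yields a cross-validated RMSE of exactly 1.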
As the mean value is not presented in the research [23], we performed Friedman tests (significance level equal to 0.05) using the RMSE error values on two groups:

• mGMT vs. Hinge, FRT, FMID, and CART;
• mGMT vs. uGRT, uGMT, oGRT, oGMT, RT, and M5.
Table 4
Comparison of the RMSE results of different algorithms. Algorithms marked with * were tested in [23] and their results are recalled. Results for the algorithms tested by the authors also include the standard deviation of the RMSE and of the number of leaves in the tree. The smallest RMSE and size results for each dataset are bolded.

Algorithm   Metric   Fried         3DSin         Abalone       Kinman
Hinge*      RMSE     0.92          0.18          4.1           0.16
            Leaves   8             11            8             6
CART*       RMSE     2.12          0.17          2.87          0.23
            Leaves   495.6         323.1         664.8         453.9
FMID*       RMSE     2.41          0.31          2.19          0.20
            Leaves   12            12            12            12
FRT*        RMSE     0.70          0.18          2.19          0.15
            Leaves   15            12            4             20
RT          RMSE     2.25±0.10     0.6±0.01      2.33±0.13     0.19±0.01
            Leaves   445.7±37.6    724.2±30.1    168.8±33.7    720.8±78.1
M5          RMSE     1.81±0.09     0.23±0.01     2.12±0.14     0.16±0.01
            Leaves   52.5±13.5     197.3±11.8    8.59±3.2      109.7±18.0
mGMT        RMSE     0.67±0.01     0.15±0.003    2.13±0.08     0.14±0.001
            Leaves   14.9±2.2      53.6±8.9      2.1±0.7       6.4±1.3
uGRT        RMSE     3.66±0.09     0.53±0.04     2.55±0.03     0.21±0.007
            Leaves   11.5±0.8      40.0±0.54     4.4±0.34      11.4±1.2
uGMT        RMSE     0.66±0.01     0.15±0.003    2.19±0.001    0.16±0.002
            Leaves   16.4±0.43     56.3±1.9      2.1±0.03      8.6±0.6
oGRT        RMSE     3.41±0.05     0.62±0.008    2.50±0.10     0.19±0.01
            Leaves   5.7±0.03      22.5±1.3      3.4±0.05      6.6±0.2
oGMT        RMSE     1.13±0.02     0.15±0.01     2.21±0.05     0.17±0.001
            Leaves   6.6±0.4       44.7±1.9      2.1±0.09      4.4±0.2
Fig. 14. An example of a tree induced by mGMT for the Kinman dataset.
For the first group, the Friedman test showed significant statistical differences between the algorithms (P value = 0.0109, F-statistic = 10.62); however, a Dunn's multiple comparison test did not show any significant differences in rank sum, which may be caused by the small sample size (only four values for five algorithms).
For the second group, the Friedman test also showed significant statistical differences between the algorithms (P value < 0.0001, F-statistic = 194.6). A corresponding Dunn's multiple comparison test showed significant differences in rank sum between mGMT and all algorithms except uGMT. It should also be noted that mGMT managed to induce much smaller trees, often an order of magnitude smaller than the tested counterparts. The relatively higher number of leaves for the mGMT inducer on the 3DSin and Fried datasets can be explained by the high non-linearity of these datasets. As mGMT applies multivariate linear regression functions in the leaves, it requires more splits to fit the characteristics of the non-linear datasets.
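The Friedman test used here ranks the algorithms on each dataset and then tests whether the average ranks differ. A self-contained sketch of the chi-square statistic (following the standard formula, not any specific library):

```python
def friedman_statistic(errors):
    """Friedman chi-square statistic for a table errors[dataset][algorithm]
    (lower error = better rank; ties receive average ranks), following the
    standard formula recommended by Demsar [13]."""
    n, k = len(errors), len(errors[0])          # N datasets, k algorithms
    avg_ranks = [0.0] * k
    for row in errors:
        order = sorted(range(k), key=lambda j: row[j])
        pos = 0
        while pos < k:
            end = pos
            while end + 1 < k and row[order[end + 1]] == row[order[pos]]:
                end += 1
            for m in range(pos, end + 1):        # average rank over a tie group
                avg_ranks[order[m]] += ((pos + end) / 2 + 1) / n
            pos = end + 1
    return 12 * n / (k * (k + 1)) * (
        sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4)
```

The statistic is compared against a chi-square distribution with k − 1 degrees of freedom (or converted to the F-statistic reported above); when it is significant, a post-hoc test such as Dunn's identifies which pairs of algorithms differ.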
The cost of finding possibly new hidden regularities is the tree induction time. It is well known that EAs are slower than greedy solutions, and mGMT is no exception. The efficiency comparison between mGMT and both tested greedy inducers showed that the proposed solution is significantly slower (verified with the Friedman test, P value < 0.0001) than both algorithms, M5 and RT. The mGMT tree induction time was smaller than that of the GMT solution [12] (Table 3) and took, depending on the dataset, from several seconds to a few minutes on a regular PC. However, the process of evolutionary induction is progressive; therefore, intermediate solutions from prematurely aborted runs may also yield high-quality results. In addition, EAs are naturally prone to parallelism; therefore, the efficiency problem can be partially mitigated.
In Fig. 14, we present one of the trees induced by mGMT for the Kinman dataset. For this particular real-life dataset, all induced trees contained oblique and univariate splits and almost always multivariate linear regressions in the leaves. This may suggest that this mixed representation is the most suitable one for this particular dataset and may reveal new relationships and information hidden in the data. The output tree is much smaller and has the smallest prediction error, especially when compared to the results of state-of-the-art solutions like CART and M5. However, it should be noted that in the case of mixed, oblique, or model trees, the size of the tree is not an accurate reflection of its complexity. Trees with a more advanced representation are usually smaller, which is why the M5 algorithm induces much smaller trees than CART. Therefore, even a very small tree induced by mGMT, but with complex oblique splits and models in the leaves, can be less comprehensible than, for example, a larger univariate regression tree. In an extreme scenario, the proposed solution can be as complex as the trees induced by the oGMT system or as simple as the ones induced by the uGRT algorithm. However, mGMT is capable of automatically adjusting the representation of the nodes to fit the analyzed data, which is not possible in the competitive solutions that have only a homogeneous tree representation. Although the trade-off between comprehensibility and prediction performance still exists in mGMT, it can easily be adjusted to the user's preferences via the parameters of the fitness function of the mGMT algorithm.
4.3. Overall prediction performance of mGMT
In the last step of the experiments, we compared the prediction performance of the mGMT inducer with that of other popular systems on multiple datasets. Tests were performed with WEKA software [19] using the collection of benchmark regression datasets provided by Luis Torgo [45]. From this package of 30 datasets (available on the WEKA page), we selected only those with a minimum of 1000 instances, described in Table 5. We decided that datasets with, for example, 43 instances and two variables are not the best for validation. The datasets were processed by WEKA's supervised NominalToBinary filter, which converts nominal attributes into binary numeric attributes, and the unsupervised ReplaceMissingValues filter, which replaces missing values with the attributes' means.
Table 5
Dataset characteristics: name, number of numeric attributes (Num), number of nominal attributes (Nom), and the number of instances.

ID  Name            Num  Nom  Instances    ID  Name       Num  Nom  Instances
1   2dplanes        10   0    40,768       11  elevators  18   0    8752
2   abalone         7    1    4177         12  fried      10   0    40,768
3   ailerons        40   0    13,750       13  house16H   16   0    22,784
4   bank32nh        32   0    8192         14  house8L    8    0    22,784
5   bank8FM         8    0    8192         15  kin8nm     8    0    8192
6   calhousing      8    0    20,640       16  mv         7    3    40,768
7   cpuact          21   0    8192         17  pol        48   0    15,000
8   cpusmall        12   0    8192         18  puma32H    32   0    8192
9   deltaailerons   5    0    7129         19  puma8NH    8    0    8192
10  deltaelevators  6    0    7129