The role of decision tree representation in regression problems – An evolutionary perspective
Marcin Czajkowski∗, Marek Kretowski
Faculty of Computer Science, Bialystok University of Technology, Wiejska 45a, 15-351 Bialystok, Poland
Article history:
Received 26 September 2015; received in revised form 20 June 2016; accepted 2 July 2016; available online 16 July 2016.
Keywords:
Evolutionary algorithms
Data mining
Regression trees
Self-adaptable representation
Abstract
A regression tree is a type of decision tree that can be applied to solve regression problems. One of its characteristics is that it may have at least four different node representations: internal nodes can be associated with univariate or oblique tests, whereas the leaves can be linked with simple constant predictions or multivariate regression models. The objective of this paper is to demonstrate the impact of particular representations on the induced decision trees. As it is difficult, if not impossible, to choose the best representation for a particular problem in advance, the issue is investigated using a new evolutionary algorithm for decision tree induction with a structure that can self-adapt to the currently analyzed data. The proposed solution allows different leaf and internal node representations within a single tree. Experiments performed using artificial and real-life datasets show the importance of tree representation in terms of error minimization and tree size. In addition, the presented solution managed to outperform popular tree inducers with defined homogeneous representations.
© 2016 Elsevier B.V. All rights reserved.
1. Introduction
Data mining [18] can reveal important and insightful information hidden in data. However, appropriate tools and algorithms are required to effectively identify correlations and patterns within the data. Decision trees [24,40] represent one of the main techniques for discriminant analysis and prediction in knowledge discovery. The success of tree-based approaches can be explained by their ease of application, fast operation, and effectiveness. Furthermore, the hierarchical tree structure, in which appropriate tests from consecutive nodes are sequentially applied, closely resembles the human way of decision making. All this makes decision trees easy to understand, even for inexperienced analysts. Despite 50 years of research on decision trees, many problems still remain [30], such as the search for only locally optimal splits in internal nodes, the choice of an appropriate pruning criterion, the efficient analysis of cost-sensitive data, and multi-objective optimization. To help resolve some of these problems, evolutionary computation (EC) has been applied to decision tree induction [2]. The strength of this approach lies in the global search for splits and predictions. It results in higher accuracy and smaller output trees compared to popular greedy decision tree inducers.
∗ Corresponding author.
E-mail address: m.czajkowski@pb.edu.pl (M. Czajkowski).
Finding an appropriate representation of the predictor before actual learning is a difficult task for many data mining algorithms. Often, the algorithm structure must be pre-defined and fixed during its life-cycle, which is a major barrier in developing intelligent artificial systems. This problem is well known [20] in artificial neural networks, where the topology and the number of neurons are unknown; in support vector machines, with their different types of kernels; and in decision trees, where there is a need to select the type of node representation. One solution is to automatically adapt the structure of the algorithm to the analyzed problem during the learning phase, which can be accomplished using the evolutionary approach [27,33]. This approach has also been applied to classification trees [29,26], where a mixed test representation in the internal nodes is possible.
In this paper, we want to investigate the role of regression tree representation and its impact on predictive accuracy and induced tree size, as it has not been sufficiently explored. Using artificially generated datasets, we will reveal the pros and cons of trees with different representation types, focusing mainly on evolutionarily induced trees for regression problems [2]. Differences in the representation of regression trees [30] can occur in two places: in the tests in the internal nodes and in the predictions in the leaves. For real-life problems, it is difficult to say which kind of decision tree (univariate, oblique, regression, model) should be used. It is often almost impossible to choose the best representation in advance. To top it all, for many problems a heterogeneous node representation is required within the same tree. This is why we also study a specialized evolutionary algorithm (EA) called the Mixed Global Model Tree (mGMT). It induces a decision tree that, we believe, self-adapts its structure to the currently analyzed data. The output tree may have different internal node and leaf representations, and for a given dataset it may be as good as or even better than any tree with a strict representation.

http://dx.doi.org/10.1016/j.asoc.2016.07.007
The paper is organized as follows. The next section provides a brief background on regression trees. Section 3 describes the proposed extension for evolutionary inducers with homogeneous representations. All experiments are presented in Section 4, and the last section comprises the conclusion and suggestions for future work.
2. Decision trees
We may find different variants of decision trees in the literature [30]. They can be grouped according to the type of problem they are applied to, the way they are induced, or the type of structure.
In classification trees, a class label is assigned to each leaf. Usually, it is the majority class of all training instances that reach that particular leaf. In this paper, we focus on regression trees, which may be considered variants of decision trees designed to approximate real-valued functions instead of being used for classification tasks. Although regression trees are not as popular as classification trees, they are highly competitive with different machine learning algorithms [35] and are often applied to many real-life problems [16,28].
In the case of the simplest regression tree, each leaf contains a constant value, usually the average value of the target attribute.
A model tree can be seen as an extension of the typical regression tree [46,31]. The constant value in each leaf of the regression tree is replaced in the model tree by a linear (or nonlinear) regression function. To predict the target value, the new tested instance is followed down the tree from the root node to a leaf, using its attribute values to make routing decisions at each internal node. Next, the predicted value for the new instance is evaluated based on the regression model in the leaf. Examples of predicted values of classification, regression, and model trees are given in Fig. 1. The gray-level color of each region represents a different class label (for a classification tree), and the height corresponds to the value of the prediction function (regression and model trees).
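The contrast between the two leaf types described above can be sketched in a few lines. This is a hypothetical mini-implementation for illustration only; the function names `regression_leaf` and `model_leaf` are ours, not from the paper:

```python
def regression_leaf(train_targets):
    """Regression-tree leaf: a constant prediction, the mean of the
    training targets that reached this leaf."""
    mean = sum(train_targets) / len(train_targets)
    return lambda x: mean

def model_leaf(beta0, betas):
    """Model-tree leaf: a linear model beta0 + sum_i beta_i * x_i
    of the instance's attribute values."""
    return lambda x: beta0 + sum(b * xi for b, xi in zip(betas, x))

r = regression_leaf([2.0, 4.0, 6.0])   # predicts 4.0 for any input
m = model_leaf(1.0, [0.5, -1.0])       # prediction varies with the input
```

The regression leaf ignores the instance entirely, while the model leaf produces a different value for each instance, which is what allows model trees to be smaller for smoothly varying targets.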
Most decision trees partition the feature space with axis-parallel decision borders [44]. This type of tree is called univariate because each split in a non-terminal node involves a single feature. For continuous-valued features, inequality tests with binary outcomes are usually applied, and for nominal features, mutually exclusive groups of feature values are associated with the outcomes. When more than one feature is taken into account to build a test in an internal node, we deal with multivariate decision trees [8]. The most common form of such a test is an oblique split, which is based on a linear combination of features. A decision tree that applies only oblique tests is often called oblique or linear, whereas heterogeneous trees with univariate, linear, and other multivariate (e.g., instance-based) tests are called mixed trees [29]. Fig. 2 shows an example of univariate and oblique decision trees. We can observe that if decision borders are not axis-parallel, then using only univariate tests may lead to an overcomplicated classifier. This kind of situation is known as the 'staircase effect' [8] and can be avoided by applying more sophisticated multivariate tests. While oblique trees are generally smaller, their tests are usually more difficult to interpret. It should be emphasized that the computational complexity of multivariate tree induction is significantly higher than that of univariate tree induction [3].
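The two test types differ only in how many features they inspect, which a minimal sketch makes concrete (hypothetical helper names, not the authors' code). A diagonal border such as x1 ≤ x2 needs a single oblique test with w = (1, −1) and threshold 0, whereas univariate tests can only approximate it with a staircase:

```python
def univariate_test(x, feature, threshold):
    """Axis-parallel split: inspects a single feature of instance x."""
    return x[feature] <= threshold

def oblique_test(x, w, theta):
    """Oblique split: tests the linear combination <w, x> <= theta."""
    return sum(wi * xi for wi, xi in zip(w, x)) <= theta

# The diagonal border x1 <= x2 as one oblique test: w = (1, -1), theta = 0.
on_lower_side = oblique_test((2.0, 3.0), (1.0, -1.0), 0.0)   # 2 - 3 <= 0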
The role of tree representation has so far been discussed mainly in terms of classification problems. Studies [25,8] show that univariate inducers return larger trees than multivariate ones, and they are often less accurate. However, multivariate trees are difficult to understand and interpret, and their induction is significantly slower. Therefore, making a general conclusion is risky, as the most important factors are the characteristics of the particular dataset [25]. To the best of our knowledge, there is no detailed report that refers to the role of representation in regression trees. It could be expected that univariate and multivariate regression trees should behave similarly to the classification ones. However, there is still an open question about the influence of the leaves' representation on the tree performance. This paper focuses on evolutionarily induced regression trees; therefore, to go further, we must briefly describe the process of creating a decision tree from the training set. The two most popular concepts for decision tree induction are the top-down and global approaches. The first is based on a greedy procedure known as recursive partitioning [39]. In the top-down approach, the induction algorithm starts from the root node, where the locally optimal split is searched for according to the given optimality measure. Next, the training instances are redirected to the newly created nodes, and this process is repeated for each node until a stopping condition is met. Additionally, post-pruning [15] is usually applied after the induction to avoid the problem of over-fitting the training data.
Fig. 1. An illustration of predicted values of the classification, regression, and model trees.

Fig. 2. An example of oblique and univariate decision trees.

One of the most popular representatives of top-down induced univariate regression trees is the solution proposed by Breiman et al. called Classification And Regression Tree (CART) [7]. The algorithm searches for a locally optimal split that minimizes the sum of squared residuals and builds a piecewise constant model, with each terminal node fitted with the training sample mean. Other solutions have managed to improve the prediction accuracy by replacing single values in the leaves with more advanced models. The M5 system [46] induces a tree that contains multiple linear models in the leaves. A solution called Stepwise Model Tree Induction (SMOTI) [31] can be viewed as an oblique model tree, as the regression models are placed not only in the leaves but also in the upper parts of the tree. All aforementioned methods induce trees with the greedy strategy, which is fast and generally efficient but often produces only locally optimal solutions.
The global approach to decision tree induction limits the negative effects of locally optimal decisions. It tries to simultaneously search for the tree structure, the tests in the internal nodes, and the models in the leaves. This process is obviously much more computationally complex but can reveal hidden regularities that are often undetectable by greedy methods. Global induction is mainly represented by systems based on an evolutionary approach [2,4]; however, there are solutions that apply, for example, ant colony optimization [36,6].
In the literature, there are relatively fewer evolutionary approaches for regression and model trees than for classification trees. Popular representatives of EA-based univariate regression trees are the TARGET solution [17], which evolves a CART-like regression tree with basic genetic operators, and the uGRT algorithm [11], which introduces specialized variants of mutation and crossover. A strongly typed GP (Genetic Programming) approach called STGP was also proposed [21] for univariate regression tree induction. There are also globally induced systems that evolve univariate model trees, such as the E-Motion tree [1], which implements standard 1-point crossover and two different mutation strategies, and the GMT system [12], which incorporates knowledge about the induction problem for the global model tree into the evolutionary search. There are also preliminary studies on oblique trees called oGMT [10]. In the literature, we may also find the GP approach that evolves model trees with nonlinear regression models in the leaves, called GPMCC [38]. It is composed of GP to evolve the structure of the model trees and GA to evolve polynomial expressions (GASOPE) [37].

Fig. 3. The mGMT process diagram.
3. Mixed Global Model Tree
This paper focuses on the representation of globally induced regression and model trees and its influence on the output tree. In this section, we propose an extension of the GMT and GRT systems [12], called the Mixed Global Model Tree (mGMT), to better understand the underlying process behind the selection of the representation. With evolutionary tree induction, we are able not only to search for an optimal tree structure, tests in internal nodes, or models in the leaves, but also to self-adapt the tree representation. The general structure of the algorithm follows a typical EA framework [32] with an unstructured population and generational selection. It can be treated as a unified framework for both univariate and oblique tests in the internal nodes and both regression and model leaves. The mGMT does not require setting the tree representation in advance, because the EA validates different variants of the representations not only at the tree level but also at the node level and may induce a heterogeneous tree that we call a mixed tree. A description of the proposed approach is given, especially with respect to issues that are specific to mixed trees.
The process diagram of the mGMT algorithm is illustrated in Fig. 3. The proposed solution evolves the regression and model trees in their actual forms. The candidate solutions that constitute the population are initialized with a semi-random greedy strategy and are evaluated using a multi-objective weight-formula fitness function. If the convergence criterion is not satisfied, linear ranking selection is performed together with the elitist strategy. Next, genetic operators are applied, including different variants of specialized mutations and crossovers. After the evolution process is finished, the best individual found by the EA is smoothed. Each element of the mGMT solution is discussed in detail in the following sections.
3.1. Representation
A mixed regression tree is a complex structure in which the number and the type of nodes and even the number of test outcomes are not known in advance for a given learning set. Therefore, the candidate solutions that constitute the population are not encoded and are represented in their actual form (see Fig. 4).

Fig. 4. An example representation of the mGMT individual.
There are three possible test types in the internal nodes: two univariate and one multivariate. In the case of univariate tests, the test representation concerns only one attribute and depends on the considered attribute type. For continuous-valued features, typical inequality tests with two outcomes are used. For nominal attributes, at least one attribute value is associated with each branch starting in the node, which means that an internal disjunction is implemented. Only binary or continuous-valued attributes are used to construct the oblique split. The feature space can be divided into two regions by a hyperplane:
H(w, θ) = {x : ⟨w, x⟩ = θ},  (1)

where x is a vector of feature values (an object), w = [w1, . . ., wP] is a weight vector, θ is a threshold, ⟨w, x⟩ represents an inner product, and P is the number of independent variables. Each hyperplane is represented by a fixed-size (P + 1)-dimensional table of real numbers corresponding to the weight vector w and the threshold θ.
In each leaf of the mGMT system, a multiple linear model can be constructed using the standard regression technique. It is calculated only for the objects associated with that node. A dependent variable y is explained by a linear combination of multiple independent variables x1, x2, . . ., xP:
y = β0 + β1 · x1 + β2 · x2 + . . . + βP · xP,  (2)

where β0, . . ., βP are fixed coefficients that minimize the sum of the squared residuals of the model. If all βi (0 < i ≤ P) are equal to 0, the leaf node is a regression node with a constant prediction equal to β0. If only one βi ≠ 0, we deal with simple linear regression; otherwise, the leaf contains a multivariate linear regression model.
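For P = 1, Eq. (2) reduces to simple linear regression, which has a closed-form least-squares solution. A minimal pure-Python sketch (our own naming, not the system's code) shows how a leaf model collapses to the constant case when the predictor carries no variance:

```python
def fit_simple_linear(xs, ys):
    """Least-squares fit of y = b0 + b1*x (Eq. (2) with P = 1).
    If the predictor has zero variance, falls back to b1 = 0,
    i.e. a constant (regression-leaf) prediction b0 = mean(ys)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return my, 0.0
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b1 * mx, b1
```

For the points (0, 1), (1, 3), (2, 5) the fit recovers y = 1 + 2x exactly, since the data is perfectly linear.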
3.2. Initialization
Each initial individual in the population is created with the classical top-down approach that resembles the M5 solution [46]. The initial population of mGMT is heterogeneous and is composed of five types of standard regression trees with different representations (four homogeneous and one heterogeneous): a univariate regression tree; an oblique regression tree; a univariate model tree; an oblique model tree; and a mixed tree that contains different kinds of tests in the internal nodes (univariate and oblique) and different types of leaves (regression and model). In mixed trees, before each step of recursive partitioning, the type of node is selected randomly and an appropriate test or model is generated. The importance of such a heterogeneous initial population lies in its diversity.

Fig. 5. Hyperplane initialization based on a randomly chosen 'long dipole' (left) and an example illustrating how the oblique test is created (right).
The recursive partitioning is finished when the dependent value is predicted for all training objects in the node or the number of instances in the node is small (default: five instances). Each initial individual is created based on a semi-random subsample of the original training data (default: 10% of the data) to keep the balance between exploration and exploitation. To ensure that the subsample contains objects with various values of the predicted attribute, the training data is sorted by the predicted value and split into a fixed number of equal-size folds (default: 10). From these folds, an equal number of objects is randomly chosen and placed into the subsample. Tests in non-terminal nodes are calculated from a random subset of attributes (default: 50%).
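The subsampling step above (sort by target, split into folds, draw equally from each) can be sketched as follows. This is a simplified illustration under our own assumptions: the function name, the seeded `random.Random`, and the exact per-fold draw count are ours; the defaults (10% sample, 10 folds) come from the text:

```python
import random

def stratified_subsample(data, target, frac=0.1, folds=10, rng=None):
    """Draw a target-stratified subsample: sort by the predicted value,
    split into equal-size folds, and pick the same number of objects
    at random from each fold."""
    rng = rng or random.Random(0)
    ordered = sorted(data, key=target)
    fold_size = len(ordered) // folds
    per_fold = max(1, int(len(ordered) * frac) // folds)
    sample = []
    for f in range(folds):
        fold = ordered[f * fold_size:(f + 1) * fold_size]
        sample.extend(rng.sample(fold, min(per_fold, len(fold))))
    return sample
```

Because one object is drawn per fold, the subsample is guaranteed to span the full range of the predicted attribute, which is the stated purpose of the mechanism.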
In the case of the univariate internal nodes, one of three memetic search strategies [12] that involve employing locally optimized tests is chosen:
• Least Squares (LS): the test in the internal node is chosen according to the node impurity measured by the sum of the squared residuals.
• Least Absolute Deviation (LAD): the test reduces the sum of the absolute deviations. It is more robust and has greater resistance to outlying values than LS.
• Dipolar: the test is constructed according to the 'long dipole' [12] strategy. At first, an instance that will constitute the dipole is randomly selected from the set of instances in the current node. The rest of the feature vectors are sorted in decreasing order according to the difference between their dependent-variable values and that of the selected instance. The second instance that constitutes the dipole should have a much different value of the dependent variable. To find it, we applied a mechanism similar to ranking linear selection [32]. Finally, the test that splits the dipole is constructed based on a randomly selected attribute, where the boundary threshold is defined as the midpoint between the pair that constitutes the dipole.
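The dipolar strategy above can be sketched in a few lines. Note one deliberate simplification: instead of the rank-based draw of the second dipole end described in the text, this sketch simply takes the instance whose target value differs most; names and structure are ours:

```python
import random

def dipolar_test(instances, targets, rng=None):
    """Sketch of the 'long dipole' univariate test: pick a random instance,
    pair it with the instance whose target differs most (simplification of
    the rank-based draw), then split a random attribute at the midpoint
    between the pair."""
    rng = rng or random.Random(0)
    i = rng.randrange(len(instances))
    j = max(range(len(instances)), key=lambda k: abs(targets[k] - targets[i]))
    attr = rng.randrange(len(instances[0]))
    threshold = (instances[i][attr] + instances[j][attr]) / 2.0
    return attr, threshold
```

With two one-attribute instances at 0.0 and 10.0, the returned test splits at 5.0 regardless of which end of the dipole is drawn first.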
The search strategy used to find splits in the internal nodes is different for the oblique tests. An effective test in a non-terminal node is searched for using only the dipolar strategy. Fig. 5 (left) illustrates the hyperplane initialization based on a randomly chosen 'long dipole'. The hyperplane Hij(w, θ) splits the dipole (xi, xj) in such a way that the two feature vectors xi and xj are situated on the opposite sides of the dividing hyperplane:
(⟨w, xi⟩ − θ) · (⟨w, xj⟩ − θ) < 0.  (3)

The hyperplane parameters are as follows: w = xi − xj and θ = δ · ⟨w, xi⟩ + (1 − δ) · ⟨w, xj⟩, where δ ∈ (0, 1) is a randomly drawn coefficient that determines the distance from the opposite ends of the dipole. Hij(w, θ) is perpendicular to the segment connecting the dipole ends.
To provide a numeric example illustrating how an oblique test is created, let us imagine the two-dimensional space illustrated in Fig. 5 (right). After the selection of two randomly chosen dipole ends with Cartesian coordinates A(1, 1) and B(5, 3), and the coefficient δ = 0.5, the splitting hyperplane H parameters are: w = [5 − 1, 3 − 1] = [4, 2] and θ = 0.5 · ⟨w, B⟩ + 0.5 · ⟨w, A⟩ = 0.5 · 26 + 0.5 · 6 = 16. Therefore, the hyperplane HAB is a line described as y = −2x + 8. To perform a split, we simply check on which side of the hyperplane H all instances from the internal node are positioned. Let us consider point C(1.5, 2.5). By applying it to the hyperplane equation (1.5 · 4 + 2.5 · 2), we see that the score 11 is smaller than the value of θ. Using a different point, for example D(3.5, 4.5), would result in the value 23, which means that point D lies on the opposite side of the hyperplane to point C. For this particular example, the parameter δ equals 0.5; therefore, the hyperplane w intersects the midpoint between the dipole ends A and B. However, if we change the parameter to δ = 0.1, the hyperplane, denoted as H′AB, shifts towards point A. We can observe that for this hyperplane H′ points C and D lie on the same side, and thus both instances would be directed after the split to the same sub-node.
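The arithmetic of the worked example can be checked directly (a few lines of our own, following the quantities defined above with dipole (xi, xj) = (B, A)):

```python
def inner(w, x):
    """Inner product <w, x>."""
    return sum(wi * xi for wi, xi in zip(w, x))

A, B = (1.0, 1.0), (5.0, 3.0)
w = (B[0] - A[0], B[1] - A[1])                      # w = (4, 2)
delta = 0.5
theta = delta * inner(w, B) + (1 - delta) * inner(w, A)   # 0.5*26 + 0.5*6 = 16

side_C = inner(w, (1.5, 2.5))   # 11, below theta
side_D = inner(w, (3.5, 4.5))   # 23, above theta
```

The dipole condition of Eq. (3) holds for A and B, and C and D indeed fall on opposite sides of the δ = 0.5 hyperplane; re-running with delta = 0.1 lowers theta to 8, leaving both C and D on the same side, as stated in the text.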
3.3. Goodness of fit
The evolutionary search process is very sensitive to the proper definition of the fitness function. In the context of regression trees, a direct minimization of the prediction error measured on the learning set usually leads to the over-fitting problem. In typical top-down induction of decision trees [39], this problem is partially mitigated by defining a stopping condition and by applying post-pruning [15]. In the case of the evolutionary approach, a multi-objective function is required to minimize the prediction error and the tree complexity at the same time.
In our approach, the Bayesian information criterion (BIC) [41] is used as the fitness function. It was shown that this criterion works well with regression and model trees [17,12] and outperforms other popular approaches. BIC is given by:
Fit_BIC(T) = −2 · ln(L(T)) + ln(n) · k(T),  (4)

where L(T) is the maximum of the likelihood function of the tree T, n is the number of observations in the data, and k(T) is the number of model parameters in the tree. The log-likelihood ln(L(T)) is typical for regression models and can be expressed as:

ln(L(T)) = −0.5 · n · [ln(2π) + ln(SSe(T)/n) + 1],  (5)

where SSe(T) is the sum of squared residuals of the tree T. The term k(T) can also be viewed as a penalty for over-parametrization.
The proposed mixed-tree representation requires defining a new penalty for tree over-parametrization. It is rather obvious that, in internal nodes, an oblique split based on a few features is more complex than a univariate test. The same applies to the different leaf representations. As a consequence, the tree complexity k(T) should reflect not only the tree size but also the complexity of the tests in internal nodes and the models in the leaves. However, it is not easy to arbitrarily set the importance of the different measures, because it often depends on the dataset being analyzed. In such a situation, the tree complexity k(T) is defined as:

k(T) = α1 · Q(T) + α2 · O(T) + α3 · W(T),  (6)

where Q(T) is the number of nodes in the model tree T; O(T) is equal to the sum of the numbers of non-zero weights in the hyperplanes in the internal nodes; and W(T) is the sum of the numbers of attributes in the linear models in the leaves. Default values of the parameters are α1 = 2.0, α2 = 1.0, and α3 = 1.0; however, further research to determine their values is needed. If the i-th internal node Ti is univariate, the value of O(Ti) equals 1. If the j-th leaf contains a constant value, then the parameter W(Tj) equals zero, because there are no attributes in the linear model. Otherwise, the values of O(Ti) and W(Tj) equal the number of attributes used to build the test in internal node i or the model in leaf j.
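Putting Eqs. (4)-(6) together, the fitness of a tree can be computed from its residuals and its complexity counts. A direct sketch (function name and argument packaging are ours; the formulas and default α values are from the text):

```python
import math

def fitness_bic(sse, n, num_nodes, oblique_weights, leaf_attrs,
                a1=2.0, a2=1.0, a3=1.0):
    """BIC fitness of a tree: Eq. (4) with the log-likelihood of Eq. (5)
    and the mixed-tree complexity penalty k(T) of Eq. (6).
    sse             -- sum of squared residuals SSe(T)
    n               -- number of observations
    num_nodes       -- Q(T), nodes in the tree
    oblique_weights -- O(T), non-zero hyperplane weights (1 per univariate test)
    leaf_attrs      -- W(T), attributes in the leaf linear models
    """
    log_lik = -0.5 * n * (math.log(2 * math.pi) + math.log(sse / n) + 1)
    k = a1 * num_nodes + a2 * oblique_weights + a3 * leaf_attrs
    return -2 * log_lik + math.log(n) * k
```

Lower values are better: shrinking SSe(T) lowers the first term, while every extra node, hyperplane weight, or leaf attribute raises the ln(n) · k(T) penalty.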
The flexibility of the fitness function allows its simple configuration based on additional knowledge or user preferences. For example, if users know the basic relationships in the data or want to limit the tree representations to the desired ones, the fitness function can assign a high value to α2 or α3 or both.
3.4. Genetic operators
To maintain genetic diversity, the mGMT algorithm applies two specialized genetic operators corresponding to classical mutation and crossover. In globally induced trees with strict representations, there are several variants of the operators [11,12]; however, their availability mainly depends on the representation type. Both operators are applied with a given probability and influence the tree structure, the tests in non-terminal nodes, and optionally the models in the leaves. After any successful mutation or crossover, it is usually necessary to relocate learning vectors between the parts of the tree rooted in the altered node. This can cause pruning of certain parts of the tree that do not contain any learning vectors. In addition, the corresponding models in the affected individual leaves are recalculated. For performance reasons, the coefficients in the existing linear models are recalculated to fit a randomly selected sample of the actual data (no more than 50 instances) in the corresponding leaves.
Each crossover begins with randomly selecting two individuals from the population that will be affected. Next, the crossover points in both individuals are determined. We have adapted all variants proposed in the univariate tree inducer [12] to work with the mixed representation, as visualized in Fig. 6:
(a) exchange subtrees: exchanges subtrees starting in randomly selected nodes;
(b) exchange branches: exchanges branches that start from selected nodes in random order;
(c) exchange tests: recombines the tests (univariate nominal, univariate continuous-valued, and oblique) associated with randomly selected internal nodes;
(d) with best: crossover with the best individual;
(e) asymmetric: duplicates subtrees with small mean absolute errors and replaces nodes with high errors.
Selected nodes for the recombination must have the same number of outputs; however, they may have different representations. This way, crossovers shift not only the tree structure but also the nodes' representations. In the variants (d) with best and (e) asymmetric, an additional mechanism is applied to decide which node will be affected. The algorithm ranks all tree nodes in both individuals according to their absolute error divided by the number of instances in the node. The probability of selecting nodes is proportional to the rank in a linear way. The nodes with a small average error per instance are more likely to be donors, whereas the weak nodes (with a high average error per instance) are more likely to be replaced by the donors from the second individual (and have a higher probability of becoming receivers).
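The rank-based donor selection described above can be sketched as follows. The exact normalization 2(n − rank)/(n(n + 1)) is our assumption of a standard linear-ranking scheme, not a formula taken from the paper:

```python
def linear_rank_probabilities(errors):
    """Linear ranking over per-instance node errors: the node with the
    smallest error gets the highest donor probability, decreasing
    linearly with rank. Probabilities sum to 1."""
    n = len(errors)
    order = sorted(range(n), key=lambda i: errors[i])
    probs = [0.0] * n
    for rank, i in enumerate(order):           # rank 0 = best (lowest error)
        probs[i] = 2.0 * (n - rank) / (n * (n + 1))
    return probs
```

Reversing the ranking (largest error first) would give the receiver probabilities, so the same routine covers both roles in variants (d) and (e).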
The mutation of an individual starts with the selection of a node type (equal probability of selecting a leaf or an internal node). Next, a ranked list of nodes of the selected type for this individual is created. Depending on the type of node, the ranking takes into account the location of internal nodes (nodes in the lower parts of the tree are mutated with higher probability) and the prediction error of the node (nodes with a higher error per instance are more likely to be mutated). Finally, a mechanism analogous to ranking linear selection [32] is applied to decide which node in the individual will be affected. Depending on the node's representation, different variants of the operators are available in internal nodes:
• prune: changes an internal node to a leaf (acts like a pruning procedure);
• parent with child (branches): replaces a parent node with a randomly selected child node (internal pruning);
• parent with child (tests): exchanges tests between a parent and a randomly selected child node;
• new dipolar test: the test in the affected node is reinitialized by a new one selected using the dipolar strategy;
• new memetic test: the test in the node is reinitialized by one of the optimality strategies proposed in Section 3.2;
• modify test: shifts the hyperplane or sets random weights (oblique test); shifts the threshold (univariate test on a continuous attribute) or re-groups nominal attribute values by adding/merging branches or moving values between them;
• recalculate models: recursively recalculates the linear models using all the instances in the corresponding leaves;
and in the leaves:
• dipolar expand: transforms a leaf into an internal node with a new dipolar test (of random type);
• memetic expand: transforms a leaf into an internal node with a new test selected by one of the optimality strategies;
• change model: extends/simplifies/changes the linear model in the leaf by adding/removing/replacing a randomly chosen attribute or removing the least significant one.
For a more detailed description of the mutation variants, please refer to [12].
In addition, we propose a new mechanism called Switch that assures the diversity of node representations within the population. It is embedded in the specified variants of the mutation (prune, expand, and new test) that require finding new tests in the internal nodes or models in the leaves. The Switch mechanism, with an assigned probability, changes the initial representation of the selected nodes:
• for the test in an internal node, when calculating a new test with the same number of outputs:
– with the change from univariate to oblique (internal nodes), the newly calculated hyperplane involves the attribute from the univariate test;
– with the change from oblique to univariate (internal nodes), the new univariate test is based on a randomly selected attribute from the oblique test;
• for newly created nodes that would otherwise inherit their representation from the initial representation:
– leaves flip representation from a regression constant value to a linear regression model (or vice versa) when pruning internal nodes;
– internal nodes flip representation from an oblique test to a univariate one (or vice versa) when expanding leaves.

Fig. 6. Visualization of crossovers, from top left to bottom right: (a) exchange subtrees, (b) exchange branches, (c) exchange tests, (d) with best, and (e) asymmetric.
In the rest of the mutation variants, the Switch mechanism is not applied. Preserving the representation in, for example, the modify test or change model variants allows exploring the neighborhood of the current solution rather than starting the search from a new place.
3.5. Selection, termination condition, and smoothing
Ranking linear selection is applied as the selection mechanism. In each generation, the single individual with the highest value of the fitness function in the current population is copied to the next one (elitist strategy). Evolution terminates when the fitness of the best individual in the population has not improved during a fixed number of generations (default: 1000). In the case of slow convergence, a maximum number of generations is also specified (default value: 10,000) to limit the computation time.
The mGMT system uses a form of smoothing that was initially introduced in the M5 algorithm [46] for the univariate model tree. As in the basic GMT solution [12], the smoothing is applied only to the best individual returned by the EA when the evolutionary induction is finished. The role of the smoothing is to reduce sharp discontinuities that occur between adjacent linear models in the leaves. For every internal node of the tree, the smoothing algorithm generates an additional linear model that is constituted from the features that occur along the path from the leaf to the node. This way, each tested instance is predicted not only by a single model at the proper leaf but also by the different linear models generated for each of the internal nodes up to the root node. Due to the oblique splits that may appear in a tree induced by the mGMT system, we have updated the smoothing algorithm to use all attributes that constitute the tests in the internal nodes.
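The path-wise blending of leaf and internal-node predictions can be sketched with the smoothing rule used in the M5 literature, p' = (n·p + k·q)/(n + k), where p is the prediction passed up from below, q the current node's own model prediction, n the number of training instances below that node, and k a smoothing constant (15 in M5). This is an illustration of the general mechanism, not the authors' exact update:

```python
def smooth(leaf_prediction, path_models, k=15.0):
    """M5-style smoothing along the path from a leaf to the root.
    path_models is a list of (q, n) pairs ordered leaf -> root:
    q = the internal node's own model prediction for the instance,
    n = number of training instances below that node."""
    p = leaf_prediction
    for q, n in path_models:
        p = (n * p + k * q) / (n + k)
    return p
```

When the leaf and its ancestors agree, smoothing is a no-op; when an ancestor's model disagrees, the leaf prediction is pulled towards it, more strongly for nodes covering few training instances.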
4. Experimental validation
To verify the role of tree representations, we have performed experiments on both artificial and real-life datasets. In the first section below, the impact of the tree representation is assessed using four algorithms with different homogeneous representations and the proposed mGMT inducer. Next, the mGMT solution is compared with the results from paper [23] that cover experiments with popular tree inducers on publicly available datasets. Finally, the prediction performance of the proposed solution is tested on a larger group of publicly available datasets.
In all experiments reported in this section, a default set of parameters for all algorithms is used on all tested datasets. The results presented in the paper correspond to averages of 50 runs.
4.1. Role of the tree representation
In this section, five types of tree representations are analyzed:
• univariate Global Regression Tree (denoted as uGRT), which has axis-parallel decision borders and simple constant predictions in the leaves;
• univariate Global Model Tree (uGMT), which has axis-parallel decision borders and multivariate linear regression models in the leaves;
• oblique Global Regression Tree (oGRT), which constructs oblique splits on binary or continuous-valued attributes in the internal nodes;
• oblique Global Model Tree (oGMT) – the most complex tree representation (oblique splits and multivariate linear regression models);
• mixed Global Model Tree (mGMT), which self-adapts the tree representation to the currently analyzed data.
The first four algorithms are based on the existing solutions [10–12], and the proposed mGMT algorithm can be treated as their extension and unification.
The impact of representation on the tree performance is tested on two sets of artificially generated datasets:
• armchair – variants of the dataset proposed in [11] that require at least four leaves and three splits;
• noisy – datasets with various data distributions and additional noise.
Table 1
Default parameters of uGRT, uGMT, oGRT, oGMT, and mGMT.

Parameter                                          | Value
Population size                                    | 50 individuals
Crossover rate                                     | 20% (assigned to the tree)
Mutation rate                                      | 80% (assigned to the tree)
Elitism rate                                       | 2% of the population (1 individual)
Maximum number of generations without improvement  | 1000
Maximum total number of generations                | 10,000
All artificial datasets have analytically defined decision borders that fit particular tree representations: univariate regression (UR), univariate model (UM), oblique regression (OR), oblique model (OM), and mixed (MIX). Each set contains 1000 instances, where 33% of the instances constitute the training set and the rest constitute the testing set. A visualization and description of the artificial datasets are included in the Appendix.
4.1.1. Parameter tuning
Parameter tuning for EAs is a difficult task. Fortunately, all important EA parameters (e.g., population size, the probabilities of mutation and crossover, etc.) and the decision tree parameters (maximum size, minimum number of objects to make a split) were experimentally validated and tuned in previous papers for trees with homogeneous representations [12]. Those general settings should also work well with the mixed regression trees; therefore, they can be treated as defaults. The main parameters for all algorithms are given in Table 1, and the probabilities of selecting mutation operator variants are shown in Table 2 (the probability of selecting each crossover variant is equal to 20%). This way, only the role of the Switch mechanism, which is embedded in different variants of the mutation operators and directly switches the node representation (for example, from univariate to oblique in an internal node and from a constant prediction to a multivariate linear regression model in a leaf), needs to be investigated.
Parameter tuning was performed on the armchair dataset (version AMix1) according to the guidelines proposed in [14]. Four different Switch mechanism values that correspond to the probability of a node representation change were tested: 0.0, 0.1, 0.25, and 0.5. The impact of this setting on the proposed mGMT solution and on the rest of the tree inducers with a homogeneous initial population was checked. For example, when the uGRT algorithm is evaluated and the Switch mechanism is enabled, the representation of mutated nodes can change with the assigned probability. This way, the algorithm can have a mixed representation and is able to have oblique splits or multivariate regression models in the leaves. Figs. 7 and 8 show the tree error (RMSE) of the best
Table 2
Probability (%) of selecting a single variant of the mutation operator in uGRT, uGMT, oGRT, oGMT, and mGMT.

Mutation operator             uGRT & oGRT   uGMT, oGMT & mGMT
prune                         30            20
parent with son (branches)    5             5
parent with son (tests)       2.5           2.5
new dipolar test              10            10
new memetic test              2.5           2.5
modify test                   15            15
recalculate models            2.5           2.5
dipolar expand                30            20
memetic expand                2.5           2.5
change model                  0             20
Fig. 7. Impact of the Switch mechanism on the best individual for the uGRT, uGMT, oGRT, and oGMT inducers on the armchair AMix1 dataset.
individual during the learning phase performed on the training set for all five algorithms: uGRT, uGMT, oGRT, oGMT, and mGMT.
One can observe that the impact of the Switch mechanism is especially visible for the algorithms with homogeneous initial populations. In Fig. 7, enabling the Switch is the only way to find optimal solutions for the uGRT, oGRT, and uGMT algorithms. When the Switch is set to 0.5, which equals random representation selection, the inducers have the fastest convergence. In the oGMT algorithm, which is capable of finding the optimal solution on its own, the application of the Switch mechanism shortens the inducer's convergence time. A statistical analysis of the results using the Friedman test and the corresponding Dunn's multiple comparison test (significance level equal to 0.05), as recommended by Demsar [13], showed that there exist significant differences between the Switch parameter settings for all four algorithms with strict representations. The performed experiments showed that the optimal
Fig. 8. Impact of the Switch mechanism on the best individual for the mGMT inducer on the armchair AMix1 dataset.
Switch setting for the inducers with a homogeneous representation is 0.5, which equals a random representation of the newly created node.
The mGMT results visualized in Fig. 8 show that there are no big differences between the algorithms with various Switch settings. This can be explained by the construction of the initial population of the algorithm, which is composed of five types of representations. The individual representations can be successfully combined with the crossover operators. However, we can observe a slight improvement in the algorithm's convergence to the optimal solution when the Switch mechanism is enabled.
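The Switch mechanism described above can be sketched as a small wrapper around the mutation step. The node dictionaries and representation names below are illustrative assumptions, not the authors' implementation:

```python
import random

# Representations a node may take; names are our illustrative assumptions.
INTERNAL_REPS = ["univariate", "oblique"]
LEAF_REPS = ["constant", "model"]

def mutate_node(node, switch_prob, rng=random):
    """Sketch of the Switch mechanism: with probability `switch_prob`
    the mutated node changes its representation, e.g. a univariate test
    becomes oblique, or a constant-prediction leaf becomes a model leaf."""
    if rng.random() < switch_prob:
        reps = LEAF_REPS if node["is_leaf"] else INTERNAL_REPS
        node["rep"] = next(r for r in reps if r != node["rep"])
    # ...one of the regular mutation variants from Table 2 is applied here...
    return node
```

With `switch_prob = 0.5` every newly mutated node gets a random representation, which matches the fastest-converging setting reported for the homogeneous inducers.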
4.1.2. Comparison of representations
To show the impact of tree representation, five inducers were tested on two groups of datasets, armchair and noisy (each with six variants), described in the Appendix. Four metrics were collected and illustrated:
• Root Mean Squared Error (RMSE) calculated on the testing set (Fig. 9);
• average number of leaves in the tree (Fig. 10);
• average number of attributes in the tests in the internal nodes (Fig. 11). Univariate inducers are not shown, as their average number of tests is always equal to their size decreased by 1;
• average number of attributes in the regression models in the leaves (Fig. 12). Regression inducers are not shown, as there are no models in the leaves; therefore, the average number of attributes is always equal to zero.
All four figures should be analyzed at the same time to understand how each global inducer works.
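For reference, the RMSE reported in Fig. 9 can be computed as follows (a minimal sketch):

```python
import math

def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the mean squared residual."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
```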
Artificial datasets were designed to be solved by one of the tested systems, and the abbreviations of the datasets reveal which inducer is most appropriate to use. In general, all inducers with the appropriate individual representation managed to successfully induce the defined tree. However, when the representation does not fit the specifics of the dataset, it is either too simple (univariate split, regression leaf) or too advanced (oblique split, model in the leaf), and the evolutionary inducers with homogeneous representations sometimes have difficulty finding an optimal solution. In contrast to the four global inducers with defined representations (uGRT, oGRT, uGMT, and oGMT), the mGMT system has a flexible representation. The results presented in Figs. 9–12 show that mGMT successfully adapts the tree structure to the specifics of each artificially generated dataset. In the datasets denoted as UR, UM, OR, and OM, the mGMT system managed to keep up with the algorithms whose structure fitted the characteristics of the datasets. As for the Mix dataset variants, mGMT managed to outperform the rest of the tree inducers.
There are at least two reasons why the systems with strict representations of the individuals have difficulty with some variants of the datasets. The first reason is the limitation in the individuals' representation. The non-axis-parallel decision borders can easily be handled by the oGRT or oGMT algorithms, whereas the application of univariate splits may cause the 'staircase effect' [8]. The problem is similar for the regression trees applied to the UM and OM datasets, which require regression models in the leaves. To overcome these restrictions in the representation, the inducers (uGRT, uGMT, oGRT) increase their tree sizes; however, the limitation still exists. The large size of the induced tree influences not only its clarity but may also cause overfitting to the training data and thus a larger prediction error. Let us explain this for the different variants of the armchair dataset described in the Appendix:
Fig. 9. Root Mean Squared Error (RMSE) of the algorithms on 12 artificial datasets described in the Appendix. Tested algorithms: univariate Global Regression Tree (uGRT), oblique Global Regression Tree (oGRT), univariate Global Model Tree (uGMT), oblique Global Model Tree (oGMT), and mixed Global Model Tree (mGMT). For illustrative purposes, the values of the RMSE error for the noisy datasets have been rescaled.
Fig. 10. Average number of leaves in the tree for different GMT variants. The defined bars represent the reference values that are equal to the optimal numbers of leaves for the datasets.
• AUR – can be perfectly predicted by univariate regression trees. All aforementioned inducers are capable of finding decision trees with a small RMSE (Fig. 9), four leaves (Fig. 10), three univariate splits (Fig. 11), and no regression models in the leaves (Fig. 12). Even the oGMT system managed to find the decision borders despite its advanced node representation of the individuals. A univariate split is just a special case of an oblique split, and a constant value is just a special case of a regression model.
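The observation that a univariate split is a special case of an oblique split can be made concrete: an oblique test w·x < t reduces to a univariate test when exactly one weight is non-zero. A sketch with illustrative names:

```python
def oblique_test(weights, threshold, x):
    """Oblique split: instance goes left when the weighted sum w.x < threshold."""
    return sum(w * v for w, v in zip(weights, x)) < threshold

def univariate_test(attr, threshold, x):
    """Univariate split x[attr] < threshold, expressed as an oblique test
    whose weight vector has a single non-zero entry."""
    weights = [0.0] * len(x)
    weights[attr] = 1.0
    return oblique_test(weights, threshold, x)
```

Similarly, a constant leaf prediction is a linear model whose coefficients are all zero, which is why oGMT can in principle represent any of the other four tree types.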
• AUM – can be perfectly predicted by univariate model trees. This dataset is difficult for the uGRT and oGRT systems because they induce only regression trees. For these systems, we can observe a much higher error rate (RMSE) and trees that are 2–3 times larger. It is typical for the regression trees to reduce the tree error by adding many leaves with a small number of instances. In addition, the oGRT inducer applied unnecessary oblique splits in order to minimize the RMSE. The rest of the algorithms had no problem with this dataset and induced trees with four leaves, three univariate splits, and usually perfect regression models in the leaves.
• AOR – can be perfectly predicted by oblique regression trees. The application of the algorithms with univariate tests (uGRT and uGMT) to the dataset with non-axis-parallel decision borders led to their approximation by a very complicated stair-like structure.
• AOM, AMix1, and AMix2 – can be perfectly predicted only by the inducers with the most advanced tree representation (oblique splits and models in the leaves). Therefore, it is not surprising that the uGRT, oGRT, and uGMT algorithms induce overgrown decision trees. It is worth noting that of those three systems, the largest trees are induced by the system that has the most limitations in the representation of the individuals – the uGRT.
The second issue is the large search space of the inducers with an advanced tree representation, which requires extensive calculations to find a good solution. This can be observed especially for the trees
Fig. 11. The sum of the average number of attributes used in the internal node tests for different GMT variants. The defined bars are equal to the optimal numbers of attributes in the internal node tests.
Fig. 12. The sum of the average number of attributes that constitute the leaves' models for different GMT variants. When the induced tree has only regression leaves, no value appears on the chart, as for the AUR or NUR datasets. The defined bars are equal to the optimal numbers of attributes in the leaves' models.
with oblique splits. Theoretically, the oGMT system should be able to find optimal decisions on all datasets, as it induces trees with the most complex representation. However, we can observe that the trees induced by the oGRT and oGMT systems do not always have an optimal structure (even if they are capable of finding it). For the simplest datasets like AUR, the inducers with oblique splits need significantly more time than the uGRT solution (which finds optimal decisions almost instantly). This situation is illustrated in Fig. 13. Although the mGMT system needed additional iterations to settle on the appropriate tree representation, it still outperforms oGRT and oGMT. In Fig. 13, we can see that the largest number of iterations is required by the inducers with oblique splits in the internal nodes. The oGRT and oGMT systems needed significantly more iterations than uGRT but managed to successfully reduce the prediction error calculated on the training set to zero. It can be seen that the oGMT inducer did not find the optimal tree size in all 50 runs. For a few runs, the oGMT algorithm needed over 10,000 iterations, but additional experiments showed that it is capable of finding optimal
Fig. 13. Influence of the tree representation on the performance of the best individual on the AUR training set for the 5 inducers.
trees. In addition, the loop time for the global inducers differs significantly, as different variants of the mutation operators are applied. The average loop times (in seconds) calculated over all iterations on all artificial datasets are shown in Table 3.
All observations made for the armchair datasets are also confirmed for the noisy datasets. The mGMT solution managed to find all defined splits and models despite the noise and the different data distributions. From the dataset visualization included in the Appendix, it can be seen that finding appropriate decision borders is not an easy task. The oGMT usually kept up with mGMT because the decision tree was smaller (the defined tree has two internal nodes and three leaves).
From the performed experiments, we can observe that every inducer with a strict tree representation has its pros and cons. The systems for univariate regression trees are very fast and generate simple tests in the internal nodes; however, the tree error and size are usually large. Oblique regression trees are slightly smaller and more accurate, but the search for the splitting rules is much more computationally demanding, and the simplicity of the output tree is lost. The results generally confirm what is observed for univariate and oblique classification trees. Currently, the most popular trees for regression problems are univariate model trees. From the results, we see that they offer a good trade-off between tree complexity and prediction performance; induced trees are accurate and relatively small. Theoretically, if the computational complexity of the algorithm were not an issue, oblique model trees should be at least as good as all the aforementioned algorithms in terms of prediction power. Unfortunately, the induction time and the complexity of the solution often hinder the practical application of the inducer, especially for large datasets.
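The four representations compared above can be summarized in a single prediction routine. The dictionary-based tree below is a minimal sketch; all field names are illustrative assumptions, not the authors' implementation:

```python
def predict(node, x):
    """Route an instance x down a tree whose nodes may use any of the
    four representations: univariate/oblique internal tests and
    constant/model leaves (names are illustrative)."""
    if node["kind"] == "constant_leaf":            # constant prediction
        return node["value"]
    if node["kind"] == "model_leaf":               # multivariate linear model
        return node["intercept"] + sum(w * v for w, v in zip(node["coef"], x))
    if node["kind"] == "univariate":               # test on a single attribute
        go_left = x[node["attr"]] < node["thr"]
    else:                                          # "oblique": weighted-sum test
        go_left = sum(w * v for w, v in zip(node["weights"], x)) < node["thr"]
    return predict(node["left"] if go_left else node["right"], x)
```

A homogeneous inducer restricts every node to one `kind`, while a mixed tree like mGMT's may combine all four within a single individual.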
If we knew the characteristics of the dataset, we could pre-select the inducer with the most appropriate representation. However, this is often not the case; therefore, it may be better to consider self-adaptable systems.

Table 3
Average single loop times (in seconds) over all iterations on all datasets for the different systems.

Algorithm      uGRT     uGMT     oGRT     oGMT     mGMT
Average time   0.0013   0.0036   0.0017   0.0043   0.0024
± (stdev)      0.0002   0.0004   0.0005   0.0010   0.0003
4.2. mGMT vs. popular tree approaches
In this set of experiments, we compared the proposed mGMT inducer with different popular tree approaches. In order to make a proper comparison with the state of the art and the latest algorithms in the literature, we selected the benchmark datasets also used in [23]. We precisely followed the preprocessing and the experimental procedure of [23] to make the comparison to the results of that paper as accurate as possible. Two popular synthetic datasets and two real-life datasets from the well-known UCI Machine Learning Repository [5] were used:
• Fried – artificial dataset proposed by Friedman [25] containing ten independent continuous attributes uniformly distributed in the interval [0, 1]. The value of the output variable is obtained with the equation:

y = 10 · sin(π · x1 · x2) + 20 · (x3 − 0.5)² + 10 · x4 + 5 · x5 + ε, where ε ∼ N(0, 1);
• 3DSin – artificial dataset containing two continuous predictor attributes uniformly distributed in the interval [−3, 3], with the output defined as

y = 3 · sin(x1) · sin(x2);
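The two synthetic targets can be reproduced directly from the formulas above; the sketch below omits the N(0, 1) noise term of the Fried dataset:

```python
import math
import random

def fried_target(x):
    """Friedman's function for a ten-attribute instance x, attributes
    uniform on [0, 1] (the N(0, 1) noise term is omitted here)."""
    return (10 * math.sin(math.pi * x[0] * x[1])
            + 20 * (x[2] - 0.5) ** 2 + 10 * x[3] + 5 * x[4])

def sin3d_target(x1, x2):
    """3DSin function for x1, x2 uniform on [-3, 3]."""
    return 3 * math.sin(x1) * math.sin(x2)

# Example: one noiseless Fried instance.
x = [random.random() for _ in range(10)]
y = fried_target(x)
```

Note that in the Fried dataset only the first five attributes influence the output; the remaining five are pure noise, which is part of what makes the benchmark challenging.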
We additionally ran experiments using WEKA software [19] on two additional tree inducers and included the results for our mGMT system. The compared algorithms are:
• Hinge – algorithm [23] based on hinging hyperplanes identified by a fuzzy clustering algorithm;
• FRT – fuzzy regression tree;
• FMID – fuzzy model identification;
• CART – state-of-the-art univariate regression tree proposed by Breiman et al. [7];
• REPTree (RT) – popular top-down inducer that builds a univariate regression tree using variance and prunes it using reduced-error pruning (with backfitting);
• M5 – state-of-the-art univariate model tree inducer proposed by Quinlan [46];
• mGMT – proposed global tree inducer with mixed representation.
The performance of the models is measured by the root mean squared error (RMSE), a well-known regression performance estimator. Testing was performed with 10-fold cross-validation, and 50 runs were performed for the algorithms tested by the authors. We have also included information about the algorithms' standard deviations (unfortunately, [23] does not include this information). The results shown in Table 4 indicate that the mGMT solution can successfully compete with popular decision tree inducers.
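The 10-fold cross-validation protocol used in the experiments can be sketched as follows; `fit` and the constant predictor in the usage note are illustrative placeholders, not any of the tested inducers:

```python
import math
import random

def cross_val_rmse(data, fit, n_folds=10, seed=0):
    """Sketch of k-fold cross-validation: `data` is a list of (x, y)
    pairs and `fit(train)` returns a predictor f(x) -> y.  The RMSE is
    computed on each held-out fold and the fold scores are averaged."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    folds = [rows[i::n_folds] for i in range(n_folds)]
    scores = []
    for i in range(n_folds):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        predict = fit(train)
        sq = [(y - predict(x)) ** 2 for x, y in folds[i]]
        scores.append(math.sqrt(sum(sq) / len(sq)))
    return sum(scores) / n_folds
```

For example, a trivial predictor that always outputs 0 on targets equal to 1 yields a cross-validated RMSE of exactly 1.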
As the mean value is not presented in the research [23], we performed Friedman tests (significance level equal to 0.05) using the RMSE error values on two groups:

• mGMT vs. Hinge, FRT, FMID, and CART;
• mGMT vs. uGRT, uGMT, oGRT, oGMT, RT, and M5.
Table 4
Comparison of the RMSE results of different algorithms. Algorithms marked with * were tested in [23] and their results are recalled. Results for the algorithms tested by the authors also include the standard deviation of the RMSE and of the number of leaves in the tree. The smallest RMSE and size results for each dataset are bolded.

Algorithm   Metric   Fried         3DSin         Abalone       Kinman
Hinge*      RMSE     0.92          0.18          4.1           0.16
            Leaves   8             11            8             6
CART*       RMSE     2.12          0.17          2.87          0.23
            Leaves   495.6         323.1         664.8         453.9
FMID*       RMSE     2.41          0.31          2.19          0.20
            Leaves   12            12            12            12
FRT*        RMSE     0.70          0.18          2.19          0.15
            Leaves   15            12            4             20
RT          RMSE     2.25±0.10     0.6±0.01      2.33±0.13     0.19±0.01
            Leaves   445.7±37.6    724.2±30.1    168.8±33.7    720.8±78.1
M5          RMSE     1.81±0.09     0.23±0.01     2.12±0.14     0.16±0.01
            Leaves   52.5±13.5     197.3±11.8    8.59±3.2      109.7±18.0
mGMT        RMSE     0.67±0.01     0.15±0.003    2.13±0.08     0.14±0.001
            Leaves   14.9±2.2      53.6±8.9      2.1±0.7       6.4±1.3
uGRT        RMSE     3.66±0.09     0.53±0.04     2.55±0.03     0.21±0.007
            Leaves   11.5±0.8      40.0±0.54     4.4±0.34      11.4±1.2
uGMT        RMSE     0.66±0.01     0.15±0.003    2.19±0.001    0.16±0.002
            Leaves   16.4±0.43     56.3±1.9      2.1±0.03      8.6±0.6
oGRT        RMSE     3.41±0.05     0.62±0.008    2.50±0.10     0.19±0.01
            Leaves   5.7±0.03      22.5±1.3      3.4±0.05      6.6±0.2
oGMT        RMSE     1.13±0.02     0.15±0.01     2.21±0.05     0.17±0.001
            Leaves   6.6±0.4       44.7±1.9      2.1±0.09      4.4±0.2
Fig. 14. An example of a tree induced by mGMT for the Kinman dataset.
For the first group, the Friedman test showed significant statistical differences between the algorithms (P value = 0.0109, F-statistic = 10.62); however, a Dunn's multiple comparison test did not show any significant differences in rank sum, which may be caused by the small sample size (only four values for five algorithms).
For the second group, the Friedman test also showed significant statistical differences between the algorithms (P value < 0.0001, F-statistic = 194.6). A corresponding Dunn's multiple comparison test showed significant differences in rank sum between mGMT and all algorithms except uGMT. It should also be noted that mGMT managed to induce much smaller trees, often an order of magnitude smaller than the tested counterparts. The relatively higher number of leaves for the mGMT inducer on the 3DSin and Fried datasets can be explained by the high non-linearity of these datasets. As mGMT applies multivariate linear regression functions in the leaves, it requires more splits to fit the characteristics of the non-linear datasets.
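The Friedman test used here ranks the algorithms on each dataset and then tests whether the average ranks differ. A self-contained sketch of the chi-square statistic (following the standard formula, not any specific library):

```python
def friedman_statistic(errors):
    """Friedman chi-square statistic for a table errors[dataset][algorithm]
    (lower error = better rank; ties receive average ranks), following the
    standard formula recommended by Demsar [13]."""
    n, k = len(errors), len(errors[0])          # N datasets, k algorithms
    avg_ranks = [0.0] * k
    for row in errors:
        order = sorted(range(k), key=lambda j: row[j])
        pos = 0
        while pos < k:
            end = pos
            while end + 1 < k and row[order[end + 1]] == row[order[pos]]:
                end += 1
            for m in range(pos, end + 1):        # average rank over a tie group
                avg_ranks[order[m]] += ((pos + end) / 2 + 1) / n
            pos = end + 1
    return 12 * n / (k * (k + 1)) * (
        sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4)
```

The statistic is compared against a chi-square distribution with k − 1 degrees of freedom (or converted to the F-statistic reported above); when it is significant, a post-hoc test such as Dunn's identifies which pairs of algorithms differ.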
The cost of finding possibly new hidden regularities is the tree induction time. It is well known that EAs are slower than greedy solutions, and mGMT is no exception. The efficiency comparison between mGMT and both tested greedy inducers showed that the proposed solution is significantly slower (verified with the Friedman test, P value < 0.0001) than both algorithms, M5 and RT. The mGMT tree induction time was smaller than that of the GMT solution [12] (Table 3) and took, depending on the dataset, from several seconds to a few minutes on a regular PC. However, the process of evolutionary induction is progressive; therefore, intermediate solutions from prematurely aborted runs may also yield high-quality results. In addition, EAs are naturally prone to parallelism; therefore, the efficiency problem can be partially mitigated.
In Fig. 14, we present one of the trees induced by mGMT for the Kinman dataset. For this particular real-life dataset, all induced trees contained oblique and univariate splits and almost always multivariate linear regressions in the leaves. This may suggest that this mixed representation is the most suitable one for this particular dataset and may reveal new relationships and information hidden in the data. The output tree is much smaller and has the smallest prediction error, especially when compared to the results of state-of-the-art solutions like CART and M5. However, it should be noted that in the case of mixed, oblique, or model trees, the size of the tree is not an accurate reflection of its complexity. Trees with a more advanced representation are usually smaller, which is why the M5 algorithm induces much smaller trees than CART. Therefore, even a very small tree induced by mGMT, but with complex oblique splits and models in the leaves, can be less comprehensible than, for example, a larger univariate regression tree. In an extreme scenario, the proposed solution can be as complex as the trees induced by the oGMT system or as simple as the ones induced by the uGRT algorithm. However, mGMT is capable of automatically adjusting the representation of the nodes to fit the analyzed data, which is not possible in the competitive solutions that have only a homogeneous tree representation. Although the trade-off between comprehensibility and prediction performance still exists in mGMT, it can easily be adjusted to the user's preferences via the parameters of the fitness function of the mGMT algorithm.
4.3. Overall prediction performance of mGMT
In the last step of the experiments, we compared the prediction performance of the mGMT inducer with that of other popular systems on multiple datasets. Tests were performed with WEKA software [19] using the collection of benchmark regression datasets provided by Luis Torgo [45]. From this package of 30 datasets (available on the WEKA page), we selected only those with a minimum of 1000 instances, described in Table 5. We decided that datasets with, for example, 43 instances and two variables are not the best for validation. The datasets were processed by WEKA's supervised NominalToBinary filter, which converts nominal attributes into binary numeric attributes, and the unsupervised ReplaceMissingValues filter, which replaces missing values with the attributes' means.
Table 5
Dataset characteristics: name, number of numeric attributes (Num), number of nominal attributes (Nom), and the number of instances.

ID  Name            Num  Nom  Instances    ID  Name       Num  Nom  Instances
1   2dplanes        10   0    40,768       11  elevators  18   0    8752
2   abalone         7    1    4177         12  fried      10   0    40,768
3   ailerons        40   0    13,750       13  house16H   16   0    22,784
4   bank32nh        32   0    8192         14  house8L    8    0    22,784
5   bank8FM         8    0    8192         15  kin8nm     8    0    8192
6   calhousing      8    0    20,640       16  mv         7    3    40,768
7   cpuact          21   0    8192         17  pol        48   0    15,000
8   cpusmall        12   0    8192         18  puma32H    32   0    8192
9   deltaailerons   5    0    7129         19  puma8NH    8    0    8192
10  deltaelevators  6    0    7129