Selection of Prototypes with the EkP System

by

Karol Grudziński
Institute of Physics
Kazimierz Wielki University
Bydgoszcz, Poland
e-mail: grudzinski.k@gmail.com
Abstract: A completely new system for the selection of reference instances, called EkP (Exactly k Prototypes), has been introduced by us recently. In this paper we study the suitability of the EkP method for training data reduction on seventeen datasets. As the underlying classifier the well-known IB1 system (1-Nearest Neighbor classifier) has been chosen. We compare the generalization ability of our method to the performance of IB1 trained on the entire training data and to the performance of LVQ, for which the number of codebooks has been set equal to the number of prototypes selected by the EkP system. The results indicate that, even with only a few prototypes chosen by the EkP method, on nearly all seventeen datasets results statistically indistinguishable from those attained with IB1 have been obtained. On many datasets the generalization ability of the EkP system has been higher than the one attained with LVQ.
1. Introduction
Data mining is commonly employed in many domains. A case-based way of data explanation is very popular among researchers. Such an approach to knowledge discovery and understanding is particularly often employed in medicine, where a medical doctor makes a diagnosis by referring to other similar cases in a database of patients.
Interesting instance vectors, known as reference cases, can be either selected from training data or can be generated out of a training set. In the latter case the instances' features have in general different values than the ones that are stored in the original training set. Both techniques (i.e. instance selection and prototype generation) often lead to a significant training set size reduction.
This paper concerns the first above-mentioned problem, i.e. 'instance selection' ('training data compression, reduction or pruning'). The idea behind this machine learning paradigm is that only a small fraction of a usually much larger training set suffices for learning (Maloof M., Michalski R., 2000; Martinez T., Wilson D., 1997, 2000; Grochowski M., 2003; Grochowski M., Jankowski N., 2004; Jankowski N., Grochowski M., 2004; Duch W., Grudzinski K., 2000; Grudzinski K., 2004, 2008).
Prototype selection is an extremely important problem which has been frequently studied by machine learning and pattern recognition researchers. Selection of reference instances can significantly speed up later classification and analysis of data, usually leads to better data understanding, and may lower the sensitivity of some classifiers to noise. Strong training set reduction may sometimes result in a statistically significant degradation of the classification accuracy attained on unseen samples; however, as many experiments illustrate, it is often the other way around, i.e. data pruning improves the generalization ability of classifiers. Samples selected with the EkP system can be used, for example, to build prototype-based rules, which have been introduced by Duch et al. (Duch W., Grudzinski K., 2001; Blachnik M., Duch W., 2004) and which are a very interesting alternative to classic logical rules.
The acronym EkP is short for Exactly-k-Prototypes. We want to stress here that our new system differs completely from our earlier model, PM-M (Grudzinski K., 2004).
2. Methodologies for Reference Instances Selection
Before we proceed to the presentation of the EkP system and the results obtained with this method, a very concise review of some of the known techniques employed in the selection of reference cases is provided. This presentation draws heavily on the excellent work of Grochowski contained in his M.Sc. thesis (Grochowski M., 2003).
2.1. Problem Formulation
The problem of selection of reference instances can be defined as a process of finding the smallest set S of cases representing the same population as the original training set T and leading to the correct classification of the samples not only from T but, more importantly, of the unseen cases, with minimal degradation of the generalization ability of the underlying classifier. In other words, reference selection is a method for selecting or generating the most informative samples from T and rejecting the noisy cases or those instances that degrade the generalization when the original training set T is used for learning. Thus, restricting ourselves to prototype selection, by which we understand the selection of reference cases in which S is a subset of T, the problem is to find the optimal subset S out of all possible 2^n - 1 subsets with respect to the generalization ability of the underlying classifier, where n denotes the number of samples of the original training set T. The reference vector selection algorithms can be divided into a few groups, briefly characterized below.
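In symbols (our notation; the paper states this only in words), writing Acc(f_S) for the generalization accuracy of the underlying classifier f trained on a set S and epsilon for the tolerated degradation, the selection task can be written as

    S^{*} = \arg\min_{\emptyset \neq S \subseteq T} |S|
    \quad \text{subject to} \quad
    \mathrm{Acc}(f_{S}) \geq \mathrm{Acc}(f_{T}) - \varepsilon ,

where, for pure prototype selection, the minimization runs over the 2^n - 1 non-empty subsets of T mentioned above.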
2.1.1. Noise Filters
This category of methods, known also as editing rules, is based on rejecting noisy cases or outliers from T. The rate of data pruning is usually low and these techniques are usually employed as the first data preprocessing step, which is then followed by other methods. ENN, RENN (Wilson D., 1972), All k-NN (Tomek I., 1976) and ENRBF (Jankowski N., 2000) are the key examples of the algorithms that belong to this group.
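To make the editing idea concrete, below is a minimal Java sketch of Wilson's ENN rule, the first method named above. This is our reconstruction for illustration only (the data layout, with a feature matrix x, integer labels y and neighbourhood size k, is our assumption), not code from any of the cited systems:

    import java.util.*;

    public class EnnFilter {

        // squared Euclidean distance between two feature vectors
        static double dist2(double[] a, double[] b) {
            double s = 0.0;
            for (int i = 0; i < a.length; i++) {
                double d = a[i] - b[i];
                s += d * d;
            }
            return s;
        }

        // Returns the indices of the instances retained by ENN: an instance
        // is kept only if the majority of its k nearest neighbours shares
        // its class label; otherwise it is treated as noise and rejected.
        static List<Integer> edit(double[][] x, int[] y, int k) {
            List<Integer> kept = new ArrayList<>();
            for (int i = 0; i < x.length; i++) {
                final int q = i;
                Integer[] idx = new Integer[x.length];
                for (int j = 0; j < x.length; j++) idx[j] = j;
                Arrays.sort(idx, Comparator.comparingDouble(j -> dist2(x[q], x[j])));
                Map<Integer, Integer> votes = new HashMap<>();
                int taken = 0;
                for (int j = 0; j < idx.length && taken < k; j++) {
                    if (idx[j] == i) continue;      // skip the instance itself
                    votes.merge(y[idx[j]], 1, Integer::sum);
                    taken++;
                }
                int best = -1, bestCount = -1;
                for (Map.Entry<Integer, Integer> e : votes.entrySet())
                    if (e.getValue() > bestCount) { best = e.getKey(); bestCount = e.getValue(); }
                if (best == y[i]) kept.add(i);
            }
            return kept;
        }
    }

RENN simply repeats this editing pass until no instance is removed.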
2.1.2. Data Condensation Algorithms
This group of methods is also known as data pruning or data compression techniques. The main idea behind this approach is to achieve the highest possible training data reduction without, or with minimum, sacrifice of the generalization of the employed underlying classifiers. CNN (Hart P., 1968), RNN (Gates G., 1972), GA, RNGE (Bhattacharya B.K., Poulsen R.S., Toussaint G.T., 1981), ICF (Brighton H., Mellish C., 2002) and DROP1-DROP5 (Martinez T., Wilson D., 2000) are the main systems that fall into this category.
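Analogously, Hart's CNN, the oldest method in this group, can be sketched as follows (again our reconstruction under the same data layout as in the previous sketch; the store grows until every training case is classified correctly by 1-NN on the store):

    import java.util.ArrayList;
    import java.util.List;

    public class CnnCondenser {

        static double dist2(double[] a, double[] b) {
            double s = 0.0;
            for (int i = 0; i < a.length; i++) { double d = a[i] - b[i]; s += d * d; }
            return s;
        }

        // Hart's Condensed Nearest Neighbor rule: keep adding to the store
        // every instance misclassified by 1-NN on the current store, until
        // a full pass over the training set adds nothing.
        static List<Integer> condense(double[][] x, int[] y) {
            List<Integer> store = new ArrayList<>();
            store.add(0);                   // seed the store with one instance
            boolean changed = true;
            while (changed) {
                changed = false;
                for (int i = 0; i < x.length; i++) {
                    if (store.contains(i)) continue;
                    int nearest = -1;
                    double best = Double.MAX_VALUE;
                    for (int s : store) {   // 1-NN classification on the store
                        double d = dist2(x[i], x[s]);
                        if (d < best) { best = d; nearest = s; }
                    }
                    if (y[nearest] != y[i]) {   // misclassified: add to the store
                        store.add(i);
                        changed = true;
                    }
                }
            }
            return store;
        }
    }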
2.1.3. Prototype Methods
The family of reference selection algorithms aimed at finding an extremely low number of highly informative super-vectors, carrying a particularly large amount of information and capable of representing a large number of cases, is known as prototype methods. The difference between data condensation algorithms and prototype methods is very subtle: in our understanding, prototype selection and generation algorithms push the reduction of the training data to the extreme, sometimes taking the risk of a slightly larger degradation of the generalization of the underlying classifiers. Thus, although both groups of methods try to arrive at the smallest set S, the stress in data pruning techniques is put on generalization, whilst in the case of prototype algorithms it is on the extremely low number of samples that are selected. It should not be surprising that some of the algorithms, particularly those in which one has control over the number of samples selected, may be treated either as data pruning methods or as prototype selection models. LVQ (Kaski S., Kohonen T., Oja M., 2003), MC1 and RMHC (Skalak D., 1994), IB3 (Aha D., Albert M., Kibler D., 1991), ELH, ELGrow and Explore (Cameron-Jones R., 1995) and our own models PM-M (Grudzinski K., 2004) and EkP (Grudzinski K., 2008) can be included in the prototype selection group of methods.

3. The EkP System

The EkP system is based on the minimization of a cost function which returns the number of errors the classifier makes. Despite this, the EkP method is fast, since in every cost function evaluation the training set is constructed out of only the preset number of k instances. It takes seconds for the EkP method to perform 10-fold cross-validation on most common UCI datasets. In our implementation we used the well-known simplex method (Nelder J., Mead R., 1965) for function minimization, which we have taken from the Internet (Lampton M., 2004).
The simplex must be initialized before the minimization procedure is started. The EkP system is very sensitive to the way in which the simplex is initialized and therefore we have decided to provide the EkP's initialization algorithm, which is given below. We have found the inclusion of this pseudocode very important for the replication of this method.
Algorithm 1 The EkP's simplex initialization algorithm
Require: A training set trainInstances
Require: A vector p[] of optimization parameters ((numProtoPerClass * numClasses * numAttributes)-dimensional)
Require: A matrix simplex to construct a simplex
  Let numPoints denote the number of points to build the simplex on
  for i = 0 to numPoints - 1 do
    for j = 0 to numClasses * numProtoPerClass - 1 do
      for k = 0 to numAttributes - 1 do
        p[k + numAttributes * j] := trainInstances[i][k]
      end for
    end for
    simplex[i][numAttributes] := costFunction(p[])
  end for
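Read literally, the listing seeds every prototype slot of simplex point i with the attribute values of training instance i and stores the resulting cost value in the last slot of the simplex row (the listing's numAttributes index appears to stand for that last slot, since a full row must hold all numProtoPerClass * numClasses * numAttributes parameters plus the cost). A minimal Java sketch of this reading, with the cost function abstracted away, could look as follows; this is our illustration, not the actual Weka plugin code:

    import java.util.function.ToDoubleFunction;

    public class SimplexInit {

        // Builds the initial simplex following Algorithm 1: simplex point i
        // is seeded from training instance i; its last slot stores the cost.
        static double[][] initSimplex(double[][] trainInstances, int numPoints,
                                      int numClasses, int numProtoPerClass,
                                      int numAttributes,
                                      ToDoubleFunction<double[]> costFunction) {
            int numParams = numProtoPerClass * numClasses * numAttributes;
            double[][] simplex = new double[numPoints][numParams + 1];
            double[] p = new double[numParams];
            for (int i = 0; i < numPoints; i++) {
                // every prototype slot of point i is filled from instance i
                for (int j = 0; j < numClasses * numProtoPerClass; j++) {
                    for (int k = 0; k < numAttributes; k++) {
                        p[k + numAttributes * j] = trainInstances[i][k];
                    }
                }
                System.arraycopy(p, 0, simplex[i], 0, numParams);
                simplex[i][numParams] = costFunction.applyAsDouble(p);
            }
            return simplex;
        }
    }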
Two variants of the cost function algorithm have been implemented in our system. The first variant is based on internal cross-validation learning on training partitions, whilst in the second algorithm variant a classifier is trained by conducting a plain test (the pruned training partitions are used for learning and the test on the entire training partition is used for estimating the training accuracy). The details of both variants of the cost function algorithm are given in the pseudocode listings below.
Our implementation of the EkP method is not the simplest one, as our code will become a basis for an extended version of this algorithm. To give a short description of the algorithm in the text of the paper, it is worth mentioning that the array of optimization parameters is (numProtoPerClass * numClasses * numAttributes)-dimensional, but the instances stored in this vector are not involved in any parameter modification. They are simply extracted from the parameter vector and are added to the training partition in every cost function evaluation. In other words, the training partitions are built by extracting samples from a parameter vector which always contains numProtoPerClass examples from every class occurring in a problem domain. In a simpler implementation
Algorithm 2 The EkP-1 cost function algorithm (learning via internal cross-validation)
Require: A training set trainInstances
Require: A vector p[] of optimization parameters ((numProtoPerClass * numClasses * numAttributes)-dimensional)
  for k = 1 to numCrossValidationLearningFolds do
    Create the empty training set vTrain
    Build the k-th test partition vTest
    for i = 0 to numClasses * numProtoPerClass - 1 do
      Add the prototype stored in p[], occupying positions p[numAttributes * i] through p[numAttributes * i + numAttributes - 1], to vTrain
    end for
    Build (train) the classifier on vTrain and test it on vTest
  end for
  Remember the optimal p[] value and the associated lowest value of numClassificationErrors
  return numClassificationErrors
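A compact Java rendering of this variant (our illustration; the classifier and the fold construction are abstracted away, and the prototype layout is the one described in the text) might look as follows:

    public class EkP1Cost {

        // Any classifier that can be trained on one set and report its
        // error count on another; IB1 would be plugged in here.
        interface Classifier {
            int countErrors(double[][] train, double[][] test);
        }

        // EkP-1 cost: the pruned training set consists solely of the
        // prototypes encoded in p; it is evaluated on every internal
        // cross-validation test partition and the errors are summed.
        static int cost(double[] p, double[][][] testFolds, int numClasses,
                        int numProtoPerClass, int numAttributes, Classifier c) {
            int numProtos = numClasses * numProtoPerClass;
            double[][] vTrain = new double[numProtos][numAttributes];
            for (int i = 0; i < numProtos; i++) {
                // prototype i occupies numAttributes consecutive slots of p
                System.arraycopy(p, numAttributes * i, vTrain[i], 0, numAttributes);
            }
            int numClassificationErrors = 0;
            for (double[][] vTest : testFolds) {
                numClassificationErrors += c.countErrors(vTrain, vTest);
            }
            return numClassificationErrors;
        }
    }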
Algorithm 3 The EkP-2 cost function algorithm (learning via a test on the entire training partition, taking the pruned training partition for building (training) a classifier)
Require: A training set trainInstances
Require: A vector p[] of optimization parameters ((numProtoPerClass * numClasses * numAttributes)-dimensional)
  Create the empty training set tmpTrain
  for i = 0 to numClasses * numProtoPerClass - 1 do
    Add the prototype stored in p[], occupying positions p[numAttributes * i] through p[numAttributes * i + numAttributes - 1], to tmpTrain
  end for
  Build (train) the classifier on tmpTrain and test it on trainInstances
  Remember the optimal p[] value and the associated lowest value of numClassificationErrors
  return numClassificationErrors
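For completeness, the EkP-2 variant can be sketched with a concrete 1-NN standing in for IB1 (again our illustration; it assumes, following the remark after Table 1, that each prototype occupies numAttributes consecutive slots of p with the class label in the last slot):

    public class EkP2Cost {

        static double dist2(double[] a, double[] b, int numFeatures) {
            double s = 0.0;
            for (int i = 0; i < numFeatures; i++) { double d = a[i] - b[i]; s += d * d; }
            return s;
        }

        // EkP-2 cost: train a 1-NN classifier on the prototypes extracted
        // from p and count its errors on the entire training partition.
        static int cost(double[] p, double[][] trainInstances, int numClasses,
                        int numProtoPerClass, int numAttributes) {
            int numProtos = numClasses * numProtoPerClass;
            double[][] tmpTrain = new double[numProtos][numAttributes];
            for (int i = 0; i < numProtos; i++) {
                System.arraycopy(p, numAttributes * i, tmpTrain[i], 0, numAttributes);
            }
            int numFeatures = numAttributes - 1;   // the last column holds the class
            int numClassificationErrors = 0;
            for (double[] instance : trainInstances) {
                int nearest = 0;
                double best = Double.MAX_VALUE;
                for (int i = 0; i < numProtos; i++) {
                    double d = dist2(instance, tmpTrain[i], numFeatures);
                    if (d < best) { best = d; nearest = i; }
                }
                if (tmpTrain[nearest][numFeatures] != instance[numFeatures]) {
                    numClassificationErrors++;
                }
            }
            return numClassificationErrors;
        }
    }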
Table 1. Datasets used in our experiments
one could store just the numProtoPerClass * numClasses vectors themselves in the parameter array. Note that numAttributes denotes the total number of attributes in a dataset, including the class attribute.
4. Numerical Experiments
In order to verify the suitability of the EkP system for data analysis, classification experiments on seventeen real-world problems (mainly taken from the well-known UCI repository of machine learning databases (Mertz C., Murphy P.)) have been performed. Information about the datasets used can be found in Table 1. The EkP system can be based on an arbitrary classifier, i.e. it can be a neural network, a support vector machine or a decision tree method, etc. In our experiments the IB1 (Aha D., Albert M., Kibler D., 1991) system has been used both as the underlying classifier for the EkP system and as the reference method. The reason for selecting the IB1 system is that this method requires very small training datasets, which may consist of just a few samples, in order to make classification possible. Other classifiers, including IBk (Aha D., Albert M., Kibler D., 1991), require slightly larger training sets in order to operate. Our aim when conducting the experiments for this paper was to show that even calculations with an extremely low number of prototypes selected may lead to excellent results on unseen samples. The well-known LVQ method (Hyninen, Kangas, Kohonen, Laaksonnen, Torkolla, 1996; Kohonen T., 2001; Kaski S., Kohonen T., Oja M., 2003), which is, however, a prototype-generation system, has also been taken as a reference model in our experiments. The second reason for choosing the IB1 classifier as the underlying method for the EkP system is the fact that the LVQ method uses the k-Nearest Neighbor rule for classification.

The generalization ability of the EkP system with only one, two and three instances per class selected from a training set has been compared to the classification performance of LVQ, for which the same number of codebooks has been used. Additionally, the results obtained with the IB1 (1-Nearest Neighbor) system, which has been trained on the entire cross-validation training partitions (i.e. all training samples from every learning fold have been used), are provided.
A ten-fold stratified cross-validation test has been performed for all seventeen domains. In the experiments conducted with the EkP system, in each cross-validation fold the training partition has been pruned so that only the prototype cases remained, the EkP's underlying classifier has been trained, and its generalization ability has been estimated on the cross-validation test partition. After the completion of the calculations on all ten folds, the test has been repeated ten times, and the average classification accuracy and its standard deviation, taken over all the available one hundred partial results, have been reported.
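Expressed explicitly (our notation; the divisor in the variance estimate is our assumption, since the paper does not state it), with a_1, ..., a_100 denoting the hundred partial accuracies, the reported quantities are the usual sample estimates:

    \bar{a} = \frac{1}{100} \sum_{i=1}^{100} a_i , \qquad
    s_a = \sqrt{ \frac{1}{99} \sum_{i=1}^{100} \left( a_i - \bar{a} \right)^{2} } .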
The single corrected resampled t-test (Frank E., Witten I., 2000; Dobosz K., 2006) has been used to calculate the statistical significance of the results (with the factor of 0.05) in order to help make the decision whether the EkP system performed better, the same, or worse than the reference models.
The Weka implementation of the LVQ method that has been employed in our calculations was written by Jason Brownlee (Brownlee J., 2004). Finally, what remains to be mentioned is that the EkP system has been written by the author in the Java programming language as a plugin to the well-known Weka machine learning workbench (Frank E., Witten I., 2000).
4.1. Experiment 1: Generalization Ability of EkP vs. IB1

In the first experiment our system under study has been compared to the performance of IB1 on all seventeen domains. The results of the statistical tests against the majority classifier, both of IB1 and EkP, have not been included in our paper. The base rate results, however, which are the values obtained by the majority classifier(1) on all tested datasets, are listed in Table 1. It is worth mentioning that IB1 appeared to outperform the majority classifier on thirteen domains. On the appendicitis, breast-cancer, german-credit and hepatitis datasets the results have been statistically insignificant.
The EkP system has been used with mainly the same default settings for all seventeen problems, because the calculations have been performed in a batch mode, which made performing the numerical experiments and collecting the results for the paper much easier. The simplex cost function tolerance has been set to 1E-16 and the maximum number of cost function evaluations has been restricted to 300 calls, excluding a certain number of target function evaluations required to initialize the simplex. This latter value is the parameter which is called the
(1) The majority classifier in the Weka system which had been used in our experiments is the ZeroR classifier.
number of simplex points on which the simplex is spanned. Thus, the maximum number of cost function evaluations has to be increased by the number of simplex points in order to obtain the total number of target function calls. For all experiments conducted in our paper we have set the number of simplex points to fifty. The upper limit on the value of this parameter is the number of samples in the training partition. Therefore, because the smallest problem out of the seventeen studied domains consists of hardly sixty samples, the value selected by us for this parameter seems to be a good choice. The maximum number of cost calls setting of 300 was taken as the default for datasets of the size of a couple of hundred cases, and this choice is based on our earlier experience with similar minimization-based learning systems we had been working on. As concerns the EkP's form of learning used for Experiment 1, both the first variant of the cost function algorithm, involving leave-one-out cross-validation learning, and the second variant have been employed. The IB1 classifier has been chosen as the EkP's classification engine. Tables 2 and 3 summarize the results of Experiment 1. It is easy to notice that the generalization ability of the EkP system trained with the first algorithm variant depends strongly on the number of prototypes selected. Choosing one prototype per class to be selected by EkP-1 statistically degraded the results with respect to the ones obtained with the IB1 system on only three out of all seventeen domains. This is an excellent result. When two prototypes per class have been selected, the number of times the training data reduction degraded the results dropped to only two. With three prototypes per class chosen, the results have been statistically indistinguishable from those attained with IB1 on sixteen problems. The first variant of the EkP algorithm taken for our experiments was trained with leave-one-out cross-validation. The influence of the number of cross-validation learning folds on the generalization has not yet been fully investigated. Leave-one-out cross-validation seems to lead to very stable models and the best generalization, at the expense of significantly lengthening the calculation time. In the case of the second algorithm version (EkP-2), a statistically significant degradation of the generalization results with respect to the ones attained with the IB1 system could be noted on three datasets, independently of the number of prototypes per class chosen.
4.2. Experiment 2: Generalization Ability of LVQ vs. IB1 and LVQ vs. EkP

For this experiment, LVQ version 1 with the 'random training data proportional' as well as the 'simple k-means' initialization, a learning rate of 0.3, a total of 1000 training iterations, a linear decay learning function and disabled voting has been used. The generalization ability of LVQ against IB1 has been tested first. Because the method of initialization of the codebook positions seemed not to have any statistically significant influence on the generalization of the LVQ system, only
Table 2. A comparison of the generalization results attained with the EkP system with one, two and three prototypes per class selected vs. the generalization obtained with the IB1 classifier. EkP has been trained with the first version of the cost function algorithm, which is denoted as EkP-1. Fifty simplex points have been used to train the EkP system. The statistical degradation of the results with respect to the reference ones (i.e. those of IB1) is marked with a bold font.

Table 3. A comparison of the generalization results attained with the EkP system with one, two and three prototypes per class selected vs. the generalization obtained with the IB1 classifier. EkP has been trained with the second version of the cost function algorithm, which is denoted as EkP-2. Fifty simplex points have been used to train the EkP system. The statistical degradation of the results with respect to the reference ones (i.e. those of IB1) is marked with a bold font.

Table 4. A comparison of the generalization results attained with the LVQ-1 system (with the linear decay learning and the training data proportional initialization settings) with 2, 4 and 6 codebooks set vs. the generalization results obtained with the IB1 classifier. The statistical degradation of the results with respect to the reference ones (i.e. those of IB1) is marked by using a bold font.
the results obtained with the 'random training data proportional' initialization are reported.
As can be seen from Table 4, the LVQ system performed rather poorly: with two codebooks set, a statistically significant degradation of the results with respect to those attained with the IB1 classifier has been noted on twelve of the seventeen problems. Increasing the number of codebooks to four has led to a minor improvement of the generalization of the LVQ system, yet on ten domains the results have still been worse than those obtained with IB1. Selection of six codebooks has led to a statistically significant degradation of the results with respect to the reference ones on nine problems out of the seventeen studied. Also in this experiment no improvement over IB1's generalization ability could be observed.
In the second experiment in this section the test estimating the generalization ability of LVQ against EkP has been performed. This test is made only on two-class problems, to assure that the number of LVQ codebooks and the number of prototypes selected by the EkP system are the same. Recall that EkP takes the number of prototypes per class as its adaptive parameter, whilst the LVQ system requires the total number of codebooks to be specified. Since all the calculations have been performed in a batch mode with the same settings for all classification domains, the list of datasets had to be restricted to two-class problems. What can be noted by taking a closer look at Table 5 is that the results of LVQ depend more strongly on the number of codebooks selected than is the case for EkP-1, and the average classification accuracy of EkP-1 taken over all twelve domains is clearly higher.

Table 5. A comparison of the generalization results attained with the LVQ-1 system with two, four and six codebooks vs. the generalization obtained with the EkP classifier. EkP has been trained with the first version of the cost function algorithm, which is denoted as EkP-1. Fifty simplex points have been used to train the EkP system. The statistical degradation of the results of the LVQ system with respect to the reference ones is marked with a bold font.

Table 6. A comparison of the generalization results attained with the LVQ-1 system with two, four and six codebooks vs. the generalization obtained with the EkP classifier. EkP has been trained with the second version of the cost function algorithm, which is denoted as EkP-2. Fifty simplex points have been used to train the EkP system. The statistical degradation of the results of the LVQ system with respect to the reference ones is marked with a bold font.
With two codebooks, the average generalization ability of LVQ reaches only 64%. Going with the number of codebooks to four and six increases the average LVQ generalization ability to about 70% and 72%, respectively. Similar trends can be observed when LVQ is put against EkP-2 (see Table 6).

4.3. Experiment 3: Time Requirements
The training times of the EkP system, which are admittedly all statistically worse than those of IB1 (this is not a surprise), are quite short and on average equal about 1 s (EkP-1) and 0.2 s (EkP-2) for learning on a single partition of a typical UCI dataset of a size of a couple of hundred cases (see Tables 7 and 8).(2)
The training times of LVQ are even shorter than those obtained with our system. As can be seen from Table 9, LVQ has completely beaten both variants of the EkP method on all seventeen classification problems: it turned out that the LVQ system can be trained in a time which is three orders of magnitude shorter than the measured EkP learning time. Fortunately, the EkP testing times are shorter than those of IB1 by three orders of magnitude. Table 10 contains the summary of the results of the testing time measurements. It is not hard to see that it takes much less than a minute for the entire 10-fold cross-validation test conducted with our system to complete on most common UCI datasets. This is an acceptable result. It should be noted that training the EkP method with lower-fold cross-validation than leave-one-out leads to a significant reduction of the time requirements of this algorithm.

5. Conclusions
We are lucky to have managed to create quite a fast prototype selection system, despite employing the simplex minimization routine, which is usually expensive. The initial experiments indicate that the method may turn out to be competitive with other data pruning systems. In the preliminary calculations the method discussed in this paper has shown a statistically insignificant difference in generalization ability with respect to IB1 on almost all classification problems, and it has sometimes turned out to be superior to the LVQ system ver. 1. The EkP training times are longer than those of IB1 and of LVQ, but the testing times are shorter than the ones obtained by timing IB1. After all, one should remember the general idea lying behind the selection of prototypes: once the instances are found (the training sets are pruned), the tests on unseen samples, which are usually performed frequently, can be conducted much faster. Until the EkP system is confronted with many other prototype selection algorithms and further experiments with our method are performed, it will be hard to estimate the real value of our contribution to the pattern recognition field.
(2) The calculations have been performed on a laptop equipped with a 2.4 GHz Intel Core 2 Duo processor, running the 64-bit Ubuntu Linux operating system under a 64-bit Open JVM Java environment.
Table 7. The training times of the EkP method attained on one cross-validation fold, in seconds. EkP has been trained with the first version of the cost function algorithm, which is denoted as EkP-1. Fifty simplex points have been used to train the EkP system. The statistical degradation of the results of the EkP system with two and three prototypes per class selected with respect to the reference ones (i.e. those of EkP-1 with one reference instance per class chosen) is marked with a bold font.

Table 8. The training times of the EkP method attained on one cross-validation fold, in seconds. EkP has been trained with the second version of the cost function algorithm, which is denoted as EkP-2. Fifty simplex points have been used to train the EkP system. The statistical degradation of the results of the EkP system with two and three prototypes per class selected with respect to the reference ones (i.e. those of EkP-2 with one reference instance per class chosen) is marked with a bold font.
Table 9. The training times of the EkP method attained on one cross-validation fold, in seconds. EkP has been trained with the first and the second version of the cost function algorithm, denoted as EkP-1 and EkP-2 respectively. Two codebooks / prototypes have been chosen. Fifty simplex points have been used to train the EkP system. The statistical degradation of the results of the EkP system with respect to the reference ones (i.e. those of LVQ) is marked with a bold font.

Table 10. The testing times of the EkP method attained on one cross-validation test fold, in seconds. EkP has been trained with the second version of the cost function algorithm, which is denoted as EkP-2. Fifty simplex points have been used to train the EkP system. The statistical improvement of the results of the EkP system with respect to the reference ones (i.e. those of IB1) is marked with a bold, italic font.
References

Aha D., Kibler D. and Albert M.: Instance-based learning algorithms. Machine Learning, 6, (1991), 37-66
Bhattacharya B., Poulsen R., Toussaint G.: Application of proximity graphs to editing nearest neighbor decision rule. In: International Symposium on Information Theory, Santa Monica, (1981)
Brighton H., Mellish C.: Advances in instance selection for instance-based learning algorithms. Data Mining and Knowledge Discovery, 6, (2002), 153-172
Brownlee J.: A Java implementation of the SOM-LVQ PAK. http://www.it.swin.edu.au/personal/jbrownlee/
Cameron-Jones R.: Instance selection by encoding length heuristic with random mutation hill climbing. In: Proceedings of the Eighth Australian Joint Conference on Artificial Intelligence, (1995), 99-106
Dobosz K.: Statistical Significance Tests in Estimation of the Results Obtained with Various Systems that Learn. M.Sc. thesis, Nicolaus Copernicus University, Toruń, Poland, (2006) (in Polish)
Duch W., Blachnik M.: Fuzzy rule-based systems derived from similarity to prototypes. Lecture Notes in Computer Science, Vol. 3316, (2004), 912-917
Duch W., Grudzinski K.: Prototype based rules - a new way to understand the data. IEEE International Joint Conference on Neural Networks, Washington D.C., (2001), 1858-1863
Gates G.: The reduced nearest neighbor rule. IEEE Transactions on Information Theory, 18, (1972), 665-669
Grochowski M.: Selecting Reference Vectors in Selected Methods for Classification. M.Sc. thesis, Nicolaus Copernicus University, Department of Applied Informatics, Toruń, Poland, (2003) (in Polish)
Grochowski M., Jankowski N.: Comparison of Instance Selection Algorithms II: Results and Comments. Artificial Intelligence and Soft Computing, ICAISC 2004, Lecture Notes in Artificial Intelligence (LNAI 3070), 580-585
Grudzinski K., Duch W.: SBL-PM: A Simple Algorithm for Selection of Reference Instances for Similarity-Based Methods. Intelligent Information Systems, Bystra, Poland, 2000, in: Advances in Soft Computing, Physica-Verlag, (2000), 99-108
Grudzinski K.: SBL-PM-M: A System for Partial Memory Learning. Artificial Intelligence and Soft Computing, ICAISC 2004, Lecture Notes in Artificial Intelligence (LNAI 3070), 586-591
Grudzinski K.: EkP: A fast minimization-based prototype selection algorithm. Proceedings of the International IIS'08 Conference, Zakopane, Poland, 2008. In: Challenging Problems of Science, Computer Science. Academic Publishing House EXIT
Hart P.: The condensed nearest neighbor rule. IEEE Transactions on Information Theory, 14, (1968), 515-516
Hyninen, Kangas, Kohonen, Laaksonnen, Torkolla: LVQ_PAK: The Learning Vector Quantization Program Package, (1996)
Jankowski N.: Data regularization. In: Rutkowski, L., Tadeusiewicz, R., eds.: Neural Networks and Soft Computing, Zakopane, Poland, (2000), 209-214
Jankowski N., Grochowski M.: Comparison of Instances Selection Algorithms I: Algorithms Survey. Artificial Intelligence and Soft Computing, ICAISC 2004, Lecture Notes in Artificial Intelligence (LNAI 3070), 598-603
Kohonen T.: Self-Organizing Maps. Third ed., Springer-Verlag, Berlin-Heidelberg, (2001). (Thomas S. Huang, Teuvo Kohonen and Manfred R. Schroeder, eds., Springer Series in Information Sciences, 30)
Lampton M.: neldermead.java (http://www.cea.berkeley.edu/mlampton/neldermead.java)
Maloof M., Michalski R.: Selecting Examples for Partial Memory Learning. Machine Learning, 41, (2000), 27-52
Mertz C., Murphy P.: UCI repository of machine learning databases. http://www.ics.uci.edu/pub/machine-learning-data-bases
Nelder J., Mead R.: A simplex method for function minimization. Computer Journal, 7, (1965), 308-313
Oja M., Kaski S., Kohonen T.: Bibliography of Self-Organizing Map (SOM) Papers: 1998-2001 Addendum. Neural Computing Surveys, 3, (2003), 1-156
Skalak D.: Prototype and feature selection by sampling and random mutation hill climbing algorithms. In: International Conference on Machine Learning, (1994), 293-301
Tomek I.: An experiment with the edited nearest neighbor rule. IEEE Transactions on Systems, Man, and Cybernetics, 6, (1976), 448-452
Wilson D.: Asymptotic properties of nearest neighbor rules using edited data. IEEE Transactions on Systems, Man, and Cybernetics, 2, (1972), 408-421
Wilson D., Martinez T.: Instance Pruning Techniques. In: Fisher, D., ed.: Machine Learning: Proceedings of the Fourteenth International Conference. Morgan Kaufmann Publishers, San Francisco, CA, (1997), 404-417
Wilson D., Martinez T.: Reduction Techniques for Instance-Based Learning Algorithms. Machine Learning, 38, (2000), 257-286
Witten I., Frank E.: Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann Publishers, (2000)