On the usefulness of
one-class one-classifiers for
multi-class problems
Bartosz Krawczyk
Outline
• One-class classification
• Multi-class decomposition
• One-class decomposition as a special case of one-versus-all
• One-Class Clustering-based Ensemble
• Online one-class ensembles for massive multi-class data streams with concept drift
One-class vs. multi-class
Multi-class classification aims at establishing a decision boundary that yields the lowest classification error for objects originating from two or more classes.
One-class vs. multi-class
In one-class classification (OCC) we have only data representing a single class available. This is known as learning in the absence of counterexamples.
One-class vs. multi-class
One of the most popular approaches for such a case is to establish a decision boundary that encloses the target class.
One-class vs. multi-class
It is assumed that during the exploitation of such a classifier, new objects from the target concept may appear, as well as new objects, unseen during the learning process, originating from different distribution(s). Such negative objects are labeled as outliers.
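As an illustration, a minimal OCC sketch on hypothetical data, with scikit-learn's OneClassSVM used as an example boundary method (the gamma and nu values are arbitrary):

```python
# Minimal one-class classification sketch on synthetic target-class data.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
target = rng.normal(loc=0.0, scale=1.0, size=(200, 2))  # only the target class is available

# Learn a boundary around the target concept in the absence of counterexamples.
occ = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.05).fit(target)

# During exploitation, predict() returns +1 for target objects and -1 for outliers.
new_objects = np.array([[0.1, -0.2],   # close to the target concept
                        [8.0, 8.0]])   # from a different distribution
labels = occ.predict(new_objects)
print(labels)
```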
One-class ensembles
One-class problems tend to have a complex structure of the target class.
A proper model selection of one-class classifier for a given dataset can be troublesome and time-consuming.
For complex data with underlying internal structures, single-model approaches may return overly complex boundaries or become overtrained. Ensemble learning allows us to exploit the unique strengths of each model in the pool.
Ensembles increase the quality and robustness of one-class learning systems.
One-class ensembles - challenges
One may distinguish two different types of one-class ensembles: heterogeneous [1] and homogeneous [2].
In the case of homogeneous ensembles, the Random Subspace method is usually used, which requires the overproduce-and-select mode.
We need to choose classifiers that are individually competent and mutually complementary.
The problem lies in forming efficient competence / quality or diversity measures [3] for the one-class classification task.
[1] Krawczyk B., Woźniak M., Dynamic Classifier Selection for One-Class Classification. Information Fusion (in review)
[2] Krawczyk B., One-Class Classifier Ensemble Pruning and Weighting with Firefly Algorithm. Neurocomputing, 2015, vol.150, part B: 490–500.
[3] Krawczyk B., Woźniak M., Diversity measures for one-class classifier ensembles. Neurocomputing, 2014, vol. 126: 36–44.
Multi-class decomposition
When considering a multi-class problem we often deal with a large number of classes (10 / 20 / 100 / 500 /etc.).
This poses a significant challenge for the classification system, as it increases the complexity of the decision boundary.
According to the divide-and-conquer rule it is worthwhile to split the complex problem into a set of simplified subproblems and solve them individually.
Decomposition strategies aim at reducing a multi-class problem to a number of subproblems with a lower number of classes.
Binary decomposition is the most popular one.
We delegate a local classifier to each subproblem, thus creating a combined classifier.
Multi-class decomposition
One-versus-one (OVO) is a strategy in which we consider all possible pairs of classes and build a classifier for each of them.
Multi-class decomposition
One-versus-all (OVA) is a strategy in which we use one class as a positive one and aggregate all remaining ones into a negative class.
Multi-class decomposition
OVO advantages: a set of simple binary problems that can be solved by local classifiers.
OVO drawbacks: a large number of base classifiers to be trained (for an M-class problem we need to train M(M-1)/2 base classifiers) and a complex aggregation of local decisions.
OVA advantages: a small number of base classifiers (for an M-class problem we need to train M base classifiers); each classifier uses all samples for training.
OVA drawbacks: using an aggregation of classes as the negative one introduces artificial imbalance (e.g., for 10 classes we get an imbalance ratio of 1:9).
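The classifier counts and the imbalance ratio above are easy to verify with a few lines (a sketch; the function names are ours, and equally sized classes are assumed):

```python
# Number of base classifiers and the OVA imbalance ratio for an M-class
# problem (assuming equally sized classes).
def ovo_classifiers(m: int) -> int:
    return m * (m - 1) // 2  # one binary classifier per pair of classes

def ova_classifiers(m: int) -> int:
    return m  # one binary classifier per class

def ova_imbalance_ratio(m: int) -> str:
    return f"1:{m - 1}"  # target class vs. all remaining classes aggregated

print(ovo_classifiers(10), ova_classifiers(10), ova_imbalance_ratio(10))  # 45 10 1:9
```

For 95 classes, as in the Auslan dataset used later, OVO already requires 4465 base classifiers versus 95 for OVA.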
Multi-class decomposition
The question is how to reconstruct the input multi-class problem from a set of local binary outputs.
One needs to use dedicated classifier combination methods.
OVA decomposition usually relies on the maximum function and the winner-takes-all approach.
OVO has a series of dedicated methods, such as pairwise coupling or decision directed acyclic graphs.
One may also use more elaborate combiners, such as Error-Correcting Output Codes or Decision Templates.
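For OVA, the winner-takes-all combiner amounts to an argmax over the local supports (a minimal sketch with made-up support values):

```python
import numpy as np

def winner_takes_all(supports: np.ndarray) -> int:
    """Return the class whose one-vs-all classifier gives the maximum support."""
    return int(np.argmax(supports))

# Hypothetical supports from three OVA classifiers:
print(winner_takes_all(np.array([0.2, 0.7, 0.1])))  # prints 1
```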
Decomposition with one-class
Is it worthwhile to use one-class classifiers for cases where we have access to counterexamples?
One may doubt the idea of using OCC for decomposing multi-class datasets: OCC uses only information about the target class, and therefore we discard available useful information.
OCC methods have several desirable properties that may aid the process of multi-class decomposition.
However, we emphasize that OCC is by no means a superior method to binary classifiers.
For standard datasets, binary classifiers will perform better, due to their access to counterexamples.
The applicability of OCC lies in complex data, where standard binary classifiers tend to fail.
Decomposition with one-class
To understand the possible advantages of OCC, let us first look at the differences between a binary and a one-class classifier.
While OCC uses less information about the problem being considered, its properties allow it to deal with difficulties embedded in the nature of the data: imbalance, class noise, or inner outliers, to name a few.
Decomposition with one-class
Using OCC for decomposing a multi-class dataset is very intuitive - each class is considered independent and a different one-class model is delegated to it. Therefore, for an M-class problem, we get M separate one-class tasks.
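A hedged sketch of this decomposition, with scikit-learn's OneClassSVM standing in for an arbitrary one-class model and the maximum support used as the combiner (the data and parameters are made up):

```python
# One-class decomposition: one OneClassSVM per class, combined by taking
# the maximum decision-function value (support).
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
# Hypothetical 3-class problem: three well-separated Gaussian blobs.
X = {c: rng.normal(loc=3.0 * c, scale=0.5, size=(100, 2)) for c in range(3)}

# Each class is treated independently -> M separate one-class tasks.
models = {c: OneClassSVM(gamma=0.5, nu=0.05).fit(X[c]) for c in X}

def predict(x):
    supports = {c: m.decision_function(x.reshape(1, -1))[0] for c, m in models.items()}
    return max(supports, key=supports.get)  # class with the highest support

print(predict(np.array([3.1, 2.9])))  # near class 1's blob
```

Note that each model sees only its own class during training, which gives the OVA setting without the aggregated negative class.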
Decomposition with one-class
Learning using only examples from a single class may seem like a disadvantage. In practice, it lifts one of the biggest problems related to OVA - the imbalanced object quantity between the target class and the remaining ones [1].
Additionally, we get smaller ensembles than in case of OVO.
[1] Krawczyk B., Filipczuk P., Cytological Image Analysis with Firefly Nuclei Detection and Hybrid One-Class
Experimental analysis
20 datasets with large number of classes (4 - 95). Base classifiers: C4.5, SVM, One-Class Parzen, One-Class Mixture of Gaussians, One-Class Support Vector Machine, Support Vector Data Description.
Combination methods used: OVO (Pairwise coupling and DDAG), OVA (MAX), ECOC and Decision Templates.
5x2 CV F-test for training / testing / pairwise comparison, Friedman ranking test and Shaffer post-hoc test for comparison over multiple datasets.
We will present exemplary results. For details please refer to [1].
[1] Krawczyk B., Woźniak M., Herrera F. On the Usefulness of One-Class Classifier Ensembles for
Experimental analysis
Results for Auslan dataset with 95 classes using Decision Templates combination.
Experimental analysis
Results of Shaffer post-hoc test over 20 datasets for comparing different combination methods for SVDD classifier. Symbol '=' stands for classifiers without significant differences, '+' for situation in which the method on the left is superior and '-' vice versa.
Recommendations
One-class classification is not a universal solution. For standard data, with no difficulties embedded in them, binary decomposition performs much better, having access to all the data. However, for more complex problems, OCC is a worthwhile direction.
One-class decomposition works exceptionally well with datasets having a high number of classes.
One-class classifiers can deal with problems embedded in the nature of the data - imbalanced distributions, feature and label noise, or a small number of objects.
For a dataset with a moderate or low number of examples and/or a risk of noise / in-class outliers, it is preferable to apply boundary methods, like OCSVM or SVDD. However, when class objects are plentiful, density-based methods can give a better description of the data.
Designing one-class classifiers
The presented decomposition technique relies on the quality of the used one-class classifiers.
Often classes have a complex structure and contain internal outliers (label noise).
These two factors may lead to creating a one-class classifier with so-called "empty regions" - areas within the decision boundary that are not covered by training samples. The classifier may be deemed incompetent in such areas.
For complex data with underlying internal structures, single-model approaches may return overly complex boundaries or become overtrained.
A proper model selection of a one-class classifier for a given dataset can be troublesome and time-consuming.
OCClustE
We developed a novel ensemble for forming locally specialised classifiers, named One-Class Clustering-Based Ensemble (OCClustE).
This allows us to create base one-class models that work on delegated subspaces of the original decision problem, thus increasing their local competence.
This approach is rooted in our previous experiences with multi-class classification and our original Adaptive Splitting and Selection (AdaSS) method, which selects classifiers for local decisions in the areas of their highest competence [1] [2].
[1] Jackowski K., Krawczyk B. & Woźniak M., Improved Adaptive Splitting and Selection: The Hybrid Training Method of a Classifier Based on a Feature Space Partitioning. International Journal of Neural Systems, 2014, vol. 24, no. 3: 1430007-1 – 1430007-18.
[2] Woźniak M., Krawczyk B., Combined Classifier Based On Feature Space Partitioning. Journal of Applied Mathematics and Computer Science, 2012, vol. 22, no. 4: 855–866.
OCClustE
OCClustE [1] has two important components:
• A method for decomposing the original decision problem into a set of reduced, less complex sub-problems.
• The usage of specific base one-class classifiers, trained on the detected areas of competence, which can increase robustness to possible internal outliers and reduce empty areas within their boundaries.
[1] Krawczyk B., Woźniak M. & Cyganek B., Weighted One-Class Classifier Ensemble Based on Fuzzy Feature
OCClustE
To detect compact subgroups within the target class, we employ kernel fuzzy c-means clustering.
Thanks to the kernel, we are able to search for a more atomic and efficient representation of the target concept in different spaces. However, fuzzy c-means clustering relies on a user-defined number of clusters.
To avoid time-consuming trial-and-error settings, we automatically establish the number of clusters with the entropy criterion.
We select the number of clusters that introduces the lowest entropy.
Of course, this criterion does not always return the optimal solution, but it is automatic and can be used as a starting point for further tuning.
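A simplified sketch of this selection step, using plain (non-kernel) fuzzy c-means and the partition entropy as the criterion; the actual method uses the kernel variant, and the helper functions below are only our illustration:

```python
import numpy as np

def fuzzy_cmeans(X, c, m=2.0, iters=100, seed=0):
    """Plain fuzzy c-means; returns the membership matrix U of shape (n, c)."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)            # random fuzzy partition
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None] - centers[None], axis=2) + 1e-12
        U = 1.0 / (d ** (2 / (m - 1)))           # standard FCM membership update
        U /= U.sum(axis=1, keepdims=True)
    return U

def partition_entropy(U):
    """Entropy of a fuzzy partition: low when memberships are crisp."""
    return -np.mean(np.sum(U * np.log(U + 1e-12), axis=1))

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])

# Choose the cluster count introducing the lowest entropy.
scores = {c: partition_entropy(fuzzy_cmeans(X, c)) for c in (2, 3, 4)}
best = min(scores, key=scores.get)
print(best)
```

For the two well-separated blobs above, two clusters should come out as the lowest-entropy choice, since memberships are then nearly crisp.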
OCClustE
As a base classifier for our ensemble, we have selected the Weighted One-Class Support Vector Machine (WOCSVM).
It assigns weights to each training object to control their influence on the shape of the decision boundary and to filter outliers and irrelevant objects.
WOCSVM outputs a more compact decision boundary, thus significantly reducing the size of the empty areas within it.
OCClustE
In the standard WOCSVM, weights assigned to objects are established on the basis of the distance between the object and the hypersphere centre. This, however, requires additional computation and can be costly, especially for big data.
We propose to utilize the membership functions, obtained from fuzzy c-means clustering, as object weights.
Such an approach significantly reduces the computational time and improves the individual accuracy of each base classifier in the ensemble.
OCClustE
We can use OCClustE for multi-class problems by developing M ensemble models, one per class [1].
[1] Krawczyk B., Woźniak M. & Cyganek B., Clustering-based ensembles for one-class classification. Information Sciences, 2014, vol. 264: 182-195
Experimental analysis
20 datasets with large number of classes (4 - 95). Base classifiers: OCClustE, C4.5, SVM, One-Class Parzen, One-Class Mixture of Gaussians, One-Class Support Vector Machine, Support Vector Data Description.
Combination methods used: OVO (Pairwise coupling and DDAG), OVA (MAX), ECOC and Decision Templates.
5x2 CV F-test for training / testing / pairwise comparison, Friedman ranking test and Shaffer post-hoc test for comparison over multiple datasets.
Experimental analysis
Results for Auslan dataset with 95 classes using Decision Templates combination.
Recommendations
OCClustE is a flexible ensemble framework that can be used with any components for both one-class and multi-class problems.
It can offer an efficient tool for decomposition with two-level classifier combination.
It can be easily run in parallel or distributed systems. Its drawback is a higher computational cost than that of a single-model approach.
Mining massive data streams
One-class classification is also an interesting tool for mining data streams - both stationary and those with concept drift.
It can be used for novelty detection, non-stationary classes, or cases where access to class labels is limited.
Mining massive data streams
Recently we have proposed a novel model for one-class data stream classification: incremental WOCSVM with forgetting[1] [2].
It applies an incremental learning mechanism with embedded weight modification. Incoming objects are assigned the highest possible weights, as they represent the current state of the concept.
Objects from previous iterations have their degree of importance slowly reduced by modifying their weights with our proposed forgetting function.
[1] Krawczyk B., Woźniak M., One-Class Classifiers with Incremental Learning and Forgetting for Data Streams with Concept Drift. Soft Computing. DOI: 10.1007/s00500-014-1492-5
[2] Krawczyk B., Woźniak M., Incremental Weighted One-Class Classifier for Mining Stationary Data
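The exact forgetting function is given in the papers above; purely as an illustration of the mechanism, an exponential decay of weights with object age could look like:

```python
# Illustrative weight decay over iterations; exponential decay is only a
# stand-in, not necessarily the forgetting function from the cited papers.
import numpy as np

def decayed_weights(ages, w_max=1.0, rate=0.3):
    """ages[i] = number of iterations since object i arrived.
    New objects (age 0) get the highest possible weight; older objects
    have their importance slowly reduced."""
    return w_max * np.exp(-rate * np.asarray(ages, dtype=float))

print(decayed_weights([0, 1, 5, 10]).round(3))
```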
Mining massive data streams
We have developed a modification of this method suitable for online learning.
Mining massive data streams
We propose to use an ensemble of these classifiers for online mining of massive evolving data streams.
Experimental analysis
12 data streams. Reference online ensemble classifiers: Online Bagging with an ADWIN change detector (BAG), Leveraging Bagging (LEV), the Dynamic Weighted Majority (DWM), and the Adaptive Classifier.
Experimental analysis
Obtained accuracies:
Experimental analysis
Remaining performance measures:
Conclusions
One-class classifiers are an attractive tool not only for learning in the absence of counterexamples, but also for decomposing multi-class datasets.
They are very useful for cases with a high number of classes and difficulties embedded in the nature of data.
One should remember that one-class classifiers are not a universal solution for multi-class problems. However, in specific scenarios they may deliver excellent performance.
In the future, we plan to extend our work on multi-class decomposition by using dynamic classifier selection according to classifier competence, applying novel competence measures based on empty regions, and adapting our methods to mining multi-class imbalanced data.
The 16th International Conference on Intelligent Data Engineering and Automated Learning IDEAL 2015, October 14-16, 2015, Wroclaw, Poland
Submission deadline: 15.05.2015
Website: ideal2015.pwr.edu.pl
The 1st IEEE International Workshop on Classification Problems Embedded in the
Nature of Big Data IEEE CPBD 2015, August 20-22, 2015, Helsinki, Finland
Submission deadline: 20.05.2015
Website: research.comnet.aalto.fi/BDSE2015/cpbd2015/
General chair: Prof. Michał Woźniak
PC chair: Dr. Robert Burduk
Publicity chair: Bartosz Krawczyk
The 2nd International Workshop on Machine Learning in Life Sciences, September
11, 2015, Porto, Portugal, within ECML-PKDD2015.
Submission deadline: 22.06.2015
Machine learning competition website: http://competition.gmum.net/
Co-chair: Bartosz Krawczyk