Introduction to RapidMiner, Part 3
Michał Bereta
www.michalbereta.pl
1. RapidMiner provides several linear discriminative models as operators:
a. LDA – Linear Discriminant Analysis
b. QDA – Quadratic Discriminant Analysis
c. RDA – Regularized Discriminant Analysis
d. Classification by Regression (can use any regression model as its subprocess)
e. Perceptron
From the RapidMiner documentation:
Linear Discriminant Analysis (RapidMiner Core)
This operator performs linear discriminant analysis (LDA). This method tries to find the linear combination of features which best separates two or more classes of examples. The resulting combination is then used as a linear classifier. Discriminant analysis is used to determine which variables discriminate between two or more naturally occurring groups; it may have a descriptive or a predictive objective.
Quadratic Discriminant Analysis (RapidMiner Core)
This operator performs a quadratic discriminant analysis (QDA). QDA is closely related to linear discriminant analysis (LDA), where it is assumed that the measurements are normally distributed. Unlike LDA however, in QDA there is no assumption that the covariance of each of the classes is identical. To estimate the parameters required in quadratic discrimination more computation and data are required than in the case of linear discrimination. If there is not a great difference in the group covariance matrices, then the latter will perform as well as quadratic discrimination. Quadratic Discrimination is the general form of Bayesian discrimination.
Regularized Discriminant Analysis (RapidMiner Core)
The regularized discriminant analysis (RDA) is a generalization of the linear discriminant analysis (LDA) and the quadratic discriminant analysis (QDA). Both algorithms are special cases of this algorithm. If the alpha parameter is set to 1, this operator performs LDA. Similarly if the alpha parameter is set to 0, this operator performs QDA.
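The relationship between the three operators can be sketched with the usual Gaussian discriminant function. The documentation quoted above gives no formulas, so the notation here is an assumption: class k has prior \pi_k, mean \mu_k and covariance \Sigma_k, \Sigma is the pooled covariance of all classes, and the blending convention is chosen only to agree with the alpha behaviour described above. The operator's internal formula may differ.

```latex
\delta_k(x) = -\tfrac{1}{2}\log\lvert\Sigma_k(\alpha)\rvert
              - \tfrac{1}{2}\,(x-\mu_k)^{\top}\,\Sigma_k(\alpha)^{-1}\,(x-\mu_k)
              + \log\pi_k,
\qquad
\Sigma_k(\alpha) = \alpha\,\Sigma + (1-\alpha)\,\Sigma_k
```

An example x is assigned to the class with the largest \delta_k(x). For alpha = 1 every class uses the same \Sigma, the quadratic terms in x cancel between classes and the decision boundaries are linear (LDA). For alpha = 0 each class keeps its own \Sigma_k and the boundaries are quadratic (QDA).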
Classification by Regression (RapidMiner Core)
This operator builds a polynominal classification model through the given regression learner.
The Classification by Regression operator is a nested operator i.e. it has a subprocess. The subprocess must have a regression learner i.e. an operator that generates a regression model. This operator builds a classification model using the regression learner provided in its subprocess.
Here is an explanation of how a classification model is built from a regression learner. For each class i of the given ExampleSet, a regression model is trained after setting the label to +1 if the label is i and to -1 if it is not. Then the regression models are combined into a classification model. This model can be applied using the Apply Model operator. In order to determine the prediction for an unlabeled example, all regression models are applied and the class belonging to the regression model which predicts the greatest value is chosen.
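The scheme described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not RapidMiner's implementation: the regression learner is a one-attribute least-squares line, and the names fit_line, train_classification_by_regression and predict, as well as the toy data, are made up for this example.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b for a single numeric attribute."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def train_classification_by_regression(xs, labels):
    """Train one regression model per class on +1 / -1 targets."""
    models = {}
    for cls in sorted(set(labels)):
        # +1 if the example belongs to this class, -1 otherwise.
        targets = [1.0 if label == cls else -1.0 for label in labels]
        models[cls] = fit_line(xs, targets)
    return models


def predict(models, x):
    """Apply every regression model and pick the class with the greatest output."""
    return max(models, key=lambda cls: models[cls][0] * x + models[cls][1])


# Toy data (made up for illustration): one attribute, two classes.
xs = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
labels = ["neg", "neg", "neg", "pos", "pos", "pos"]
models = train_classification_by_regression(xs, labels)
print(predict(models, 2.5), predict(models, 8.5))
```

With more than two classes the same loop simply produces one regression model per class; only the final arg-max over the model outputs decides the prediction.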
Perceptron (RapidMiner Core)
This operator learns a linear classifier called Single Perceptron which finds a separating hyperplane (if one exists). This operator cannot handle polynominal attributes.
The perceptron is a type of artificial neural network invented in 1957 by Frank Rosenblatt. It can be seen as the simplest kind of feed-forward neural network: a linear classifier. Setting aside all biological analogies, the single layer perceptron is simply a linear classifier which is efficiently trained by a simple update rule: for all wrongly classified data points, the weight vector is either increased or decreased by the corresponding example values.
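The update rule described above can be sketched as follows. This is a minimal textbook-style version, not the RapidMiner operator itself: the parameter names rounds and learning_rate, the +1 / -1 label encoding and the toy data are assumptions made for this example.

```python
def train_perceptron(examples, labels, rounds=100, learning_rate=1.0):
    """Rosenblatt-style perceptron; labels must be +1 or -1."""
    weights = [0.0] * len(examples[0])
    bias = 0.0
    for _ in range(rounds):
        mistakes = 0
        for x, y in zip(examples, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * activation <= 0:  # wrongly classified (or on the boundary)
                # Add the example to the weights for y = +1, subtract it for y = -1.
                weights = [w + learning_rate * y * xi for w, xi in zip(weights, x)]
                bias += learning_rate * y
                mistakes += 1
        if mistakes == 0:  # every example is on the correct side: stop
            break
    return weights, bias


def classify(weights, bias, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else -1


# Toy, linearly separable data (made up for illustration).
examples = [(2.0, 1.0), (3.0, 2.0), (-1.0, -1.5), (-2.0, -0.5)]
labels = [1, 1, -1, -1]
weights, bias = train_perceptron(examples, labels)
print([classify(weights, bias, x) for x in examples])
```

If the data are not linearly separable the loop never reaches a mistake-free pass and simply stops after the given number of rounds, which is why the documentation says the hyperplane is found only "if existent".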
Exercise:
Let us compare how these linear models perform on the pima-indians-diabetes.csv dataset. We will use a simple "Split Validation".
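The idea behind a split validation can be sketched in Python as below. This is a rough illustration rather than what the operator does internally: the function names, the split ratio of 0.7, the shuffling seed and the trivial majority-class learner are all assumptions chosen for this example.

```python
import random


def split_validation(examples, labels, train_fn, predict_fn, split_ratio=0.7, seed=0):
    """Shuffle the data, train on the first part, return accuracy on the rest."""
    indices = list(range(len(examples)))
    random.Random(seed).shuffle(indices)
    cut = round(split_ratio * len(indices))
    train_idx, test_idx = indices[:cut], indices[cut:]
    model = train_fn([examples[i] for i in train_idx], [labels[i] for i in train_idx])
    hits = sum(predict_fn(model, examples[i]) == labels[i] for i in test_idx)
    return hits / len(test_idx)


# Hypothetical stand-in learner: always predicts the most frequent training label.
def train_majority(examples, labels):
    return max(set(labels), key=labels.count)


def predict_majority(model, example):
    return model


# Toy data (made up for illustration): ten examples, all of one class.
examples = [[float(i)] for i in range(10)]
labels = ["a"] * 10
accuracy = split_validation(examples, labels, train_majority, predict_majority)
print(accuracy)
```

Any learner can be plugged in through train_fn and predict_fn, which mirrors how the training and testing subprocesses of the validation operator are filled with different models in the comparison that follows.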
Validation QDA:
Validation RDA:
Validation regression:
The "Classification by Regression" operator contains linear regression (the "Linear Regression" operator) as its subprocess, which is applied to each class separately:
Validation perceptron:
Example results:
LDA
RDA
Linear regression
Perceptron
How do the results change if the operators' parameters are changed? For example:
Exercise:
Repeat the comparison above for the glass classification problem (rename the file glass.data to glass.csv before importing it into RM):
http://archive.ics.uci.edu/ml/machine-learning-databases/glass/
2. Bagging – using bootstrap sampling to build a classifier based on majority voting among simple classifiers.
From the RapidMiner documentation:
Bagging
Bootstrap aggregating (bagging) is a machine learning ensemble meta-algorithm to improve classification and regression models in terms of stability and classification accuracy. It also reduces variance and helps to avoid overfitting. Although it is usually applied to decision tree models, it can be used with any type of model.
The Bagging operator is a nested operator i.e. it has a subprocess. The subprocess must have a learner i.e. an operator that expects an ExampleSet and generates a model. This operator tries to build a better model using the learner provided in its subprocess.
Empirically, ensembles tend to yield better results when there is a significant diversity among the models. Many ensemble methods, therefore, seek to promote diversity among the models they combine. Although perhaps non-intuitive, more random algorithms (like random decision trees) can be used to produce a stronger ensemble than very deliberate algorithms (like entropy-reducing decision trees). Using a variety of strong learning algorithms, however, has been shown to be more effective than using techniques that attempt to dumb-down the models in order to promote diversity.
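The mechanics of bagging (bootstrap samples plus majority voting) can be sketched in plain Python. This is an illustration under stated assumptions, not RapidMiner's implementation: the base learner is a hypothetical one-split decision stump standing in for a full decision tree, and all function names, the seed and the toy data are made up for this example.

```python
import random
from collections import Counter


def train_stump(examples, labels):
    """Hypothetical base learner: best single-threshold rule on one attribute."""
    best = None
    for feature in range(len(examples[0])):
        for threshold in sorted({x[feature] for x in examples}):
            left = [y for x, y in zip(examples, labels) if x[feature] <= threshold]
            right = [y for x, y in zip(examples, labels) if x[feature] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, feature, threshold, left_label, right_label)
    return best[1:]


def predict_stump(stump, x):
    feature, threshold, left_label, right_label = stump
    return left_label if x[feature] <= threshold else right_label


def train_bagging(examples, labels, iterations=11, seed=0):
    """Train one base model per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(examples)
    models = []
    for _ in range(iterations):
        sample = [rng.randrange(n) for _ in range(n)]
        models.append(train_stump([examples[i] for i in sample],
                                  [labels[i] for i in sample]))
    return models


def predict_bagging(models, x):
    """Majority vote over the predictions of all base models."""
    votes = Counter(predict_stump(m, x) for m in models)
    return votes.most_common(1)[0][0]


# Toy data (made up for illustration): one attribute, two classes.
examples = [[1.0], [2.0], [3.0], [7.0], [8.0], [9.0]]
labels = [0, 0, 0, 1, 1, 1]
models = train_bagging(examples, labels)
print(predict_bagging(models, [0.0]), predict_bagging(models, [10.0]))
```

Each bootstrap sample leaves out some examples and repeats others, so the base models differ from one another; the vote then averages out part of that variation. An odd number of iterations is used here only to avoid tied votes in the two-class case.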
"Iterations" – the number of base classifiers, i.e. the number of iterations in which bootstrap training samples are generated. As the base classifier in bagging we set "DecisionTree":
Exercise:
a. Compare whether, and by how much, bagging improves performance relative to a single tree. Compare on the Pima, Sonar and Glass datasets. Try different decision tree parameter settings (also inside the Bagging operator), e.g. turn off prepruning.
b. Does increasing the number of base classifiers in bagging bring an improvement? If so, does this trend "saturate" beyond a certain number?
c. How does bagging perform when the linear classifiers from the previous exercise are used as base classifiers? Compare, e.g., "Perceptron" with "Bagging" built on perceptrons.
Example results comparing "Perceptron" with "Bagging" + "Perceptron" on the Pima Indians dataset:
For "Perceptron":
For "Bagging" + "Perceptron":