Learning Systems, lab 6: PCA

Introduction to RapidMiner, part 5. Michał Bereta

www.michalbereta.pl

1. Attribute transformations (attribute reduction / transformation, feature extraction). Instead of selecting a subset of the original attributes, one can create new attributes from the existing ones. Each new attribute is obtained by some transformation of, in the general case, all the original attributes.

One common approach is to create new attributes as linear combinations of the existing ones. The transformation performed by Fisher's LDA can be viewed in this way.

Another very popular method is PCA (Principal Component Analysis). Its goal is to find linear transformations of the original attributes such that the successive projection directions capture as much of the variability (variance) in the data as possible. Each successive direction is orthogonal to the previous ones.
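The idea can be sketched outside RapidMiner with NumPy. The snippet below is a minimal illustration, not the RM operator itself: it assumes PCA is computed from the eigen-decomposition of the covariance matrix, and the helper names `pca_fit` and `pca_transform` as well as the synthetic data are made up for this example.

```python
import numpy as np

def pca_fit(X):
    """Return (mean, components, variances) for a data matrix X (rows = examples)."""
    mean = X.mean(axis=0)
    Xc = X - mean                              # center every attribute
    cov = np.cov(Xc, rowvar=False)             # covariance matrix of the attributes
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # reorder by decreasing variance
    return mean, eigvecs[:, order].T, eigvals[order]

def pca_transform(X, mean, components, k):
    """Project X onto the first k principal directions."""
    return (X - mean) @ components[:k].T

rng = np.random.default_rng(0)
# Synthetic data (illustrative only): two strongly correlated attributes plus one independent.
base = rng.normal(size=(200, 1))
X = np.hstack([
    base + 0.1 * rng.normal(size=(200, 1)),
    2.0 * base + 0.1 * rng.normal(size=(200, 1)),
    rng.normal(size=(200, 1)),
])

mean, components, variances = pca_fit(X)
Z = pca_transform(X, mean, components, k=2)    # two new attributes

print("variance along each direction:", variances)
print("directions are orthonormal:", np.allclose(components @ components.T, np.eye(3)))
```

Each row of `components` is one projection direction, and the variance of each new attribute in `Z` equals the corresponding entry of `variances`.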

PCA does not take the class membership of the data into account. Even so, it is often a preprocessing step that improves results. It is especially useful when the number of original attributes is very large.

In RapidMiner (RM) we can set how many new attributes we want to obtain. They can be used in place of the original attributes.

Task:

Compare the behavior of the PCA operator on different data sets:

For example, for Iris:

For Glass:

For Pima:

If the goal is data visualization, a common choice is to create 2 or 3 new attributes with PCA. However, the number of new attributes can also be chosen by a criterion that requires the new data to retain a given percentage (e.g. 95%) of the variance present in the original data.
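The variance criterion can be illustrated with a short NumPy sketch. It assumes the retained variance is measured as the cumulative share of the covariance eigenvalues; the function name `components_for_variance` and the synthetic data are hypothetical, and the RM operator may compute the threshold differently.

```python
import numpy as np

def components_for_variance(X, threshold=0.95):
    """Smallest number of principal components whose cumulative
    share of the total variance reaches the threshold."""
    Xc = X - X.mean(axis=0)
    eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]  # descending order
    eigvals = np.clip(eigvals, 0.0, None)      # guard against tiny negative round-off
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(eigvals)), cumulative

rng = np.random.default_rng(0)
# Synthetic data (illustrative only): four independent attributes,
# the first one measured on a much larger scale than the others.
X = rng.normal(size=(500, 4)) * np.array([20.0, 1.0, 1.0, 1.0])

k_raw, cum_raw = components_for_variance(X)

# The same data after standardizing every attribute to unit variance.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
k_std, cum_std = components_for_variance(X_std)

print("components needed, raw data:         ", k_raw)
print("components needed, standardized data:", k_std)
```

In this sketch the attribute with the largest scale dominates the total variance, so the number of components needed for 95% depends strongly on whether the attributes were normalized first.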

For example, for Pima we obtain two new attributes (from the eight original ones):

For Glass, it turns out that a single PCA attribute (out of 10 original ones) already captures >= 95% of the variability of the original data:

Task:

How does PCA cope with completely useless (random) attributes?

The new set of attributes:

We can see that one class is still well separated from the others, while the situation is worse for the remaining two. Nevertheless, despite the large amount of random data, PCA is still able to capture the relevant information. Repeat the above example for other data sets.
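A similar experiment can be sketched in NumPy. Everything in it is an assumption made for illustration: the three-class data are synthetic, the 20 random attributes are drawn uniformly from [0, 1), and PCA is computed from the covariance matrix rather than with the RM operator.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical three-class data: 2 informative attributes, 100 examples per class.
centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
labels = np.repeat(np.arange(3), 100)
informative = centers[labels] + rng.normal(size=(300, 2))

# 20 useless attributes, uniform on [0, 1) and unrelated to the class.
noise = rng.uniform(size=(300, 20))
X = np.hstack([informative, noise])

# PCA via eigen-decomposition of the covariance matrix.
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]
components = eigvecs[:, order].T           # rows = principal directions
Z = Xc @ components[:2].T                  # the two new attributes

# Fraction of each leading direction that lies in the 2 informative attributes.
informative_share = np.sum(components[:2, :2] ** 2, axis=1)
print("share of informative attributes in PC1 and PC2:", informative_share)
```

Here the random attributes have a much smaller variance than the informative ones, so the leading directions are built almost entirely from the informative attributes. Because PCA ignores class labels, random attributes with a larger variance than the informative ones would be favored instead.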

Task:

Test the other available operators for feature transformation:

In particular:

• Independent Component Analysis

From the documentation:

Independent component analysis (ICA) is a very general-purpose statistical technique in which observed random data are linearly transformed into components that are maximally independent from each other, and simultaneously have "interesting" distributions. Such a representation seems to capture the essential structure of the data in many applications, including feature extraction. ICA is used for revealing hidden factors that underlie sets of random variables or measurements. ICA is superficially related to principal component analysis (PCA) and factor analysis. ICA is a much more powerful technique, however, capable of finding the underlying factors or sources when these classic methods fail completely. This operator implements the FastICA-algorithm of A. Hyvärinen and E. Oja. The FastICA-algorithm has most of the advantages of neural algorithms: It is parallel, distributed, computationally simple, and requires little memory space.

• Generalized Hebbian Algorithm

From the documentation:

This operator is an implementation of the Generalized Hebbian Algorithm (GHA) which is an iterative method for computing principal components. The user can specify manually the required number of principal components.
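To give an intuition for the iterative approach, the following is a minimal sketch of the GHA update in the form usually called Sanger's rule. It is not the code of the RM operator; the function name `gha`, the learning rate, the number of epochs, and the synthetic data are all assumptions chosen for this illustration.

```python
import numpy as np

def gha(X, k, lr=0.001, epochs=50, seed=0):
    """Estimate the first k principal directions with Sanger's rule.
    Returns W of shape (k, n_attributes); rows approximate the directions."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                        # GHA expects centered data
    W = 0.1 * rng.normal(size=(k, X.shape[1]))     # small random initial weights
    for _ in range(epochs):
        for x in Xc[rng.permutation(len(Xc))]:     # one example at a time
            y = W @ x                              # current outputs
            # Hebbian term minus a lower-triangular decorrelation term
            W += lr * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)
    return W

rng = np.random.default_rng(1)
# Synthetic data (illustrative only): variances 9, 2.25 and 0.25 along rotated axes.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
X = (rng.normal(size=(500, 3)) * np.array([3.0, 1.5, 0.5])) @ Q.T

W = gha(X, k=2)

# Reference directions from the eigen-decomposition of the covariance matrix.
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
reference = eigvecs[:, ::-1][:, :2].T

# Compare directions up to sign using the absolute cosine.
W_unit = W / np.linalg.norm(W, axis=1, keepdims=True)
cosines = np.abs(np.sum(W_unit * reference, axis=1))
print("agreement with eigenvector PCA (1.0 = same direction):", cosines)
```

The rule never forms the covariance matrix, which is why it is attractive when the number of attributes is large; the price is a learning rate that has to be tuned.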

• PCA (Kernel)

From the documentation:

This operator performs Kernel Principal Component Analysis (PCA) which is a non-linear extension of PCA.

• SVD – Singular Value Decomposition

From the documentation:

Singular Value Decomposition (SVD) can be used to better understand an ExampleSet by showing the number of important dimensions. It can also be used to simplify the ExampleSet by reducing the number of attributes of the ExampleSet. This reduction removes unnecessary attributes that are linearly dependent in the point of view of Linear Algebra. It is useful when you have obtained data on a number of attributes (possibly a large number of attributes), and believe that there is some redundancy in those attributes. In this case, redundancy means that some of the attributes are correlated with one another, possibly because they are measuring the same construct. Because of this redundancy, you believe that it should be possible to reduce the observed attributes into a smaller number of components (artificial attributes) that will account for most of the variance in the observed attributes.

The Principal Component Analysis technique is a specific case of SVD. It is a mathematical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated attributes into a set of values of uncorrelated attributes called principal components. The number of principal components is less than or equal to the number of original attributes. This transformation is defined in such a way that the first principal component's variance is as high as possible (accounts for as much of the variability in the data as possible), and each succeeding component in turn has the highest variance possible under the constraint that it should be orthogonal to (uncorrelated with) the preceding components.
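The relationship between PCA and SVD described above can be checked numerically. The sketch below uses NumPy on synthetic data (an assumption for illustration) and compares the eigen-decomposition of the covariance matrix with the SVD of the centered data matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic correlated data (illustrative only): 100 examples, 4 attributes.
X = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))
Xc = X - X.mean(axis=0)                        # both routes need centered data

# Route 1: eigen-decomposition of the covariance matrix.
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # descending order

# Route 2: SVD of the centered data matrix, Xc = U * diag(s) * Vt.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
svd_variances = s ** 2 / (len(Xc) - 1)         # singular values -> variances

# The rows of Vt are the principal directions (up to sign).
direction_match = np.abs(np.sum(eigvecs.T * Vt, axis=1))

# The new attributes (scores) can be read directly from the SVD.
scores = U * s                                 # same as Xc @ Vt.T

print("variances agree: ", np.allclose(eigvals, svd_variances))
print("directions agree:", np.allclose(direction_match, 1.0, atol=1e-6))
```

The squared singular values divided by n - 1 are the variances of the principal components, and the right singular vectors are the projection directions.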