(1)

Data mining

Piotr Paszek

Classification

Naive Bayes Classifier

(2)

Bayes Classification Methods

What are Bayesian classifiers?

Bayesian classifiers are statistical classifiers based on Bayes' theorem.

They can predict class membership probabilities such as the probability that a given tuple belongs to a particular class.

Bayesian Classifiers

Naive Bayesian Classifiers

Assume class-conditional independence: the effect of an attribute value on a given class is independent of the values of the other attributes

Bayesian Belief Networks (graphical models)

Allow the representation of dependencies among subsets of attributes


(3)

Bayes’ Theorem

Bayes’ Theorem

P(H|X) = P(X|H) · P(H) / P(X)

Let X be a data sample (to classify); its class label is unknown.

Let H be a hypothesis that X belongs to class C.

Classification is to determine P(H|X) (the "a posteriori" probability): the probability that the hypothesis holds given the observed data sample X.

P(H) ("a priori" probability): the initial probability of the hypothesis.

P(X): the probability that the sample data is observed.

P(X|H) (conditional probability): the probability of observing the sample X, given that the hypothesis holds.
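As a minimal illustration, a Python sketch that plugs made-up probability values (not taken from the slides) into the formula above:

```python
# Bayes' theorem: P(H|X) = P(X|H) * P(H) / P(X)
p_h = 0.3          # P(H): prior probability of the hypothesis (illustrative)
p_x_given_h = 0.8  # P(X|H): likelihood of observing X when H holds (illustrative)
p_x = 0.5          # P(X): overall probability of observing X (illustrative)

p_h_given_x = p_x_given_h * p_h / p_x  # posterior P(H|X)
print(f"P(H|X) = {p_h_given_x:.3f}")   # -> P(H|X) = 0.480
```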

(4)

Classification Using Bayes' Theorem

Let D be a training set of tuples and their associated class labels, where each tuple is represented by an n-attribute vector X = (x1, x2, . . . , xn).

Suppose there are m classes C1, C2, . . . , Cm. Classification is to derive the maximum a posteriori probability, i.e., the maximal P(Ci|X) for i = 1, 2, . . . , m.

From Bayes' theorem: P(Ci|X) = P(X|Ci) · P(Ci) / P(X).

Since P(X) is constant for all classes, only P(X|Ci) · P(Ci) needs to be computed and maximized.

The classifier predicts that the class label of tuple X is the class Ci if and only if

P(X|Ci) · P(Ci) > P(X|Cj) · P(Cj) for 1 ≤ j ≤ m, j ≠ i.
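In code, this decision rule is just an argmax over the classes of likelihood times prior. A minimal sketch, assuming the per-class values P(X|Ci) and P(Ci) have already been computed (the numbers are illustrative placeholders):

```python
# Hypothetical pre-computed values for two classes C1 and C2.
likelihood = {"C1": 0.04, "C2": 0.02}   # P(X|Ci)
prior      = {"C1": 0.64, "C2": 0.36}   # P(Ci)

# Predict the class Ci that maximizes P(X|Ci) * P(Ci).
predicted = max(likelihood, key=lambda c: likelihood[c] * prior[c])
print(predicted)  # -> C1
```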


(5)

Naive Bayes Classifier

Assumption of class-conditional independence

Attributes are conditionally independent given the class (i.e., there is no dependence relation between attributes):

P(X|Ci) = ∏_{j=1..n} P(xj|Ci) = P(x1|Ci) · P(x2|Ci) · ... · P(xn|Ci).

This greatly reduces the computation cost: only the class distribution has to be counted.

If Ak is categorical, P(xk|Ci) is the number of tuples in Ci having value xk for Ak, divided by the number of tuples of Ci in D.

If Ak is continuous-valued, P(xk|Ci) is usually computed from a Gaussian distribution with mean μ and standard deviation σ:

g(x, μ, σ) = (1 / (√(2π) · σ)) · exp(-(x - μ)² / (2σ²)).
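A direct transcription of the density g(x, μ, σ) into Python (a sketch; the function and parameter names simply mirror the formula above):

```python
import math

def gaussian(x, mu, sigma):
    """Gaussian density g(x, mu, sigma), used to estimate P(xk|Ci)
    for a continuous-valued attribute Ak."""
    return (1.0 / (math.sqrt(2 * math.pi) * sigma)) * \
           math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

# Density at x = 1.0 for a class whose attribute values have
# mean 0.0 and standard deviation 1.0 (illustrative numbers).
print(gaussian(1.0, 0.0, 1.0))  # -> about 0.2420
```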

(6)

Naive Bayes Classifier

Maximize P(X|Ci) · P(Ci)

To maximize P(X|Ci) · P(Ci):

– We need to know (compute) the class prior probabilities P(Ci). If the priors are not known, assume that P(C1) = P(C2) = ... = P(Cm), so it suffices to maximize P(X|Ci). The class priors can be estimated by P(Ci) = |Ci,D| / |D| (see the sketch below).

– Assume class-conditional independence to reduce the computational cost of P(X|Ci): given X = {x1, . . . , xn}, P(X|Ci) = ∏_{j=1..n} P(xj|Ci). The probabilities P(x1|Ci), . . . , P(xn|Ci) can be estimated from the training tuples.
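A small sketch of estimating the priors P(Ci) = |Ci,D| / |D| from a list of class labels (the labels below are hypothetical and only illustrate the counting):

```python
from collections import Counter

labels = ["yes", "yes", "no", "yes", "no"]  # hypothetical class labels of D

counts = Counter(labels)
priors = {c: n / len(labels) for c, n in counts.items()}
print(priors)  # -> {'yes': 0.6, 'no': 0.4}
```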


(7)

Naive Bayes Classifier

Estimating P(xk|Ci)

Categorical Attributes

Recall that xk refers to the value of attribute Ak for tuple X.

P(xk|Ci) = |{x ∈ Ci,D : Ak(x) = xk}| / |Ci,D|,

i.e., the number of tuples of class Ci in D having the value xk for Ak, divided by the number of tuples of class Ci in D.

Continuous-Valued Attributes

A continuous-valued attribute is assumed to have a Gaussian (normal) distribution with mean μ and standard deviation σ:

P(xk|Ci) = g(xk, μCi, σCi).

Estimate μCi and σCi as the mean and standard deviation of the values of attribute Ak for the training tuples of class Ci.
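Putting the two estimators together, a sketch that computes P(xk|Ci) for a categorical attribute by counting and fits μCi, σCi for a continuous one (the attribute names and values are made up for illustration):

```python
import statistics

# Hypothetical training tuples of one class Ci.
tuples_ci = [
    {"income": "medium", "age_years": 25},
    {"income": "low",    "age_years": 32},
    {"income": "medium", "age_years": 41},
]

# Categorical attribute: P(income = "medium" | Ci) by relative frequency.
matching = sum(1 for t in tuples_ci if t["income"] == "medium")
p_income_medium = matching / len(tuples_ci)
print(p_income_medium)  # -> 0.666...

# Continuous-valued attribute: fit mu and sigma of "age_years" for class Ci,
# to be plugged into the Gaussian density g(xk, mu_Ci, sigma_Ci).
values = [t["age_years"] for t in tuples_ci]
mu_ci = statistics.mean(values)
sigma_ci = statistics.pstdev(values)
print(mu_ci, sigma_ci)
```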

(8)

Naive Bayes Classifier – Example

ID   age      income   student   status    buys computer
 1   ≤30      high     no        single    no
 2   ≤30      high     no        married   no
 3   31..40   high     no        single    yes
 4   >40      medium   no        single    yes
 5   >40      low      yes       single    yes
 6   >40      low      yes       married   no
 7   31..40   low      yes       married   yes
 8   ≤30      medium   no        single    no
 9   ≤30      low      yes       single    yes
10   >40      medium   yes       single    yes
11   ≤30      medium   yes       married   yes
12   31..40   medium   no        married   yes
13   31..40   high     yes       single    yes
14   >40      medium   no        married   no

Han J., Kamber M., Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, 2006
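For the worked example that follows, the training set can also be written directly as a Python list (a sketch; the string encodings of the attribute values are an assumption, not part of the slides):

```python
# Each tuple: (age, income, student, status, buys_computer)
training_data = [
    ("<=30",   "high",   "no",  "single",  "no"),
    ("<=30",   "high",   "no",  "married", "no"),
    ("31..40", "high",   "no",  "single",  "yes"),
    (">40",    "medium", "no",  "single",  "yes"),
    (">40",    "low",    "yes", "single",  "yes"),
    (">40",    "low",    "yes", "married", "no"),
    ("31..40", "low",    "yes", "married", "yes"),
    ("<=30",   "medium", "no",  "single",  "no"),
    ("<=30",   "low",    "yes", "single",  "yes"),
    (">40",    "medium", "yes", "single",  "yes"),
    ("<=30",   "medium", "yes", "married", "yes"),
    ("31..40", "medium", "no",  "married", "yes"),
    ("31..40", "high",   "yes", "single",  "yes"),
    (">40",    "medium", "no",  "married", "no"),
]
```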


(9)

Naive Bayes Classifier – Example I

The tuple to classify is:

X = (age ≤ 30, income = medium, student = yes, status = single).

Classes:

C1 (buys computer = yes), C2 (buys computer = no).

Maximize:

P(X|Ci) · P(Ci), i = 1, 2.

(10)

Naive Bayes Classifier – Example II

P(C1) = 9/14 = 0.643
P(C2) = 5/14 = 0.357

P(age ≤ 30 | C1) = 2/9 = 0.222
P(age ≤ 30 | C2) = 3/5 = 0.600
P(income = medium | C1) = 4/9 = 0.444
P(income = medium | C2) = 2/5 = 0.400
P(student = yes | C1) = 6/9 = 0.667
P(student = yes | C2) = 1/5 = 0.200
P(status = single | C1) = 6/9 = 0.667
P(status = single | C2) = 2/5 = 0.400

P(X | C1) = 0.222 · 0.444 · 0.667 · 0.667 = 0.044
P(X | C2) = 0.600 · 0.400 · 0.200 · 0.400 = 0.019


(11)

Naive Bayes Classifier – Example III

P(X | C1) = 0.044    P(C1) = 9/14 = 0.643
P(X | C2) = 0.019    P(C2) = 5/14 = 0.357

P(X | C1) · P(C1) = 0.044 · 0.643 = 0.028
P(X | C2) · P(C2) = 0.019 · 0.357 = 0.007

We choose the maximum value of P(X|Ci) · P(Ci), i = 1, 2, so the naive Bayes classifier assigns X to class:

C1 (buys computer = yes).
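The whole example can be reproduced with a short script. A sketch that reuses the `training_data` list shown after the table above (same attribute encoding assumptions):

```python
from collections import Counter

def predict(training_data, x):
    """Naive Bayes: return the class maximizing P(X|Ci) * P(Ci)."""
    labels = [row[-1] for row in training_data]
    scores = {}
    for c, class_count in Counter(labels).items():
        class_rows = [row for row in training_data if row[-1] == c]
        score = class_count / len(training_data)      # prior P(Ci)
        for j, value in enumerate(x):                 # product of P(xj|Ci)
            matches = sum(1 for row in class_rows if row[j] == value)
            score *= matches / class_count
        scores[c] = score
    return max(scores, key=scores.get), scores

x = ("<=30", "medium", "yes", "single")
label, scores = predict(training_data, x)
print(label)   # -> yes
print(scores)  # -> roughly {'no': 0.007, 'yes': 0.028}
```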

(12)

Naive Bayes Classifier: Zero-Probability Problem

From the assumption of class-conditional independence it follows that if some conditional probability equals zero, then the whole predicted probability will be zero.

Example

Suppose a dataset with 1000 tuples: income = low (0), income = medium (990), and income = high (10).

So P(income = low) = 0 and P(income = low | Ci) = 0 for i = 1, 2,

and for any X such that income(X) = low:

P(X|Ci) = ∏_{j=1..n} P(xj|Ci) = 0 for i = 1, 2.

The classifier cannot predict (select) the class label of such a tuple.


(13)

Avoiding the Zero-Probability Problem

Naive Bayesian prediction requires each conditional probability to be non-zero; otherwise, the predicted probability will be zero.

Solution: use the Laplacian correction, i.e., add 1 to each count.

Example

Suppose a dataset with 1000 tuples, income = low (0), income = medium (990), and income = high (10).

After the Laplacian correction:

P(income = low) = 1/1003
P(income = medium) = 991/1003
P(income = high) = 11/1003

The "corrected" probability estimates are close to their "uncorrected" counterparts.
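A sketch of the correction applied to these counts (the counts come from the example above; adding 1 to each count, and the number of distinct values to the denominator, is the standard Laplacian estimate):

```python
# Raw value counts of the income attribute in the example dataset.
counts = {"low": 0, "medium": 990, "high": 10}
total = sum(counts.values())   # 1000 tuples

# Without correction, P(income = low) = 0 zeroes out the whole product P(X|Ci).
uncorrected = {v: n / total for v, n in counts.items()}

# Laplacian correction: add 1 to each count; the denominator grows by the
# number of distinct values (here 3), giving 1003.
k = len(counts)
corrected = {v: (n + 1) / (total + k) for v, n in counts.items()}
print(corrected)  # -> low: 1/1003, medium: 991/1003, high: 11/1003
```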

(14)

Naive Bayes Classifier: Comments

Advantages

Easy to implement

Good results are obtained in most cases

Disadvantages

The assumption of class-conditional independence causes a loss of accuracy

In practice, dependencies exist among variables (e.g., in medical data)

Such dependencies cannot be modeled by the naive Bayes classifier

How to deal with these dependencies?

Bayesian Belief Networks

