Lecture notes: Naive Bayes classifier (c) Marcin Sydow


Naive Bayes

We assume in this lecture that all the attributes are nominal (categorical).

The training set T consists of N observations, each being an n-dimensional vector of (nominal) attributes.

We treat each attribute Xi and the decision attribute Y as random variables.

The goal is to classify a vector x = (x1, x2, ..., xn).

We apply the Bayes formula:

P(Y = y | X = x) = P(X = x | Y = y) P(Y = y) / P(X = x)

(interpretation: the probability that the decision attribute Y is equal to y, conditioned on the fact that the attribute vector to be classified is represented by the vector x)
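As a quick illustration with hypothetical numbers: if P(X = x | Y = y) = 0.3, P(Y = y) = 0.5 and P(X = x) = 0.25, the formula gives P(Y = y | X = x) = 0.3 ∗ 0.5 / 0.25 = 0.6.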


Bayes classification rule

We classify the vector x to the class y (the value of the decision attribute) for which the above Bayes probability is maximal. Thus, we compute this probability for all possible classes/categories y (values of the variable Y) and select the value y giving the maximal value of the probability

P(Y = y | X = x)

Since all the compared probabilities have the same denominator P(X = x), it can be omitted in the computations.
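This rule can be sketched in a few lines of Python (a minimal sketch; the functions likelihood and prior are hypothetical placeholders for the probability estimates discussed below):

    # Bayes classification rule: pick the class y maximizing
    # P(X = x | Y = y) * P(Y = y); the common denominator P(X = x)
    # is omitted, since it does not change which class wins.
    def classify(x, classes, likelihood, prior):
        """likelihood(x, y) ~ P(X = x | Y = y); prior(y) ~ P(Y = y)."""
        return max(classes, key=lambda y: likelihood(x, y) * prior(y))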


Naive Bayes classifier

The key assumption of naive Bayes classification is the naive assumption that, given the class, all the attributes are conditionally independent random variables, so that:

P(X = (x1, ..., xn) | Y = y) = P(X1 = x1 | Y = y) ∗ ... ∗ P(Xn = xn | Y = y)

Thus, due to independence we obtain:

P(Y = y | X = (x1, ..., xn)) ∝ P(X1 = x1 | Y = y) ∗ ... ∗ P(Xn = xn | Y = y) ∗ P(Y = y)

where the probabilities can be estimated directly from the training set:

P(Xi = xi | Y = y) (the fraction of observations in the training set T that have the value of the attribute Xi = xi among all the observations that have the value of the decision attribute Y = y)

P(Y = y) (the fraction of observations in the training set T that have the value of the decision attribute Y = y)
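A compact sketch of these count-based estimates in Python (illustrative names, not the lecture's code); the resulting prior and likelihood plug directly into the classify sketch above:

    from collections import Counter, defaultdict

    def fit(T, labels):
        """Estimate P(Y = y) and P(Xi = xi | Y = y) by counting.
        T: list of attribute vectors (tuples of nominal values);
        labels: corresponding values of the decision attribute Y."""
        N = len(T)
        class_count = Counter(labels)        # observations with Y = y
        cond_count = defaultdict(Counter)    # per class: observations with Xi = xi
        for x, y in zip(T, labels):
            for i, xi in enumerate(x):
                cond_count[y][(i, xi)] += 1

        prior = {y: c / N for y, c in class_count.items()}  # P(Y = y)

        def likelihood(x, y):
            # naive product of the per-attribute conditional probabilities
            p = 1.0
            for i, xi in enumerate(x):
                p *= cond_count[y][(i, xi)] / class_count[y]
            return p

        return prior, likelihood

For example, classify(x, list(prior), likelihood, prior.get) then returns the predicted class. Note that an attribute value never observed together with class y makes the whole product zero, which is exactly the problem addressed by smoothing below.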


Smoothing

It may happen that in the training set T there is no observation that satisfies Xj = xj and Y = y for some attribute j.

In such a case, the estimate of the probability P(Xj = xj | Y = y) from the training set T would be equal to 0 and would make the whole product of probabilities zero, independently of the values of all the other probabilities P(Xi = xi | Y = y).

To avoid this problem, the technique of smoothing can be applied. It consists of ensuring that even in such a case the probability will be non-zero, i.e. it will be substituted by some small, positive value. This is achieved by borrowing (decreasing) part of the value from all the other non-zero probability estimates for this attribute.


Simple smoothing

A simple implementation of the idea of smoothing is as follows. We modify the ratio representing the probability by adding 1 to the numerator and adding the number of different values of this attribute to the denominator.

In this way, all the conditional probabilities for this attribute sum up to 1, and a zero estimate of the probability is avoided even if such a case is not present in the training set T.
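A minimal sketch of this add-one estimate (commonly known as Laplace smoothing) in Python; the argument names are illustrative counts taken from the training set T:

    def smoothed_estimate(count_xi_and_y, count_y, n_values):
        """Smoothed estimate of P(Xi = xi | Y = y).
        count_xi_and_y: observations with Xi = xi and Y = y;
        count_y: observations with Y = y;
        n_values: number of different values of the attribute Xi."""
        return (count_xi_and_y + 1) / (count_y + n_values)

In particular, a value of Xi never seen together with class y now receives probability 1 / (count_y + n_values) instead of 0.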

