RSDM3

N/A
N/A
Protected

Academic year: 2021

(1)

Rough sets in Discretization

Nguyen Hung Son

This presentation was prepared on the basis of the following public materials:

1. Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", http://www.cs.sfu.ca
2. Gregory Piatetsky-Shapiro, "KDnuggets", http://www.kdnuggets.com/data_mining_course/

(2)

preprocessing 2

Outline

• Classification of discretization methods

• Rough set and Boolean approach to discretization

  ◦ Problem encoding

  ◦ MD-Heuristics

(3)

Classification of discretization methods

1. Local versus Global methods:

• Local methods produce partitions that are applied to localized regions of the object space (e.g. decision trees).

• Global methods produce a mesh over the k-dimensional real space, where each attribute's value set is partitioned into intervals independently of the other attributes.

2. Static versus Dynamic methods:

• Static methods perform one discretization pass for each attribute and determine the maximal number of cuts for this attribute independently of the others.

• Dynamic methods search through the family of all possible cuts for all attributes simultaneously.

3. Supervised versus Unsupervised methods:

(4)

Discernibility by cuts

• Let S = (U, A ∪ {d}) be a given decision table.

• We say that a cut (a, c) on an attribute a discerns a pair of objects (x, y) if

(a(x) − c)(a(y) − c) < 0

• Two objects are discernible by a set of cuts C if they are discerned by at least one cut from C.
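A minimal Python sketch of the discernibility test above. The function names are hypothetical, objects are assumed to be dicts mapping attribute names to real values, and the set-of-cuts version assumes the usual reading that a pair is discernible by C when at least one cut in C discerns it.

```python
def cut_discerns(ax: float, ay: float, c: float) -> bool:
    """True when the cut value c lies strictly between a(x) and a(y)."""
    return (ax - c) * (ay - c) < 0


def discernible_by_cuts(x: dict, y: dict, cuts) -> bool:
    """True when at least one cut (attribute, value) in `cuts` discerns x and y.

    Assumption: 'discernible by a set of cuts' means 'discerned by some cut'.
    """
    return any(cut_discerns(x[a], y[a], c) for a, c in cuts)
```

Note that a cut whose value equals a(x) or a(y) gives a product of zero, so it does not discern the pair.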

(5)

Consistent set of cuts

• A set of cuts C is consistent with S (or S-consistent, for short) if and only if for any pair of objects (x, y) such that dec(x) ≠ dec(y), the following condition holds:

IF x, y are discernible by A THEN x, y are discernible by C.
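The consistency condition can be checked directly by enumerating pairs. The sketch below is illustrative only: `is_consistent` and its argument layout are assumptions, "discernible by A" is taken to mean that the two objects differ on at least one condition attribute, and "discernible by C" that some cut in C separates them.

```python
from itertools import combinations


def discerned(x, y, cuts):
    """True when some cut (attribute, value) lies strictly between x and y."""
    return any((x[a] - c) * (y[a] - c) < 0 for a, c in cuts)


def is_consistent(objects, decisions, attributes, cuts):
    """Check that every pair with different decisions that differs on some
    attribute in `attributes` is also discerned by `cuts`."""
    for i, j in combinations(range(len(objects)), 2):
        if decisions[i] == decisions[j]:
            continue  # only pairs from different decision classes matter
        x, y = objects[i], objects[j]
        differ_on_a = any(x[a] != y[a] for a in attributes)
        if differ_on_a and not discerned(x, y, cuts):
            return False
    return True
```

This brute-force check inspects all pairs, so it takes time quadratic in the number of objects; it is meant to clarify the definition, not to be efficient.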

(6)

(7)

Boolean reasoning approach to discretization

• Boolean variable
• Encoding function
• MD heuristics

(8)

Boolean variable

• C is a set of candidate cuts defined either

  ◦ by an expert/user, or

  ◦ by taking all generic cuts

• We associate with each cut (a, c) ∈ C a Boolean variable p_(a,c)
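One way to build the candidate set when no expert supplies it is sketched below. The slides do not define "generic cuts" here, so the sketch assumes a common choice: for each attribute, one cut at the midpoint between every two consecutive distinct values. The function name `generic_cuts` is hypothetical.

```python
def generic_cuts(objects, attributes):
    """Return candidate cuts (attribute, value), one per gap between
    consecutive distinct values of each attribute.

    Assumption: a generic cut is placed at the midpoint of each gap.
    """
    cuts = []
    for a in attributes:
        values = sorted({obj[a] for obj in objects})
        cuts.extend((a, (lo + hi) / 2) for lo, hi in zip(values, values[1:]))
    return cuts
```

Each returned pair (a, c) then indexes one Boolean variable p_(a,c), which is true exactly when the cut is selected.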

(9)

Encoding function

• For any pair of objects u_i, u_j ∈ U:

• Discernibility function for two objects
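The formula on this slide did not survive extraction. Under the usual Boolean reasoning encoding (an assumption here, not recovered from the slide), the discernibility function of a pair is the disjunction of the variables p_(a,c) over all candidate cuts that discern the pair. The sketch represents that disjunction as a set of cuts; both function names are hypothetical.

```python
def pair_clause(ui, uj, candidate_cuts):
    """Cuts whose Boolean variables appear in the clause for (ui, uj).

    Assumption: the clause is the disjunction of p_(a,c) over all
    candidate cuts (a, c) that discern the two objects.
    """
    return frozenset(
        (a, c) for a, c in candidate_cuts if (ui[a] - c) * (uj[a] - c) < 0
    )


def clause_satisfied(clause, chosen_cuts):
    """The disjunction holds when at least one of its cuts is chosen."""
    return any(cut in chosen_cuts for cut in clause)
```

A set of chosen cuts then discerns the pair exactly when it satisfies the pair's clause.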

(10)

(11)
(12)
(13)

MD-heuristics

• A supervised, dynamic discretization method

• Quality of a cut = number of pairs discerned by this cut

• Both local and global versions are possible

• Global version may have high time complexity (O(n³k) per cut)

• Time complexity can be reduced by using additional
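A naive sketch of the global greedy strategy described above: repeatedly pick the cut that discerns the most still-undiscerned pairs. It assumes that only pairs from different decision classes are counted, and that pairs no candidate cut can separate are ignored; `md_heuristic` and its argument layout are hypothetical. It recomputes every cut's quality from scratch, so it illustrates the expensive version, not the improved one.

```python
from itertools import combinations


def md_heuristic(objects, decisions, candidate_cuts):
    """Greedily choose cuts until every separable conflicting pair is discerned."""

    def discerns(cut, pair):
        a, c = cut
        i, j = pair
        return (objects[i][a] - c) * (objects[j][a] - c) < 0

    # Pairs with different decisions that at least one candidate cut can discern.
    remaining = {
        (i, j)
        for i, j in combinations(range(len(objects)), 2)
        if decisions[i] != decisions[j]
        and any(discerns(cut, (i, j)) for cut in candidate_cuts)
    }
    chosen = []
    while remaining:
        # Quality of a cut = number of remaining pairs it discerns.
        best = max(
            candidate_cuts,
            key=lambda cut: sum(discerns(cut, p) for p in remaining),
        )
        chosen.append(best)
        remaining = {p for p in remaining if not discerns(best, p)}
    return chosen
```

The loop terminates because every remaining pair is discerned by some candidate, so the best cut always removes at least one pair.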

(14)
(15)
(16)

Improved algorithm

• DTree: a modified decision tree structure for discretization.

• Possible operations:

  ◦ Init(S): initializes the data structure for the given decision table;
  ◦ Conflict(): returns the number of pairs of undiscerned objects;
  ◦ GetBestCut(): returns the best cut point with respect to the discernibility measure;
  ◦ InsertCut(a, c): inserts the cut (a, c) and updates the data structure.

• Init(S) requires O(nk log n)
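To make the four operations concrete, here is a naive stand-in with the same interface. It is not the DTree structure from the slides and does not reach the stated complexity: it keeps a plain list of blocks and rescans them. The class name, the explicit candidate-cut list, and the reading of Conflict() as "pairs with different decisions in the same block" are assumptions; cut values are assumed never to equal an attribute value.

```python
from collections import Counter


class NaiveDTree:
    """Naive illustration of the Init/Conflict/GetBestCut/InsertCut interface."""

    def __init__(self, objects, decisions, candidate_cuts):  # role of Init(S)
        self.objects = objects
        self.decisions = decisions
        self.candidate_cuts = list(candidate_cuts)
        self.blocks = [list(range(len(objects)))]  # no cuts inserted yet

    def _block_conflict(self, block):
        # Pairs with different decisions = (n^2 - sum of squared class sizes) / 2.
        counts = Counter(self.decisions[i] for i in block)
        n = len(block)
        return (n * n - sum(k * k for k in counts.values())) // 2

    def conflict(self):
        return sum(self._block_conflict(b) for b in self.blocks)

    def _split(self, cut):
        a, c = cut
        parts = []
        for block in self.blocks:
            left = [i for i in block if self.objects[i][a] < c]
            right = [i for i in block if self.objects[i][a] >= c]
            parts.extend(p for p in (left, right) if p)
        return parts

    def get_best_cut(self):
        # The cut leaving the fewest conflicts discerns the most new pairs.
        return min(
            self.candidate_cuts,
            key=lambda cut: sum(self._block_conflict(b) for b in self._split(cut)),
        )

    def insert_cut(self, cut):
        self.blocks = self._split(cut)
```

A greedy discretization loop would call get_best_cut() and insert_cut() until conflict() stops decreasing.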

(17)
(18)

Properties of MD-heuristics

• Boundary cuts

• Discretization problem in R² still remains NP-hard

• Local MD-heuristics for discretization → decision tree
