

Hence, the core CORE(Q) of the set of attributes Q in the information system S = <U, Q, v, f> defines the knowledge K_CORE(Q) = (CORE(Q))*, which cannot be removed by any reduction process that minimizes the size of the original knowledge K_Q = Q* without loss of classification ability. The knowledge K_CORE(Q) is therefore, in a sense, the most relevant part of the knowledge K_Q, and the core itself is the most relevant subset of attributes.

Nevertheless, the core CORE(Q) may be an empty set, in which case the information system S = <U, Q, v, f> has no such essential part of knowledge.

The following relation is true between the notion of core and the reduct of the set of attributes (see Pawlak 1995a):

$$\forall C \subseteq Q:\quad CORE(C) = \bigcap_{C' \in RED(C)} C'. \tag{2.3:10}$$

Theorem 2.3:3 (Core and reduct relationship) The core is included in each reduct.

Proof

Based on (10), the core is the intersection of all reducts. Hence

$$\forall C \subseteq Q,\ \forall C' \in RED(C),\ \forall q \in CORE(C):\quad q \in C', \tag{2.3:11}$$

which completes the proof.
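Theorem 2.3:3 can be checked on a small example. The sketch below is only an illustration under stated assumptions: the four-object table `TABLE` and the helper names `partition`, `reducts` and `core` are hypothetical and do not come from the source. Reducts are enumerated by brute force as the minimal attribute subsets that preserve the partition generated by I(C), and the core is collected from the attributes whose removal changes that partition.

```python
from itertools import combinations

# Hypothetical toy information system: attributes b and c carry the same values.
TABLE = {
    "x1": {"a": 0, "b": 0, "c": 0},
    "x2": {"a": 0, "b": 1, "c": 1},
    "x3": {"a": 1, "b": 0, "c": 0},
    "x4": {"a": 1, "b": 1, "c": 1},
}

def partition(table, attrs):
    """Classes of the indiscernibility relation I(attrs)."""
    classes = {}
    for x, row in table.items():
        classes.setdefault(tuple(row[a] for a in sorted(attrs)), set()).add(x)
    return {frozenset(c) for c in classes.values()}

def reducts(table, C):
    """Minimal subsets C' of C with I(C') = I(C), found by exhaustive search."""
    target = partition(table, C)
    found = []
    for size in range(len(C) + 1):          # smaller subsets are visited first
        for subset in combinations(sorted(C), size):
            s = frozenset(subset)
            # keep s only if no smaller reduct is already contained in it
            if partition(table, s) == target and not any(r < s for r in found):
                found.append(s)
    return found

def core(table, C):
    """Attributes whose removal changes the partition generated by I(C)."""
    target = partition(table, C)
    return frozenset(q for q in C if partition(table, C - {q}) != target)

C = frozenset("abc")
all_reducts = reducts(TABLE, C)
the_core = core(TABLE, C)
print(all_reducts, the_core)
```

On this table either b or c can be dropped, while a belongs to every reduct and is the only element of the core, so the core coincides with the intersection of the reducts.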

Like CRSA, the QDRSA is able to exploit the relative counterparts of many concepts.

The description of relative independence, relative reducts and relative cores will explain this issue in more detail.

Definition 2.3:31 (Relative independence, after Pawlak 1995a, adapted to QDRSA)

In the information system S = <U, Q, v, f> the attribute set C ⊆ Q is relatively independent with respect to the set of attributes R ⊆ Q (i.e. it is R-independent) if for each proper subset P ⊂ C the following inequality is satisfied: Pos_P(R*) ≠ Pos_C(R*), where Pos_P(R*) denotes the positive region of the family R* with respect to the set of attributes P (see Pawlak 1982, 1991). Otherwise, the set of attributes C ⊆ Q is dependent with respect to the set of attributes R ⊆ Q (i.e. it is R-dependent).

Note that for a relatively independent set of attributes C, removing any attribute from this set worsens the quality of classification of the abstract classes generated by the relation I(R) using attributes from C.
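The condition of Definition 2.3:31 can be tested directly on a small table. The following is a minimal sketch, not the author's implementation: the five-object table, the attribute names and the helpers `partition`, `pos`, `proper_subsets` and `is_relatively_independent` are all assumptions introduced for illustration. The positive region Pos_P(R*) is computed as the union of those classes of I(P) that lie entirely inside a single class of I(R).

```python
from itertools import combinations

# Hypothetical toy table; x4 and x5 agree on a, b, c but differ on d.
TABLE = {
    "x1": {"a": 0, "b": 0, "c": 0, "d": 0},
    "x2": {"a": 0, "b": 1, "c": 1, "d": 1},
    "x3": {"a": 1, "b": 0, "c": 0, "d": 1},
    "x4": {"a": 1, "b": 1, "c": 1, "d": 0},
    "x5": {"a": 1, "b": 1, "c": 1, "d": 1},
}

def partition(table, attrs):
    """Classes of the indiscernibility relation I(attrs)."""
    classes = {}
    for x, row in table.items():
        classes.setdefault(tuple(row[a] for a in sorted(attrs)), set()).add(x)
    return {frozenset(c) for c in classes.values()}

def pos(table, P, R):
    """Positive region Pos_P(R*): union of P-classes contained in one R-class."""
    r_classes = partition(table, R)
    return frozenset(x for block in partition(table, P)
                     if any(block <= r for r in r_classes) for x in block)

def proper_subsets(C):
    """All proper subsets of C, including the empty set."""
    items = sorted(C)
    return [frozenset(s) for n in range(len(items)) for s in combinations(items, n)]

def is_relatively_independent(table, C, R):
    """True if every proper subset P of C has Pos_P(R*) != Pos_C(R*)."""
    full = pos(table, C, R)
    return all(pos(table, P, R) != full for P in proper_subsets(C))

print(sorted(pos(TABLE, {"a", "b", "c"}, {"d"})))
print(is_relatively_independent(TABLE, {"a", "b"}, {"d"}))
print(is_relatively_independent(TABLE, {"a", "b", "c"}, {"d"}))
```

Here {a, b} is d-independent because dropping either attribute empties the positive region, whereas {a, b, c} is d-dependent because its proper subset {a, b} already yields the same positive region.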

Lemma 2.3:3 (Independence)

Classical independence of the set of attributes R ⊆ Q is equivalent to the relative C-independence of the set of attributes R ⊆ Q if R = C.

Proof

When R = C, the condition of relative independence can be transformed to the condition of classical independence (I(P) ≠ I(C)):

$$R = C \;\Rightarrow\; \left( Pos_P(R^*) \neq Pos_C(R^*) \;\Leftrightarrow\; Pos_P(C^*) \neq Pos_C(C^*) \right). \tag{2.3:12}$$

Thus the generalized notion of relative independence becomes the classical notion of independence, which completes the proof.
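Lemma 2.3:3 can also be confirmed exhaustively on a small table. The sketch below assumes a hypothetical four-object table and hypothetical helper names; it compares, for every subset C of Q, the classical test (I(P) ≠ I(C) for each proper subset P) with the relative test taken with R = C.

```python
from itertools import combinations

# Hypothetical toy information system with Q = {a, b, c}.
TABLE = {
    "x1": {"a": 0, "b": 0, "c": 0},
    "x2": {"a": 0, "b": 1, "c": 1},
    "x3": {"a": 1, "b": 0, "c": 0},
    "x4": {"a": 1, "b": 1, "c": 1},
}

def partition(table, attrs):
    """Classes of the indiscernibility relation I(attrs)."""
    classes = {}
    for x, row in table.items():
        classes.setdefault(tuple(row[a] for a in sorted(attrs)), set()).add(x)
    return {frozenset(c) for c in classes.values()}

def pos(table, P, R):
    """Positive region Pos_P(R*): union of P-classes contained in one R-class."""
    r_classes = partition(table, R)
    return frozenset(x for block in partition(table, P)
                     if any(block <= r for r in r_classes) for x in block)

def subsets(attrs, proper=False):
    """All subsets of attrs; with proper=True the full set is left out."""
    items = sorted(attrs)
    top = len(items) if proper else len(items) + 1
    return [frozenset(s) for n in range(top) for s in combinations(items, n)]

def classically_independent(table, C):
    """I(P) != I(C) for every proper subset P of C."""
    full = partition(table, C)
    return all(partition(table, P) != full for P in subsets(C, proper=True))

def relatively_independent(table, C, R):
    """Pos_P(R*) != Pos_C(R*) for every proper subset P of C."""
    full = pos(table, C, R)
    return all(pos(table, P, R) != full for P in subsets(C, proper=True))

Q = frozenset("abc")
agreement = {C: classically_independent(TABLE, C) == relatively_independent(TABLE, C, C)
             for C in subsets(Q)}
print(all(agreement.values()))
```

The two tests agree because Pos_C(C*) is always the whole universe, and Pos_P(C*) covers the whole universe only when P generates the same partition as C.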

Definition 2.3:32 (Relative reduct, after Pawlak 1995a, adapted to QDRSA)

A set C' ⊆ C is called the relative reduct of C with respect to R (R-reduct of C) if C' is an R-independent subset of C and Pos_C(R*) = Pos_C'(R*), or, equivalently, if C' is a maximal (in the sense of set inclusion) R-independent subset of C.
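Definition 2.3:32 suggests a simple brute-force search for relative reducts on small tables. The sketch below is an illustration only: the table and the names `partition`, `pos` and `relative_reducts` are hypothetical. It relies on the observation that an R-independent subset with the same positive region as C is the same thing as a minimal subset preserving Pos_C(R*), so subsets are visited in order of increasing size and any superset of an already found reduct is skipped.

```python
from itertools import combinations

# Hypothetical toy table; x4 and x5 agree on a, b, c but differ on d.
TABLE = {
    "x1": {"a": 0, "b": 0, "c": 0, "d": 0},
    "x2": {"a": 0, "b": 1, "c": 1, "d": 1},
    "x3": {"a": 1, "b": 0, "c": 0, "d": 1},
    "x4": {"a": 1, "b": 1, "c": 1, "d": 0},
    "x5": {"a": 1, "b": 1, "c": 1, "d": 1},
}

def partition(table, attrs):
    """Classes of the indiscernibility relation I(attrs)."""
    classes = {}
    for x, row in table.items():
        classes.setdefault(tuple(row[a] for a in sorted(attrs)), set()).add(x)
    return {frozenset(c) for c in classes.values()}

def pos(table, P, R):
    """Positive region Pos_P(R*): union of P-classes contained in one R-class."""
    r_classes = partition(table, R)
    return frozenset(x for block in partition(table, P)
                     if any(block <= r for r in r_classes) for x in block)

def relative_reducts(table, C, R):
    """Minimal subsets C' of C with Pos_C'(R*) = Pos_C(R*)."""
    target = pos(table, C, R)
    found = []
    for size in range(len(C) + 1):          # smaller subsets are visited first
        for subset in combinations(sorted(C), size):
            s = frozenset(subset)
            if pos(table, s, R) == target and not any(r < s for r in found):
                found.append(s)
    return found

C, R = frozenset("abc"), frozenset("d")
reds = relative_reducts(TABLE, C, R)
print(reds)
```

Exhaustive enumeration is exponential in the number of attributes, so this form is suitable only for illustrating the definition on very small tables.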

Theorem 2.3:4 (Generality of the relative reduct)

The relative reduct is a generalization of the classical reduct.

Proof

If R = C, then the R-reduct of the set C becomes the C-reduct of the set C. From Lemma 2.3:3 it follows that independence of the set C is equivalent to the relative C-independence of the same set C. Therefore, the C-reduct of the set C becomes the reduct of the set C. Hence, the classical reduct is a special case of the relative reduct, which completes the proof.

A set of attributes can have more than one relative reduct. Consider the family of all R-reducts of the set C ⊆ Q, denoted by RED_R(C). Then it follows that

$$\forall C \subseteq Q,\ \forall R \subseteq Q,\ \forall C' \in RED_R(C):\quad Pos_{C'}(R^*) = Pos_C(R^*), \tag{2.3:13}$$

and

$$\forall C \subseteq Q,\ \forall R \subseteq Q,\ \forall C' \in RED_R(C):\quad k_{C'}(R) = k_C(R). \tag{2.3:14}$$

Definition 2.3:33 (Relative irremovability, after Pawlak 1995a, adapted to QDRSA)

In the information system S = <U, Q, v, f> with C ⊆ Q and R ⊆ Q, the attribute q ∈ C is relatively redundant in C (relatively removable from C) with respect to R (R-redundant or R-removable) when Pos_C(R*) = Pos_{C-{q}}(R*). The attribute q ∈ C is relatively irremovable from C with respect to R (R-irremovable) when Pos_C(R*) ≠ Pos_{C-{q}}(R*).

Lemma 2.3:4 (Irremovability)

Relative C-irremovability of the attribute q from the set C is equivalent to classical irremovability of q from C.

Proof

When R = C, the condition of R-irremovability of the attribute q from the set C becomes the condition I(C) ≠ I(C-{q}) of the classical irremovability of q from C:

$$R = C \;\Rightarrow\; \left( Pos_{C-\{q\}}(R^*) \neq Pos_C(R^*) \;\Leftrightarrow\; Pos_{C-\{q\}}(C^*) \neq Pos_C(C^*) \right), \tag{2.3:15}$$

and the notion of R-irremovability of the attribute q from the set C becomes the classical notion of irremovability, which completes the proof.

Definition 2.3:34 (Relative core, after Pawlak 1995a, adapted to QDRSA)

The relative core CORE_R(C) of the set of attributes C with respect to R (R-core of the set C) is defined as the set of all R-irremovable attributes from the set of attributes C:

$$CORE_R(C) = \left\{ q \in C : Pos_C(R^*) \neq Pos_{C-\{q\}}(R^*) \right\}. \tag{2.3:16}$$
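Definition 2.3:34 translates into a single pass over the attributes of C. The sketch below is a hedged illustration: the table and the names `partition`, `pos` and `relative_core` are assumptions made for this example, not part of the source.

```python
# Hypothetical toy table; x4 and x5 agree on a, b, c but differ on d.
TABLE = {
    "x1": {"a": 0, "b": 0, "c": 0, "d": 0},
    "x2": {"a": 0, "b": 1, "c": 1, "d": 1},
    "x3": {"a": 1, "b": 0, "c": 0, "d": 1},
    "x4": {"a": 1, "b": 1, "c": 1, "d": 0},
    "x5": {"a": 1, "b": 1, "c": 1, "d": 1},
}

def partition(table, attrs):
    """Classes of the indiscernibility relation I(attrs)."""
    classes = {}
    for x, row in table.items():
        classes.setdefault(tuple(row[a] for a in sorted(attrs)), set()).add(x)
    return {frozenset(c) for c in classes.values()}

def pos(table, P, R):
    """Positive region Pos_P(R*): union of P-classes contained in one R-class."""
    r_classes = partition(table, R)
    return frozenset(x for block in partition(table, P)
                     if any(block <= r for r in r_classes) for x in block)

def relative_core(table, C, R):
    """All q in C whose removal changes the positive region Pos_C(R*)."""
    full = pos(table, C, R)
    return frozenset(q for q in C if pos(table, C - {q}, R) != full)

C, R = frozenset("abc"), frozenset("d")
r_core = relative_core(TABLE, C, R)
print(r_core)
```

Only a is d-irremovable here: removing b or c leaves the positive region {x1, x2, x3} intact, while removing a empties it.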

Theorem 2.3:5 (Generality of the relative core)

The relative core is a generalization of the classical core.

Proof

If R = C, then the R-core of the set C becomes the C-core of the set C. Based on Lemma 2.3:4, the classical irremovability of the attribute q from the set C is equivalent to the relative C-irremovability of the same attribute from the set C. Hence, the relative C-core of the set C becomes the core of the set C, which proves that the classical core is a special case of the relative core.

In summary, as in CRSA, in QDRSA the relative reduct and the relative core are generalizations of the reduct and the core, respectively, and they rely on relative dependence and independence of attributes. Furthermore, the R-core and the R-reducts satisfy the formula (Pawlak 1985a):

$$CORE_R(C) = \bigcap_{C' \in RED_R(C)} C'. \tag{2.3:17}$$

Note that, since the intersection of R-reducts can be an empty set, the relative core of a set of attributes may be empty.
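A two-object example is enough to show an empty relative core. In the sketch below (a hypothetical table and hypothetical helper names, introduced only for illustration) the attributes a and b are copies of each other, so each of them alone is a d-reduct and their intersection is empty. The core is computed twice, once from the definition of R-irremovability and once as the intersection of the R-reducts.

```python
from itertools import combinations

# Hypothetical toy table in which a and b carry the same information.
TABLE = {
    "x1": {"a": 0, "b": 0, "d": 0},
    "x2": {"a": 1, "b": 1, "d": 1},
}

def partition(table, attrs):
    """Classes of the indiscernibility relation I(attrs)."""
    classes = {}
    for x, row in table.items():
        classes.setdefault(tuple(row[a] for a in sorted(attrs)), set()).add(x)
    return {frozenset(c) for c in classes.values()}

def pos(table, P, R):
    """Positive region Pos_P(R*): union of P-classes contained in one R-class."""
    r_classes = partition(table, R)
    return frozenset(x for block in partition(table, P)
                     if any(block <= r for r in r_classes) for x in block)

def relative_reducts(table, C, R):
    """Minimal subsets C' of C with Pos_C'(R*) = Pos_C(R*)."""
    target = pos(table, C, R)
    found = []
    for size in range(len(C) + 1):
        for subset in combinations(sorted(C), size):
            s = frozenset(subset)
            if pos(table, s, R) == target and not any(r < s for r in found):
                found.append(s)
    return found

def relative_core(table, C, R):
    """All q in C whose removal changes the positive region Pos_C(R*)."""
    full = pos(table, C, R)
    return frozenset(q for q in C if pos(table, C - {q}, R) != full)

C, R = frozenset("ab"), frozenset("d")
reds = relative_reducts(TABLE, C, R)
core_by_intersection = frozenset.intersection(*reds)
core_by_definition = relative_core(TABLE, C, R)
print(reds, core_by_intersection, core_by_definition)
```

Both computations return the empty set, which matches the remark that no single attribute is essential when every attribute has a substitute.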

The generalizations of the classical notions of the core and the reduct presented above are relevant in classification problems, when the information system S = <U, Q, v, f> becomes the decision table T = <U, C, D, v, f> (see Mrózek 1992a) by letting Q = C ∪ D, i.e. by separating the conditional and decision attributes (C and D, respectively). In fact, some special cases of these notions are particularly important; however, these special cases are different from those that reduce to the classical notions. This is explained in detail below.

In the analysis of decision tables (and therefore in classification problems such as those considered in section 2.4 and section 4.3.3), the following special case of the notions defined for the information system S = <U, Q, v, f> is used. Consider two sets of attributes C, D ⊆ Q such that C ∩ D = ∅ and C ∪ D = Q. Then S = <U, Q, v, f> becomes the decision table T = <U, C, D, v, f>, and all conclusions concerning the reduction of the size of knowledge contained in the information system can be used to minimize the number of conditional attributes C in classification problems.

More precisely, for the given decision table T = <U, C, D, v, f>, the D-core of the set of conditional attributes C, denoted as CORE_D(C), constitutes the most essential set of attributes from the classification point of view. It includes all those attributes which cannot be removed without reducing the determinism level C (D*) of the decision table T. On the other hand, a D-reduct of the set of conditional attributes C defines in the decision table T = <U, C, D, v, f> a set of conditional attributes C' ∈ RED_D(C) which generates the new decision table T' = <U, C', D, v, f>, derived from the original table T by cutting C to C' and such that T' is equivalent to T in terms of the decision rules covered.
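The equivalence of T and T' can be illustrated by reading deterministic decision rules off both tables. The sketch below rests on several assumptions that are not in the source: the toy decision table, the choice of C' = {a, b} as one D-reduct of that table, and the reading of "decision rules covered" as the deterministic rules obtained from condition classes inside the positive region. The helpers `partition`, `rules` and `classify` are hypothetical names.

```python
# Hypothetical decision table T with C = {a, b, c} and D = {d};
# x4 and x5 are inconsistent (same conditions, different decisions).
TABLE = {
    "x1": {"a": 0, "b": 0, "c": 0, "d": 0},
    "x2": {"a": 0, "b": 1, "c": 1, "d": 1},
    "x3": {"a": 1, "b": 0, "c": 0, "d": 1},
    "x4": {"a": 1, "b": 1, "c": 1, "d": 0},
    "x5": {"a": 1, "b": 1, "c": 1, "d": 1},
}

def partition(table, attrs):
    """Classes of the indiscernibility relation I(attrs)."""
    classes = {}
    for x, row in table.items():
        classes.setdefault(tuple(row[a] for a in sorted(attrs)), set()).add(x)
    return {frozenset(c) for c in classes.values()}

def rules(table, C, D):
    """Deterministic rules: condition values on C mapped to decision values on D,
    taken only from C-classes that lie inside a single D-class."""
    d_classes = partition(table, D)
    out = {}
    for block in partition(table, C):
        if any(block <= d for d in d_classes):
            row = table[next(iter(block))]
            out[tuple(row[a] for a in sorted(C))] = tuple(row[a] for a in sorted(D))
    return out

def classify(rule_set, attrs, row):
    """Decision assigned to a row by the rule set, or None if no rule applies."""
    return rule_set.get(tuple(row[a] for a in sorted(attrs)))

C, C_REDUCED, D = frozenset("abc"), frozenset("ab"), frozenset("d")
rules_T = rules(TABLE, C, D)
rules_T_reduced = rules(TABLE, C_REDUCED, D)
decisions_T = {x: classify(rules_T, C, row) for x, row in TABLE.items()}
decisions_T_reduced = {x: classify(rules_T_reduced, C_REDUCED, row)
                       for x, row in TABLE.items()}
print(decisions_T == decisions_T_reduced)
```

The reduced table uses shorter conditions but assigns the same decisions to the same objects, and it leaves the inconsistent objects x4 and x5 unclassified exactly as the original table does.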

While the notions presented above are defined in CRSA and QDRSA, after the equivalence relation is replaced with a tolerance relation (and, therefore, abstract classes are replaced with dominance cones) these notions can be incorporated into DRSA without loss of their general meaning. However, there are also notions defined in CRSA and QDRSA which cannot be used in DRSA in their usual sense. In fact, the existence of concepts which cannot be directly incorporated into DRSA inspired the author to propose QDRSA.

This latter model uses the information about the preference order in attribute values (as in DRSA) but, contrary to DRSA, this information is incorporated in a way that preserves the equivalence relation, and therefore such concepts as relative value reducts, defined below, can be efficiently utilized.

It follows that in QDRSA (but not in DRSA) the information system S = <U, Q, v, f> can be simplified further by eliminating the value of a particular attribute for some elements of the universe (without eliminating the attribute itself from S) in such a way that the classification ability is not reduced. The notions used in this type of knowledge reduction are analogous to the notions used in the reduction of redundant attributes.

Definition 2.3:35 (Irremovability for given element, after Pawlak 1995a, adapted to QDRSA)

In the information system S = <U, Q, v, f> with C ⊆ Q, the value of the attribute q ∈ C is removable for the element x ∈ U if and only if [x]_I(C) = [x]_I(C-{q}). Otherwise the value of the attribute q is irremovable for x.
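The per-element condition of Definition 2.3:35 compares two equivalence classes of the same object. The sketch below is a minimal illustration with a hypothetical three-object table and hypothetical helpers `eq_class` and `value_removable`; it shows that the value of one attribute can be removable for some objects and irremovable for others.

```python
# Hypothetical toy information system with C = {a, b}.
TABLE = {
    "x1": {"a": 0, "b": 0},
    "x2": {"a": 0, "b": 1},
    "x3": {"a": 1, "b": 0},
}

def eq_class(table, x, attrs):
    """Equivalence class [x]_I(attrs): objects matching x on every attribute."""
    return frozenset(y for y, row in table.items()
                     if all(row[a] == table[x][a] for a in attrs))

def value_removable(table, C, q, x):
    """True if [x]_I(C) equals [x]_I(C - {q})."""
    return eq_class(table, x, C) == eq_class(table, x, set(C) - {q})

C = {"a", "b"}
report = {(q, x): value_removable(TABLE, C, q, x)
          for q in sorted(C) for x in sorted(TABLE)}
print(report)
```

For x3 the value of b can be dropped because a = 1 already singles this object out, whereas for x1 and x2 the value of b is needed to tell them apart.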

Definition 2.3:36 (Independence for given element, after Pawlak 1995a, adapted to QDRSA)