Application of the Dominance-based Rough Set Approach to Ranking and Similarity-based Classification Problems

(1)


Marcin Szeląg

Institute of Computing Science, Poznań University of Technology, 60-965 Poznań, Poland

19.4.2011

(2)

Outline

1 Introduction

General Motivations

Basic Definitions

Multi-Criteria Ranking Problem

Multi-Attribute Similarity-Based Classification Problem

2 Methodology for Solving Multi-criteria Ranking Problems

Preference Information

Dominance-based Rough Set Approach

Decision Rules

Application of Decision Rules

Ranking Procedures

3 Methodology for Solving Similarity-based Classification Problems

Preference Information

Dominance-based Rough Set Approach

Decision Rules

Application of Decision Rules

Illustrative Example

(3)

General Motivations

Multi-criteria ranking and multi-attribute classification are important, non-trivial, and practical problems.

The main difficulty lies in aggregating different and usually conflicting criteria/attributes; such aggregation is usually performed arbitrarily, using weights or aggregation operators such as sum, average, or distance metrics.

There is a need for a multi-criteria/multi-attribute modeling method that allows domain knowledge to be included, can handle possible inconsistencies in data, and avoids any aggregation operators.

Case-based reasoning is a natural way in which people solve problems. It is a process of solving new problems based on the solutions of similar problems from the past.

(4)

Information and Decision Table

An information table is defined by a set of objects 𝐴.

Objects are described by a set 𝐺 of criteria and/or regular attributes. If set 𝐺 is divided into two disjoint subsets of conditions 𝐶 and decisions 𝐷, then the information table is called a decision table.

A criterion is an attribute with values ordered according to a scale of preference introduced (by a decision maker) as a part of domain knowledge.

We distinguish two types of criteria:

ordinal, with values expressed on ordinal scale,

(5)

Multi-Criteria Ranking Problem

A multi-criteria ranking problem is a decision problem in which a set of objects (alternatives) 𝐴 described by a set of criteria (and regular attributes) 𝐺 has to be ordered, either completely or partially.

(6)

Multi-Attribute Similarity-Based Classification Problem

We are given a set of objects $A$ described in terms of condition attributes from set $C$, and a set of decision classes $D$. Each decision class $d_j \in D$, $j \in \{1, \ldots, |D|\}$, is a fuzzy set with membership function $\mu_{d_j} : A \to [0, 1]$. For a given object $x \in A$, the value $\mu_{d_j}(x)$ specifies the graded membership of object $x$ in decision class $d_j$.

A similarity function is given for each condition attribute $c_i \in C$, $i \in \{1, \ldots, |C|\}$, and some or all objects from set $A$ are marked as reference objects (“cases”).

The task is to build a similarity-based model capable of fuzzy classification, i.e., one that can assign an appropriate membership value for each decision class $d_j \in D$ to new (test) objects.

(7)

Motivations for Application of DRSA

Both problems can be effectively solved using the Dominance-based Rough Set Approach (DRSA), introduced by Greco, Matarazzo and Słowiński in 1996, which:

can handle inconsistencies in data (preprocessing), resulting, e.g., from imprecise or incomplete information,

takes into account domain knowledge:

domains of attributes, i.e., sets of values that an attribute may take while being meaningful for the user’s perception,

division of attributes into condition and decision attributes,

preference order in domains of attributes and semantic correlation between attributes, both addressed by the dominance principle,

works with heterogeneous attributes – nominal, ordinal and cardinal (no need for discretization),

makes it possible to infer a decision rule model (disaggregation-aggregation paradigm).

(8)

Motivations for Using Decision Rule Model

Advantages of decision rules:

comprehensible form of knowledge representation,

can represent any function (more general than utility functions or binary relations),

“resistant” to irrelevant attributes,

do not require aggregation operators,

support “backtracking”,

(9)

Methodology for Solving Multi-criteria Ranking Problems

The only objective information concerning set 𝐴 of objects is the dominance relation. However, this relation usually leaves many objects incomparable.

Thus, additional preference information in terms of pairwise comparisons of some objects from 𝐴 has to be elicited from the decision maker (DM).

This information is used to induce a preference model in terms of a set of “if . . . then . . . ” decision rules.

After acceptance by the DM, this model can be used to build a ranking (complete or partial) of all objects from set 𝐴.

(10)

Pairwise Comparison Table (PCT)

The PCT is created by pairwise comparisons of some objects from $A$. Let $B \subseteq A \times A$ denote the set of pairs of compared objects.

For each pair of objects $(x, y) \in B$, the DM is asked whether:

“$x$ is at least as good as $y$” (i.e., $x$ outranks $y$, denoted by $x S y$), or

“$x$ is NOT at least as good as $y$” (i.e., $x$ does not outrank $y$, denoted by $x S^c y$).

Alternatively, the DM may specify a ranking (weak order) on some $A^R \subseteq A$.

For cardinal criteria ($\in G^N \subseteq G$), differences of evaluations are stored. For ordinal criteria ($\in G^O \subseteq G$) and regular attributes

(11)

Pairwise Comparison Table (PCT)

Exemplary PCT, created for cardinal criteria:

(12)

Dominance Relation for Pairs of Objects

Let $C = C^N \cup C^O \cup C^A$, where $C^N$ is the set of cardinal criteria, $C^O$ is the set of ordinal criteria, and $C^A$ is the set of regular attributes.

Dominance on cardinal criteria from $P^N \subseteq C^N$:

$$(x, y) D_{P^N} (w, z) \Leftrightarrow \forall g_i \in P^N : \Delta_i(x, y) \succeq \Delta_i(w, z).$$

Dominance on ordinal criteria from $P^O \subseteq C^O$:

$$(x, y) D_{P^O} (w, z) \Leftrightarrow \forall g_i \in P^O : g_i(x) \succeq g_i(w) \wedge g_i(y) \preceq g_i(z).$$

Indifference on regular attributes from $P^A \subseteq C^A$:

$$(x, y) I_{P^A} (w, z) \Leftrightarrow \forall g_i \in P^A : g_i(x) = g_i(w) \wedge g_i(y) = g_i(z).$$

For a nonempty set $P \subseteq C$, $P = P^N \cup P^O \cup P^A$:

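$$(x, y) D_P (w, z) \Leftrightarrow (x, y) D_{P^N} (w, z) \wedge (x, y) D_{P^O} (w, z) \wedge (x, y) I_{P^A} (w, z).$$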
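The dominance test for pairs of objects can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's implementation: all criteria are assumed to be gain-type and numerically encoded, the difference of evaluations is assumed to be $\Delta_i(x, y) = g_i(x) - g_i(y)$, and the function name, the car data and the criterion names are hypothetical.

```python
def pair_dominates(pair_a, pair_b, cardinal=(), ordinal=(), regular=()):
    """Return True if pair_a = (x, y) dominates pair_b = (w, z).

    Each object is a dict mapping a criterion name to a number.
    Assumption: gain-type criteria, ordinal levels encoded as integers.
    """
    (x, y), (w, z) = pair_a, pair_b
    # Cardinal criteria: the difference for (x, y) must be at least that for (w, z).
    cardinal_ok = all(x[g] - y[g] >= w[g] - z[g] for g in cardinal)
    # Ordinal criteria: x at least as good as w, and y at most as good as z.
    ordinal_ok = all(x[g] >= w[g] and y[g] <= z[g] for g in ordinal)
    # Regular attributes: evaluations must coincide.
    regular_ok = all(x[g] == w[g] and y[g] == z[g] for g in regular)
    return cardinal_ok and ordinal_ok and regular_ok


# Hypothetical cars; comfort is encoded as low = 1, medium = 2, high = 3.
a = {"max_speed": 200, "comfort": 3}
b = {"max_speed": 160, "comfort": 1}
c = {"max_speed": 180, "comfort": 2}
d = {"max_speed": 170, "comfort": 2}

forward = pair_dominates((a, b), (c, d), cardinal=["max_speed"], ordinal=["comfort"])
backward = pair_dominates((c, d), (a, b), cardinal=["max_speed"], ordinal=["comfort"])
```

Here the pair (a, b) dominates (c, d) because its speed difference is larger, a is at least as comfortable as c, and b is at most as comfortable as d; the reverse does not hold.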

(13)

Dominance Relation for Pairs of Objects

Dominance on ordinal criteria from $P^O \subseteq C^O$:

$$(x, y) D_{P^O} (w, z) \Leftrightarrow \forall g_i \in P^O : g_i(x) \succeq g_i(w) \wedge g_i(y) \preceq g_i(z).$$

Example of dominance on a single (gain-type) ordinal criterion $g_i$:

(14)

Dominance Cones

For a pair of objects $(x, y)$ and a set of condition attributes $P \subseteq C$:

$P$-positive dominance cone:

$$D_P^+(x, y) = \{(w, z) \in B : (w, z) D_P (x, y)\},$$

$P$-negative dominance cone:

$$D_P^-(x, y) = \{(w, z) \in B : (x, y) D_P (w, z)\}.$$

(15)

Approximation of Outranking and Non-Outranking Relations

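Definitions of $P$-lower approximations:

$$\underline{P}(S) = \{(x, y) \in B : D_P^+(x, y) \subseteq S\}, \qquad \underline{P}(S^c) = \{(x, y) \in B : D_P^-(x, y) \subseteq S^c\}.$$

Definitions of $P$-upper approximations:

$$\overline{P}(S) = \bigcup_{(x, y) \in S} D_P^+(x, y), \qquad \overline{P}(S^c) = \bigcup_{(x, y) \in S^c} D_P^-(x, y).$$

Definitions of $P$-boundaries:

$$Bn_P(S) = \overline{P}(S) - \underline{P}(S), \qquad Bn_P(S^c) = \overline{P}(S^c) - \underline{P}(S^c).$$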
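The approximations above translate almost literally into set operations. The following Python sketch is only an illustration: it assumes that every compared pair belongs either to $S$ or to $S^c$, it takes the dominance test as a function argument, and the pair labels and the single-criterion toy data are hypothetical.

```python
def positive_cone(p, B, dominates):
    """Pairs in B that dominate pair p."""
    return {q for q in B if dominates(q, p)}


def negative_cone(p, B, dominates):
    """Pairs in B that are dominated by pair p."""
    return {q for q in B if dominates(p, q)}


def rough_approximations(B, S, dominates):
    """Lower/upper approximations and boundaries of S and S^c."""
    Sc = B - S  # assumption: each compared pair is in S or in S^c
    lower_S = {p for p in B if positive_cone(p, B, dominates) <= S}
    lower_Sc = {p for p in B if negative_cone(p, B, dominates) <= Sc}
    upper_S = set().union(*(positive_cone(p, B, dominates) for p in S))
    upper_Sc = set().union(*(negative_cone(p, B, dominates) for p in Sc))
    return {
        "lower_S": lower_S, "upper_S": upper_S, "bn_S": upper_S - lower_S,
        "lower_Sc": lower_Sc, "upper_Sc": upper_Sc, "bn_Sc": upper_Sc - lower_Sc,
    }


# Hypothetical PCT: one cardinal criterion, value = difference of evaluations.
DELTA = {"ab": (3,), "cd": (1,), "ef": (2,)}


def toy_dominates(p, q):
    return all(dp >= dq for dp, dq in zip(DELTA[p], DELTA[q]))


B = set(DELTA)
S = {"ab", "cd"}  # the DM stated outranking for these pairs; "ef" is in S^c
approx = rough_approximations(B, S, toy_dominates)
```

In this toy table the pair "cd" is inconsistent: it is in $S$ although the dominating pair "ef" is in $S^c$, so both pairs end up in the boundary.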

(16)

Variable-Consistency DRSA (VC-DRSA)

Extending lower approximations with objects that are almost consistent can resolve some of the difficulties encountered in the analysis of real problems.

Błaszczyński, Greco, Słowiński and Szeląg (2009) proposed the following parameterized definitions of monotonic $P$-lower approximations of $S$ and $S^c$:

$$\underline{P}^{\alpha_S}(S) = \{(x, y) \in S : \epsilon^P_S(x, y) \le \alpha_S\},$$

$$\underline{P}^{\alpha_{S^c}}(S^c) = \{(x, y) \in S^c : \epsilon^P_{S^c}(x, y) \le \alpha_{S^c}\},$$

where $P \subseteq C$, $x, y \in A$, $\alpha_S, \alpha_{S^c} \in [0, 1]$, and the cost-type object consistency measures $\epsilon^P_S(x, y)$ and $\epsilon^P_{S^c}(x, y)$ are given by:

$$\epsilon^P_S(x, y) = \frac{|D_P^+(x, y) \cap S^c|}{|S^c|}, \qquad \epsilon^P_{S^c}(x, y) = \frac{|D_P^-(x, y) \cap S|}{|S|}.$$
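A small Python sketch may help to see how the consistency thresholds relax the lower approximations. It is an illustration under assumptions: the five labeled pairs, their single-criterion values and the function names are hypothetical, both $S$ and $S^c$ are assumed to be nonempty, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Hypothetical PCT with one gain-type criterion; a larger value dominates.
DELTA = {"p1": 5, "p2": 4, "p3": 3, "p4": 2, "p5": 1}
B = set(DELTA)
S = {"p1", "p2", "p4"}   # pairs declared as outranking
SC = B - S               # pairs declared as non-outranking


def dominates(p, q):
    return DELTA[p] >= DELTA[q]


def eps_S(p):
    """Share of S^c that falls into the positive cone of p."""
    cone = {q for q in B if dominates(q, p)}
    return Fraction(len(cone & SC), len(SC))


def eps_Sc(p):
    """Share of S that falls into the negative cone of p."""
    cone = {q for q in B if dominates(p, q)}
    return Fraction(len(cone & S), len(S))


def vc_lower_S(alpha):
    return {p for p in S if eps_S(p) <= alpha}


def vc_lower_Sc(alpha):
    return {p for p in SC if eps_Sc(p) <= alpha}
```

With threshold 0 the result coincides with the classical lower approximation; raising the threshold to 1/2 admits the almost consistent pair "p4" into the lower approximation of $S$.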

(17)

Decision Rules

Decision rules are induced in order to generalize preference information contained in the PCT (i.e., discover logical patterns).

Pairs of objects $(x, y)$ from lower approximations of $S$ and $S^c$ are the basis for induction of certain rules.

Pairs of objects $(x, y)$ from upper approximations of $S$ and $S^c$ are the basis for induction of possible rules.

Only minimal decision rules are considered. A decision rule assigning to $S$ (resp. $S^c$) is minimal if there is no other rule assigning to $S$ (resp. $S^c$) that has weaker conditions.

Decision rules constitute a preference model of the DM.

(18)

Decision Rules

Exemplary certain decision rule for the outranking relation $S$:

If $\Delta_{maxSpeed}(x, y) \ge 40 \wedge comfort(x) \ge medium \wedge comfort(y) \le low$ then $x S y$.

Exemplary possible decision rule for the non-outranking relation $S^c$:

If $\Delta_{maxSpeed}(x, y) \le 20 \wedge comfort(x) \le high \wedge comfort(y) \ge medium$ then possibly $x S^c y$.

As can be seen above, decision rules make use of ordinal properties of criteria only.

(19)

Decision Rules

Two sets of certain/possible decision rules can be considered:

1 minimal set of rules,

2 exhaustive set of rules.

Thus, a question arises: which decision rules resulting from lower/upper approximations of $S$ and $S^c$ should be considered as a preference model?

The minimal set of certain/possible rules is non-unique. The choice of such a set is arbitrary and non-trivial.

On the other hand, explicit induction of an exhaustive set of certain/possible rules is computationally hard and results in a large number of rules.

(20)

Decision Rules

Two proposed approaches:

Induction of a minimal set of rules by the VC-DomLEM algorithm. This algorithm handles induction of certain and possible rules in DRSA as well as induction of certain rules in VC-DRSA.

Use of an implicit (virtual) exhaustive set of (robust) rules, resulting from adaptation of the idea of a dominance-based rough set classifier without induction of decision rules (Dembczyński, Pindur, Susmaga (2003)).

(21)

Implicit Exhaustive Set of Decision Rules

There is no need to explicitly generate an exhaustive set of certain/possible rules.

For example, in order to verify whether there exists at least one certain rule for relation $S$ that covers pair $(e_1, e_2) \in A \times A$, it is enough to check whether there exists at least one pair of objects $(x, y) \in \underline{P}(S)$ that contributes to assigning pair $(e_1, e_2)$ to $S$.

Let $\delta[(e_1, e_2), (x, y)] = \{g_i \in P : (e_1, e_2) D_{g_i} (x, y)\}$, where $P \subseteq C$ (set of compatible criteria).

Pair $(x, y) \in \underline{P}(S)$ contributes to assigning pair $(e_1, e_2)$ to $S$ $\Leftrightarrow$ $\forall (w, z) \notin \underline{P}(S) : (w, z) \notin D_\delta^+(x, y)$.

(22)

(23)

Application of Decision Rules

Application of certain/possible rules to all pairs of objects from $A \times A$ yields a preference structure (graph) in $A$, denoted by $\mathcal{G}$.

For each pair of objects $(x, y) \in A \times A$, if there exists at least one rule concluding $x S y$, then in $\mathcal{G}$ there will be an $S$-arc between $x$ and $y$.

For each pair of objects $(x, y) \in A \times A$, if there exists at least one rule concluding $x S^c y$, then in $\mathcal{G}$ there will be an $S^c$-arc between $x$ and $y$.

(24)

The 4-valued Outranking Relation

For each ordered pair of objects $(x, y) \in A \times A$, one of four situations may arise from the application of decision rules:

$x S y$ and $\neg x S^c y$, that is true outranking $x S^T y$,

$\neg x S y$ and $x S^c y$, that is false outranking $x S^F y$,

$x S y$ and $x S^c y$, that is contradictory outranking $x S^K y$,

$\neg x S y$ and $\neg x S^c y$, that is unknown outranking $x S^U y$.

The 4-valued outranking relation underlines the presence and the absence of positive and negative reasons for outranking.

(25)

Representation of the 4-valued Outranking Relation

According to Greco, Matarazzo, Słowiński and Tsoukiàs (1998), the 4-valued outranking relation can be faithfully represented by only one valued binary relation. This relation is denoted by $\hat{R}_{4v}$ and defined as:

$$\hat{R}_{4v}(x, y) = \begin{cases} 0 & \text{if } x S^F y, \\ \tfrac{1}{2} & \text{if } x S^U y \text{ or } x S^K y, \\ 1 & \text{if } x S^T y. \end{cases}$$

Binary relation $\hat{R}_{4v}$ on $A$ can be exploited using one of the ranking procedures (methods) proposed in the literature for exploitation of a valued (fuzzy) binary relation, which yield a linear ranking (weak order) or a partial preorder on $A$.

Exploitation of the $\hat{R}_{4v}$ relation constitutes the last step of the proposed methodology for solving multi-criteria ranking problems.
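The mapping from rule conclusions to the four outranking situations, and then to the single valued relation, can be sketched as follows. The function names and the one-letter state codes are illustrative assumptions; the two boolean inputs stand for "some rule concludes $x S y$" and "some rule concludes $x S^c y$".

```python
from fractions import Fraction


def outranking_state(has_s_rule, has_sc_rule):
    """Classify an ordered pair as T (true), F (false), K (contradictory) or U (unknown)."""
    if has_s_rule and not has_sc_rule:
        return "T"
    if has_sc_rule and not has_s_rule:
        return "F"
    if has_s_rule and has_sc_rule:
        return "K"
    return "U"


# Values of the single valued relation for each of the four situations.
R4V_VALUE = {
    "F": Fraction(0),
    "U": Fraction(1, 2),
    "K": Fraction(1, 2),
    "T": Fraction(1),
}


def r4v(has_s_rule, has_sc_rule):
    return R4V_VALUE[outranking_state(has_s_rule, has_sc_rule)]
```

Note that contradictory and unknown outranking receive the same value, so the valued relation records only the balance of positive and negative reasons.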

(26)

Literature Review of Ranking Procedures

Desirable properties of ranking procedures (explained, e.g., by Bouyssou and Vincke (1997)):

1 neutrality, 2 strict monotonicity, 3 independence of circuits, 4 faithfulness, 5 ordinality, 6 greatest faithfulness.

Considered ranking procedures for a valued binary relation:

yielding a linear ranking (weak order) on $A$: Net Flow Rule, Min in Favor, (Downward) Iterated Min in Favor,

(27)

Desirable Properties of Ranking Procedures

Property / Procedure        NFR   MiF   It. MiF   L/E
neutrality                  T     T     T         T
strict monotonicity         T     F     F         T
independence of circuits    T     F     F         F
faithfulness                T     F     T         T
ordinality                  F     T     T         F
greatest faithfulness       F     T     T         –

where: T = presence of the given property, F = lack of the given property, – = property does not apply.

All four ranking procedures yield final ranking that respects the dominance relation on set 𝐴.

(28)

Lack of the “respect of data” property

In the case of objects incomparable w.r.t. the dominance relation on $A$, it is possible that the final ranking does not preserve some pairwise comparisons given by the DM. For example, the DM may say $x S y$, but in the final ranking $x$ is ranked lower than $y$.

We have to accept the lack of the “respect of data” property, since:

we obtain a transitive relation (ranking), starting from non-transitive relations $S$ and $S^c$,

we generalize preference information concerning a small set of objects onto a larger set of objects,

we have only contextual information, i.e., pairwise comparisons.

(29)

Lack of the “respect of data” property

[Figure: all ordered pairs of distinct objects from {A, B, C, D, E}, plotted against axes d(x) and d(y).]

(30)

Net Flow Rule

According to Greco, Matarazzo, Słowiński and Tsoukiàs (1998), the NFR is the only ranking procedure which is neutral, strictly monotonic and independent of circuits.

Using the above theorem and the presented table of properties, we choose the Net Flow Rule since it satisfies most of the considered desirable properties.

NFR involves calculation of the following net flow score for each $x \in A$:

$$S^{NF}(x) = \sum_{y \in A - \{x\}} \left[ \hat{R}_{4v}(x, y) - \hat{R}_{4v}(y, x) \right].$$

(31)

Net Flow Rule

Instead of switching to $\hat{R}_{4v}$ and calculating $S^{NF}$, it is possible to obtain the same ranking directly from the preference structure $\mathcal{G}$. Let us define two auxiliary existence indices:

$$E(x S y) = I(\text{there exists a rule covering } (x, y) \text{ and concluding } x S y),$$

$$E(x S^c y) = I(\text{there exists a rule covering } (x, y) \text{ and concluding } x S^c y).$$

The following score induces the same linear ranking as $S^{NF}$:

$$S(x) = \sum_{y \in A : y \neq x} \left[ E(x S y) - E(y S x) + E(y S^c x) - E(x S^c y) \right].$$
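The score can be computed directly from the arcs of the preference graph. The sketch below is illustrative only: arcs are represented as sets of ordered pairs, the three objects and their arcs are hypothetical, and the net flow score is assumed to take the usual form, the sum over $y$ of $\hat{R}_{4v}(x, y) - \hat{R}_{4v}(y, x)$, with $\hat{R}_{4v}$ written in the closed form $(1 + E(xSy) - E(xS^cy))/2$, which reproduces the values 0, 1/2 and 1 of its case definition.

```python
from fractions import Fraction


def exists(arcs, a, b):
    """Existence index: 1 if there is an arc from a to b, else 0."""
    return int((a, b) in arcs)


def rule_based_scores(objects, s_arcs, sc_arcs):
    """Score S(x) computed directly from S-arcs and S^c-arcs."""
    scores = {}
    for x in objects:
        total = 0
        for y in objects:
            if y == x:
                continue
            total += (exists(s_arcs, x, y) - exists(s_arcs, y, x)
                      + exists(sc_arcs, y, x) - exists(sc_arcs, x, y))
        scores[x] = total
    return scores


def net_flow_scores(objects, s_arcs, sc_arcs):
    """Net flow score over the valued relation (assumed usual NFR form)."""
    def r4v(a, b):
        return Fraction(1 + exists(s_arcs, a, b) - exists(sc_arcs, a, b), 2)
    return {x: sum(r4v(x, y) - r4v(y, x) for y in objects if y != x)
            for x in objects}


# Hypothetical preference graph on three objects.
objects = ["a", "b", "c"]
s_arcs = {("a", "b"), ("a", "c"), ("b", "c")}
sc_arcs = {("b", "a"), ("c", "a"), ("c", "b"), ("b", "c")}

scores = rule_based_scores(objects, s_arcs, sc_arcs)
nfr = net_flow_scores(objects, s_arcs, sc_arcs)
ranking = sorted(objects, key=lambda o: scores[o], reverse=True)
```

Under these assumptions each score is exactly twice the corresponding net flow score, which is why both induce the same ranking.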

(32)

Methodology for Solving Similarity-based Classification Problems

(33)

Preference Information

The first step consists in the creation of similarity tables, one for each reference object $x \in REF(A)$. At this stage, the chosen marginal similarity functions are used to calculate marginal similarities w.r.t. the chosen reference objects.

Different marginal similarity functions can be used, depending on the domain $V_{c_i}$ of attribute $c_i \in C$, $i \in \{1, \ldots, |C|\}$. The minimal requirement that each such function $\sigma_{c_i} : A \times A \to [0, 1]$ must satisfy is that for all $x, y \in A$,

$$\sigma_{c_i}(y, x) = 1 \Leftrightarrow y \text{ and } x \text{ have the same value of attribute } c_i, \quad c_i \in C.$$

Marginal similarity functions create a similarity space.

(34)

Marginal Similarity Functions

Numeric attribute $c_i$ with values on an interval or ratio scale: similarity is defined using a mathematical function, e.g.:

$$\sigma_{c_i} = 1 - \frac{|c_i(x) - c_i(y)|}{\max_{v_i \in V_{c_i}} v_i - \min_{v_i \in V_{c_i}} v_i}, \qquad \sigma_{c_i} = \frac{1}{|c_i(x) - c_i(y)| + 1}, \qquad \sigma_{c_i} = \frac{1}{(c_i(x) - c_i(y))^2 + 1}, \qquad \ldots$$

Attribute $c_i$ with nominal values: similarity is defined using a
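For numeric attributes, the three example functions can be written down directly. This is a minimal sketch with hypothetical function and parameter names; `lo` and `hi` stand for the minimum and maximum of the attribute domain and are assumed to differ.

```python
def sim_range(value_y, value_x, lo, hi):
    """1 minus the absolute difference normalized by the domain range."""
    return 1 - abs(value_x - value_y) / (hi - lo)


def sim_abs(value_y, value_x):
    """1 / (absolute difference + 1)."""
    return 1 / (abs(value_x - value_y) + 1)


def sim_squared(value_y, value_x):
    """1 / (squared difference + 1)."""
    return 1 / ((value_x - value_y) ** 2 + 1)


# Each function returns 1 exactly when both objects share the attribute value.
same = (sim_range(3, 3, 0, 8), sim_abs(3, 3), sim_squared(3, 3))
# Attribute values 1 and 4 differ by 3.
example = (sim_range(1, 4, 0, 8), sim_abs(1, 4), sim_squared(1, 4))
```

All three satisfy the minimal requirement stated earlier: similarity equals 1 only for identical attribute values, and it decreases as the values move apart.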

(35)

Similarity Table

Example for the “fuzzy” IRIS problem: part of the similarity table created for reference object $x$ no. 36 (5.1,3.4,1.5,0.2|1.0,0.5,0.4), with $\sigma_{c_i} = \frac{1}{|c_i(x) - c_i(y)| + 1}$, for $i \in \{1, \ldots, 4\}$:

(36)

Dominance Relation for Pairs of Objects

The dominance relation between pairs of objects $(x, y)$ and $(w, z)$, w.r.t. a set of condition attributes $P \subseteq C$, is defined as:

(37)

Dominance Cones in the Similarity Space

Given $P \subseteq C$ and $y, x \in A$, let:

$P$-positive dominance cone $D_P^+(y, x) = \{w \in A : (w, x) D_P (y, x)\}$,

$P$-negative dominance cone $D_P^-(y, x) = \{w \in A : (y, x) D_P (w, x)\}$.

In the pair $(y, x)$, $x$ is considered to be a reference object, while $y$ is called a limit object, because it conditions the membership of $w$ in $D_P^+(y, x)$ or $D_P^-(y, x)$.

The cones are calculated in the similarity space.
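A short Python sketch shows how such cones can be obtained from a similarity table. It rests on two assumptions that are not spelled out in the surviving text: pair $(w, x)$ is taken to dominate pair $(y, x)$ when $w$ is at least as similar to $x$ as $y$ is on every attribute in $P$, and the similarity values below are hypothetical numbers for one reference object and two attributes.

```python
from fractions import Fraction as F

# Hypothetical marginal similarities (sigma_c1, sigma_c2) to a reference object x.
SIM = {
    "y1": (F(1, 4), F(1)),
    "y2": (F(1, 3), F(1)),
    "x": (F(1), F(1)),
    "y3": (F(1, 2), F(1, 4)),
    "y4": (F(1, 5), F(1, 2)),
}


def pair_dominates(w, y):
    """Assumed dominance: w is at least as similar to x as y on every attribute."""
    return all(sw >= sy for sw, sy in zip(SIM[w], SIM[y]))


def positive_cone(y):
    """Objects whose pair with x dominates the pair (y, x)."""
    return {w for w in SIM if pair_dominates(w, y)}


def negative_cone(y):
    """Objects whose pair with x is dominated by the pair (y, x)."""
    return {w for w in SIM if pair_dominates(y, w)}
```

Because the reference object is maximally similar to itself, it belongs to every positive cone, and its own negative cone contains all objects.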

(38)

Problem Decomposition

Each decision class (fuzzy set) is considered separately from the other classes. In the following, we consider decision class $d_j \in D$, characterized by the membership function $\mu_{d_j} : A \to [0, 1]$.

(39)

Dominance principle

“The more similar object $y$ is to object $x$ w.r.t. the considered attributes, the closer $y$ is to $x$ in terms of membership in a given decision class $d_j$”.

(40)

Comprehensive Closeness Relations

Given fuzzy set $d_j \in D$, we define two kinds of binary comprehensive closeness relations on $A$:

$$y \succsim_\alpha x \Leftrightarrow \mu(y) \in [\min(\alpha, \mu(x)), \max(\alpha, \mu(x))], \tag{1}$$

$$y \precsim_\alpha x \Leftrightarrow \alpha \in [\min(\mu(y), \mu(x)), \max(\mu(y), \mu(x))], \tag{2}$$

where $y, x \in A$ and parameter $\alpha \in [0, 1]$.

When $y \succsim_\alpha x$, the value $\mu(y)$ lies between $\alpha$ and $\mu(x)$.

On the other hand, when $y \precsim_\alpha x$, $\alpha$ lies between $\mu(y)$ and $\mu(x)$.
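As a small worked illustration, assume a reference object with $\mu(x) = 0.5$ and take $\alpha = 0.3$ (both values chosen only for this example). The two relations then reduce to:

$$y \succsim_{0.3} x \Leftrightarrow \mu(y) \in [0.3, 0.5], \qquad y \precsim_{0.3} x \Leftrightarrow 0.3 \in [\min(\mu(y), 0.5), \max(\mu(y), 0.5)] \Leftrightarrow \mu(y) \le 0.3.$$

An object with $\mu(y) = 0.4$ therefore satisfies the first relation but not the second, while an object with $\mu(y) = 0.3$ satisfies both.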

(41)

Approximated Sets of Objects

We are interested in characterizing sets of objects $y \in A$ being in either kind of comprehensive closeness relation with reference object $x \in A$. Thus, we define:

$\alpha$-upward set:

$$S_\alpha^{\succsim}(x) = \{y \in A : y \succsim_\alpha x\}, \tag{3}$$

$\alpha$-downward set:

$$S_\alpha^{\precsim}(x) = \{y \in A : y \precsim_\alpha x\}. \tag{4}$$

Sets $S_\alpha^{\succsim}(x)$ and $S_\alpha^{\precsim}(x)$ are to be approximated using dominance cones $D_P^+$ and $D_P^-$ in the similarity space.

(42)

Rough Approximation of 𝛼-upward/downward Set

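Definitions of $P$-lower approximations:

$$\underline{P}(S_\alpha^{\succsim}(x)) = \{y \in A : D_P^+(y, x) \subseteq S_\alpha^{\succsim}(x)\}, \tag{5}$$

$$\underline{P}(S_\alpha^{\precsim}(x)) = \{y \in A : D_P^-(y, x) \subseteq S_\alpha^{\precsim}(x)\}. \tag{6}$$

Definitions of $P$-upper approximations:

$$\overline{P}(S_\alpha^{\succsim}(x)) = \{y \in A : D_P^-(y, x) \cap S_\alpha^{\succsim}(x) \neq \emptyset\}, \tag{7}$$

$$\overline{P}(S_\alpha^{\precsim}(x)) = \{y \in A : D_P^+(y, x) \cap S_\alpha^{\precsim}(x) \neq \emptyset\}. \tag{8}$$

Definitions of $P$-boundaries:

$$Bn_P(S_\alpha^{\succsim}(x)) = \overline{P}(S_\alpha^{\succsim}(x)) - \underline{P}(S_\alpha^{\succsim}(x)), \tag{9}$$

$$Bn_P(S_\alpha^{\precsim}(x)) = \overline{P}(S_\alpha^{\precsim}(x)) - \underline{P}(S_\alpha^{\precsim}(x)). \tag{10}$$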

(43)

Decision Rules

Decision rules are induced in order to identify similarity-based patterns in data.

Lower (or upper) approximations of sets $S_\alpha^{\succsim}(x)$, $S_\alpha^{\precsim}(x)$ are the basis for induction of certain (or possible) decision rules. We distinguish two types of rules and give their formal syntax:

(1) at least rules:

“if $\sigma_{c_{i1}}(y, x) \ge t_{i1}$ and . . . and $\sigma_{c_{ip}}(y, x) \ge t_{ip}$, then (possibly) $y \succsim_\alpha x$”,

(2) at most rules:

“if $\sigma_{c_{i1}}(y, x) \le t_{i1}$ and . . . and $\sigma_{c_{ip}}(y, x) \le t_{ip}$, then (possibly) $y \precsim_\alpha x$”,

where $\{c_{i1}, \ldots, c_{ip}\} \subseteq C$, marginal similarity thresholds $t_{i1}, \ldots, t_{ip} \in [0, 1]$, and the limiting level of membership $\alpha \in [0, 1]$.

(44)

Decision Rules

According to definitions (1) and (2), the decision part of a rule of either type can be simplified depending on the relation between values $\alpha$ and $\mu(x)$:

if $\alpha \le \mu(x)$, then the decision part boils down to:

(1) “then (possibly) $\mu(y) \in [\alpha, \mu(x)]$”,

(2) “then (possibly) $\mu(y) \le \alpha$”,

if $\alpha \ge \mu(x)$, then the decision part boils down to:

(1) “then (possibly) $\mu(y) \in [\mu(x), \alpha]$”,

(2) “then (possibly) $\mu(y) \ge \alpha$”.
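The simplification of the decision part can be expressed as a small helper that returns the interval suggested for $\mu(y)$. This is a sketch derived from definitions (1) and (2), with a hypothetical function name and rule-type labels; in the borderline case $\alpha = \mu(x)$ definition (2) holds for every object, so the helper returns the whole interval $[0, 1]$ for an at most rule.

```python
def decision_interval(alpha, mu_x, rule_type):
    """Interval (low, high) suggested for mu(y) by a rule with level alpha."""
    if rule_type == "at_least":
        # Definition (1): mu(y) lies between alpha and mu(x).
        return (min(alpha, mu_x), max(alpha, mu_x))
    if rule_type == "at_most":
        # Definition (2): alpha lies between mu(y) and mu(x).
        if alpha < mu_x:
            return (0.0, alpha)
        if alpha > mu_x:
            return (alpha, 1.0)
        return (0.0, 1.0)  # alpha == mu(x): no restriction on mu(y)
    raise ValueError("rule_type must be 'at_least' or 'at_most'")
```

For a reference object with membership 0.5, an at least rule with level 0.3 suggests the interval [0.3, 0.5], and an at most rule with level 0.4 suggests membership of at most 0.4.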

(45)

Decision Rules

Analogously to the case of learning from a PCT, the question arises which decision rules resulting from lower/upper approximations of sets $S_\alpha^{\succsim}(x)$ and $S_\alpha^{\precsim}(x)$ should be used for classification.

The minimal set of certain/possible rules is non-unique. The choice of such a set is arbitrary and non-trivial.

On the other hand, explicit induction of an exhaustive set of certain/possible rules is computationally hard and results in a large number of rules.

(46)

Decision Rules

Two proposed approaches:

Induction of a minimal set of rules by the VC-DomLEM algorithm.

Use of an implicit (virtual) exhaustive set of (robust) rules, resulting from adaptation of the idea of a dominance-based rough set classifier without induction of decision rules (Dembczyński, Pindur, Susmaga (2003)).

(47)

Application of Decision Rules

A set of (induced) rules can be applied to new objects in order to suggest their value of membership in the considered decision class $d_j$.

During application of rules, some ambiguity may arise among the suggestions (i.e., intervals for $\mu$) of the rules that cover a considered object.

In order to resolve this ambiguity, we advocate employing the approach described in: J. Błaszczyński, S. Greco, R. Słowiński. Multi-criteria classification – a new scheme for application of dominance-based decision rules. European Journal of Operational Research, 181(3):1030–1044, 2007.

In this way, for each object classified using rules, one can obtain a precise suggestion (i.e., a single number) concerning the value of membership in $d_j$.

(48)

Illustrative example

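[Figure: the five objects plotted in the plane of attributes $c_1$ and $c_2$, each labeled with its membership value: $y_1$ 0.4, $y_2$ 0.3, $x$ 0.5, $y_3$ 0.6, $y_4$ 0.7.]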

Figure: Set of five objects described by two condition attributes 𝑐1, 𝑐2

The number inside a rectangle representing an object denotes the value of membership function 𝜇 for this object. Object 𝑥 is a reference object with 𝜇(𝑥) = 0.5. 𝑃 = 𝐶 = {𝑐1, 𝑐2}.

We assume that there are given two marginal similarity functions 𝜎𝑐1, 𝜎𝑐2 defined as:

(49)

Illustrative example

[Figure: pairs $(y_1, x)$ 0.4, $(y_2, x)$ 0.3, $(x, x)$ 0.5, $(y_3, x)$ 0.6, $(y_4, x)$ 0.7 plotted in the similarity space with axes $\sigma_{c_1}$ and $\sigma_{c_2}$; cones $D_P^+(y_3, x)$ and $D_P^-(y_2, x)$ are marked.]

Figure: Pairs of objects (⋅, 𝑥) in the similarity space created by 𝜎𝑐1, 𝜎𝑐2

The number inside a rectangle representing a pair of objects denotes the value of membership function 𝜇 for the first object in the pair.

(50)

Illustrative example

Positive and negative dominance cones in the similarity space:

$D_P^+(y_1, x) = \{y_1, y_2, x\}$, $D_P^-(y_1, x) = \{y_1, y_4\}$,

$D_P^+(y_2, x) = \{y_2, x\}$, $D_P^-(y_2, x) = \{y_1, y_2, y_4\}$,

$D_P^+(x, x) = \{x\}$, $D_P^-(x, x) = \{y_1, y_2, x, y_3, y_4\}$,

$D_P^+(y_3, x) = \{y_3, x\}$, $D_P^-(y_3, x) = \{y_3\}$,

(51)

Illustrative example

Sets of objects $S_\alpha^{\succsim}(x)$ and $S_\alpha^{\precsim}(x)$, for $\alpha \in \{0.3, 0.4, 0.5, 0.6, 0.7\}$:

$S_{0.3}^{\succsim}(x) = \{y_1, y_2, x\}$, $S_{0.3}^{\precsim}(x) = \{y_2\}$,

$S_{0.4}^{\succsim}(x) = \{y_1, x\}$, $S_{0.4}^{\precsim}(x) = \{y_1, y_2\}$,

$S_{0.5}^{\succsim}(x) = \{x\}$, $S_{0.5}^{\precsim}(x) = \{y_1, y_2, x, y_3, y_4\}$,

$S_{0.6}^{\succsim}(x) = \{x, y_3\}$, $S_{0.6}^{\precsim}(x) = \{y_3, y_4\}$,

$S_{0.7}^{\succsim}(x) = \{x, y_3, y_4\}$, $S_{0.7}^{\precsim}(x) = \{y_4\}$.

It is important to notice that in order to calculate the above sets, one only needs to take into account the values of membership function $\mu$.
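The sets listed above can be recomputed from the membership values alone. The following Python sketch uses the five membership values of this example; the dictionary and function names are illustrative.

```python
# Membership values of the five objects in the example; x is the reference object.
MU = {"y1": 0.4, "y2": 0.3, "x": 0.5, "y3": 0.6, "y4": 0.7}


def upward_set(alpha, ref="x"):
    """Objects y whose membership lies between alpha and mu(ref)."""
    low, high = min(alpha, MU[ref]), max(alpha, MU[ref])
    return {y for y, mu_y in MU.items() if low <= mu_y <= high}


def downward_set(alpha, ref="x"):
    """Objects y for which alpha lies between mu(y) and mu(ref)."""
    return {y for y, mu_y in MU.items()
            if min(mu_y, MU[ref]) <= alpha <= max(mu_y, MU[ref])}


upward = {alpha: upward_set(alpha) for alpha in (0.3, 0.4, 0.5, 0.6, 0.7)}
downward = {alpha: downward_set(alpha) for alpha in (0.3, 0.4, 0.5, 0.6, 0.7)}
```

No similarity values enter this computation, which is exactly the point made above.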

(52)

Illustrative example

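$P$-lower approximations of sets $S_\alpha^{\succsim}(x)$ and $S_\alpha^{\precsim}(x)$:

$\underline{P}(S_{0.3}^{\succsim}(x)) = \{y_1, y_2, x\}$, $\underline{P}(S_{0.3}^{\precsim}(x)) = \emptyset$,

$\underline{P}(S_{0.4}^{\succsim}(x)) = \{x\}$, $\underline{P}(S_{0.4}^{\precsim}(x)) = \emptyset$,

$\underline{P}(S_{0.5}^{\succsim}(x)) = \{x\}$, $\underline{P}(S_{0.5}^{\precsim}(x)) = \{y_1, y_2, x, y_3, y_4\}$,

$\underline{P}(S_{0.6}^{\succsim}(x)) = \{x, y_3\}$, $\underline{P}(S_{0.6}^{\precsim}(x)) = \{y_3, y_4\}$,

$\underline{P}(S_{0.7}^{\succsim}(x)) = \{x, y_3\}$, $\underline{P}(S_{0.7}^{\precsim}(x)) = \{y_4\}$.

$P$-upper approximations of sets $S_\alpha^{\succsim}(x)$ and $S_\alpha^{\precsim}(x)$:

$\overline{P}(S_{0.3}^{\succsim}(x)) = \{y_1, y_2, x\}$, $\overline{P}(S_{0.3}^{\precsim}(x)) = \{y_1, y_2, y_4\}$,

$\overline{P}(S_{0.4}^{\succsim}(x)) = \{y_1, y_2, x\}$, $\overline{P}(S_{0.4}^{\precsim}(x)) = \{y_1, y_2, y_4\}$,

$\overline{P}(S_{0.5}^{\succsim}(x)) = \{x\}$, $\overline{P}(S_{0.5}^{\precsim}(x)) = \{y_1, y_2, x, y_3, y_4\}$,

$\overline{P}(S_{0.6}^{\succsim}(x)) = \{x, y_3\}$, $\overline{P}(S_{0.6}^{\precsim}(x)) = \{y_3, y_4\}$,

$\overline{P}(S_{0.7}^{\succsim}(x)) = \{y_1, y_2, x, y_3, y_4\}$, $\overline{P}(S_{0.7}^{\precsim}(x)) = \{y_4\}$.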
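These approximations can be checked mechanically from the dominance cones and the membership values. In the sketch below the cones of $y_1$, $y_2$, $x$ and $y_3$ are copied from the example; the cones of $y_4$ are not listed in the surviving text, so they are derived here by duality ($w$ is in the positive cone of $y$ exactly when $y$ is in the negative cone of $w$), which is an assumption of this illustration, as are all the names used.

```python
MU = {"y1": 0.4, "y2": 0.3, "x": 0.5, "y3": 0.6, "y4": 0.7}

# Cones with respect to reference object x, as listed in the example.
POS = {"y1": {"y1", "y2", "x"}, "y2": {"y2", "x"}, "x": {"x"}, "y3": {"y3", "x"}}
NEG = {"y1": {"y1", "y4"}, "y2": {"y1", "y2", "y4"}, "x": set(MU), "y3": {"y3"}}
# Cones of y4, derived by duality from the listed cones (assumption).
POS["y4"] = {w for w in NEG if "y4" in NEG[w]} | {"y4"}
NEG["y4"] = {w for w in POS if w != "y4" and "y4" in POS[w]} | {"y4"}


def upward_set(alpha):
    low, high = min(alpha, MU["x"]), max(alpha, MU["x"])
    return {y for y in MU if low <= MU[y] <= high}


def downward_set(alpha):
    return {y for y in MU if min(MU[y], MU["x"]) <= alpha <= max(MU[y], MU["x"])}


def lower_upward(alpha):
    return {y for y in MU if POS[y] <= upward_set(alpha)}


def lower_downward(alpha):
    return {y for y in MU if NEG[y] <= downward_set(alpha)}


def upper_upward(alpha):
    return {y for y in MU if NEG[y] & upward_set(alpha)}


def upper_downward(alpha):
    return {y for y in MU if POS[y] & downward_set(alpha)}
```

For instance, $y_4$ drops out of the lower approximation of the 0.7-upward set because its positive cone contains $y_1$ and $y_2$, which lie outside that set.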

(53)

Illustrative example

Consequences of inconsistencies w.r.t. the dominance relation in the similarity space:

object $y_1 \in S_{0.4}^{\succsim}(x)$ does not belong to set $\underline{P}(S_{0.4}^{\succsim}(x))$ since pair $(y_1, x)$ is dominated by pair $(y_2, x)$ and $y_2 \notin S_{0.4}^{\succsim}(x)$,

object $y_4 \in S_{0.7}^{\succsim}(x)$ does not belong to set $\underline{P}(S_{0.7}^{\succsim}(x))$ since pair $(y_4, x)$ is dominated by pairs $(y_1, x)$, $(y_2, x)$, and $y_1, y_2 \notin S_{0.7}^{\succsim}(x)$,

object $y_2 \in S_{0.3}^{\precsim}(x)$ does not belong to set $\underline{P}(S_{0.3}^{\precsim}(x))$ since pair $(y_2, x)$ dominates pairs $(y_1, x)$, $(y_4, x)$ and $y_1, y_4 \notin S_{0.3}^{\precsim}(x)$,

objects $y_1, y_2 \in S_{0.4}^{\precsim}(x)$ do not belong to set $\underline{P}(S_{0.4}^{\precsim}(x))$ since pairs $(y_1, x)$, $(y_2, x)$ dominate pair $(y_4, x)$ and $y_4 \notin S_{0.4}^{\precsim}(x)$.

(54)

Illustrative example

Exemplary certain at least rule, for $\alpha$-upward set $S_{0.3}^{\succsim}(x)$:

“if $\sigma_{c_1}(y, x) \ge 0.25$ and $\sigma_{c_2}(y, x) \ge 1$, then $\mu(y) \in [0.3, 0.5]$”.

This rule covers objects $y_1, y_2, x$; the marginal similarity threshold of 0.25 results from $1/(|c_1(y_1) - c_1(x)| + 1) = 1/(|1 - 4| + 1)$.

Exemplary possible at most rule, for $\alpha$-downward set $S_{0.4}^{\precsim}(x)$:

“if $\sigma_{c_1}(y, x) \le \frac{1}{3}$, then possibly $\mu(y) \le 0.4$”.

This rule covers objects $y_1, y_2, y_4$; marginal similarity threshold of $\frac{1}{3}$

(55)

Summary and Conclusions (1)

DRSA is a flexible modeling method that allows domain knowledge to be included and can handle possible inconsistencies in data by calculating lower and upper approximations of sets.

DRSA works with heterogeneous attributes – nominal, ordinal and cardinal (no need for discretization).

DRSA can be applied to multi-criteria ranking and multi-attribute similarity-based classification problems, which employ pairwise comparisons expressed in a PCT and a similarity table, respectively.

The rule model has many advantages, e.g., comprehensibility, generality, lack of aggregation operators, predictive power, and resistance to irrelevant attributes.

(56)

Summary and Conclusions (2)

The approach with an implicit exhaustive set of certain/possible rules is less arbitrary than the approach with an explicit minimal set of decision rules, while maintaining polynomial time complexity.

The Net Flow Rule appears to be the most appropriate ranking procedure for exploitation of a 4-valued outranking relation resulting from application of $S$/$S^c$ decision rules.

The presented method of case-based reasoning using DRSA exploits only ordinal properties of marginal similarity functions and membership functions of fuzzy sets.

Moreover, it avoids aggregation of marginal similarities into one comprehensive similarity (expressed by a real-valued aggregation function), which would require specific axioms and is always arbitrary to some extent.

(57)

Future Work

Investigation of the “respect of data” property, understood as in: Luis C. Dias, Claude Lamboray, Extensions of the prudence principle to exploit a valued outranking relation, EJOR, 201(3):828–837, 2010.

Investigation of the monotonicity property.

Investigation of properties of the Repeated Net Flow Rule ranking procedure.

Implementation of the proposed CBR-DRSA method in the jRS library.

Computational experiments.

(58)

References

Błaszczyński J., Greco S., Słowiński R., Szeląg M., Monotonic Variable Consistency Rough Set Approaches, International Journal of Approximate Reasoning, 50(7), 2009, pp. 979–999.

Bouyssou D., Vincke Ph., Ranking alternatives on the basis of preference relations: A progress report with special emphasis on outranking relations, Journ. of MCDA, 6, 1997, pp. 77–85.

Dembczyński K., Pindur R., Susmaga R., Dominance-based Rough Set Classifier without Induction of Decision Rules, Electr. Notes Theor. Comput. Sci., 82(4), 2003, pp. 84–95.

Fortemps P., Greco S., Słowiński R., Multicriteria decision support using rules that represent rough-graded preference relations, EJOR, 188(1), 2008, pp. 206–223.

Greco S., Matarazzo B., Słowiński R., Tsoukiàs A., Exploitation of a Rough Approximation of the Outranking Relation in Multicriteria Choice and Ranking, Lecture Notes in Economics and Mathematical Systems, 465, 1998, pp. 45–60.