Variable Consistency Dominance-Based Rough Set Approach to Preference Learning in Multicriteria Ranking

Marcin Szeląg

Institute of Computing Science, Poznań University of Technology, Poland

November 13, 2012

Outline

1 Introduction

Formulation of Multi-Criteria Ranking Problem

Motivations

2 Methodology for Dealing with Multi-Criteria Ranking Problems

Preference Information

Variable Consistency Dominance-based Rough Set Approach

Decision Rules

Application of Decision Rules

Exploitation of Preference Graph

Ranking Methods

Illustrative Example

Multi-Criteria Ranking Problem

Multi-criteria ranking problem is a decision problem in which a set of objects (alternatives) A, described by a set of criteria G = {g1, . . . , gn}, has to be ordered, either completely (weak order) or partially (partial preorder).

Each criterion $g_i \in G$ is modeled as a real-valued function $g_i : A \to \mathbb{R}$, with

a cardinal scale (i.e., interval scale or ratio scale), or

an ordinal scale (given a priori or resulting from an order-preserving number-coding of non-numerical ordinal evaluations).

Cardinal criterion = criterion with cardinal scale: one can measure the intensity of preference (positive or negative) of object a over object b, taking into account evaluations gi(a), gi(b), a, b ∈ A, using any function $k_i : \mathbb{R}^2 \to \mathbb{R}$ that is non-decreasing w.r.t. the first evaluation and non-increasing w.r.t. the second evaluation.

Greco S, Matarazzo B, Słowiński R, Rough sets theory for multicriteria decision analysis, European J. Operational Research 129(1), 2001, pp. 1–47.

For the sake of simplicity, we assume that

ki(gi(a), gi(b)) = ∆i(a, b) = gi(a) − gi(b).

Ordinal criterion = criterion with ordinal scale: differences of evaluations are not meaningful,

Multi-Criteria Ranking Problem Example

Car ranking problem

Order a given set of 14 cars from the best to the worst (with possible ties), taking into account the following criteria:

1 maximum speed in km/h (to be maximized),

2 comfort: low ≺ medium ≺ high (to be maximized),

3 price in EUR (to be minimized),

General Motivations

Multi-criteria ranking is an important, non-trivial, and practical problem.

The main difficulty lies in the aggregation of different and usually conflicting criteria; such aggregation is usually performed arbitrarily, using weights or aggregation operators like sum, average or distance metrics.

There is a need for a multi-criteria modeling method that allows one to:

include domain knowledge,

handle possible inconsistencies w.r.t. the dominance relation,

avoid using aggregation operators.

Motivations for application of DRSA

Dominance-based Rough Set Approach (DRSA), introduced by Greco, Matarazzo and Słowiński in 1996:

handles inconsistencies in data, resulting, e.g., from imprecise or incomplete information,

takes into account domain knowledge:

domains of attributes, i.e., sets of values that an attribute may take while being meaningful for the user’s perception,

division of attributes into condition and decision attributes,

preference order in the domains of attributes and monotonic relationships between attributes,

works with heterogeneous attributes – nominal, ordinal and cardinal (no need for discretization),

enables inference of a decision rule model

Motivations for Using Decision Rule Model

Advantages of decision rules:

comprehensible form of knowledge representation,

can represent any function (more general than utility functions or binary relations),

“resistant” to irrelevant attributes,

do not require aggregation operators,

support “backtracking”,

Methodology for Dealing with Multi-Criteria Ranking Problems

Methodology for Dealing with M-C Ranking Problems

The only objective information concerning set A of objects is the dominance relation D over A:

$a D b \Leftrightarrow g_i(a) \ge g_i(b) \text{ for all } g_i \in G.$
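A minimal Python sketch of this dominance check follows. It is an illustration only, not the authors' implementation: the tagging of each criterion as "gain" (to be maximized) or "cost" (to be minimized), the coding of comfort as low=1, medium=2, high=3, and the three car evaluations are hypothetical.

```python
from typing import Dict

# Hypothetical representation: criterion name -> "gain" (maximize) or "cost" (minimize).
Criteria = Dict[str, str]
# An object maps each criterion name to its evaluation g_i(a).
Obj = Dict[str, float]


def at_least_as_good(x: float, y: float, direction: str) -> bool:
    """True if evaluation x is at least as good as y on a single criterion."""
    return x >= y if direction == "gain" else x <= y


def dominates(a: Obj, b: Obj, criteria: Criteria) -> bool:
    """aDb: a is at least as good as b on every criterion in G."""
    return all(at_least_as_good(a[g], b[g], d) for g, d in criteria.items())


# Hypothetical instance of the car ranking problem.
CRITERIA: Criteria = {"max_speed": "gain", "comfort": "gain", "price": "cost"}
cars: Dict[str, Obj] = {
    "car1": {"max_speed": 200, "comfort": 3, "price": 30000},
    "car2": {"max_speed": 180, "comfort": 2, "price": 32000},
    "car3": {"max_speed": 220, "comfort": 1, "price": 25000},
}

if __name__ == "__main__":
    print(dominates(cars["car1"], cars["car2"], CRITERIA))  # True
    print(dominates(cars["car1"], cars["car3"], CRITERIA))  # False
    print(dominates(cars["car3"], cars["car1"], CRITERIA))  # False: car1 and car3 are incomparable
```

Since neither car1 D car3 nor car3 D car1 holds, the dominance relation alone cannot order these two cars.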

However, this relation usually leaves many objects incomparable. In order to make the objects more comparable, the DM has to supply preference information in terms of pairwise comparisons of some reference objects (set Aᴿ), i.e., objects relatively well known to the DM.

This information is used to induce a preference model in terms of a set of “if . . . then . . . ” decision rules.

After acceptance by the DM, this model can be used to build a

We consider two problem settings:

(1) set G is a consistent family of criteria, i.e., G satisfies the properties of completeness (all relevant criteria are considered), monotonicity (the better the evaluation of an object on the considered criteria, the more it is preferable to another object), and non-redundancy (there is no criterion that could be removed without violating one of the previous two properties),

(2) set G is an arbitrary set of criteria.

Setting (1), denoted in the following by s_MCDA, is typical for Multiple Criteria Decision Aiding. Setting (2), denoted in the following by s_ML, is typical for Machine Learning.

Pairwise Comparison Table (PCT)

Created by pairwise comparisons of reference objects. B ⊆ Aᴿ × Aᴿ is the set of pairs of compared reference objects. Given objects a, b ∈ Aᴿ, a ≠ b, the DM can declare that:

“a is at least as good as b” (a outranks b, denoted by aSb), or

“a is NOT at least as good as b” (a does not outrank b, denoted by aSᶜb),

or (s)he can abstain from any judgment. We fix aSa for every a ∈ Aᴿ.

For s_MCDA, we fix aSb for a, b ∈ Aᴿ such that aDb.

When comparing objects a, b ∈ Aᴿ on a cardinal criterion, one puts in the corresponding column of the PCT the value

$k_i(g_i(a), g_i(b)) = \Delta_i(a, b).$

When comparing objects a, b ∈ Aᴿ on an ordinal criterion, one puts in the corresponding column of the PCT the ordered pair (gi(a), gi(b)).

Exemplary PCT, where g1 – cardinal criterion, gn – ordinal criterion:

| Pair of ref. objects | g1       | . . . | gn             | Preference information |
|----------------------|----------|-------|----------------|------------------------|
| (a, b)               | ∆1(a, b) | . . . | (gn(a), gn(b)) | aSb                    |
| (b, a)               | ∆1(b, a) | . . . | (gn(b), gn(a)) | bSᶜa                   |
| (b, c)               | ∆1(b, c) | . . . | (gn(b), gn(c)) | bSc                    |
| . . .                | . . .    | . . . | . . .          | . . .                  |
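Building rows of a PCT can be sketched as below, assuming each criterion is tagged "cardinal" or "ordinal" and using the simplification k_i = Δ_i adopted earlier. The function name `pct_row`, the data layout, the two reference objects and the DM's judgments are hypothetical.

```python
from typing import Dict, List, Tuple, Union

# A PCT cell: Δ_i(a, b) for a cardinal criterion, (g_i(a), g_i(b)) for an ordinal one.
Cell = Union[float, Tuple[float, float]]


def pct_row(a: Dict[str, float], b: Dict[str, float], kinds: Dict[str, str]) -> Dict[str, Cell]:
    """Evaluations of the pair (a, b) on all criteria, as put into the PCT."""
    row: Dict[str, Cell] = {}
    for g, kind in kinds.items():
        if kind == "cardinal":
            row[g] = a[g] - b[g]  # k_i(g_i(a), g_i(b)) = Δ_i(a, b)
        else:
            row[g] = (a[g], b[g])  # ordered pair (g_i(a), g_i(b))
    return row


KINDS = {"max_speed": "cardinal", "comfort": "ordinal"}
ref = {"a": {"max_speed": 200, "comfort": 3}, "b": {"max_speed": 180, "comfort": 2}}

# Hypothetical DM statements: aSb and bS^c a ("Sc" stands for S^c).
judgments: List[Tuple[str, str, str]] = [("a", "b", "S"), ("b", "a", "Sc")]
pct = [((x, y), pct_row(ref[x], ref[y], KINDS), label) for x, y, label in judgments]

if __name__ == "__main__":
    for pair, row, label in pct:
        print(pair, row, label)
    # ('a', 'b') {'max_speed': 20, 'comfort': (3, 2)} S
    # ('b', 'a') {'max_speed': -20, 'comfort': (2, 3)} Sc
```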

Dominance Relation for Pairs of Objects

Marginal dominance relation $D_2^i$ for pairs (a, b), (c, d) ∈ B:

For a cardinal criterion gi ∈ G:

$(a, b)\, D_2^i\, (c, d) \Leftrightarrow \Delta_i(a, b) \ge \Delta_i(c, d)$

For an ordinal criterion gi ∈ G:

Dominance relation $D_2$ for pairs (a, b), (c, d) ∈ B:

$(a, b)\, D_2\, (c, d)$ if $(a, b)\, D_2^i\, (c, d)$ for all $g_i \in G$, i.e., if a is preferred to b at least as much as c is preferred to d on all criteria gi ∈ G.
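A sketch of D₂ on PCT rows follows. The cardinal case implements the marginal relation stated above; the ordinal case, which is cut off in the slide, is filled in here with the usual DRSA definition for ordinal criteria (gi(a) ≥ gi(c) and gi(b) ≤ gi(d)), so treat it as an assumption. Function names and the two rows are hypothetical.

```python
from typing import Dict, Tuple, Union

Cell = Union[float, Tuple[float, float]]


def marginal_dominates(x: Cell, y: Cell, kind: str) -> bool:
    """(a, b) D_2^i (c, d), given the PCT cells x of (a, b) and y of (c, d)."""
    if kind == "cardinal":
        return x >= y  # Δ_i(a, b) >= Δ_i(c, d)
    (ga, gb), (gc, gd) = x, y
    # Assumed ordinal case: a is at least as good as c, and b is at most as good as d.
    return ga >= gc and gb <= gd


def pair_dominates(row1: Dict[str, Cell], row2: Dict[str, Cell], kinds: Dict[str, str]) -> bool:
    """(a, b) D_2 (c, d): marginal dominance holds on every criterion."""
    return all(marginal_dominates(row1[g], row2[g], k) for g, k in kinds.items())


KINDS = {"max_speed": "cardinal", "comfort": "ordinal"}
r_ab = {"max_speed": 30, "comfort": (3, 1)}
r_cd = {"max_speed": 20, "comfort": (3, 2)}

if __name__ == "__main__":
    print(pair_dominates(r_ab, r_cd, KINDS))  # True
    print(pair_dominates(r_cd, r_ab, KINDS))  # False
```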

Dominance Cones

For a pair of objects (a, b) ∈ B:

positive dominance cone $D_2^+(a, b) = \{(c, d) \in B : (c, d)\, D_2\, (a, b)\}$,

Inconsistencies in the Preference Information

Preference information (pairwise comparisons of reference objects) may be inconsistent w.r.t. the dominance relation D₂ due to:

uncertainty of information – hesitation of the DM, unstable preferences,

incomplete determination of the set G of criteria,

granularity of information.

We handle the inconsistencies using a dominance-based rough set approach. Before learning a preference model of the DM, we structure the pairs of objects contained in a PCT by calculating lower approximations of S and Sᶜ. In this way, we restrict a priori the set of pairs of objects on which the preference model is built to a subset of sufficiently consistent pairs of objects. Our goal is to obtain a reliable preference model.

Dominance-based Rough Set Approach (DRSA)

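Lower approximations of S and Sᶜ:

$\underline{S} = \{(a, b) \in B : D_2^+(a, b) \subseteq S\}, \quad \underline{S^c} = \{(a, b) \in B : D_2^-(a, b) \subseteq S^c\}.$

Upper approximations of S and Sᶜ:

$\overline{S} = \bigcup_{(a, b) \in S} D_2^+(a, b), \quad \overline{S^c} = \bigcup_{(a, b) \in S^c} D_2^-(a, b).$

Boundaries of S and Sᶜ: $Bn(S) = \overline{S} - \underline{S}$,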
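The classical (non-parameterized) approximations can be computed by brute force over B, as in the sketch below. To keep it short, it assumes a single cardinal criterion, so that (a, b) D₂ (c, d) reduces to Δ(a, b) ≥ Δ(c, d); the four pairs, their Δ values and the S/Sᶜ labels are hypothetical.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]

# Hypothetical PCT with one cardinal criterion: pair -> Δ(a, b), plus the DM's judgments.
delta: Dict[Pair, float] = {("a", "b"): 5, ("b", "c"): 3, ("c", "a"): 4, ("c", "b"): 1}
S: Set[Pair] = {("a", "b"), ("b", "c")}
Sc: Set[Pair] = {("c", "a"), ("c", "b")}
B = set(delta)


def d_plus(p: Pair) -> Set[Pair]:
    """Positive dominance cone D_2^+(p): pairs of B that dominate p."""
    return {q for q in B if delta[q] >= delta[p]}


def d_minus(p: Pair) -> Set[Pair]:
    """Negative dominance cone D_2^-(p): pairs of B dominated by p."""
    return {q for q in B if delta[q] <= delta[p]}


lower_S = {p for p in B if d_plus(p) <= S}     # D_2^+(p) ⊆ S
lower_Sc = {p for p in B if d_minus(p) <= Sc}  # D_2^-(p) ⊆ S^c
upper_S = set().union(*(d_plus(p) for p in S))
upper_Sc = set().union(*(d_minus(p) for p in Sc))
boundary_S = upper_S - lower_S

if __name__ == "__main__":
    print(sorted(lower_S), sorted(lower_Sc))  # [('a', 'b')] [('c', 'b')]
    print(sorted(boundary_S))                 # [('b', 'c'), ('c', 'a')]
```

Pair (b, c) is excluded from the lower approximation of S because (c, a), judged as Sᶜ, has a larger Δ and thus lies in its positive cone.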

Variable-Consistency DRSA (VC-DRSA)

Błaszczyński J, Greco S, Słowiński R, Szeląg M, Monotonic Variable Consistency Rough Set Approaches, International J. of Approximate Reasoning, 50(7), 2009, pp. 979–999.

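Consistency is quantified using cost-type consistency measures $\varepsilon_S, \varepsilon_{S^c} : B \to [0, 1]$, defined as:

$\varepsilon_S(a, b) = \frac{|D_2^+(a, b) \cap S^c|}{|S^c|}, \quad \varepsilon_{S^c}(a, b) = \frac{|D_2^-(a, b) \cap S|}{|S|}.$

Parameterized lower approximations of S and Sᶜ:

$\underline{S}^{\theta_S} = \{(a, b) \in S : \varepsilon_S(a, b) \le \theta_S\}, \quad \underline{S^c}^{\theta_{S^c}} = \{(a, b) \in S^c : \varepsilon_{S^c}(a, b) \le \theta_{S^c}\},$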
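To see how the thresholds θ_S and θ_Sᶜ relax the classical definition, the consistency measures can be sketched on a toy PCT with a single cardinal criterion (hypothetical pairs and Δ values; D₂ then reduces to comparing Δ). With both thresholds at 0, only the fully consistent pairs remain; at 0.5, pairs whose cone contains one of the two conflicting pairs are also admitted.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]

# Hypothetical toy PCT with one cardinal criterion (pair -> Δ(a, b)) and its judgments.
delta: Dict[Pair, float] = {("a", "b"): 5, ("b", "c"): 3, ("c", "a"): 4, ("c", "b"): 1}
S: Set[Pair] = {("a", "b"), ("b", "c")}
Sc: Set[Pair] = {("c", "a"), ("c", "b")}


def eps_S(p: Pair) -> float:
    """ε_S(p) = |D_2^+(p) ∩ S^c| / |S^c|."""
    cone = {q for q in delta if delta[q] >= delta[p]}
    return len(cone & Sc) / len(Sc)


def eps_Sc(p: Pair) -> float:
    """ε_{S^c}(p) = |D_2^-(p) ∩ S| / |S|."""
    cone = {q for q in delta if delta[q] <= delta[p]}
    return len(cone & S) / len(S)


def lower_approximations(theta_S: float, theta_Sc: float) -> Tuple[Set[Pair], Set[Pair]]:
    """Parameterized lower approximations of S and S^c."""
    return ({p for p in S if eps_S(p) <= theta_S},
            {p for p in Sc if eps_Sc(p) <= theta_Sc})


if __name__ == "__main__":
    print(eps_S(("b", "c")), eps_Sc(("c", "a")))  # 0.5 0.5
    print(lower_approximations(0.0, 0.0))  # only the fully consistent pairs
    print(lower_approximations(0.5, 0.5))  # all of S and all of S^c
```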

Positive Regions

We define positive regions of relations S and Sᶜ as follows:

$POS(S) = \bigcup_{(a, b) \in \underline{S}^{\theta_S}} D_2^+(a, b), \quad POS(S^c) = \bigcup_{(a, b) \in \underline{S^c}^{\theta_{S^c}}} D_2^-(a, b).$

Positive regions defined above contain sufficiently consistent pairs of objects, i.e., pairs belonging to lower approximations of relation S or Sᶜ, and can also contain some inconsistent pairs of objects which fall into dominance cones D₂⁺(·, ·) or D₂⁻(·, ·) originating in pairs of objects from lower approximations of relation S or Sᶜ, respectively.
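The presence of inconsistent pairs in a positive region can be seen in a small sketch. It reuses a hypothetical one-criterion toy PCT and assumes that the parameterized lower approximations (for θ_S = θ_Sᶜ = 0.5) are already known, so they are simply hard-coded.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]

# Hypothetical toy PCT (one cardinal criterion, pair -> Δ(a, b)) with its judgments.
delta: Dict[Pair, float] = {("a", "b"): 5, ("b", "c"): 3, ("c", "a"): 4, ("c", "b"): 1}
S: Set[Pair] = {("a", "b"), ("b", "c")}
Sc: Set[Pair] = {("c", "a"), ("c", "b")}
# Assumed parameterized lower approximations for θ_S = θ_{S^c} = 0.5.
lower_S: Set[Pair] = {("a", "b"), ("b", "c")}
lower_Sc: Set[Pair] = {("c", "a"), ("c", "b")}


def d_plus(p: Pair) -> Set[Pair]:
    """D_2^+(p) with a single cardinal criterion: pairs with Δ at least Δ(p)."""
    return {q for q in delta if delta[q] >= delta[p]}


def d_minus(p: Pair) -> Set[Pair]:
    """D_2^-(p) with a single cardinal criterion: pairs with Δ at most Δ(p)."""
    return {q for q in delta if delta[q] <= delta[p]}


pos_S = set().union(*(d_plus(p) for p in lower_S))
pos_Sc = set().union(*(d_minus(p) for p in lower_Sc))

if __name__ == "__main__":
    print(sorted(pos_S & Sc))  # [('c', 'a')]: an S^c pair inside POS(S)
    print(sorted(pos_Sc & S))  # [('b', 'c')]: an S pair inside POS(S^c)
```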

Decision Rules

Decision rules are induced in order to generalize the description of sufficiently consistent pairs of objects from S_PCT (i.e., pairs of objects from parameterized lower approximations of S and Sᶜ). Only minimal decision rules are considered. A decision rule suggesting assignment to S (Sᶜ) is minimal if there is no other rule suggesting assignment to S (resp. Sᶜ) whose conditions are not stronger and whose consistency is not worse.

Each rule is supported by at least one pair of objects from the respective lower approximation and is allowed to cover only pairs of objects from the respective positive region.

Decision rules constitute a preference model of the DM who gave the pairwise comparisons of reference objects.

Decision rules are induced using the VC-DomLEM sequential covering algorithm, which generates a minimal set of decision rules.

Błaszczyński J, Słowiński R, Szeląg M, Sequential Covering Rule Induction Algorithm for Variable Consistency Rough Set Approaches, Information Sciences 181, 2011, pp. 987–1002.

Rule consistency is measured by the cost-type rule consistency measure $\hat{\varepsilon}_T : R_T \to [0, 1]$, defined as:

$\hat{\varepsilon}_T(r_T) = \frac{|\|r_T\| \cap \neg T|}{|\neg T|}$, where $T \in \{S, S^c\}$,

R_T = set of rules suggesting assignment to relation T, r_T ∈ R_T, ‖r_T‖ = the set of pairs of objects covered by rule r_T.
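The rule consistency measure can be sketched by representing a rule through its condition part only, here a hypothetical predicate on the single Δ value of a toy one-criterion PCT. The sketch assumes every pair in B carries a judgment, so ¬S = Sᶜ and ¬Sᶜ = S within B; names and data are illustrative.

```python
from typing import Callable, Dict, Set, Tuple

Pair = Tuple[str, str]

# Hypothetical toy PCT with one cardinal criterion (pair -> Δ(a, b)) and the DM's judgments.
delta: Dict[Pair, float] = {("a", "b"): 5, ("b", "c"): 3, ("c", "a"): 4, ("c", "b"): 1}
S: Set[Pair] = {("a", "b"), ("b", "c")}
Sc: Set[Pair] = {("c", "a"), ("c", "b")}


def rule_consistency(condition: Callable[[float], bool], not_T: Set[Pair]) -> float:
    """ε̂_T(r_T) = |‖r_T‖ ∩ ¬T| / |¬T|, where ‖r_T‖ = pairs of B covered by the rule."""
    covered = {p for p, d in delta.items() if condition(d)}
    return len(covered & not_T) / len(not_T)


if __name__ == "__main__":
    print(rule_consistency(lambda d: d >= 4, Sc))  # S-rule "if Δ >= 4 then aSb": 0.5
    print(rule_consistency(lambda d: d >= 5, Sc))  # S-rule "if Δ >= 5 then aSb": 0.0
    print(rule_consistency(lambda d: d <= 1, S))   # S^c-rule "if Δ <= 1 then aS^c b": 0.0
```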

Exemplary S-decision rule (induced from the lower approximation of S):

If ∆maxSpeed(a, b) ≥ 25 ∧ comfort(a) ≥ 3 ∧ comfort(b) ≤ 2, then aSb.

“If car a has max speed at least 25 km/h greater than car b (cardinal criterion) and car a has comfort at least 3 while car b has comfort at most 2 (ordinal criterion), then car a is at least as good as car b”.

Exemplary Sᶜ-decision rule (induced from the lower approximation of Sᶜ):

If ∆maxSpeed(a, b) ≤ 20 ∧ comfort(a) ≤ 2 ∧ comfort(b) ≥ 1, then aSᶜb.

As can be seen above, decision rules make use of ordinal properties of criteria only.
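The two exemplary rules can be encoded as conjunctions of elementary conditions of the form "feature ≥ threshold" or "feature ≤ threshold", which makes their purely ordinal use of criteria explicit. The classes `Condition` and `Rule`, the helper functions and the two cars are hypothetical; comfort is assumed to be coded as low=1, medium=2, high=3.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Car = Dict[str, float]


@dataclass(frozen=True)
class Condition:
    feature: Callable[[Car, Car], float]  # e.g. Δ_maxSpeed(a, b) or comfort(a)
    op: Callable[[float, float], bool]    # operator.ge (≥) or operator.le (≤)
    threshold: float


@dataclass(frozen=True)
class Rule:
    conditions: Tuple[Condition, ...]
    decision: str  # "S" or "Sc"

    def covers(self, a: Car, b: Car) -> bool:
        """A rule covers (a, b) when all its elementary conditions hold."""
        return all(c.op(c.feature(a, b), c.threshold) for c in self.conditions)


def delta_speed(a: Car, b: Car) -> float:
    return a["max_speed"] - b["max_speed"]


def comfort_a(a: Car, b: Car) -> float:
    return a["comfort"]


def comfort_b(a: Car, b: Car) -> float:
    return b["comfort"]


s_rule = Rule((Condition(delta_speed, operator.ge, 25),
               Condition(comfort_a, operator.ge, 3),
               Condition(comfort_b, operator.le, 2)), "S")
sc_rule = Rule((Condition(delta_speed, operator.le, 20),
                Condition(comfort_a, operator.le, 2),
                Condition(comfort_b, operator.ge, 1)), "Sc")

x = {"max_speed": 210, "comfort": 3}
y = {"max_speed": 180, "comfort": 2}

if __name__ == "__main__":
    print(s_rule.covers(x, y), sc_rule.covers(x, y))  # True False
    print(s_rule.covers(y, x), sc_rule.covers(y, x))  # False True
```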

Application of Decision Rules

Application of induced decision rules on set A of objects to be ranked yields a specific preference structure on A.

Each pair of objects (a, b) ∈ A × A can be covered by some decision rules suggesting assignment to relation S and/or to relation Sᶜ. It may also not be covered by any rule. In order to address these possibilities, we define two relations over set A, denoted by S and Sᶜ.

Definitions of relations S and Sᶜ depend on the adopted problem setting (s_MCDA or s_ML). Moreover, these relations can be defined as crisp or fuzzy.

We focus on the following two cases:

s_MCDA + crisp relations,

Application of Decision Rules – Crisp Relations

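s_MCDA

$S = \{(a, b) \in A \times A : (\exists\, r_S \in R_S : r_S \text{ covers } (a, b)) \text{ or } aDb\},$

$S^c = \{(a, b) \in A \times A : (\exists\, r_{S^c} \in R_{S^c} : r_{S^c} \text{ covers } (a, b)) \text{ and not } aDb\}.$

s_ML

$S = \{(a, b) \in A \times A : (\exists\, r_S \in R_S : r_S \text{ covers } (a, b)) \text{ or } a = b\},$

$S^c = \{(a, b) \in A \times A : (\exists\, r_{S^c} \in R_{S^c} : r_{S^c} \text{ covers } (a, b)) \text{ and } a \ne b\}.$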

Relation S is reflexive and relation Sᶜ is irreflexive. Moreover, relations S and Sᶜ are, in general, neither transitive nor complete.
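The crisp definitions translate almost literally into code. In this sketch, rules are hypothetical coverage predicates on pairs of objects, and the objects, evaluated on a single criterion x, are toy data; `setting` selects s_MCDA ("MCDA") or s_ML ("ML").

```python
from typing import Callable, Dict, List, Set, Tuple

Obj = Dict[str, float]
Pair = Tuple[str, str]
Covers = Callable[[Obj, Obj], bool]


def crisp_relations(objects: Dict[str, Obj], s_rules: List[Covers], sc_rules: List[Covers],
                    setting: str, dominates: Callable[[Obj, Obj], bool]) -> Tuple[Set[Pair], Set[Pair]]:
    """Crisp S and S^c over A for setting "MCDA" (s_MCDA) or "ML" (s_ML)."""
    S: Set[Pair] = set()
    Sc: Set[Pair] = set()
    for a, oa in objects.items():
        for b, ob in objects.items():
            # aDb in s_MCDA, a = b in s_ML: forces (a, b) into S and keeps it out of S^c.
            forced = dominates(oa, ob) if setting == "MCDA" else a == b
            if forced or any(r(oa, ob) for r in s_rules):
                S.add((a, b))
            if not forced and any(r(oa, ob) for r in sc_rules):
                Sc.add((a, b))
    return S, Sc


def dom(a: Obj, b: Obj) -> bool:
    return a["x"] >= b["x"]


objects = {"o1": {"x": 5}, "o2": {"x": 3}, "o3": {"x": 4}}
s_rules = [lambda a, b: a["x"] - b["x"] >= 2]
sc_rules = [lambda a, b: a["x"] - b["x"] <= -2]

S_mcda, Sc_mcda = crisp_relations(objects, s_rules, sc_rules, "MCDA", dom)
S_ml, Sc_ml = crisp_relations(objects, s_rules, sc_rules, "ML", dom)

if __name__ == "__main__":
    print(sorted(S_mcda))
    print(sorted(S_ml), sorted(Sc_ml))
```

In s_MCDA the pair (o3, o2) belongs to S because o3 dominates o2, although no rule covers it; in s_ML it does not.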

Application of Decision Rules – Fuzzy Relations

1 We treat each rule r_T covering pair (a, b) as an argument (piece of evidence) for assignment of this pair to relation T.

2 We take into account the strength σ of each argument (rule r_T), defined in the following way:

$\sigma(r_T) = (1 - \hat{\varepsilon}_T(r_T)) \cdot cf(r_T),$

where cf(r_T) denotes the coverage factor of rule r_T, defined as the ratio of the number of pairs of objects supporting r_T to the cardinality of relation T.

3 We accumulate the strength of the arguments supporting assignment of pair (a, b) to relation T by taking the maximum strength of these arguments.
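The three steps can be sketched with a few arithmetic helpers. The function names are hypothetical; the number of supporting pairs is taken as an input, and returning 0 when no rule covers a pair is an assumption, since the slide does not cover that case.

```python
from typing import Iterable


def coverage_factor(n_supporting: int, card_T: int) -> float:
    """cf(r_T): pairs of objects supporting r_T divided by |T|."""
    return n_supporting / card_T


def strength(eps_hat: float, n_supporting: int, card_T: int) -> float:
    """σ(r_T) = (1 - ε̂_T(r_T)) · cf(r_T)."""
    return (1.0 - eps_hat) * coverage_factor(n_supporting, card_T)


def accumulated(strengths: Iterable[float]) -> float:
    """Maximum strength of the covering rules; 0 if none covers the pair (assumption)."""
    return max(strengths, default=0.0)


if __name__ == "__main__":
    # Two hypothetical rules covering the same pair, with |T| = 4.
    sigma_1 = strength(0.5, 2, 4)  # 0.25
    sigma_2 = strength(0.0, 3, 4)  # 0.75
    print(accumulated([sigma_1, sigma_2]))  # 0.75
```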

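s_MCDA

$S(a, b) = \begin{cases} \max\{\sigma(r_S) : r_S \in R_S,\ r_S \text{ covers } (a, b)\}, & \text{if not } aDb,\\ 1, & \text{if } aDb, \end{cases}$

$S^c(a, b) = \begin{cases} \max\{\sigma(r_{S^c}) : r_{S^c} \in R_{S^c},\ r_{S^c} \text{ covers } (a, b)\}, & \text{if not } aDb,\\ 0, & \text{if } aDb. \end{cases}$

s_ML

$S(a, b) = \begin{cases} \max\{\sigma(r_S) : r_S \in R_S,\ r_S \text{ covers } (a, b)\}, & \text{if } a \ne b,\\ 1, & \text{if } a = b, \end{cases}$

$S^c(a, b) = \begin{cases} \max\{\sigma(r_{S^c}) : r_{S^c} \in R_{S^c},\ r_{S^c} \text{ covers } (a, b)\}, & \text{if } a \ne b,\\ 0, & \text{if } a = b. \end{cases}$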

Relation S is reflexive and relation Sᶜ is irreflexive. Moreover, relations S and Sᶜ are, in general, neither transitive nor complete.
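A sketch of the fuzzy variant follows, with each rule modeled (hypothetically) as a coverage predicate paired with its precomputed strength σ. Objects, rules and strengths are toy data; taking 0 as the maximum over an empty set of covering rules is an assumption.

```python
from typing import Callable, Dict, List, Tuple

Obj = Dict[str, float]
Pair = Tuple[str, str]
# Hypothetical rule representation: (coverage predicate on (a, b), strength σ of the rule).
WeightedRule = Tuple[Callable[[Obj, Obj], bool], float]


def fuzzy_relations(objects: Dict[str, Obj], s_rules: List[WeightedRule],
                    sc_rules: List[WeightedRule], setting: str,
                    dominates: Callable[[Obj, Obj], bool]) -> Tuple[Dict[Pair, float], Dict[Pair, float]]:
    """Fuzzy S and S^c over A for setting "MCDA" (s_MCDA) or "ML" (s_ML)."""
    S: Dict[Pair, float] = {}
    Sc: Dict[Pair, float] = {}
    for a, oa in objects.items():
        for b, ob in objects.items():
            forced = dominates(oa, ob) if setting == "MCDA" else a == b
            if forced:
                S[(a, b)], Sc[(a, b)] = 1.0, 0.0
            else:
                # Maximum strength of the covering rules; 0 when no rule covers (a, b) (assumption).
                S[(a, b)] = max((sig for covers, sig in s_rules if covers(oa, ob)), default=0.0)
                Sc[(a, b)] = max((sig for covers, sig in sc_rules if covers(oa, ob)), default=0.0)
    return S, Sc


def dom(a: Obj, b: Obj) -> bool:
    return a["x"] >= b["x"]


objects = {"o1": {"x": 5}, "o2": {"x": 3}, "o3": {"x": 4}}
s_rules = [(lambda a, b: a["x"] - b["x"] >= 2, 0.75), (lambda a, b: a["x"] - b["x"] >= 1, 0.25)]
sc_rules = [(lambda a, b: a["x"] - b["x"] <= -1, 0.5)]

S_ml, Sc_ml = fuzzy_relations(objects, s_rules, sc_rules, "ML", dom)
S_mcda, Sc_mcda = fuzzy_relations(objects, s_rules, sc_rules, "MCDA", dom)

if __name__ == "__main__":
    print(S_ml[("o1", "o2")], S_ml[("o3", "o2")], Sc_ml[("o2", "o1")])  # 0.75 0.25 0.5
    print(S_mcda[("o3", "o2")])  # 1.0, since o3 dominates o2
```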

Application of Decision Rules

Both relations S and Sᶜ can be jointly represented by a directed multigraph G called the preference graph. Each vertex (node) v_a of G corresponds to exactly one object a ∈ A. G contains two types of arcs: S-arcs and Sᶜ-arcs.

In the case of crisp relations, an S-arc (Sᶜ-arc) from vertex v_a to vertex v_b indicates that aSb (resp. aSᶜb).

In the case of fuzzy relations, each S-arc (Sᶜ-arc) from vertex v_a to vertex v_b is assigned a weight equal to S(a, b) (resp. Sᶜ(a, b)).

A final recommendation for the multi-criteria ranking problem at hand, in terms of a weak order or partial preorder of all objects from

Exploitation of Preference Graph

We consider mainly two ways of exploiting the preference graph G:

1 direct exploitation of preference relations S and Sᶜ by the Net Flow Score (NFS) procedure, which employs the scoring function $S_{NF} : A \to \mathbb{R}$, inducing a weak order over A, defined as:

$S_{NF}(a) = \sum_{b \in A \setminus \{a\}} \left[ S(a, b) - S(b, a) - S^c(a, b) + S^c(b, a) \right],$

2 transformation of preference graph G into another graph G′ representing a fuzzy relation R over set A, then exploitation of this relation using a ranking method (RM), i.e., a function assigning a partial preorder over A to any finite set A and any fuzzy relation R over this set.
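The Net Flow Score of the first approach can be sketched as follows, with S and Sᶜ stored as dictionaries over ordered pairs (missing entries count as 0). The relation values are hypothetical crisp toy data; grouping equal scores into indifference classes yields the induced weak order.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def net_flow_score(A: List[str], S: Dict[Pair, float], Sc: Dict[Pair, float]) -> Dict[str, float]:
    """S_NF(a) = sum over b != a of S(a,b) - S(b,a) - S^c(a,b) + S^c(b,a)."""
    def s(p: Pair) -> float:
        return S.get(p, 0.0)

    def sc(p: Pair) -> float:
        return Sc.get(p, 0.0)

    return {a: sum(s((a, b)) - s((b, a)) - sc((a, b)) + sc((b, a)) for b in A if b != a)
            for a in A}


def weak_order(scores: Dict[str, float]) -> List[List[str]]:
    """Group objects into indifference classes, best class first."""
    levels = sorted(set(scores.values()), reverse=True)
    return [sorted(a for a, v in scores.items() if v == lvl) for lvl in levels]


A = ["x", "y", "z"]
S = {("x", "y"): 1.0, ("y", "z"): 1.0}  # hypothetical: xSy, ySz
Sc = {("z", "x"): 1.0}                  # hypothetical: zS^c x
scores = net_flow_score(A, S, Sc)

if __name__ == "__main__":
    print(scores)              # {'x': 2.0, 'y': 0.0, 'z': -2.0}
    print(weak_order(scores))  # [['x'], ['y'], ['z']]
```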

Fuzzy relation R is defined as:

$R(a, b) = \frac{S(a, b) + (1 - S^c(a, b))}{2},$

where a, b ∈ A.

Scoring function S_NF can be expressed in terms of R as:

$S_{NF}(a) = 2 \sum_{b \in A \setminus \{a\}} \left[ R(a, b) - R(b, a) \right].$

Relation R is reflexive.
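The identity between the two forms of S_NF follows from R(a, b) − R(b, a) = [S(a, b) − Sᶜ(a, b) − S(b, a) + Sᶜ(b, a)] / 2. The sketch below checks it numerically on hypothetical fuzzy relations (S reflexive, Sᶜ irreflexive) and also confirms that R is reflexive.

```python
import math
from typing import Dict, List, Tuple

Pair = Tuple[str, str]

A: List[str] = ["x", "y", "z"]
# Hypothetical fuzzy relations: S is 1 on the diagonal, S^c is 0 on the diagonal.
S: Dict[Pair, float] = {**{(a, a): 1.0 for a in A},
                        ("x", "y"): 0.75, ("y", "x"): 0.0, ("x", "z"): 0.25,
                        ("z", "x"): 0.5, ("y", "z"): 1.0, ("z", "y"): 0.0}
Sc: Dict[Pair, float] = {**{(a, a): 0.0 for a in A},
                         ("x", "y"): 0.0, ("y", "x"): 0.5, ("x", "z"): 0.5,
                         ("z", "x"): 0.25, ("y", "z"): 0.0, ("z", "y"): 0.75}


def R(a: str, b: str) -> float:
    """R(a, b) = (S(a, b) + (1 - S^c(a, b))) / 2."""
    return (S[(a, b)] + (1.0 - Sc[(a, b)])) / 2.0


def nfs_direct(a: str) -> float:
    """S_NF computed from S and S^c."""
    return sum(S[(a, b)] - S[(b, a)] - Sc[(a, b)] + Sc[(b, a)] for b in A if b != a)


def nfs_via_R(a: str) -> float:
    """S_NF computed from R."""
    return 2.0 * sum(R(a, b) - R(b, a) for b in A if b != a)


if __name__ == "__main__":
    for a in A:
        print(a, nfs_direct(a), nfs_via_R(a))
```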

If relations S and Sᶜ are crisp, then R(a, b) ∈ {0, 1/2, 1} for any (a, b) ∈ A × A; in such a case, we call R a three-valued

Literature Review of Ranking Methods

Net Flow Rule (NFR) – yields a weak order using scoring function $S_D : A \to \mathbb{R}$ defined as:

$S_D(a) = \sum_{b \in A \setminus \{a\}} \left[ R(a, b) - R(b, a) \right].$

Iterative Net Flow Rule (It.NFR) – yields a weak order by iterative application of scoring function S_D.

Min in Favor (MiF) – yields a weak order using scoring function $m_F : A \to \mathbb{R}$ defined as:

$m_F(a) = \min_{b \in A \setminus \{a\}} R(a, b).$

Iterative Min in Favor (It.MiF) – yields a weak order by iterative application of scoring function m_F.

Leaving and Entering Flows (L/E) – yields a partial preorder being the intersection of two weak orders obtained using scoring functions S_F and −S_A, defined as:

$S_F(a) = \sum_{b \in A \setminus \{a\}} R(a, b), \quad -S_A(a) = -\sum_{b \in A \setminus \{a\}} R(b, a).$
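NFR, MiF and their iterative variants can be sketched as below. The iterative versions follow one common reading of "iterative application": rank the objects with the best score first, remove them, and rescore the remaining ones. Treat this reading, the function names and the relation R as assumptions.

```python
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]
Score = Callable[[List[str], Dict[Pair, float]], Dict[str, float]]


def nfr_scores(A: List[str], R: Dict[Pair, float]) -> Dict[str, float]:
    """S_D(a) = sum over b != a of R(a, b) - R(b, a)."""
    return {a: sum(R[(a, b)] - R[(b, a)] for b in A if b != a) for a in A}


def mif_scores(A: List[str], R: Dict[Pair, float]) -> Dict[str, float]:
    """m_F(a) = min over b != a of R(a, b)."""
    return {a: min(R[(a, b)] for b in A if b != a) for a in A}


def weak_order(scores: Dict[str, float]) -> List[List[str]]:
    """Indifference classes ordered from the best to the worst score."""
    levels = sorted(set(scores.values()), reverse=True)
    return [sorted(a for a, v in scores.items() if v == lvl) for lvl in levels]


def iterative(A: List[str], R: Dict[Pair, float], score: Score) -> List[List[str]]:
    """Repeatedly move the top-scored remaining objects to the next rank."""
    remaining: List[str] = list(A)
    ranking: List[List[str]] = []
    while remaining:
        if len(remaining) == 1:
            ranking.append(remaining)
            break
        s = score(remaining, R)
        best = max(s.values())
        top = sorted(a for a in remaining if s[a] == best)
        ranking.append(top)
        remaining = [a for a in remaining if a not in top]
    return ranking


A = ["x", "y", "z"]
R = {("x", "y"): 1.0, ("y", "x"): 0.0, ("x", "z"): 0.5,
     ("z", "x"): 0.5, ("y", "z"): 1.0, ("z", "y"): 0.0}

if __name__ == "__main__":
    print(weak_order(nfr_scores(A, R)))   # [['x'], ['y'], ['z']]
    print(weak_order(mif_scores(A, R)))   # [['x'], ['y', 'z']]
    print(iterative(A, R, mif_scores))    # [['x'], ['y'], ['z']]
```

On this toy relation MiF leaves y and z tied, because both have a worst comparison of 0, whereas It.MiF separates them once x has been removed.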

Desirable Properties of Ranking Methods

| 3-valued R                                        | arbitrary R                                       |
|---------------------------------------------------|---------------------------------------------------|
| neutrality (N)                                    | neutrality (N)                                    |
| monotonicity (M)                                  | monotonicity (M)                                  |
| covering compatibility (CC)                       | covering compatibility (CC)                       |
| discrimination (D)                                | independence of non-discriminating objects (INDO) |
| faithfulness (F)                                  | independence of circuits (IC)                     |
| data-preservation (DP)                            | ordinality (O)                                    |
| independence of non-discriminating objects (INDO) | continuity (C)                                    |
| independence of circuits (IC)                     | faithfulness (F)                                  |
| ordinality (O)                                    | data-preservation (DP)                            |
| greatest-faithfulness (GF)                        | greatest-faithfulness (GF)                        |

The given (top-to-bottom) priority order reflects the relative importance of the properties.

(N) – a ranking method does not discriminate between objects just because of their labels (or, in other words, their order in the considered set A),

(M) – improving an object cannot decrease its position in the ranking and, moreover, deteriorating an object cannot improve its position in the ranking,

(CC) – when a “covers” b, b should not be ranked before a; in the case of exploitation of fuzzy relation R, property CC of the applied RM guarantees that the final ranking produced by this method respects the dominance relation D over set A,

(D) – for each set of objects A, there exists at least one fuzzy relation R over A such that the ranking obtained by the considered RM is a complete order over set A,

(DP) – when it is possible to obtain a partial preorder on the basis of a given transitive crisp relation without deleting information contained in this relation, an RM should do so,

(INDO) – when there is a subset of objects that compare in the same way to all other objects, the ranking of the other objects is not affected by the presence of this subset,

(IC) – the ranking is not affected by adding the same positive or negative value to the weights of all arcs in any cycle of G′,

(O) – ordinality implies that an RM should not make use of the “cardinal” properties of the exploited fuzzy relation,

(C) – “small” changes in an exploited fuzzy relation should not lead to radical changes in the final ranking produced by an RM,

(GF) – if there are some greatest elements of a given set A,

Desirable Properties of Ranking Methods – 3-valued R

| Property / RM | NFR | It.NFR | MiF | It.MiF | L/E |
|---------------|-----|--------|-----|--------|-----|
| N             | T   | T      | T   | T      | T   |
| M             | T   | F      | T   | F      | T   |
| CC            | T   | T      | T   | T      | T   |
| D             | T   | T      | F   | T      | T   |
| F             | T   | T      | F   | T      | T   |
| DP            | T   | T      | T   | T      | T   |
| INDO          | T   | T      | F   | F      | T   |
| IC            | T   | F      | F   | F      | F   |
| O             | F   | F      | T   | T      | F   |
| GF            | F   | F      | T   | T      | T   |

where:

T = presence of given property, F = lack of given property, bold – proof in the literature, italics – proven by the author. All considered ranking methods yield a final ranking that respects the dominance relation on set A (since they have property CC).

Desirable Properties of Ranking Methods – arbitrary R

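| Property / RM | NFR | It.NFR | MiF | It.MiF | L/E |
|---------------|-----|--------|-----|--------|-----|
| N             | T   | T      | T   | T      | T   |
| M             | T   | F      | T   | F      | T   |
| CC            | T   | T      | T   | T      | T   |
| INDO          | T   | T      | F   | F      | T   |
| IC            | T   | F      | F   | F      | F   |
| O             | F   | F      | T   | T      | F   |
| C             | T   | F      | T   | F      | T   |
| F             | T   | T      | F   | T      | T   |
| DP            | T   | T      | T   | T      | T   |
| GF            | F   | F      | T   | T      | T   |

where:

T = presence of given property, F = lack of given property, bold – proof in the literature, italics – proven by the author.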

Desirable Properties of Ranking Methods

In view of the considered list of desirable properties, the best ranking method for exploitation of fuzzy relation R is the Net Flow Rule method. This is because it satisfies most (eight out of ten) of the properties (which is, however, also true for the L/E ranking method) and, moreover, satisfies the first eight properties for 3-valued R and the first five properties for arbitrary R.

The NFR ranking method is also attractive because it represents an intuitive way of reasoning about the relative worth of objects in set A, as it takes into account both positive and negative arguments concerning each object (i.e., strengths and weaknesses of each object). Exploitation of relation R using the NFR ranking method yields the same ranking (weak order) as direct exploitation of relations S and Sᶜ using scoring function S_NF.

Summary and Conclusions

VC-DRSA is a flexible modeling method that allows one to include domain knowledge and handles inconsistencies in data by calculating lower approximations of sets.

VC-DRSA allows one to work with heterogeneous attributes – nominal, ordinal, and cardinal (no need for discretization).

Preference information in terms of pairwise comparisons of some reference objects is relatively easy to elicit from the DM.

The presented methodology involves non-statistical processing of preference information and induction of decision rules from decision examples (pairwise comparisons of reference objects).

The rule model has many advantages, e.g., comprehensibility, generality, lack of aggregation operators.

Net Flow Rule appears to be the most appropriate ranking method for exploitation of a fuzzy relation over a set of

References

Szeląg M, Greco S, Słowiński R, Rule-Based Approach to Multicriteria Ranking, [in]: Doumpos M, Grigoroudis E (Eds.), Multicriteria Decision Aid and Artificial Intelligence: Links, Theory and Applications, Wiley, 2013, to appear.

Szeląg M, Greco S, Słowiński R, Variable Consistency Dominance-Based Rough Set Approach to Preference Learning in Multicriteria Ranking, submitted to Machine Learning.

Greco S, Matarazzo B, Słowiński R, Tsoukiàs A, Exploitation of a Rough Approximation of the Outranking Relation in Multicriteria Choice and Ranking, Lecture Notes in Economics and Mathematical Systems 465, 1998, pp. 45–60.

Fortemps Ph, Greco S, Słowiński R, Multicriteria decision support using rules that represent rough-graded preference relations, European J. Operational Research 188, 2008, pp. 206–223.

Bouyssou D, Vincke Ph, Ranking alternatives on the basis of preference relations: A progress report with special emphasis on outranking relations, Journal of Multi-Criteria Decision Analysis 6, 1997, pp. 77–85.

Questions and Discussion

Future Work

Comparison of effectiveness of crisp and fuzzy preference structures.

Comparison of the proposed methodology (using problem setting s_ML) with SVM-rank and RankBoost on benchmark data sets from UCI and possibly on the LETOR data set.

Lack of the “respect of data” property

In the case of objects incomparable w.r.t. the dominance relation on A, it is possible that the final ranking does not preserve some pairwise comparisons given by the DM. For example, the DM may say aSb, but in the final ranking a is ranked lower than b.

We have to accept the lack of the “respect of data” property, since:

we obtain a transitive relation (ranking), starting from non-transitive relations S and Sᶜ,

we generalize preference information concerning a small set of objects to a larger set of objects,

we have only contextual preference information, i.e., pairwise comparisons.

[Figure: all ordered pairs of objects A–E, from (A,B) to (E,D), plotted with axes d(x) and d(y).]