
(1)

Multi-Objective Search for Comprehensible Rule Ensembles

Jerzy Błaszczyński, Bartosz Prusak, Roman Słowiński

Poznań University of Technology, Institute of Computing Science, Piotrowo 2, 60-965 Poznań, Poland

(2)

1 Introduction

2 Proposed Methodology for Constructing Comprehensible Ensemble Classifier

3 Finding Population of Comprehensible Sets of Rules

4 Evolutionary Bi-Objective Search for Comprehensible Ensemble Classifier

5 Experiments

6 Conclusions

(3)

Motivation

Take an ensemble rule model and make it comprehensible

while maintaining its predictive performance.

Use a combination of ILP (to find rule classifiers with desirable properties w.r.t. size, support, anti-support and confirmation) and evolutionary optimization (to get an accurate and diversified ensemble).

(4)

Decision Rule Model

Decision rules are known to be a simple and comprehensible representation of knowledge:

if conditions then decision (prediction).

Condition part of a rule is composed of elementary conditions.

A decision rule model is a set of minimal rules that cover the whole training set.

A decision rule model is sometimes called a glass-box classifier.

(5)

Why Rule Ensemble Model?

A single decision rule model is unstable when it is induced by a heuristic (such as sequential covering, e.g. VC-DomLEM).

Instability of the rule model has consequences from both the predictive and the interpretability perspectives.

(6)

Why Rule Ensemble Model?

The standard way to improve the predictive performance of an unstable model is to construct an ensemble (such as VC-bagging²,³, which performs better than standard bagging).

Rule models that compose the ensemble are called base classifiers.

² J. Błaszczyński, R. Słowiński, J. Stefanowski: Variable consistency bagging ensembles. Trans. Rough Sets XI (LNCS 5946), 40–52 (2010)

³ J. Błaszczyński, R. Słowiński, J. Stefanowski: Ordinal classification with monotonicity constraints by variable consistency bagging. RSCTC 2010, LNCS 6086, Springer, Berlin, 392–401 (2010)

(7)

Interpretability of Ensemble Rule Models

Record of a patient who needs diagnosis

[Patient record: Age, Gender, FLHAEM, GAT, . . .]

(8)

Interpretability of Ensemble Rule Models

Diagnosis

Risk of glaucoma with 100% certainty

(9)

Interpretability of Ensemble Rule Models

Rules that support the diagnosis

49 classifiers: if flare haemorrhage then Glaucoma,

31 classifiers: if GAT ≥ 21 then Glaucoma,

(10)

Interpretability of Ensemble Rule Models

Rules that support the diagnosis

6 classifiers: if cumulative HRBP in 5 hours before sleep to sleep ≤ 1447 and slope of DAP in wake to 5 hours after wake ≤ −0.067 then Glaucoma,

2 classifiers: if intercept of TF in wake to 5 hours after wake ≤ 23.23 and cumulative TF GAT in wake to 5 hours after wake ≥ 568.2 then Glaucoma,

. . .

(11)

1 Introduction

2 Proposed Methodology for Constructing Comprehensible Ensemble Classifier

3 Finding Population of Comprehensible Sets of Rules

4 Evolutionary Bi-Objective Search for Comprehensible Ensemble Classifier

5 Experiments

6 Conclusions

(12)

Comprehensible Classifier

Construct comprehensible ensembles of rule classifiers, such that:

base classifiers will be composed of minimal sets of strong and confirmatory rules covering a high percentage of the training set,

base classifiers will be maximally diversified within the ensemble, while the whole ensemble will be maximally accurate.

The most accurate base classifier = comprehensible single classifier

(13)

Proposed Methodology

The methodology is composed of three elements performed on a data set divided into training and validation sets.

First element

a rule ensemble is constructed on the training set,

all rules that compose this ensemble are integrated as one initial set of rules.

(14)

Proposed Methodology

The methodology is composed of three elements performed on a data set divided into training and validation sets.

Second element

the evolutionary bi-objective procedure NSGA-II⁴ is applied to evolve a population of comprehensible sets of rules covering all objects from the training samples,

the two considered objectives are prediction accuracy and diversity, both estimated on the validation set.

⁴ Deb, K., Agrawal, S., Pratap, A., Meyarivan, T.: A fast and elitist multi-objective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2), 182–197 (2002)

(15)

Proposed Methodology

The methodology is composed of three elements performed on a data set divided into training and validation sets.

Third element

members of the population are obtained as a result of solving a series of ILP problems on the initial set of rules,

this iterative procedure leads to a population that constitutes the comprehensible ensemble classifier.

(16)

1 Introduction

2 Proposed Methodology for Constructing Comprehensible Ensemble Classifier

3 Finding Population of Comprehensible Sets of Rules

4 Evolutionary Bi-Objective Search for Comprehensible Ensemble Classifier

5 Experiments

6 Conclusions

(17)

Problem definition

$R_{all}$ is the initial set of all rules from the ensemble constructed in the first element of the methodology,

divide $R_{all}$ into two subsets, $R^0_{all}$ and $R^1_{all}$, composed of rules assigning objects to class $Cl_0$ or to class $Cl_1$, respectively,

we are searching for comprehensible sets of rules $R_{MC} \subset R_{all}$,

$R_{MC}$ is divided, analogously, into $R^0_{MC} \subset R^0_{all}$ and $R^1_{MC} \subset R^1_{all}$.

(18)

Notation

$A^R$ is a sample of training objects, where $a_i$ is the $i$-th training object,

for class $Cl_j$, $j = 0, 1$, $r^j_k$ is a rule belonging to set $R^j_{MC}$, and $A(r^j_k)$ is the set of objects $a_i \in Cl_j$ covered by rule $r^j_k$,

$v(r^j_k) \in \{0, 1\}$ is a binary variable taking value 1 when rule $r^j_k$ belongs to $R^j_{MC}$, and 0 otherwise,

$T_{max}$ is the maximum number of times any object from $A^R$ may be covered by rules from $R_{MC}$.

(19)

Multi-objective ILP problem

We can find a comprehensible set of rules by solving the following multi-objective integer linear programming (ILP) problem.

(20)

Multi-objective ILP problem

$$\text{minimize}\quad f_1 = \sum_{r^0_k \in R^0_{all}} v(r^0_k) + \sum_{r^1_k \in R^1_{all}} v(r^1_k), \qquad (1)$$

$$\text{maximize}\quad f_2 = \sum_{r^0_k \in R^0_{all}} v(r^0_k) \times sup(r^0_k) + \sum_{r^1_k \in R^1_{all}} v(r^1_k) \times sup(r^1_k), \qquad (2)$$

$$\text{or}\quad \hat f_2 = \min_{r^0_k \in R^0_{all},\, r^1_k \in R^1_{all}} \left\{ v(r^0_k) \times sup(r^0_k),\; v(r^1_k) \times sup(r^1_k) \right\},$$

$$\text{minimize}\quad f_3 = \sum_{r^0_k \in R^0_{all}} v(r^0_k) \times asup(r^0_k) + \sum_{r^1_k \in R^1_{all}} v(r^1_k) \times asup(r^1_k), \qquad (3)$$

$$\text{or}\quad \hat f_3 = \min_{r^0_k \in R^0_{all},\, r^1_k \in R^1_{all}} \left\{ v(r^0_k) \times asup(r^0_k),\; v(r^1_k) \times asup(r^1_k) \right\},$$

$$\text{maximize}\quad f_4 = \sum_{r^0_k \in R^0_{all}} v(r^0_k) \times cfir(r^0_k) + \sum_{r^1_k \in R^1_{all}} v(r^1_k) \times cfir(r^1_k), \qquad (4)$$

$$\text{or}\quad \hat f_4 = \min_{r^0_k \in R^0_{all},\, r^1_k \in R^1_{all}} \left\{ v(r^0_k) \times cfir(r^0_k),\; v(r^1_k) \times cfir(r^1_k) \right\},$$

(21)

Multi-objective ILP problem

subject to the following constraints:

$$\sum_{r^0_k:\, a_i \in A(r^0_k)} v(r^0_k) \geq 1 \quad \text{for all } a_i \in Cl_0 \subset A^R, \qquad (5)$$

$$\sum_{r^1_k:\, a_i \in A(r^1_k)} v(r^1_k) \geq 1 \quad \text{for all } a_i \in Cl_1 \subset A^R, \qquad (6)$$

$$\sum_{r^0_k:\, a_i \in A(r^0_k)} v(r^0_k) \leq T_{max} \quad \text{for all } a_i \in Cl_0 \subset A^R, \qquad (7)$$

$$\sum_{r^1_k:\, a_i \in A(r^1_k)} v(r^1_k) \leq T_{max} \quad \text{for all } a_i \in Cl_1 \subset A^R, \qquad (8)$$

$$v(r^0_k) \in \{0, 1\} \text{ for all } r^0_k \in R^0_{all}, \quad \text{and} \quad v(r^1_k) \in \{0, 1\} \text{ for all } r^1_k \in R^1_{all}. \qquad (9)$$

(22)

Single-objective ILP

Instead of performing a multi-objective optimization, at this stage, we aggregate all objectives into one goal function, which involves a kind of regularization of objective (1).

(23)

Single-objective ILP

$$\text{minimize}\quad F = \sum_{r^0_k \in R^0_{all}} v(r^0_k) + \sum_{r^1_k \in R^1_{all}} v(r^1_k) \qquad (10)$$

$$\qquad - \lambda_1 \times \left( \sum_{r^0_k \in R^0_{all}} v(r^0_k) \times sup(r^0_k) + \sum_{r^1_k \in R^1_{all}} v(r^1_k) \times sup(r^1_k) \right)$$

$$\qquad + \lambda_2 \times \left( \sum_{r^0_k \in R^0_{all}} v(r^0_k) \times asup(r^0_k) + \sum_{r^1_k \in R^1_{all}} v(r^1_k) \times asup(r^1_k) \right)$$

$$\qquad - \lambda_3 \times \left( \sum_{r^0_k \in R^0_{all}} v(r^0_k) \times cfir(r^0_k) + \sum_{r^1_k \in R^1_{all}} v(r^1_k) \times cfir(r^1_k) \right)$$
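
To make the aggregated formulation concrete, here is a minimal sketch of solving problem (10) with constraints (5)-(9), assuming the open-source PuLP modeller; the rule representation (fields sup, asup, cfir, covers, cls) and the helper name select_rules are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the single-objective ILP (10) with constraints (5)-(9), using PuLP.
# Rule characteristics (support, anti-support, confirmation) and coverage sets are assumed
# to be precomputed on the training sample; field names are illustrative.
from pulp import LpProblem, LpMinimize, LpVariable, LpBinary, lpSum

def select_rules(rules, objects, lam1, lam2, lam3, t_max):
    # rules: list of dicts with keys 'sup', 'asup', 'cfir', 'covers' (set of object ids), 'cls' (0 or 1)
    # objects: list of dicts with keys 'id', 'cls'
    prob = LpProblem("comprehensible_rule_set", LpMinimize)
    v = [LpVariable(f"v_{k}", cat=LpBinary) for k in range(len(rules))]   # constraint (9)

    # objective (10): number of rules, regularized by support, anti-support and confirmation
    prob += (lpSum(v)
             - lam1 * lpSum(v[k] * r["sup"]  for k, r in enumerate(rules))
             + lam2 * lpSum(v[k] * r["asup"] for k, r in enumerate(rules))
             - lam3 * lpSum(v[k] * r["cfir"] for k, r in enumerate(rules)))

    for obj in objects:
        covering = [v[k] for k, r in enumerate(rules)
                    if r["cls"] == obj["cls"] and obj["id"] in r["covers"]]
        prob += lpSum(covering) >= 1       # constraints (5)-(6): every training object is covered
        prob += lpSum(covering) <= t_max   # constraints (7)-(8): coverage bounded by T_max

    prob.solve()
    return [k for k in range(len(rules)) if v[k].value() and v[k].value() > 0.5]
```

The returned indices define one comprehensible set of rules $R_{MC}$ for the given $\lambda_1, \lambda_2, \lambda_3$ and $T_{max}$.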

(24)

1 Introduction

2 Proposed Methodology for Constructing Comprehensible Ensemble Classifier

3 Finding Population of Comprehensible Sets of Rules

4 Evolutionary Bi-Objective Search for Comprehensible Ensemble Classifier

5 Experiments

6 Conclusions

(25)

Connection with ILP Solver

To obtain a population of $n$ comprehensible sets of rules, we solve a series of $n$ ILP problems (10), (5)-(9) for $n$ training samples $A^R$, associated with $n$ vectors $\Lambda = [\lambda_1, \lambda_2, \lambda_3]$ and values of $T_{max}$.

$A^R$ is a random stratified subset of the training set of fixed size (e.g., 90%).
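
A random stratified subset of fixed size can be drawn, for example, with scikit-learn; the library choice and the helper name are assumptions for illustration only.

```python
# Sketch of drawing one random stratified training sample A^R (e.g. 90% of the training set).
from sklearn.model_selection import train_test_split

def draw_stratified_sample(X_train, y_train, fraction=0.9, seed=0):
    X_sample, _, y_sample, _ = train_test_split(
        X_train, y_train,
        train_size=fraction,    # fixed-size subset, e.g. 90% of the training set
        stratify=y_train,       # preserve the proportions of classes Cl0 and Cl1
        random_state=seed)
    return X_sample, y_sample
```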

(26)

Objectives

Base classifier $i$, resulting from the solution of the $i$-th ILP problem with training sample $A^R_i$, vector $\Lambda_i = [\lambda_{i1}, \lambda_{i2}, \lambda_{i3}]$ and $T^i_{max}$, is evaluated with respect to two objectives:

1. Geometric mean of its sensitivity and specificity

$$\text{G-mean}_i = \sqrt{\frac{TP}{TP + FN} \times \frac{TN}{TN + FP}}$$
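
For illustration, a small helper (not from the paper) computing this objective from the confusion matrix obtained on the validation set:

```python
# G-mean: geometric mean of sensitivity and specificity from a binary confusion matrix.
from math import sqrt

def g_mean(tp, fn, tn, fp):
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0   # TP / (TP + FN)
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # TN / (TN + FP)
    return sqrt(sensitivity * specificity)
```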

(27)

Objectives

2. Yule’s Q statistic, transformed from the pairwise index

$$Q_{i,k} = 1 - \frac{N^{11}N^{00} - N^{01}N^{10}}{N^{11}N^{00} + N^{01}N^{10}}, \qquad Q_{i,k} \in [0, 2],$$

to an index expressing how diverse base classifier $i$ is compared to the other classifiers in the population:

$$Q_i = \max_k \{Q_{i,k}\} + \alpha \times \frac{\sum_{k=1}^{n} Q_{i,k}}{n},$$
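
A sketch of both indices, assuming the pairwise counts $N^{11}, N^{00}, N^{01}, N^{10}$ are taken from the 0/1 correctness vectors of two base classifiers on the validation set; the helper names and the skipping of the self-pair are illustrative choices, not the authors' code.

```python
# Transformed Yule's Q (pairwise diversity) and its aggregation over the population.

def pairwise_q(correct_i, correct_k):
    # correct_i, correct_k: 0/1 flags, 1 if the classifier predicted the object correctly
    n11 = sum(1 for ci, ck in zip(correct_i, correct_k) if ci and ck)           # both correct
    n00 = sum(1 for ci, ck in zip(correct_i, correct_k) if not ci and not ck)   # both wrong
    n10 = sum(1 for ci, ck in zip(correct_i, correct_k) if ci and not ck)
    n01 = sum(1 for ci, ck in zip(correct_i, correct_k) if not ci and ck)
    denom = n11 * n00 + n01 * n10
    return 1.0 if denom == 0 else 1.0 - (n11 * n00 - n01 * n10) / denom         # value in [0, 2]

def diversity_index(i, correctness, alpha):
    # correctness: one 0/1 correctness vector per base classifier in the population of size n
    n = len(correctness)
    qs = [pairwise_q(correctness[i], correctness[k]) for k in range(n) if k != i]  # self-pair skipped
    return max(qs) + alpha * sum(qs) / n if qs else 0.0
```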

(28)

Bi-objective Optimization Problem

Construction of a comprehensible ensemble classifier can be formulated as the following bi-objective optimization problem:

maximize $\{\text{G-mean}_i, Q_i\}$

subject to ILP (10), (5)-(9), and $\lambda_{i1}, \lambda_{i2}, \lambda_{i3} \in [0, 1]$, $T^i_{max} \geq 0$, $A^R_i$.

We adopt the elitist non-dominated sorting genetic algorithm NSGA-II⁴ to perform the optimization.

⁴ Deb, K., Agrawal, S., Pratap, A., Meyarivan, T.: A fast and elitist multi-objective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2), 182–197 (2002)

(29)

NSGA-II Search for Comprehensible Rule Ensemble

Step 1: Generate an initial population $P_{t=0}$ of base classifiers for $n$ randomly selected training samples $A^R_i$ of the same size, and $n$ randomly chosen vectors $\Lambda_i = [\lambda_{i1}, \lambda_{i2}, \lambda_{i3}]$, and values of $T^i_{max} = \sqrt{N}$ ($i = 1, \ldots, n$), i.e., solve $n$ ILP problems (10), (5)-(9).

Step 2: Apply each individual $i$ from the population $P_t$ to the validation set, and calculate $\text{G-mean}_i$ and $Q_i$.

(30)

NSGA-II Search for Comprehensible Rule Ensemble

Step 3: Repeat the following steps for a given number of generations.

3.1: Perform non-dominated sorting of all base classifiers into fronts.

3.2: Apply binary tournament selection, recombination and mutation to the vectors composed of $\Lambda_i$ and $T^i_{max}$ to generate an offspring population $P'_t$ of the same size $n$ from $P_t$.

3.3: Solve $n$ ILP problems (10), (5)-(9) for different samples $A^R_i$, and vectors $\Lambda_i$ and values of $T^i_{max}$ corresponding to $P'_t$, and evaluate the resulting base classifiers as in Step 2.

(31)

NSGA-II Search for Comprehensible Rule Ensemble

3.4: Merge $P_t$ and $P'_t$ into $R_t$, and perform non-dominated sorting of $R_t$. Individuals within each front are sorted by decreasing crowding distance, with extreme individuals at the top (see the sketch after Step 4 below).

3.5: Create the new population $P_{t+1}$ by picking the first $n$ vectors $\Lambda_i = [\lambda_{i1}, \lambda_{i2}, \lambda_{i3}]$ and $T^i_{max}$ from $R_t$.

3.6: Increment the generation counter t + 1 → t.

Step 4: Take the population of base classifiers from the last generation as the ensemble.
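
To illustrate steps 3.4-3.5, here is a minimal, self-contained sketch of non-dominated sorting and crowding-distance truncation for the two maximized objectives $(\text{G-mean}_i, Q_i)$; the function names are illustrative and this is not the authors' code.

```python
# Sketch of survivor selection (steps 3.4-3.5): non-dominated sorting of the merged
# population R_t followed by crowding-distance truncation to n individuals.

def dominates(a, b):
    # a, b: (g_mean, q) tuples; both objectives are maximized
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def non_dominated_fronts(scores):
    remaining = list(range(len(scores)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(scores[j], scores[i]) for j in remaining if j != i)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts

def crowding_distance(front, scores):
    dist = {i: 0.0 for i in front}
    for m in range(2):                                           # two objectives
        ordered = sorted(front, key=lambda i: scores[i][m])
        span = (scores[ordered[-1]][m] - scores[ordered[0]][m]) or 1.0
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")      # extreme individuals kept at the top
        for pos in range(1, len(ordered) - 1):
            dist[ordered[pos]] += (scores[ordered[pos + 1]][m] - scores[ordered[pos - 1]][m]) / span
    return dist

def select_next_population(scores, n):
    # pick indices of the n individuals forming P_{t+1} from the merged population R_t
    chosen = []
    for front in non_dominated_fronts(scores):
        dist = crowding_distance(front, scores)
        for i in sorted(front, key=lambda i: dist[i], reverse=True):
            if len(chosen) < n:
                chosen.append(i)
    return chosen
```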

(32)

1 Introduction

2 Proposed Methodology for Constructing Comprehensible Ensemble Classifier

3 Finding Population of Comprehensible Sets of Rules

4 Evolutionary Bi-Objective Search for Comprehensible Ensemble Classifier

5 Experiments

6 Conclusions

(33)

Data Sets

Table: Characteristics of data sets used in the experiment

   data set        objects   attributes
 1 arrhythmia-b        452          558
 2 Australian          690           14
 3 bank-g             1411           16
 4 GermanCredit       1000           20
 5 denbosch            119            8
 6 Glaucoma            177           40
 7 housing-b           506           13
 8 windsor-b           546           10

(34)

Changes of G-mean in NSGA-II Generations

[Plots omitted: G-mean of the current ensemble on the validation set across NSGA-II generations.]

(39)

Mean Number of Rules

Table: Mean number of rules composing comprehensible ensembles, single comprehensible rule classifiers and other compared solutions

   data set        Rand   SoEns     Ens   CompS   CompEns
 1 arrhythmia-b      28      79    59.1      32      32.3
 2 Australian        30      96    79.9      57      58.2
 3 bank-g            25      60    46.5      30      29.6
 4 GermanCredit      31     212     166     130       135
 5 denbosch           8      11     9.4       8      7.54
 6 Glaucoma          22      37    29.2      19      18.4
 7 housing-b         19      42    30.8      19      18.7

(40)

Mean Number of Conditions

Table: Mean number of conditions in rules composing comprehensible ensembles, single comprehensible rule classifiers and other compared solutions

   data set        Rand   SoEns     Ens   CompS   CompEns
 1 arrhythmia-b    1.79    1.92    1.74    2.22      2.14
 2 Australian      3.27    3.06    3.01    3.26      3.39
 3 bank-g          2.44    2.73    2.46    2.63      2.62
 4 GermanCredit     3.1    2.88    2.66    3.21      3.14
 5 denbosch        2.25    2.36    2.12     2.5      2.39
 6 Glaucoma        2.23     2.3    1.99    2.16      2.29
 7 housing-b       2.53    2.52    2.32    2.42      2.46
 8 windsor-b       2.77    2.83    2.42    2.84      2.84

(41)

Mean Support

Table: Mean support of comprehensible ensembles, single comprehensible rule classifiers and other compared solutions

   data set          Rand    SoEns      Ens    CompS   CompEns
 1 arrhythmia-b    0.0466   0.0314   0.0342   0.0605    0.0604
 2 Australian      0.0466   0.0343   0.0344   0.0415    0.0442
 3 bank-g           0.177    0.186    0.154    0.166     0.181
 4 GermanCredit    0.0208   0.0114   0.0108   0.0162    0.0155
 5 denbosch         0.236    0.209    0.212     0.29     0.286
 6 Glaucoma        0.0686   0.0536   0.0564   0.0865     0.089
 7 housing-b        0.183   0.0928    0.135    0.175     0.186

(42)

Mean Anti-support

Table: Mean anti-support of comprehensible ensembles, single comprehensible rule classifiers and other compared solutions

   data set           Rand      SoEns        Ens      CompS    CompEns
 1 arrhythmia-b    0.00344   0.000126    0.00296     0.0054    0.00525
 2 Australian      0.00365    0.00152    0.00235    0.00315    0.00316
 3 bank-g         0.000988   0.000709   0.000796    0.00115    0.00133
 4 GermanCredit    0.00107   0.000312   0.000869   0.000717   0.000661
 5 denbosch         0.0127     0.0046    0.00534     0.0142     0.0135
 6 Glaucoma        0.00924          0    0.00531     0.0107    0.00879
 7 housing-b       0.00469   0.000565    0.00196    0.00515    0.00482
 8 windsor-b      0.000747     0.0063    0.00248    0.00119    0.00108

(43)

Mean Bayesian Confirmation s

Table: Mean confirmation of comprehensible ensembles, single comprehensible rule classifiers and other compared solutions

   data set        Rand   SoEns     Ens   CompS   CompEns
 1 arrhythmia-b   0.471   0.501   0.424   0.446     0.451
 2 Australian     0.417   0.517   0.437   0.444     0.444
 3 bank-g         0.484   0.584   0.574   0.624     0.608
 4 GermanCredit   0.467   0.478   0.403   0.491      0.49
 5 denbosch       0.616   0.646   0.617   0.668     0.665
 6 Glaucoma       0.401   0.528   0.447   0.453     0.466
 7 housing-b      0.529   0.531   0.524   0.562     0.552

(44)

G-mean

Table: G-mean [%] of comprehensible ensembles, single comprehensible rule classifiers and other compared solutions

   data set        Rand   SoEns     Ens   CompS   CompEns
 1 arrhythmia-b    55.8    79.1    80.9    71.9      75.4
 2 Australian      63.6    75.6    74.8    77.8      80.5
 3 bank-g          81.7    80.4    89.2    85.2      88.2
 4 GermanCredit    30.4    59.2    61.6    62.1      63.2
 5 denbosch        87.2    82.2    84.9    89.9      87.5
 6 Glaucoma        57.4    72.6    76.1    60.9      69.4
 7 housing-b       82.1    83.8    87.8    80.5      83.1
 8 windsor-b       58.5    62.6    66.3      64      63.2

(45)

1 Introduction

2 Proposed Methodology for Constructing Comprehensible Ensemble Classifier

3 Finding Population of Comprehensible Sets of Rules

4 Evolutionary Bi-Objective Search for Comprehensible Ensemble Classifier

5 Experiments

6 Conclusions

(46)

Conclusions

The ensemble of rule classifiers is obtained by solving a series of $n$ ILP problems with the objective of a minimal number of rules covering all objects from a training sample, augmented by a regularization component.

The regularization component of the ILP objective includes the weighted total support, anti-support and Bayesian confirmation of the rules entering the rule classifier.

The parameters of ILP (weights of regularization component and allowed number of times the rules cover a single training object) are tuned in an external loop, where predictive accuracy and diversity of rule classifiers are maximized using an evolutionary bi-objective optimization procedure of the NSGA-II type on a validation set.

(47)

Conclusions

As a result, one gets a population of $n$ rule classifiers which, compared to traditional minimal-cover rule classifiers, have a significantly smaller number of rules per classifier and a higher mean support and Bayesian confirmation, while ensuring good predictive accuracy of the ensemble they form.

Future work will concern generalization to multi-class classification and extension of computational experiments.
