
(1)

Multi-Objective Search for Comprehensible Rule Ensembles

Jerzy Błaszczyński, Bartosz Prusak, Roman Słowiński

Poznań University of Technology, Institute of Computing Science, Piotrowo 2, 60-965 Poznań, Poland

(2)

1 Introduction

2 Proposed Methodology for Constructing Comprehensible Ensemble Classifier

3 Finding Population of Comprehensible Sets of Rules

4 Evolutionary Bi-Objective Search for Comprehensible Ensemble Classifier

5 Experiments

6 Conclusions

(3)

Motivation

Take an ensemble rule model and make it comprehensible

while maintaining its predictive performance.

Use a combination of ILP (to find rule classifiers with desirable properties w.r.t. size, support, anti-support and confirmation) and evolutionary optimization (to get an accurate and diversified ensemble).

(4)

Decision Rule Model

Decision rules are known to be a simple and comprehensible representation of knowledge:

if conditions then decision (prediction).

Condition part of a rule is composed of elementary conditions.

A decision rule model is a set of minimal rules that cover the whole training set.

A decision rule model is sometimes called a glass-box classifier.

(5)

Why Rule Ensemble Model?

A single decision rule model is unstable when it is induced by a heuristic (such as sequential covering, e.g. VC-DomLEM).

Instability of the rule model has consequences from both the predictive and the interpretability perspectives.

(6)

Why Rule Ensemble Model?

The standard way to improve the predictive performance of an unstable model is to construct an ensemble (such as VC-bagging²,³, which performs better than standard bagging).

Rule models that compose the ensemble are called base classifiers.

² J. Błaszczyński, R. Słowiński, J. Stefanowski: Variable consistency bagging ensembles. Trans. Rough Sets XI (LNCS 5946), 40–52 (2010)

³ J. Błaszczyński, R. Słowiński, J. Stefanowski: Ordinal classification with monotonicity constraints by variable consistency bagging. RSCTC 2010, LNCS 6086, Springer, Berlin, 392–401 (2010)

(7)

Interpretability of Ensemble Rule Models

Record of a patient who needs diagnosis

[Patient record: Age, Gender, FLHAEM, GAT, . . .]

(8)

Interpretability of Ensemble Rule Models

Diagnosis

Risk of glaucoma with 100% certainty

(9)

Interpretability of Ensemble Rule Models

Rules that support the diagnosis

49 classifiers: if flare haemorrhage then Glaucoma,

31 classifiers: if GAT ≥ 21 then Glaucoma,

(10)

Interpretability of Ensemble Rule Models

Rules that support the diagnosis

6 classifiers: if cumulative HRBP in 5 hours before sleep to sleep ≤ 1447 and slope of DAP in wake to 5 hours after wake ≤ −0.067 then Glaucoma,

2 classifiers: if intercept of TF in wake to 5 hours after wake ≤ 23.23 and cumulative TF GAT in wake to 5 hours after wake ≥ 568.2 then Glaucoma,

. . .

(11)

1 Introduction

2 Proposed Methodology for Constructing Comprehensible Ensemble Classifier

3 Finding Population of Comprehensible Sets of Rules

4 Evolutionary Bi-Objective Search for Comprehensible Ensemble Classifier

5 Experiments

6 Conclusions

(12)

Comprehensible Classifier

Construct comprehensible ensembles of rule classifiers, such that:

base classifiers will be composed of minimal sets of strong and confirmatory rules covering a high percentage of the training set,

base classifiers will be maximally diversified within the ensemble, while the whole ensemble will be maximally accurate.

The most accurate base classifier = comprehensible single classifier

(13)

Proposed Methodology

The methodology is composed of three elements performed on a data set divided into training and validation sets.

First element

a rule ensemble is constructed on the training set,

all rules that compose this ensemble are integrated as one initial set of rules.

(14)

Proposed Methodology

The methodology is composed of three elements performed on a data set divided into training and validation sets.

Second element

the evolutionary bi-objective procedure NSGA-II⁴ is applied to evolve a population of comprehensible sets of rules covering all objects from the training samples,

the two considered objectives are prediction accuracy and diversity, both estimated on the validation set.

⁴ Deb, K., Agrawal, S., Pratap, A., Meyarivan, T.: A fast and elitist multi-objective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2), 182–197 (2002)

(15)

Proposed Methodology

The methodology is composed of three elements performed on a data set divided into training and validation sets.

Third element

members of the population are obtained as a result of solving a series of ILP problems on the initial set of rules,

this iterative procedure leads to a population that constitutes the comprehensible ensemble classifier.

(16)

1 Introduction

2 Proposed Methodology for Constructing Comprehensible Ensemble Classifier

3 Finding Population of Comprehensible Sets of Rules

4 Evolutionary Bi-Objective Search for Comprehensible Ensemble Classifier

5 Experiments

6 Conclusions

(17)

Problem definition

$R_{all}$ is the initial set of all rules from the ensemble constructed in the first element of the methodology,

divide $R_{all}$ into two subsets, $R^0_{all}$ and $R^1_{all}$, composed of rules assigning objects to class $Cl_0$ or to class $Cl_1$, respectively,

we are searching for comprehensible sets of rules $R_{MC} \subset R_{all}$,

$R_{MC}$ is divided, analogously, into $R^0_{MC} \subset R^0_{all}$ and $R^1_{MC} \subset R^1_{all}$.

(18)

Notation

$A^R$ is a sample of training objects, where $a_i$ is the $i$-th training object,

for class $Cl_j$, $j = 0, 1$, $r^j_k$ is a rule belonging to set $R^j_{MC}$, and $A(r^j_k)$ is the set of objects $a_i \in Cl_j$ covered by rule $r^j_k$,

$v(r^j_k) \in \{0, 1\}$ is a binary variable taking value 1 when rule $r^j_k$ belongs to $R^j_{MC}$, and 0 otherwise,

$T_{max}$ is the maximum number of times any object from $A^R$ may be covered by rules from $R_{MC}$.

(19)

Multi-objective ILP problem

We can find a comprehensible set of rules by solving the following multi-objective integer linear programming (ILP) problem.

(20)

Multi-objective ILP problem

$$\text{minimize}\quad f_1 = \sum_{r^0_k \in R^0_{all}} v(r^0_k) + \sum_{r^1_k \in R^1_{all}} v(r^1_k), \qquad (1)$$

$$\text{maximize}\quad f_2 = \sum_{r^0_k \in R^0_{all}} v(r^0_k) \times sup(r^0_k) + \sum_{r^1_k \in R^1_{all}} v(r^1_k) \times sup(r^1_k), \qquad (2)$$

$$\text{or}\quad \hat f_2 = \min_{r^0_k \in R^0_{all},\, r^1_k \in R^1_{all}} \left\{ v(r^0_k) \times sup(r^0_k),\; v(r^1_k) \times sup(r^1_k) \right\},$$

$$\text{minimize}\quad f_3 = \sum_{r^0_k \in R^0_{all}} v(r^0_k) \times asup(r^0_k) + \sum_{r^1_k \in R^1_{all}} v(r^1_k) \times asup(r^1_k), \qquad (3)$$

$$\text{or}\quad \hat f_3 = \min_{r^0_k \in R^0_{all},\, r^1_k \in R^1_{all}} \left\{ v(r^0_k) \times asup(r^0_k),\; v(r^1_k) \times asup(r^1_k) \right\},$$

$$\text{maximize}\quad f_4 = \sum_{r^0_k \in R^0_{all}} v(r^0_k) \times cfir(r^0_k) + \sum_{r^1_k \in R^1_{all}} v(r^1_k) \times cfir(r^1_k), \qquad (4)$$

$$\text{or}\quad \hat f_4 = \min_{r^0_k \in R^0_{all},\, r^1_k \in R^1_{all}} \left\{ v(r^0_k) \times cfir(r^0_k),\; v(r^1_k) \times cfir(r^1_k) \right\},$$

(21)

Multi-objective ILP problem

subject to the following constraints:

$$\sum_{r^0_k:\, a_i \in A(r^0_k)} v(r^0_k) \geq 1 \quad \text{for all } a_i \in Cl_0 \subset A^R, \qquad (5)$$

$$\sum_{r^1_k:\, a_i \in A(r^1_k)} v(r^1_k) \geq 1 \quad \text{for all } a_i \in Cl_1 \subset A^R, \qquad (6)$$

$$\sum_{r^0_k:\, a_i \in A(r^0_k)} v(r^0_k) \leq T_{max} \quad \text{for all } a_i \in Cl_0 \subset A^R, \qquad (7)$$

$$\sum_{r^1_k:\, a_i \in A(r^1_k)} v(r^1_k) \leq T_{max} \quad \text{for all } a_i \in Cl_1 \subset A^R, \qquad (8)$$

$$v(r^0_k) \in \{0, 1\} \text{ for all } r^0_k \in R^0_{all}, \quad \text{and} \quad v(r^1_k) \in \{0, 1\} \text{ for all } r^1_k \in R^1_{all}. \qquad (9)$$

(22)

Single-objective ILP

Instead of performing a multi-objective optimization, at this stage, we aggregate all objectives into one goal function, which involves a kind of regularization of objective (1).

(23)

Single-objective ILP

$$\text{minimize}\quad F = \sum_{r^0_k \in R^0_{all}} v(r^0_k) + \sum_{r^1_k \in R^1_{all}} v(r^1_k) \qquad (10)$$

$$\qquad - \lambda_1 \times \left( \sum_{r^0_k \in R^0_{all}} v(r^0_k) \times sup(r^0_k) + \sum_{r^1_k \in R^1_{all}} v(r^1_k) \times sup(r^1_k) \right)$$

$$\qquad + \lambda_2 \times \left( \sum_{r^0_k \in R^0_{all}} v(r^0_k) \times asup(r^0_k) + \sum_{r^1_k \in R^1_{all}} v(r^1_k) \times asup(r^1_k) \right)$$

$$\qquad - \lambda_3 \times \left( \sum_{r^0_k \in R^0_{all}} v(r^0_k) \times cfir(r^0_k) + \sum_{r^1_k \in R^1_{all}} v(r^1_k) \times cfir(r^1_k) \right)$$
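
To make the aggregated formulation concrete, here is a minimal sketch of solving problem (10) with constraints (5)-(9), assuming the open-source PuLP modeller; the rule representation (fields sup, asup, cfir, covers, cls) and the helper name select_rules are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the single-objective ILP (10) with constraints (5)-(9), using PuLP.
# Rule characteristics (support, anti-support, confirmation) and coverage sets are assumed
# to be precomputed on the training sample; field names are illustrative.
from pulp import LpProblem, LpMinimize, LpVariable, LpBinary, lpSum

def select_rules(rules, objects, lam1, lam2, lam3, t_max):
    # rules: list of dicts with keys 'sup', 'asup', 'cfir', 'covers' (set of object ids), 'cls' (0 or 1)
    # objects: list of dicts with keys 'id', 'cls'
    prob = LpProblem("comprehensible_rule_set", LpMinimize)
    v = [LpVariable(f"v_{k}", cat=LpBinary) for k in range(len(rules))]   # constraint (9)

    # objective (10): number of rules, regularized by support, anti-support and confirmation
    prob += (lpSum(v)
             - lam1 * lpSum(v[k] * r["sup"]  for k, r in enumerate(rules))
             + lam2 * lpSum(v[k] * r["asup"] for k, r in enumerate(rules))
             - lam3 * lpSum(v[k] * r["cfir"] for k, r in enumerate(rules)))

    for obj in objects:
        covering = [v[k] for k, r in enumerate(rules)
                    if r["cls"] == obj["cls"] and obj["id"] in r["covers"]]
        prob += lpSum(covering) >= 1       # constraints (5)-(6): every training object is covered
        prob += lpSum(covering) <= t_max   # constraints (7)-(8): coverage bounded by T_max

    prob.solve()
    return [k for k in range(len(rules)) if v[k].value() and v[k].value() > 0.5]
```

The returned indices define one comprehensible set of rules $R_{MC}$ for the given $\lambda_1, \lambda_2, \lambda_3$ and $T_{max}$.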

(24)

1 Introduction

2 Proposed Methodology for Constructing Comprehensible Ensemble Classifier

3 Finding Population of Comprehensible Sets of Rules

4 Evolutionary Bi-Objective Search for Comprehensible Ensemble Classifier

5 Experiments

6 Conclusions

(25)

Connection with ILP Solver

To obtain a population of $n$ comprehensible sets of rules, we solve a series of $n$ ILP problems (10), (5)-(9) for $n$ training samples $A^R$, associated with $n$ vectors $\Lambda = [\lambda_1, \lambda_2, \lambda_3]$ and values of $T_{max}$.

$A^R$ is a random stratified subset of the training set of fixed size (e.g., 90%).
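
A random stratified subset of fixed size can be drawn, for example, with scikit-learn; the library choice and the helper name are assumptions for illustration only.

```python
# Sketch of drawing one random stratified training sample A^R (e.g. 90% of the training set).
from sklearn.model_selection import train_test_split

def draw_stratified_sample(X_train, y_train, fraction=0.9, seed=0):
    X_sample, _, y_sample, _ = train_test_split(
        X_train, y_train,
        train_size=fraction,    # fixed-size subset, e.g. 90% of the training set
        stratify=y_train,       # preserve the proportions of classes Cl0 and Cl1
        random_state=seed)
    return X_sample, y_sample
```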

(26)

Objectives

Base classifier $i$, resulting from the solution of the $i$-th ILP problem with training sample $A^R_i$, vector $\Lambda_i = [\lambda_{i1}, \lambda_{i2}, \lambda_{i3}]$ and $T^i_{max}$, is evaluated with respect to two objectives:

1. Geometric mean of its sensitivity and specificity

$$\text{G-mean}_i = \sqrt{\frac{TP}{TP + FN} \times \frac{TN}{TN + FP}}$$
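
For illustration, a small helper (not from the paper) computing this objective from the confusion matrix obtained on the validation set:

```python
# G-mean: geometric mean of sensitivity and specificity from a binary confusion matrix.
from math import sqrt

def g_mean(tp, fn, tn, fp):
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0   # TP / (TP + FN)
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # TN / (TN + FP)
    return sqrt(sensitivity * specificity)
```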

(27)

Objectives

2. Yule’s Q statistic, transformed from the pairwise index

$$Q_{i,k} = 1 - \frac{N^{11}N^{00} - N^{01}N^{10}}{N^{11}N^{00} + N^{01}N^{10}}, \qquad Q_{i,k} \in [0, 2],$$

to an index expressing how diverse base classifier $i$ is compared to the other classifiers in the population:

$$Q_i = \max_k \{Q_{i,k}\} + \alpha \times \frac{\sum_{k=1}^{n} Q_{i,k}}{n},$$
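
A sketch of both indices, assuming the pairwise counts $N^{11}, N^{00}, N^{01}, N^{10}$ are taken from the 0/1 correctness vectors of two base classifiers on the validation set; the helper names and the skipping of the self-pair are illustrative choices, not the authors' code.

```python
# Transformed Yule's Q (pairwise diversity) and its aggregation over the population.

def pairwise_q(correct_i, correct_k):
    # correct_i, correct_k: 0/1 flags, 1 if the classifier predicted the object correctly
    n11 = sum(1 for ci, ck in zip(correct_i, correct_k) if ci and ck)           # both correct
    n00 = sum(1 for ci, ck in zip(correct_i, correct_k) if not ci and not ck)   # both wrong
    n10 = sum(1 for ci, ck in zip(correct_i, correct_k) if ci and not ck)
    n01 = sum(1 for ci, ck in zip(correct_i, correct_k) if not ci and ck)
    denom = n11 * n00 + n01 * n10
    return 1.0 if denom == 0 else 1.0 - (n11 * n00 - n01 * n10) / denom         # value in [0, 2]

def diversity_index(i, correctness, alpha):
    # correctness: one 0/1 correctness vector per base classifier in the population of size n
    n = len(correctness)
    qs = [pairwise_q(correctness[i], correctness[k]) for k in range(n) if k != i]  # self-pair skipped
    return max(qs) + alpha * sum(qs) / n if qs else 0.0
```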

(28)

Bi-objective Optimization Problem

Construction of a comprehensible ensemble classifier can be formulated as the following bi-objective optimization problem:

maximize $\{\text{G-mean}_i, Q_i\}$

subject to ILP (10), (5)-(9), and $\lambda_{i1}, \lambda_{i2}, \lambda_{i3} \in [0, 1]$, $T^i_{max} \geq 0$, $A^R_i$.

We adopt the elitist non-dominated sorting genetic algorithm NSGA-II⁴ to perform the optimization.

⁴ Deb, K., Agrawal, S., Pratap, A., Meyarivan, T.: A fast and elitist multi-objective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2), 182–197 (2002)

(29)

NSGA-II Search for Comprehensible Rule Ensemble

Step 1: Generate an initial population $P_{t=0}$ of base classifiers for $n$ randomly selected training samples $A^R_i$ of the same size, and $n$ randomly chosen vectors $\Lambda_i = [\lambda_{i1}, \lambda_{i2}, \lambda_{i3}]$, and values of $T^i_{max} = \sqrt{N}$ ($i = 1, \ldots, n$), i.e., solve $n$ ILP problems (10), (5)-(9).

Step 2: Apply each individual $i$ from the population $P_t$ to the validation set, and calculate $\text{G-mean}_i$ and $Q_i$.

(30)

NSGA-II Search for Comprehensible Rule Ensemble

Step 3: Repeat the following steps for a given number of generations.

3.1: Perform non-dominated sorting of all base classifiers into fronts.

3.2: Apply binary tournament selection, recombination and mutation to the vectors composed of $\Lambda_i$ and $T^i_{max}$ to generate an offspring population $P'_t$ of the same size $n$ from $P_t$.

3.3: Solve $n$ ILP problems (10), (5)-(9) for different samples $A^R_i$, and vectors $\Lambda_i$ and values of $T^i_{max}$ corresponding to $P'_t$, and evaluate the resulting base classifiers as in Step 2.

(31)

NSGA-II Search for Comprehensible Rule Ensemble

3.4: Merge $P_t$ and $P'_t$ into $R_t$, and perform non-dominated sorting of $R_t$. Individuals within each front are sorted by decreasing crowding distance, with extreme individuals at the top (see the sketch after Step 4 below).

3.5: Create the new population $P_{t+1}$ by picking the first $n$ vectors $\Lambda_i = [\lambda_{i1}, \lambda_{i2}, \lambda_{i3}]$ and $T^i_{max}$ from $R_t$.

3.6: Increment the generation counter t + 1 → t.

Step 4: Take the population of base classifiers from the last generation as the ensemble.
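
To illustrate steps 3.4-3.5, here is a minimal, self-contained sketch of non-dominated sorting and crowding-distance truncation for the two maximized objectives $(\text{G-mean}_i, Q_i)$; the function names are illustrative and this is not the authors' code.

```python
# Sketch of survivor selection (steps 3.4-3.5): non-dominated sorting of the merged
# population R_t followed by crowding-distance truncation to n individuals.

def dominates(a, b):
    # a, b: (g_mean, q) tuples; both objectives are maximized
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def non_dominated_fronts(scores):
    remaining = list(range(len(scores)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(scores[j], scores[i]) for j in remaining if j != i)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts

def crowding_distance(front, scores):
    dist = {i: 0.0 for i in front}
    for m in range(2):                                           # two objectives
        ordered = sorted(front, key=lambda i: scores[i][m])
        span = (scores[ordered[-1]][m] - scores[ordered[0]][m]) or 1.0
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")      # extreme individuals kept at the top
        for pos in range(1, len(ordered) - 1):
            dist[ordered[pos]] += (scores[ordered[pos + 1]][m] - scores[ordered[pos - 1]][m]) / span
    return dist

def select_next_population(scores, n):
    # pick indices of the n individuals forming P_{t+1} from the merged population R_t
    chosen = []
    for front in non_dominated_fronts(scores):
        dist = crowding_distance(front, scores)
        for i in sorted(front, key=lambda i: dist[i], reverse=True):
            if len(chosen) < n:
                chosen.append(i)
    return chosen
```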

(32)

1 Introduction

2 Proposed Methodology for Constructing Comprehensible Ensemble Classifier

3 Finding Population of Comprehensible Sets of Rules

4 Evolutionary Bi-Objective Search for Comprehensible Ensemble Classifier

5 Experiments

6 Conclusions

(33)

Data Sets

Table: Characteristics of data sets used in the experiment

   data set        objects   attributes
 1 arrhythmia-b        452          558
 2 Australian          690           14
 3 bank-g             1411           16
 4 GermanCredit       1000           20
 5 denbosch            119            8
 6 Glaucoma            177           40
 7 housing-b           506           13
 8 windsor-b           546           10

(34)

Changes of G-mean in NSGA-II Generations

[Plots omitted: G-mean of the current ensemble on the validation set across NSGA-II generations.]

(39)

Mean Number of Rules

Table: Mean number of rules composing comprehensible ensembles, single comprehensible rule classifiers and other compared solutions

   data set        Rand   SoEns     Ens   CompS   CompEns
 1 arrhythmia-b      28      79    59.1      32      32.3
 2 Australian        30      96    79.9      57      58.2
 3 bank-g            25      60    46.5      30      29.6
 4 GermanCredit      31     212     166     130       135
 5 denbosch           8      11     9.4       8      7.54
 6 Glaucoma          22      37    29.2      19      18.4
 7 housing-b         19      42    30.8      19      18.7

(40)

Mean Number of Conditions

Table: Mean number of conditions in rules composing comprehensible ensembles, single comprehensible rule classifiers and other compared solutions

   data set        Rand   SoEns     Ens   CompS   CompEns
 1 arrhythmia-b    1.79    1.92    1.74    2.22      2.14
 2 Australian      3.27    3.06    3.01    3.26      3.39
 3 bank-g          2.44    2.73    2.46    2.63      2.62
 4 GermanCredit     3.1    2.88    2.66    3.21      3.14
 5 denbosch        2.25    2.36    2.12     2.5      2.39
 6 Glaucoma        2.23     2.3    1.99    2.16      2.29
 7 housing-b       2.53    2.52    2.32    2.42      2.46
 8 windsor-b       2.77    2.83    2.42    2.84      2.84

(41)

Mean Support

Table: Mean support of comprehensible ensembles, single comprehensible rule classifiers and other compared solutions

   data set          Rand    SoEns      Ens    CompS   CompEns
 1 arrhythmia-b    0.0466   0.0314   0.0342   0.0605    0.0604
 2 Australian      0.0466   0.0343   0.0344   0.0415    0.0442
 3 bank-g           0.177    0.186    0.154    0.166     0.181
 4 GermanCredit    0.0208   0.0114   0.0108   0.0162    0.0155
 5 denbosch         0.236    0.209    0.212     0.29     0.286
 6 Glaucoma        0.0686   0.0536   0.0564   0.0865     0.089
 7 housing-b        0.183   0.0928    0.135    0.175     0.186

(42)

Mean Anti-support

Table: Mean anti-support of comprehensible ensembles, single comprehensible rule classifiers and other compared solutions

   data set           Rand      SoEns        Ens      CompS    CompEns
 1 arrhythmia-b    0.00344   0.000126    0.00296     0.0054    0.00525
 2 Australian      0.00365    0.00152    0.00235    0.00315    0.00316
 3 bank-g         0.000988   0.000709   0.000796    0.00115    0.00133
 4 GermanCredit    0.00107   0.000312   0.000869   0.000717   0.000661
 5 denbosch         0.0127     0.0046    0.00534     0.0142     0.0135
 6 Glaucoma        0.00924          0    0.00531     0.0107    0.00879
 7 housing-b       0.00469   0.000565    0.00196    0.00515    0.00482
 8 windsor-b      0.000747     0.0063    0.00248    0.00119    0.00108

(43)

Mean Bayesian Confirmation s

Table: Mean confirmation of comprehensible ensembles, single comprehensible rule classifiers and other compared solutions

   data set        Rand   SoEns     Ens   CompS   CompEns
 1 arrhythmia-b   0.471   0.501   0.424   0.446     0.451
 2 Australian     0.417   0.517   0.437   0.444     0.444
 3 bank-g         0.484   0.584   0.574   0.624     0.608
 4 GermanCredit   0.467   0.478   0.403   0.491      0.49
 5 denbosch       0.616   0.646   0.617   0.668     0.665
 6 Glaucoma       0.401   0.528   0.447   0.453     0.466
 7 housing-b      0.529   0.531   0.524   0.562     0.552

(44)

G-mean

Table: G-mean [%] of comprehensible ensembles, single comprehensible rule classifiers and other compared solutions

   data set        Rand   SoEns     Ens   CompS   CompEns
 1 arrhythmia-b    55.8    79.1    80.9    71.9      75.4
 2 Australian      63.6    75.6    74.8    77.8      80.5
 3 bank-g          81.7    80.4    89.2    85.2      88.2
 4 GermanCredit    30.4    59.2    61.6    62.1      63.2
 5 denbosch        87.2    82.2    84.9    89.9      87.5
 6 Glaucoma        57.4    72.6    76.1    60.9      69.4
 7 housing-b       82.1    83.8    87.8    80.5      83.1
 8 windsor-b       58.5    62.6    66.3      64      63.2

(45)

1 Introduction

2 Proposed Methodology for Constructing Comprehensible Ensemble Classifier

3 Finding Population of Comprehensible Sets of Rules

4 Evolutionary Bi-Objective Search for Comprehensible Ensemble Classifier

5 Experiments

6 Conclusions

(46)

Conclusions

The ensemble of rule classifiers is obtained by solving a series of $n$ ILP problems with the objective of a minimal number of rules covering all objects from a training sample, augmented by a regularization component.

The regularization component of the ILP objective includes the weighted total support, anti-support and Bayesian confirmation of the rules entering the rule classifier.

The parameters of ILP (weights of regularization component and allowed number of times the rules cover a single training object) are tuned in an external loop, where predictive accuracy and diversity of rule classifiers are maximized using an evolutionary bi-objective optimization procedure of the NSGA-II type on a validation set.

(47)

Conclusions

As a result, one gets a population of $n$ rule classifiers which, compared to traditional minimal-cover rule classifiers, have a significantly smaller number of rules per classifier and a higher mean support and Bayesian confirmation, while ensuring good predictive accuracy of the ensemble they form.

Future work will concern generalization to multi-class classification and extension of computational experiments.
