
Probabilistic Intervals of Confidence Interpretation of Adaptive Models

Norbert Jankowski

Department of Computer Methods, Nicholas Copernicus University

ul. Grudziądzka 5, 87–100 Toruń, Poland, phone: +48 56 6113307, fax: +48 56 621543

e-mail: Norbert.Jankowski@phys.uni.torun.pl, http://www.phys.uni.torun.pl/~norbert


What is the goal?

• High accuracy should not be the only goal of classification

• Also important are: alternative diagnoses and their probabilities, and evaluation of confidence

• Neural models return just the winner class; they work as black boxes.

Probabilistic Confidence Intervals help to:

• evaluate the certainty of the winning class and the importance of alternative classes

• compare the influence of each feature on the classification of a given case, showing changes of the probability of all important classes

• visualize the class memberships of a given case and its neighborhood


Disadvantages of (crisp) logical rules

• Rules assign a given case to a class without any gradation that could give information on the uncertainty of such a classification

• Rule conditions use hyper-rectangular membership functions, so the shapes of their decision borders are very limited

• Because of their rectangular shapes, rules may not cover the whole input space, leaving subspaces in which no classification is done

• Rules may also overlap, producing ambiguous classifications

• Logical rules are not reliable near decision borders


Incremental Network

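[Diagram: the data $(\mathbf{x}, y)$ is passed to $K$ sub-networks, IncNet 1, ..., IncNet K, where IncNet $i$ receives $(\mathbf{x}, y^i)$. Their outputs $C^1(\mathbf{x}), \ldots, C^K(\mathbf{x})$ go to a Decision Module, which returns $C(\mathbf{x})$.]

Winning class:

$$C(\mathbf{x}) = \arg\max_i C^i(\mathbf{x})$$

Probability:

$$p(C^i \mid \mathbf{x}) = \frac{\sigma\left(C^i(\mathbf{x}) - \frac{1}{2}\right)}{\sum_{j=1}^{K} \sigma\left(C^j(\mathbf{x}) - \frac{1}{2}\right)}$$

The IncNet network was used because of its good performance: the network structure is controlled by growing and pruning criteria that keep the complexity of the network similar to the complexity of the data.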
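The decision module above can be sketched in a few lines of Python. This is only an illustration: the slides do not define $\sigma$, so a logistic sigmoid is assumed here, and the function names and the example output values are hypothetical.

```python
import math


def sigmoid(t: float) -> float:
    # Assumption: sigma is the logistic function; the slides do not specify it.
    return 1.0 / (1.0 + math.exp(-t))


def class_probabilities(outputs):
    """Normalize the K sub-network outputs C^i(x) into p(C^i | x)."""
    scores = [sigmoid(c - 0.5) for c in outputs]
    total = sum(scores)
    return [s / total for s in scores]


def winning_class(outputs):
    """Index i of the largest output C^i(x)."""
    return max(range(len(outputs)), key=lambda i: outputs[i])


# Hypothetical outputs of K = 3 sub-networks for one input vector.
outputs = [0.9, 0.4, 0.1]
probs = class_probabilities(outputs)
print("winner:", winning_class(outputs))
print("probabilities:", [round(p, 3) for p in probs])
```

Because the sigmoid is monotonic, the class with the largest output also gets the largest normalized probability, so both formulas pick the same winner.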


Confidence Intervals (CI)

• Confidence intervals are calculated individually for a given input vector, while logical rules are extracted for the whole training set.

• In general, the probability $p(C^k \mid \mathbf{x}; M)$ used below may be estimated by any trustworthy model $M$.

Suppose that for a given vector $\mathbf{x} = [x_1, x_2, \ldots, x_N]$ the highest probability $p(C^k \mid \mathbf{x}; M)$ is found for class $k$.

The confidence interval $[x^r_{\min}, x^r_{\max}]$ for the feature $r$ is defined by

$$x^r_{\min} = \min_{\bar{x}} \left\{ C(\bar{\mathbf{x}}) = k \;\wedge\; \forall_{x_r > \hat{x} > \bar{x}}\; C(\hat{\mathbf{x}}) = k \right\} \qquad (1)$$

$$x^r_{\max} = \max_{\bar{x}} \left\{ C(\bar{\mathbf{x}}) = k \;\wedge\; \forall_{x_r < \hat{x} < \bar{x}}\; C(\hat{\mathbf{x}}) = k \right\} \qquad (2)$$

where

$$\bar{\mathbf{x}} = [x_1, \ldots, x_{r-1}, \bar{x}, x_{r+1}, \ldots, x_N], \qquad \hat{\mathbf{x}} = [x_1, \ldots, x_{r-1}, \hat{x}, x_{r+1}, \ldots, x_N] \qquad (3)$$
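One way to approximate definitions (1) and (2) numerically is to move feature $r$ away from its measured value on a grid and stop at the first class change. The sketch below assumes a grid step of 1 over the range 0 to 120 (the range shown in the plots); the function `confidence_interval` and the two-feature `toy_classify` model are hypothetical stand-ins, not the IncNet model.

```python
def toy_classify(x):
    # Hypothetical two-class model: class 0 below the line x0 + x1 = 100.
    return 0 if x[0] + x[1] < 100 else 1


def confidence_interval(classify, x, r, lo=0.0, hi=120.0, step=1.0):
    """Grid approximation of [x_min^r, x_max^r] for feature r of vector x."""
    k = classify(x)  # winning class of the original vector

    def scan(direction):
        edge = x[r]
        z = x[r] + direction * step
        # Walk outwards while every visited point keeps class k.
        while lo <= z <= hi:
            y = list(x)
            y[r] = z
            if classify(y) != k:
                break
            edge = z
            z += direction * step
        return edge

    return scan(-1), scan(+1)


x = [40.0, 30.0]
print(confidence_interval(toy_classify, x, r=0))
print(confidence_interval(toy_classify, x, r=1))
```

The walk stops at the first grid point with a different class, which mirrors the "for all intermediate $\hat{x}$" condition: the interval is the connected region around $x_r$, not the union of all regions where class $k$ wins.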


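Confidence intervals for a given vector $\mathbf{x}$ measure the maximal deviation from the value $x_r$, with all other feature values unchanged, that does not change the classification of the vector.

Intervals with confidence level $\beta$ should guarantee that the winning class $k$ is considerably more probable than the most probable alternative class:

$$x^{r,\beta}_{\min} = \min_{\bar{x}} \left\{ C(\bar{\mathbf{x}}) = k \;\wedge\; \forall_{x_r > \hat{x} > \bar{x}}\; C(\hat{\mathbf{x}}) = k \;\wedge\; \frac{p(C^k \mid \bar{\mathbf{x}})}{\max_{i \neq k} p(C^i \mid \bar{\mathbf{x}})} > \beta \right\} \qquad (4)$$

$$x^{r,\beta}_{\max} = \max_{\bar{x}} \left\{ C(\bar{\mathbf{x}}) = k \;\wedge\; \forall_{x_r < \hat{x} < \bar{x}}\; C(\hat{\mathbf{x}}) = k \;\wedge\; \frac{p(C^k \mid \bar{\mathbf{x}})}{\max_{i \neq k} p(C^i \mid \bar{\mathbf{x}})} > \beta \right\} \qquad (5)$$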
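The intervals with confidence level $\beta$ can be approximated in the same grid-based way. The sketch below is an illustration under stated assumptions: a grid step of 1 over the range 0 to 120, a hypothetical two-class `toy_probabilities` model, and a hypothetical function name `beta_interval`. The ratio test is written as `q[k] > beta * best_other` to avoid dividing by zero.

```python
import math


def toy_probabilities(x):
    # Hypothetical two-class model: p(class 0) falls as x0 + x1 grows past 100.
    p0 = 1.0 / (1.0 + math.exp(-(100.0 - x[0] - x[1]) / 10.0))
    return [p0, 1.0 - p0]


def beta_interval(probabilities, x, r, beta, lo=0.0, hi=120.0, step=1.0):
    """Grid approximation of [x_min^{r,beta}, x_max^{r,beta}] for feature r."""
    p = probabilities(x)
    k = max(range(len(p)), key=lambda i: p[i])  # winning class at x

    def scan(direction):
        best = None  # farthest grid value that meets the ratio condition
        z = x[r]
        while lo <= z <= hi:
            y = list(x)
            y[r] = z
            q = probabilities(y)
            if max(range(len(q)), key=lambda i: q[i]) != k:
                break  # class changed: the connected region ends here
            best_other = max(q[i] for i in range(len(q)) if i != k)
            if q[k] > beta * best_other:
                best = z
            z += direction * step
        return best

    return scan(-1), scan(+1)


print(beta_interval(toy_probabilities, [40.0, 30.0], r=0, beta=2.0))
```

A bound is returned as `None` when no grid point in that direction, including $x_r$ itself, satisfies the ratio condition.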

[Plots: 14 panels, one per scale, each showing Probability (0 to 1) against Feature value (0 to 120). Panels: 1. "It is hard for me to answer that"; 2. Assessment of degree of sincerity; 3. Detection of atypical and deviational answering style; 4. Detection of subtle attempts at profile falsification; 5. Hypochondria; 6. Depression; 7. Hysteria; 8. Psychopathy; 9. Masculinity; 10. Paranoia; 11. Psychasthenia; 12. Schizophrenia; 13. Manic; 14. Social introversion.]

Figure 1: Reactive Psychosis.

[Plots: 12 panels, one per scale, each showing Probability (0 to 1) against Feature value (0 to 120). Panels: 1. Assessment of degree of sincerity; 2. Detection of atypical and deviational answering style; 3. Detection of subtle attempts at profile falsification; 4. Hypochondria; 5. Depression; 6. Hysteria; 7. Psychopathy; 8. Masculinity; 9. Paranoia; 10. Psychasthenia; 11. Schizophrenia; 12. Manic.]

Figure 2: Class: Paranoia (prob. 0.68); alternative class: schizophrenia (prob. 0.28).

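Probabilistic Intervals of Confidence (PIC)

For a given vector $\mathbf{x}$ and feature $r$:

| Class | Probability | Class number |
|---|---|---|
| winner | $p(C(\mathbf{x}) \mid \bar{\mathbf{x}}(z))$ | $C(\mathbf{x})$ |
| alternative I | $p(C^{k_2} \mid \bar{\mathbf{x}}(z))$ | $k_2 = \arg\max_i \{ p(C^i \mid \mathbf{x}),\; C^i \neq C(\mathbf{x}) \}$ |
| alternative II | $p(C^{k_M} \mid \bar{\mathbf{x}}(z))$ | $k_M = \arg\max_i \{ p(C^i \mid \bar{\mathbf{x}}(z)),\; C^i \neq C(\mathbf{x}) \}$ |

$$\bar{\mathbf{x}}(z) = [x_1, \ldots, x_{r-1}, z, x_{r+1}, \ldots, x_N]$$

[Example plot: panel 11, Psychasthenia, showing Probability (0 to 1) against Feature value (0 to 120).]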
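The three curves of a PIC plot can be tabulated as follows. This is a minimal sketch, assuming a grid step of 1 over 0 to 120; `toy_probabilities` is a hypothetical three-class model on a single feature and `pic_curves` is a hypothetical name, neither taken from the slides.

```python
import math


def toy_probabilities(x):
    # Hypothetical three-class model: bumps centred at 20, 60 and 100.
    centers = [20.0, 60.0, 100.0]
    weights = [math.exp(-((x[0] - c) / 20.0) ** 2) for c in centers]
    total = sum(weights)
    return [w / total for w in weights]


def pic_curves(probabilities, x, r, lo=0.0, hi=120.0, step=1.0):
    """Return (winner, k2, rows); each row is
    (z, p_winner, p_alternative_I, k_M, p_alternative_II) at x_bar(z)."""
    p = probabilities(x)
    winner = max(range(len(p)), key=lambda i: p[i])
    others = [i for i in range(len(p)) if i != winner]
    k2 = max(others, key=lambda i: p[i])  # alternative I: fixed at the measured x
    rows = []
    z = lo
    while z <= hi:
        y = list(x)
        y[r] = z
        q = probabilities(y)
        k_m = max(others, key=lambda i: q[i])  # alternative II: re-chosen at each z
        rows.append((z, q[winner], q[k2], k_m, q[k_m]))
        z += step
    return winner, k2, rows


winner, k2, rows = pic_curves(toy_probabilities, [55.0], r=0)
print("winner:", winner, "alternative I:", k2)
print("row at z = 90:", rows[90])
```

Alternative I follows one fixed class across the whole range, while alternative II can switch class as $z$ moves, which is why its probability is never below that of alternative I.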

[Plots: 12 panels, one per scale (titled panels: 4. Hypochondria; 5. Depression; 6. Hysteria; 7. Psychopathy; 8. Masculinity; 9. Paranoia; 10. Psychasthenia; 11. Schizophrenia; 12. Manic), each showing Probability (0 to 1) against Feature value (0 to 120). Class labels appearing in the panels: psychopathy, manic state, schizophrenia, neurosis, norm, narcomania.]

Figure 3: Class: Psychopathy (prob. 0.97); alternative class: neurosis (prob. 0.002).

[Plots: 12 panels, one per scale (titled panels: 4. Hypochondria; 5. Depression; 6. Hysteria; 7. Psychopathy; 8. Masculinity; 9. Paranoia; 10. Psychasthenia; 11. Schizophrenia; 12. Manic), each showing Probability (0 to 1) against Feature value (0 to 120). Class labels appearing in the panels: organic, neurosis, schizophrenia, deviational answering style 1, simulation, criminality.]

Figure 4: Class: Organic (prob. 0.83); alternative class: schizophrenia (prob. 0.062).

[Plots: 12 panels, one per scale (titled panels: 4. Hypochondria; 5. Depression; 6. Hysteria; 7. Psychopathy; 8. Masculinity; 9. Paranoia; 10. Psychasthenia; 11. Schizophrenia; 12. Manic), each showing Probability (0 to 1) against Feature value (0 to 120). Class labels appearing in the panels: schizophrenia, paranoia, criminality, neurosis.]

Figure 5: Class: Paranoia (prob. 0.68); alternative class: schizophrenia (prob. 0.28).

[Plots: 12 panels, one per scale (titled panels: 4. Hypochondria; 5. Depression; 6. Hysteria; 7. Psychopathy; 8. Masculinity; 9. Paranoia; 10. Psychasthenia; 11. Schizophrenia; 12. Manic), each showing Probability (0 to 1) against Feature value (0 to 120). Class labels appearing in the panels: norm, alcoholism, psychopathy, criminality, schizophrenia, simulation.]

Figure 6: Class: Norm (prob. 0.97); no alternative class.


Description of previous pictures

Figures 3, 4, 5 and 6 show probabilistic intervals of confidence for quite different patients (the first and the last scales have been omitted, so only 12 features are displayed). Little squares show the probability of the winning class corresponding to the measured input values of the psychometric scales. Figure 3 presents an easy case: psychopathy has a large probability of 0.97 and the case is quite far from any alternative class. The whole range of values, 0-120, is shown; an alternative class appears for features 1, 4, 7 and 12, but the confidence intervals are quite broad.

Classification does not depend on the precise values of some features $r$ (for example features 2, 3, 5, 6, etc.), since there are no alternative classes in the whole range of values that $\bar{x}$ may take.

The second set of plots, Fig. 4, is more complex. The winner class, organic, has probability 0.83, while the alternative class, schizophrenia, has probability 0.06. Analysis of the plots shows that the values for scales 4 and 7 are close to the border, so both diagnoses are probable and scales 4 and 7 are very important for the diagnosis. Note that the classification is not so simple even though the probability is 0.83, because the considered case lies so close to the border for feature 4.

The case in Figure 5 is ambiguous too. The winner class, paranoia, has probability 0.68, while the alternative class, schizophrenia, has probability 0.28. Analysis of the plots shows that the values for scales 7 and 11 are close to the border, so both diagnoses are probable and scales 7 and 11 are crucial for the considered case.

Figure 6 describes a typical case that belongs to the "norm" class.
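The reading of the plots described above (scales whose measured value lies close to a class change matter most for the case) can be approximated numerically. The sketch below is an illustration only: the grid step, the range 0 to 120, the three-feature `toy_classify` model, and the function names are all hypothetical.

```python
def toy_classify(x):
    # Hypothetical model: features 0 and 1 decide the class, feature 2 is ignored.
    return 0 if x[0] + 2.0 * x[1] < 150.0 else 1


def border_distance(classify, x, r, lo=0.0, hi=120.0, step=1.0):
    """Distance along feature r from x[r] to the nearest grid value where
    the class changes; None if the class never changes within [lo, hi]."""
    k = classify(x)
    d = step
    while x[r] - d >= lo or x[r] + d <= hi:
        for z in (x[r] - d, x[r] + d):
            if lo <= z <= hi:
                y = list(x)
                y[r] = z
                if classify(y) != k:
                    return d
        d += step
    return None


def rank_features(classify, x):
    """Features with a class change in range, nearest border first."""
    distances = {r: border_distance(classify, x, r) for r in range(len(x))}
    near = sorted((r for r in distances if distances[r] is not None),
                  key=lambda r: distances[r])
    return near, distances


near, distances = rank_features(toy_classify, [60.0, 40.0, 80.0])
print("features ordered by distance to a border:", near)
print("distances:", distances)
```

Features with distance `None` correspond to panels with no alternative class anywhere in the range, so the classification does not depend on their precise value.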


Psychometric data classification

• Psychometric test: Minnesota Multiphasic Personality Inventory

• The test consists of over 550 questions

• 550 questions → 14 features (control and clinical scales)

hypochondria, depression, hysteria, psychopathy, masculinity, paranoia, psychasthenia, schizophrenia, manic, social introversion

• 20, 27 or 28 nosological types (classes)

norm, neurosis, psychopathy, organic, schizophrenia, delusion, reactive psychosis, paranoia, manic state, criminality, alcoholism, etc.

• CV10 accuracy of training with the IncNet network is 93% (CV5: 95.5%).


Conclusions

• PICs are new and very useful tools to support the process of diagnosis

• Information on winner and alternative classes is continuous and very precise

• A confidence interval shows neighboring alternative classes (if they exist)

• The distance from the considered case to the decision borders may be analyzed in this way

• Analysis of complex cases, which often lie near the decision border, is much more reliable using probabilistic confidence intervals than using logical rules
