Probabilistic Intervals of Confidence Interpretation of Adaptive Models
Norbert Jankowski
Department of Computer Methods, Nicholas Copernicus University
ul. Grudziądzka 5, 87–100 Toruń, Poland, phone: +48 56 6113307, fax: +48 56 621543
e-mail: [email protected], http://www.phys.uni.torun.pl/~norbert
What is the goal?
• High accuracy should not be the only goal of classification
• Also important are: alternative diagnoses and their probabilities, and evaluation of confidence
• Neural models give just the winner class; they work as black boxes.
Probabilistic Confidence Intervals help to:
• evaluate the certainty of the winning class and the importance of alternative classes
• compare the influence of each feature on the classification of a given case, showing changes in the probability of all important classes
• visualize the class memberships of a given case and its neighborhood
Disadvantages of (crisp) logical rules
• Rules assign a given case to a class without any gradation that could give information on the uncertainty of such a classification
• Rule conditions use hyper-rectangular membership functions, and therefore the shapes of their decision borders are very limited
• Because of their rectangular shapes, rules may not cover the whole input space, leaving subspaces in which no classification is done
• Rules may also overlap, producing ambiguous classification
• Logical rules are not reliable near decision borders
Incremental Network
[Diagram: one network per class, IncNet 1, ..., IncNet K, with data (x, y_1), ..., (x, y_K); their outputs C_1(x), ..., C_K(x) feed a Decision Module that returns C(x).]
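Winning class:
$$C(\mathbf{x}) = \arg\max_i C_i(\mathbf{x})$$
Probability:
$$p(C_i|\mathbf{x}) = \frac{\sigma\left(C_i(\mathbf{x}) - \frac{1}{2}\right)}{\sum_{j=1}^{K} \sigma\left(C_j(\mathbf{x}) - \frac{1}{2}\right)}$$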
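The decision rule and the probability formula above can be sketched in a few lines of Python. This is a minimal sketch, not the author's implementation: it assumes σ is the standard logistic function 1/(1 + e^(-t)) (the slope is not specified above), that the K one-class network outputs C_i(x) are already available as a list, and that classes are indexed from 0. The names `class_probabilities` and `winning_class` are illustrative only.

```python
import math
from typing import List, Sequence


def sigmoid(t: float) -> float:
    """Standard logistic function (assumed form of sigma)."""
    return 1.0 / (1.0 + math.exp(-t))


def class_probabilities(outputs: Sequence[float]) -> List[float]:
    """p(C_i|x) = sigma(C_i(x) - 1/2) / sum_j sigma(C_j(x) - 1/2)."""
    s = [sigmoid(c - 0.5) for c in outputs]
    total = sum(s)
    return [v / total for v in s]


def winning_class(outputs: Sequence[float]) -> int:
    """C(x) = argmax_i C_i(x), classes indexed from 0."""
    return max(range(len(outputs)), key=lambda i: outputs[i])


# Hypothetical outputs of K = 3 one-class networks for one input vector.
outputs = [0.9, 0.2, 0.4]
probs = class_probabilities(outputs)
print(winning_class(outputs))  # 0
```

Because σ is monotonic, the class with the largest probability is always the winning class C(x); the normalization only turns the raw network outputs into numbers that sum to 1 and can be compared across classes.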
The IncNet network was used because of its good performance: the network
structure is controlled by growing and pruning criteria that keep the complexity of
the network similar to the complexity of the data.
Confidence Intervals (CI)
Confidence intervals are calculated individually for a given input vector, while
logical rules are extracted for the whole training set.
In general, such probabilities may be estimated by any trustworthy model.
Suppose that for a given vector $\mathbf{x} = [x_1, x_2, \ldots, x_N]$ the highest probability $p(C_k|\mathbf{x}; M)$, estimated by the model $M$, is found for class $k$.
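The confidence interval $[x_r^{\min}, x_r^{\max}]$ for the feature $r$ is defined by
$$x_r^{\min} = \min\left\{ \bar{x} : C(\bar{\mathbf{x}}) = k \,\wedge\, \forall_{x_r > \hat{x} > \bar{x}}\; C(\hat{\mathbf{x}}) = k \right\} \tag{1}$$
$$x_r^{\max} = \max\left\{ \bar{x} : C(\bar{\mathbf{x}}) = k \,\wedge\, \forall_{x_r < \hat{x} < \bar{x}}\; C(\hat{\mathbf{x}}) = k \right\} \tag{2}$$
where
$$\bar{\mathbf{x}} = [x_1, \ldots, x_{r-1}, \bar{x}, x_{r+1}, \ldots, x_N], \qquad \hat{\mathbf{x}} = [x_1, \ldots, x_{r-1}, \hat{x}, x_{r+1}, \ldots, x_N] \tag{3}$$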
The confidence interval of feature $r$ for a given vector $\mathbf{x}$ measures the maximal deviation from the value $x_r$, with all other feature values unchanged, that does not change
the classification of the vector.
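Equations (1) and (2) can be approximated numerically by moving feature r away from x_r on a grid until the predicted class changes. This is a minimal sketch under stated assumptions, not the author's procedure: the `model` interface (a callable returning the class probabilities p(C_i|x)), the feature range `[lo, hi]` and the grid `step` are hypothetical choices. Because only grid points are checked, the condition "for all x̂ between x_r and x̄" in Eqs. (1)-(2) is verified only up to the grid resolution.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical interface: model(x) returns the class probabilities p(C_i|x).
Model = Callable[[Sequence[float]], Sequence[float]]


def argmax(p: Sequence[float]) -> int:
    """Index of the most probable class (classes indexed from 0)."""
    return max(range(len(p)), key=lambda i: p[i])


def confidence_interval(model: Model, x: Sequence[float], r: int,
                        lo: float, hi: float, step: float) -> Tuple[float, float]:
    """Grid approximation of [x_r^min, x_r^max] from Eqs. (1)-(2).

    Feature r is moved away from x_r in steps of `step`, all other features
    fixed, until the predicted class differs from k = C(x) or the range
    [lo, hi] is left. The last grid value still classified as k is returned
    on each side.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    k = argmax(model(x))
    xs = list(x)

    def same_class(z: float) -> bool:
        xs[r] = z
        return argmax(model(xs)) == k

    n = 0  # number of steps taken to the left
    while x[r] - (n + 1) * step >= lo and same_class(x[r] - (n + 1) * step):
        n += 1
    m = 0  # number of steps taken to the right
    while x[r] + (m + 1) * step <= hi and same_class(x[r] + (m + 1) * step):
        m += 1
    return x[r] - n * step, x[r] + m * step


# Hypothetical two-class model: class 0 wins while 20 <= x[0] <= 60.
def toy(x: Sequence[float]) -> Sequence[float]:
    return [0.8, 0.2] if 20 <= x[0] <= 60 else [0.3, 0.7]


print(confidence_interval(toy, [40, 5], r=0, lo=0, hi=120, step=1))  # (20, 60)
```

Counting steps with integers instead of repeatedly adding `step` avoids accumulating floating-point error at the interval ends.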
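Intervals with confidence level $\beta$ should guarantee that the winning class $k$ is considerably more probable than the most probable alternative class:
$$x_{r,\beta}^{\min} = \min\left\{ \bar{x} : C(\bar{\mathbf{x}}) = k \,\wedge\, \forall_{x_r > \hat{x} > \bar{x}}\; C(\hat{\mathbf{x}}) = k \,\wedge\, \frac{p(C_k|\bar{\mathbf{x}})}{\max_{i \neq k} p(C_i|\bar{\mathbf{x}})} > \beta \right\} \tag{4}$$
$$x_{r,\beta}^{\max} = \max\left\{ \bar{x} : C(\bar{\mathbf{x}}) = k \,\wedge\, \forall_{x_r < \hat{x} < \bar{x}}\; C(\hat{\mathbf{x}}) = k \,\wedge\, \frac{p(C_k|\bar{\mathbf{x}})}{\max_{i \neq k} p(C_i|\bar{\mathbf{x}})} > \beta \right\} \tag{5}$$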
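Equations (4) and (5) add one condition to Eqs. (1) and (2): every point between x_r and the interval end must still be classified as k, and the end point itself must have a probability ratio to the strongest rival class above β. A minimal grid-based sketch of this follows, again assuming a hypothetical `model` callable returning class probabilities, a hypothetical feature range `[lo, hi]` and grid `step`, and at least two classes. A side for which no grid point satisfies the β condition is returned as `None`.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical interface: model(x) returns the class probabilities p(C_i|x).
Model = Callable[[Sequence[float]], Sequence[float]]


def argmax(p: Sequence[float]) -> int:
    return max(range(len(p)), key=lambda i: p[i])


def ratio_to_rival(p: Sequence[float], k: int) -> float:
    """p(C_k|x) / max_{i != k} p(C_i|x); requires at least two classes."""
    rival = max(p[i] for i in range(len(p)) if i != k)
    return float("inf") if rival == 0 else p[k] / rival


def _side(model: Model, x: Sequence[float], r: int, k: int, direction: int,
          bound: float, step: float, beta: float) -> Optional[float]:
    xs = list(x)
    best: Optional[float] = None
    n = 0
    while True:
        z = x[r] + direction * n * step
        if (direction < 0 and z < bound) or (direction > 0 and z > bound):
            break
        xs[r] = z
        p = model(xs)
        if argmax(p) != k:
            break  # the class-k path condition of Eqs. (4)-(5) fails from here on
        if ratio_to_rival(p, k) > beta:
            best = z  # farthest grid point so far that also satisfies the beta condition
        n += 1
    return best


def beta_confidence_interval(model: Model, x: Sequence[float], r: int, lo: float,
                             hi: float, step: float, beta: float
                             ) -> Tuple[Optional[float], Optional[float]]:
    if step <= 0:
        raise ValueError("step must be positive")
    k = argmax(model(x))
    return (_side(model, x, r, k, -1, lo, step, beta),
            _side(model, x, r, k, +1, hi, step, beta))


# Hypothetical model: class 0 is strongly preferred only for 30 <= x[0] <= 50.
def toy3(x: Sequence[float]) -> Sequence[float]:
    if 30 <= x[0] <= 50:
        return [0.9, 0.1]
    if 20 <= x[0] <= 60:
        return [0.6, 0.4]
    return [0.2, 0.8]


print(beta_confidence_interval(toy3, [40], 0, 0, 120, 1, beta=2))  # (30, 50)
```

With β = 2 the interval shrinks to the region where class 0 is at least twice as probable as its rival, while a smaller β such as 1 recovers the whole region in which class 0 still wins.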
[Figure: fourteen panels, one per feature, each plotting probability against feature value: 1. "That is hard for me to answer"; 2. Assessment of the degree of sincerity of the examined persons; 3. Detection of atypical and deviational answering styles; 4. Detection of subtler attempts at profile falsification; 5. Hypochondria; 6. Depression; 7. Hysteria; 8. Psychopathy; 9. Masculinity; 10. Paranoia; 11. Psychasthenia; 12. Schizophrenia; 13. Mania; 14. Social introversion.]
Figure 1: Reactive Psychosis.
[Figure: twelve panels plotting probability against feature value: 1. Assessment of degree of sincerity; 2. Detection of atypical and deviational answering style; 3. Detection of subtle attempts at profile falsification; 4. Hypochondria; 5. Depression; 6. Hysteria; 7. Psychopathy; 8. Masculinity; 9. Paranoia; 10. Psychasthenia; 11. Schizophrenia; 12. Manic.]
Figure 2: Class: Paranoia (prob. 0.68); alternative class: schizophrenia (prob. 0.28).
Probabilistic Intervals of Confidence (PIC)
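For a given vector $\mathbf{x}$ and feature $r$, the following probabilities are considered as functions of the feature value $z$:
• winner: $p(C(\mathbf{x})|\bar{\mathbf{x}}(z))$, for the class $C(\mathbf{x})$
• alternative I: $p(C_{k_2}|\bar{\mathbf{x}}(z))$, where $k_2 = \arg\max_i \{p(C_i|\mathbf{x}) : C_i \neq C(\mathbf{x})\}$
• alternative II: $p(C_{k_M}|\bar{\mathbf{x}}(z))$, where $k_M = \arg\max_i \{p(C_i|\bar{\mathbf{x}}(z)) : C_i \neq C(\mathbf{x})\}$
where $\bar{\mathbf{x}}(z) = [x_1, \ldots, x_{r-1}, z, x_{r+1}, \ldots, x_N]$.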
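The three PIC curves can be computed by sweeping z over a grid. The sketch below is a hedged illustration, not the author's code: the `model` callable, the grid of z values and the dictionary layout are hypothetical. Alternative I uses the runner-up class k_2 fixed at the original vector x, while alternative II recomputes the strongest non-winning class k_M at every z; the recorded k_M values are the class labels that change along the feature axis.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: model(x) returns the class probabilities p(C_i|x).
Model = Callable[[Sequence[float]], Sequence[float]]


def argmax_excluding(p: Sequence[float], excluded: int) -> int:
    """Most probable class other than `excluded` (requires at least 2 classes)."""
    return max((i for i in range(len(p)) if i != excluded), key=lambda i: p[i])


def pic_curves(model: Model, x: Sequence[float], r: int,
               grid: Sequence[float]) -> Tuple[int, int, Dict[str, list]]:
    p_x = model(x)
    winner = max(range(len(p_x)), key=lambda i: p_x[i])  # C(x)
    k2 = argmax_excluding(p_x, winner)                     # alternative I: fixed at x
    curves: Dict[str, list] = {"z": [], "winner": [], "alt_I": [],
                               "alt_II": [], "k_M": []}
    xs = list(x)
    for z in grid:
        xs[r] = z                            # x_bar(z): only feature r changes
        p = model(xs)
        k_m = argmax_excluding(p, winner)    # alternative II: recomputed at each z
        curves["z"].append(z)
        curves["winner"].append(p[winner])
        curves["alt_I"].append(p[k2])
        curves["alt_II"].append(p[k_m])
        curves["k_M"].append(k_m)
    return winner, k2, curves


# Hypothetical three-class model whose runner-up changes at x[0] = 50.
def toy(x: Sequence[float]) -> Sequence[float]:
    return [0.7, 0.2, 0.1] if x[0] < 50 else [0.6, 0.1, 0.3]


winner, k2, c = pic_curves(toy, [10.0], r=0, grid=[0.0, 60.0])
print(winner, k2, c["k_M"])  # 0 1 [1, 2]
```

Since k_M maximizes over the same set of non-winning classes that contains k_2, the alternative II curve is never below the alternative I curve; the two differ exactly where another class overtakes the original runner-up.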
[Example plot: 11. Psychasthenia; probability against feature value.]
[Figure: twelve panels plotting probability against feature value, with the class labels marked on each panel: 1. Assessment of degree of sincerity: psychopathy, manic state, schizophrenia; 2. Detection of atypical and deviational answering style: psychopathy; 3. Detection of subtle attempts at profile falsification: psychopathy; 4. Hypochondria: psychopathy, neurosis; 5. Depression: psychopathy; 6. Hysteria: psychopathy; 7. Psychopathy: norm, psychopathy; 8. Masculinity: psychopathy; 9. Paranoia: psychopathy; 10. Psychasthenia: psychopathy, neurosis; 11. Schizophrenia: psychopathy, schizophrenia; 12. Manic: psychopathy, narcomania.]
Figure 3: Class: Psychopathy (prob. 0.97); alternative class: neurosis (prob. 0.002).
[Figure: twelve panels plotting probability against feature value, with the class labels marked on each panel: 1. Assessment of degree of sincerity: neurosis, organic; 2. Detection of atypical and deviational answering style: schizophrenia, neurosis, organic; 3. Detection of subtle attempts at profile falsification: organic, deviational answering style 1; 4. Hypochondria: schizophrenia, organic, neurosis; 5. Depression: schizophrenia, neurosis, organic; 6. Hysteria: schizophrenia, organic, neurosis; 7. Psychopathy: organic, schizophrenia; 8. Masculinity: organic; 9. Paranoia: schizophrenia, neurosis, organic, simulation; 10. Psychasthenia: schizophrenia, organic, neurosis; 11. Schizophrenia: neurosis, criminality, organic; 12. Manic: schizophrenia, organic, neurosis.]
Figure 4: Organic (0.83), schizophrenia (0.062)
[Figure: twelve panels plotting probability against feature value, with the class labels marked on each panel: 1. Assessment of degree of sincerity: schizophrenia, paranoia; 2. Detection of atypical and deviational answering style: schizophrenia, paranoia, criminality; 3. Detection of subtle attempts at profile falsification: schizophrenia, paranoia; 4. Hypochondria: schizophrenia, paranoia, neurosis; 5. Depression: schizophrenia, paranoia; 6. Hysteria: schizophrenia, paranoia; 7. Psychopathy: schizophrenia, paranoia; 8. Masculinity: schizophrenia, paranoia; 9. Paranoia: schizophrenia, paranoia, criminality; 10. Psychasthenia: schizophrenia, paranoia; 11. Schizophrenia: criminality, schizophrenia, paranoia; 12. Manic: schizophrenia, paranoia.]
Figure 5: Class: Paranoia (prob. 0.68); alternative class: schizophrenia (prob. 0.28).
[Figure: twelve panels plotting probability against feature value, with the class labels marked on each panel: 1. Assessment of degree of sincerity: norm; 2. Detection of atypical and deviational answering style: norm; 3. Detection of subtle attempts at profile falsification: norm; 4. Hypochondria: norm, alcoholism; 5. Depression: norm; 6. Hysteria: norm; 7. Psychopathy: norm, psychopathy; 8. Masculinity: norm; 9. Paranoia: norm, criminality; 10. Psychasthenia: norm; 11. Schizophrenia: norm, schizophrenia, simulation; 12. Manic: norm.]