

ACTA UNIVERSITATIS LODZIENSIS

FOLIA OECONOMICA 175, 2004

Małgorzata Misztal*

THE USE OF SOME PATTERN RECOGNITION ALGORITHMS TO CLASSIFY PATIENTS UNDERGOING CABG

Abstract. The primary goal of pattern recognition is supervised or unsupervised classification in order to solve decision-making problems. Medical diagnosis gives rise to many practical problems which may be interpreted as pattern recognition tasks. Making a diagnosis for a given patient means solving a classification problem: we must recognize the patient's disease on the basis of some symptoms.

The aim of the article is to present the results of using selected pattern recognition algorithms to classify patients with Coronary Artery Disease undergoing Coronary Artery Bypass Grafting (CABG).

Key words: pattern recognition algorithms, classification trees, coronary artery disease.

1. INTRODUCTION

Although pattern recognition covers a very broad spectrum of problems, roughly speaking it consists in the assignment of a pattern or object to one class from a finite set of classes:

K = {1, 2, ..., k} (1)

In the decision-theoretical pattern recognition approach we assume that a pattern is represented by a vector of numbers x = [x_1, x_2, ..., x_p]^T, the so-called feature values, e.g. obtained by scanning an image at selected grid points or directly from measurements. This prompts us to introduce a pattern or feature space X with as many dimensions as the number of features, and to think of an object as one point x in this space (x ∈ X).

Generally, pattern recognition can be defined as an information-reduction process involving measurements of the object to identify distinguishing attributes, extraction of the features for the defining attributes, and assignment of the object to a class based on these features.

The pattern recognition algorithm (or classification or decision rule) Ψ maps the feature space into the set of class labels, that is:

Ψ: X → K (2)

or, equivalently, partitions the space X into the so-called decision regions containing points x which are assigned by the algorithm Ψ to the same class.

In statistical pattern recognition we assume that the vector of features describing the recognized pattern, x ∈ X ⊆ R^p, and its class number i ∈ K are observed values of a pair of random variables X and I, so the main problem is to find a decision rule which minimizes the expected value of a loss function evaluating the loss in the case of misclassification.

Usually we lack exact knowledge of the probability distribution of the features and classes. That is why we have to rely on the only available source of information, namely the set of observations (called the learning set):

U = {(x_1, i_1), (x_2, i_2), ..., (x_N, i_N)}; x_j ∈ X, i_j ∈ K (3)

where x_j denotes the feature vector of the j-th learning pattern and i_j is its correct classification. Additionally, the set of learning patterns from the i-th class is denoted by:

U_i = {x_il ∈ X, l = 1, 2, ..., N_i}, i ∈ K (4)

The recognition algorithm based on the learning set U will be denoted as Ψ_U(x).

Pattern recognition methods have many practical applications. One of them is medical diagnosis, where the learning set consists of case records containing a description of the patient's symptoms and the corresponding reliable diagnosis.

2. DESCRIPTION OF THE RECOGNIZED POPULATION

In the Department of Cardiothoracic Surgery of the Łódź Medical Academy, a set of 762 case records of patients undergoing CABG during 1997-1999 was collected. The data from 1997-1998 constituted the learning set (N_U = 407) and the data from 1999 the test set (N_T = 355).


Outcome after CABG is determined by the preoperative status of the patient, so 13 preoperative risk factors leading to postoperative morbidity and mortality were identified:

1. Age (in years);
2. BSA - body surface area;
3. RRs - systolic blood pressure (in mmHg);
4. RRd - diastolic blood pressure (in mmHg);
5. EF% - left ventricular ejection fraction (in %);
6. AspAt - aspartate aminotransferase (in U/L);
7. Family history of CAD (0 - no; 1 - yes);
8. Diabetes mellitus (0 - no; 1 - yes);
9. AO - arterial obstruction (0 - no; 1 - yes);
10. Left main stenosis > 75% (0 - no; 1 - yes);
11. Hyperthyroidism (0 - no; 1 - yes);
12. Previous cardiac surgery (0 - no; 1 - yes);
13. Priority of operation (1 - elective; 2 - urgent; 3 - emergent).

The outcome after CABG includes the following two classes:

1. Good outcome with no cardiac complications (N_U1 = 350; N_T1 = 340);
2. Cardiac complications (myocardial infarction and/or low cardiac output) and death (N_U2 = 57; N_T2 = 15).

One of the problems occurring in medical diagnosis tasks is that some feature values are not available for every patient. That is why we have complete feature vectors for N = 353 case records (with N_U1 = 149, N_T1 = 146; N_U2 = 46, N_T2 = 15, respectively).

3. SELECTED PATTERN RECOGNITION ALGORITHMS

In order to study the usefulness of some recognition algorithms for classifying patients into the risk subgroups, we apply the following methods:

1. The Nearest Neighbour Algorithm (NN) - classifies the unknown pattern vector x by calculating the distances between the object and all objects in the learning set and assigning it to the class that the nearest learning object belongs to. So:

Ψ_U^{NN}(x) = i  if  d(x, x_il) = min_{x_gj ∈ U} d(x, x_gj) (5)


2. The a-Nearest Neighbours Algorithm (a-NN) - classifies the unknown pattern vector x by assigning it to the class that is most common among its a nearest neighbours:

Ψ_U^{a-NN}(x) = i  if  a_i = max_{g ∈ K} a_g, i ∈ K (6)

where a_g denotes the number of the a nearest neighbours of x belonging to class g.
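The NN and a-NN rules above can be sketched in a few lines. The paper's own implementation was written in STATISTICA BASIC; the Python below is only an illustrative reconstruction, and the function name and the default Euclidean metric are assumptions, not the original code.

```python
import numpy as np

def a_nn_classify(x, X_train, y_train, a=1, dist=None):
    """Classify pattern x by majority vote among its a nearest
    neighbours in the learning set U (a=1 gives the NN rule (5))."""
    if dist is None:
        dist = lambda u, v: np.linalg.norm(u - v)  # assumed Euclidean metric
    d = np.array([dist(x, xi) for xi in X_train])  # distances to all learning patterns
    nearest = np.argsort(d)[:a]                    # indices of the a nearest patterns
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]               # most common class among them
```

Any distance `dist` can be plugged in, including a mixed-variable measure for the categorical risk factors.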

3. The Distance-Based Algorithm (DB) - classifies the unknown pattern vector x to the class scoring the lowest value among the k classifying functions:

Ψ_U^{DB}(x) = i  if  D_i^{DB}(x) = min_{g ∈ K} {D_g^{DB}(x)}, i ∈ K (7)

where:

D_i^{DB}(x) = (1/N_i) Σ_{n=1}^{N_i} d(x, x_in) − (1/(2 N_i^2)) Σ_{m=1}^{N_i} Σ_{n=1}^{N_i} d(x_im, x_in), i ∈ K (8)

and d(·) is a distance measure.
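The rule (7)-(8) can be sketched as follows; note that the within-class correction term follows the reconstruction of formula (8) given above, so the exact form of the second term should be read as an assumption.

```python
import numpy as np

def db_classify(x, X_train, y_train, dist=None):
    """Distance-based rule (7)-(8): score each class by the mean distance
    from x to its learning patterns, minus half the mean within-class
    distance, and pick the class with the lowest score."""
    if dist is None:
        dist = lambda u, v: np.linalg.norm(u - v)  # assumed default metric
    best_class, best_score = None, np.inf
    for cls in np.unique(y_train):
        Xi = X_train[y_train == cls]
        to_x = np.mean([dist(x, xi) for xi in Xi])            # first term of (8)
        within = np.mean([[dist(u, v) for v in Xi] for u in Xi])  # mean pairwise distance
        score = to_x - 0.5 * within                           # D_i^DB(x)
        if score < best_score:
            best_class, best_score = cls, score
    return best_class
```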

The distance measure applied in the NN, a-NN and DB algorithms was the mixed-variable distance of le Cessie and Houwelingen (1995) (9), where p_1 is the number of continuous variables, p_2 is the number of categorical variables, c_c is the number of different categories for the c-th categorical variable, and I{A} is the indicator function:

I{A} = 1 if the proposition inside the braces is true, and I{A} = 0 otherwise (10)
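Since the exact form of the distance (9) does not survive in the text above, the sketch below uses a common Gower-style choice that is consistent with the definitions of p_1, p_2 and I{A}: range-scaled absolute differences for the continuous variables plus mismatch indicators for the categorical ones. This is an assumption, not the paper's formula.

```python
def mixed_distance(x, y, cont_idx, cat_idx, ranges):
    """A Gower-style distance for mixed data: for each of the p1
    continuous variables, |x_j - y_j| scaled by the variable's range;
    for each of the p2 categorical variables, the indicator
    I{x_c != y_c} from (10)."""
    d = 0.0
    for j in cont_idx:
        d += abs(x[j] - y[j]) / ranges[j]  # continuous contribution
    for c in cat_idx:
        d += 1.0 if x[c] != y[c] else 0.0  # categorical mismatch indicator
    return d
```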

4. Linear Classifying Functions - where k linear functions:

e_i: X → R, i ∈ K (11)

one for each class, are defined. The unknown pattern x is classified to the class scoring the highest value among the k classifying functions:


Ψ_U(x) = i  if  e_i(x) = max_{g ∈ K} e_g(x), i ∈ K (12)

In the particular case of normal class-conditional probability density functions we have the four following formulas for the estimators of the linear classifying functions:

ê_i^(1)(x) = −(1/2) d_i^2(x) + ln q_i, i ∈ K (13)

ê_i^(2)(x) = −((N − k − p − 1)/(2(N − k))) d_i^2(x) + p/(2 N_i) + ln q_i, i ∈ K (14)

ê_i^(3)(x) = −(1/2) d_i^2(x) − p/(2 N_i) + ln q_i, i ∈ K (15)

ê_i^(4)(x) = −((N − k + 1)/2) ln[1 + N_i (N_i + 1)^(−1) (N − k)^(−1) d_i^2(x)] + (1/2) ln[N_i (N_i + 1)^(−1)] + ln q_i, i ∈ K (16)

where d_i^2(x) = (x − x̄_i)^T S^(−1) (x − x̄_i); x̄_i is the vector of means for the i-th class, S is the variance-covariance matrix, and q_i is the a priori probability that object x belongs to the i-th class. For more details see M. Krzyśko (1990).
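Estimator (13) is straightforward to sketch. The code below pools the within-class covariance to form S and estimates the priors q_i by class frequencies; both choices are assumptions for illustration, since the paper does not state how S and q_i were estimated.

```python
import numpy as np

def linear_scores(x, X_train, y_train):
    """Estimator (13): e_i(x) = -0.5 * d_i^2(x) + ln q_i, where d_i^2 is
    the squared Mahalanobis distance to the i-th class mean under the
    pooled covariance matrix S, and q_i is the class frequency."""
    classes = np.unique(y_train)
    N, p = X_train.shape
    S = np.zeros((p, p))                       # pooled within-class covariance
    for cls in classes:
        Xi = X_train[y_train == cls]
        S += (len(Xi) - 1) * np.cov(Xi, rowvar=False)
    S /= (N - len(classes))
    S_inv = np.linalg.inv(S)
    scores = {}
    for cls in classes:
        Xi = X_train[y_train == cls]
        diff = x - Xi.mean(axis=0)
        d2 = diff @ S_inv @ diff               # d_i^2(x)
        scores[cls] = -0.5 * d2 + np.log(len(Xi) / N)
    return scores                              # classify x to the argmax (rule (12))
```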

5. The Mahalanobis Distance Algorithm - classifies the unknown pattern vector x according to the decision rule:

Ψ_U(x) = i  if  MD_i^2(x) = min_{g ∈ K} {MD_g^2(x)}, i ∈ K (17)

where

MD_i^2(x) = (x − x̄_i)^T S_i^(−1) (x − x̄_i), i ∈ K (18)
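A sketch of rule (17)-(18), with each class covariance S_i estimated from the learning patterns of that class (how the paper estimated S_i is not stated, so the sample covariance here is an assumption):

```python
import numpy as np

def mahalanobis_classify(x, X_train, y_train):
    """Rule (17)-(18): assign x to the class minimising the squared
    Mahalanobis distance to the class mean, each class using its own
    covariance matrix S_i."""
    best_cls, best_d2 = None, np.inf
    for cls in np.unique(y_train):
        Xi = X_train[y_train == cls]
        diff = x - Xi.mean(axis=0)
        Si_inv = np.linalg.inv(np.cov(Xi, rowvar=False))
        d2 = diff @ Si_inv @ diff              # MD_i^2(x)
        if d2 < best_d2:
            best_cls, best_d2 = cls, d2
    return best_cls
```

Unlike the linear functions (13)-(16), this rule uses a separate covariance matrix per class, which is what lets it trade some "good outcome" accuracy for better recognition of the smaller class (see Section 5).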

6. Classification Trees - rules for predicting the class of an object from the values of its predictor variables, constructed by recursively partitioning the learning set. At each node of the tree we do the following steps:

i. Examine every allowable split on each predictor variable.
ii. Select and execute the "best" of these splits.
iii. Stop splitting a node when some stopping rule is satisfied.

Classification tree building methods are nonparametric techniques dealing with different kinds of variables (both ordered - continuous and discrete ordinal - and categorical), including missing values. For more details see E. Gatnar (2001), L. Breiman et al. (1984) (the CART algorithm - Classification and Regression Trees), W.-Y. Loh and Y.-S. Shih (1997) (the QUEST algorithm - Quick Unbiased Efficient Statistical Tree) and H. Kim and W.-Y. Loh (2000) (the CRUISE algorithm - Classification Rule with Unbiased Interaction Selection and Estimation).
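Steps i-ii above can be illustrated with a weighted-Gini split search, where per-observation weights mimic the unequal misclassification costs used later in Section 4. This is a generic sketch of recursive-partitioning split selection, not an implementation of CART, QUEST or CRUISE.

```python
import numpy as np

def gini(y, w):
    """Weighted Gini impurity of a node; w carries per-observation
    weights (e.g. higher weights for the 'Deaths' class)."""
    tot = w.sum()
    if tot == 0:
        return 0.0
    p = np.array([w[y == c].sum() / tot for c in np.unique(y)])
    return 1.0 - np.sum(p ** 2)

def best_split(X, y, w):
    """Steps i-ii: examine every threshold on every predictor and
    return the (feature, threshold) pair minimising the total
    weighted impurity of the two child nodes."""
    best = (None, None, np.inf)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:      # candidate univariate splits x_j <= t
            left = X[:, j] <= t
            score = (w[left].sum() * gini(y[left], w[left]) +
                     w[~left].sum() * gini(y[~left], w[~left]))
            if score < best[2]:
                best = (j, t, score)
    return best[0], best[1]
```

Step iii (the stopping rule, e.g. 1-SE pruning as in Tab. 1) would wrap this search in a recursion that stops when no split improves the impurity enough.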

Some algorithms are not designed for categorical features, so each categorical feature x taking c values {c_1, c_2, ..., c_c} was replaced by a (c − 1)-dimensional vector (z_1, z_2, ..., z_{c−1}) such that z_i = 1 if x = c_i and z_i = 0 otherwise, for i = 1, 2, ..., c − 1. If x = c_c, the vector consists of all zeros.
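The (c − 1)-dimensional dummy coding described above can be sketched as:

```python
def dummy_code(value, categories):
    """Replace a categorical value from {c_1, ..., c_c} by a
    (c-1)-dimensional 0-1 vector: z_i = 1 iff value == c_i for
    i < c; the last category c_c maps to the all-zeros vector."""
    return [1 if value == c else 0 for c in categories[:-1]]
```

For example, the three-level "Priority of operation" factor becomes a two-dimensional vector.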

In the computations we used the STATISTICA PL package; the author's own programmes in the STATISTICA BASIC language for the NN, a-NN, DB, Mahalanobis distance and linear classifying function algorithms; and the classification tree building algorithms QUEST (http://www.stat.wisc.edu/~loh/quest.html) and CRUISE (http://www.wpi.edu/~hkim/cruise/).

4. THE RESULTS OF CLASSIFICATION

The results of the application of the selected recognition algorithms are summarized in Tab. 1.

Table 1. Results of patients' classification

                                                      Frequency of incorrect diagnosis (%)
Pattern recognition algorithm          Class           Learning set    Test set
NN                                     good outcome         x            15.75
                                       deaths               x            75.00
                                       total                x            20.25
5-NN                                   good outcome         x             4.11
                                       deaths               x            83.33
                                       total                x            10.13
7-NN                                   good outcome         x             2.74
                                       deaths               x            75.00
                                       total                x             8.23
9-NN                                   good outcome         x             2.74
                                       deaths               x            91.67
                                       total                x             9.49
11-NN                                  good outcome         x             1.37
                                       deaths               x           100.00
                                       total                x             8.86
Mahalanobis distances                  good outcome       12.75          30.82
                                       deaths             30.43          16.67
                                       total              16.92          29.75
Linear classifying functions - (13)    good outcome        5.37           9.59
                                       deaths             60.87          91.67
                                       total              18.46          15.82
Linear classifying functions - (14)    good outcome        5.37           9.59
                                       deaths             60.87          91.67
                                       total              18.46          15.82
Linear classifying functions - (15)    good outcome        5.37           8.90
                                       deaths             60.87          91.67
                                       total              18.46          15.19
Linear classifying functions - (16)    good outcome        5.37           8.90
                                       deaths             60.87          91.67
                                       total              18.46          15.19
Distance-based algorithm               good outcome         x            37.67
                                       deaths               x            16.67
                                       total                x            36.08
CART                                   good outcome       26.17          31.51
  (misclassification cost of           deaths             36.96          33.33
  predicting class "Deaths" as         total              28.72          31.65
  class "Good outcome": 3:1;
  estimated priors; univariate
  splits; stopping rule - 1SE;
  constructed tree - see Fig. 1)
CRUISE                                 good outcome       28.19          32.28
  (misclassification cost of           deaths             23.91          33.33
  predicting class "Deaths" as
  class "Good outcome": 4:1;
  estimated priors; univariate
  splits; stopping rule - 1SE;
  constructed tree - see Fig. 2)
QUEST                                  good outcome       18.59          26.18
  (learning set with missing           deaths             43.86          26.67
  values; misclassification cost       total              22.36          26.20
  of predicting class "Deaths" as
  class "Good outcome": 4:1;
  univariate splits; stopping
  rule - 1SE; constructed
  tree - see Fig. 3)
CRUISE                                 good outcome       10.29          24.33
  (learning set with missing           deaths             43.86          26.67
  values; misclassification cost       total              14.99          26.76
  of predicting class "Deaths" as
  class "Good outcome": 4:1;
  linear combination splits;
  stopping rule - 1SE;
  constructed tree - see Fig. 4)

Source: author's calculations.

The classification rule from the CART algorithm (see Fig. 1) is very simple:

• Age < 62.5 years => Good outcome;
• Age > 62.5 years => Deaths.

Fig. 1. Classification tree for patients undergoing CABG - the CART method

Fig. 2. Classification tree for patients undergoing CABG - the CRUISE method

Fig. 3. Classification tree for patients undergoing CABG - the QUEST method (for the learning set with missing values)


The classification rules from the CRUISE algorithm (see Fig. 2) are more complicated and, for dead patients, can be described as follows:

=> Age <= 55.68 years ∧ Hyperthyroidism = "YES";
=> EF% <= 64.6% ∧ {(Age > 59.07 years) ∨ (Age ∈ (55.68; 59.07] years ∧ AO = "YES")};
=> Age ∈ (55.68; 59.07] years ∧ EF% < 43.36%.

The decision rules for patients who died, constructed using the QUEST algorithm for the learning set with missing values, can be described as follows (see Fig. 3):

=> Previous cardiac surgery = "YES", or
=> Ejection fraction EF% < 43.05%, or
=> Femoral popliteal vascular disease = "YES" and BSA < 1.88, or
=> Priority of operation = "EMERGENT", or
=> Ejection fraction EF% ∈ (43.05; 58.16]% and Age > 64.58 years.

Fig. 4. Classification tree for patients undergoing CABG - the CRUISE method with linear combination splits (for the learning set with missing values)

The use of linear combination splits requires the transformation of every categorical variable into an ordered one: the sample values taken by the categorical variable are mapped into 0-1 dummy vectors, and the dummy vectors are projected onto their largest discriminant coordinate (called the CRIMCOORD; see e.g. Loh and Shih 1997). The CRIMCOORD values for the categorical preoperative risk factors are presented in Tab. 2.

Table 2. CRIMCOORD values for categorical risk factors

Variable                      Category    CRIMCOORD
Family history of CAD         no          -0.04959
                              yes          0.04959
Diabetes mellitus             no          -0.07250
                              yes          0.07250
AO                            no          -0.09792
                              yes          0.09792
Hyperthyroidism               no          -0.1359
                              yes          0.1359
Previous cardiac surgery      no          -0.1601
                              yes          0.1601
Left main stenosis            no          -0.07903
                              yes          0.07903
Priority of operation         elective    -0.160
                              urgent      -0.09205
                              emergent     0.2529

Source: author's calculations.

The classification rule (see Fig. 4) can be described as follows: we go to the corresponding node if its discriminant score is the maximum. The values of the discriminant coefficients are presented in Tab. 3.

Table 3. Discriminant coefficients

Variable                    Coefficients - node 1 -    Coefficients - node 2 -
                            good outcome               deaths
Constant                       -176.40                   -173.30
Age                               1.134                     1.201
BSA                              79.50                     78.60
RRs                               0.0372                    0.0138
RRd                               0.9052                    0.9293
EF%                               0.5846                    0.5288
AspAt                             0.1154                    0.1243
Family history of CAD             0.7253                    8.71
Diabetes mellitus               -47.31                    -46.61
AO                              -40.02                    -34.61
Hyperthyroidism                 -20.73                    -17.08
Previous cardiac surgery        -40.01                    -28.20
Left main stenosis               18.76                     19.03
Priority of operation           -54.09                    -44.46


5. CONCLUSIONS

The following conclusions may be drawn from Tab. 1 and Figs. 1-4. The data set is not easy to classify. The set of case records describing patients undergoing CABG has two properties typical of medical diagnosis tasks: a lot of missing values and a great disproportion between the numbers of patients in the classes.

Some pattern recognition algorithms, i.e. NN, a-NN and the linear classifying functions, are very good at predicting the "Good outcome" class, but their classification of the deaths is incorrect.

The distance-based algorithm improves the recognition of objects from the class with complications and death, but the best results were obtained using the Mahalanobis distance rules.

The classification tree building procedures were used assuming unequal misclassification costs. The higher misclassification cost for the class of deaths makes the pattern recognition task more realistic. Decision rules based on classification trees are easy to interpret, and the frequencies of incorrect predictions are not too high. Tree-structured classification provides the researcher with understanding of and insight into the data.

It is proper to add that (for trees with univariate splits) we can classify a new patient knowing the values of only a few risk factors.

REFERENCES

Breiman L., Friedman J., Olshen R., Stone C. (1984), Classification and Regression Trees, CRC Press, London.

Cessie S. le, Houwelingen H. C. van (1995), Testing the Fit of a Regression Model via Score Tests in Random Effects Models, "Biometrics", 51, 2, 600-614.

Gatnar E. (2001), Nieparametryczna metoda dyskryminacji i regresji, PWN, Warszawa.

Kim H., Loh W.-Y. (2000), CRUISE User Manual, version 1.05, http://www.wpi.edu/~hkim/cruise/.

Krzyśko M. (1990), Analiza dyskryminacyjna, WNT, Warszawa.

Kurzyński M. (1997), Rozpoznawanie obiektów. Metody statystyczne, Oficyna Wydawnicza Politechniki Wrocławskiej, Wrocław.

Loh W.-Y., Shih Y.-S. (1997), Split Selection Methods for Classification Trees, "Statistica Sinica", 7, 815-840.

(13)

Małgorzata Misztal

THE USE OF SELECTED PATTERN RECOGNITION ALGORITHMS TO CLASSIFY SURGICALLY TREATED PATIENTS WITH CORONARY ARTERY DISEASE

(Summary)

Pattern recognition studies methods of supporting decision-making processes, where by a pattern we mean a quantitative description of an object, event or phenomenon.

In general, the recognition task consists in determining the membership of objects of various types in certain classes. If we deal with a recognition task in which k classes occur, K = {1, 2, ..., k}, then the aim of classification is to assign to the recognized object a class number i ∈ K on the basis of the values of p selected features of the object.

The paper presents examples of applications of selected recognition algorithms in medical diagnosis. The objects subject to classification are patients with coronary artery disease, qualified for surgical treatment, described by a vector of features assessing their condition before and during the operation, as well as the course of peri- and postoperative treatment.

The patients were classified into the distinguished operative-risk groups by means of decision rules based on the minimal-distance concept (the nearest neighbour and a-nearest neighbours algorithms and distance-based classifying functions), linear and quadratic classification functions, and algorithms constructing classification trees (CART, QUEST, CRUISE).
