
Classification of Patients With Respect to Some Group of Factors



ACTA UNIVERSITATIS LODZIENSIS, FOLIA OECONOMICA 228, 2009

Ewa Nowakowska-Zajdel*, Małgorzata Muc-Wierzgoń**, Grażyna Trzpiot***, Alicja Ganczarek****



Abstract. In this paper, the examined patients were classified based on the results of multivariate analysis using classification trees. The aim of the analysis was to identify characteristic factors describing groups of patients suffering from colorectal cancer at different stages of the disease. Clinical data from the medical documentation of patients with colon cancer were analyzed. Qualitative variables such as sex, clinical stage, histopathology type of cancer and malignancy, weight class, glucose level class and coexistence of other illnesses were used in the analysis.

Key words: classification trees, severity, type and histopathology malignancy, body mass index, glucose level.


* Ph.D., Department of Internal Diseases, Medical University of Silesia, Katowice.

** Professor, Department of Internal Diseases, Medical University of Silesia, Katowice.

*** Professor, Department of Statistics, The Karol Adamiecki University of Economics, Katowice.

**** Ph.D., Department of Statistics, The Karol Adamiecki University of Economics, Katowice.


Colorectal cancer ranks second in incidence among men and women and is the third leading cause of death among oncological patients in Poland. Each year there are about 11 000 new cases of colon cancer, while the total number of patients newly diagnosed with neoplasm is 110 000 (Nowacki (2006)). The causes of the disease are still unknown. The most important are genetic predispositions, as up to 10-15% of cases are familial. Environmental factors such as poor diet, insufficient physical activity and metabolic disorders (obesity, diabetes) play an important role as risk factors for colon cancer (Chang C.K., Ulrich C.M. (2003), Wadden T.A., Brownell K.D., Foster G.D. (2002)). Many epidemiological studies have reported relationships between age, BMI (Body Mass Index), fasting glucose level and other growth factors such as TNF alpha, leptin, insulin-like growth factors, adiponectin and others. It is of interest to know which medical data concerning colorectal cancer could play an important role in dividing patients into separate groups.

The aim of this paper was to identify characteristic factors describing groups of patients suffering from colorectal cancer at different stages of the disease.


Many empirical results show that economic and sociological variables (Gatnar, Walesiak (2004)) are not normally distributed and are very often measured on a nominal scale, while their distributions often contain incommunicable values and outlying observations. Thus, classical methods often cannot be applied to the classification of empirical variables. In this paper, we applied a nonparametric method, classification trees, to the classification. This method is based on the recursive partitioning of the m-dimensional space X^m into subsets that are homogeneous with respect to the dependent variable y. When the dependent variable y is nominal, equation (1) is a classification tree (Breiman, Friedman, Olshen and Stone (1984), Gatnar (1998), Gatnar, Walesiak (2004), Misztal (2007)):


$$y = \sum_{k=1}^{K} a_k I(x_j \in R_k) \qquad (1)$$

where:

$x_j$ - multivariate variable, element of $X^m$,

$R_k$, $k = 1, \ldots, K$ - disjoint regions in the m-dimensional feature space, segments of $X^m$,

$a_k$ - the parameters,

$I(q)$ - an indicator function:

$$I(q) = \begin{cases} 1 & \text{if } q \text{ is true} \\ 0 & \text{if } q \text{ is false} \end{cases}$$
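To make equation (1) concrete, the following minimal Python sketch evaluates a classification tree as a sum of indicator functions over disjoint regions. The regions and class labels in `LEAVES`, the feature names `type` and `bmi`, and the function name `predict` are hypothetical and chosen only for illustration; they are not the tree fitted in this paper.

```python
from typing import Callable, Dict, List, Tuple

Observation = Dict[str, str]
Region = Callable[[Observation], bool]

# Hypothetical leaves: each pair is (indicator I(x in R_k), parameter a_k).
# The regions are disjoint and together cover the whole feature space.
LEAVES: List[Tuple[Region, str]] = [
    (lambda x: x["type"] == "mucous adenocarcinoma", "colon"),
    (lambda x: x["type"] != "mucous adenocarcinoma" and x["bmi"] == "abnormal", "colon"),
    (lambda x: x["type"] != "mucous adenocarcinoma" and x["bmi"] == "normal", "rectum"),
]


def predict(x: Observation) -> str:
    """Evaluate y = sum_k a_k * I(x in R_k) for a nominal dependent variable."""
    hits = [a_k for in_region, a_k in LEAVES if in_region(x)]
    # With disjoint, exhaustive regions exactly one indicator equals 1.
    if len(hits) != 1:
        raise ValueError("regions must be disjoint and cover the feature space")
    return hits[0]
```

Because the regions are disjoint, the sum in (1) reduces to picking the single parameter a_k of the leaf that contains the observation, which is why a nominal y can be handled without any arithmetic on the class labels.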


In this part of the paper we analyzed data from the medical documentation of the out-patient clinic of the Cancer Ward of the 4th Hospital in Bytom and the Department of Internal Diseases in Bytom of the Medical University of Silesia, from 2000 to 2007. We had the following information on 316 patients:


- sex (Male, Female),

- Body Mass Index (BMI) (Normal weight, Abnormal weight),

- glucose level (Normal glucose, Abnormal glucose),

- coexisting illnesses (cardiovascular diseases, diabetes, other malignant neoplasms),

- tumor location (colon, rectum),

- histopathologic type (mucous adenocarcinoma, adenocarcinoma, adenocarcinoma with necrosis),

- clinical stage of disease (I, II, III, IV),

- histopathology malignancy (G1, G2, G3).

We analyzed model (1) with tumor location as the dependent variable y. In model (1) we took into consideration only the qualitative variables which we recognized as factors describing groups of patients suffering from colorectal cancer at different stages of the disease. We used the C&RT (Classification and Regression Trees) recursive partitioning method proposed by Breiman et al. (1984) and available in the STATISTICA PL package. To stop the recursive partitioning, we used three pruning methods: cost-complexity pruning, the one Standard Error (1SE) rule and FACT (Fast Algorithm for Classification Trees). Selected results of our classifications are presented in table 1 and in figures 1-3.
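A single step of recursive partitioning on qualitative predictors can be sketched as follows. This is an illustrative Python sketch, not the STATISTICA implementation: it assumes the Gini index as the impurity measure (the paper does not state which measure was used) and, for simplicity, considers only "one category versus the rest" splits, whereas C&RT may examine other groupings of categories. The names `gini` and `best_split` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def gini(labels: List[str]) -> float:
    """Gini impurity 1 - sum_c p_c^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows: List[Dict[str, str]], target: str) -> Tuple[str, str, float]:
    """Return (predictor, category, impurity decrease) of the best
    'category vs. rest' split; ('', '', 0.0) if no split reduces impurity."""
    parent = [r[target] for r in rows]
    best = ("", "", 0.0)
    for predictor in (key for key in rows[0] if key != target):
        for category in sorted({r[predictor] for r in rows}):
            left = [r[target] for r in rows if r[predictor] == category]
            right = [r[target] for r in rows if r[predictor] != category]
            if not left or not right:
                continue  # a split must send observations to both children
            weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            decrease = gini(parent) - weighted
            if decrease > best[2]:
                best = (predictor, category, decrease)
    return best
```

Applying `best_split` recursively to the two child subsets grows the full tree; pruning then cuts it back to a size that generalizes better.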

Table 1. Results of classifications of tumor location

Fig.     Recursive partitioning method   Pruning    Classification error   Cross-validation   Standard deviation
Fig. 1   C&RT                            1SE rule   0.45                   0.44               0.03
Fig. 2   C&RT                            FACT       0.43                   0.43               0.03
Fig. 3   C&RT                            1SE rule   0.53                   0.45               0.02
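The 1SE rule of Breiman et al. (1984) chooses, from a sequence of pruned trees, the smallest tree whose cross-validation cost does not exceed the minimum cross-validation cost by more than one standard error. A minimal Python sketch is given below; the `PrunedTree` record and the `select_1se` function are hypothetical names, and any pruning sequence passed to it would have to come from the fitting software.

```python
from typing import List, NamedTuple


class PrunedTree(NamedTuple):
    n_leaves: int    # size of the pruned tree
    cv_cost: float   # cross-validation misclassification cost
    cv_se: float     # standard error of the cross-validation cost


def select_1se(sequence: List[PrunedTree]) -> PrunedTree:
    """Pick the smallest tree whose CV cost is within one SE of the minimum."""
    best = min(sequence, key=lambda tree: tree.cv_cost)
    threshold = best.cv_cost + best.cv_se
    eligible = [tree for tree in sequence if tree.cv_cost <= threshold]
    return min(eligible, key=lambda tree: tree.n_leaves)
```

The rule trades a small, statistically insignificant increase in cross-validation cost for a simpler tree that is easier to interpret.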

Unfortunately, we obtained large classification errors (table 1). They probably resulted from the very high volatility of the factors. However, when we analyzed the trees (figures 1-3), we were able to find a few important dependences.

For example, based on the first tree (figure 1), we can say that the most important factor in the tumor location is weight, the second most important factor is sex and the third is histopathology type. Patients whose tumor is located in the colon have mucous adenocarcinoma in many cases. Where the tumor was not mucous adenocarcinoma, the majority of the diagnosed cases concerned women with abnormal weight (figure 1) or patients (women and men) with abnormal glucose (figure 2).


[Fig. 1-2. Classification tree - tumor localization; predictor importance ranking]


Based on the third tree (figure 3), we can say that histopathology malignancy is a more important factor in tumor location than histopathology type. Patients whose tumor was located in the colon had mucous adenocarcinoma in many cases. Otherwise, they had G3 histopathology malignancy (figure 3).


[Fig. 3. Classification tree - tumor localization (colon, rectum), split on histopathology malignancy; predictor importance ranking: histopathology malignancy, histopathology type]



The results of the research are not satisfactory. One reason is that the medical factors were volatile and heterogeneous. The examined group of patients was not homogeneous with respect to other environmental factors such as diet, physical activity, smoking, genetic predisposition, non-specific chronic colitis and pre-cancer states. Using a nonparametric method (the classification tree) to classify patients suffering from colorectal cancer let us identify important factors such as weight, sex and histopathology malignancy. All of these factors were connected with tumor localization.

The study emphasizes the important explanatory role of overweight and obesity in cancer, which has been reported in many earlier studies. The classification tree could help to illustrate relationships between the factors. The investigation should be continued on a larger group of patients. However, the multifactorial conditioning of neoplastic disease makes such a study difficult.


Breiman L., Friedman J., Olshen R., Stone C. (1984), Classification and Regression Trees, CRC Press, London.

Chang C.K., Ulrich C.M. (2003), Hyperinsulinaemia and hyperglycaemia: possible risk factors of colorectal cancer among diabetic patients, Diabetologia, 46, 595-607.

Gatnar E. (1998), Symboliczne metody klasyfikacji danych, PWN, Warszawa.

Gatnar E., Walesiak M. (2004), Metody statystycznej analizy wielowymiarowej w badaniach marketingowych, AE, Wrocław.

Misztal M. (2007), Wybrane metody analizy i prognozowania czasu pobytu na OIOM pacjentów z chorobą wieńcową, „Taksonomia 14. Klasyfikacja i analiza danych - teoria i zastosowania”, edited by K. Jajuga and M. Walesiak, AE Wrocław, 1169, 288-296.

Nowacki M.P. (2006), Rak jelita grubego, „Onkologia kliniczna”, edited by M. Krzakowski, Borgis Wydawnictwo Medyczne, Warszawa.

Trzpiot G., Ganczarek A., The classification of risk on the Polish Power Exchange, „Ekonometria”, edited by J. Dziechciarz, AE Wrocław, in press.

Wadden T.A., Brownell K.D., Foster G.D. (2002), Obesity: responding to the global epidemic, J Consult Clin Psychol, 70, 510-525.


Ewa Nowakowska-Zajdel, Małgorzata Muc-Wierzgoń, Grażyna Trzpiot, Alicja Ganczarek


Based on the results of multivariate statistical analysis, the group of examined patients was classified with respect to a group of examined characteristics.

The aim of the analysis is an attempt to identify characteristic groups of factors among patients suffering from colorectal cancer at different stages of clinical advancement.

Selected epidemiological data from the medical documentation of patients with an established diagnosis of colorectal cancer were analyzed. Qualitative variables were used in the analysis: sex, clinical stage of the disease, histopathological type and malignancy, division into persons with normal weight, overweight and obesity, division by fasting serum glucose concentration, and the coexistence of other diseases.

