ACTA UNIVERSITATIS LODZIENSIS
FOLIA OECONOMICA 175, 2004
Robert Pietrzykowski*, Wojciech Zieliński**
A NEW PROCEDURE OF MULTIVARIATE MULTIPLE COMPARISONS
Abstract. In the paper a statistical procedure for dividing a set $\{\mu_1, \dots, \mu_k\}$ of vectors of means of $k$ normal $p$-variate distributions into homogeneous groups is proposed. It appears that the proposed procedure has a high probability of a correct decision.
Key words: multivariate normal distribution, MANOVA, multiple comparisons.
1. INTRODUCTION
Consider $k$ normal $p$-variate distributions $N_p(\mu_1, \Sigma), \dots, N_p(\mu_k, \Sigma)$. The problem is to divide the set of vectors of means $\{\mu_1, \dots, \mu_k\}$ into homogeneous groups on the basis of $k$ samples $X_{ij}$, $j = 1, \dots, n_i$, $i = 1, \dots, k$.
A subset $\{\mu_{i_1}, \dots, \mu_{i_m}\}$ is called a homogeneous group if $\mu_{i_1} = \dots = \mu_{i_m}$ and none of the remaining vectors equals $\mu_{i_1}$ (W. Zieliński 1991). In the univariate case, i.e. $p = 1$, several procedures for dividing the set of means into homogeneous groups are known. Those commonly used in practical applications are Tukey, Scheffé, Bonferroni and Least Significant Difference. In the multivariate case there are many different procedures which divide a set of means into groups. Some of them are described in P. R. Krishnaiah (1966), C. R. Rao (1973), H. Ahrens and J. Läuter (1979), T. Caliński et al. (1979), T. Caliński and M. Lejeune (1998), M. Krzyśko (2000). Those procedures are mainly based on distances between sample means, but none of them is a full analogue of the one-dimensional procedures. In what follows we propose a procedure which extends the one-dimensional procedure of W. Zieliński (1998) to the multivariate case.
* PhD., Department of Mathematical Statistics and Experimentation, University of Agriculture of Warsaw, e-mail: pietrzyk@dela.sggw.waw.pl.
** Prof., Department of Mathematical Statistics and Experimentation, University of Agriculture of Warsaw, e-mail: wojtek.zieliński@omega.sggw.waw.pl.
In the paper we restrict ourselves to the simplest situation. We assume that the samples have the same number of observations and that they are independent. We also assume equality of the covariance matrices of the compared distributions.
2. PROCEDURE
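Let $\mu_i = [\mu_{i1}, \dots, \mu_{ip}]$ ($i = 1, \dots, k$) and let $X_{ij} = [X_{i1j}, \dots, X_{ipj}]$ ($j = 1, \dots, n_i$, $i = 1, \dots, k$). Let $N = \sum_{i=1}^{k} n_i$ denote the number of all observed vectors and let

$$\bar X_i = \frac{1}{n_i} \sum_{j=1}^{n_i} X_{ij} = [\bar X_{i1}, \dots, \bar X_{ip}], \qquad \bar X = \frac{1}{N} \sum_{i=1}^{k} \sum_{j=1}^{n_i} X_{ij} = [\bar X_1, \dots, \bar X_p] \tag{1}$$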
denote the sample mean of the $i$-th sample and the overall sample mean, respectively. Let $I(r) = \{I_1, \dots, I_r\}$ be a division of the set $\{1, \dots, k\}$ and let
$$H_{I(r)} = [h_{ij}]_{i,j=1,\dots,p}, \tag{2}$$

where [...]
The group mean is the sample mean of the $i$-th variate in the group $I_m$. Let $E = [e_{ij}]_{i,j=1,\dots,p}$ be the standard matrix of random errors:

$$e_{ij} = \frac{1}{N-k} \sum_{m=1}^{k} \sum_{l=1}^{n_m} (X_{mil} - \bar X_{mi})(X_{mjl} - \bar X_{mj}). \tag{5}$$
Let $I^*(r) = \{I_1^*, \dots, I_r^*\}$ be a division into $r$ disjoint subsets such that

$$\operatorname{tr} H_{I^*(r)} E^{-1} = \min_{I(r)} \operatorname{tr} H_{I(r)} E^{-1}. \tag{6}$$
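The minimisation in (6) runs over all divisions of $\{1, \dots, k\}$ into $r$ non-empty disjoint subsets. The following is a minimal sketch of such an exhaustive search, not the authors' implementation. The names `divisions`, `best_division`, `toy_means` and `toy_criterion` are hypothetical, and the toy criterion (a within-group sum of squares of scalar values) only stands in for $\operatorname{tr} H_{I(r)} E^{-1}$.

```python
from itertools import combinations


def divisions(items, r):
    """Yield every division of `items` into exactly r non-empty disjoint subsets."""
    items = list(items)
    if r < 1 or len(items) < r:
        return
    if r == 1:
        yield [items]
        return
    first, rest = items[0], items[1:]
    # The subset containing `first` takes `size` further elements from `rest`;
    # at least r - 1 elements must be left for the other subsets.
    for size in range(len(rest) - (r - 1) + 1):
        for extra in combinations(rest, size):
            remaining = [x for x in rest if x not in extra]
            for tail in divisions(remaining, r - 1):
                yield [[first, *extra]] + tail


def best_division(k, r, criterion):
    """Return the division of {1, ..., k} into r subsets minimising `criterion`."""
    return min(divisions(range(1, k + 1), r), key=criterion)


# Hypothetical scalar "means", used only to illustrate the search.
toy_means = {1: 0.0, 2: 0.1, 3: 5.0, 4: 5.2}


def toy_criterion(division):
    """Within-group sum of squares; a stand-in for tr(H E^{-1})."""
    total = 0.0
    for group in division:
        centre = sum(toy_means[i] for i in group) / len(group)
        total += sum((toy_means[i] - centre) ** 2 for i in group)
    return total


best = best_division(4, 2, toy_criterion)
print(best)
```

For $k = 8$ the number of candidate divisions is small enough (at most 1701 for a single $r$) that a full enumeration of this kind is cheap.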
Let $\alpha = (\alpha_1, \alpha_2, \dots, \alpha_{k-1})$ be a sequence of numbers such that $\alpha_r \in (0, 1)$. The procedure starts with $r = 1$ and $r$ is increased till

[...] (7)
where $F(\alpha_r, a, b)$ is the critical value of the $F$ distribution with $(a, b)$ degrees of freedom:

$$a = pr, \quad b = 4 + \frac{a + 2}{B - 1}, \quad B = \frac{(N - k + r - p - 1)(N - k - 1)}{(N - k - p - 3)(N - k - p)}, \quad c = \frac{a(b - 2)}{b(N - k - p - 1)}. \tag{8}$$
If the procedure stops and the inequality $\operatorname{tr} H_{I^*(r)} E^{-1} \le cF(\alpha_r, a, b)$ holds, we decide that we have the following division of the vectors of means:

$$\{\{\mu_i : i \in I_1^*\}, \dots, \{\mu_i : i \in I_r^*\}\}, \tag{9}$$

otherwise

$$\{\{\mu_1\}, \dots, \{\mu_k\}\}. \tag{10}$$
3. CRITERION
Let $\Theta = \{\theta_1, \theta_2, \dots\}$ denote the set of all possible divisions of the set of vectors of means into homogeneous groups. Elements of the set $\Theta$ are disjoint subsets of $(R^p)^k$ and for every $(\mu_1, \dots, \mu_k) \in (R^p)^k$ there exists only one $\theta \in \Theta$ such that $(\mu_1, \dots, \mu_k) \in \theta$. Note that $\Theta$ is a finite set. The elements of the set $\Theta$ are commonly called "states of nature".
The aim of any multiple comparison procedure is to "detect" the true state of nature. Let $\mathcal{D}$ be the set of all decisions which can be made on the basis of observations. The elements of the set $\mathcal{D}$ are called "decisions". We assume that $\mathcal{D} \supseteq \Theta$.
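We define the loss function in the following manner:

$$L(d, \theta) = \begin{cases} 0, & \text{if } d = \theta, \\ 1, & \text{if } d \ne \theta, \end{cases} \qquad \text{for } d \in \mathcal{D} \text{ and } \theta \in \Theta. \tag{11}$$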
This loss function gives a penalty of one when our decision is not correct. If we denote by $\mathcal{X}$ the space of all observations, then the function $\delta: \mathcal{X} \to \mathcal{D}$ is called a "decision rule". Any of the above mentioned procedures of multiple comparisons may be described as a decision rule.
A decision rule $\delta$ is characterized by its risk function, i.e., average loss. Let $(\mu_1, \dots, \mu_k) \in \theta$. Then the risk function of the rule $\delta$ equals

$$R_\delta(\mu_1, \dots, \mu_k) = E_{(\mu_1, \dots, \mu_k)} L(\delta(X), \theta). \tag{12}$$
Note that in general the risk depends on the distances between the vectors $\mu_1, \dots, \mu_k$. The risk of the rule $\delta$ is the probability of a false decision. This probability should be as small as possible. In our investigation we are interested in the probability of a correct decision, which is equal to $1 - R_\delta$.
For the described procedure this probability is very difficult to calculate even in the simple case of $k = 3$, so we perform a Monte Carlo experiment to estimate it.
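The 0-1 loss (11) and the resulting estimate of the probability of a correct decision, $1 - R_\delta$, can be sketched as follows. This is an illustration under stated assumptions: a state of nature and a decision are both represented by a division of the index set $\{1, \dots, k\}$, and the names `canonical`, `loss` and `estimated_pcd` are hypothetical.

```python
def canonical(division):
    """Represent a division so that the order of groups and of indices is irrelevant."""
    return frozenset(frozenset(group) for group in division)


def loss(decision, state):
    """0-1 loss: 0 when the decision equals the state of nature, 1 otherwise."""
    return 0 if canonical(decision) == canonical(state) else 1


def estimated_pcd(decisions, state):
    """Fraction of correct decisions, i.e. one minus the average loss."""
    losses = [loss(d, state) for d in decisions]
    return 1 - sum(losses) / len(losses)


# Toy data: the true state and four decisions, one of which is wrong.
state = [[1], [2, 3, 4]]
decisions = [
    [[1], [2, 3, 4]],
    [[2, 4, 3], [1]],   # same division written in a different order
    [[1, 2], [3, 4]],   # incorrect decision
    [[1], [2, 3, 4]],
]
pcd = estimated_pcd(decisions, state)
print(pcd)
```

Averaging the loss over repeated draws gives a Monte Carlo estimate of the risk (12) at one configuration of means.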
4. MONTE CARLO EXPERIMENT
To estimate the probability of the correct decision, a Monte Carlo experiment was performed. In the experiment $p = 4$, $k = 8$, $n = 30$ were taken. The coordinates of the mean vectors were taken in the ranges

$$\mu_1 \in (-3; 3); \quad \mu_2 \in (-2; 2); \quad \mu_3 \in (-2.5; 2.5); \quad \mu_4 \in (-0.5; 0.5)$$

and the covariance matrix was

$$\Sigma = \begin{bmatrix} 1 & 0.35 & 0.35 & 0.35 \\ 0.35 & 1 & 0.35 & 0.35 \\ 0.35 & 0.35 & 1 & 0.35 \\ 0.35 & 0.35 & 0.35 & 1 \end{bmatrix}.$$
The parameter $\alpha$ was chosen as $\alpha_1 = \dots = \alpha_{k-1} = 0.05$. The resulting critical values are given in Tab. 1.
Table 1. Critical values

| $r$ | $a$ | $b$ | $c$ | $F(\alpha_r, a, b)$ | $cF(\alpha_r, a, b)$ |
|---|---|---|---|---|---|
| 1 | 4 | 229.0000000 | 0.017467 | 2.411066 | 0.042115 |
| 2 | 8 | 324.8255159 | 0.035025 | 1.966947 | 0.068893 |
| 3 | 12 | 396.4590164 | 0.052597 | 1.776629 | 0.093445 |
| 4 | 16 | 452.0349345 | 0.070173 | 1.665917 | 0.116902 |
| 5 | 20 | 496.4083770 | 0.087751 | 1.591755 | 0.139678 |
| 6 | 24 | 532.6563615 | 0.105330 | 1.537818 | 0.161978 |
| 7 | 28 | 562.8235294 | 0.122910 | 1.496415 | 0.183924 |
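The columns $a$, $b$ and $c$ of Table 1 follow directly from (8) with $p = 4$, $k = 8$ and $n = 30$, so that $N = kn = 240$. A short sketch of that arithmetic is given below. The function name `critical_constants` is hypothetical, and the quantile $F(\alpha_r, a, b)$ is not computed here, since it would need a statistical library.

```python
def critical_constants(p, k, n, r):
    """Return (a, b, c) from formula (8) for k samples of equal size n."""
    N = k * n  # total number of observed vectors
    a = p * r
    B = (N - k + r - p - 1) * (N - k - 1) / ((N - k - p - 3) * (N - k - p))
    b = 4 + (a + 2) / (B - 1)
    c = a * (b - 2) / (b * (N - k - p - 1))
    return a, b, c


# Reproduce the a, b, c columns of Table 1 for r = 1, ..., 7.
for r in range(1, 8):
    a, b, c = critical_constants(p=4, k=8, n=30, r=r)
    print(r, a, round(b, 7), round(c, 6))
```

Multiplying $c$ by the tabulated quantile gives the last column of the table; for $r = 1$, $0.017467 \cdot 2.411066 \approx 0.042115$.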
In the case of $k = 8$ there are 21 possibilities of dividing the set of vectors of means into disjoint homogeneous groups. All possible states of nature are shown in Tab. 2. The notation $(i_1, i_2, \dots, i_m)$ means $m$ groups with $i_1, i_2, \dots, i_m$ vectors. It is assumed that $i_1 \le i_2 \le \dots \le i_m$ and $i_1 + i_2 + \dots + i_m = 8$. For example, (1, 2, 5) means the division into three homogeneous groups: $\{\mu_1\}$, $\{\mu_2, \mu_3\}$, $\{\mu_4, \mu_5, \mu_6, \mu_7, \mu_8\}$, i.e. $I = \{\{1\}, \{2, 3\}, \{4, 5, 6, 7, 8\}\}$.
Table 2. States of nature for $k = 8$

| State of nature |
|---|
| (8) |
| (1, 7), (2, 6), (3, 5), (4, 4) |
| (1, 1, 6), (1, 2, 5), (1, 3, 4), (2, 2, 4), (2, 3, 3) |
| (1, 1, 1, 5), (1, 1, 2, 4), (1, 1, 3, 3), (1, 2, 2, 3), (2, 2, 2, 2) |
| (1, 1, 1, 1, 4), (1, 1, 1, 2, 3), (1, 1, 2, 2, 2) |
| (1, 1, 1, 1, 1, 3), (1, 1, 1, 1, 2, 2) |
| (1, 1, 1, 1, 1, 1, 2) |
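The states in Table 2 are the ways of writing 8 as a sum of non-decreasing positive integers. The sketch below enumerates them; the function name `partitions` is hypothetical. One observation from the enumeration, not stated in the text: the count of 21 is obtained when the division into eight single-element groups is left out.

```python
def partitions(total, smallest=1):
    """Yield non-decreasing tuples of integers >= smallest that sum to total."""
    if total == 0:
        yield ()
        return
    for first in range(smallest, total + 1):
        for rest in partitions(total - first, smallest=first):
            yield (first, *rest)


k = 8
all_partitions = list(partitions(k))
# Keep the divisions with fewer than k groups, as listed in Table 2.
states = [s for s in all_partitions if len(s) < k]

# Number of states for each number of groups m.
by_groups = {}
for s in states:
    by_groups[len(s)] = by_groups.get(len(s), 0) + 1

print(len(states))
print(by_groups)
```

Each key of `by_groups` corresponds to one row of Table 2.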
In a Monte Carlo experiment one has to choose values of the means. It is difficult to make a "planned" experiment in the sense of choosing the mean values, so these values were taken in a random way. For example, for the state (1, 7) there were randomly generated:
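$$\mu_{11}, \mu_{12} \in (-3; 3); \quad \mu_{21}, \mu_{22} \in (-2; 2); \quad \mu_{31}, \mu_{32} \in (-2.5; 2.5); \quad \mu_{41}, \mu_{42} \in (-0.5; 0.5)$$

and the vectors of means $\mu_1, \dots, \mu_8$ were formed as the columns of

$$\begin{array}{cccccccc} \mu_1 & \mu_2 & \mu_3 & \mu_4 & \mu_5 & \mu_6 & \mu_7 & \mu_8 \\ \hline \mu_{11} & \mu_{12} & \mu_{12} & \mu_{12} & \mu_{12} & \mu_{12} & \mu_{12} & \mu_{12} \\ \mu_{21} & \mu_{22} & \mu_{22} & \mu_{22} & \mu_{22} & \mu_{22} & \mu_{22} & \mu_{22} \\ \mu_{31} & \mu_{32} & \mu_{32} & \mu_{32} & \mu_{32} & \mu_{32} & \mu_{32} & \mu_{32} \\ \mu_{41} & \mu_{42} & \mu_{42} & \mu_{42} & \mu_{42} & \mu_{42} & \mu_{42} & \mu_{42} \end{array}$$

where the first index denotes the variate.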
This procedure was applied 100 times for each state.
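The random choice of means for a given state can be sketched as below. Two assumptions are made that the text does not confirm: the values are drawn uniformly within each range, and the range of the fourth variate is taken as $(-0.5; 0.5)$. The names `RANGES` and `random_configuration` are hypothetical.

```python
import random

# Ranges of the four variates (assumed; the fourth lower bound is a guess).
RANGES = [(-3.0, 3.0), (-2.0, 2.0), (-2.5, 2.5), (-0.5, 0.5)]


def random_configuration(state, rng):
    """Return k mean vectors; vectors in the same group are identical."""
    means = []
    for group_size in state:
        # One randomly drawn vector per homogeneous group (uniform is an assumption).
        vector = [rng.uniform(low, high) for low, high in RANGES]
        means.extend(list(vector) for _ in range(group_size))
    return means


rng = random.Random(0)  # fixed seed only to make the sketch repeatable
configurations = [random_configuration((1, 7), rng) for _ in range(100)]
print(configurations[0][0])
print(configurations[0][1])
```

Each element of `configurations` is one point $(\mu_1, \dots, \mu_8)$ for the state (1, 7): the first vector stands alone and the remaining seven coincide.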
At each generated point $(\mu_1, \dots, \mu_8)$, 1000 draws of eight samples of size thirty were made from normal populations with means $\mu_1, \dots, \mu_8$, respectively. To each set of samples the procedure was applied and the number of divisions consistent with "reality" was noted.
5. RESULTS
Results are presented in Fig. 1 and 2 for chosen configurations of means. For a given state of nature the generated configurations of vectors were ordered according to the estimated value of the probability of the correct decision. On the x axis there is the number (divided by 100) of the generated configuration and on the y axis there is the corresponding value of the estimated probability. In Fig. 1 results for the state (1, 7) are presented and in Fig. 2 the corresponding results for the state (1, 1, 1, 1, 1, 1, 2). The results for other states were similar.
[Figure: estimated probability of the correct decision (y axis, 0.0 to 1.0) against the number of the generated configuration divided by 100 (x axis, 0.0 to 1.0).]
Fig. 1. Results for (1, 7)
On the basis of the simulations the following conclusions may be formulated.
1. In general the procedure has quite a high probability of a correct decision. On average, about 80% of the configurations of vectors of means were correctly detected with probability at least 0.90.
2. Results may be interpreted in the following way. If, for example, the procedure gives the division (1, 7), then with confidence at least 90% we may be almost sure (more than 95%) that the obtained division coincides with reality.
3. We made investigations only for one case ($k = 8$ and $p = 4$), but it may be expected that for other $k$ and $p$ the results will be similar. Investigations for other $k$ and $p$ are in progress.
REFERENCES
Ahrens H., Läuter J. (1979), Wielowymiarowa analiza wariancji, PWN, Warszawa.
Caliński T., Lejeune M. (1998), Dimensionality in MANOVA Tested by Closed Testing Procedure, "Journal of Multivariate Analysis", 65, 181-194.
Caliński T., Dyczkowski A., Sitek M. (1979), Procedury testów jednoczesnych w wielozmiennej analizie wariancji, "Matematyka Stosowana", 14, 5-31.
Krishnaiah P. R. (1966), Multivariate Analysis, Academic Press, New York-London.
Krzyśko M. (2000), Wielowymiarowa analiza statystyczna, Uniwersytet im. Adama Mickiewicza, Poznań.
Rao C. Radhakrishna (1973), Linear Statistical Inference and Its Applications, Wiley & Sons, New York.
Zieliński W. (1991), Nowa procedura porównań wielokrotnych, Wydawnictwo SGGW, Warszawa.
Zieliński W. (1998), On a Procedure of Multiple Comparisons, "Biometrical Letters", 35, 67-76.
Robert Pietrzykowski, Wojciech Zielinski
A NEW MULTIVARIATE PROCEDURE OF MULTIPLE COMPARISONS
The paper considers the problem of dividing the mean vectors of $k$ populations with multivariate normal distributions into homogeneous groups. The probability of a correct decision (PCD), that is, the probability of obtaining a division consistent with the true configuration of the populations under study, is adopted as the quality criterion for a multiple comparison procedure. This approach extends the work of W. Zieliński (1991, 1998) for the univariate case. The Lawley-Hotelling $T^2$ statistic is used as the test statistic. The paper presents the results of simulation studies for eight populations with four characteristics each.