ACTA UNIVERSITATIS LODZIENSIS
FOLIA OECONOMICA 194, 2005

Janusz Wywiał*

ON ESTIMATION OF DOMINANT OF MULTIDIMENSIONAL RANDOM VARIABLE

Abstract
The problem of estimating the mode of a continuous distribution of a multidimensional random variable is considered. The estimator of the mode is the vector of means from an appropriately truncated sample. The truncated sample is obtained by rejecting observations in such a way that the measure of skewness of the multidimensional variable takes a value as close to zero as possible. We can expect that, through successful truncation of the sample, the vector of sample means approaches the mode vector of the multidimensional variable. An estimator constructed in this way is usually a biased estimator of the mode. Moreover, biased estimators of the values of modal regressions are proposed. The well-known jackknife procedure is proposed to evaluate the mean square errors of the estimators.
Key words: moments of truncated distribution, estimation of distribution mode.
I. ESTIMATION OF MODE IN ONE-DIMENSIONAL CASE
Let $X_{(1)} \le X_{(2)} \le \dots \le X_{(n)}$ be the sequence of the order statistics from a simple sample drawn from a one-dimensional distribution. The third central moment from the right-hand truncated simple sample is defined by the expression:

$$M_3(X_{(k)}) = \frac{1}{k}\sum_{i=1}^{k}\left(X_{(i)} - \bar{X}_k\right)^3, \qquad 1 < k \le n,$$

where $\bar{X}_k = \frac{1}{k}\sum_{i=1}^{k} X_{(i)}$. In particular, $\bar{X} = \bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$.
Let us assume that $M_3(X_{(n)}) = M_3 > 0$. The sample quasi-mode $G_e$ is defined as follows:

$$G_e = \bar{X}_e, \quad \text{if } M_3(X_{(e)}) \le 0 \text{ and } M_3(X_{(k)}) > 0 \text{ for } k = e+1, \dots, n.$$
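A minimal Python sketch of the quasi-mode $G_e$, assuming the sample is a plain list of numbers; the function names `third_central_moment` and `quasi_mode_1d` are hypothetical, not taken from the paper. It scans the truncation levels from $k = n - 1$ downward and stops at the first (that is, the largest) $e$ with $M_3(X_{(e)}) \le 0$; every larger $k$ has then already been checked to give $M_3(X_{(k)}) > 0$, and $k = n$ satisfies this by assumption.

```python
def third_central_moment(values):
    """M_3: mean of the cubed deviations from the mean of `values`."""
    k = len(values)
    mean = sum(values) / k
    return sum((v - mean) ** 3 for v in values) / k


def quasi_mode_1d(sample):
    """Hypothetical helper: G_e = mean of the e smallest order statistics.

    e is the largest index with M_3(X_(e)) <= 0; by construction
    M_3(X_(k)) > 0 for k = e+1, ..., n.
    """
    xs = sorted(sample)  # order statistics X_(1) <= ... <= X_(n)
    n = len(xs)
    if third_central_moment(xs) <= 0:
        raise ValueError("the definition assumes M_3(X_(n)) > 0")
    for e in range(n - 1, 1, -1):  # truncation levels 1 < e < n, largest first
        if third_central_moment(xs[:e]) <= 0:
            return sum(xs[:e]) / e
    return None  # no admissible truncation level was found


print(quasi_mode_1d([0.0, 1.0, 2.0, 3.0, 20.0]))  # prints 1.5
```

In the example the single large value makes the full sample right-skewed; dropping it leaves the symmetric set $\{0, 1, 2, 3\}$, whose third moment is zero, so $e = 4$.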
The statistic $G_e$ will usually be a biased estimator of the mode $y$.

Theorem 1 (Johnson and Rogers, 1951). Let the density function of a random variable $X$ be unimodal, concave to the left of the mode $y$ and convex to the right of the mode $y$. Moreover, let $D^2(X) > 0$. A one-dimensional probability distribution with these properties exists if and only if $(E(X) - y)^2 < 3D^2(X)$.
Hence, the well-known Pearson coefficient of skewness $(E(X) - y)/D(X)$ is bounded: $|E(X) - y|/D(X) < \sqrt{3}$. Let us assume that the probability distribution of $X$ is right-side truncated at the point $a$, and let $E(X|a) = \mu(a)$, $D^2(X|a)$ and $\eta_3(X|a)$ be the expected value, the variance and the third central moment of the truncated distribution, respectively. Theorem 1 leads to the following: if $\eta_3(X|a) \to 0$ and $D(X|a)$ does not increase, then the upper bound $3D^2(X|a)$ on $(E(X|a) - y)^2$ does not increase. Hence, an appropriate truncation of the probability distribution can decrease the distance between the mode and the expected value of the truncated probability distribution. This suggests that the estimator $G_e$ can be closer to the mode than the ordinary mean $\bar{X}$ from the non-truncated simple sample. That is why the mean from the truncated sample can be used to estimate the mode.
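As a numerical illustration of the last point (an example chosen here, not taken from the paper), take an exponential variable with rate 1, whose mode is 0 and whose mean is 1, and truncate it on the right at a point $a$. The exponential density is convex on its whole support, so it does not satisfy the conditions of Theorem 1; the sketch only shows that the truncated mean $E(X|a)$ moves toward the mode as $a$ decreases. The closed form is the standard mean of a right-truncated exponential distribution, cross-checked by a midpoint-rule integral; both function names are hypothetical.

```python
import math


def truncated_exponential_mean(a, rate=1.0):
    """Closed-form E(X | X <= a) for an exponential variable with the given rate."""
    tail = math.exp(-rate * a)
    return 1.0 / rate - a * tail / (1.0 - tail)


def truncated_mean_numeric(a, rate=1.0, steps=100_000):
    """Same quantity by the midpoint rule applied to x*f(x) and f(x) on [0, a]."""
    h = a / steps
    num = den = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        density = rate * math.exp(-rate * x)
        num += x * density
        den += density
    return num / den  # the common factor h cancels


for a in (5.0, 2.0, 1.0, 0.5):
    print(a, round(truncated_exponential_mean(a), 4), round(truncated_mean_numeric(a), 4))
```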
II. ESTIMATION OF MODE IN MULTIDIMENSIONAL CASE
Let $f(x_1, \dots, x_k) = f(x)$, where $x = [x_1 \dots x_k]$, be a density function of a $k$-dimensional random variable defined in $R^k$. The vectors of expected values and variances are denoted by $\mu = [\mu_1 \dots \mu_k]$ and $\sigma^2 = [\sigma_1^2 \dots \sigma_k^2]$. The mode of the $k$-dimensional random variable is denoted by $y = [y_1 \dots y_k]$, where $f(y) = \max_x f(x)$. The central moments of order 3 of the $k$-dimensional variable are denoted by:

$$\eta_{uvz}(X_i, X_j, X_g) = \int_{-\infty}^{\infty}\!\!\cdots\int_{-\infty}^{\infty} (x_i - \mu_i)^u (x_j - \mu_j)^v (x_g - \mu_g)^z f(x_1, \dots, x_k)\, dx_1 \dots dx_k, \qquad u + v + z = 3,$$

$$\eta_u(X_i) = \eta_{u00}(X_i, X_j, X_g).$$
Let us consider the truncated multidimensional variable with the following density function:

$$f(x_1, \dots, x_k | A) = \frac{f(x_1, \dots, x_k)}{P([X_1 \dots X_k] \in A)} \quad \text{for } [x_1 \dots x_k] \in A, \qquad f(x_1, \dots, x_k | A) = 0 \text{ otherwise},$$

where $A$ is a convex region and $A \subset R^k$. Hence, $f(x_1, \dots, x_k | A)$ is the density function of the truncated distribution. The vectors of expected values and variances of the truncated distribution are as follows: $\mu(A) = [\mu_1(A) \dots \mu_k(A)]$, $\sigma^2(A) = [\sigma_1^2(A) \dots \sigma_k^2(A)]$, where:
$$\mu_i(A) = \int_{-\infty}^{\infty}\!\!\cdots\int_{-\infty}^{\infty} x_i\, f(x_1, \dots, x_k | A)\, dx_1 \dots dx_k,$$

$$\sigma_i^2(A) = \int_{-\infty}^{\infty}\!\!\cdots\int_{-\infty}^{\infty} (x_i - \mu_i(A))^2 f(x_1, \dots, x_k | A)\, dx_1 \dots dx_k,$$

$$\eta_{uvz}(X_i, X_j, X_g | A) = \int_{-\infty}^{\infty}\!\!\cdots\int_{-\infty}^{\infty} (x_i - \mu_i(A))^u (x_j - \mu_j(A))^v (x_g - \mu_g(A))^z f(x_1, \dots, x_k | A)\, dx_1 \dots dx_k.$$

Let us introduce the following notation:
$\theta = [\theta_1\ \theta_2\ \theta_3]$ is of dimensions $1 \times (k^2 + k(k-1)(k-2)/6)$, where

$\theta_1 = [\eta_3(X_1) \dots \eta_3(X_k)]$ is of dimensions $1 \times k$,

$\theta_2 = [\eta_{12}(X_1, X_2)\ \eta_{21}(X_1, X_2) \dots \eta_{21}(X_{k-1}, X_k)\ \eta_{21}(X_k, X_{k-1})]$ is of dimensions $1 \times k(k-1)$,

$\theta_3 = [\eta_{111}(X_1, X_2, X_3)\ \eta_{111}(X_1, X_2, X_4) \dots \eta_{111}(X_{k-3}, X_{k-1}, X_k)\ \eta_{111}(X_{k-2}, X_{k-1}, X_k)]$ is of dimensions $1 \times \frac{1}{6}k(k-1)(k-2)$.
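The layout of $\theta$ can be checked mechanically. The Python sketch below (the function name `theta_layout` is hypothetical) lists one possible ordering of the moments in $\theta_1$, $\theta_2$ and $\theta_3$ for $k$ variables and confirms that the total length is $k + k(k-1) + k(k-1)(k-2)/6 = k^2 + k(k-1)(k-2)/6$. For $\theta_2$ it takes, for every pair $i < j$, both $\eta_{12}(X_i, X_j)$ and $\eta_{21}(X_i, X_j)$; this covers the same moments as the listing above because $\eta_{21}(X_j, X_i) = \eta_{12}(X_i, X_j)$.

```python
from itertools import combinations


def theta_layout(k):
    """Labels of the components of theta = [theta_1 theta_2 theta_3] (1-based indices)."""
    idx = range(1, k + 1)
    theta1 = [f"eta3(X{i})" for i in idx]
    theta2 = []
    for i, j in combinations(idx, 2):  # each unordered pair gives two mixed moments
        theta2 += [f"eta12(X{i},X{j})", f"eta21(X{i},X{j})"]
    theta3 = [f"eta111(X{i},X{j},X{g})" for i, j, g in combinations(idx, 3)]
    return theta1 + theta2 + theta3


for k in (2, 3, 4, 5):
    assert len(theta_layout(k)) == k**2 + k * (k - 1) * (k - 2) // 6
print(theta_layout(3))
```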
The vector $\theta$ can be estimated by the vector $L = [L_1\ L_2\ L_3]$, where:

$L_1 = [C_3(X_1)\ C_3(X_2) \dots C_3(X_k)]$, with $C_3(X_i) = C_{300}(X_i, X_j, X_g)$,

$L_2 = [C_{12}(X_1, X_2)\ C_{21}(X_1, X_2) \dots C_{21}(X_{k-1}, X_k)\ C_{21}(X_k, X_{k-1})]$,

$L_3 = [C_{111}(X_1, X_2, X_3)\ C_{111}(X_1, X_2, X_4) \dots C_{111}(X_{k-3}, X_{k-1}, X_k)\ C_{111}(X_{k-2}, X_{k-1}, X_k)]$.
Similarly, we define the vector of moments of the truncated distribution: $\theta(A) = [\theta_1(A)\ \theta_2(A)\ \theta_3(A)]$, where

$\theta_1(A) = [\eta_3(X_i | A)]$, $\theta_2(A) = [\eta_{12}(X_i, X_j | A)]$ and $\theta_3(A) = [\eta_{111}(X_i, X_j, X_g | A)]$.
It is well known that if a distribution function is symmetric, then all central moments of order 3 of its marginal one-, two- and three-dimensional distributions are equal to zero and $\theta = 0$. In the case when $\theta \ne 0$ we can find a set $A$ such that $A \subset R^k$ and $\theta(A) = 0$. The vector of mean values $\mu(A) = [\mu_1(A) \dots \mu_k(A)]$ of the truncated distribution is defined as the $A$-mode of this distribution and will be denoted by $y(A)$. The $A$-mode is not necessarily equal to the mode of the entire distribution. The conditions for such equality can be stated quite simply only in the case of one-dimensional distributions, see Wywiał (2000a, 2000b). That is why the parameter $y(A)$ can be called a quasi-mode, as in the one-dimensional case.
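The simple sample of size $n$ is denoted by $X = [X_1^* \dots X_n^*]$, where $X_i^* = [X_{1i} \dots X_{ki}]^T$, $i = 1, \dots, n$. The sample moments are as follows:

$$C_{uvz}(X_i, X_j, X_g) = \frac{1}{n}\sum_{t=1}^{n}(X_{it} - \bar{X}_i)^u (X_{jt} - \bar{X}_j)^v (X_{gt} - \bar{X}_g)^z, \qquad \bar{X}_i = \frac{1}{n}\sum_{t=1}^{n} X_{it}.$$

Let $X^*_{(h)}$ ($h = n, \dots, 1$) be the vector such that

$$(X^*_{(h)} - \bar{X}^*_{(h)})^T (X^*_{(h)} - \bar{X}^*_{(h)}) = \max_{i \in F_{(h)}} \delta_i(h),$$

where:

$$F_{(h)} = \{i : i = 1, \dots, n \text{ and } X_i^* \ne X^*_{(p)} \text{ for } p > h\}, \qquad \delta_i(h) = (X_i^* - \bar{X}^*_{(h)})^T (X_i^* - \bar{X}^*_{(h)}),$$

$$\bar{X}^*_{(h)} = \frac{1}{n_{(h)}}\sum_{i \in F_{(h)}} X_i^*, \qquad n_{(h)} = \mathrm{Card}\{F_{(h)}\},$$

$\bar{X}^*_{(h)} = [\bar{X}_{1(h)} \dots \bar{X}_{k(h)}]^T$ and $\bar{X}^*_{(n)} = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i^*$. In other words, at each step the observation farthest (in Euclidean distance) from the mean of the observations still retained is rejected. A truncated sample is identified by the set $F_{(h)}$.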
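The rejection scheme that defines $F_{(h)}$ can be sketched in Python with NumPy, assuming the sample is stored as an $n \times k$ array with one observation $X_i^*$ per row (the paper writes observations as columns). The helper names are hypothetical. `peeling_order` returns the row indices in the order $X^*_{(n)}, X^*_{(n-1)}, \dots, X^*_{(1)}$, so $F_{(h)}$ consists of all rows except the first $n - h$ entries of that list; ties in $\delta_i(h)$ are broken by the lowest row index, a detail the paper does not specify. `sample_moment` computes $C_{uvz}$ with divisor $n$ and 0-based column indices.

```python
import numpy as np


def peeling_order(X):
    """Row indices in rejection order X*_(n), X*_(n-1), ..., X*_(1)."""
    remaining = list(range(X.shape[0]))
    order = []
    while remaining:
        sub = X[remaining]
        delta = ((sub - sub.mean(axis=0)) ** 2).sum(axis=1)  # delta_i(h)
        farthest = remaining[int(np.argmax(delta))]
        order.append(farthest)
        remaining.remove(farthest)
    return order


def truncated_rows(X, order, h):
    """Rows of F_(h): every observation except X*_(n), ..., X*_(h+1)."""
    rejected = set(order[: X.shape[0] - h])
    return X[[i for i in range(X.shape[0]) if i not in rejected]]


def sample_moment(X, u, v, z, i, j, g):
    """C_uvz(X_i, X_j, X_g) with divisor n; 0-based column indices."""
    D = X - X.mean(axis=0)
    return float(np.mean(D[:, i] ** u * D[:, j] ** v * D[:, g] ** z))


X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0]])
order = peeling_order(X)
print(order[0], truncated_rows(X, order, 3))
```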
Let

$$C_{uvz}(X_i, X_j, X_g | h) = \frac{1}{n_{(h)}}\sum_{t \in F_{(h)}}(X_{it} - \bar{X}_{i(h)})^u (X_{jt} - \bar{X}_{j(h)})^v (X_{gt} - \bar{X}_{g(h)})^z,$$

$$L(h) = [L_1(h)\ L_2(h)\ L_3(h)] = [\ell_1(h)\ \ell_2(h) \dots \ell_w(h)], \qquad w = k^2 + k(k-1)(k-2)/6,$$

where

$L_1(h) = [C_3(X_1 | h)\ C_3(X_2 | h) \dots C_3(X_k | h)]$,

$L_2(h) = [C_{12}(X_1, X_2 | h)\ C_{21}(X_1, X_2 | h) \dots C_{21}(X_{k-1}, X_k | h)\ C_{21}(X_k, X_{k-1} | h)]$,

$L_3(h) = [C_{111}(X_1, X_2, X_3 | h)\ C_{111}(X_1, X_2, X_4 | h) \dots C_{111}(X_{k-3}, X_{k-1}, X_k | h)\ C_{111}(X_{k-2}, X_{k-1}, X_k | h)]$.
The vector parameter $y$ can be estimated by means of the statistic $G_{1(u)}$ determined by the following procedure. Let $L(h) = [\ell_1(h)\ \ell_2(h) \dots \ell_w(h)]$ and

$$L^*(h) = \max_{l = 1, \dots, w} |\ell_l(h)|,$$

$$G_{1(u)} = \bar{X}^*_{(u)}: \quad u = \max\{h : h = n, \dots, 1 \text{ and } L^*(h) < \varepsilon\},$$

where $\varepsilon > 0$. It seems that $\varepsilon$ should be assigned in such a way that the size $n_{(u)}$ of the truncated sample is sufficiently large.
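Combining the rejection scheme with the criterion $L^*(h) < \varepsilon$ gives the following Python/NumPy sketch of $G_{1(u)}$. It assumes an $n \times k$ array with one observation per row, uses hypothetical helper names, and orders the components of $L(h)$ as $L_1(h)$, $L_2(h)$, $L_3(h)$ with one $C_{12}$ and one $C_{21}$ per pair $i < j$. Because $h$ runs from $n$ downward, the first $h$ that satisfies the criterion is the maximum $u$. A single remaining observation always gives $L^*(1) = 0$, so the loop terminates for any $\varepsilon > 0$, but a very small $\varepsilon$ may leave only a handful of observations, which is why $\varepsilon$ has to be balanced against $n_{(u)}$.

```python
import numpy as np
from itertools import combinations


def moment_vector(sub):
    """L(h) for the retained rows `sub`: [L_1(h) L_2(h) L_3(h)], divisor n_(h)."""
    D = sub - sub.mean(axis=0)
    k = sub.shape[1]
    comps = [np.mean(D[:, i] ** 3) for i in range(k)]  # L_1(h)
    for i, j in combinations(range(k), 2):  # L_2(h)
        comps.append(np.mean(D[:, i] * D[:, j] ** 2))  # C_12(X_i, X_j | h)
        comps.append(np.mean(D[:, i] ** 2 * D[:, j]))  # C_21(X_i, X_j | h)
    for i, j, g in combinations(range(k), 3):  # L_3(h)
        comps.append(np.mean(D[:, i] * D[:, j] * D[:, g]))  # C_111(X_i, X_j, X_g | h)
    return np.array(comps)


def quasi_mode_g1(X, eps):
    """Return (G_1(u), n_(u)) for an (n, k) sample and threshold eps > 0."""
    remaining = list(range(X.shape[0]))
    while remaining:  # h = n, n-1, ..., 1
        sub = X[remaining]
        if np.max(np.abs(moment_vector(sub))) < eps:  # L*(h) < eps
            return sub.mean(axis=0), len(remaining)
        delta = ((sub - sub.mean(axis=0)) ** 2).sum(axis=1)
        remaining.pop(int(np.argmax(delta)))  # reject X*_(h)
    return None  # reached only if eps <= 0


X = np.array([[-1.0, 0.0], [1.0, 0.0], [0.0, -1.0], [0.0, 1.0], [10.0, 10.0]])
print(quasi_mode_g1(X, eps=1e-9))
```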
The next estimator is based on the measure of multivariate skewness defined by Mardia (1970). Let $\sigma^{ij}(A)$ be the $(i, j)$ element of the matrix $\Sigma^{-1}(A)$, where $\Sigma(A) = [\sigma_{ij}(A)]$, $\sigma_{ij}(A) = \eta_{11}(X_i, X_j | A)$, is the variance-covariance matrix of the truncated distribution. In the case of the truncated distribution the skewness coefficient can be written in the following way:

$$\beta(A) = \sum_{\{i_1, i_2, i_3\}}\ \sum_{\{j_1, j_2, j_3\}} \sigma^{i_1 j_1}(A)\, \sigma^{i_2 j_2}(A)\, \sigma^{i_3 j_3}(A)\, \eta_{111}(X_{i_1}, X_{i_2}, X_{i_3} | A)\, \eta_{111}(X_{j_1}, X_{j_2}, X_{j_3} | A),$$

where $\{i_1, i_2, i_3\}$ and $\{j_1, j_2, j_3\}$ are variations with repetitions (ordered triples, repetitions allowed), each formed from the sequence $1, 2, \dots, k$.
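The coefficient of skewness of the truncated sample $F_{(h)}$ is as follows:

$$B_{(h)} = \frac{1}{n_{(h)}^2}\sum_{s \in F_{(h)}}\sum_{t \in F_{(h)}}\left((X_s^* - \bar{X}^*_{(h)})^T S_{(h)}^{-1} (X_t^* - \bar{X}^*_{(h)})\right)^3,$$

where $S_{(h)} = [C_{11}(X_i, X_j | h)]$ and $C_{11}(X_i, X_j | h) = \frac{1}{n_{(h)}}\sum_{t \in F_{(h)}}(X_{it} - \bar{X}_{i(h)})(X_{jt} - \bar{X}_{j(h)})$.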
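The coefficient $B_{(h)}$ is Mardia's sample skewness computed on the retained observations. A NumPy sketch, assuming the retained rows are passed as an $n_{(h)} \times k$ array with $n_{(h)} > k$ so that $S_{(h)}$ is invertible (the function name is hypothetical):

```python
import numpy as np


def mardia_skewness(sub):
    """B_(h) = (1/n^2) * sum_s sum_t g_st^3, g_st = (x_s - xbar)^T S^{-1} (x_t - xbar)."""
    n = sub.shape[0]
    D = sub - sub.mean(axis=0)
    S = D.T @ D / n  # S_(h): covariance matrix with divisor n_(h)
    G = D @ np.linalg.solve(S, D.T)  # n x n matrix of g_st
    return float(np.mean(G ** 3))  # mean over n^2 entries = (1/n^2) * double sum


rng = np.random.default_rng(0)
sample = rng.exponential(size=(200, 2))
print(mardia_skewness(sample))
```

Two quick checks follow from the formula: for $k = 1$ it reduces to the squared sample skewness $m_3^2/m_2^3$ (with $m_2$, $m_3$ the central sample moments with divisor $n$), and its value does not change under nonsingular affine transformations of the data.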
The second estimator of the quasi-mode is as follows:

$$G_{2(v)} = \bar{X}^*_{(v)}: \quad v = \max\{h : h = n, \dots, 1 \text{ and } B_{(h)} < \varepsilon\}.$$
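The estimator $G_{2(v)}$ uses the same rejection loop as $G_{1(u)}$ with $B_{(h)}$ as the stopping criterion. In the NumPy sketch below (hypothetical names, one observation per row) the loop stops once only $k$ observations remain, because $S_{(h)}$ is singular for $n_{(h)} \le k$; the definition above lets $h$ run down to 1, so this restriction is an assumption of the sketch.

```python
import numpy as np


def mardia_skewness(sub):
    """Sample coefficient B_(h) for the retained rows `sub` (divisor n_(h))."""
    D = sub - sub.mean(axis=0)
    S = D.T @ D / sub.shape[0]
    G = D @ np.linalg.solve(S, D.T)
    return float(np.mean(G ** 3))


def quasi_mode_g2(X, eps):
    """Return (G_2(v), n_(v)), or None if no h > k satisfies B_(h) < eps."""
    k = X.shape[1]
    remaining = list(range(X.shape[0]))
    while len(remaining) > k:  # h = n, n-1, ..., k+1
        sub = X[remaining]
        if mardia_skewness(sub) < eps:
            return sub.mean(axis=0), len(remaining)
        delta = ((sub - sub.mean(axis=0)) ** 2).sum(axis=1)
        remaining.pop(int(np.argmax(delta)))  # reject X*_(h)
    return None


X = np.array([[-1.0, 0.0], [1.0, 0.0], [0.0, -1.0], [0.0, 1.0], [10.0, 10.0]])
print(quasi_mode_g2(X, eps=1e-9))
```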
The statistics $G_{1(u)}$ and $G_{2(v)}$ can be biased estimators of the mode $y$. Their variances can be estimated using the well-known jackknife method.
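The jackknife variance estimate can be sketched generically. The NumPy function below (hypothetical name) recomputes a vector-valued estimator with each observation left out in turn and applies the standard jackknife formula $\frac{n-1}{n}\sum_{i}(T_{(-i)} - \bar{T})(T_{(-i)} - \bar{T})^T$; any of the estimators above, for example $G_{1(u)}$ with a fixed $\varepsilon$, could be passed as `estimator`. The check uses the sample mean, for which the jackknife reproduces the usual estimate $S/n$ with the unbiased covariance matrix $S$.

```python
import numpy as np


def jackknife_covariance(X, estimator):
    """Jackknife covariance matrix of estimator(X), leaving out one row at a time."""
    n = X.shape[0]
    reps = np.array([np.atleast_1d(estimator(np.delete(X, i, axis=0))) for i in range(n)])
    centred = reps - reps.mean(axis=0)  # T_(-i) - T_bar
    return (n - 1) / n * centred.T @ centred


rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2))
V = jackknife_covariance(X, lambda s: s.mean(axis=0))
print(np.allclose(V, np.cov(X.T) / X.shape[0]))
```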
Let us assume that $\theta_1 = [\eta_3(X_1)\ \eta_3(X_2) \dots \eta_3(X_k)] > 0$ for the values $x \in B \subset R^k$ of a multidimensional random variable $X$. Let $A_\# \subset B$ be a subset such that $\theta_1(A_\#) = 0$. If $D^2(X_i) \ge D^2(X_i | A_\#)$ for each $i = 1, \dots, k$ and $D^2(X_j) > D^2(X_j | A_\#)$ for at least one index $j \in \{1, \dots, k\}$, then

$$\sum_{i=1}^{k}\left(E(X_i | A_\#) - y_i\right)^2 \le \sum_{i=1}^{k}\left(E(X_i) - y_i\right)^2.$$
This means that an appropriate truncation of the multivariate probability distribution can decrease the distance between the mode and the expected value of the truncated probability distribution. This leads to the conclusion that, in special cases, the expected values of the estimators introduced above can be closer to the mode than the vector of means from the non-truncated simple sample.
Simulation methods should be used to analyse the accuracy of the proposed estimators. Similar estimators can be constructed on the basis of other coefficients of multivariate skewness, e.g. those proposed by Wywiał (1983, 1985).
III. MODAL REGRESSION
Let us consider the following regression model:

$$Y = XB + U,$$

where $Y^T = [Y_1 \dots Y_n]$ is the vector of independent random variables. The matrix of non-random values of the explanatory variables is denoted by $X$ and is of dimensions $n \times r$. The vector of parameters $B$ has dimensions $r \times 1$. Moreover, $E(U) = 0$, the variance-covariance matrix of $U$ is diagonal, $D^2(U_i) = \sigma^2$ and $\eta_3(U_i) > 0$ for each $i = 1, \dots, n$. The modal values of the variables $Y^T = [Y_1 \dots Y_n]$ will be denoted by $y^T = [y_1 \dots y_n]$, and the modal values of the variables $U^T = [U_1 \dots U_n]$ are the same and equal to $\kappa$. The modal regression is defined in the following way:

$$y = XB + \kappa J_n,$$

where $J_n$ is the vector of ones of dimensions $n \times 1$.
Under these assumptions $E(Y) = XB$ can be estimated by means of the statistic

$$\hat{Y} = X\hat{B},$$

where $\hat{B}$ is the unbiased estimator of $B$ obtained by the well-known method of least squares:

$$\hat{B} = (X^T X)^{-1} X^T Y.$$

It is obvious that $E(\hat{Y}) = E(Y)$.
The vector of residuals is as follows:

$$\hat{U} = Y - \hat{Y} = MY,$$

where $M = I_n - X(X^T X)^{-1} X^T$ and $I_n$ is the identity matrix of degree $n$.
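The least-squares quantities can be computed directly with NumPy. This sketch (hypothetical function name and simulated data; `np.linalg.solve` is applied to the normal equations, which is adequate for a well-conditioned $X$) returns $\hat{B}$ and $M$. The simulated errors are centred exponential variables, so $E(U_i) = 0$ and $\eta_3(U_i) > 0$ as in the model assumptions, and the printed checks confirm that $MY = Y - X\hat{B}$, that $MX = 0$ and that $M$ is idempotent.

```python
import numpy as np


def least_squares(X, Y):
    """Return B_hat = (X^T X)^{-1} X^T Y and M = I_n - X (X^T X)^{-1} X^T."""
    XtX = X.T @ X
    B_hat = np.linalg.solve(XtX, X.T @ Y)
    M = np.eye(X.shape[0]) - X @ np.linalg.solve(XtX, X.T)
    return B_hat, M


rng = np.random.default_rng(2)
X = np.column_stack([np.ones(50), rng.uniform(0, 10, 50)])  # n = 50, r = 2
Y = X @ np.array([1.0, 2.0]) + rng.exponential(size=50) - 1.0  # skewed errors, E(U) = 0
B_hat, M = least_squares(X, Y)
print(B_hat)
print(np.allclose(M @ Y, Y - X @ B_hat), np.allclose(M @ X, 0.0), np.allclose(M @ M, M))
```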
The problem is to determine the vector $y$ of modal values. Firstly, as suggested by Pawłowski (1973), the mode $\kappa$ should be estimated by means of a statistic $G$ which is a function of the residuals $\hat{U}$. This leads to the following estimator of the modal regression:

$$\hat{y} = \hat{Y} + GJ_n = X\hat{B} + GJ_n.$$

In order to simplify the consideration let us assume that $\kappa > 0$. The statistic $G$ can take the form of the estimator of the mode proposed in the first section. Let $\hat{U}_{(1)} \le \hat{U}_{(2)} \le \dots \le \hat{U}_{(n)}$ be the sequence of the order statistics of the residuals. The third central moment from the right-hand truncated sample is defined by the expression:
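$$M_3(\hat{U}_{(k)}) = \frac{1}{k}\sum_{i=1}^{k}\left(\hat{U}_{(i)} - \bar{U}_{\#k}\right)^3, \qquad 1 < k \le n,$$

where

$$\bar{U}_{\#k} = \frac{1}{k}\sum_{i=1}^{k}\hat{U}_{(i)}.$$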
When we assume that $M_3(\hat{U}_{(n)}) = M_3(\hat{U}) > 0$, the sample quasi-mode $G$ is defined as follows:

$$G = \bar{U}_{\#c}, \quad \text{if } M_3(\hat{U}_{(c)}) \le 0 \text{ and } M_3(\hat{U}_{(k)}) > 0 \text{ for } k = c+1, \dots, n.$$
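A NumPy sketch of the first modal-regression estimator $\hat{y} = X\hat{B} + GJ_n$, assuming $G$ is the quasi-mode of the least-squares residuals just defined (function names are hypothetical). It raises an error when $M_3(\hat{U}) \le 0$ or when no admissible $c$ exists, rather than guessing a value.

```python
import numpy as np


def m3(values):
    """Third central moment with divisor equal to the number of values."""
    d = values - values.mean()
    return float(np.mean(d ** 3))


def residual_quasi_mode(residuals):
    """G = mean of the c smallest residuals, c the largest with M_3(U_(c)) <= 0."""
    u = np.sort(residuals)
    if m3(u) <= 0:
        raise ValueError("the definition assumes M_3(U_hat) > 0")
    for c in range(len(u) - 1, 1, -1):
        if m3(u[:c]) <= 0:
            return float(u[:c].mean()), c
    raise ValueError("no admissible truncation level c")


def modal_regression(X, Y):
    """Return (y_hat, B_hat, G, c) with y_hat = X B_hat + G J_n."""
    B_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    G, c = residual_quasi_mode(Y - X @ B_hat)
    return X @ B_hat + G, B_hat, G, c


X = np.ones((4, 1))  # intercept-only model, r = 1
Y = np.array([0.0, 1.0, 2.0, 10.0])
y_hat, B_hat, G, c = modal_regression(X, Y)
print(y_hat, G, c)
```

In this toy intercept-only model the residuals are simply $Y - \bar{Y}$, so $\hat{y}$ coincides with the one-dimensional quasi-mode of $Y$ itself.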
The statistic $\hat{y}$ is a biased estimator of $y$ because $G$ is usually a biased estimator of $\kappa$, as discussed in Section I.
The idea of the construction of the next estimator is based on the trimmed least-squares method proposed by Ruppert and Carroll (1980). Let $Y_{(c)}$ consist of those variables of the vector $Y$ whose indices are equal to the indices of the residuals in the set $\{\hat{U}_i : \hat{U}_i \le \hat{U}_{(c)}\}$, where the evaluation of $\hat{U}_{(c)}$ was shown above. In the same way the submatrix $X_{(c)}$ of the observation matrix $X$ is determined. Next, we again estimate the parameters of the linear regression:

$$\hat{B}_{(c)} = (X_{(c)}^T X_{(c)})^{-1} X_{(c)}^T Y_{(c)}.$$

If $c > r$, the next estimator of the modal regression is as follows:

$$\hat{y}_{(c)} = X\hat{B}_{(c)}.$$

In general the statistic $\hat{y}_{(c)}$ is not an unbiased estimator of the modal regression.
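The trimmed re-estimation step can be sketched as follows (NumPy; hypothetical names). The function takes the truncation level $c$ as given, for example the $c$ produced by the residual quasi-mode procedure, keeps the $c$ observations with the smallest least-squares residuals, refits $\hat{B}_{(c)}$ on them and returns $X\hat{B}_{(c)}$; ties among residuals are broken by observation order, which the paper does not specify.

```python
import numpy as np


def trimmed_modal_regression(X, Y, c):
    """Return (y_hat_c, B_hat_c): least squares refitted on the c smallest residuals."""
    r = X.shape[1]
    if c <= r:
        raise ValueError("the estimator requires c > r")
    B_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    keep = np.argsort(Y - X @ B_hat, kind="stable")[:c]  # indices of U_(1), ..., U_(c)
    Xc, Yc = X[keep], Y[keep]
    B_hat_c = np.linalg.solve(Xc.T @ Xc, Xc.T @ Yc)
    return X @ B_hat_c, B_hat_c


t = np.arange(6.0)
X = np.column_stack([np.ones(6), t])
Y = 2.0 + 3.0 * t
Y[5] += 33.0  # one large positive disturbance
y_hat_c, B_hat_c = trimmed_modal_regression(X, Y, c=5)
print(B_hat_c)
```

Here the disturbed observation has the largest residual, so it is the one dropped and the refit recovers the line through the remaining points.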
Let us note that the estimators $\hat{y}$ and $\hat{y}_{(c)}$ can easily be adapted to evaluate conditional predictions of the variable under study for some fixed values of the auxiliary variables. Let $x$ be a row vector (of dimensions $1 \times r$) of values of the auxiliary variables. The two predictors of the value $y(x)$ of the modal regression for a given vector $x$ are as follows:

$$\hat{y}(x) = x\hat{B} + G \quad \text{or} \quad \hat{y}_{(c)}(x) = x\hat{B}_{(c)}.$$
The mean square errors of the estimators or predictors of the modal regression values and of the parameters can be evaluated by means of the jackknife method. The accuracy of both estimators or predictors of the modal regression can be analysed on the basis of appropriate simulation studies.
REFERENCES
Johnson N.L., Rogers C.A. (1951), The moment problem for unimodal distributions, Annals of Mathematical Statistics, 22, 433-439.

Mardia K.V. (1970), Measures of multivariate skewness and kurtosis with applications, Biometrika, 57, 3, 519-530.

Pawłowski Z. (1973), Prognozy ekonometryczne (Econometric forecasts; in Polish), PWN, Warszawa.

Ruppert D., Carroll R.J. (1980), Trimmed least squares estimation in the linear model, Journal of the American Statistical Association, 75, 828-838.

Wywiał J. (1983), Normalized coefficients of deviation from multinormal distribution (in Polish), Przegląd Statystyczny, 30, 77-83.

Wywiał J. (1985), Test normalności dla wielowymiarowej zmiennej losowej dla dużych prób (A test for normality of a multidimensional random variable in the large sample case; in Polish), Przegląd Statystyczny, 32, 355-364.

Wywiał J. (2000a), Estimation of distribution function mode on the basis of sample moment or sample median, Badania Operacyjne i Decyzje, no 2, 89-98.

Wywiał J. (2000b), Estimation of mode on the basis of a truncated sample, Acta Universitatis Lodziensis. Folia Oeconomica, 152, 73-81.
Janusz Wywiał
ON ESTIMATION OF THE MODE OF A MULTIDIMENSIONAL RANDOM VARIABLE

Summary

The paper deals with the problem of estimating the mode of the probability distribution of a multidimensional random variable. The problem of estimating the mode of a continuous random variable was considered. Unimodal probability distributions were analysed.

This paper mainly analyses the class of probability distributions of random variables characterized by the fact that certain inequalities hold between the expected values and the mode of the distribution, as a function of its third central moments. For this class of distributions two estimators of the mode are introduced, whose computation in practice is iterative. Roughly speaking, computing the value of the mode estimator involves successively truncating the sample observations until certain conditions are satisfied, in particular until a certain function of the third-order mixed central moments of the truncated distribution reaches zero. The estimate of the point that is the mode of the multivariate distribution is then given by the means of the truncated marginal sample distributions. The proposed estimators may be biased. In some cases it was possible to estimate the maximum level of such bias. Two estimators of the modal regression are also proposed.