
On Estimation of Dominant of Multidimensional Random Variable


ACTA UNIVERSITATIS LODZIENSIS
FOLIA OECONOMICA 194, 2005

Janusz Wywiał

ON ESTIMATION OF DOMINANT OF MULTIDIMENSIONAL RANDOM VARIABLE

Abstract

The problem of estimating the mode of a continuous distribution function of a multidimensional random variable is considered. The estimator of the mode is the vector of means from an appropriately truncated sample. The truncated sample is obtained by rejecting observations in such a way that the measure of skewness of the multidimensional variable takes a value as close to zero as possible. We can expect that through successive truncation of the sample the vector of sample means approaches the vector of modes of the multidimensional variable. An estimator constructed in this way is usually a biased estimator of the mode. Moreover, biased estimators of the values of modal regressions are proposed. The well-known jackknife procedure is proposed to evaluate the mean square errors of the estimators.

Key words: moments of truncated distribution, estimation of distribution mode.

I. ESTIMATION OF MODE IN ONE-DIMENSIONAL CASE

Let $X_{(1)} \le X_{(2)} \le \dots \le X_{(n)}$ be the sequence of order statistics from a simple sample drawn from a one-dimensional distribution. The third central moment of the right-hand truncated simple sample is defined by the expression:

$$M_3(X_{(k)}) = \frac{1}{k}\sum_{i=1}^{k}\left(X_{(i)} - \bar{X}_k\right)^3, \qquad 1 < k \le n,$$

where $\bar{X}_k = \frac{1}{k}\sum_{i=1}^{k} X_{(i)}$. In particular, $\bar{X}_n = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$.


Let us assume that $M_3(X_{(n)}) = M_3 > 0$. The sample quasi-mode $G_e$ is defined as follows:

$$G_e = \bar{X}_e, \quad \text{if } M_3(X_{(e)}) \le 0 \text{ and } M_3(X_{(k)}) > 0 \text{ for } k = e+1, \dots, n.$$

The statistic $G_e$ will usually be a biased estimator of the mode $\gamma$.

Theorem 1 (Johnson and Rogers, 1951). Let the density function of a random variable $X$ be unimodal, concave to the left of the dominant $\gamma$ and convex to the right of the dominant $\gamma$. Moreover, let $D^2(X) > 0$. Such a one-dimensional probability distribution exists if and only if $(E(X) - \gamma)^2 \le 3D^2(X)$.

Hence, the well-known Pearson coefficient of skewness is bounded. Let us assume that the probability distribution of $X$ is right-side truncated at the point $a$, and let $E(X \mid a) = \gamma(a)$, $D^2(X \mid a)$ and $\eta_3(X \mid a)$ be the expected value, the variance and the third central moment of the truncated distribution, respectively. Theorem 1 leads to the following. If $\eta_3(X \mid a) \to 0$ and $D^2(X \mid a)$ does not increase, then $(E(X \mid a) - \gamma)^2$ does not increase. Hence, appropriate truncation of the probability distribution can decrease the distance between the mode and the expected value of the truncated probability distribution. This leads to the conclusion that the estimator $G_e$ is closer to the mode than the common mean $\bar{X}$ from the non-truncated simple sample. That is why the mean from the truncated sample can be used to estimate the mode.
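The one-dimensional rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `third_central_moment` and `quasi_mode` are hypothetical, and the fallback used when no truncation point with a non-positive third moment exists is an assumption that is not part of the text.

```python
def third_central_moment(values):
    """Return (1/k) * sum((x - mean)^3) for a non-empty list of numbers."""
    k = len(values)
    mean = sum(values) / k
    return sum((x - mean) ** 3 for x in values) / k


def quasi_mode(sample):
    """Return (mean of the e smallest observations, e).

    e is the largest k (scanning k = n, n-1, ..., 2) for which the third
    central moment of the k smallest observations is <= 0.
    """
    ordered = sorted(sample)
    for k in range(len(ordered), 1, -1):
        head = ordered[:k]  # right-hand truncated sample X_(1), ..., X_(k)
        if third_central_moment(head) <= 0:
            return sum(head) / k, k
    # Assumed fallback (not in the text): no k >= 2 met the condition.
    return ordered[0], 1


if __name__ == "__main__":
    # One large observation makes the full sample right-skewed.
    print(quasi_mode([1, 2, 3, 4, 5, 20]))
```

With floating-point data the condition `<= 0` can be sensitive to rounding when the truncated sample is almost exactly symmetric, so a small tolerance may be preferable in practice.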

II. ESTIMATION OF MODE IN MULTIDIMENSIONAL CASE

Let $f(x_1, \dots, x_k) = f(x)$, where $x = [x_1 \dots x_k]$, be a density function of a $k$-dimensional random variable defined in $R^k$. The vectors of expected values and variances are denoted by $\mu = [\mu_1 \dots \mu_k]$ and $\sigma^2 = [\sigma_1^2 \dots \sigma_k^2]$. The mode of the $k$-dimensional random variable is denoted by $\gamma = [\gamma_1 \dots \gamma_k]$ and $f(\gamma) = \text{maximum}$. The central moments of order 3 of the $k$-dimensional variable are denoted by:

$$\eta_{uvz}(X_i, X_j, X_g) = \int_{-\infty}^{\infty} \dots \int_{-\infty}^{\infty} (x_i - \mu_i)^u (x_j - \mu_j)^v (x_g - \mu_g)^z f(x_1, \dots, x_k)\, dx_1 \dots dx_k,$$

$$\eta_u(X_i) = \eta_{u00}(X_i, X_j, X_g).$$

Let us consider the truncated multidimensional variable with the following density function:

$$f(x_1, \dots, x_k \mid A) = \frac{f(x_1, \dots, x_k)}{P([X_1 \dots X_k] \in A)}, \qquad [x_1 \dots x_k] \in A,$$

where $A$ is a convex region and $A \subseteq R^k$. Hence, $f(x_1, \dots, x_k \mid A)$ is the density function of the truncated distribution. The vectors of expected values and variances of the truncated distribution are as follows: $\mu(A) = [\mu_1(A) \dots \mu_k(A)]$, $\sigma^2(A) = [\sigma_1^2(A) \dots \sigma_k^2(A)]$, where:

$$\mu_i(A) = \int_{-\infty}^{\infty} \dots \int_{-\infty}^{\infty} x_i f(x_1, \dots, x_k \mid A)\, dx_1 \dots dx_k,$$

$$\sigma_i^2(A) = \int_{-\infty}^{\infty} \dots \int_{-\infty}^{\infty} (x_i - \mu_i(A))^2 f(x_1, \dots, x_k \mid A)\, dx_1 \dots dx_k,$$

$$\eta_{uvz}(X_i, X_j, X_g \mid A) = \int_{-\infty}^{\infty} \dots \int_{-\infty}^{\infty} (x_i - \mu_i(A))^u (x_j - \mu_j(A))^v (x_g - \mu_g(A))^z f(x_1, \dots, x_k \mid A)\, dx_1 \dots dx_k.$$

Let us introduce the following notation:

$\theta = [\theta_1\ \theta_2\ \theta_3]$ is of dimensions $1 \times (k^2 + k(k-1)(k-2)/6)$,

where

$\theta_1 = [\eta_3(X_1) \dots \eta_3(X_k)]$ is of dimensions $1 \times k$,

$\theta_2 = [\eta_{12}(X_1, X_2)\ \eta_{21}(X_1, X_2) \dots \eta_{21}(X_{k-1}, X_k)\ \eta_{21}(X_k, X_{k-1})]$ is of dimensions $1 \times k(k-1)$,

$\theta_3 = [\eta_{111}(X_1, X_2, X_3)\ \eta_{111}(X_1, X_2, X_4) \dots \eta_{111}(X_{k-3}, X_{k-1}, X_k)\ \eta_{111}(X_{k-2}, X_{k-1}, X_k)]$ is of dimensions $1 \times \frac{1}{6}k(k-1)(k-2)$.

The vector $\theta$ can be estimated by the vector $L = [L_1\ L_2\ L_3]$, where:

$L_1 = [C_{30}(X_1, X_2)\ C_{03}(X_2, X_1) \dots C_{03}(X_{k-1}, X_k)]$,

$L_2 = [C_{12}(X_1, X_2)\ C_{21}(X_1, X_2) \dots C_{21}(X_{k-1}, X_k)\ C_{21}(X_k, X_{k-1})]$,

$L_3 = [C_{111}(X_1, X_2, X_3)\ C_{111}(X_1, X_2, X_4) \dots C_{111}(X_{k-3}, X_{k-1}, X_k)\ C_{111}(X_{k-2}, X_{k-1}, X_k)]$.

Similarly, we define the vector of moments of the truncated distribution:

$\theta(A) = [\theta_1(A)\ \theta_2(A)\ \theta_3(A)]$,

where $\theta_1(A) = [\eta_3(X_i \mid A)]$, $\theta_2(A) = [\eta_{12}(X_i, X_j \mid A)]$ and $\theta_3(A) = [\eta_{111}(X_i, X_j, X_g \mid A)]$.

It is well known that if a distribution function is symmetric, then all central moments of order 3 of the marginal one- or two-dimensional distributions are equal to zero and $\theta = 0$. In the case when $\theta \ne 0$ we can look for a set $A$ such that $A \subseteq R^k$ and $\theta(A) = 0$. The vector of mean values $\mu(A) = [\mu_1(A) \dots \mu_k(A)]$ of the truncated distribution is defined as the $A$-mode of this distribution and is denoted by $\gamma(A)$. Obviously, the $A$-mode is not necessarily equal to the mode of the entire distribution. The conditions for such equality can be stated quite simply only in the case of a one-dimensional distribution, see Wywiał (2000a, 2000b). That is why the parameter $\gamma(A)$ can be called a quasi-mode, as in the one-dimensional case.

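The simple sample of size $n$ is denoted by $X = [X_{*t}]$, where $X_{*t} = [X_{1t} \dots X_{kt}]^T$, $t = 1, \dots, n$. The sample moments are as follows:

$$C_{uvz}(X_i, X_j, X_g) = \frac{1}{n}\sum_{t=1}^{n} (X_{it} - \bar{X}_i)^u (X_{jt} - \bar{X}_j)^v (X_{gt} - \bar{X}_g)^z, \qquad \bar{X}_i = \frac{1}{n}\sum_{t=1}^{n} X_{it}.$$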

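Let $X_{*(h)}$ ($h = n, \dots, 1$) be the observation vector such that

$$d(X_{*(h)}, \bar{X}_{*(h)}) = \max_{t \in F_{(h)}} \{\delta_{t(h)}\},$$

where:

$$F_{(h)} = \{t : t = 1, \dots, n \text{ and } X_{*t} \ne X_{*(p)} \text{ for } p > h\},$$

$$\delta_{t(h)} = (X_{*t} - \bar{X}_{*(h)})^T (X_{*t} - \bar{X}_{*(h)}),$$

$$\bar{X}_{*(h)} = \frac{1}{n_{(h)}} \sum_{t \in F_{(h)}} X_{*t} = [\bar{X}_{1(h)} \dots \bar{X}_{k(h)}], \qquad n_{(h)} = \mathrm{Card}\{F_{(h)}\},$$

$$\bar{X}_{*(n)} = \bar{X} = \frac{1}{n}\sum_{t=1}^{n} X_{*t}.$$

A truncated sample is identified by the set $F_{(h)}$.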

Let

$$C_{uvz}(X_i, X_j, X_g \mid h) = \frac{1}{n_{(h)}} \sum_{t \in F_{(h)}} (X_{it} - \bar{X}_{i(h)})^u (X_{jt} - \bar{X}_{j(h)})^v (X_{gt} - \bar{X}_{g(h)})^z,$$

$$L(h) = [L_1(h)\ L_2(h)\ L_3(h)] = [L_{1(h)}\ L_{2(h)} \dots L_{w(h)}],$$

where $w = k^2 + k(k-1)(k-2)/6$ and

$L_1(h) = [C_{30}(X_1, X_2 \mid h)\ C_{03}(X_2, X_1 \mid h) \dots C_{03}(X_{k-1}, X_k \mid h)]$,

$L_2(h) = [C_{12}(X_1, X_2 \mid h)\ C_{21}(X_1, X_2 \mid h) \dots C_{21}(X_{k-1}, X_k \mid h)\ C_{21}(X_k, X_{k-1} \mid h)]$,

$L_3(h) = [C_{111}(X_1, X_2, X_3 \mid h)\ C_{111}(X_1, X_2, X_4 \mid h) \dots C_{111}(X_{k-3}, X_{k-1}, X_k \mid h)\ C_{111}(X_{k-2}, X_{k-1}, X_k \mid h)]$.

The vector parameter $\gamma$ can be estimated by means of the statistic $G_{1(u)}$ determined by the following procedure.

Let

$$L(h) = [L_{1(h)}\ L_{2(h)} \dots L_{w(h)}], \qquad L_{*(h)} = \max_{l = 1, \dots, w} \{L_{l(h)}^2\},$$

$$G_{1(u)} = \bar{X}_{*(u)}: \quad u = \max\{h : h = n, \dots, 1 \text{ and } L_{*(h)} < \varepsilon\},$$

where $\varepsilon > 0$. It seems that $\varepsilon$ should be assigned in such a way that the size $n_{(u)}$ of the truncated sample is sufficiently large.
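The procedure for the first estimator can be sketched as follows. This is a minimal illustration, not the author's implementation: it assumes that observations are removed one at a time, always the one with the largest squared Euclidean distance from the current mean vector, and that the criterion is the largest squared third-order sample moment. The names `third_moments` and `g1_estimator`, and the stopping parameter `min_size`, are hypothetical.

```python
from itertools import combinations, permutations


def mean_vector(rows):
    """Column means of a list of equal-length observation tuples."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def third_moments(rows):
    """All third-order central sample moments: pure, (1,2)-mixed, (1,1,1)-mixed."""
    n, k = len(rows), len(rows[0])
    m = mean_vector(rows)
    dev = [[x - mi for x, mi in zip(row, m)] for row in rows]

    def c(idx):  # idx names the three factors, e.g. (i, j, j) gives C_12(X_i, X_j)
        return sum(d[idx[0]] * d[idx[1]] * d[idx[2]] for d in dev) / n

    pure = [c((i, i, i)) for i in range(k)]
    mixed12 = [c((i, j, j)) for i, j in permutations(range(k), 2)]
    mixed111 = [c(t) for t in combinations(range(k), 3)]
    return pure + mixed12 + mixed111  # length k^2 + k(k-1)(k-2)/6


def g1_estimator(rows, eps, min_size=3):
    """Return (mean vector, size) of the first truncated sample with L* < eps.

    min_size is an assumed safeguard (not in the text) that stops the peeling
    when too few observations remain.
    """
    active = list(rows)
    while True:
        m = mean_vector(active)
        l_star = max(v * v for v in third_moments(active))
        if l_star < eps or len(active) <= min_size:
            return m, len(active)
        # Remove the observation farthest from the current mean vector.
        far = max(range(len(active)),
                  key=lambda t: sum((x - mi) ** 2 for x, mi in zip(active[t], m)))
        active.pop(far)


if __name__ == "__main__":
    data = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1), (10, 10)]
    print(g1_estimator(data, eps=1e-9))
```

Choosing `eps` too small can peel the sample down to `min_size`, which mirrors the remark that the threshold should leave a sufficiently large truncated sample.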

The next estimator is based on the measure of multivariate skewness defined by Mardia (1970). Let $\sigma^{ij}(A)$ be the $(i, j)$ element of the matrix $\Sigma^{-1}(A)$, where $\Sigma(A) = [\sigma_{ij}(A)]$, $\sigma_{ij}(A) = \eta_{11}(X_i, X_j \mid A)$, is the variance-covariance matrix of the truncated distribution. In the case of the truncated distribution the skewness coefficient can be written in the following way:

$$\beta(A) = \sum_{\{i_1, i_2, i_3\}} \sum_{\{j_1, j_2, j_3\}} \sigma^{i_1 j_1}(A)\, \sigma^{i_2 j_2}(A)\, \sigma^{i_3 j_3}(A)\, \eta_{111}(X_{i_1}, X_{i_2}, X_{i_3} \mid A)\, \eta_{111}(X_{j_1}, X_{j_2}, X_{j_3} \mid A),$$

where $\{i_1, i_2, i_3\}$ and $\{j_1, j_2, j_3\}$ are variations with repetitions. Each variation is determined on the basis of the sequence $1, 2, \dots, k$.

The coefficient of skewness from the truncated sample $F_{(h)}$ is as follows:

$$B_{(h)} = \frac{1}{n_{(h)}^2} \sum_{t, s \in F_{(h)}} \left( (X_{*t} - \bar{X}_{*(h)})^T S_{(h)}^{-1} (X_{*s} - \bar{X}_{*(h)}) \right)^3,$$

where:

$$S_{(h)} = [C_{11(h)}(X_i, X_j)], \qquad C_{11(h)}(X_i, X_j) = \frac{1}{n_{(h)}} \sum_{t \in F_{(h)}} (X_{it} - \bar{X}_{i(h)})(X_{jt} - \bar{X}_{j(h)}).$$

The second estimator of the quasi-mode is as follows:

$$G_{2(v)} = \bar{X}_{*(v)}: \quad v = \max\{h : h = n, \dots, 1 \text{ and } B_{(h)} < \varepsilon\}.$$

The statistics $G_{1(u)}$ and $G_{2(v)}$ can be biased estimators of the mode $\gamma$. Their variances can be estimated using the well-known jackknife method.
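A sketch of the second estimator, using Mardia's sample skewness as the stopping criterion, might look as follows. It relies on the same assumed peeling rule as before (drop the observation farthest from the current mean in squared Euclidean distance). The helper `invert`, the function names, and the default `min_size = k + 2` are illustrative assumptions; the small Gauss-Jordan inverse is there only to keep the example free of external libraries.

```python
def invert(a):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    k = len(a)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(k)]
           for i, row in enumerate(a)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("singular covariance matrix")
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def mardia_skewness(rows):
    """B = (1/n^2) * sum over t, s of ((x_t - mean)' S^-1 (x_s - mean))^3."""
    n, k = len(rows), len(rows[0])
    m = [sum(col) / n for col in zip(*rows)]
    dev = [[x - mi for x, mi in zip(row, m)] for row in rows]
    s = [[sum(d[i] * d[j] for d in dev) / n for j in range(k)] for i in range(k)]
    s_inv = invert(s)
    w = [[sum(s_inv[i][j] * d[j] for j in range(k)) for i in range(k)] for d in dev]
    return sum(sum(a * b for a, b in zip(dt, ws)) ** 3 for dt in dev for ws in w) / n ** 2


def g2_estimator(rows, eps, min_size=None):
    """Return (mean vector, size) of the first truncated sample with B < eps."""
    active = list(rows)
    if min_size is None:
        min_size = len(active[0]) + 2  # assumed safeguard, not in the text
    while True:
        m = [sum(col) / len(active) for col in zip(*active)]
        if len(active) <= min_size or mardia_skewness(active) < eps:
            return m, len(active)
        # Remove the observation farthest from the current mean vector.
        far = max(range(len(active)),
                  key=lambda t: sum((x - mi) ** 2 for x, mi in zip(active[t], m)))
        active.pop(far)


if __name__ == "__main__":
    data = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1), (10, 10)]
    print(g2_estimator(data, eps=1e-9))
```

For a one-dimensional sample the coefficient reduces to the squared third central moment divided by the cubed variance, which gives a quick sanity check of the implementation.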

Let us assume that $\theta_1 = [\eta_3(X_1)\ \eta_3(X_2) \dots \eta_3(X_k)] > 0$ for the values $x \in B \subset R^k$ of a multidimensional random variable $X$. Let $A_\# \subset B$ be a subset such that $\theta_1(A_\#) = 0$. If $D^2(X_i) \ge D^2(X_i \mid A_\#)$ for each $i = 1, \dots, k$ and $D^2(X_j) > D^2(X_j \mid A_\#)$ for at least one index $j$, where $j = 1, \dots, k$, then

$$\sum_{i=1}^{k} (E(X_i \mid A_\#) - \gamma_i)^2 \le \sum_{i=1}^{k} (E(X_i) - \gamma_i)^2.$$

This means that appropriate truncation of the multivariate probability distribution can decrease the distance between the mode and the expected value of the truncated probability distribution. This leads to the conclusion that in special cases the expected values of the estimators introduced above can be closer to the mode than the vector of means from the non-truncated simple sample.

Simulation methods should be used to analyse the accuracy of the proposed estimators. Similar estimators can be constructed on the basis of other coefficients of multivariate skewness, e.g. those proposed by Wywiał (1983, 1985).

III. MODAL REGRESSION

Let us consider the following regression model:

$$Y = XB + U,$$

where $Y^T = [Y_1 \dots Y_n]$ is the vector of independent random variables. The matrix of non-random values of explanatory variables is denoted by $X$ and is of dimension $n \times r$. The vector of parameters $B$ has dimensions $r \times 1$.

Moreover, $E(U) = 0$, the variance-covariance matrix of $U$ is diagonal, and $D^2(U_i) = \sigma^2$ and $\eta_3(U_i) > 0$ for each $i = 1, \dots, n$. The modal values of the variables $Y^T = [Y_1 \dots Y_n]$ will be denoted by $\gamma^T = [\gamma_1 \dots \gamma_n]$, and the modal values of the variables $U^T = [U_1 \dots U_n]$ are the same and equal to $\kappa$. The modal regression is defined in the following way:

$$\gamma = XB + \kappa J_n,$$

where $J_n$ is the vector of ones of dimensions $n \times 1$.

Under these assumptions $E(Y) = XB$ can be estimated by means of the statistic:

$$\hat{Y} = X\hat{B},$$

where $\hat{B}$ is the unbiased estimator of $B$ obtained by the well-known method of least squares:

$$\hat{B} = (X^T X)^{-1} X^T Y.$$

It is obvious that $E(\hat{Y}) = E(Y)$.

The vector of residuals is as follows:

$$\hat{U} = Y - \hat{Y} = MY,$$

where $M = I_n - X(X^T X)^{-1} X^T$ and $I_n$ is the unit matrix of degree $n$.

The problem is to determine the vector $\gamma$ of modal values. Firstly, as suggested by Pawłowski (1973), the mode $\kappa$ should be estimated by means of a statistic $G$ which is a function of the residuals $\hat{U}$. This leads to the following estimator of the regression:

$$\tilde{Y} = \hat{Y} + G J_n = X\hat{B} + G J_n,$$

where $J_n$ is the vector of ones of dimensions $n \times 1$. In order to simplify the considerations let us assume that $\kappa > 0$. The statistic $G$ can take the form of the estimator of the mode proposed in the first section. Let $\hat{U}_{(1)} \le \hat{U}_{(2)} \le \dots \le \hat{U}_{(n)}$ be the sequence of the order statistics of the residuals. The third central moment of the right-hand truncated sample is defined by the expression:

$$M_3(\hat{U}_{(k)}) = \frac{1}{k}\sum_{i=1}^{k} \left(\hat{U}_{(i)} - \bar{U}_k\right)^3, \qquad 1 < k \le n,$$

where:

$$\bar{U}_k = \frac{1}{k}\sum_{i=1}^{k} \hat{U}_{(i)}.$$

When we assume that $M_3(\hat{U}_{(n)}) = M_3(\hat{U}) > 0$, the sample quasi-mode $G$ is defined as follows:

$$G = \bar{U}_e, \quad \text{if } M_3(\hat{U}_{(e)}) \le 0 \text{ and } M_3(\hat{U}_{(k)}) > 0 \text{ for } k = e+1, \dots, n.$$

The statistic $\tilde{Y}$ is a biased estimator of $\gamma$ because $G$ is usually a biased estimator of $\kappa$, as discussed in the first section.

The construction of the next estimator is based on the truncated least-squares method proposed by Ruppert and Carroll (1980). Let $Y_{(e)}$ consist of those elements of the vector $Y$ whose indexes are equal to the indexes of the residuals in the set $\{\hat{U}_i : \hat{U}_i \le \hat{U}_{(e)}\}$, where the evaluation of $\hat{U}_{(e)}$ was shown above. In the same way the submatrix $X_{(e)}$ of the observation matrix $X$ is determined. Next, we again evaluate the parameters of the linear regression:

$$\hat{B}_{(e)} = (X_{(e)}^T X_{(e)})^{-1} X_{(e)}^T Y_{(e)}.$$

If $e > r$, the next estimator of the modal regression is as follows:

$$\hat{Y}_{(e)} = X_{(e)} \hat{B}_{(e)}.$$

In the general case the statistic $\hat{Y}_{(e)}$ is not an unbiased estimator of the modal regression.

Let us note that the estimators $\tilde{Y}$ and $\hat{Y}_{(e)}$ can easily be adapted to evaluate a conditional prognosis of the variable under study for some fixed values of auxiliary variables. Let $x$ be a row vector (of dimensions $1 \times r$) of values of auxiliary variables. The two predictors of the value $\gamma(x)$ of the modal regression for a given vector $x$ are as follows:

$$\tilde{Y}(x) = x\hat{B} + G \quad \text{or} \quad \hat{Y}_{(e)}(x) = x\hat{B}_{(e)}.$$
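Both predictors can be illustrated with a short standard-library sketch. It is an assumption-laden example rather than a reference implementation: the least-squares fit solves the normal equations by Gaussian elimination, the truncated sample keeps the $e$ smallest residuals, and all function names (`solve`, `ols`, `truncation_point`, `modal_regression`, `predict`) are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("singular normal equations")
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x


def ols(xmat, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    r = len(xmat[0])
    xtx = [[sum(row[i] * row[j] for row in xmat) for j in range(r)] for i in range(r)]
    xty = [sum(row[i] * yi for row, yi in zip(xmat, y)) for i in range(r)]
    return solve(xtx, xty)


def truncation_point(residuals):
    """Return (e, G): largest e whose e smallest residuals have M3 <= 0, and their mean."""
    ordered = sorted(residuals)
    for k in range(len(ordered), 1, -1):
        head = ordered[:k]
        mean = sum(head) / k
        if sum((u - mean) ** 3 for u in head) / k <= 0:
            return k, mean
    return 1, ordered[0]  # assumed fallback, not in the text


def modal_regression(xmat, y):
    """Return (B_hat, G, B_hat_e, e); B_hat_e is None unless e > r."""
    b = ols(xmat, y)
    resid = [yi - sum(c * v for c, v in zip(b, row)) for row, yi in zip(xmat, y)]
    e, g = truncation_point(resid)
    keep = sorted(range(len(y)), key=lambda i: resid[i])[:e]  # e smallest residuals
    b_e = ols([xmat[i] for i in keep], [y[i] for i in keep]) if e > len(b) else None
    return b, g, b_e, e


def predict(x, coef, shift=0.0):
    """x * coef + shift: use shift=G with B_hat, or shift=0 with B_hat_e."""
    return sum(c * v for c, v in zip(coef, x)) + shift


if __name__ == "__main__":
    xmat = [[1.0]] * 5  # intercept-only design, purely for illustration
    y = [1.0, 4.0, 5.0, 6.0, 30.0]
    b, g, b_e, e = modal_regression(xmat, y)
    print(e, predict([1.0], b, g), predict([1.0], b_e))
```

In the intercept-only illustration the two predictors coincide, because shifting the overall mean by the mean of the retained residuals equals the mean of the retained observations; with genuine explanatory variables they generally differ.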

The mean square error of the estimators or predictors of the modal regression values and of the parameters can be evaluated on the basis of the jackknife method. The accuracy of both estimators or predictors of the modal regression can be analysed on the basis of appropriate simulation studies.
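The text does not spell out which jackknife variant is intended. As a hedged illustration, the sketch below uses the standard delete-one jackknife for a scalar estimator and combines the variance and bias estimates into a rough mean square error figure. The function name `jackknife` and the returned dictionary keys are assumptions made for this example.

```python
def jackknife(sample, estimator):
    """Delete-one jackknife for a scalar estimator.

    Returns the full-sample estimate together with the jackknife variance,
    the jackknife bias, and variance + bias^2 as a rough MSE estimate.
    """
    data = list(sample)
    n = len(data)
    full = estimator(data)
    # Recompute the estimator with each observation left out in turn.
    leave_one_out = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    loo_mean = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((v - loo_mean) ** 2 for v in leave_one_out)
    bias = (n - 1) * (loo_mean - full)
    return {"estimate": full, "variance": variance, "bias": bias,
            "mse": variance + bias ** 2}


if __name__ == "__main__":
    def mean(values):
        return sum(values) / len(values)

    # Any scalar estimator (for example a quasi-mode function) can be passed.
    print(jackknife([1, 2, 3, 4], mean))
```

For a vector-valued estimator such as a multidimensional quasi-mode, the same scheme can be applied to each coordinate separately.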

REFERENCES

Johnson N.L., Rogers C.A. (1951), The moment problem for unimodal distributions, Annals of Mathematical Statistics, 22, 433-439.

Mardia K.V. (1970), Measures of multivariate skewness and kurtosis with applications, Biometrika, 57, 3, 519-530.

Pawłowski Z. (1973), Prognozy ekonometryczne, PWN, Warszawa.

Ruppert D., Carroll R.J. (1980), Trimmed least squares estimation in the linear model, Journal of the American Statistical Association, 75, 828-838.

Wywiał J. (1983), Normalized coefficients of deviation from multinormal distribution (in Polish), Przegląd Statystyczny, 30, 77-83.

Wywiał J. (1985), Test normalności dla wielowymiarowej zmiennej losowej dla dużych prób (A test for normality of a multidimensional random variable in the large sample case; in Polish), Przegląd Statystyczny, 32, 355-364.

Wywiał J. (2000a), Estimation of distribution function mode on the basis of sample moment or sample median, Badania Operacyjne i Decyzje, no 2, 89-98.

Wywiał J. (2000b), Estimation of mode on the basis of a truncated sample, Acta Universitatis Lodziensis. Folia Oeconomica, 152, 73-81.

Janusz Wywiał

ON ESTIMATION OF THE DOMINANT OF A MULTIDIMENSIONAL RANDOM VARIABLE

Summary

The paper deals with the problem of estimating the mode of the probability distribution of a multidimensional random variable. The problem of estimating the mode of a continuous random variable is considered. Unimodal probability distributions are analysed.

The paper mainly analyses a class of probability distributions of random variables for which certain inequalities hold between the expected values, the mode of the distribution and a function of its third central moments. For this class of distributions two estimators of the mode are introduced, whose evaluation in practice is iterative. Roughly speaking, computing the value of a mode estimator involves successively truncating sample observations until certain conditions are met, in particular until a certain function of the third mixed central moments of the truncated distribution reaches zero. The estimate of the point that is the mode of the multidimensional distribution is then given by the means of the truncated marginal distributions from the sample. The proposed estimators can be biased. In some cases it was possible to assess the maximal level of this bias. Two estimators of modal regression are also proposed.
