
Estimation of Mode on the Basis of a Truncated Sample




ACTA UNIVERSITATIS LODZIENSIS FOLIA OECONOMICA 152, 2000

II. STATISTICAL INFERENCE

Janusz Wywiał*

ESTIMATION OF MODE ON THE BASIS OF A TRUNCATED SAMPLE

Abstract. The problem of estimating the mode of a continuous distribution function is considered. Estimation of the mode based on estimators of the density function is well known, see e.g. Härdle (1991) and Parzen (1962). New parameters of a continuous distribution function are defined: the quasi-mode and the mean-median. They are parameters of an appropriately truncated random variable. Next, estimators of the mode, namely the sample quasi-mode and the sample mean-median, are determined. These statistics are usually biased estimators of the mode. The well-known jackknife procedure is adapted to estimate their mean square error. The accuracy of the mode estimation is studied on the basis of a computer simulation.

1. THE BASIC DEFINITIONS AND NOTATION

The mode, the median and the mean of a continuous distribution function are denoted by $d$, $m$ and $\mu$, respectively. The third central moment is denoted by $\eta_3$. Let $A_+$ be the set of such continuous and one-modal density functions that the inequalities $d \le m \le \mu$ and $\eta_3 \ge 0$ are fulfilled. The set $A_+$ can be treated as the class of right skewed continuous distribution functions. Similarly, let $A_-$ be such a class of continuous one-modal density functions that $d \ge m \ge \mu$ and $\eta_3 \le 0$. Hence, the set $A_-$ will be called the class of the left skewed distribution functions.

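Let us define the following moments of the truncated distribution:

$$\mu(a_t, b_t) = \frac{1}{F(b_t) - F(a_t)} \int_{a_t}^{b_t} x f(x)\,dx \tag{1}$$

where: $t = 0, 1, 2, \ldots$, $a_0 = a$, $b_0 = b$, $F(b) = 1$, $F(a) = 0$ and

$$F(u) = \int_{-\infty}^{u} f(x)\,dx.$$

Moreover, let us assume:

$$\mu(a_{t+1}, b_{t+1}) = \frac{1}{F(b_{t+1}) - F(a_{t+1})} \int_{a_{t+1}}^{b_{t+1}} x f(x)\,dx, \quad \text{if } \eta_3(a_t, b_t) > 0 \tag{2}$$

where: $a_{t+1} = a_t$, $b_{t+1} = \mu(a_t, b_t)$ and

$$\eta_3(a_t, b_t) = \frac{1}{F(b_t) - F(a_t)} \int_{a_t}^{b_t} \big(x - \mu(a_t, b_t)\big)^3 f(x)\,dx \tag{3}$$

$$\mu(a_{t+1}, b_{t+1}) = \frac{1}{F(b_{t+1}) - F(a_{t+1})} \int_{a_{t+1}}^{b_{t+1}} x f(x)\,dx, \quad \text{if } \eta_3(a_t, b_t) < 0 \tag{4}$$

where: $a_{t+1} = \mu(a_t, b_t)$ and $b_{t+1} = b_t$.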

Definition 1. Let $f(x)$ be the density function in the interval $I = \langle a, b \rangle$ and let $a_t$ and $b_t$ be such truncation points that $a_t, b_t \in I$ and

$$g_1(a_t, b_t) = \mu(a_t, b_t), \quad \text{if } \eta_3(a_t, b_t) = 0, \quad t = 0, 1, 2, \ldots \tag{5}$$

The parameter $g_1 = g_1(a_t, b_t)$ will be called the quasi-mode of the continuous distribution function.

Hence, the quasi-mode is the mean value of a distribution function truncated in such a way that its third central moment takes the value zero or the left and right truncation points are equal to each other.

The median $m(a_t, b_t)$ of the distribution truncated at the points $a_t < b_t$ is determined in the following way:

$$\frac{\int_{a_t}^{m(a_t, b_t)} f(x)\,dx}{\int_{a_t}^{b_t} f(x)\,dx} = F_t\big(m(a_t, b_t)\big) = \frac{1}{2} \tag{6}$$

Let the expected value of the truncated distribution be defined as follows:

$$\mu(a_{t+1}, b_{t+1}) = \frac{1}{F(b_{t+1}) - F(a_{t+1})} \int_{a_{t+1}}^{b_{t+1}} x f(x)\,dx, \quad \text{if } m(a_t, b_t) < \mu(a_t, b_t) \tag{7}$$

where: $a_{t+1} = a_t$, $b_{t+1} = m(a_t, b_t)$,

$$\mu(a_{t+1}, b_{t+1}) = \frac{1}{F(b_{t+1}) - F(a_{t+1})} \int_{a_{t+1}}^{b_{t+1}} x f(x)\,dx, \quad \text{if } m(a_t, b_t) > \mu(a_t, b_t) \tag{8}$$

where: $a_{t+1} = m(a_t, b_t)$ and $b_{t+1} = b_t$.

Definition 2. Let $f(x)$ be the density function in the interval $I = \langle a, b \rangle$ and let $a_t$ and $b_t$ be such truncation points that $a_t, b_t \in I$ and

$$g_2(a_t, b_t) = \mu(a_t, b_t), \quad \text{if } \mu(a_t, b_t) = m(a_t, b_t), \quad t = 0, 1, 2, \ldots \tag{9}$$

The parameter $g_2 = g_2(a_t, b_t)$ will be called the mean-median of the distribution function.

The estimators of the parameters d(u) and m(u) are going to be defined in the next paragraphs. They can be used to estimate the mode of continuous distribution functions.
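The iterative truncation behind Definitions 1 and 2 can be illustrated numerically. The sketch below is not taken from the paper: the helper names (`integrate`, `truncated_mean`, `quasi_mode`, `mean_median`), the midpoint-rule integration, the tolerance `tol` and the example density $f(x) = 2(1 - x)$ on $\langle 0, 1 \rangle$ are all assumptions made for illustration. It cuts the interval at the truncated mean (rules (2) and (4)) or at the truncated median (rules (7) and (8)) until the stopping condition holds or the truncation points practically coincide.

```python
def integrate(g, lo, hi, steps=1000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(g(lo + (i + 0.5) * h) for i in range(steps))


def truncated_mean(f, lo, hi):
    """mu(a_t, b_t): mean of the density f truncated to [lo, hi], eq. (1)."""
    return integrate(lambda x: x * f(x), lo, hi) / integrate(f, lo, hi)


def truncated_eta3(f, lo, hi):
    """eta_3(a_t, b_t): third central moment of the truncated density, eq. (3)."""
    mu = truncated_mean(f, lo, hi)
    return integrate(lambda x: (x - mu) ** 3 * f(x), lo, hi) / integrate(f, lo, hi)


def truncated_median(f, lo, hi, iters=50):
    """m(a_t, b_t): bisection for the point that halves the truncated mass, eq. (6)."""
    half = 0.5 * integrate(f, lo, hi)
    left, right = lo, hi
    for _ in range(iters):
        mid = 0.5 * (left + right)
        if integrate(f, lo, mid) < half:
            left = mid
        else:
            right = mid
    return 0.5 * (left + right)


def quasi_mode(f, lo, hi, tol=1e-6, max_iter=200):
    """Apply rules (2) and (4) until eta_3 = 0 or the interval collapses (tol is an assumption)."""
    for _ in range(max_iter):
        mu = truncated_mean(f, lo, hi)
        eta3 = truncated_eta3(f, lo, hi)
        if eta3 == 0.0 or hi - lo < tol:
            return mu
        if eta3 > 0:
            hi = mu   # right skewed: cut the right tail at the mean
        else:
            lo = mu   # left skewed: cut the left tail at the mean
    return truncated_mean(f, lo, hi)


def mean_median(f, lo, hi, tol=1e-6, max_iter=200):
    """Apply rules (7) and (8) until mean = median or the interval collapses."""
    for _ in range(max_iter):
        mu = truncated_mean(f, lo, hi)
        med = truncated_median(f, lo, hi)
        if med == mu or hi - lo < tol:
            return mu
        if med < mu:
            hi = med  # median below mean: cut the right tail at the median
        else:
            lo = med  # median above mean: cut the left tail at the median
    return truncated_mean(f, lo, hi)


def density(x):
    """Hypothetical example: decreasing density on [0, 1] whose mode is 0."""
    return 2.0 * (1.0 - x)


print(quasi_mode(density, 0.0, 1.0))
print(mean_median(density, 0.0, 1.0))
```

For this particular density every truncated version stays right skewed, so both iterations keep cutting the right tail and the returned values shrink toward the mode at 0. Other densities may stop earlier, at a point where the third central moment vanishes or the mean equals the median.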

2. ESTIMATION OF THE MODE BY MEANS OF THE AVERAGE FROM A TRUNCATED SAMPLE

Let $X_{(1)} \le X_{(2)} \le \ldots \le X_{(n)}$ be the sequence of the order statistics. The third central moment from the truncated simple sample is defined by the expression:

$$C_3(a, b) = \frac{1}{k} \sum_{i=a}^{b} \big(X_{(i)} - \bar X(a, b)\big)^3, \quad 1 < k = b - a + 1 \tag{10}$$

where:

$$\bar X(a, b) = \frac{1}{k} \sum_{i=a}^{b} X_{(i)}.$$

Let us define the following statistics:

$$C_3(a_{t+1}, b_{t+1}) = \begin{cases} C_3(a_t, b_t - h), & \text{if } C_3(a_t, b_t) > 0 \text{ and } X_{(b_t - h)} \le \bar X(a_t, b_t) < X_{(b_t - h + 1)}, \\ C_3(a_t, b_t), & \text{if } C_3(a_t, b_t) = 0, \\ C_3(a_t + h, b_t), & \text{if } C_3(a_t, b_t) < 0 \text{ and } X_{(a_t + h - 1)} \le \bar X(a_t, b_t) < X_{(a_t + h)} \end{cases} \tag{11}$$

Then, the sample quasi-mode $G_1$ is defined as follows:

$$G_1 = \bar X(a_t, b_t), \quad \text{if } C_3(a_t, b_t) = 0 \ne C_3(a_{t-1}, b_{t-1}) \tag{12}$$

The statistic $G_1$ will usually be a biased estimator of the mode $d$. Its bias can be reduced by means of the well-known jackknife method. Let $G_1^{(i)}$, $i = 1, \ldots, n$, be the quasi-mode from the sample without the $i$-th observation. The pseudovalues are determined by the expression:

$$Z_i = nG_1 - (n - 1)G_1^{(i)}.$$

Hence, the jackknife type estimator of the mode $d$ is as follows:

$$G_{1J} = \bar Z = \frac{1}{n} \sum_{i=1}^{n} Z_i \tag{13}$$

The estimators of the mean-square error are defined by the following expressions:

[expression (14) is not recoverable from the scanned text]

$$S^2(G_{1J}) = \frac{1}{n(n - 1)} \sum_{i=1}^{n} (Z_i - G_{1J})^2 \tag{15}$$
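The truncation rule (11), the stopping rule (12) and the jackknife expressions (13) and (15) can be sketched as follows. This is an illustrative reading, not the author's implementation: the function names are hypothetical, the sample is cut at its mean in each step, and the loop also stops once two or fewer observations remain (where the third central moment is zero in exact arithmetic), which is an added safeguard against floating-point noise.

```python
def mean(xs):
    return sum(xs) / len(xs)


def third_central_moment(xs):
    """C_3 of eq. (10) computed on the currently retained order statistics."""
    m = mean(xs)
    return sum((x - m) ** 3 for x in xs) / len(xs)


def sample_quasi_mode(sample):
    """Sample quasi-mode G_1: truncate at the mean until C_3 = 0, eqs. (11)-(12)."""
    xs = sorted(float(v) for v in sample)
    while len(xs) > 2:  # assumption: stop at two points, where C_3 = 0 exactly
        c3 = third_central_moment(xs)
        if c3 == 0.0:
            break
        m = mean(xs)
        if c3 > 0:
            kept = [x for x in xs if x <= m]  # right skewed: drop values above the mean
        else:
            kept = [x for x in xs if x > m]   # left skewed: drop values up to the mean
        if not 0 < len(kept) < len(xs):       # guard against rounding artefacts
            break
        xs = kept
    return mean(xs)


def jackknife(estimator, sample):
    """Return (G_J, S^2(G_J)) built from the pseudovalues Z_i, eqs. (13) and (15)."""
    xs = list(sample)
    n = len(xs)
    g = estimator(xs)
    z = [n * g - (n - 1) * estimator(xs[:i] + xs[i + 1:]) for i in range(n)]
    g_j = mean(z)
    s2 = sum((zi - g_j) ** 2 for zi in z) / (n * (n - 1))
    return g_j, s2


data = [1, 2, 3, 10]
print(sample_quasi_mode(data))
print(jackknife(sample_quasi_mode, data))
```

With the sample `[1, 2, 3, 10]` the first step removes 10 (the only value above the mean 4), and the remaining values 1, 2, 3 are symmetric, so the procedure stops at their mean. Passing the ordinary mean as `estimator` is a quick sanity check of `jackknife`, because the pseudovalues then coincide with the observations.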

3. ESTIMATION OF THE MODE BY MEANS OF THE MEAN-MEDIAN FROM THE TRUNCATED SAMPLE

Let us define the median from a truncated sample in the following way:

$$M(a_t, b_t) = X_{(e)}, \quad \text{where } e = \left[\frac{b_t - a_t + 1}{2}\right] + a_t \tag{16}$$

$$M(a_{t+1}, b_{t+1}) = \begin{cases} M(a_t, b_t - h), & \text{if } M(a_t, b_t) < \bar X(a_t, b_t) \text{ and } X_{(b_t - h)} \le M(a_t, b_t) < X_{(b_t - h + 1)}, \\ M(a_t, b_t), & \text{if } M(a_t, b_t) = \bar X(a_t, b_t), \\ M(a_t + h, b_t), & \text{if } M(a_t, b_t) > \bar X(a_t, b_t) \text{ and } X_{(a_t + h)} \ge M(a_t, b_t) > X_{(a_t + h - 1)} \end{cases} \tag{17}$$

The following estimators of the mean-median or the mode of a skewed distribution can be defined. The sample mean-median $G_2$ of the skewed distribution is determined by the expression:

$$G_2(a_t, b_t) = \bar X(a_t, b_t), \quad \text{if } M(a_t, b_t) = \bar X(a_t, b_t), \quad t = 0, 1, 2, \ldots \tag{18}$$

Thus, the sample median $M(a_t, b_t)$ is the median of the truncated sample determined by the following sequence of the order statistics: $X_{(a_t)}, \ldots, X_{(b_t)}$, and the sample mean-median $G_2(a_t, b_t)$ is the average of these statistics.

Usually, both statistics $G_1$ and $G_2$ are biased estimators of the mode $d$ of the continuous distribution function. Just like in the case of the statistic $G_1$, we can try to reduce this bias by means of the jackknife method.
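A minimal sketch of the sample mean-median, following rules (16)-(18), is given below. It rests on assumptions that the scanned formulas do not settle: the median of the retained order statistics is taken as the upper middle one (index `len(xs) // 2` in zero-based counting), and the function name is hypothetical.

```python
def sample_mean_median(sample):
    """Sample mean-median G_2: truncate at the median until median = mean."""
    xs = sorted(float(v) for v in sample)
    while len(xs) > 1:
        med = xs[len(xs) // 2]        # assumed rank e of eq. (16): upper middle value
        m = sum(xs) / len(xs)
        if med == m:                  # stopping rule of eq. (18)
            break
        if med < m:
            kept = [x for x in xs if x <= med]  # median below mean: drop the right tail
        else:
            kept = [x for x in xs if x >= med]  # median above mean: drop the left tail
        if not 0 < len(kept) < len(xs):         # guard against a step that removes nothing
            break
        xs = kept
    return sum(xs) / len(xs)


print(sample_mean_median([1, 2, 3, 10]))
```

For `[1, 2, 3, 10]` the median 3 lies below the mean 4, so 10 is removed; the retained values 1, 2, 3 have equal mean and median, and the procedure stops there.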

4. ESTIMATION OF A MODE OF A MULTIDIMENSIONAL DISTRIBUTION

Let $f(x_1, \ldots, x_r)$ be a density function of an $r$-dimensional random variable. The central moments of the order 3 of the $r$-dimensional variable are denoted by:

$$\eta_{uv}(X_i, X_j) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \big(x_i - E(X_i)\big)^u \big(x_j - E(X_j)\big)^v f(x_1, \ldots, x_r)\,dx_1 \ldots dx_r \tag{19}$$

and the sample moments:

$$C_{uv}(X_i, X_j) = \frac{1}{n} \sum_{t=1}^{n} (X_{it} - \bar X_i)^u (X_{jt} - \bar X_j)^v, \quad \bar X_i = \frac{1}{n} \sum_{t=1}^{n} X_{it} \tag{20}$$

where $u + v = 3$ and $u, v = 0, 1, 2, 3$. Let us introduce the following vectors:

$$\Theta = [\theta_1, \ldots, \theta_w] = \big[\,|\eta_{30}(X_1, X_2)| \;\; |\eta_{03}(X_1, X_2)| \;\; \ldots \;\; |\eta_{12}(X_{r-1}, X_r)| \;\; |\eta_{21}(X_r, X_{r-1})|\,\big]$$

and

$$L = [l_1, \ldots, l_w] = \big[\,|C_{30}(X_1, X_2)| \;\; |C_{03}(X_1, X_2)| \;\; \ldots \;\; |C_{12}(X_{r-1}, X_r)| \;\; |C_{21}(X_r, X_{r-1})|\,\big],$$

where $w = r^2$.

It is well known that if a distribution function is symmetric, then all central moments of the order 3 of the marginal one- or two-dimensional distributions are equal to zero. Hence: $\Theta = 0$.

In order to simplify our analysis, let us consider a two-dimensional random variable. The data observed in the simple sample will be denoted by $\{(x_i, y_i)\}$, $i = 1, \ldots, n$. From the geometrical point of view $(x_i, y_i)$ are the coordinates of a point $A_i$ ($i = 1, \ldots, n$) in a two-dimensional plane. Let $P_1, P_2, \ldots, P_h$, $h \le n$, be points selected from the set $A = \{A_i\}$ in such a way that they are apexes of a convex polygon and the points of the set $A$ are inside this polygon. The set $P = \{P_1, P_2, \ldots, P_h\}$ determines the polygon $P$. Hence the edge of the polygon $P$ can be treated as a convex envelope of the points $\{A_i\}$.

Let us construct the polygons: $P = P^{(0)} \supset P^{(1)} \supset P^{(2)} \supset \ldots \supset P^{(t)}$. The polygon $P^{(t)}$ is obtained through rejecting one apex $P_z \in P^{(t-1)}$, $t = 1, 2, \ldots$ The central moments of the data creating the coordinates of points belonging to the polygon $P^{(t)}$ are as follows:

$$C_{uv}(X_i, X_j \mid B(t, z)) = \frac{1}{n_t} \sum_{\{e:\, A_e \in B(t, z)\}} \big(x_{ie} - \bar x_i^{(t)}\big)^u \big(x_{je} - \bar x_j^{(t)}\big)^v \tag{21}$$

where:

$$\bar x_i^{(t)} = \frac{1}{n_t} \sum_{\{e:\, A_e \in B(t, z)\}} x_{ie},$$

$n_t$ is the number of sample points in $B(t, z)$, $B(t, z) = P^{(t-1)} - P_z$ and $P_z \in P^{(t-1)}$, $t = 1, 2, \ldots$, and $B(0, z) = P$. Moreover:

$$L(t, z) = [l_1(t, z) \ldots l_w(t, z)] = \big[\,|C_{30}(X_1, X_2 \mid B(t, z))| \;\ldots\; |C_{21}(X_{r-1}, X_r \mid B(t, z))|\,\big]$$

The polygon $P^{(t)}$ is determined through dropping the point $P_q$ from the polygon $P^{(t-1)}$ in such a way that

$$l(t, q) = \min_z \max_i \{l_i(t, z)\} \tag{22}$$

This procedure leads to truncation of the two-dimensional sample. The algorithm is stopped if $l(t, q) = 0$ or $t = n - 2$, and $G = (G_i, G_j) = (\bar X_i^{(t)}, \bar X_j^{(t)})$ are estimators of the modes $(d_i, d_j)$ of the two-dimensional random variable $(X_i, X_j)$. Hence:

$$G = \big(\bar X_i^{(t)}, \bar X_j^{(t)}\big), \quad \text{if } l(t, q) = 0 \text{ and } l(t - 1, q) > 0 \tag{23}$$

The presented method of estimation of the modes of a two-dimensional variable can be easily generalized to the case of a distribution of more than two variables. Wywiał (1998) considered a one-dimensional case of this method, including a simulation study of estimation precision.
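The peeling procedure (21)-(23) for a two-dimensional sample might be sketched as below. This is only one possible reading of the description: the convex hull is found with Andrew's monotone chain (a standard algorithm that the paper does not name), the zero test uses a small tolerance `tol`, and all function names are hypothetical.

```python
def convex_hull(points):
    """Apexes of the convex envelope (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half(seq):
        chain = []
        for p in seq:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]

    return half(pts) + half(reversed(pts))


def centroid(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def max_abs_third_moment(points):
    """max_i l_i: largest |C_uv| with u + v = 3 over the retained points."""
    n = len(points)
    mx, my = centroid(points)
    return max(
        abs(sum((x - mx) ** u * (y - my) ** (3 - u) for x, y in points) / n)
        for u in range(4)
    )


def peeled_mode(points, tol=1e-12):
    """Drop one hull apex per step, chosen by the min-max rule (22)."""
    pts = [(float(x), float(y)) for x, y in points]
    # Stop when all third-order moments vanish or only two points remain (t = n - 2).
    while len(pts) > 2 and max_abs_third_moment(pts) > tol:
        def score(vertex):
            rest = list(pts)
            rest.remove(vertex)
            return max_abs_third_moment(rest)

        pts.remove(min(convex_hull(pts), key=score))
    return centroid(pts)


cloud = [(0, 0), (2, 0), (0, 2), (2, 2), (1, 1), (10, 10)]
print(peeled_mode(cloud))
```

In the example the outlying apex (10, 10) is the one whose removal makes every third-order moment vanish, so it is dropped first and the procedure stops at the centroid of the remaining symmetric points.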

5. SIMULATION ANALYSIS

In order to study the basic properties of the introduced estimators of the mode, a simulation analysis is developed. The estimation of the mode of the following triangular distribution is considered:

$$f(x) = \begin{cases} 0 & \text{for } x < 0 \text{ or } x > b > 2a, \\[4pt] \dfrac{2(x + b - 2a)}{b^2 - 2a^2} & \text{for } 0 \le x \le a, \\[8pt] \dfrac{-2x + 2b}{b^2 - 2a^2} & \text{for } a < x \le b. \end{cases} \tag{24}$$

The distribution function is as follows:

$$F(x) = \begin{cases} 0 & \text{for } x < 0, \\[4pt] \dfrac{x^2}{b^2 - 2a^2} + \dfrac{2(b - 2a)x}{b^2 - 2a^2} & \text{for } 0 \le x \le a, \\[8pt] 1 - \dfrac{(x - b)^2}{b^2 - 2a^2} & \text{for } a < x \le b, \\[8pt] 1 & \text{for } x > b. \end{cases}$$

The function inverse to the distribution function is as follows:

$$F^{-1}(y) = \begin{cases} 2a - b + \sqrt{(b - 2a)^2 + (b^2 - 2a^2)y} & \text{for } 0 \le y \le y_0, \\[4pt] b - \sqrt{(b^2 - 2a^2)(1 - y)} & \text{for } y_0 < y \le 1, \end{cases}$$

where:

$$y_0 = F(a) = \frac{(2b - 3a)a}{b^2 - 2a^2}.$$

The moments are as follows:

$$E(X) = \frac{b^3 - 2a^3}{3(b^2 - 2a^2)}, \quad E(X^2) = \frac{b^4 - 2a^4}{6(b^2 - 2a^2)} \tag{25}$$

$$E(X^3) = \frac{b^5 - 2a^5}{10(b^2 - 2a^2)} \tag{26}$$


In the simulation experiment we assumed that $a = 1$ and $b = 5$. Hence, the mode $d = 1.0$, $E(X) = 1.7826$, $D^2(X) = 1.3368$, $\eta_3(X) = 0.7646$ and the skewness coefficient $\beta_1 = 0.4947$.
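The quoted characteristics can be checked, and pseudo-random values drawn, with a short inverse-transform sketch. It assumes the left side truncated triangular density with $a = 1$ and $b = 5$, i.e. $f(x) = 2(x + b - 2a)/(b^2 - 2a^2)$ on $\langle 0, a \rangle$ and $f(x) = 2(b - x)/(b^2 - 2a^2)$ on $(a, b\rangle$. The function names and the fixed seed are illustrative choices, and Python's `random` module stands in for the SPSS generator used in the study.

```python
import math
import random

A, B = 1.0, 5.0                    # parameters a and b of the simulation
K = B * B - 2.0 * A * A            # normalising constant b^2 - 2a^2
Y0 = (2.0 * B - 3.0 * A) * A / K   # y_0 = F(a)


def cdf(x):
    """Distribution function F(x) of the truncated triangular distribution."""
    if x < 0.0:
        return 0.0
    if x <= A:
        return (x * x + 2.0 * (B - 2.0 * A) * x) / K
    if x <= B:
        return 1.0 - (x - B) ** 2 / K
    return 1.0


def inverse_cdf(y):
    """Inverse of F, used for inverse-transform sampling."""
    if y <= Y0:
        return 2.0 * A - B + math.sqrt((B - 2.0 * A) ** 2 + K * y)
    return B - math.sqrt(K * (1.0 - y))


def raw_moment(r):
    """E(X^r) for r = 1, 2, 3, matching the moment formulas (25) and (26)."""
    divisor = {1: 3.0, 2: 6.0, 3: 10.0}[r]
    return (B ** (r + 2) - 2.0 * A ** (r + 2)) / (divisor * K)


mean = raw_moment(1)
variance = raw_moment(2) - mean ** 2
eta3 = raw_moment(3) - 3.0 * mean * raw_moment(2) + 2.0 * mean ** 3
beta1 = eta3 / variance ** 1.5     # skewness coefficient

rng = random.Random(0)             # fixed seed only for reproducibility
sample = [inverse_cdf(rng.random()) for _ in range(10)]
print(mean, variance, eta3, beta1)
print(sample)
```

The computed mean and variance agree with the values reported in the text, and the third central moment and the skewness coefficient agree up to rounding in the fourth decimal place.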

The pseudo-random values of the triangular distribution are generated. Values of the estimators are determined on the basis of 2000 samples of a fixed size. The simulation experiment was developed on the basis of the well-known SPSS statistical package.

Let us assume that the expected value $E(\cdot)$ and the variance $D^2(\cdot)$ are estimated on the basis of the computer simulation. Hence:

$$E(G_k) = \frac{1}{N} \sum_{i=1}^{N} g_k(x_i), \quad D^2(G_k) = \frac{1}{N} \sum_{i=1}^{N} \big(g_k(x_i) - E(G_k)\big)^2, \quad k = 1, 2,$$

$$MSE(G_k) = D^2(G_k) + [E(G_k) - d]^2,$$

$$e(G_k) = \frac{E(S(G_k))}{\sqrt{MSE(G_k)}} - 1, \quad b(G_k) = 100\% \cdot \frac{(E(G_k) - d)^2}{MSE(G_k)},$$

where: $k = 1, 2$, $g_k(x_i)$ is the value of the estimator determined on the basis of the simple sample $\{x_i\}$ of size $n$, and the number of such samples is denoted by $N$. The mean square error of the estimator $G_k$ is denoted by $MSE(G_k)$. The results of the simulation study of the quasi-mode distribution are shown in Tab. 1 and 2.
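The summary measures can be computed from a list of simulated estimates as in the sketch below. It assumes the divisor $N$ in both averages and reads $b(G_k)$ as the percentage share of the squared bias in the mean square error, which is the reading consistent with the figures in Tab. 1 and 2. The function name and the toy input are hypothetical.

```python
import math


def simulation_summary(estimates, d):
    """E, D, sqrt(MSE) and b% of an estimator from N simulated values."""
    n_rep = len(estimates)
    e = sum(estimates) / n_rep
    var = sum((g - e) ** 2 for g in estimates) / n_rep  # divisor N is an assumption
    mse = var + (e - d) ** 2
    return {
        "E": e,
        "D": math.sqrt(var),
        "RMSE": math.sqrt(mse),
        "b_percent": 100.0 * (e - d) ** 2 / mse,
    }


# Toy input, not data from the study: three estimates of a mode d = 1.
print(simulation_summary([1.0, 2.0, 3.0], d=1.0))
```

As a cross-check against the first row of Tab. 1, a bias of 0.591 and a standard deviation of 1.146 give a root mean square error of about 1.289 and a squared-bias share of about 21%.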

Table 1

The simulation results for the distribution of the estimator G_1 in the case of the left side truncated triangular distribution for the parameters a = 1 and b = 5

  n    E(G_1)   D(G_1)   sqrt(MSE(G_1))   b(G_1)%   e(G_1)%
 10    1.591    1.146    1.289            21.0      113.3
 20    1.373    0.983    1.051            12.6      246.4
 30    1.296    0.857    0.907            10.7      218.6
 50    1.235    0.673    0.713            10.9      295.1

Table 2

The simulation results for the distribution of the estimator G_2 in the case of the left side truncated triangular distribution for the parameters a = 1 and b = 5

  n    E(G_2)   D(G_2)   sqrt(MSE(G_2))   b(G_2)%   e(G_2)%
 10    1.950    0.943    1.339            50.4      -11.9
 20    1.669    0.928    1.144            34.2       20.5
 30    1.516    0.826    0.974            28.1       54.6
 50    1.134    0.655    0.669             4.0      126.3
100    1.147    0.516    0.537             7.5      177.8

Source: the author's own elaboration.

The analysis of Tab. 1 and 2 leads to the following conclusions. The biases of both estimators are rather large. The bias of the estimator $G_1$ is larger than the bias of $G_2$ only for the sample size $n = 50$. Similarly, the relative bias $b(G_1)$ is larger than $b(G_2)$ only for $n = 50$. The standard deviations and the mean square errors of both estimators decrease when the sample size increases. The relative biases $e(G_1)$ and $e(G_2)$ of the statistics $S(G_1)$ and $S(G_2)$ tend to increase when the sample size increases. In particular, the bias of the estimator $S(G_1)$ is too large. In conclusion, neither $S(G_1)$ nor $S(G_2)$ is useful as an estimator of the root mean square errors $\sqrt{MSE(G_1)}$ and $\sqrt{MSE(G_2)}$, respectively. Generally, however, the estimator $G_2$ is slightly better than $G_1$.

REFERENCES

Härdle W. (1991), Smoothing Techniques, Springer Verlag, New York-Berlin-Heidelberg-London-Paris-Tokyo-Hong Kong-Barcelona.

Hellwig Z. (1995), Elementy rachunku prawdopodobieństwa i statystyki matematycznej, PWN, Warszawa.

Parzen E. (1962), On Estimation of a Probability Density Function and Mode, "Annals of Mathematical Statistics", 33, 1065-1076.

Wywiał J. (1998), Estimation of Distribution Function Mode on the Basis of Sample Moment or Sample Median, submitted to Badania Operacyjne i Decyzje.
