Application of Simulation Methods to Estimation of Variance of Nonparametric Sequential Estimator of Mean


ACTA UNIVERSITATIS LODZIENSIS

FOLIA OECONOMICA 194, 2005

Dorota Pekasiewicz*

Abstract

Nonparametric sequential methods make it possible to estimate an unknown parameter of the distribution of a random variable when the distribution of the variable is unknown. These methods can be applied to different sampling designs.

This paper contains a proposal of applying simulation methods to estimate the variance of a nonparametric estimator of the mean. An application of the bootstrap method to estimate the variance of a synthetic estimator of the mean in sequential estimation is also presented.

Key words: sequential estimation, bootstrap method, synthetic estimator.

I. INTRODUCTION

Nonparametric sequential methods make it possible to estimate the unknown probability distribution function of the random variable under investigation, as well as an unknown distribution parameter, when the distribution class is unknown.

To estimate distribution parameters, e.g. the mean, sequential procedures of nonparametric point or interval estimation can be applied for different sequential sampling schemes.

The idea of sequential point estimation of the mean is to determine the estimator's value from a random sample whose size minimizes the risk function. If we do not take into account the sampling costs connected with the sequential drawing of elements, the risk function is equal to the mean square error and, in the case of unbiased estimators, to the variance of the relevant estimator. In such cases we determine the estimation precision by establishing the number which cannot be exceeded by the variance or the mean square error. The sample size is increased sequentially until the desired precision of estimation is achieved.

Calculating the variance of a parameter estimator at every stage of the sequential procedure is not always easy, and sometimes even impossible. Due to the complicated form of some parameter estimators, we often do not have any information about their variance or about variance estimators. When estimators of this kind are applied to sequential estimation of the mean, the problem arises of defining the stopping procedure for the process of increasing the sample size. In this paper we suggest using, in such cases, simulation methods of estimating variance such as the Mahalanobis method, the jackknife or the bootstrap. An example of applying the bootstrap method to estimate the variance of a synthetic estimator of a subpopulation mean is also presented.

II. NONPARAMETRIC SEQUENTIAL POINT ESTIMATION

Let $X$ be a random variable and $\theta$ be the unknown mean value of this variable. By $\hat{\theta}_n$ we denote an estimator of parameter $\theta$ determined from the sample $X_1, \ldots, X_n$.

At every stage of a sequential process aiming at assessing the value of $\theta$ we face a statistical decision problem. We decide either to increase the sample by drawing one or a few more elements, or to conclude the sampling process and treat the estimator's value we arrived at as a good enough estimate of $\theta$.

When we make a decision we can define the loss function as follows (see Sen, 1984):

$$L_n = g(|\hat{\theta}_n - \theta|) + c(n), \tag{1}$$

where $g$ is a nonnegative, nondecreasing function on $[0, +\infty)$ with the property $g(0) = 0$, and $c(n)$ is the cost function associated with drawing the sample.

Let us assume that $n_0$ ($n_0 \geq 1$) exists such that for every $n \geq n_0$ the expectation $E[g(|\hat{\theta}_n - \theta|)]$ exists.

The risk incurred in the estimation of the mean $\theta$ from an $n$-element sample is given by the formula:

$$R_n = E(L_n) = E[g(|\hat{\theta}_n - \theta|)] + c(n). \tag{2}$$

The quantity $R_n$ may be viewed as the sum of two functions of the argument $n$. The function $c$ is nondecreasing (e.g. $c(n) = pn$, where $p$ denotes the cost of drawing one element), and if $\hat{\theta}_n$ is a consistent estimator of $\theta$, the function $E[g(|\hat{\theta}_n - \theta|)]$ is nonincreasing and converges to 0 (see Sen, 1984).

We make the decision at the sample size $n$ for which the function $R_n$ reaches its minimum. For established functions $g$ and $c$ we define $n^*$ as follows:

$$n^* = \min\{n \geq n_0 : R_n = \inf_m R_m\}. \tag{3}$$

The value of the estimator $\hat{\theta}_{n^*}$ is the minimum risk estimate of parameter $\theta$. If, in the process of the sequential point estimation of $\theta$, we do not take into account the sampling cost, the risk incurred in the sequential estimation of the parameter from an $n$-element sample will depend only on $E[g(|\hat{\theta}_n - \theta|)]$.

If $g(x) = x^2$, then $R_n = E(|\hat{\theta}_n - \theta|^2)$. For an unbiased estimator $\hat{\theta}_n$, the function $R_n$ is the variance of the estimator of parameter $\theta$ ($R_n = D^2(\hat{\theta}_n)$), while for a biased estimator $\hat{\theta}_n$, the function $R_n$ is equal to the mean square error ($R_n = D^2(\hat{\theta}_n) + |E(\hat{\theta}_n) - \theta|^2$).
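Definition (3) can be illustrated with a small sketch for the sample mean, assuming quadratic loss $g(x) = x^2$ and the linear cost $c(n) = pn$, so that $R_n = \sigma^2/n + pn$. The function names and the values of `sigma2` and `p` are hypothetical and chosen only for illustration; in practice $\sigma^2$ is unknown, which is why the risk has to be estimated.

```python
def risk(n, sigma2, p):
    """Risk R_n = sigma2/n + p*n of the sample mean (quadratic loss, linear cost)."""
    return sigma2 / n + p * n


def min_risk_sample_size(sigma2, p, n0=1, n_max=100_000):
    """Smallest n in n0..n_max at which R_n attains its minimum, as in (3)."""
    best_n, best_r = n0, risk(n0, sigma2, p)
    for n in range(n0 + 1, n_max + 1):
        r = risk(n, sigma2, p)
        if r < best_r:              # strict '<' keeps the smallest minimiser
            best_n, best_r = n, r
    return best_n


# Illustrative values only: variance 4 and a cost of 0.0001 per drawn element.
n_star = min_risk_sample_size(sigma2=4.0, p=0.0001)
print(n_star)
```

The search is a plain scan because $R_n$ is cheap to evaluate here; the first term falls and the second grows with $n$, which is the trade-off described above.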

The sequential point estimation of parameter $\theta$ with the aid of an unbiased estimator is characterized by a stopping rule for the drawing process determined by the inequality $D^2(\hat{\theta}_n) \leq \varepsilon^2$, where $\varepsilon$ is a fixed estimation precision. That means that we sequentially add elements to the sample until the estimator's variance is less than or equal to $\varepsilon^2$. In such cases the size $n^*$ ensuring the estimation of parameter $\theta$ with the fixed precision is defined in the following way:

$$n^* = \min\{n : \hat{D}^2(\hat{\theta}_n) \leq \varepsilon^2\}, \tag{4}$$

where $\hat{D}^2(\hat{\theta}_n)$ is an estimator of the variance $D^2(\hat{\theta}_n)$. The value of the estimator $\hat{\theta}_{n^*}$ is the estimate of parameter $\theta$ with precision not exceeding $\varepsilon$.
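The stopping rule (4) can be sketched for the simplest case, the sample mean, whose variance has the well-known estimator $s^2/n$. This is only an illustration under assumed settings: the function name, the initial size `k1`, the increment `d` and the normal data generator are hypothetical and do not come from the paper.

```python
import random
import statistics


def sequential_mean(draw, eps, k1=30, d=10, max_size=1_000_000):
    """Enlarge the sample by d elements until the estimated variance of the
    sample mean, s^2/n, is at most eps**2; return (estimate, sample size)."""
    sample = [draw() for _ in range(k1)]
    while True:
        n = len(sample)
        var_hat = statistics.variance(sample) / n   # estimator of D^2(mean)
        if var_hat <= eps ** 2 or n >= max_size:
            return statistics.fmean(sample), n
        sample.extend(draw() for _ in range(d))


# Illustrative data source: normal observations with mean 5 and std. dev. 2.
rng = random.Random(0)
estimate, n_star = sequential_mean(lambda: rng.gauss(5.0, 2.0), eps=0.1)
print(estimate, n_star)
```

With these settings the final size should be in the neighbourhood of $\sigma^2/\varepsilon^2 = 400$, although the exact value depends on the random draws.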

III. SIMULATION METHODS OF VARIANCE ESTIMATION

We use simulation methods in estimating the variance of estimators of the mean when we know neither the variance nor any variance estimator (usually due to the complicated formula of the estimator of the mean). The use of such estimators in sequential estimation is possible, but we encounter the problem of defining the stopping rule. If we do not take into account the sampling costs, the sequential estimation procedure consists in estimating the variance (mean square error) at every stage and comparing it with a fixed estimation precision.

To assess the variance of the estimators of the mean, the following simulation methods are proposed for every stage of the sequential procedure:

- the Mahalanobis method;
- the jackknife method;
- the bootstrap method.

Mahalanobis method. In the first step of the Mahalanobis procedure in the sequential estimation of parameter $\theta$, from the drawn $k_1$-element sample we create $s$ disjoint subsamples ($s \geq 2$), containing $l_1^i$ elements, for $i = 1, 2, \ldots, s$. If the sample size $k_n$ ($n = 1, 2, \ldots$) is divisible by the fixed number $s$, the subsample sizes are determined from the formula:

$$l_n^i = \frac{k_n}{s} \quad \text{for } i = 1, \ldots, s. \tag{5}$$

Otherwise the sizes of particular subsamples are given by the formula:

$$l_n^i = \begin{cases} \left[\frac{k_n}{s}\right] + 1 & \text{for } i = 1, \ldots, c, \\ \left[\frac{k_n}{s}\right] & \text{for } i = c + 1, \ldots, s, \end{cases} \tag{6}$$

where $c = k_n - s\left[\frac{k_n}{s}\right]$ and $[\cdot]$ denotes the integer part.
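The allocation of subsample sizes can be sketched as follows. The sketch assumes that formula (6) gives $[k_n/s] + 1$ elements to the first $c$ subsamples and $[k_n/s]$ to the remaining ones, with $c$ equal to the remainder of $k_n$ divided by $s$; the function name is hypothetical.

```python
def subsample_sizes(k, s):
    """Sizes of s disjoint subsamples of a k-element sample, formulas (5)-(6):
    the first c = k - s*floor(k/s) subsamples get one extra element."""
    if s < 2 or k < s:
        raise ValueError("need s >= 2 and k >= s")
    base, c = divmod(k, s)          # base = floor(k/s), c = remainder
    return [base + 1] * c + [base] * (s - c)


print(subsample_sizes(1000, 10))    # divisible case, formula (5)
print(subsample_sizes(1003, 10))    # non-divisible case, formula (6)
```

When `k` is divisible by `s` the remainder is zero and the function reduces to formula (5), so one function covers both cases.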

From each of the $s$ subsamples we determine the value of the estimator $\hat{\theta}_n^i$ ($i = 1, \ldots, s$), then from the whole sample containing $k_1$ elements we calculate the value of the estimator $\hat{\theta}_n$ for $n = 1$. The variance of this estimator is assessed with the formula:

$$\hat{D}^2(\hat{\theta}_n) = \left(1 - \frac{k_n}{N}\right) \frac{1}{s(s-1)} \sum_{i=1}^{s} (\hat{\theta}_n^i - \hat{\theta}_n)^2 \quad \text{for } n = 1, 2, \ldots, \tag{7}$$

where $N$ denotes the population size.
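A minimal sketch of the subsample-based variance estimate is given below. It assumes the form $\frac{1}{s(s-1)}\sum_i(\hat{\theta}^i - \hat{\theta})^2$ and leaves out any finite-population factor involving the population size $N$; the function name and the default estimator (the sample mean) are hypothetical choices.

```python
import random
import statistics


def mahalanobis_variance(sample, s, estimator=statistics.fmean, rng=None):
    """Variance estimate from s disjoint random subsamples:
    sum((theta_i - theta)^2) / (s*(s-1)), theta computed on the full sample.
    Any finite-population correction is deliberately omitted."""
    rng = rng or random.Random()
    shuffled = list(sample)
    rng.shuffle(shuffled)                    # random split into subsamples
    base, c = divmod(len(shuffled), s)
    theta = estimator(shuffled)
    total, start = 0.0, 0
    for i in range(s):
        size = base + 1 if i < c else base   # sizes as in formulas (5)-(6)
        theta_i = estimator(shuffled[start:start + size])
        total += (theta_i - theta) ** 2
        start += size
    return total / (s * (s - 1))


rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(1000)]
v = mahalanobis_variance(data, s=10, rng=rng)
print(v)
```

For the sample mean of 1000 standard normal observations the true variance is 0.001, so the printed value should be of that order; with only ten subsamples the estimate is itself quite variable.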

If the assessed variance value is less than or equal to a fixed number $\varepsilon^2$, the value of the estimator $\hat{\theta}_1$, determined from $k_1$ elements, constitutes a good estimate of the mean of the random variable considered. Otherwise we enlarge the sample by $d$ elements ($d = 1, 2, \ldots$). At the $n$-th stage the sample will have $k_n = k_1 + (n - 1)d$ elements. The subsample sizes are determined from formula (5) or (6). The value of the estimator $\hat{\theta}_n$ is determined from the $k_n$-element sample and its variance from the $s$ values of the estimator determined from the $l_n^i$-element subsamples. The sequential point estimation procedure is repeated until the value of the variance estimator of the estimator used is less than or equal to $\varepsilon^2$.

Jackknife method. In the first step ($n = 1$) we draw $k_1$ population elements according to an arbitrary scheme other than the stratified (layer) one and, similarly to the Mahalanobis method, we create $s$ subsamples. However, these subsamples are created in a different way: we randomly remove from the $k_1$-element sample $l_1^i$ elements, with $l_1^i$ given by formula (5) or (6) depending on whether $k_1$ is divisible by $s$ or not.

The variance of the estimator of the mean is estimated from the subsamples consisting of $k_n - l_n^i$ elements on the basis of the formula (see Bracha, 1998):

$$\hat{D}^2(\hat{\theta}_n) = \frac{1}{s(s-1)} \sum_{i=1}^{s} (\hat{\theta}_n^{*i} - \hat{\theta}_n)^2, \tag{8}$$

where

$$\hat{\theta}_n^{*i} = s\hat{\theta}_n - (s-1)\hat{\theta}_n^{(i)}, \tag{9}$$

and $\hat{\theta}_n$ and $\hat{\theta}_n^{(i)}$ are the estimators of the mean determined from the $k_n$-element proper sample and the $(k_n - l_n^i)$-element $i$-th subsample ($i = 1, \ldots, s$), respectively.

If $\hat{D}^2(\hat{\theta}_n)$ is greater than $\varepsilon^2$, we enlarge the sample by $d$ elements and repeat the above procedure.
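The jackknife step can be sketched as a delete-a-group procedure built on the pseudo-values of formula (9). The sketch assumes that the variance estimate is $\frac{1}{s(s-1)}\sum_i(\hat{\theta}^{*i} - \hat{\theta})^2$, one common form of the grouped jackknife; the function name and the default estimator are hypothetical.

```python
import random
import statistics


def jackknife_variance(sample, s, estimator=statistics.fmean, rng=None):
    """Delete-a-group jackknife: pseudo-values s*theta - (s-1)*theta_(i),
    variance estimate sum((pseudo_i - theta)^2) / (s*(s-1))."""
    rng = rng or random.Random()
    shuffled = list(sample)
    rng.shuffle(shuffled)                    # groups are removed at random
    base, c = divmod(len(shuffled), s)
    theta = estimator(shuffled)
    total, start = 0.0, 0
    for i in range(s):
        size = base + 1 if i < c else base   # group sizes as in (5)-(6)
        kept = shuffled[:start] + shuffled[start + size:]   # drop i-th group
        pseudo = s * theta - (s - 1) * estimator(kept)      # formula (9)
        total += (pseudo - theta) ** 2
        start += size
    return total / (s * (s - 1))


values = list(range(1, 11))
v = jackknife_variance(values, s=10, rng=random.Random(3))
print(v, statistics.variance(values) / len(values))
```

For the sample mean with groups of one element the pseudo-values are the observations themselves, so the two printed numbers agree up to rounding; this gives a quick sanity check of the implementation.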

When we apply the Mahalanobis or jackknife method to assess the variance of the estimator considered, we encounter the problem of choosing the number $s$ of subsamples at the beginning and at the following steps of sequential estimation. If the sample size at a certain stage of the sequential procedure grows considerably with respect to the initial sample, then $s$ should be changed. In such cases the use of the Mahalanobis and jackknife methods is more problematic.

Another method that may be useful for assessing the variance of the estimator in sequential estimation is the bootstrap method.

Bootstrap method. In the first step of the sequential estimation we draw $k_1$ elements from the population. These sample observations allow us to determine the value of the estimator $\hat{\theta}_n$ for $n = 1$. Then from the existing sample we generate $J$ (e.g. $J = 1000$) realizations of the bootstrap sample, i.e. the sample generated according to the bootstrap distribution:

$$P(X_b^* = x_m) = \frac{1}{k_n} \quad \text{for } m = 1, \ldots, k_n \text{ and } b = 1, 2, \ldots, J. \tag{10}$$

We determine the value of the estimator $\hat{\theta}_{b,n}$ for $n = 1$ and $b = 1, 2, \ldots, J$. The estimator's variance is assessed with the formula:

$$\hat{D}^2(\hat{\theta}_n) = \frac{1}{J} \sum_{b=1}^{J} (\hat{\theta}_{b,n} - \hat{\theta}_n)^2 \tag{11}$$

(in the first stage we assume $n = 1$).

If the condition $\hat{D}^2(\hat{\theta}_n) \leq \varepsilon^2$ does not hold, we draw the fixed number $d$ of elements and pool them with the existing sample, arriving at a sample consisting of $k_{n+1} = k_n + d$ elements, for $n = 1, 2, 3, \ldots$ For the pooled sample we determine $J$ realizations of the bootstrap sample and we assess the estimator's variance. We continue the described process until the variance assessment does not exceed $\varepsilon^2$.
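The bootstrap procedure described above can be sketched as two small functions: one for the variance assessment in the spirit of formula (11), taken here as the mean squared deviation of the bootstrap replicates from the estimate on the original sample, and one for the sequential loop. The function names, the data generator and the values of `k1`, `d`, `J` and `eps` are hypothetical and serve only as an illustration.

```python
import random
import statistics


def bootstrap_variance(sample, estimator, J=1000, rng=None):
    """Mean squared deviation of J bootstrap replicates of the estimator
    from its value on the original sample."""
    rng = rng or random.Random()
    theta = estimator(sample)
    k = len(sample)
    total = 0.0
    for _ in range(J):
        resample = rng.choices(sample, k=k)   # each element with prob. 1/k
        total += (estimator(resample) - theta) ** 2
    return total / J


def sequential_bootstrap(draw, estimator, eps, k1, d, J=1000, rng=None,
                         max_size=1_000_000):
    """Enlarge the sample by d elements until the bootstrap variance
    assessment does not exceed eps**2; return (estimate, sample size)."""
    sample = [draw() for _ in range(k1)]
    while (bootstrap_variance(sample, estimator, J, rng) > eps ** 2
           and len(sample) < max_size):
        sample.extend(draw() for _ in range(d))
    return estimator(sample), len(sample)


rng = random.Random(1)
est, n = sequential_bootstrap(lambda: rng.gauss(5.0, 2.0), statistics.fmean,
                              eps=0.2, k1=50, d=10, J=200, rng=rng)
print(est, n)
```

Because only the `estimator` callable changes, the same loop can be reused for estimators whose variance has no closed form, which is the situation the bootstrap is proposed for here.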

IV. NONPARAMETRIC SEQUENTIAL ESTIMATION OF SUBPOPULATION MEAN

Let us consider the problem of estimating the mean in some distinguished subpopulation of the whole population when we do not know its distribution. If we have some information about the values of the random variable in the whole population, as well as about an auxiliary variable correlated with the variable considered, we may use it in synthetic estimators, which are more efficient than direct estimators, i.e. those determined from the subpopulation sample alone (see Dol, 1991).

Synthetic estimators are constructed on the assumption that the parameters of the distribution of the variable investigated in the subpopulation are very close to the parameters of the distribution of this variable in the whole population.

Let us denote the variable investigated by $X$ and the auxiliary variable by $Y$. Moreover, let us assume that the population and the subpopulation are divided into $G$ layers (strata).

One of the synthetic estimators of the mean $\theta_0$ of variable $X$ for the subpopulation is given by the formula:

$$\hat{\theta}_{0n} = \frac{1}{N_0} \sum_{g=1}^{G} TY_{0g} \frac{\bar{X}_{ng}}{\bar{Y}_{ng}}, \tag{12}$$

where $N_0$ is the subpopulation size, $TY_{0g}$ is the global value of the auxiliary variable $Y$ in the $g$-th layer of the subpopulation, $\bar{Y}_{ng}$ is the mean value of variable $Y$ in the $g$-th layer of the population, and $\bar{X}_{ng}$ is the mean value of variable $X$ in the $g$-th layer of the population, estimated from the $k_n$-element sample of the whole population.
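A sketch of a synthetic estimator of this kind is given below. It assumes the ratio-synthetic form $\frac{1}{N_0}\sum_g TY_{0g}\,\bar{X}_{g}/\bar{Y}_{g}$ with both layer means computed from the sample; this form, the function name and the data layout (triples of layer label, $x$, $y$) are assumptions made for illustration and are not confirmed by the text.

```python
def synthetic_mean(sample, ty0, n0):
    """Ratio-synthetic estimate of the subpopulation mean of X (assumed form).

    sample : iterable of (layer, x, y) triples drawn from the whole population
    ty0    : dict layer -> total of Y in that layer of the subpopulation
    n0     : subpopulation size N_0
    """
    sum_x = {g: 0.0 for g in ty0}
    sum_y = {g: 0.0 for g in ty0}
    for g, x, y in sample:
        sum_x[g] += x
        sum_y[g] += y
    total = 0.0
    for g, ty in ty0.items():
        if sum_y[g] == 0.0:
            raise ValueError(f"layer {g} has no usable sample values")
        # within a layer the ratio of sample means equals the ratio of sums
        total += ty * sum_x[g] / sum_y[g]
    return total / n0


# Tiny hand-checkable example with two layers.
sample = [(1, 2.0, 6.0), (1, 4.0, 12.0), (2, 10.0, 20.0)]
theta0_hat = synthetic_mean(sample, ty0={1: 30.0, 2: 40.0}, n0=5)
print(theta0_hat)
```

In the example the layer ratios are 6/18 and 10/20, so the estimate is (30/3 + 40/2)/5 = 6.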

We start the estimation from the $k_1$-element sample. We calculate the value of the estimator given by formula (12) and by means of the bootstrap method we estimate its variance. If the variance does not exceed $\varepsilon^2$, we conclude the estimation procedure, treating the obtained value of estimator (12) as a good enough estimate of the subpopulation mean of variable $X$. Otherwise, we draw new elements and repeat the whole procedure.

V. EXAMPLE OF THE APPLICATION OF BOOTSTRAP METHOD TO VARIANCE ESTIMATION OF SYNTHETIC ESTIMATOR OF MEAN

In order to present a possible application of the sequential estimation of the mean with bootstrap variance estimation at every sequential step, a population of 60 000 elements and its subpopulation of 3000 elements are generated in the following way:

1. We generate $N_1 = 20000$ values according to the $N(4, 1)$ distribution; we get values $x_1, \ldots, x_{20000}$, and the first $k_1 = 1000$ values are transformed following the formula $x_i = x_i + \varepsilon_i$, where $\varepsilon_i$ is generated from the $N(1, 3)$ distribution. The elements $x_1, \ldots, x_{1000}, x_{1001}, \ldots, x_{20000}$ constitute the first layer of the population and the elements $x_1, \ldots, x_{1000}$ are the first small area layer.

2. We generate $N_2 = 20000$ values according to the $N(6, 2)$ distribution; we get values $x_{20001}, \ldots, x_{40000}$, and the first $k_2 = 1000$ values are transformed following the formula from the previous point. The elements $x_{20001}, \ldots, x_{21000}, x_{21001}, \ldots, x_{40000}$ constitute the second layer of the population and the elements $x_{20001}, \ldots, x_{21000}$ are the second small area layer.

3. We generate $N_3 = 20000$ values according to the $N(8, 3)$ distribution; we get values $x_{40001}, \ldots, x_{60000}$, and the first $k_3 = 1000$ values are transformed following the formula from point 1. The elements $x_{40001}, \ldots, x_{41000}, x_{41001}, \ldots, x_{60000}$ constitute the third layer of the population and the elements $x_{40001}, \ldots, x_{41000}$ are the third small area layer.

4. We construct the sequence $y_1, \ldots, y_{60000}$ following the formula $y_i = 3x_i + \xi_i$, where $\xi_i$ are generated from the $N(0, \sigma)$ distribution for $\sigma = 1, 3, 5, 7$.

5. From the whole population we draw, without replacement (dependent sampling), a sample of size 1000 and we determine the value of the subpopulation mean estimator given by formula (12).

6. From the drawn sample we generate 1000 bootstrap samples and we assess the variance of estimator (12) with formula (11).

7. The variance value we got is compared with the fixed value $\varepsilon^2$, and we either conclude the procedure or draw 10 new elements from the population and start all over again from point 5.
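Steps 1-7 can be sketched in code as follows. Several points are assumptions rather than facts taken from the paper: $N(m, s)$ is read as mean $m$ and standard deviation $s$; estimator (12) is taken in the ratio-synthetic form with both layer means computed from the sample; "start again from point 5" is read as enlarging the existing sample by 10 elements. All function names are hypothetical, and the demonstration run uses a scaled-down population and fewer bootstrap replications so that it finishes quickly.

```python
import random


def make_population(rng, sigma, layer_size=20_000, area_size=1_000):
    """Steps 1-4: records (layer, in_small_area, x, y) for three layers.
    N(m, s) is read as mean m and standard deviation s (an assumption)."""
    records = []
    for layer, (m, s) in enumerate([(4, 1), (6, 2), (8, 3)], start=1):
        for i in range(layer_size):
            in_area = i < area_size           # first elements: small area
            x = rng.gauss(m, s)
            if in_area:
                x += rng.gauss(1, 3)          # step 1 transformation
            y = 3 * x + rng.gauss(0, sigma)   # step 4 auxiliary variable
            records.append((layer, in_area, x, y))
    return records


def synthetic(sample, ty0, n0):
    """Assumed form of (12): (1/N0) * sum_g TY0g * xbar_g / ybar_g."""
    sx = {g: 0.0 for g in ty0}
    sy = {g: 0.0 for g in ty0}
    for g, _, x, y in sample:
        sx[g] += x
        sy[g] += y
    return sum(ty0[g] * sx[g] / sy[g] for g in ty0) / n0


def sequential_run(rng, population, eps, k1=1000, d=10, J=1000):
    """Steps 5-7: enlarge the sample by d until bootstrap variance <= eps^2."""
    ty0 = {}
    for g, in_area, _, y in population:
        if in_area:
            ty0[g] = ty0.get(g, 0.0) + y      # totals of Y in the small area
    n0 = sum(1 for rec in population if rec[1])
    # a random permutation; its prefixes are samples without replacement
    order = rng.sample(range(len(population)), len(population))
    k = k1
    while True:
        sample = [population[i] for i in order[:k]]
        theta = synthetic(sample, ty0, n0)
        var = sum((synthetic(rng.choices(sample, k=k), ty0, n0) - theta) ** 2
                  for _ in range(J)) / J
        if var <= eps ** 2 or k + d > len(population):
            return theta, k
        k += d


# Scaled-down demonstration (not the paper's settings).
rng = random.Random(2005)
pop = make_population(rng, sigma=1, layer_size=2_000, area_size=100)
theta, k = sequential_run(rng, pop, eps=0.015, k1=300, d=50, J=100)
print(theta, k)
```

Calling `make_population(rng, sigma)` with its defaults and `sequential_run(rng, pop, eps)` with `k1=1000`, `d=10`, `J=1000` corresponds to the sizes described in the steps, at a noticeably higher running time; the results will not reproduce the paper's figures exactly because the random numbers differ.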

The estimates of the mean of the subpopulation considered, computed with the help of sequential estimation with bootstrap estimation of variance, are presented in Table 1.

Table 1. The sizes of samples for sequential subpopulation mean estimation for fixed precisions ε

Number of     Standard       Value    Value of       |θ̂0 − θ0|    Sample
experiment    deviation σ    of ε     estimator θ̂0                 size
 1            1              0.03     7.0636         0.0281        1000
 2            1              0.01     7.0381         0.0026        1500
 3            3              0.06     7.0823         0.0468        1000
 4            3              0.03     7.0536         0.0284        1540
 5            3              0.01     7.0423         0.0068        3750
 6            5              0.09     7.1178         0.0823        1000
 7            5              0.06     7.0399         0.0533        1190
 8            5              0.03     7.0721         0.0366        4160
 9            7              0.09     7.1342         0.0987        1000
10            7              0.08     7.0928         0.0773        1180
11            7              0.06     7.0641         0.0586        3230

Source: Author's calculations.

The actual value of $\theta_0$ was 7.0355. In most of the experiments carried out (apart from experiments 8 and 9) the estimate of the actual value of parameter $\theta_0$ was obtained with an error not exceeding the fixed value $\varepsilon$. This means that the use of the bootstrap to assess the variance of the estimator of the mean was successful in the cases analysed. The sequential sample size was strictly connected with the prefixed accuracy of estimation and, as expected, it grew as the required accuracy became more stringent.

VI. FINAL REMARKS

The estimation of the mean with the help of sequential methods is connected with establishing a criterion for stopping the sequential sampling and, in consequence, with the variance or the mean square error of the estimator applied.

In the paper some simulation methods of variance estimation, among others for estimators of the mean, that can be used in sequential estimation were presented. Particular attention was devoted to the bootstrap method. This method was used to estimate the variance of the synthetic estimator of a subpopulation mean. In the cases studied, the use of the bootstrap method in sequential estimation led to estimates of the subpopulation mean with precision not exceeding a prefixed number.

REFERENCES

Bracha Cz. (1998), Metoda reprezentacyjna w badaniach opinii publicznej i marketingu, Wyd. Efekt, Warszawa.

Dol W. (1991), Small Area Estimation. A Synthesis between Sampling Theory and Econometrics, Wolters Noordhoff, Groningen.

Sen P.K. (1984), Nonparametric sequential estimation, [in:] Handbook of Statistics, vol. 4, Elsevier Science Publishers, 487-514.

Dorota Pekasiewicz

APPLICATION OF SIMULATION METHODS TO ESTIMATION OF THE VARIANCE OF A SEQUENTIAL NONPARAMETRIC ESTIMATOR OF THE MEAN

Summary

Nonparametric methods of sequential estimation make it possible, under various sampling schemes, to estimate an unknown parameter of the distribution of a random variable when the class of the distribution of this variable is unknown.

Sequential point estimation of the mean of a random variable consists in determining the value of the estimator of the mean on the basis of a random sample whose size is increased appropriately so that the risk function reaches its minimum. If we do not take into account the costs associated with drawing elements into the sample, the risk function is equal to the mean square error and, in the case of unbiased estimators, to the variance of the estimator applied.

Determining the variance of the estimator of the parameter being estimated is not always easy, and sometimes it even turns out to be impossible. In small area statistics indirect estimators are often used, which are more efficient than direct ones, but because of their complicated form we often have no information either about their variance or about an estimator of the variance (or of the mean square error). When estimators of this type are applied in sequential estimation of the mean, the problem arises of formulating a stopping procedure for the process of enlarging the sample. The paper proposes using, in such cases, simulation methods of variance estimation, among others the Mahalanobis method, the jackknife and the bootstrap method. In addition, the paper presents an example of applying the bootstrap method to estimate the variance of a synthetic estimator of the mean for a subpopulation.
