ACTA UNIVERSITATIS LODZIENSIS
FOLIA OECONOMICA 194, 2005
Dorota Pekasiewicz
APPLICATION OF SIMULATION METHODS TO ESTIMATION OF VARIANCE OF NONPARAMETRIC SEQUENTIAL ESTIMATOR OF MEAN
Abstract
Nonparametric sequential methods make it possible to estimate an unknown parameter of the distribution of a random variable when the distribution itself is unknown. These methods can be applied to different sampling designs.
This paper proposes applying simulation methods to estimate the variance of a nonparametric estimator of the mean. An application of the bootstrap method to estimating the variance of a synthetic estimator of the mean in sequential estimation is also presented.
Key words: sequential estimation, bootstrap method, synthetic estimator.
I. INTRODUCTION
Nonparametric sequential methods make it possible to estimate the unknown probability distribution function of the random variable under investigation, as well as an unknown distribution parameter, when the class of the distribution is unknown.
To estimate distribution parameters, e.g. the mean, sequential procedures of nonparametric point or interval estimation can be applied under different sequential sampling schemes.
The idea of sequential point estimation of the mean is to determine the value of its estimator from a random sample whose size minimizes the risk function. If we do not take into account the sampling costs connected with the sequential drawing of elements, the risk function is equal to the mean square error and, in the case of unbiased estimators, to the variance of the estimator. In such cases we set the estimation precision by fixing a number that the variance or the mean square error must not exceed. The sample size is increased sequentially until the desired precision of estimation is achieved.
Calculating the variance of the parameter estimator at every stage of the sequential procedure is not always easy and is sometimes impossible. Because some parameter estimators have a complicated form, we often have no information about their variance or about estimators of that variance. When estimators of this kind are applied to sequential estimation of the mean, the problem arises of defining the stopping rule for the process of increasing the sample size. In this paper we suggest using, in such cases, simulation methods of variance estimation such as the Mahalanobis method, the jackknife or the bootstrap. An example of applying the bootstrap method to estimate the variance of a synthetic estimator of a subpopulation mean is also presented.
II. NONPARAMETRIC SEQUENTIAL POINT ESTIMATION
Let $X$ be a random variable and let $\theta$ be the unknown mean value of this variable. By $\hat{\theta}_n$ we denote an estimator of parameter $\theta$ determined from the sample $X_1, \ldots, X_n$.
At every stage of a sequential process aimed at assessing the value of $\theta$ we face a statistical decision problem. We decide either to enlarge the sample by drawing one or a few more elements, or to conclude the sampling process and treat the value of the estimator arrived at as a good enough estimate of $\theta$.
When we make a decision we can define the loss function as follows (see Sen, 1984):
$$L_n = g(|\hat{\theta}_n - \theta|) + c(n), \tag{1}$$
where $g$ is a nonnegative, nondecreasing function on $(0, +\infty)$ with the property $g(0) = 0$, and $c(n)$ is the cost function associated with drawing the sample.
Let us assume that $n_0$ ($n_0 \geq 1$) exists such that for every $n \geq n_0$ the expectation $E[g(|\hat{\theta}_n - \theta|)]$ exists.
The risk incurred in the estimation of the mean $\theta$ from an $n$-element sample is given by the formula:
$$R_n = E(L_n) = E[g(|\hat{\theta}_n - \theta|)] + c(n). \tag{2}$$
The quantity $R_n$ may be viewed as the sum of two functions of the argument $n$. The function $c$ is nondecreasing (e.g. $c(n) = pn$, where $p$ denotes the cost of drawing one element), and if $\hat{\theta}_n$ is a consistent estimator of $\theta$, the function $E[g(|\hat{\theta}_n - \theta|)]$ is nonincreasing and converges to 0 (see Sen, 1984).
We make a decision at the sample size $n$ for which the function $R_n$ reaches its minimum. For fixed functions $g$ and $c$ we define $n^*$ as follows:
$$n^* = \min\{n \geq n_0 : R_n = \inf_m R_m\}. \tag{3}$$
The value of the estimator $\hat{\theta}_{n^*}$ is the minimum risk estimate of parameter $\theta$. If, in the process of the sequential point estimation of $\theta$, we do not take the sampling cost into account, the risk incurred in the sequential estimation of the parameter from an $n$-element sample depends only on $E[g(|\hat{\theta}_n - \theta|)]$.
If $g(x) = x^2$, then $R_n = E(|\hat{\theta}_n - \theta|^2)$. For an unbiased estimator $\hat{\theta}_n$ the function $R_n$ is the variance of the estimator, $R_n = D^2(\hat{\theta}_n)$, while for a biased estimator $\hat{\theta}_n$ it is equal to the mean square error, $R_n = D^2(\hat{\theta}_n) + (E(\hat{\theta}_n) - \theta)^2$.
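The mean square error decomposition quoted above follows by adding and subtracting $E(\hat{\theta}_n)$ inside the square. A short derivation, using only the symbols already defined in this section:

```latex
\begin{aligned}
E(\hat{\theta}_n - \theta)^2
  &= E\big[(\hat{\theta}_n - E\hat{\theta}_n) + (E\hat{\theta}_n - \theta)\big]^2 \\
  &= D^2(\hat{\theta}_n)
     + 2\,(E\hat{\theta}_n - \theta)\,E(\hat{\theta}_n - E\hat{\theta}_n)
     + (E\hat{\theta}_n - \theta)^2 \\
  &= D^2(\hat{\theta}_n) + (E\hat{\theta}_n - \theta)^2 ,
\end{aligned}
```

since $E(\hat{\theta}_n - E\hat{\theta}_n) = 0$. For an unbiased estimator the second term vanishes and the risk reduces to the variance.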
The sequential point estimation of parameter $\theta$ with the aid of an unbiased estimator is characterized by a stopping rule for the drawing process determined by the inequality $D^2(\hat{\theta}_n) \leq \varepsilon^2$, where $\varepsilon$ is a fixed estimation precision. That means that we keep adding elements to the sample sequentially as long as the variance of the estimator exceeds $\varepsilon^2$. In such cases the size $n^*$ that ensures the estimation of parameter $\theta$ with at least the fixed precision is defined in the following way:
$$n^* = \min\{n : \hat{D}^2(\hat{\theta}_n) \leq \varepsilon^2\}, \tag{4}$$
where $\hat{D}^2(\hat{\theta}_n)$ is an estimator of the variance $D^2(\hat{\theta}_n)$. The value of the estimator $\hat{\theta}_{n^*}$ is the estimate of parameter $\theta$ with precision $\varepsilon$, i.e. with estimated standard deviation not exceeding $\varepsilon$.
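The stopping rule (4) can be sketched as a simple loop. This is a minimal illustration, not the paper's procedure: the estimator is the plain sample mean, its variance is estimated by the closed-form $s^2/n$ (valid for independent draws), and the names `draw`, `k1`, `d` and `max_size` are assumptions introduced here.

```python
import random
import statistics


def sequential_mean(draw, eps, k1=30, d=10, max_size=100_000):
    """Enlarge the sample until the estimated variance of the mean is <= eps**2.

    draw     -- hypothetical callable returning one new observation
    k1, d    -- initial sample size and number of elements added per stage
    max_size -- safety cap so the loop always terminates (not in the paper)
    """
    sample = [draw() for _ in range(k1)]
    while True:
        estimate = statistics.fmean(sample)
        # Closed-form variance estimator of the sample mean, s^2 / n;
        # used here only as a stand-in for a generic variance estimator.
        var_hat = statistics.variance(sample) / len(sample)
        if var_hat <= eps ** 2 or len(sample) >= max_size:
            return estimate, var_hat, len(sample)
        sample.extend(draw() for _ in range(d))


rng = random.Random(0)
estimate, var_hat, n_star = sequential_mean(lambda: rng.gauss(5.0, 2.0), eps=0.1)
print(n_star, round(estimate, 3), var_hat <= 0.1 ** 2)
```

When no closed-form variance estimator is available, only the line computing `var_hat` changes; the loop itself stays the same.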
III. SIMULATION METHODS OF VARIANCE ESTIMATION
We use simulation methods to estimate the variance of estimators of the mean when we know neither the variance nor any variance estimator (usually because of the complicated formula of the estimator of the mean). Such estimators can be used in sequential estimation, but we encounter the problem of defining the stopping rule. If we do not take the sampling costs into account, the sequential estimation procedure consists in estimating the variance (mean square error) at every stage and comparing it with a fixed estimation precision.
To assess the variance of estimators of the mean, the following simulation methods are proposed for every stage of the sequential procedure:
- the Mahalanobis method;
- the jackknife method;
- the bootstrap method.
Mahalanobis method. In the first step of the Mahalanobis procedure in the sequential estimation of parameter $\theta$, from the drawn $k_1$-element sample we create $s$ disjoint subsamples ($s \geq 2$) containing $l_1^i$ elements, for $i = 1, 2, \ldots, s$. If the sample size $k_n$ ($n = 1, 2, \ldots$) is divisible by the fixed number $s$, the subsample sizes are determined from the formula:
$$l_n^i = \frac{k_n}{s} \quad \text{for } i = 1, \ldots, s. \tag{5}$$
Otherwise the sizes of the particular subsamples are given by the formula:
$$l_n^i = \begin{cases} \lfloor k_n/s \rfloor + 1 & \text{for } i = 1, \ldots, c, \\ \lfloor k_n/s \rfloor & \text{for } i = c + 1, \ldots, s, \end{cases} \tag{6}$$
where $c = k_n - s\lfloor k_n/s \rfloor$.
From each of the $s$ subsamples we determine the value of the estimator $\hat{\theta}_n^i$ ($i = 1, \ldots, s$), and then from the whole sample containing $k_1$ elements we calculate the value of the estimator $\hat{\theta}_n$ for $n = 1$. The variance of this estimator is assessed with the formula:
$$\hat{D}^2(\hat{\theta}_n) = \frac{1 - \frac{k_n}{N}}{s(s-1)} \sum_{i=1}^{s} (\hat{\theta}_n^i - \hat{\theta}_n)^2 \quad \text{for } n = 1, 2, \ldots, \tag{7}$$
where $N$ denotes the population size.
If the assessed variance is less than or equal to a fixed number $\varepsilon^2$, the value of the estimator $\hat{\theta}_1$, determined from $k_1$ elements, constitutes a good estimate of the mean of the random variable considered. Otherwise we enlarge the sample by $d$ elements ($d = 1, 2, \ldots$). At the $n$-th stage the sample has $k_n = k_1 + (n - 1)d$ elements. The subsample sizes are determined from formula (5) or (6). The value of the estimator $\hat{\theta}_n$ is determined from the $k_n$-element sample, and its variance from the $s$ values of the estimator determined from the $l_n^i$-element subsamples. The sequential point estimation procedure is repeated until the estimated variance of the estimator used is less than or equal to $\varepsilon^2$.
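One stage of the Mahalanobis procedure can be sketched as follows. The sketch assumes that the subsample sizes differ by at most one element (the first $c$ groups receive the extra element) and that the variance is the sum of squared deviations of the subsample estimates from the full-sample estimate divided by $s(s-1)$; the finite-population factor `1 - k/N` is applied only when `N` is supplied and is itself an assumption. All function and parameter names are illustrative.

```python
import random
import statistics


def subsample_sizes(k, s):
    """Split k elements into s groups whose sizes differ by at most one."""
    q, c = divmod(k, s)          # c = k - s * floor(k / s)
    return [q + 1] * c + [q] * (s - c)


def mahalanobis_variance(sample, s, estimator=statistics.fmean, N=None, rng=random):
    """Variance estimate from s disjoint random subsamples of the sample."""
    shuffled = list(sample)
    rng.shuffle(shuffled)        # random assignment of elements to subsamples
    theta_full = estimator(shuffled)
    thetas, start = [], 0
    for size in subsample_sizes(len(shuffled), s):
        thetas.append(estimator(shuffled[start:start + size]))
        start += size
    # Optional finite-population correction (assumption): 1 - k/N.
    fpc = 1.0 if N is None else 1.0 - len(shuffled) / N
    return fpc * sum((t - theta_full) ** 2 for t in thetas) / (s * (s - 1))


rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(1000)]
v = mahalanobis_variance(data, s=10, rng=rng)
print(subsample_sizes(10, 3), v)
```

For the sample mean of 1000 independent standard normal values the result should be of the order of 1/1000, which gives a quick plausibility check.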
Jackknife method. In the first step ($n = 1$) we draw $k_1$ population elements according to an arbitrary scheme other than the stratified (layer) one and, as in the Mahalanobis method, we create $s$ subsamples. These subsamples, however, are created in a different way: to form the $i$-th subsample we randomly remove from the $k_1$-element sample $l_1^i$ elements, with $l_1^i$ given by formula (5) or (6) depending on whether $k_1$ is divisible by $s$ or not.
The variance of the estimator of the mean is estimated from the subsamples consisting of $k_n - l_n^i$ elements on the basis of the formula (see Bracha, 1998):
$$\hat{D}^2(\hat{\theta}_n) = \frac{1}{s(s-1)} \sum_{i=1}^{s} (\hat{\theta}_n^{i*} - \hat{\theta}_n)^2, \tag{8}$$
where
$$\hat{\theta}_n^{i*} = s\hat{\theta}_n - (s - 1)\hat{\theta}_n^i, \tag{9}$$
and $\hat{\theta}_n$ and $\hat{\theta}_n^i$ are the estimators of the mean determined from the full $k_n$-element sample and from the $(k_n - l_n^i)$-element $i$-th subsample ($i = 1, \ldots, s$), respectively.
If $\hat{D}^2(\hat{\theta}_n)$ is greater than $\varepsilon^2$, we enlarge the sample by $d$ elements and repeat the above procedure.
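A delete-a-group jackknife stage can be sketched in the same style. The pseudo-values follow formula (9); the variance is then taken as the sum of squared deviations of the pseudo-values from the full-sample estimate, divided by $s(s-1)$. That last step is one standard variant of the grouped jackknife and is an assumption here, as are all names in the code.

```python
import random
import statistics


def jackknife_variance(sample, s, estimator=statistics.fmean, rng=random):
    """Grouped (delete-a-group) jackknife variance estimate."""
    shuffled = list(sample)
    rng.shuffle(shuffled)                    # groups are removed at random
    q, c = divmod(len(shuffled), s)
    sizes = [q + 1] * c + [q] * (s - c)      # sizes of the removed groups
    theta_full = estimator(shuffled)
    pseudo, start = [], 0
    for size in sizes:
        kept = shuffled[:start] + shuffled[start + size:]   # sample minus group i
        theta_i = estimator(kept)
        pseudo.append(s * theta_full - (s - 1) * theta_i)   # formula (9)
        start += size
    # Assumed form of the variance: deviations of pseudo-values from theta_full.
    return sum((p - theta_full) ** 2 for p in pseudo) / (s * (s - 1))


# Delete-one jackknife of the mean reproduces s^2 / n exactly.
print(jackknife_variance([1, 2, 3, 4], s=4))
```

With `s` equal to the sample size and the mean as estimator, the pseudo-values are the observations themselves, so the result for `[1, 2, 3, 4]` equals $s^2/n = 5/12$.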
When we apply the Mahalanobis or jackknife method to assess the variance of the estimator considered, we encounter the problem of choosing the number $s$ of subsamples at the beginning and at the following steps of the sequential estimation. If the sample size at a certain stage of the sequential procedure grows considerably with respect to the initial sample, then $s$ should be changed. In such cases the use of the Mahalanobis and jackknife methods is more problematic.
Another method that may be useful for assessing the variance of the estimator in sequential estimation is the bootstrap method.
Bootstrap method. In the first step of the sequential estimation we draw $k_1$ elements from the population. These sample observations allow us to determine the value of the estimator $\hat{\theta}_n$ for $n = 1$. Then from the existing sample we generate $J$ (e.g. $J = 1000$) realizations of the bootstrap sample, i.e. the sample generated according to the bootstrap distribution:
$$P(X^*_b = x_m) = \frac{1}{k_n} \quad \text{for } m = 1, \ldots, k_n \text{ and } b = 1, 2, \ldots, J. \tag{10}$$
We determine the value of the estimator $\hat{\theta}_{b,n}$ for $n = 1$ and $b = 1, 2, \ldots, J$. The variance of the estimator is assessed with the formula:
$$\hat{D}^2(\hat{\theta}_n) = \frac{1}{J} \sum_{b=1}^{J} (\hat{\theta}_{b,n} - \hat{\theta}_n)^2 \tag{11}$$
(in the first stage we assume $n = 1$).
If the condition $\hat{D}^2(\hat{\theta}_n) \leq \varepsilon^2$ does not hold, we draw the fixed number $d$ of elements and pool them with the existing sample, arriving at a sample consisting of $k_{n+1} = k_n + d$ elements, for $n = 1, 2, 3, \ldots$ For the pooled sample we determine $J$ realizations of the bootstrap sample and assess the variance of the estimator. We continue the described process until the variance assessment does not exceed $\varepsilon^2$.
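The bootstrap stage and the surrounding sequential loop can be sketched as below. The sketch assumes that the bootstrap variance is the mean squared deviation of the $J$ replicate estimates from the estimate computed on the original sample; `draw`, `max_size` and the other names are illustrative, and the sample mean is only a placeholder for a more complicated estimator.

```python
import random
import statistics


def bootstrap_variance(sample, estimator=statistics.fmean, J=1000, rng=random):
    """Mean squared deviation of J bootstrap replicates from the sample estimate."""
    theta = estimator(sample)
    k = len(sample)
    total = 0.0
    for _ in range(J):
        # Each of the k observations is drawn with probability 1/k.
        resample = rng.choices(sample, k=k)
        total += (estimator(resample) - theta) ** 2
    return total / J


def sequential_bootstrap(draw, eps, k1, d, estimator=statistics.fmean,
                         J=1000, rng=random, max_size=50_000):
    """Add d elements per stage until the bootstrap variance is <= eps**2."""
    sample = [draw() for _ in range(k1)]
    while True:
        var_hat = bootstrap_variance(sample, estimator, J, rng)
        # max_size is a safety cap added here; it is not part of the procedure.
        if var_hat <= eps ** 2 or len(sample) >= max_size:
            return estimator(sample), var_hat, len(sample)
        sample.extend(draw() for _ in range(d))   # pool d new elements


rng = random.Random(42)
est, var_hat, size = sequential_bootstrap(
    lambda: rng.gauss(0.0, 1.0), eps=0.05, k1=100, d=50, J=200, rng=rng)
print(size, round(var_hat, 5))
```

Unlike the Mahalanobis and jackknife sketches, nothing here depends on a number of subsamples $s$, so the same code works unchanged as the sample grows.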
IV. NONPARAMETRIC SEQUENTIAL ESTIMATION OF SUBPOPULATION MEAN
Let us consider the problem of estimating the mean in a distinguished subpopulation of the whole population when we do not know its distribution. If we have some information about the values of the random variable in the whole population, as well as about an auxiliary variable correlated with the variable considered, we may use it in synthetic estimators, which are more efficient than direct estimators, i.e. estimators determined from the subpopulation sample only (see Dol, 1991).
Synthetic estimators are constructed on the assumption that the parameters of the distribution of the variable investigated in the subpopulation are very close to the parameters of the distribution of this variable in the whole population.
Let us denote the variable investigated by $X$ and the auxiliary variable by $Y$. Moreover, let us assume that the population and the subpopulation are divided into $G$ layers (strata).
One of the synthetic estimators of the mean $\theta_0$ of variable $X$ for the subpopulation is given by the formula:
$$\hat{\theta}_{0n} = \frac{1}{N_0} \sum_{g=1}^{G} TY_{0g} \frac{\bar{X}_{ng}}{\bar{Y}_{\cdot g}}, \tag{12}$$
where $N_0$ is the subpopulation size, $TY_{0g}$ is the total of the auxiliary variable $Y$ in the $g$-th layer of the subpopulation, $\bar{Y}_{\cdot g}$ is the mean value of variable $Y$ in the $g$-th layer of the population, and $\bar{X}_{ng}$ is the mean value of variable $X$ in the $g$-th layer of the population estimated from the $k_n$-element sample of the whole population.
We start the estimation from the $k_1$-element sample. We calculate the value of the estimator given by formula (12) and estimate its variance by means of the bootstrap method. If the variance does not exceed $\varepsilon^2$, we conclude the estimation procedure, judging the obtained value of estimator (12) to be a good enough estimate of the subpopulation mean of variable $X$. Otherwise, we draw new elements and repeat the whole procedure.
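A ratio-type synthetic estimator built from the four quantities listed above can be sketched as follows. The exact form used here, $(1/N_0)\sum_g TY_{0g}\,\bar{X}_{ng}/\bar{Y}_{\cdot g}$, is an assumption: it is a standard way to combine these inputs into a subpopulation mean. The function name, argument names and the toy numbers are invented for illustration.

```python
def synthetic_mean(N0, TY0, Ybar, Xbar_n):
    """Ratio-type synthetic estimate of a subpopulation mean (assumed form).

    N0     -- subpopulation size
    TY0    -- totals of the auxiliary variable Y per layer of the subpopulation
    Ybar   -- population means of Y per layer
    Xbar_n -- sample-based means of X per layer of the whole population
    """
    if not (len(TY0) == len(Ybar) == len(Xbar_n)):
        raise ValueError("all per-layer inputs must have the same length")
    return sum(t * x / y for t, x, y in zip(TY0, Xbar_n, Ybar)) / N0


# Toy data: three layers with 1000 subpopulation units each, so TY0[g] / Ybar[g]
# recovers the layer sizes and the estimate is the size-weighted mean of Xbar_n.
theta_hat = synthetic_mean(
    N0=3000,
    TY0=[12000.0, 18000.0, 24000.0],
    Ybar=[12.0, 18.0, 24.0],
    Xbar_n=[4.0, 6.0, 8.0],
)
print(theta_hat)
```

Because the estimator is a nonlinear function of the layer means, its variance has no simple closed form, which is why a resampling method is applied to it.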
V. EXAMPLE OF THE APPLICATION OF BOOTSTRAP METHOD TO VARIANCE ESTIMATION OF SYNTHETIC ESTIMATOR OF MEAN
In order to present a possible application of the sequential estimation of the mean with bootstrap variance estimation at every sequential step, a population of 60 000 elements and its subpopulation of 3000 elements are generated in the following way:
1. We generate $N_1 = 20000$ values according to the $N(4, 1)$ distribution; we get values $x_1, \ldots, x_{20000}$, and the first $k_1 = 1000$ values are transformed following the formula $x_i = x_i + \xi_i$, where $\xi_i$ is generated from the $N(1, 3)$ distribution. The elements $x_1, \ldots, x_{1000}, x_{1001}, \ldots, x_{20000}$ constitute the first layer of the population and the elements $x_1, \ldots, x_{1000}$ are the first small area layer.
2. We generate $N_2 = 20000$ values according to the $N(6, 2)$ distribution; we get values $x_{20001}, \ldots, x_{40000}$, and the first $k_2 = 1000$ values are transformed following the formula from the previous point. The elements $x_{20001}, \ldots, x_{21000}, x_{21001}, \ldots, x_{40000}$ constitute the second layer of the population and the elements $x_{20001}, \ldots, x_{21000}$ are the second small area layer.
3. We generate $N_3 = 20000$ values according to the $N(8, 3)$ distribution; we get values $x_{40001}, \ldots, x_{60000}$, and the first $k_3 = 1000$ values are transformed following the formula from point 1. The elements $x_{40001}, \ldots, x_{41000}, x_{41001}, \ldots, x_{60000}$ constitute the third layer of the population and the elements $x_{40001}, \ldots, x_{41000}$ are the third small area layer.
4. We construct the sequence $y_1, \ldots, y_{60000}$ following the formula $y_i = 3x_i + \zeta_i$, where $\zeta_i$ are generated from the $N(0, \sigma)$ distribution for $\sigma = 1, 3, 5, 7$.
5. From the whole population we draw, without replacement, a sample of size 1000 and determine the value of the subpopulation mean estimator given by formula (12).
6. From the drawn sample we generate 1000 bootstrap samples and assess the variance of estimator (12) with formula (11).
7. The variance value obtained is compared with the fixed value $\varepsilon^2$, and we either conclude the procedure or draw 10 new elements from the population and start again from point 5.
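Steps 1 to 4 of the list above can be sketched as a small generator. The sketch assumes that the second parameter of $N(\cdot, \cdot)$ is a standard deviation (the text does not say whether it is a standard deviation or a variance); the function name, the returned layout and the seed are illustrative choices.

```python
import random
import statistics


def generate_population(sigma, rng, layer_size=20_000, area_size=1_000):
    """Return x, y, layer index and small-area flag for all 60 000 elements."""
    layer_params = [(4.0, 1.0), (6.0, 2.0), (8.0, 3.0)]   # (mean, assumed sd)
    x, layer, in_area = [], [], []
    for g, (mu, sd) in enumerate(layer_params):
        for i in range(layer_size):
            value = rng.gauss(mu, sd)
            small = i < area_size            # first 1000 elements of each layer
            if small:
                value += rng.gauss(1.0, 3.0)  # shift defining the small area
            x.append(value)
            layer.append(g)
            in_area.append(small)
    # Auxiliary variable correlated with x: y_i = 3 * x_i + noise.
    y = [3.0 * xi + rng.gauss(0.0, sigma) for xi in x]
    return x, y, layer, in_area


x, y, layer, in_area = generate_population(sigma=1.0, rng=random.Random(7))
theta0 = statistics.fmean(xi for xi, a in zip(x, in_area) if a)
print(len(x), sum(in_area), round(theta0, 2))
```

Under these assumptions the expected subpopulation mean is $(5 + 7 + 9)/3 = 7$, so a generated `theta0` close to 7 is a basic sanity check of the construction.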
The estimates of the mean of the subpopulation considered, computed with the help of sequential estimation with bootstrap estimation of variance, are presented in Table 1.
Table 1. The sizes of samples for sequential subpopulation mean estimation for fixed precisions $\varepsilon$

Number of experiment | Standard deviation $\sigma$ | Value of $\varepsilon$ | Value of estimator $\hat{\theta}_0$ | $|\hat{\theta}_0 - \theta_0|$ | Sample size
1  | 1 | 0.03 | 7.0636 | 0.0281 | 1000
2  | 1 | 0.01 | 7.0381 | 0.0026 | 1500
3  | 3 | 0.06 | 7.0823 | 0.0468 | 1000
4  | 3 | 0.03 | 7.0536 | 0.0284 | 1540
5  | 3 | 0.01 | 7.0423 | 0.0068 | 3750
6  | 5 | 0.09 | 7.1178 | 0.0823 | 1000
7  | 5 | 0.06 | 7.0399 | 0.0533 | 1190
8  | 5 | 0.03 | 7.0721 | 0.0366 | 4160
9  | 7 | 0.09 | 7.1342 | 0.0987 | 1000
10 | 7 | 0.08 | 7.0928 | 0.0773 | 1180
11 | 7 | 0.06 | 7.0641 | 0.0586 | 3230

Source: Author's calculations.
The actual value of $\theta_0$ was 7.0355. In most of the experiments carried out (all except experiments 8 and 9) the estimate differed from the actual value of $\theta_0$ by no more than the fixed value $\varepsilon$. This means that the use of the bootstrap to assess the variance of the estimator of the mean was successful in the cases analysed. The sequential sample size was closely connected with the prefixed accuracy of estimation and, as expected, it grew as the required accuracy increased (i.e. as $\varepsilon$ decreased).
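The statement about experiments 8 and 9 can be checked directly against Table 1. The sketch below copies the experiment number, the value of $\varepsilon$ and the reported error column from the table and lists the experiments in which the error exceeds $\varepsilon$; the tuple layout and variable names are illustrative.

```python
# (experiment number, epsilon, reported error) taken from Table 1
rows = [
    (1, 0.03, 0.0281), (2, 0.01, 0.0026), (3, 0.06, 0.0468),
    (4, 0.03, 0.0284), (5, 0.01, 0.0068), (6, 0.09, 0.0823),
    (7, 0.06, 0.0533), (8, 0.03, 0.0366), (9, 0.09, 0.0987),
    (10, 0.08, 0.0773), (11, 0.06, 0.0586),
]

# Experiments whose reported error is larger than the fixed precision.
missed = [number for number, eps, error in rows if error > eps]
print(missed)  # [8, 9]
```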
VI. FINAL REMARKS
The estimation of the mean with the help of sequential methods requires a criterion for stopping the sequential sampling and, in consequence, knowledge of the variance or the mean square error of the estimator applied.
The paper presented some simulation methods of variance estimation that can be used in sequential estimation, among others for estimators of the mean. Particular attention was devoted to the bootstrap method. This method was used to estimate the variance of the synthetic estimator of a subpopulation mean. In the cases studied, the use of the bootstrap method in sequential estimation led to estimates of the subpopulation mean with precision not exceeding a prefixed number.
REFERENCES
Bracha Cz. (1998), Metoda reprezentacyjna w badaniach opinii publicznej i marketingu, Wyd. Efekt, Warszawa.
Dol W. (1991), Small Area Estimation. A Synthesis between Sampling Theory and Econometrics, Wolters Noordhoff, Groningen.
Sen P.K. (1984), Nonparametric sequential estimation, [in:] Handbook of Statistics, vol. 4, Elsevier Science Publishers, 487-514.
Dorota Pekasiewicz
APPLICATION OF SIMULATION METHODS TO ESTIMATING THE VARIANCE OF A SEQUENTIAL NONPARAMETRIC ESTIMATOR OF THE MEAN
Summary
Nonparametric methods of sequential estimation make it possible, under various sampling schemes, to estimate an unknown parameter of the distribution of a random variable when the class of the distribution of this variable is unknown.
Sequential point estimation of the mean of a random variable consists in determining the value of the estimator of the mean from a random sample whose size is suitably increased so that the risk function reaches its minimum. If we do not take into account the costs of drawing elements into the sample, the risk function is equal to the mean square error and, in the case of unbiased estimators, to the variance of the estimator used.
Determining the variance of the estimator of the parameter being estimated is not always easy, and sometimes it even proves impossible. In small area statistics, indirect estimators are often used; they are more efficient than direct ones, but because of their complicated form we often have no information about their variance or about an estimator of the variance (or of the mean square error). When estimators of this type are used in sequential estimation of the mean, the problem arises of formulating the stopping rule for the process of enlarging the sample. The paper proposes using, in such cases, simulation methods of variance estimation, among others the Mahalanobis method, the jackknife and the bootstrap method. In addition, the paper presents an example of applying the bootstrap method to estimating the variance of a synthetic estimator of the mean for a subpopulation.