ACTA UNIVERSITATIS LODZIENSIS
FOLIA OECONOMICA 175, 2004
Tomasz Żądło*
ON MEAN SQUARE ERROR OF SYNTHETIC REGRESSION ESTIMATOR
Abstract. The problem of estimating the total value in a small domain is considered. Because of the small number (or the lack) of elements of the considered subpopulation in the sample, information on all drawn elements is used. The synthetic regression estimator is presented. Equations of the bias and the mean square error (MSE) for any sampling design are derived. The assumptions on the structure of the population and of the domain that reduce the bias and the MSE are discussed. The influence of the bias on the accuracy of the estimation is presented. It is shown that the MSE and the bias may increase as the sample size increases. Approximate equations of the bias and the mean square error for simple random sampling without replacement are derived. The accuracy of the synthetic regression estimator (based on the approximate equations and on a simulation study) is compared with the accuracy of the direct Horvitz-Thompson estimator. The comparison is based on agricultural data from the Dąbrowa Tarnowska region. The entire population consists of 8624 farms and includes the domain of interest, Bolesław commune, with 588 farms.
Key words: small area statistics, indirect estimators, synthetic estimators.
1. INTRODUCTION
Synthetic estimation is considered both in Polish (e.g. Bracha 1994, Bracha 1996, Getka-Wilczyńska 2000) and in foreign literature (e.g. Särndal, Swensson, Wretman 1992). In this paper equations of the mean square error and the bias of the synthetic regression estimator are presented, taking differences between the structure of the population and the structure of the domain into consideration.
The term "synthetic" means that the estimator uses information both on the surveyed small domain and period of time and on other domains and/or periods of time. The estimator considered in this paper is a domain indirect estimator. The idea of synthetic estimation is based on the assumption that some relationship between the variable of interest and the auxiliary variable observed in the entire population is the same in the small area. Because most often this assumption is not met, it must be stressed that synthetic estimators are biased.
2. MEAN SQUARE ERROR (MSE) FOR ANY SAMPLING DESIGN
Considerations are conducted for any sampling design. It is assumed that the sample $S$ is drawn from the entire population by a sampling design $P(S)$ with first-order inclusion probabilities $\pi_i$, where $i = 1, \ldots, N$. For any sample $S$ of size $n$ drawn from the population $\Omega$ of size $N$, $S_d = S \cap \Omega_d$, where the $d$-th domain is denoted by $\Omega_d$. The size of $S_d$ equals $n_d$ (a random variable) and the size of $\Omega_d$ equals $N_d$. The set of elements of the population which belong to the $d$-th domain $\Omega_d$ can be written as $\Omega_d = S_d \cup \bar{S}_d$, where $\bar{S}_d$ denotes the elements of the $d$-th domain which were not drawn to the sample. Equations of the MSE and the bias for simple random sampling without replacement will also be derived in this paper.
The synthetic regression estimator of the total value in a small domain, which is most often more precise than the synthetic ratio estimator, is as follows:
\[ \hat{Y}_d^{SYN\text{-}regr} = N_d\left[\hat{\bar{Y}} + \hat{\beta}\,(\bar{x}_d - \hat{\bar{X}})\right] = \frac{N_d}{N}\hat{Y}^{regr} + N_d\,\hat{\beta}\,(\bar{x}_d - \bar{x}) \qquad (1) \]
where
$N_d$ - small domain size, $N$ - population size,
$\hat{\bar{Y}} = \frac{1}{N}\sum_{i \in S}\frac{y_i}{\pi_i}$ - Horvitz-Thompson (HT) estimator of the mean value of the variable of interest in the population,
$\hat{\bar{X}} = \frac{1}{N}\sum_{i \in S}\frac{x_i}{\pi_i}$ - HT estimator of the mean value of the auxiliary variable in the population,
$\bar{x}_d$ - the mean value of the auxiliary variable in the $d$-th small domain, $\bar{x}$ - the mean value of the auxiliary variable in the population,
$\hat{\beta} = \dfrac{\sum_{i \in S}(x_i - \hat{\bar{X}})(y_i - \hat{\bar{Y}})/\pi_i}{\sum_{i \in S}(x_i - \hat{\bar{X}})^2/\pi_i}$ - the estimator of the regression coefficient in the population,
$\hat{Y}^{regr} = N\left(\hat{\bar{Y}} + \hat{\beta}\,(\bar{x} - \hat{\bar{X}})\right)$ - regression estimator of the total value in the population.
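As a rough illustration of equation (1), the following Python sketch computes the synthetic regression estimate from sample data. The function name and its arguments are hypothetical, and the slope is assumed to be the usual design-weighted least-squares estimator; this is a sketch under these assumptions, not the author's implementation.

```python
def synthetic_regression_total(y, x, pi, N, N_d, xbar_d):
    """Synthetic regression estimate of a domain total, in the spirit of equation (1).

    y, x   -- sampled values of the variable of interest and the auxiliary variable
    pi     -- first-order inclusion probabilities of the sampled units
    N, N_d -- population size and domain size
    xbar_d -- known mean of the auxiliary variable in the domain
    (all names are illustrative assumptions)
    """
    # Horvitz-Thompson estimators of the population means of y and x
    y_mean_ht = sum(yi / p for yi, p in zip(y, pi)) / N
    x_mean_ht = sum(xi / p for xi, p in zip(x, pi)) / N
    # Design-weighted estimator of the regression slope (assumed form)
    num = sum((xi - x_mean_ht) * (yi - y_mean_ht) / p for xi, yi, p in zip(x, y, pi))
    den = sum((xi - x_mean_ht) ** 2 / p for xi, p in zip(x, pi))
    beta_hat = num / den
    # Population-level regression line evaluated at the domain mean of x
    return N_d * (y_mean_ht + beta_hat * (xbar_d - x_mean_ht))
```

When the sampled points lie exactly on a line through the origin, the estimate reduces to the domain size times the fitted value at the domain mean of the auxiliary variable.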
Let the bias of the regression estimator of the total value in the population be denoted by $B^{regr} = E(\hat{Y}^{regr} - Y)$. The bias $B^{regr}$ for simple random sampling without replacement is as follows (e.g. Wywiał 1992, p. 137):
\[ B^{regr} = E(\hat{Y}^{regr}) - Y = -\frac{N}{n}\,\frac{N-n}{N-2}\,\sqrt{c_2(y)\,[B_2(x) - 1]}\,\{k_{21}(x,y) - k_3(x)\,r(x,y)\} + O(N n^{-2}) \qquad (2) \]
where
\[ k_3(x) = \frac{c_3(x)}{\sqrt{c_2(x)\,[c_4(x) - c_2^2(x)]}}, \quad k_{21}(x,y) = \frac{c_{21}(x,y)}{\sqrt{c_2(y)\,[c_4(x) - c_2^2(x)]}}, \quad B_2(x) = c_4(x)\,c_2^{-2}(x), \quad r(x,y) = \frac{c_{11}(x,y)}{\sqrt{c_2(x)\,c_2(y)}}, \]
\[ c_r(x) = \frac{1}{N}\sum_{i \in \Omega}(x_i - \bar{x})^r, \quad c_r(y) = \frac{1}{N}\sum_{i \in \Omega}(y_i - \bar{y})^r, \quad c_{rt}(x,y) = \frac{1}{N}\sum_{i \in \Omega}(x_i - \bar{x})^r (y_i - \bar{y})^t. \]
Let us derive the equation of the bias of the synthetic regression estimator of the total value in a small domain for any sampling design:
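\[ E(\hat{Y}_d^{SYN\text{-}regr}) - Y_d = \frac{N_d}{N}E(\hat{Y}^{regr}) + E(\hat{\beta})\,N_d(\bar{x}_d - \bar{x}) - Y_d - \frac{N_d}{N}Y + \frac{N_d}{N}Y = \]
\[ = \frac{N_d}{N}B^{regr} + E(\hat{\beta})\,N_d(\bar{x}_d - \bar{x}) + \left(\frac{N_d}{N}Y - Y_d\right) = \]
\[ = \frac{N_d}{N}B^{regr} - E(\hat{\beta})\left(\frac{N_d}{N}X - X_d\right) + \left(\frac{N_d}{N}Y - Y_d\right). \qquad (3) \]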
If the mean value of the variable of interest in the population equals its mean value in the $d$-th small domain, and if the mean value of the auxiliary variable in the population equals its mean value in the $d$-th small domain, then $E(\hat{Y}_d^{SYN\text{-}regr}) - Y_d = \frac{N_d}{N}B^{regr}$. In this case the bias of the synthetic regression estimator of the total value in the small domain is a function of the bias of the regression estimator of the total value in the population, and the smaller the ratio of the domain size to the population size, the smaller this bias. It is worth stressing that for simple random sampling $B^{regr}$ is of order $O(N n^{-1})$, so it decreases as the sample size increases.
Equation (3) can also be written as follows:
\[ E(\hat{Y}_d^{SYN\text{-}regr}) - Y_d = \left[\frac{N_d}{N}B^{regr} - (E(\hat{\beta}) - \beta)\left(\frac{N_d}{N}X - X_d\right)\right] + \left[\left(\frac{N_d}{N}Y - Y_d\right) - \beta\left(\frac{N_d}{N}X - X_d\right)\right]. \]
Let us notice that the value of $\left(\frac{N_d}{N}Y - Y_d\right) - \beta\left(\frac{N_d}{N}X - X_d\right)$ does not depend on the sample size. Let us consider the expression $\frac{N_d}{N}B^{regr} - (E(\hat{\beta}) - \beta)\left(\frac{N_d}{N}X - X_d\right)$. The absolute value of the bias of the regression estimator of the total value in the population, denoted by $B^{regr}$, decreases as the sample size increases. The bias of the estimator of the regression coefficient, $E(\hat{\beta}) - \beta$, has the same property. Because the limit of a sum of sequences equals the sum of their limits, the absolute value of the expression $\frac{N_d}{N}B^{regr} - (E(\hat{\beta}) - \beta)\left(\frac{N_d}{N}X - X_d\right)$ decreases as the sample size increases. The absolute value of this expression can decrease monotonically as the sample size increases. It should be mentioned that if the values of the two elements into which the bias of the synthetic regression estimator was decomposed have opposite signs, then the absolute value of the bias of the synthetic regression estimator can grow as the sample size increases. It is also possible that the absolute value of the bias of the synthetic regression estimator does not change monotonically as the sample size increases.
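The decomposition above can be sketched numerically. The following Python functions are a minimal illustration of equation (3) and of its split into a sample-size-dependent and a sample-size-free component; all function and argument names are hypothetical, and the inputs (such as `B_regr` and `E_beta`) are assumed to be supplied by the user.

```python
def synthetic_bias(N_d, N, B_regr, E_beta, X, X_d, Y, Y_d):
    """Bias of the synthetic regression estimator following equation (3)."""
    share = N_d / N
    return share * B_regr - E_beta * (share * X - X_d) + (share * Y - Y_d)


def bias_split(N_d, N, B_regr, E_beta, beta, X, X_d, Y, Y_d):
    """Split the bias into the part that shrinks with n and the part that does not."""
    share = N_d / N
    # Component driven by B_regr and the bias of the slope estimator
    n_dependent = share * B_regr - (E_beta - beta) * (share * X - X_d)
    # Component determined only by the structure of the population and the domain
    n_free = (share * Y - Y_d) - beta * (share * X - X_d)
    return n_dependent, n_free
```

For instance, if the sample-size-free component were +10 while the other component moved from -8 to -2 as the sample grew, the absolute bias would rise from 2 to 8, which is the effect described in the text.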
Analysing equation (3), it is worth stressing that when synthetic regression estimators are used it should be assumed that:
\[ \frac{N_d}{N} = \frac{Y_d}{Y} \quad \text{and} \quad \frac{N_d}{N} = \frac{X_d}{X}, \qquad (4) \]
which can be written as:
\[ \frac{N_d}{N} = \frac{X_d}{X} = \frac{Y_d}{Y}. \]
Comparing the aforementioned assumption (4) with the assumption used for the synthetic ratio estimator, given by the expression $\frac{X_d}{X} = \frac{Y_d}{Y}$, it should be mentioned that the assumption for synthetic regression estimators is more restrictive.
Let us derive the equation of the mean square error of the estimator $\hat{Y}_d^{SYN\text{-}regr}$, taking assumption (4) into consideration.
\[ MSE(\hat{Y}_d^{SYN\text{-}regr}) = E\left(\frac{N_d}{N}\hat{Y}^{regr} + N_d\,\hat{\beta}\,(\bar{x}_d - \bar{x}) - Y_d\right)^2 = \]
\[ = E\left(\frac{N_d}{N}(\hat{Y}^{regr} - Y) + \left(\frac{N_d}{N}Y - Y_d\right) - \hat{\beta}\left(\frac{N_d}{N}X - X_d\right)\right)^2 = \]
\[ = \left(\frac{N_d}{N}\right)^2 MSE(\hat{Y}^{regr}) + \left(\frac{N_d}{N}Y - Y_d\right)^2 + \left(D^2(\hat{\beta}) + E^2(\hat{\beta})\right)\left(\frac{N_d}{N}X - X_d\right)^2 + \]
\[ + 2\frac{N_d}{N}\left(\frac{N_d}{N}Y - Y_d\right)B^{regr} - 2\frac{N_d}{N}\left(\frac{N_d}{N}X - X_d\right)\left(cov(\hat{Y}^{regr}, \hat{\beta}) + E(\hat{\beta})B^{regr}\right) - 2E(\hat{\beta})\left(\frac{N_d}{N}Y - Y_d\right)\left(\frac{N_d}{N}X - X_d\right). \qquad (5) \]
If assumption (4) is met, then $MSE(\hat{Y}_d^{SYN\text{-}regr}) = \left(\frac{N_d}{N}\right)^2 MSE(\hat{Y}^{regr})$.
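Equation (5) can also be written as follows:
\[ MSE(\hat{Y}_d^{SYN\text{-}regr}) = \left(\frac{N_d}{N}\right)^2 D^2(\hat{Y}^{regr}) + D^2(\hat{\beta})\left(\frac{N_d}{N}X - X_d\right)^2 - 2\frac{N_d}{N}\left(\frac{N_d}{N}X - X_d\right)cov(\hat{Y}^{regr}, \hat{\beta}) + \left(\frac{N_d}{N}B^{regr} - E(\hat{\beta})\left(\frac{N_d}{N}X - X_d\right) + \left(\frac{N_d}{N}Y - Y_d\right)\right)^2. \qquad (6) \]
The first three elements of the sum (6) form the variance of the estimator $\hat{Y}_d^{SYN\text{-}regr}$. It can be written by an alternative expression:
\[ D^2(\hat{Y}_d^{SYN\text{-}regr}) = D^2\left(\frac{N_d}{N}\hat{Y}^{regr} - \hat{\beta}\left(\frac{N_d}{N}X - X_d\right)\right). \]
The fourth element of the sum (6) is the squared bias of the estimator $\hat{Y}_d^{SYN\text{-}regr}$, which is given by equation (3).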
The value of $D^2(\hat{Y}_d^{SYN\text{-}regr})$ decreases as the sample size increases. The value of the squared bias of the synthetic regression estimator, given by the square of expression (3), can increase as the sample size increases. If the increase of the squared bias due to the increase of the sample size is higher than the corresponding decrease of $D^2(\hat{Y}_d^{SYN\text{-}regr})$, then the value of $MSE(\hat{Y}_d^{SYN\text{-}regr})$ will increase as the sample size increases. It is also possible that the value of the MSE of the synthetic regression estimator does not change monotonically as the sample size increases.
3. MEAN SQUARE ERROR FOR SIMPLE RANDOM SAMPLING WITHOUT REPLACEMENT
The accuracy of the synthetic ratio estimator will be compared with the accuracy of the Horvitz-Thompson estimator of the total value in the domain, which for simple random sampling without replacement is given by the equation:
\[ \hat{Y}_d^{HT} = \frac{N}{n}\sum_{i \in S_d} y_i. \]
Its variance is as follows (Särndal, Swensson, Wretman 1992):
\[ D^2(\hat{Y}_d^{HT}) = N^2\,\frac{N-n}{Nn}\left(\frac{N_d - 1}{N - 1}S_{*yd}^2 + \frac{N_d}{N-1}\,\frac{N - N_d}{N}\,\bar{y}_d^2\right) \approx N^2\,\frac{N-n}{Nn}\,\frac{N_d}{N}\left(S_{*yd}^2 + \frac{N - N_d}{N}\,\bar{y}_d^2\right), \]
where $S_{*yd}^2 = \frac{1}{N_d - 1}\sum_{i=1}^{N_d}(y_i - \bar{y}_d)^2$, $\bar{y}_d = \frac{1}{N_d}\sum_{i=1}^{N_d} y_i$.
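A minimal Python sketch of the variance of the direct Horvitz-Thompson domain estimator under simple random sampling without replacement is given below. It uses the standard textbook form obtained by treating units outside the domain as zeros; the function name is hypothetical, and the expression is an assumption about the exact form intended in the text.

```python
def ht_domain_variance(y_domain, N, n):
    """Design variance of the HT estimator of a domain total under SRSWOR.

    y_domain -- values of the variable of interest for ALL units of the domain
    N, n     -- population size and sample size
    """
    N_d = len(y_domain)
    mean_d = sum(y_domain) / N_d
    # Within-domain variance with divisor N_d - 1
    s2_d = sum((v - mean_d) ** 2 for v in y_domain) / (N_d - 1)
    fpc = (N - n) / (N * n)
    # Within-domain spread plus the contribution of the random domain sample size
    inner = (N_d - 1) * s2_d + N_d * (1 - N_d / N) * mean_d ** 2
    return N ** 2 * fpc * inner / (N - 1)
```

The second term in `inner` reflects the fact that the number of sampled domain units is random, which is why the direct estimator can be very imprecise when the domain is small.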
The synthetic regression estimator, given by equation (1) for any sampling design, for simple random sampling without replacement takes the form:
\[ \hat{Y}_d^{SYN\text{-}regr} = N_d\left[\bar{y}_s + \hat{\beta}\,(\bar{x}_d - \bar{x}_s)\right], \]
where $\hat{\beta} = \dfrac{\hat{S}_{*xy}}{\hat{S}_{*x}^2}$, $\hat{S}_{*x}^2 = \frac{1}{n-1}\sum_{i \in S}(x_i - \bar{x}_s)^2$, $\hat{S}_{*xy} = \frac{1}{n-1}\sum_{i \in S}(x_i - \bar{x}_s)(y_i - \bar{y}_s)$, $\bar{x}_s = \frac{1}{n}\sum_{i=1}^{n} x_i$, $\bar{y}_s = \frac{1}{n}\sum_{i=1}^{n} y_i$.
Using the following expressions: $S_{*x}^2 = \frac{1}{N-1}\sum_{i \in \Omega}(x_i - \bar{x})^2$, $S_{*y}^2 = \frac{1}{N-1}\sum_{i \in \Omega}(y_i - \bar{y})^2$, $S_{*xy} = \frac{1}{N-1}\sum_{i \in \Omega}(x_i - \bar{x})(y_i - \bar{y})$ and $\hat{S}_{*y}^2 = \frac{1}{n-1}\sum_{i \in S}(y_i - \bar{y}_s)^2$, let us specify the above-mentioned elements of equation (5) of $MSE(\hat{Y}_d^{SYN\text{-}regr})$ for simple random sampling without replacement.
First (see e.g. Bracha 1996),
\[ MSE(\hat{Y}^{regr}) = N^2\,\frac{N-n}{Nn}\,S_{*y}^2\,(1 - r^2(x,y)) + O(N^2 n^{-2}). \]
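The leading term of this approximation is simple to evaluate when the whole population is known, as in a simulation setting. The sketch below is illustrative only: the function name is hypothetical and the remainder term $O(N^2 n^{-2})$ is ignored.

```python
def regression_total_mse_approx(y, x, n):
    """Leading term of the MSE of the regression estimator of a total under SRSWOR.

    y, x -- values of both variables for the whole population
    n    -- sample size
    """
    N = len(y)
    ybar = sum(y) / N
    xbar = sum(x) / N
    # Population variances and covariance with divisor N - 1
    s2_y = sum((v - ybar) ** 2 for v in y) / (N - 1)
    s2_x = sum((v - xbar) ** 2 for v in x) / (N - 1)
    s_xy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / (N - 1)
    r2 = s_xy ** 2 / (s2_x * s2_y)
    return N ** 2 * (N - n) / (N * n) * s2_y * (1 - r2)
```

With a squared correlation close to one the leading term is close to zero, which is why a strong auxiliary variable makes the regression estimator precise.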
Second,
where
M SEO » = M S e ( ^ ) = - У - D \ ? xy) - 2 S ' ;yc ov (S. xy, § } x) + U*y
+ ( ' |г У е>! (Я , f 0 ( п ~ ),
+ 0 ( п - 2) (7) w here (see e.g. W y w i a ł 2 0 0 0)
\[ D^2(\hat{S}_{*xy}) = \frac{1}{n}\left(cov_{*22}(x,y) - cov_{*11}^2(x,y)\right) + O(N^{-1}) + O(n^{-2}), \]
\[ D^2(\hat{S}_{*x}^2) = \frac{1}{n}\left(cov_{*4}(x) - cov_{*2}^2(x)\right) + O(N^{-1}) + O(n^{-2}), \]
and (the equation is derived in Part A.3 of the Appendix)
\[ cov(\hat{S}_{*xy}, \hat{S}_{*x}^2) = \frac{1}{n}\left(cov_{*31}(x,y) - cov_{*2}(x)\,cov_{*11}(x,y)\right) + O(N^{-1}) + O(n^{-2}). \]
Third (the equation is derived in Part A.1 of the Appendix),
+ /утерт Йч = N N ~nfcOV.12(x, y) _ COV.u (x, y)COV.21(x, y) COV?n (x, y)COV.3(x)
C0V ’ n N — 2 y c o v .2(x) cov.22(x) c o vJj(x)
- C0V*11^ ’^ ( NNM "\/cov2(y)[52(x) - \}{kn (x, у) - k3(x)r(x, у)Л + 0 { N - n ' 2) + 0(n~l).
cov.2(x) \ n N - 2 /
Fourth, $B^{regr}$ for simple random sampling without replacement is given by equation (2).
The bias of the synthetic regression estimator, given for any sampling design by equation (3), for simple random sampling without replacement is obtained using equation (2) as follows:
N d / N N — n
COV.u (x, y) 1 1 (
~ ^ x T ■ y) “ У)) +
4. SIMULATION STUDY
The comparison is based on agricultural data on 8624 farms from the Dąbrowa Tarnowska region. The approximate equations of the bias and the MSE derived in Section 3 will be used. The comparison will also be supported by a simulation study in which 500 samples will be drawn at random. The data include information on sowing area (in 100 square meters), which is the variable of interest, and arable area (in 100 square meters), which is the auxiliary variable. The value of the correlation coefficient between these variables equals 0.974. Bolesław commune is treated as the domain of interest. It includes 588 farms. Two sample sizes are considered, 86 and 259 elements, which equal 1% and 3% of the population size. Let us analyse the assumptions given by equation (4). The value of $\frac{N_d}{N}$ = 0.0682, the value of $\frac{X_d}{X}$ = 0.0922 and the value of $\frac{Y_d}{Y}$ = 0.0968. The difference between the first ratio and the other ratios is significant.
Let us analyse the results presented in Table 1. To begin with, the high relative efficiency of the synthetic estimator (the ratio of the MSE of the synthetic estimator to the variance of the Horvitz-Thompson estimator) must be stressed, but it should also be pointed out that it decreases as the sample size increases. The reason is that the MSE of the synthetic ratio estimator includes an element which does not depend on the sample size. The value of the MSE of the synthetic regression estimator equals ca. 1% of the variance of the Horvitz-Thompson estimator for the sample size of 86 elements and 3% for the sample size of 259 elements. Because of the possibility of low efficiency of estimation, synthetic regression estimation should be used only for small domain estimation purposes. The value of the bias has a strong influence on the accuracy of estimation. It does not exceed 5% of the true total value and it decreases as the sample size increases for the analysed sample sizes. The simulation study shows that for the sample size of 86 elements the variance equals ca. 12% of the MSE, and for the sample size of 259 elements only ca. 3%.
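The logic of such a simulation study can be sketched in Python as follows. The population below is entirely hypothetical (it is not the Dąbrowa Tarnowska data), the function name and default parameters are assumptions, and only the standard library is used; the sketch merely shows how the empirical bias, variance and MSE of both estimators can be obtained from repeated simple random samples drawn without replacement.

```python
import random
import statistics


def simulate(n, reps=500, seed=0):
    """Monte Carlo comparison of the synthetic regression and HT domain estimators."""
    rng = random.Random(seed)
    N, N_d = 2000, 150  # hypothetical population and domain sizes
    # Hypothetical population: the first N_d units form the domain and are
    # shifted upwards, so assumption (4) is deliberately violated.
    x = [rng.uniform(10, 100) for _ in range(N)]
    y = [0.9 * xi + rng.gauss(0, 3) + (2.0 if i < N_d else 0.0)
         for i, xi in enumerate(x)]
    Y_d = sum(y[:N_d])
    xbar_d = sum(x[:N_d]) / N_d

    syn, ht = [], []
    for _ in range(reps):
        s = rng.sample(range(N), n)  # simple random sample without replacement
        xs = [x[i] for i in s]
        ys = [y[i] for i in s]
        xm, ym = sum(xs) / n, sum(ys) / n
        beta = (sum((a - xm) * (b - ym) for a, b in zip(xs, ys))
                / sum((a - xm) ** 2 for a in xs))
        syn.append(N_d * (ym + beta * (xbar_d - xm)))
        ht.append(N / n * sum(y[i] for i in s if i < N_d))

    def summary(est):
        bias = statistics.fmean(est) - Y_d
        variance = statistics.pvariance(est)
        mse = statistics.fmean((e - Y_d) ** 2 for e in est)
        return {"bias": bias, "variance": variance, "mse": mse}

    return summary(syn), summary(ht)
```

Calling `simulate(50)` and `simulate(150)` and comparing the returned dictionaries gives the same kind of summary as the table in this section: the synthetic estimator has a much smaller MSE than the direct one, but its MSE is dominated by a bias that hardly changes with the sample size.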
Table 1
Results
                                            Horvitz-Thompson            Synthetic ratio             Synthetic ratio
                                            estimator                   estimator - simulation      estimator - Taylor
                                            sample size                 sample size                 sample size
                                            86            259           86            259           86            259
Square root of MSE                          127 462.72    72 700.505    13 545.723    12 954.143    12 194.238    11 308.607
Relative square root of MSE (in %)          47.9434       27.3453       5.0950        4.8725        4.5867        4.2536
Standard error                              127 462.72    72 700.505    4 709.3590    2 739.6764    3 009.1891    1 808.8175
Relative standard error (in %)              47.9434       27.3453       1.7714        1.0305        1.1319        0.6804
Bias                                        0             0             -12 700.73    -12 661.12    -11 817.12    -11 163.01
Relative bias (in %)                        0             0             -4.7772       -4.7623       -4.4448       -4.1988
Ratio of variance and MSE                   1             1             0.1209        0.0447        0.0609        0.0256
Ratio of MSE and variance of HT estimator   1             1             0.0113        0.0317        0.0091        0.0242
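The rows of Table 1 are linked by the identity MSE = variance + bias squared. The short Python check below, whose names are illustrative, recomputes the square root of the MSE from the standard errors and biases reported in the table for the four synthetic-estimator columns.

```python
import math

# (standard error, bias, square root of MSE) copied from Table 1
TABLE_1 = {
    "simulation, n=86": (4709.3590, -12700.73, 13545.723),
    "simulation, n=259": (2739.6764, -12661.12, 12954.143),
    "Taylor, n=86": (3009.1891, -11817.12, 12194.238),
    "Taylor, n=259": (1808.8175, -11163.01, 11308.607),
}


def rmse_from_parts(standard_error, bias):
    """Square root of the MSE implied by a standard error and a bias."""
    return math.sqrt(standard_error ** 2 + bias ** 2)


for label, (se, bias, rmse) in TABLE_1.items():
    # Each recomputed value should agree with the tabulated one up to rounding
    print(label, round(rmse_from_parts(se, bias), 3), rmse)
```

The check also makes visible how strongly the bias dominates: the squared bias accounts for most of the MSE in every synthetic column.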
5. CONCLUSION
Summing up, in the paper the mean square error of the synthetic regression estimator of the total value in a small domain is considered. The error was presented in a convenient way as a function of, inter alia, the bias of the ratio estimator of the population mean value and the bias resulting from the fact that assumption (4) is not met. The accuracy of the estimator was compared in simulation studies with the accuracy of the direct Horvitz-Thompson estimator for simple random sampling. They confirmed the known fact that the synthetic estimator should be recommended for small samples.
APPENDIX
A.1. DERIVATIONS
In this part of the appendix an equation of $cov(\hat{Y}^{regr}, \hat{\beta})$ for simple random sampling without replacement will be derived. The following notation will be used:
\[ cov_r(x) = \frac{1}{N}\sum_{i \in \Omega}(x_i - \bar{x})^r, \quad cov_{*r}(x) = \frac{1}{N-1}\sum_{i \in \Omega}(x_i - \bar{x})^r, \]
\[ cov_r(y) = \frac{1}{N}\sum_{i \in \Omega}(y_i - \bar{y})^r, \quad cov_{*r}(y) = \frac{1}{N-1}\sum_{i \in \Omega}(y_i - \bar{y})^r, \]
\[ cov_{rk}(x,y) = \frac{1}{N}\sum_{i \in \Omega}(x_i - \bar{x})^r(y_i - \bar{y})^k, \quad cov_{*rk}(x,y) = \frac{1}{N-1}\sum_{i \in \Omega}(x_i - \bar{x})^r(y_i - \bar{y})^k, \]
\[ S_x^2 = cov_2(x), \quad S_y^2 = cov_2(y), \quad S_{xy} = cov_{11}(x,y), \quad S_{*x}^2 = cov_{*2}(x), \quad S_{*y}^2 = cov_{*2}(y), \quad S_{*xy} = cov_{*11}(x,y), \]
\[ \hat{S}_{*x}^2 = \frac{1}{n-1}\sum_{i \in S}(x_i - \bar{x}_s)^2, \quad \hat{S}_{*y}^2 = \frac{1}{n-1}\sum_{i \in S}(y_i - \bar{y}_s)^2, \quad \hat{S}_{*xy} = \frac{1}{n-1}\sum_{i \in S}(x_i - \bar{x}_s)(y_i - \bar{y}_s). \]
For simple random sampling without replacement $\hat{\bar{X}} = \bar{x}_s = \frac{1}{n}\sum_{i \in S} x_i$.
Let us notice that:
\[ cov(\hat{Y}^{regr}, \hat{\beta}) = E\left((\hat{\beta} - E(\hat{\beta}))(\hat{Y}^{regr} - N\bar{y})\right) = E\left[\hat{\beta}\left(N\bar{y}_s + \hat{\beta}(N\bar{x} - N\bar{x}_s) - N\bar{y}\right)\right] - E(\hat{\beta})B^{regr}. \]
The bias of the regression estimator of the total value in the population, denoted by $B^{regr} = E(\hat{Y}^{regr} - N\bar{y})$, for simple random sampling without replacement is given by equation (2).
For simple random sampling without replacement $E(\hat{\beta})$ is given by equation (7).
To derive an equation of $cov(\hat{Y}^{regr}, \hat{\beta})$ let us notice that:
E [$ (N x s + f t ( Nx - N x s ) - Ny ) ] = E
Ś.
§ L xy N y s + : ? ( N x - N x s) - N y J*v = E S ' ľ + A - ( Š : , - S . „ ) - (Š ?, - S h ) X J , x ( Э ' х ) ( N y s - N y ) + I S .,о2 о2 (^*xy S*xy) Ü«, + 0 ( N ■ n~ ) =- ~2 ™ & x y , ľs) /о2 42C0V(^’*> Уз) - ( o2~\2COVffixy, *s) + \ 2 ' COvffixf Xs) +
X \ ^ 4x)
+ ( S ? )2E ( ^ x S ‘xy)2(* *s) 2 N y E ( Ś . xy — S*xy)($'x ~~ S?x) ( x — X s ) +
+ N ~ &x) 2(x - *s ) + 0 ( N ■ n z). W*x)
Using results presented in J. Wywiał (1992, p. 138-139) it is known that:
\[ cov(\hat{S}_{*xy}, \bar{y}_s) = \frac{1}{n}\,\frac{N-n}{N-2}\,cov_{*12}(x,y), \]
\[ cov(\hat{S}_{*xy}, \bar{x}_s) = cov(\hat{S}_{*x}^2, \bar{y}_s) = \frac{1}{n}\,\frac{N-n}{N-2}\,cov_{*21}(x,y), \]
\[ cov(\hat{S}_{*x}^2, \bar{x}_s) = \frac{1}{n}\,\frac{N-n}{N-2}\,cov_{*3}(x). \]
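Relations of this kind can be checked by brute force on a small artificial population, enumerating every possible sample drawn without replacement. The Python sketch below does this for $cov(\hat{S}_{*xy}, \bar{y}_s)$ using exact rational arithmetic; the function name and the toy data are hypothetical, and the moment $cov_{*12}(x,y)$ is assumed to use the divisor $N-1$ as in the notation of this appendix.

```python
from fractions import Fraction
from itertools import combinations


def check_cov_sxy_ybar(x, y, n):
    """Return (exact design covariance of S_xy-hat and y-bar_s, formula value)."""
    N = len(x)
    xbar = Fraction(sum(x), N)
    ybar = Fraction(sum(y), N)
    # cov_{*12}(x, y): first power of the x-deviation, second power of the y-deviation
    cov_star_12 = sum((a - xbar) * (b - ybar) ** 2 for a, b in zip(x, y)) / (N - 1)

    s_xy_vals, ybar_vals = [], []
    for s in combinations(range(N), n):  # every SRSWOR sample is equally likely
        xs = Fraction(sum(x[i] for i in s), n)
        ys = Fraction(sum(y[i] for i in s), n)
        s_xy_vals.append(sum((x[i] - xs) * (y[i] - ys) for i in s) / (n - 1))
        ybar_vals.append(ys)

    m = len(s_xy_vals)
    e_sxy = sum(s_xy_vals) / m
    e_ybar = sum(ybar_vals) / m
    cov = sum((a - e_sxy) * (b - e_ybar) for a, b in zip(s_xy_vals, ybar_vals)) / m
    rhs = Fraction(1, n) * Fraction(N - n, N - 2) * cov_star_12
    return cov, rhs
```

For example, `check_cov_sxy_ybar([0, 0, 3], [0, 3, 0], 2)` returns the same value twice, so on this toy population the relation holds exactly rather than only approximately.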
According to the equations derived in Part A.2 of the Appendix we have:
\[ E(\hat{S}_{*xy} - S_{*xy})^2(\bar{x}_s - \bar{x}) = O(n^{-2}) + O(n^{-1}N^{-1}). \]
Because $E(\hat{S}_{*xy} - S_{*xy})(\hat{S}_{*x}^2 - S_{*x}^2)(\bar{x}_s - \bar{x})$ and $E(\hat{S}_{*x}^2 - S_{*x}^2)^2(\bar{x}_s - \bar{x})$ are simplified forms of $E(\hat{S}_{*xy} - S_{*xy})^2(\bar{x}_s - \bar{x})$, we have:
\[ E(\hat{S}_{*xy} - S_{*xy})(\hat{S}_{*x}^2 - S_{*x}^2)(\bar{x}_s - \bar{x}) = O(n^{-2}) + O(n^{-1}N^{-1}), \]
\[ E(\hat{S}_{*x}^2 - S_{*x}^2)^2(\bar{x}_s - \bar{x}) = O(n^{-2}) + O(n^{-1}N^{-1}). \]
Hence,
f a - N N ~ n(cov*»(*.Jfl_ jflcav. ufc у) , cov?n(x,y)cov.3(x)\
4 n N ~ 2 \ cov.j(x) cov*j(x) “ + cov.32(x) ) +
~ (~n W - l . n /c 0 V2( X)[ß2(X) - l ] { fc2l ( x , У) - k 3( x ) r ( x , y ) } ^ ' COV.n(x,y) 1 1 Л . , ч . --- --- - 2~/~ \ ( (cov* 3 i(* > )0 -c o v .2(x )c o v .11(x ,j0 ) + co v .2(x) n c o v r2( x ) \ ' CO v .u (x,y) + c o v .2(x) ( c o v . 4( x ) - c o v ? 2( x ) ) V j + + 0 ( N ■f ») + 0 ( . - ) - N g - - - " ( 'со^ ф ^ _ 2 ^ - n t e r t c o v - ^ t e y ) n N — 2 \ co v .2(x) covr2(x) c?u (x, y )c .3(x )N c?2(x) l + ™ ^ _ 2 \ / c o v2( y ) [ B2( x ) - I ] { / c2i ( x , y ) - /c3( x ) r ( x , y ) } ^ + + ^ ^ ^ \ / c o v 2(y)[B2(x) - l]{k21(x ,y ) - k 3( x ) r ( x , y ) } ^ ■ ■ ^ ( c o v * 3 l ( x » y ) - c o v .2( x ) c o v . u ( x , y ) ) + + ^ C O V .4 ( x ) — COV?2 ( > ; ) ) J ) + 0 ( N - n ~ 2) + 0 ( n ~ 1) .
Because ' N N - n N _ 2V c o v2(y )[ß2(x) - l]{fc21(x ,y ) - /c3(x)r(x,y)} 1 (c o v .3 1( x , y ) - c o v .2(x )c o v .n (x,}0) + n cov?2(x) CO v . u i x . y ) + cov (c o v .4(x) - cov?2( x ) ) ^ = 0 ( N ■ n 2) finally we have:
fvrror by N N - n (c o \ .n (x,y) ,covM1(*,y)cov.2l(*,y) , coviu (x,y)cov.3(x)\ ,
“ V(y co v ii(x ) + 5 ® j
- cov' “ f c ^ ( — í j a > v 2( y yí B, ( x ) - 1 ] { к ц (х , у ) - k , ( x ) r ( x , y ) ] \ +
c o v .2(x) \ n N — 2 J
+ 0 ( N - n - z ) + 0 ( n ~ l ).
A.2. ADDITIONAL DERIVATIONS - PART ONE
In this part of the appendix an equation of $E(\hat{S}_{*xy} - S_{*xy})^2(\bar{x}_s - \bar{x})$ for simple random sampling without replacement will be derived. The following derivations are based on the assumption that $\bar{x} = \bar{y} = 0$, which does not affect the generality of the results:
\[ E(\hat{S}_{*xy} - S_{*xy})^2(\bar{x}_s - \bar{x}) = E(\hat{S}_{*xy}^2\bar{x}_s - 2S_{*xy}\hat{S}_{*xy}\bar{x}_s + S_{*xy}^2\bar{x}_s) = E(\hat{S}_{*xy}^2\bar{x}_s) - 2S_{*xy}E(\hat{S}_{*xy}\bar{x}_s). \qquad (8) \]
Then, using results presented in J. Wywiał (1992), p. 138-139, we receive
Let us notice, th a t
\
( N
Y iE(ś»XfXs) = ^ _ JJ2 Ei Yj xďlai — n x $ysJ X S = ^ \ (
E w i V f I
x f l i)+
i = 1- l ö ^ X H C H +? ( W C W
All three elements of the above-mentioned sum will be derived separately using the following equations, which hold for distinct indices $i, j, k, l$:
\[ E(a_i) = \frac{n}{N}, \quad E(a_i a_j) = \frac{n(n-1)}{N(N-1)}, \quad E(a_i a_j a_k) = \frac{n(n-1)(n-2)}{N(N-1)(N-2)}, \quad E(a_i a_j a_k a_l) = \frac{n(n-1)(n-2)(n-3)}{N(N-1)(N-2)(N-3)}. \]
Some simple transformations will be omitted.
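These moments of the sample membership indicators can be verified by enumerating all samples of a small population. The Python sketch below is illustrative only (the function names are hypothetical); it compares the enumerated value of $E(a_1 a_2 \cdots a_k)$ with the product of ratios used above, in exact arithmetic.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def inclusion_moment(N, n, k):
    """E(a_1 a_2 ... a_k) under SRSWOR, obtained by enumerating all samples."""
    # Count the samples that contain all of the first k units
    hits = sum(1 for s in combinations(range(N), n) if all(i in s for i in range(k)))
    return Fraction(hits, comb(N, n))


def falling_ratio(N, n, k):
    """n(n-1)...(n-k+1) / (N(N-1)...(N-k+1))."""
    out = Fraction(1)
    for j in range(k):
        out *= Fraction(n - j, N - j)
    return out
```

Because every sample of size n is equally likely, the expectation is simply the share of samples that contain all k chosen units, which is what the enumeration counts.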
F irst, 3 ^ 2 E( E
( E
“ ^ ov*2i(* .y )c o v .u (x, } 0 + 0 ( n ~ 2). n(n Second, „ > - 2 , ) 2 E ( |/ W , ) ( J > a ) - ( I у а) = 0 ( n - »). T h ird ,Й
^
Е( 1 ,3<А) ’( 1 / Л) г = 0 ( " " >-
A fter sum m ing up the three elem ents derived abo ve we receive:E(S}xyx s) = ^ - ^ J c o v .2i(x ,y )c o v .u (x ,y ) + 0(n ~ 2).
Using the e q u a tio n (8) finally we received: 2 N — n
E(St Xy S»xy) ix s x ) — Е(Х»ХуХ5) c o v ji(x )c o v «2i(x ,y ) — n iV — z
r2 2 2 t f - n
= ( n ~ N ]cov.2i(x,y)covM1(x,y) covn (x)cov.21(x,y) + 0(n“ 2) = 0(n 2).
A.3. ADDITIONAL DERIVATIONS - PART TWO
In this part of the appendix an equation of $cov(\hat{S}_{*xy}, \hat{S}_{*x}^2)$ for simple random sampling without replacement will be derived. The following derivations are based on the assumption that $\bar{x} = \bar{y} = 0$, which does not affect the generality of the results.
\[ cov(\hat{S}_{*xy}, \hat{S}_{*x}^2) = E(\hat{S}_{*xy} - S_{*xy})(\hat{S}_{*x}^2 - S_{*x}^2) = E(\hat{S}_{*xy}\hat{S}_{*x}^2) - S_{*xy}S_{*x}^2 \qquad (9) \]
Let us notice that:
( n - 1)
(nZ " i)Ľ ( Z W i - ľ , x f a t - n x ^ J =
-
n(,íx‘a‘X k AX,ím) +H ž xiaN i ml
All four elements of the above-mentioned sum will be derived separately using the following equations, which hold for distinct indices $i, j, k, l$:
\[ E(a_i) = \frac{n}{N}, \quad E(a_i a_j) = \frac{n(n-1)}{N(N-1)}, \quad E(a_i a_j a_k) = \frac{n(n-1)(n-2)}{N(N-1)(N-2)}, \quad E(a_i a_j a_k a_l) = \frac{n(n-1)(n-2)(n-3)}{N(N-1)(N-2)(N-3)}. \]
Some simple transformations will be omitted.
First,
(n
E( Z Ц
Z
^
cov.3i(x,y) +^1
+ ^ c o v .n (x,y)cov.2(x) + Second, - 1 Ф - l) 2“ V r i Third, 1 + 0 ( n - 2) + 0 ( N - 1).Щ
Z W i Y Z * i a ] = - ^ c o v . u (x ,y ) c o v .2(x) + 0 ( n " 2) + 0(iV L). n \ / n .1 = 1 / \ i = l E Í x f ai) 1 у а ) = соу.и (х,у)соу.2(х) + 0(п_2) + 0(А/ ‘).F o u rth ,
After summing up the four elements derived above we receive:
\[ E(\hat{S}_{*xy}\hat{S}_{*x}^2) = \frac{1}{n}\,cov_{*31}(x,y) + \left(1 - \frac{1}{n}\right)cov_{*11}(x,y)\,cov_{*2}(x) + O(N^{-1}) + O(n^{-2}). \]
Using equation (9) finally we receive:
\[ cov(\hat{S}_{*xy}, \hat{S}_{*x}^2) = \frac{1}{n}\left(cov_{*31}(x,y) - cov_{*2}(x)\,cov_{*11}(x,y)\right) + O(N^{-1}) + O(n^{-2}). \]
REFERENCES
Bracha C. (1994), Metodologiczne aspekty badania małych obszarów, "Studia i Materiały. Z Prac Zakładu Badań Statystyczno-Ekonomicznych", nr 43, GUS, Warszawa.
Bracha C. (1996), Teoretyczne podstawy metody reprezentacyjnej, PWN, Warszawa.
Getka-Wilczyńska E. (2000), Estymacja zjawisk rzadkich w populacji skończonej, PhD thesis, Szkoła Główna Handlowa, Warszawa.
Särndal C. E., Swensson B., Wretman J. (1992), Model Assisted Survey Sampling, Springer-Verlag, New York.
Wywiał J. (1992), Statystyczna metoda reprezentacyjna w badaniach ekonomicznych (optymalizacja badań próbkowych), Akademia Ekonomiczna w Katowicach, Katowice.
Wywiał J. (2000), Ocena parametrów cech populacji z wykorzystaniem danych o zmiennych dodatkowych, sprawozdanie z realizacji Grantu KBN 1H02B 008 16.
Tomasz Żądło
ON THE MEAN SQUARE ERROR OF THE SYNTHETIC REGRESSION ESTIMATOR
The paper considers the problem of estimating the total value in a small area. Because of the small number (or the lack) of population elements from the considered domain in the sample, information on all population elements is used. The synthetic regression estimator is presented. General equations of the mean square error and the bias of this estimator for any sampling design are also derived. The problem of assumptions on the structure of the population and of the domain is discussed from the point of view of reducing the mean square error and the bias of the considered estimator. The importance of the influence of the bias on the precision of estimation is presented. The possibility of an increase of the mean square error and of the bias with an increase of the sample size is shown. Approximate equations of the mean square error and the bias of the estimator for simple random sampling without replacement are derived. The precision of the synthetic regression estimator (values obtained from the approximate equations and from simulation) is also compared with the precision of the direct Horvitz-Thompson estimator. The comparison is based on agricultural census data for the Dąbrowa Tarnowska district. The population consisted of 8624 farms and included the considered small area, Bolesław commune, which contains 588 farms