ACTA UNIVERSITATIS LODZIENSIS
FOLIA OECONOMICA 194, 2005

Tomasz Żądło*

ON SYNTHETIC RATIO ESTIMATOR BASED ON SUPERPOPULATION APPROACH

Abstract
In the paper, properties of a predictor of the form of the synthetic ratio estimator of the domain total, known from the randomisation approach, are considered. The proof of its ξ-unbiasedness for the simple regression superpopulation model in strata is shown. For this model the BLU predictor is also presented. Equations of prediction variances of both predictors are derived. For the considered predictors the problem of model misspecification is studied and equations of prediction mean square errors are derived. The comparison of accuracy is supported by a simulation study.

Key words: small area statistics, superpopulation approach, model misspecification, ξ-bias.
I. INTRODUCTION
Let population $\Omega$ of size $N$ be divided into $C$ strata denoted by $\Omega_c$, each of size $N_c$ (where $c = 1, \ldots, C$), and $D$ domains $\Omega_d$, each of size $N_d$ (where $d = 1, \ldots, D$). One domain can be a part of more than one stratum. Sets $\Omega_c \cap \Omega_d$ will be denoted by $\Omega_{cd}$ and their sizes by $N_{cd}$. From each stratum a sample $s_c$ of size $n_c$ is drawn. Let sets $s_c \cap \Omega_d$ be denoted by $s_{cd}$ and their sizes by $n_{cd}$. Let us introduce additional symbols: $\bigcup_{c=1}^{C} s_c = s$, $\sum_{c=1}^{C} n_c = n$, $\Omega_{rc} = \Omega_c - s_c$, $N_{rc} = N_c - n_c$, $\Omega_{rd} = \Omega_d - s_d$, $N_{rd} = N_d - n_d$, $\Omega_{rcd} = \Omega_{cd} - s_{cd}$, $N_{rcd} = N_{cd} - n_{cd}$. Let us stress that subscript $d*$ will denote the domain of interest, whose total value $T_{d*} = \sum_{i \in \Omega_{d*}} Y_i$ is estimated.
II. SIMPLE REGRESSION SUPERPOPULATION MODEL IN STRATA
Let us consider the simple regression superpopulation model in strata with the assumption:

$$E_\xi(Y_{ci}) = \beta_c x_{ci}, \tag{1}$$

where $Y_{ci} = \beta_c x_{ci} + \varepsilon_{ci}$ and $E_\xi(\varepsilon_{ci}) = 0$. Let us add that $\beta_c$ is unknown and $x_1, \ldots, x_N$ are known. What is more, for the considered superpopulation model and for the other superpopulation models assumed for strata, which will be discussed in the following parts of the paper, it is assumed that random variables $Y_1, \ldots, Y_N$ are independent and:

$$D^2_\xi(Y_{ci}) = D^2_\xi(\varepsilon_{ci}) = \sigma_c^2 v(x_{ci}), \tag{2}$$

where $v(\cdot)$ denotes values of a known function of the auxiliary variable.

Let us introduce the predictor of the domain total value of the form of the synthetic ratio estimator known from the randomization approach. For the considered stratified random sampling it is as follows (e.g. Bracha, 1994; Bracha, 1996; Getka-Wilczyńska, 2000; Wywiał, Żądło, 2003):

$$\hat{T}^{SYN-rat}_{d*} = \sum_{c=1}^{C} X_{cd*} \frac{\sum_{i \in s_c} Y_i / \pi_i}{\sum_{i \in s_c} x_i / \pi_i}, \tag{3}$$

where $X_{cd*} = \sum_{i \in \Omega_{cd*}} x_i$ and $\pi_i$ denotes the first order inclusion probability of the $i$-th element.

Let us notice that for the assumed superpopulation model:

$$E_\xi(\hat{T}^{SYN-rat}_{d*} - T_{d*}) = \sum_{c=1}^{C} \left( X_{cd*} \frac{\sum_{i \in s_c} \beta_c x_i / \pi_i}{\sum_{i \in s_c} x_i / \pi_i} - \beta_c X_{cd*} \right) = 0.$$

It was thus proved that the predictor of the form of the synthetic ratio estimator is ξ-unbiased for the simple regression superpopulation model assumed for strata.
What should be stressed is that the predictor of the form of the synthetic ratio estimator (3) does not have minimal prediction variance among all linear ξ-unbiased predictors for the simple regression superpopulation model assumed for strata. From Royall's theorem (1976) it is known that the BLU predictor for the considered superpopulation model with assumptions (1) and (2) is as follows:

$$\hat{T}^{BLU-rat}_{d*} = \sum_{c=1}^{C} \left( Y_{scd*} + \hat{\beta}_c X_{rcd*} \right), \tag{4}$$

where

$$\hat{\beta}_c = \frac{\sum_{i \in s_c} x_i Y_i / v(x_i)}{\sum_{i \in s_c} x_i^2 / v(x_i)}, \qquad Y_{scd*} = \sum_{i \in s_{cd*}} Y_i, \qquad X_{rcd*} = \sum_{i \in \Omega_{rcd*}} x_i.$$

Let inclusion probabilities in strata be constant (e.g. a simple random sample without replacement is drawn from each stratum) and $\forall_i\, v(x_i) = x_i$. Hence:

$$\hat{T}^{SYN-rat}_{d*} = \sum_{c=1}^{C} \frac{Y_{sc}}{X_{sc}} X_{cd*}, \tag{5}$$

where $Y_{sc} = \sum_{i \in s_c} Y_i$, $X_{sc} = \sum_{i \in s_c} x_i$, and

$$\hat{T}^{BLU-rat}_{d*} = \sum_{c=1}^{C} \left( Y_{scd*} + \frac{Y_{sc}}{X_{sc}} X_{rcd*} \right). \tag{6}$$
It is easy to notice that if the above-mentioned assumptions hold and one of the following conditions is fulfilled:
- none of the elements of domain $d*$ are drawn to the sample,
- for each stratum from which elements of the $d*$-th domain were drawn the following equation holds: $\frac{Y_{sc}}{X_{sc}} = \frac{Y_{scd*}}{X_{scd*}}$, where $X_{scd*} = \sum_{i \in s_{cd*}} x_i$,
- for each stratum from which elements of the $d*$-th domain were drawn the following equation holds: $s_c = s_{cd*}$,

then

$$\hat{T}^{BLU-rat}_{d*} = \hat{T}^{SYN-rat}_{d*} = \sum_{c=1}^{C} \frac{Y_{sc}}{X_{sc}} X_{cd*}. \tag{7}$$
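The simplified predictors (5) and (6) can be illustrated with a short Python sketch. It is only an illustration under the assumptions $v(x_i) = x_i$ and constant inclusion probabilities in strata; the function name `ratio_predictors`, its argument layout and the toy data are hypothetical and do not come from the paper.

```python
def ratio_predictors(y, x, stratum, domain, sampled, d_star):
    """Return (synthetic, BLU) ratio predictors of the total of domain d_star.

    Assumes v(x) = x and constant inclusion probabilities within strata,
    i.e. the simplified forms (5) and (6). Values of y are read only for
    sampled units.
    """
    syn = blu = 0.0
    units = range(len(x))
    for c in sorted(set(stratum)):
        s_c = [i for i in units if stratum[i] == c and sampled[i]]
        ratio = sum(y[i] for i in s_c) / sum(x[i] for i in s_c)  # Y_sc / X_sc
        in_cd = [i for i in units if stratum[i] == c and domain[i] == d_star]
        x_cd = sum(x[i] for i in in_cd)                      # X_cd*
        y_scd = sum(y[i] for i in in_cd if sampled[i])       # Y_scd*
        x_rcd = sum(x[i] for i in in_cd if not sampled[i])   # X_rcd*
        syn += ratio * x_cd            # stratum term of (5)
        blu += y_scd + ratio * x_rcd   # stratum term of (6)
    return syn, blu


# Hypothetical toy stratum with two domains; domain 1 is the domain of interest.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 7.0, 8.0]
stratum = [1, 1, 1, 1]
domain = [1, 1, 2, 2]

# One unit of domain 1 is sampled: the two predictors differ.
syn_a, blu_a = ratio_predictors(y, x, stratum, domain, [True, False, True, False], 1)
# No unit of domain 1 is sampled: the two predictors coincide, as in (7).
syn_b, blu_b = ratio_predictors(y, x, stratum, domain, [False, False, True, True], 1)
```

In the second call the BLU predictor has no sampled domain values to use, so it reduces to the stratum ratio multiplied by $X_{cd*}$, which is the first of the conditions listed for equality (7).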
Let us derive equations of prediction variances of predictors (3) and (4) assuming that condition (2) is fulfilled. It should be stressed that they are correct even when condition (1), which defines the simple regression superpopulation model, is not fulfilled.

After some algebra, the prediction variance of the predictor of the form of the synthetic ratio estimator is as follows:

$$Var_\xi(\hat{T}^{SYN-rat}_{d*} - T_{d*}) = \sum_{c=1}^{C} \sigma_c^2 \left[ \left( \frac{X_{cd*}}{\sum_{i \in s_c} x_i / \pi_i} \right)^2 \sum_{i \in s_c} \frac{v(x_i)}{\pi_i^2} - 2 \frac{X_{cd*}}{\sum_{i \in s_c} x_i / \pi_i} \sum_{i \in s_{cd*}} \frac{v(x_i)}{\pi_i} + \sum_{i \in \Omega_{cd*}} v(x_i) \right]. \tag{8}$$

If first order inclusion probabilities are constant in strata and if $\forall_i\, v(x_i) = x_i$, then the prediction variance of the predictor of the form of the synthetic ratio estimator is given by the following equation:

$$Var_\xi(\hat{T}^{SYN-rat}_{d*} - T_{d*}) = \sum_{c=1}^{C} \sigma_c^2 \left[ \frac{X_{cd*}^2}{X_{sc}} - 2 \frac{X_{cd*} X_{scd*}}{X_{sc}} + X_{cd*} \right], \tag{9}$$

where $X_{scd*} = \sum_{i \in s_{cd*}} x_i$.

The prediction variance of predictor (4) for the superpopulation model with assumption (2) can be derived using Royall's theorem (1976). Let us stress that it is correct even when condition (1), which defines the simple regression superpopulation model, is not fulfilled. The prediction variance of predictor (4) is as follows:

$$Var_\xi(\hat{T}^{BLU-rat}_{d*} - T_{d*}) = \sum_{c=1}^{C} \sigma_c^2 \left[ \frac{X_{rcd*}^2}{\sum_{i \in s_c} x_i^2 / v(x_i)} + \sum_{i \in \Omega_{rcd*}} v(x_i) \right]. \tag{10}$$

If $\forall_i\, v(x_i) = x_i$, then the prediction variance simplifies to the following form:

$$Var_\xi(\hat{T}^{BLU-rat}_{d*} - T_{d*}) = \sum_{c=1}^{C} \sigma_c^2 \left[ \frac{X_{rcd*}^2}{X_{sc}} + X_{rcd*} \right]. \tag{11}$$
Let us compare prediction variances of both predictors when $\forall_i\, v(x_i) = x_i$ and for constant first order inclusion probabilities in strata:

$$Var_\xi(\hat{T}^{BLU-rat}_{d*} - T_{d*}) - Var_\xi(\hat{T}^{SYN-rat}_{d*} - T_{d*}) = - \sum_{c=1}^{C} \sigma_c^2 X_{scd*} \left( 1 - \frac{X_{scd*}}{X_{sc}} \right). \tag{12}$$

Let us notice that the closer the value of $X_{scd*}$ is to zero (which holds when $n_{cd*}$ decreases), the smaller the difference in precision of both predictors. In the discussed case, for a given $X_{sc}$ the maximum absolute value of the difference (12) is attained for $\frac{X_{scd*}}{X_{sc}} = 0.5$. The difference (12) equals 0 for $\frac{X_{scd*}}{X_{sc}} = 0$ and for $\frac{X_{scd*}}{X_{sc}} = 1$. For small area statistics purposes, considerations can be limited to $0 < \frac{X_{scd*}}{X_{sc}} < 0.5$. In this case, the lower the value of $\frac{X_{scd*}}{X_{sc}}$, the lower the value of the precision difference (12). Prediction variances of the considered predictors are equal when equation (7) holds.
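The behaviour of the difference (12) can be checked numerically for a single stratum. The sketch below is a hedged illustration: it assumes $v(x_i) = x_i$, constant inclusion probabilities, the simplified synthetic variance (9), and the BLU variance $\sigma_c^2 (X_{rcd*}^2 / X_{sc} + X_{rcd*})$ that follows from (10) under these assumptions. All function names and numeric values are hypothetical.

```python
def var_syn(sigma2, x_sc, x_cd, x_scd):
    """One-stratum term of (9): prediction variance of the synthetic predictor."""
    return sigma2 * (x_cd ** 2 / x_sc - 2 * x_cd * x_scd / x_sc + x_cd)


def var_blu(sigma2, x_sc, x_cd, x_scd):
    """One-stratum BLU prediction variance obtained from (10) with v(x) = x."""
    x_rcd = x_cd - x_scd  # X_rcd* = X_cd* - X_scd*
    return sigma2 * (x_rcd ** 2 / x_sc + x_rcd)


def var_diff(sigma2, x_sc, x_scd):
    """One-stratum term of the difference (12), BLU minus synthetic."""
    return -sigma2 * x_scd * (1 - x_scd / x_sc)


# Scan the share X_scd* / X_sc from 0 to 1 for fixed sigma2 = 2 and X_sc = 100.
shares = [k / 10 for k in range(11)]
diffs = [var_diff(2.0, 100.0, share * 100.0) for share in shares]
# Share at which the BLU predictor gains the most over the synthetic one.
largest_gap_share = shares[diffs.index(min(diffs))]
```

The scan reproduces the pattern described in the text: the difference vanishes at shares 0 and 1 and is largest in absolute value at 0.5.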
III. SIMPLE REGRESSION SUPERPOPULATION MODEL IN DOMAINS
Synthetic estimators use the assumption that some relationships which occur in the population (or in strata) hold in domains (or in domain and stratum intersections) too. In the previous part of the paper two ξ-unbiased predictors for the simple regression superpopulation model in strata were presented. Let us add that predictor (4) has minimal prediction variance among all linear ξ-unbiased predictors (hence it is more precise than predictor (3)). The assumption that the simple regression superpopulation model in strata is true can be incorrect. For example, the simple regression superpopulation model in domains can be true. In the following part of the paper the accuracy of predictors (3) and (4) for the simple regression superpopulation model in domains will be considered. It will be proved that both predictors are ξ-biased, and equations of their ξ-biases and prediction MSEs will be derived.

Let us assume that the simple regression superpopulation model in domains is true. The assumption is as follows:

$$E_\xi(Y_{di}) = \beta_d x_{di}. \tag{13}$$

Let us consider two additional alternative assumptions. It is assumed that random variables $Y_1, \ldots, Y_N$ are independent and:

$$D^2_\xi(Y_{ci}) = D^2_\xi(\varepsilon_{ci}) = \sigma_c^2 v(x_{ci}) \tag{14}$$

as in equation (2), or

$$D^2_\xi(Y_{di}) = D^2_\xi(\varepsilon_{di}) = \sigma_d^2 v(x_{di}). \tag{15}$$

In the previous part it was stressed that if the assumption given by equation (2) (the same as presented by equation (14)) is true, then $Var_\xi(\hat{T}^{BLU-rat}_{d*} - T_{d*}) \le Var_\xi(\hat{T}^{SYN-rat}_{d*} - T_{d*})$. Let us consider prediction variances of both predictors when equation (15) is true.
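The prediction variance of the predictor of the form of the synthetic ratio estimator for assumption (15), after some algebra, is obtained as follows:

$$Var_\xi(\hat{T}^{SYN-rat}_{d*} - T_{d*}) = \sum_{c=1}^{C} \left[ \left( \frac{X_{cd*}}{\sum_{i \in s_c} x_i / \pi_i} \right)^2 \sum_{d=1}^{D} \sigma_d^2 \sum_{i \in s_{cd}} \frac{v(x_i)}{\pi_i^2} + \sigma_{d*}^2 \sum_{i \in \Omega_{cd*}} v(x_i) - 2 \sigma_{d*}^2 \frac{X_{cd*}}{\sum_{i \in s_c} x_i / \pi_i} \sum_{i \in s_{cd*}} \frac{v(x_i)}{\pi_i} \right]. \tag{16}$$

If $\forall_i\, v(x_i) = x_i$ and first order inclusion probabilities are constant in strata, then the above equation simplifies to the following form:

$$Var_\xi(\hat{T}^{SYN-rat}_{d*} - T_{d*}) = \sum_{c=1}^{C} \left( \frac{X_{cd*}^2}{X_{sc}^2} \sum_{d=1}^{D} \sigma_d^2 X_{scd} + \sigma_{d*}^2 X_{cd*} - 2 \sigma_{d*}^2 \frac{X_{cd*} X_{scd*}}{X_{sc}} \right), \tag{17}$$

where $X_{scd} = \sum_{i \in s_{cd}} x_i$.

Let us derive the prediction variance of predictor (4) for assumption (15). The following result can be obtained:

$$Var_\xi(\hat{T}^{BLU-rat}_{d*} - T_{d*}) = \sum_{c=1}^{C} \left[ \frac{X_{rcd*}^2}{\left( \sum_{i \in s_c} x_i^2 / v(x_i) \right)^2} \sum_{d=1}^{D} \sigma_d^2 \sum_{i \in s_{cd}} \frac{x_i^2}{v(x_i)} + \sigma_{d*}^2 \sum_{i \in \Omega_{rcd*}} v(x_i) \right]. \tag{18}$$

If $\forall_i\, v(x_i) = x_i$, then the above equation simplifies to the following form:

$$Var_\xi(\hat{T}^{BLU-rat}_{d*} - T_{d*}) = \sum_{c=1}^{C} \left( \frac{X_{rcd*}^2}{X_{sc}^2} \sum_{d=1}^{D} \sigma_d^2 X_{scd} + \sigma_{d*}^2 X_{rcd*} \right). \tag{19}$$

If $\forall_i\, v(x_i) = x_i$ and first order inclusion probabilities are constant in strata, then for assumption (15):

$$Var_\xi(\hat{T}^{BLU-rat}_{d*} - T_{d*}) - Var_\xi(\hat{T}^{SYN-rat}_{d*} - T_{d*}) = \sum_{c=1}^{C} \left[ \sigma_{d*}^2 \frac{X_{scd*}}{X_{sc}^2} (X_{sc} - X_{scd*})(X_{cd*} + X_{rcd*} - X_{sc}) - \frac{X_{scd*}}{X_{sc}^2} (X_{cd*} + X_{rcd*}) \sum_{d \ne d*} \sigma_d^2 X_{scd} \right]. \tag{20}$$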
Let us notice that the closer the value of $X_{scd*}$ is to zero (which holds when $n_{cd*}$ decreases), the smaller the difference in precision of both predictors. The above equation is a sum over strata of sums of two elements. Let us assume that $\forall_i\, x_i > 0$.

For each stratum the second element is negative. The first element is negative for every stratum if and only if $X_{cd*} + X_{rcd*} < X_{sc}$. Hence,

$$\forall_c\; X_{cd*} + X_{rcd*} < X_{sc} \;\Rightarrow\; Var_\xi(\hat{T}^{BLU-rat}_{d*} - T_{d*}) < Var_\xi(\hat{T}^{SYN-rat}_{d*} - T_{d*}).$$

Based on equation (20) it can also be proved that

$$\forall_c\; \sigma_{d*}^2 \le \frac{\sum_{d=1}^{D} \sigma_d^2 X_{scd}}{X_{sc}} \;\Rightarrow\; Var_\xi(\hat{T}^{BLU-rat}_{d*} - T_{d*}) < Var_\xi(\hat{T}^{SYN-rat}_{d*} - T_{d*}).$$

It was shown that predictor (4) can be more precise than predictor (3) for assumption (15).
Let us derive the equation of the ξ-bias of the predictor of the form of the synthetic ratio estimator (3) for the superpopulation model with assumption (13). After some algebra it is obtained that:

$$E_\xi(\hat{T}^{SYN-rat}_{d*} - T_{d*}) = \sum_{c=1}^{C} \frac{X_{cd*}}{\hat{X}_{sc}} \sum_{d=1}^{D} (\beta_d - \beta_{d*}) \hat{X}_{scd}, \tag{21}$$

where $\hat{X}_{scd} = \sum_{i \in s_{cd}} \frac{x_i}{\pi_i}$ and $\hat{X}_{sc} = \sum_{i \in s_c} \frac{x_i}{\pi_i}$.

As expected, the predictor of the form of the synthetic ratio estimator is ξ-unbiased when the simple regression superpopulation model is true in the strata to which the domain of interest belongs (superpopulation model with assumption (1)).

Let us derive the equation of the ξ-bias of predictor (4) for the superpopulation model assumed in this part of the paper:

$$E_\xi(\hat{T}^{BLU-rat}_{d*} - T_{d*}) = \sum_{c=1}^{C} X_{rcd*} \left( \sum_{i \in s_c} \frac{x_i^2}{v(x_i)} \right)^{-1} \sum_{d=1}^{D} (\beta_d - \beta_{d*}) \sum_{i \in s_{cd}} \frac{x_i^2}{v(x_i)}. \tag{22}$$

Similarly to the predictor of the form of the synthetic ratio estimator, predictor (4) is ξ-unbiased if the simple regression superpopulation model in domains becomes the simple regression superpopulation model in strata (i.e. the simple regression superpopulation model in strata with assumption (2) is true).

Let us assume that $\forall_i\, v(x_i) = x_i$ and that first order inclusion probabilities are constant in strata. Then equations (21) and (22) of the ξ-bias of predictors (3) and (4) simplify to the following forms:

$$E_\xi(\hat{T}^{SYN-rat}_{d*} - T_{d*}) = \sum_{c=1}^{C} \frac{X_{cd*}}{X_{sc}} \sum_{d=1}^{D} (\beta_d - \beta_{d*}) X_{scd}, \tag{23}$$

$$E_\xi(\hat{T}^{BLU-rat}_{d*} - T_{d*}) = \sum_{c=1}^{C} \frac{X_{rcd*}}{X_{sc}} \sum_{d=1}^{D} (\beta_d - \beta_{d*}) X_{scd}. \tag{24}$$

Hence,

$$E_\xi(\hat{T}^{BLU-rat}_{d*} - T_{d*}) - E_\xi(\hat{T}^{SYN-rat}_{d*} - T_{d*}) = - \sum_{c=1}^{C} \frac{X_{scd*}}{X_{sc}} \sum_{d=1}^{D} (\beta_d - \beta_{d*}) X_{scd}. \tag{25}$$

First, let us recall that if both predictors are ξ-unbiased (i.e. the simple regression superpopulation model in strata is true) or if equality (7) holds, then the difference given by equation (25) equals zero. Let us notice that the closer the value of $X_{scd*}$ is to zero (which holds when $n_{cd*}$ decreases), the smaller the difference of the ξ-biases of both predictors.
Let us consider two cases with the additional assumptions that $\forall_i\, x_i > 0$ and $\hat{T}^{BLU-rat}_{d*} \ne \hat{T}^{SYN-rat}_{d*}$. In the first case, let the following inequality hold for each stratum to which elements of the $d*$ domain belong: $\frac{1}{X_{sc}} \sum_{d=1}^{D} (\beta_d - \beta_{d*}) X_{scd} > 0$, which can hold when $\forall_{d \ne d*}\, \beta_d > \beta_{d*}$. Hence, $E_\xi(\hat{T}^{SYN-rat}_{d*} - T_{d*}) > 0$ and $E_\xi(\hat{T}^{BLU-rat}_{d*} - T_{d*}) > 0$, and finally $E_\xi(\hat{T}^{BLU-rat}_{d*} - T_{d*}) - E_\xi(\hat{T}^{SYN-rat}_{d*} - T_{d*}) < 0$. In the second case, let the following inequality hold for each stratum to which elements of the $d*$ domain belong: $\frac{1}{X_{sc}} \sum_{d=1}^{D} (\beta_d - \beta_{d*}) X_{scd} < 0$, which can hold when $\forall_{d \ne d*}\, \beta_d < \beta_{d*}$. Hence, $E_\xi(\hat{T}^{SYN-rat}_{d*} - T_{d*}) < 0$ and $E_\xi(\hat{T}^{BLU-rat}_{d*} - T_{d*}) < 0$, and finally $E_\xi(\hat{T}^{BLU-rat}_{d*} - T_{d*}) - E_\xi(\hat{T}^{SYN-rat}_{d*} - T_{d*}) > 0$. In both cases the absolute value of the ξ-bias of the $\hat{T}^{BLU-rat}_{d*}$ predictor is lower than the absolute value of the ξ-bias of $\hat{T}^{SYN-rat}_{d*}$. Let us stress that when elements of the $d*$ domain were drawn to the sample from only one stratum, only one of these two situations can hold.
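The simplified ξ-biases (23) and (24) and their difference (25) can be evaluated for a single stratum with the following Python sketch. It assumes $v(x_i) = x_i$ and constant inclusion probabilities in strata; the function name `xi_biases` and all numeric values are hypothetical and serve only as an illustration.

```python
def xi_biases(beta, x_scd, x_cd_star, x_rcd_star, d_star):
    """One-stratum xi-biases of the synthetic (23) and BLU (24) predictors.

    beta   : dict mapping domain -> slope beta_d of the model in domains
    x_scd  : dict mapping domain -> sampled x-total X_scd in this stratum
    x_cd_star, x_rcd_star : X_cd* and X_rcd* for the domain of interest
    """
    x_sc = sum(x_scd.values())  # X_sc is the sum of X_scd over domains
    inner = sum((beta[d] - beta[d_star]) * x_scd[d] for d in x_scd)
    bias_syn = x_cd_star / x_sc * inner
    bias_blu = x_rcd_star / x_sc * inner
    return bias_syn, bias_blu


# Hypothetical stratum: domain 1 is of interest and has the smaller slope,
# which corresponds to the first of the two cases discussed in the text.
beta = {1: 1.0, 2: 3.0}
x_scd = {1: 20.0, 2: 80.0}
bias_syn, bias_blu = xi_biases(beta, x_scd, x_cd_star=60.0, x_rcd_star=40.0, d_star=1)
difference = bias_blu - bias_syn  # stratum term of (25)
```

Both biases are positive here and the BLU bias is smaller in absolute value, because $X_{rcd*} < X_{cd*}$ whenever some elements of the domain of interest are sampled.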
The prediction MSE of the predictor of the form of the synthetic ratio estimator for the simple regression superpopulation model in domains is obtained by summing the prediction variance (8) for assumption (14), or the prediction variance (16) for assumption (15), and the squared ξ-bias (21). The prediction MSE of predictor (4) for the simple regression superpopulation model in domains is obtained by summing the prediction variance (10) for assumption (14), or the prediction variance (18) for assumption (15), and the squared ξ-bias (22).

Because the analytical results of the MSE comparison are quite modest, a simulation study is additionally conducted in part V.
IV. POLYNOMIAL SUPERPOPULATION MODEL IN STRATA
In the previous section the misspecification of the superpopulation model was considered in the case when the simple regression superpopulation model in domains is true. In the following section the polynomial superpopulation model in strata is assumed.

It is assumed that

$$E_\xi(Y_{ci}) = \sum_{j=0}^{J} \beta_c^{(j)} x_{ci}^j. \tag{26}$$

A particular form of the polynomial superpopulation model with assumption (26) is the regression superpopulation model with the following assumption:

$$E_\xi(Y_{ci}) = \beta_c^{(1)} x_{ci} + \beta_c^{(0)}. \tag{27}$$

It should be recalled that for models assumed for strata equation (2) holds. It implies that prediction variances of both predictors are given by equations (8) and (10) and

$$Var_\xi(\hat{T}^{BLU-rat}_{d*} - T_{d*}) \le Var_\xi(\hat{T}^{SYN-rat}_{d*} - T_{d*}).$$
Let us derive the equation of the ξ-bias of the predictor of the form of the synthetic ratio estimator for the polynomial superpopulation model in strata (superpopulation model with assumption (26)). After some algebra it is obtained that

$$E_\xi(\hat{T}^{SYN-rat}_{d*} - T_{d*}) = \sum_{c=1}^{C} \left[ X_{cd*} \frac{\sum_{i \in s_c} \sum_{j=0}^{J} \beta_c^{(j)} x_i^j / \pi_i}{\sum_{i \in s_c} x_i / \pi_i} - \sum_{i \in \Omega_{cd*}} \sum_{j=0}^{J} \beta_c^{(j)} x_i^j \right]. \tag{28}$$

If the regression superpopulation model is assumed for strata (superpopulation model with assumption (27)) and if first order inclusion probabilities are constant in strata, the equation simplifies to the following form:

$$E_\xi(\hat{T}^{SYN-rat}_{d*} - T_{d*}) = \sum_{c=1}^{C} \beta_c^{(0)} N_{cd*} \left( \frac{X_{cd*} / N_{cd*}}{X_{sc} / n_c} - 1 \right). \tag{29}$$

In the considered case, if for each stratum the mean value of the auxiliary variable in the intersection of domain $d*$ and the stratum equals the mean value of the auxiliary variable for sampled elements from the stratum, the predictor of the form of the synthetic ratio estimator will be ξ-unbiased.

Let us derive the equation of the ξ-bias of predictor (4) for the polynomial superpopulation model in strata (superpopulation model with assumption (26)). The result is as follows:

$$E_\xi(\hat{T}^{BLU-rat}_{d*} - T_{d*}) = \sum_{c=1}^{C} \left[ X_{rcd*} \frac{\sum_{i \in s_c} x_i \sum_{j=0}^{J} \beta_c^{(j)} x_i^j / v(x_i)}{\sum_{i \in s_c} x_i^2 / v(x_i)} - \sum_{i \in \Omega_{rcd*}} \sum_{j=0}^{J} \beta_c^{(j)} x_i^j \right]. \tag{30}$$

If the regression superpopulation model in strata is true (superpopulation model with assumption (27)) and if $\forall_i\, v(x_i) = x_i$, then the equation simplifies to the following form:

$$E_\xi(\hat{T}^{BLU-rat}_{d*} - T_{d*}) = \sum_{c=1}^{C} \beta_c^{(0)} N_{rcd*} \left( \frac{X_{rcd*} / N_{rcd*}}{X_{sc} / n_c} - 1 \right). \tag{31}$$

In the considered case, if for each stratum the mean value of the auxiliary variable for non-sampled elements of the intersection of domain $d*$ and the stratum equals the mean value of the auxiliary variable for sampled elements from the stratum, predictor (4) will be ξ-unbiased.
Let us compare the ξ-biases of both predictors for the regression superpopulation model in strata when $\forall_i\, v(x_i) = x_i$ and first order inclusion probabilities are constant in strata. Let us assume that equality (7) does not hold. Hence,

$$E_\xi(\hat{T}^{BLU-rat}_{d*} - T_{d*}) - E_\xi(\hat{T}^{SYN-rat}_{d*} - T_{d*}) = - \sum_{c=1}^{C} \beta_c^{(0)} n_{cd*} \left( \frac{X_{scd*} / n_{cd*}}{X_{sc} / n_c} - 1 \right). \tag{32}$$

Let us notice that the closer the value of $n_{cd*}$ is to zero, the smaller the difference of the ξ-biases of both predictors. If for each stratum the mean value of the auxiliary variable for sampled elements of the intersection of domain $d*$ and the stratum equals the mean value of the auxiliary variable for sampled elements from the stratum, the values of the ξ-bias of both predictors will be equal.

Let us consider two cases assuming that $\forall_i\, x_i > 0$ and $\forall_c\, \beta_c^{(0)} > 0$. In the first case, let the following inequalities hold for each stratum from which elements of the $d*$-th domain were drawn:

$$\frac{X_{cd*}}{N_{cd*}} > \frac{X_{sc}}{n_c}, \qquad \frac{X_{rcd*}}{N_{rcd*}} > \frac{X_{sc}}{n_c}, \qquad \frac{X_{scd*}}{n_{cd*}} > \frac{X_{sc}}{n_c}.$$

This can hold, for example, when the domain of interest consists of elements with the highest values of the auxiliary variable. Hence, $E_\xi(\hat{T}^{SYN-rat}_{d*} - T_{d*}) > 0$ and $E_\xi(\hat{T}^{BLU-rat}_{d*} - T_{d*}) > 0$, and finally $E_\xi(\hat{T}^{BLU-rat}_{d*} - T_{d*}) - E_\xi(\hat{T}^{SYN-rat}_{d*} - T_{d*}) < 0$.

In the second case, let the following inequalities hold for each stratum from which elements of the $d*$-th domain were drawn:

$$\frac{X_{cd*}}{N_{cd*}} < \frac{X_{sc}}{n_c}, \qquad \frac{X_{rcd*}}{N_{rcd*}} < \frac{X_{sc}}{n_c}, \qquad \frac{X_{scd*}}{n_{cd*}} < \frac{X_{sc}}{n_c}.$$

This can hold, for example, when the domain of interest consists of elements with the lowest values of the auxiliary variable. Hence, $E_\xi(\hat{T}^{SYN-rat}_{d*} - T_{d*}) < 0$ and $E_\xi(\hat{T}^{BLU-rat}_{d*} - T_{d*}) < 0$, and finally $E_\xi(\hat{T}^{BLU-rat}_{d*} - T_{d*}) - E_\xi(\hat{T}^{SYN-rat}_{d*} - T_{d*}) > 0$.

In both cases the absolute value of the ξ-bias of the $\hat{T}^{BLU-rat}_{d*}$ predictor is lower than the absolute value of the ξ-bias of $\hat{T}^{SYN-rat}_{d*}$, which implies a lower value of the prediction MSE of the $\hat{T}^{BLU-rat}_{d*}$ predictor (because the value of the prediction variance of $\hat{T}^{BLU-rat}_{d*}$ is lower). Let us add that the same conclusions can be obtained in both cases for the assumptions $\forall_i\, x_i > 0$ and $\forall_c\, \beta_c^{(0)} < 0$.
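The sign pattern described above can be checked with a small Python sketch for one stratum. It evaluates the ξ-biases of both predictors under the regression model with intercept (27), assuming $v(x_i) = x_i$ and constant inclusion probabilities in strata. The closed forms used in the code are derived here from (5), (6) and (27) and should be read as an assumption of this sketch; the function name and the numeric values are hypothetical.

```python
def intercept_biases(beta0, n_c, x_sc, big_n_cd, x_cd, n_cd, x_scd):
    """One-stratum xi-biases under E(Y) = beta1 * x + beta0.

    beta0    : intercept of the stratum regression model
    n_c      : stratum sample size, x_sc : sampled x-total in the stratum
    big_n_cd : population size N_cd* of the domain-stratum intersection
    x_cd     : its x-total X_cd*
    n_cd     : number of sampled elements n_cd* in the intersection
    x_scd    : their x-total X_scd*
    """
    big_n_rcd = big_n_cd - n_cd  # N_rcd*
    x_rcd = x_cd - x_scd         # X_rcd*
    bias_syn = beta0 * (n_c * x_cd / x_sc - big_n_cd)
    bias_blu = beta0 * (n_c * x_rcd / x_sc - big_n_rcd)
    return bias_syn, bias_blu


# Hypothetical stratum in which the domain of interest has the highest x values:
# mean of x is 100 in the sample, 120 in the intersection, 125 among its
# sampled elements and 118.75 among its non-sampled elements.
bias_syn, bias_blu = intercept_biases(
    beta0=200.0, n_c=10, x_sc=1000.0, big_n_cd=20, x_cd=2400.0, n_cd=4, x_scd=500.0
)
difference = bias_blu - bias_syn
```

With a positive intercept and a domain of interest that has above-average values of the auxiliary variable, both biases are positive and the BLU bias is the smaller of the two, in line with the first case in the text.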
Prediction MSEs of predictors (3) and (4) for the polynomial superpopulation model in strata are obtained by summing prediction variances (8) and (10) and the squared ξ-biases given by equations (28) and (30), respectively.
V. SIMULATION STUDY
The simulation study is based on an artificial population which consists of 200 elements divided into 3 strata and 6 domains. The first stratum, which consists of 80 elements, includes 20 elements from the first domain, 20 elements from the second domain and 40 elements from the third domain. The second stratum, which consists of 70 elements, includes 30 elements from the first domain, 30 elements from the fourth domain and 10 elements from the fifth domain. The third stratum, which consists of 50 elements, includes 20 elements from the second domain, 10 elements from the fifth domain and 20 elements from the sixth domain. Values of the auxiliary variable were generated using normal distributions with the following arbitrarily set parameters: N(100, 20) in the first stratum, N(120, 30) in the second stratum and N(150, 40) in the third stratum. Elements in strata are assigned to domains at random.
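The layout of the artificial population can be sketched in Python as follows. This is an illustrative reconstruction, not the author's code: the seed is arbitrary, the second parameter of N(·, ·) is assumed here to be the standard deviation (the text does not say whether it is a standard deviation or a variance), and the names `STRATA` and `make_population` are hypothetical.

```python
import random

# Stratum -> ((mean, assumed standard deviation) of x, {domain: number of elements}).
STRATA = {
    1: ((100, 20), {1: 20, 2: 20, 3: 40}),
    2: ((120, 30), {1: 30, 4: 30, 5: 10}),
    3: ((150, 40), {2: 20, 5: 10, 6: 20}),
}


def make_population(seed=0):
    """Generate the 200-element population as a list of dicts."""
    rng = random.Random(seed)
    population = []
    for c, ((mean, sd), domain_sizes) in STRATA.items():
        labels = [d for d, size in domain_sizes.items() for _ in range(size)]
        rng.shuffle(labels)  # elements are assigned to domains at random
        for d in labels:
            population.append({"stratum": c, "domain": d, "x": rng.gauss(mean, sd)})
    return population


population = make_population()
domain_sizes = {}
for unit in population:
    domain_sizes[unit["domain"]] = domain_sizes.get(unit["domain"], 0) + 1
```

Under this layout the domain sizes are 50, 40, 40, 30, 20 and 20, and only the third and fourth domains (in stratum 1 and stratum 2, respectively) and the sixth domain (in stratum 3) lie entirely within one stratum.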
Three predictors are considered: the predictor given by equation (3) (denoted in tables by synt), the predictor given by equation (4) with $v(x_i) = \sqrt{x_i}$ for every $i = 1, \ldots, N$ (denoted in tables by BLU1) and the predictor given by equation (4) with $v(x_i) = 1$ for every $i = 1, \ldots, N$ (denoted in tables by BLU2). The accuracy of the three predictors is considered for four superpopulation models with the following arbitrarily set parameters. Let us add that for all the following superpopulation models random components are generated using the N(0, 1) distribution.

The first model is the simple regression superpopulation model in strata: $Y_{ci} = \beta_c x_{ci} + \varepsilon_{ci} \sqrt{x_{ci}}$, where $\beta_1 = 1$, $\beta_2 = 2$, $\beta_3 = 3$.

The second model is the regression superpopulation model in strata: $Y_{ci} = \beta_c^{(1)} x_{ci} + \beta_c^{(0)} + \varepsilon_{ci} \sqrt{x_{ci}}$, where $\beta_1^{(1)} = 1$, $\beta_2^{(1)} = 2$, $\beta_3^{(1)} = 3$, $\beta_1^{(0)} = 200$, $\beta_2^{(0)} = 250$, $\beta_3^{(0)} = 300$.

The third model is the polynomial superpopulation model in strata: $Y_{ci} = \sum_{j=0}^{2} \beta_c^{(j)} x_{ci}^j + \varepsilon_{ci} \sqrt{x_{ci}}$, where $\beta_1^{(2)} = 1.5$, $\beta_2^{(2)} = 1$, $\beta_3^{(2)} = 0.5$, $\beta_1^{(1)} = 1$, $\beta_2^{(1)} = 2$, $\beta_3^{(1)} = 3$, $\beta_1^{(0)} = 200$, $\beta_2^{(0)} = 250$, $\beta_3^{(0)} = 300$.

The fourth model is the simple regression superpopulation model in domains: $Y_{di} = \beta_d x_{di} + \varepsilon_{di} \sqrt{x_{di}}$, where $\beta_1 = 1$, $\beta_2 = 3$, $\beta_3 = 5$, $\beta_4 = 7$, $\beta_5 = 9$, $\beta_6 = 11$.

It should be underlined that although the model approach is a conditional approach, results in the simulation study are averaged by taking the sampling design distribution into consideration. Symbol $E_p$ denotes the expected value with respect to the sampling design distribution. In the following tables, bias (in %) denotes the value, approximated in the simulation study, of

$$\frac{E_p E_\xi(\hat{T}_{d*} - T_{d*})}{E_\xi(T_{d*})} \times 100,$$

root variance (in %) denotes the approximated value of

$$\frac{\sqrt{E_p \left[ E_\xi(\hat{T}_{d*} - T_{d*})^2 - E_\xi^2(\hat{T}_{d*} - T_{d*}) \right]}}{E_\xi(T_{d*})} \times 100,$$

and root MSE (in %) denotes the approximated value of

$$\frac{\sqrt{E_p E_\xi(\hat{T}_{d*} - T_{d*})^2}}{E_\xi(T_{d*})} \times 100.$$

It is worth stressing that the ξ,p-bias, the p-expected value of the prediction variance and the p-expected value of the prediction MSE are computed instead of the p,ξ-bias, the ξ-expected value of the p-variance and the ξ-expected value of the p-MSE. Values of the above-mentioned statistics are equal because the sampling design is noninformative.
Stratified random sampling with proportional allocation is considered. Results obtained in the simulation are based on 500 random samples and are additionally averaged with respect to 1000 realizations of the superpopulation model. In this way, for simulation purposes, 500 000 values of each predictor are generated. Three sample sizes are considered: 40, 60 and 80 elements, which amount to 20%, 30% and 40% of the population size. High fractions of drawn elements are considered because it was proved, for the cases discussed in previous parts of the paper, that for small sample sizes the difference in precision of both predictors is small.
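Two bookkeeping steps of such a simulation can be sketched in Python: proportional allocation of the sample among strata, and the three accuracy measures (bias, root variance, root MSE in %) computed from simulated prediction errors. This is a minimal sketch under stated assumptions, not the author's code: the function names are hypothetical, and the accuracy measures follow the definitions of the tables as read here, with averaging over model realizations inside each sample and then over samples.

```python
from math import sqrt


def proportional_allocation(n, stratum_sizes):
    """Split the total sample size n among strata proportionally to N_c."""
    total = sum(stratum_sizes.values())
    return {c: round(n * size / total) for c, size in stratum_sizes.items()}


def accuracy_measures(errors, expected_total):
    """Return (bias, root variance, root MSE), each in % of expected_total.

    errors[k][m] is the prediction error (predictor minus domain total)
    for sample k and model realization m.
    """
    means = [sum(e) / len(e) for e in errors]                        # xi-bias per sample
    mean_squares = [sum(v * v for v in e) / len(e) for e in errors]  # xi-MSE per sample
    bias = sum(means) / len(means)
    mse = sum(mean_squares) / len(mean_squares)
    variance = sum(ms - m * m for ms, m in zip(mean_squares, means)) / len(means)
    variance = max(variance, 0.0)  # guard against tiny negative rounding errors
    scale = 100.0 / expected_total
    return bias * scale, sqrt(variance) * scale, sqrt(mse) * scale


allocation = proportional_allocation(40, {1: 80, 2: 70, 3: 50})
# Tiny hypothetical input: 2 samples with 2 model realizations each.
bias_pct, root_var_pct, root_mse_pct = accuracy_measures([[1.0, 3.0], [2.0, 2.0]], 100.0)
```

For the three sample sizes considered in the text the proportional allocation gives whole numbers: 16, 14 and 10 elements for n = 40; 24, 21 and 15 for n = 60; 32, 28 and 20 for n = 80.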
Let us compare the accuracy of the analysed predictors when the simple regression superpopulation model in strata is true.

Results presented in Table 1 show that root ξ-expected values of p-MSEs for all predictors in all domains except domain three are less than 1% of the ξ-expected domain total. In domain three they do not exceed 3%. It is worth stressing that although the accuracies of the considered predictors are similar, the root ξ-expected value of the p-MSE of the predictor of the form of the synthetic ratio estimator is higher compared to predictor (4) with a misspecified variance structure (denoted in the table by BLU2). If the statistician specifies the correct form of the ξ-expected value of the random variables (i.e. decides that the simple regression superpopulation model in strata is true) and an incorrect form of their ξ-variance (i.e. decides that the model is homoscedastic), the choice of the BLU predictor with the wrong specification of the variance structure will be better than the choice of the predictor of the form of the synthetic ratio estimator. Interestingly, in the simulation study the decrease of root ξ-expected p-MSEs for the synthetic estimator due to the increase of the sample size is slower compared with the other predictors. Let us add that the highest values of root ξ-expected p-MSEs are observed in domain three, because it is the only domain which belongs only to the first stratum, the stratum with the lowest $\beta_c$ coefficient. Because the distributions of the auxiliary variable in strata are similar, in the first stratum a higher dispersion of the variable of interest with respect to the ξ distribution is observed. Notice that the smaller the sample size, the smaller the difference in accuracy of the synthetic estimator and the BLU predictor (denoted by BLU1), which was proved for different assumptions in part II of the paper.
Table 1. Accuracy of predictors for simple regression superpopulation model in strata

Domain  Predictor  Bias (in %)             Root variance and root MSE (in %)
                   Sample size             Sample size
                   40      60      80      40      60      80
1       synt       0.00    0.00    0.00    0.86    0.72    0.65
1       BLU1       0.00    0.00    0.00    0.84    0.68    0.57
1       BLU2       0.00    0.00    0.00    0.85    0.68    0.58
2       synt       0.00    0.00    0.00    0.63    0.54    0.48
2       BLU1       0.00    0.00    0.00    0.61    0.50    0.43
2       BLU2       0.00    0.00    0.00    0.61    0.50    0.43
3       synt       0.00    0.00    -0.01   2.52    2.04    1.77
3       BLU1       0.00    0.00    0.00    2.48    1.95    1.63
3       BLU2       0.00    0.00    0.00    2.50    1.97    1.64
4       synt       0.00    0.00    0.00    0.85    0.70    0.62
4       BLU1       0.00    0.00    0.00    0.83    0.67    0.56
4       BLU2       0.00    0.00    0.00    0.84    0.68    0.57
5       synt       0.00    0.00    0.00    0.58    0.52    0.49
5       BLU1       0.00    0.00    0.00    0.56    0.47    0.41
5       BLU2       0.00    0.00    0.00    0.56    0.47    0.41
6       synt       0.00    0.00    0.00    0.55    0.46    0.40
6       BLU1       0.00    0.00    0.00    0.54    0.44    0.37
6       BLU2       0.00    0.00    0.00    0.55    0.44    0.37
Let us consider results for the regression superpopulation model in strata, which are presented in Table 2. The accuracy of the considered predictors will be discussed in the case of model misspecification. Let us notice that values of root ξ-expected p-MSEs do not exceed 3.5% of ξ-expected domain totals and that they are determined by values of the ξ,p-bias. It should be underlined that in this case none of the predictors has better accuracy than the others in all domains. For the polynomial model in strata (results are not presented) values of root ξ-expected p-MSEs exceed 6% of ξ-expected domain totals only in a few cases for the sample size of 40 elements. These results are determined by the ξ,p-bias; values of root ξ-expected p-variances do not exceed 0.04% of ξ-expected domain totals. It should be stressed that in some cases ξ-expected p-MSEs of the synthetic ratio estimator increase with the increase of the sample size, which for p-MSEs was discussed earlier by Wywiał, Żądło (2003). The same property can be observed for ξ-expected p-MSEs, because the sampling design is noninformative.
Table 2. Accuracy of predictors for regression superpopulation model in strata

Domain  Predictor  Bias (in %)             Root variance (in %)    Root MSE (in %)
                   Sample size             Sample size             Sample size
                   40      60      80      40      60      80      40      60      80
1       synt       -1.75   -1.87   -1.91   0.44    0.37    0.33    1.81    1.90    1.94
1       BLU1       -2.42   -2.30   -1.90   0.43    0.35    0.30    2.46    2.33    1.92
1       BLU2       -3.33   -3.10   -2.60   0.43    0.35    0.30    3.36    3.12    2.62
2       synt       -1.34   -1.43   -1.50   0.38    0.32    0.29    1.39    1.47    1.53
2       BLU1       -1.93   -1.84   -1.61   0.37    0.30    0.26    1.97    1.86    1.63
2       BLU2       -2.76   -2.73   -2.26   0.37    0.30    0.26    2.78    2.74    2.27
3       synt       1.73    1.52    1.50    0.84    0.68    0.59    1.93    1.67    1.62
3       BLU1       0.38    0.12    0.10    0.83    0.66    0.55    0.91    0.67    0.55
3       BLU2       -1.65   -0.77   -0.62   0.83    0.66    0.55    1.84    1.01    0.83
4       synt       0.42    0.30    0.27    0.50    0.41    0.36    0.66    0.51    0.45
4       BLU1       -0.87   -0.67   -0.50   0.49    0.39    0.33    1.00    0.78    0.60
4       BLU2       -1.57   -1.49   -1.21   0.50    0.40    0.33    1.65    1.55    1.25
5       synt       2.14    2.06    2.03    0.39    0.35    0.33    2.17    2.09    2.06
5       BLU1       0.74    0.58    0.46    0.37    0.32    0.28    0.83    0.66    0.53
5       BLU2       -0.12   -0.12   -0.12   0.38    0.32    0.28    0.39    0.34    0.31
6       synt       1.71    1.57    1.56    0.40    0.33    0.29    1.76    1.60    1.59
6       BLU1       0.50    0.33    0.31    0.39    0.31    0.26    0.63    0.46    0.40
6       BLU2       -0.40   -0.40   -0.34   0.40    0.31    0.26    0.56    0.51    0.43
Finally, Table 3 presents results of the simulation study for the simple regression superpopulation model in domains. At the beginning it must be stressed that prediction accuracy is not sufficient, mainly because of high values of the bias. It should be noticed that predictor (4) (both in the case of correct and of incorrect specification of the variance structure) has better accuracy than the predictor of the form of the synthetic ratio estimator. The highest values of the ξ,p-bias and the ξ-expected p-MSE are observed in the first and second domains. This results from the fact that elements of these domains belong to strata in which most of the elements are from domains with higher $\beta_d$ than in the first and second domains. It should be stressed that, as in Table 2, in some cases ξ-expected p-MSEs of the predictor of the form of the synthetic ratio estimator increase with the increase of the sample size.
Table 3. Accuracy of predictors for simple regression superpopulation model in domains

Domain  Predictor  Bias (in %)                Root variance (in %)    Root MSE (in %)
                   Sample size                Sample size             Sample size
                   40       60       80       40      60      80      40       60       80
1       synt       336.89   336.39   336.06   1.97    1.66    1.49    336.89   336.40   336.06
1       BLU1       227.68   241.19   206.71   1.92    1.56    1.32    276.69   241.20   206.71
1       BLU2       280.01   244.17   209.23   1.94    1.57    1.33    280.02   244.17   209.23
2       synt       93.09    95.69    95.86    0.68    0.59    0.53    93.09    95.69    95.86
2       BLU1       76.90    70.01    59.05    0.66    0.55    0.46    76.90    70.01    59.05
2       BLU2       78.37    71.16    60.05    0.67    0.55    0.47    78.38    71.17    60.05
3       synt       -28.55   -28.89   -28.99   0.50    0.41    0.35    28.55    28.89    28.99
3       BLU1       -23.58   -20.16   -17.36   0.50    0.39    0.33    23.58    20.16    17.37
3       BLU2       -23.23   -19.89   -17.11   0.50    0.39    0.33    23.24    19.89    17.12
4       synt       -31.06   -31.37   -31.41   0.36    0.30    0.27    31.06    31.37    31.41
4       BLU1       -24.66   -21.86   -18.90   0.36    0.29    0.24    24.66    21.86    18.90
4       BLU2       -24.07   -21.32   -18.45   0.36    0.29    0.24    24.07    21.32    18.45
5       synt       -30.41   -29.82   -29.67   0.26    0.24    0.22    30.41    29.82    29.67
5       BLU1       -24.06   -20.56   -17.74   0.25    0.21    0.19    24.06    20.57    17.74
5       BLU2       -23.48   -20.10   -17.33   0.25    0.21    0.19    23.48    20.10    17.33
6       synt       -31.79   -30.72   -30.54   0.25    0.21    0.18    31.79    30.72    30.54
6       BLU1       -25.42   -21.60   -18.77   0.25    0.20    0.17    25.43    21.60    18.77
6       BLU2       -24.86   -21.18   -18.40   0.25    0.20    0.17    24.86    21.18    18.40

VI. CONCLUSION
In the paper, properties of the predictor of the form of the synthetic ratio estimator based on the superpopulation approach were studied. It was proved that it is ξ-unbiased for the simple regression superpopulation model in strata. For this model the BLU predictor was presented and situations in which both predictors are equal were shown. Properties of both predictors were additionally studied in the case of superpopulation model misspecification. Analytical considerations were supported by a simulation study. It was shown that for the discussed data both predictors give similar results both for correct and for incorrect model specification. For correct model specification and for the simple regression model assumed in domains, the accuracy of the BLU predictor in the simulation study is higher than the accuracy of the predictor of the form of the synthetic ratio estimator. When the problem of model misspecification for the analysed artificial population is discussed, both predictors give better results for incorrect models assumed for strata than for incorrect models assumed for domains.
REFERENCES

Bolfarine H., Zacks S. (1992), Prediction Theory for Finite Populations, Springer-Verlag, New York.
Bracha Cz. (1994), Metodologiczne aspekty badania małych obszarów, Studia i Materiały, Z Prac Zakładu Badań Statystyczno-Ekonomicznych, 43, GUS, Warszawa.
Bracha Cz. (1996), Teoretyczne podstawy metody reprezentacyjnej, PWN, Warszawa.
Domański Cz., Pruska K. (2001), Metody statystyki małych obszarów, Wyd. Uniwersytetu Łódzkiego, Łódź.
Getka-Wilczyńska E. (2000), Estimation of total domain in finite population, Statistics in Transition, 4, 4, 711-728.
Royall R.M. (1976), The linear least squares prediction approach to two-stage sampling, Journal of the American Statistical Association, 71, 473-657.
Valliant R., Dorfman A.H., Royall R.M. (2000), Finite Population Sampling and Inference. A Prediction Approach, John Wiley & Sons, New York.
Wywiał J., Żądło T. (2003), On Mean Square Error of Synthetic Ratio Estimator, Studia Ekonomiczne, AE Katowice, 2003.
Tomasz Żądło

ON THE SYNTHETIC RATIO ESTIMATOR FROM THE POINT OF VIEW OF THE MODEL APPROACH

Summary

In the paper, properties of a predictor of the form of the synthetic ratio estimator of the domain total, known from the randomization approach, are considered from the point of view of the model approach. The proof of its ξ-unbiasedness for the simple regression superpopulation model in strata is presented. For this model the BLU predictor is also presented. Formulas describing the prediction variances of both predictors for the mentioned superpopulation model are derived. For both predictors the problem of incorrect specification of the superpopulation model is also considered, and for this case prediction mean square errors are derived. The comparison of the accuracy of both predictors is supported by a simulation analysis.