ACTA UNIVERSITATIS LODZIENSIS
FOLIA OECONOMICA 194, 2005
Krystyna Pruska*
TESTS FOR RATIO OF TWO MEANS IN CASE OF SMALL AREAS
Abstract
The relations between characteristics for subpopulations and for the whole population are very important in small area investigations.
In the paper, testing procedures are proposed for verifying the hypothesis that the ratio of the small area mean to the population mean is the same for the analysed variable and for the auxiliary variable. The properties of one of the considered procedures are investigated by means of simulation methods.
Key words: small area, synthetic estimator.
I. INTRODUCTION
In small area statistics, the estimation of unknown parameters for a subpopulation is the general problem. Different estimators are constructed and applied, and their properties are investigated. Synthetic estimators are considered as well. They can be used when certain assumptions hold. In this paper we consider the possibility of verifying whether these assumptions are fulfilled.
II. SYNTHETIC ESTIMATORS
In the statistical literature different definitions of synthetic estimators are given (see: Dol, 1991; Bracha, 1996; Särndal et al., 1997; Kordos, 1999; Domański and Pruska, 2001). Generally, their construction is possible when some relations between parameters for the subpopulation and for the population are constant.
In the paper only finite populations are considered. We introduce the following notation:
$Y$ - investigated variable,
$X$ - auxiliary variable,
$TY_P$ - total value of variable $Y$ for the population,
$TX_P$ - total value of variable $X$ for the population,
$TY_{.g}$ - total value of variable $Y$ for stratum $g$ of the population,
$TX_{.g}$ - total value of variable $X$ for stratum $g$ of the population,
$TY_{hg}$ - total value of variable $Y$ for stratum $g$ and small area $h$ of the population,
$TX_{hg}$ - total value of variable $X$ for stratum $g$ and small area $h$ of the population,
$TY_{h.}$ - total value of variable $Y$ for small area $h$ of the population,
where $g = 1, \ldots, G$; $h = 1, \ldots, H$; $G$ is the number of strata in the population and $H$ is the number of small areas in the population.
If we assume (see Dol, 1991):

$$\frac{TY_{hg}}{TX_{hg}} = \frac{TY_{.g}}{TX_{.g}} \quad \text{for } g = 1, \ldots, G, \tag{1}$$

then we can consider the synthetic estimator of the total value of variable $Y$ for small area $h$ of the following form:

$$\widehat{TY}_{h.} = \sum_{g=1}^{G} \hat{R}_g \, TX_{hg}, \tag{2}$$

where $\hat{R}_g$ is an estimator of the value $TY_{.g}/TX_{.g}$. There are different forms of the statistic $\hat{R}_g$ (see: Dol, 1991).
Estimator (2) can be used when assumptions (1) are fulfilled. In empirical investigations we ought to verify whether these conditions hold. We may use an estimator of $TX_{hg}$ instead of $TX_{hg}$ in formula (2).
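The construction of the synthetic estimator can be illustrated with a minimal sketch. It assumes that estimator (2) sums, over strata, an estimated stratum ratio of the $Y$ total to the $X$ total multiplied by the auxiliary total of the small area in that stratum. The function name `synthetic_total` and all numbers are hypothetical and serve only as an illustration; they are not taken from the paper.

```python
def synthetic_total(ratio_estimates, tx_small_area):
    """Synthetic estimate of the small-area total of Y (assumed form of formula (2)).

    ratio_estimates[g] -- estimate of TY_.g / TX_.g for stratum g (hypothetical input)
    tx_small_area[g]   -- total of X for stratum g within the small area, TX_hg
    """
    if len(ratio_estimates) != len(tx_small_area):
        raise ValueError("one ratio and one auxiliary total are needed per stratum")
    # Sum over strata of (estimated ratio) * (auxiliary total in the small area).
    return sum(r * tx for r, tx in zip(ratio_estimates, tx_small_area))


# Illustrative numbers only: three strata.
ratios = [2.0, 1.5, 4.0]
tx_hg = [10.0, 20.0, 5.0]
estimate = synthetic_total(ratios, tx_hg)
print(estimate)
```

The estimate is only as good as the assumption that the stratum ratios also hold inside the small area, which is exactly what the tests discussed in the paper are meant to check.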
III. FORMULATION OF HYPOTHESIS
We can treat the verification of whether the synthetic estimator may be used as the verification of a suitable statistical hypothesis. Synthetic estimator (2) is constructed for a population which is divided into strata. Assumptions (1) can be verified on the basis of independent samples drawn from each stratum. All hypotheses of the form:

$$H_{0g}: \frac{TY_{.g}}{TX_{.g}} = \frac{TY_{hg}}{TX_{hg}} \tag{3}$$

for $g = 1, \ldots, G$ and for a fixed $h$ from the set $\{1, \ldots, H\}$, against the alternative hypothesis:

$$H_{1g}: \sim H_{0g} \tag{4}$$
can be verified analogously to the hypothesis:

$$H_0: \frac{TY_P}{TX_P} = \frac{TY_{M0}}{TX_{M0}} \tag{5}$$

against the hypothesis:

$$H_1: \sim H_0, \tag{6}$$

where $TY_{M0}$ and $TX_{M0}$ are the total values of variables $Y$ and $X$ for a fixed small area.
In this paper we will consider the following equivalent form of hypothesis $H_0$:

$$H_0: \frac{\mu_{YP}}{\mu_{XP}} = \frac{\mu_{YM0}}{\mu_{XM0}} \tag{7}$$

where $\mu_{YP}$, $\mu_{XP}$, $\mu_{YM0}$, $\mu_{XM0}$ are the means of variables $Y$ and $X$ for the population and for the small area, respectively.
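The equivalence of the two forms can be made explicit, assuming that hypothesis (5) equates the ratio of the $Y$ total to the $X$ total in the population with the same ratio in the small area. The symbols $M_P$ and $M_{M0}$ are introduced here only for illustration (they do not appear in the paper) and denote the numbers of elements in the population and in the small area. Since a total is a mean multiplied by the number of elements, the sizes cancel:

```latex
\frac{TY_P}{TX_P}
  = \frac{M_P\,\mu_{YP}}{M_P\,\mu_{XP}}
  = \frac{\mu_{YP}}{\mu_{XP}},
\qquad
\frac{TY_{M0}}{TX_{M0}}
  = \frac{M_{M0}\,\mu_{YM0}}{M_{M0}\,\mu_{XM0}}
  = \frac{\mu_{YM0}}{\mu_{XM0}}.
```

Equality of the two ratios of totals is therefore the same statement as equality of the two ratios of means.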
IV. TEST PROCEDURES
The test statistic for the verification of hypothesis (7) can be the random variable:

$$Z = \frac{\bar{Y}_P}{\bar{X}_P} - \frac{\bar{Y}_{M0}}{\bar{X}_{M0}} \tag{8}$$

or a function of it, where $\bar{Y}_P$, $\bar{X}_P$, $\bar{Y}_{M0}$, $\bar{X}_{M0}$ are the sample means of variables $Y$ and $X$ for the population and for the small area, respectively.
Determining the distribution of statistic $Z$ is difficult when we do not know the distribution of variables $Y$ and $X$.
Now a test procedure for the verification of hypothesis (7) will be proposed. This procedure uses the conditional distributions (for fixed values of the statistics $\bar{X}_{M0}$, $\bar{X}_{P-M0}$ or $\bar{Y}_{M0}$, $\bar{Y}_{P-M0}$, respectively) of the following statistics:

$$U_Y = \frac{\bar{Y}_{M0} - \dfrac{\bar{X}_{M0}}{\bar{X}_{P-M0}}\,\bar{Y}_{P-M0}}{\sqrt{\dfrac{S^2_{YM0}}{N_{M0}} + \dfrac{\bar{X}^2_{M0}}{\bar{X}^2_{P-M0}} \cdot \dfrac{S^2_{YP-M0}}{N_{P-M0}}}} \tag{9}$$

and

$$U_X = \frac{\bar{X}_{M0} - \dfrac{\bar{Y}_{M0}}{\bar{Y}_{P-M0}}\,\bar{X}_{P-M0}}{\sqrt{\dfrac{S^2_{XM0}}{N_{M0}} + \dfrac{\bar{Y}^2_{M0}}{\bar{Y}^2_{P-M0}} \cdot \dfrac{S^2_{XP-M0}}{N_{P-M0}}}} \tag{10}$$

where:
$\bar{Y}_{P-M0}$, $\bar{X}_{P-M0}$ are the sample means of variables $Y$ and $X$, respectively, for the set which is the difference between the population and the small area;
$S^2_{XM0}$, $S^2_{XP-M0}$ are the sample variances of variable $X$ for the small area and for the set which is the difference between the population and the small area, respectively;
$S^2_{YM0}$, $S^2_{YP-M0}$ are the sample variances of variable $Y$ for the small area and for the set which is the difference between the population and the small area, respectively;
$N_{M0}$ is the number of elements of the sample from the population which belong to the small area;
$N_{P-M0}$ is the number of elements of the sample from the population which do not belong to the small area.
The test algorithm is the following:
1. We independently draw an $N_P$-element sample from the whole population. The sample elements which belong to the small area form the sample for the small area, and the elements which do not belong to it form the sample for the set which is the difference between the population and the small area.
2. We determine the values of the following statistics: $\bar{X}_{M0}$, $\bar{X}_{P-M0}$, $S^2_{XM0}$, $S^2_{XP-M0}$, $\bar{Y}_{M0}$, $\bar{Y}_{P-M0}$, $S^2_{YM0}$, $S^2_{YP-M0}$, $U_X$, $U_Y$.
3. We verify whether $|u_X| \le 1.96$ and $|u_Y| \le 1.96$, where $u_X$ and $u_Y$ are the values of variables $U_X$, $U_Y$, respectively. If at least one of these inequalities does not hold, we reject hypothesis (7).
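The algorithm above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it takes statistics (9) and (10) to be a difference between the small-area mean of one variable and the rescaled mean of the same variable outside the small area, divided by an estimated standard deviation, and it rejects when at least one absolute value exceeds 1.96. The function names `u_statistic` and `reject_h0` and the sample data are hypothetical.

```python
import math
import statistics


def u_statistic(a_small, a_rest, b_small, b_rest):
    """Statistic of the assumed form (9)/(10) for variable `a`.

    The mean of `a` outside the small area is rescaled by the ratio of the
    sample means of `b` (small area over the rest), which is treated as fixed.
    """
    k = statistics.mean(b_small) / statistics.mean(b_rest)
    numerator = statistics.mean(a_small) - k * statistics.mean(a_rest)
    # statistics.variance is the sample variance (divisor n - 1).
    variance = (statistics.variance(a_small) / len(a_small)
                + k ** 2 * statistics.variance(a_rest) / len(a_rest))
    return numerator / math.sqrt(variance)


def reject_h0(y_small, x_small, y_rest, x_rest, critical=1.96):
    """Step 3 of the algorithm: reject if either statistic exceeds the critical value."""
    u_y = u_statistic(y_small, y_rest, x_small, x_rest)  # assumed form of (9)
    u_x = u_statistic(x_small, x_rest, y_small, y_rest)  # assumed form of (10)
    return (abs(u_y) > critical or abs(u_x) > critical), u_y, u_x


# Hypothetical sample: Y = 2X outside the small area, Y = 2X + 10 inside it.
rejected, u_y, u_x = reject_h0(
    y_small=[12, 14, 16, 18], x_small=[1, 2, 3, 4],
    y_rest=[2, 4, 6, 8, 10], x_rest=[1, 2, 3, 4, 5],
)
print(rejected, round(u_y, 2), round(u_x, 2))
```

With data in which the ratio of means is identical inside and outside the small area, both numerators vanish and the hypothesis is not rejected.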
We can notice that:

$$\begin{aligned}
P(|U_X| > 1.96 \text{ or } |U_Y| > 1.96) &= P(|U_X| > 1.96) + P(|U_Y| > 1.96) - P(|U_X| > 1.96 \text{ and } |U_Y| > 1.96) \\
&\le P(|U_X| > 1.96) + P(|U_Y| > 1.96) = 0.05 + 0.05 = 0.1,
\end{aligned}$$

when we consider the probabilities under the suitable conditional distributions of the statistics $U_X$ and $U_Y$, which are asymptotically normal.
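The two values of 0.05 in this bound are tail probabilities of the standard normal distribution at 1.96. A short check, assuming exact normality of each statistic (the paper claims it only asymptotically), can be written with the complementary error function; the function name `two_sided_normal_tail` is introduced here for illustration.

```python
import math


def two_sided_normal_tail(c):
    """P(|Z| > c) for a standard normal Z, using P(|Z| > c) = erfc(c / sqrt(2))."""
    return math.erfc(c / math.sqrt(2.0))


p_single = two_sided_normal_tail(1.96)   # tail probability for one statistic
bound = 2.0 * p_single                   # upper bound for "at least one exceeds 1.96"
print(round(p_single, 4), round(bound, 4))
```

The bound is an upper limit on the probability of rejecting a true hypothesis, not its exact value.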
If we know the total values of the auxiliary variable for the population and the small area (i.e. we know $TX_P$ and $TX_{M0}$), then we can use the following random variable as the test statistic:

$$U = \frac{\bar{Y}_{M0} - \dfrac{TX_{M0}}{TX_{P-M0}}\,\bar{Y}_{P-M0}}{\sqrt{\dfrac{S^2_{YM0}}{N_{M0}} + \dfrac{TX^2_{M0}}{TX^2_{P-M0}} \cdot \dfrac{S^2_{YP-M0}}{N_{P-M0}}}} \tag{11}$$

The asymptotic distribution of statistic (11) is the normal distribution $N(0; 1)$. In this case the rejection region of hypothesis (7) is determined in the classical way: we reject the hypothesis when the value of statistic (11) calculated from the sample belongs to the rejection region.
V. MONTE CARLO ANALYSIS OF PROPOSED TEST PROCEDURE PROPERTIES
The Monte Carlo analysis deals with the first presented test procedure, that is, with the case when the total values of the auxiliary variable for the population and the small area are unknown.
The aim of the conducted experiments was to determine the number of cases in which hypothesis (7) was rejected in 1000 repetitions for a fixed distribution of the population.
The experiments were conducted in the following way:
1. Creating a population consisting of 50 000 values of the variable $(Y, X)$, generated from one fixed distribution or from two fixed distributions.
2. Determining the small area, which consists of 5000 elements.
3. Drawing an $n$-element sample from the population ($n$ = 2000, 2500).
4. Verification of hypothesis (7) by means of the test whose statistics are variables (9) and (10).
5. Conducting 1000 repetitions of stages 3 and 4.
6. Determining the number of cases in which hypothesis (7) was rejected in 1000 repetitions.
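The six stages above can be sketched as a small simulation. The sketch is scaled down so that it runs quickly (a population of 5000 with a small area of 500, samples of 400 and 50 repetitions, instead of the 50 000, 5000, 2000 or 2500, and 1000 used in the paper). The population model (X normal with mean 10, Y equal to X plus noise, optionally shifted in the small area), the seed and all function names are assumptions made for illustration only; they do not reproduce the distributions studied in the paper, and statistics (9) and (10) are used in their assumed form.

```python
import math
import random


def mean_var(values):
    """Sample mean and unbiased sample variance."""
    n = len(values)
    m = sum(values) / n
    return m, sum((v - m) ** 2 for v in values) / (n - 1)


def u_stat(a_small, a_rest, b_small, b_rest):
    """Assumed form of statistics (9)/(10) for variable `a`, given the means of `b`."""
    ma_s, va_s = mean_var(a_small)
    ma_r, va_r = mean_var(a_rest)
    k = mean_var(b_small)[0] / mean_var(b_rest)[0]
    return (ma_s - k * ma_r) / math.sqrt(va_s / len(a_small) + k * k * va_r / len(a_rest))


def count_rejections(population, small_size, n, repetitions, rng, critical=1.96):
    """population: list of (y, x); its first `small_size` entries form the small area."""
    rejections = 0
    for _ in range(repetitions):                          # stage 5
        idx = rng.sample(range(len(population)), n)       # stage 3
        small = [population[i] for i in idx if i < small_size]
        rest = [population[i] for i in idx if i >= small_size]
        ys, xs = [p[0] for p in small], [p[1] for p in small]
        yr, xr = [p[0] for p in rest], [p[1] for p in rest]
        u_y = u_stat(ys, yr, xs, xr)                      # stage 4
        u_x = u_stat(xs, xr, ys, yr)
        if abs(u_y) > critical or abs(u_x) > critical:
            rejections += 1                               # stage 6
    return rejections


def make_population(size, small_size, shift, rng):
    """Hypothetical population; Y is shifted by `shift` inside the small area."""
    population = []
    for i in range(size):                                 # stages 1 and 2
        x = rng.gauss(10.0, 1.0)
        y = x + rng.gauss(0.0, 1.0) + (shift if i < small_size else 0.0)
        population.append((y, x))
    return population


rng = random.Random(2005)
same = make_population(5000, 500, 0.0, rng)
shifted = make_population(5000, 500, 5.0, rng)
rejections_same = count_rejections(same, 500, 400, 50, rng)
rejections_shifted = count_rejections(shifted, 500, 400, 50, rng)
print(rejections_same, rejections_shifted)
```

Restoring the sizes used in the paper only requires changing the arguments of `make_population` and `count_rejections`.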
The results of the Monte Carlo experiments are presented in Tables 1 and 2.
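Table 1. Number of cases of rejection of hypothesis (7) among 1000 experiments for the same distribution of $(Y, X)$ in the population and the small area

| No. | Distribution of variable $(Y, X)$ | Size of sample from population | Minimal size of sample from small area | Average size of sample from small area | Maximal size of sample from small area | Number of rejections of $H_0$ |
|---|---|---|---|---|---|---|
| 1 | $N(m, \Sigma)$, $m^T = [100; 20]$, $\Sigma = \begin{bmatrix} 100 & 36 \\ 36 & 16 \end{bmatrix}$ | 2000 | 156 | 200 | 251 | 123 |
| | | 2500 | 207 | 250 | 299 | 113 |
| 2 | $N(m, \Sigma)$, $m^T = [10; 20]$, $\Sigma = \begin{bmatrix} 1 & 0.8 \\ 0.8 & 1 \end{bmatrix}$ | 2000 | 156 | 200 | 251 | 136 |
| | | 2500 | 207 | 250 | 299 | 140 |
| 3 | $X \sim N(10; 1)$, $Y = [X]$ | 2000 | 149 | 200 | 247 | 0 |
| | | 2500 | 196 | 250 | 305 | 0 |
| 4 | $X \sim N(60; 12)$, $Y = [X]$ | 2000 | 149 | 200 | 247 | 0 |
| | | 2500 | 196 | 250 | 305 | 0 |
| 5 | $X \sim P_5$, $Y = X + Z$, $Z \sim N\left(\frac{5}{2}; \frac{\sqrt{5}}{10}\right)$ | 2000 | 161 | 200 | 252 | 0 |
| | | 2500 | 188 | 250 | 312 | 1 |
| 6 | $X \sim P_{10}$, $Y = X + Z$, $Z \sim N\left(\frac{5}{2}; \frac{\sqrt{5}}{10}\right)$ | 2000 | 158 | 200 | 253 | 0 |
| | | 2500 | 209 | 250 | 293 | 0 |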
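Table 2. Number of cases of rejection of hypothesis (7) among 1000 experiments for different distributions of $(Y, X)$ in the population and the small area

| No. | Distribution of variable $(Y, X)$ in small area | Distribution of variable $(Y, X)$ out of small area | Size of sample from population | Minimal size of sample from small area | Average size of sample from small area | Maximal size of sample from small area | Number of cases of rejection of $H_0$ |
|---|---|---|---|---|---|---|---|
| 1 | $N(m, \Sigma)$, $m^T = [120; 25]$, $\Sigma = \begin{bmatrix} 100 & 36 \\ 36 & 16 \end{bmatrix}$ | $N(m, \Sigma)$, $m^T = [100; 20]$, $\Sigma = \begin{bmatrix} 100 & 36 \\ 36 & 16 \end{bmatrix}$ | 2000 | 156 | 200 | 251 | 120 |
| | | | 2500 | 207 | 250 | 299 | 113 |
| 2 | $N(m, \Sigma)$, $m^T = [12; 21]$, $\Sigma = \begin{bmatrix} 1 & 0.8 \\ 0.8 & 1 \end{bmatrix}$ | $N(m, \Sigma)$, $m^T = [10; 20]$, $\Sigma = \begin{bmatrix} 1 & 0.8 \\ 0.8 & 1 \end{bmatrix}$ | 2000 | 156 | 200 | 251 | 1000 |
| | | | 2500 | 207 | 250 | 299 | 1000 |
| 3 | $X = U + 2$, $Y = V + 1$, $U \sim N(10; 1)$, $V = [U]$ | $X \sim N(10; 1)$, $Y = [X]$ | 2000 | 149 | 200 | 247 | 1000 |
| | | | 2500 | 196 | 250 | 305 | 1000 |
| 4 | $X = U + 2$, $Y = V + 1$, $U \sim N(60; 12)$, $V = [U]$ | $X \sim N(60; 12)$, $Y = [X]$ | 2000 | 149 | 200 | 247 | 0 |
| | | | 2500 | 196 | 250 | 305 | 0 |
| 5 | $X = U + 2$, $Y = V + 1$, $U \sim P_5$, $V = U + Z$, $Z \sim N\left(\frac{5}{2}; \frac{\sqrt{5}}{10}\right)$ | $X \sim P_5$, $Y = X + Z$, $Z \sim N\left(\frac{5}{2}; \frac{\sqrt{5}}{10}\right)$ | 2000 | 161 | 200 | 252 | 1000 |
| | | | 2500 | 188 | 250 | 312 | 1000 |
| 6 | $X = U + 2$, $Y = V + 1$, $U \sim P_{10}$, $V = U + Z$, $Z \sim N\left(\frac{5}{2}; \frac{\sqrt{5}}{10}\right)$ | $X \sim P_{10}$, $Y = X + Z$, $Z \sim N\left(\frac{5}{2}; \frac{\sqrt{5}}{10}\right)$ | 2000 | 158 | 200 | 253 | 1000 |
| | | | 2500 | 209 | 250 | 293 | 1000 |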
We can notice that, when hypothesis (7) is true, the considered test procedure either does not reject it or rejects it in only some of the 1000 repetitions. If hypothesis (7) is not true and the distributions of the investigated variable in the population and in the small area differ significantly, then hypothesis (7) is rejected in all repetitions for a given case. If the distributions do not differ significantly, then hypothesis (7) is rejected in some repetitions or in none. Such results are not typical for significance tests. The proposed test procedure can be modified and its properties ought to be investigated further.
VI. FINAL REMARKS
Different methods for the verification of hypotheses about relations between total values for a subpopulation and for the population can be constructed. Some proposals are presented in this paper; another one (using the bootstrap method) is given in Pruska (2002). The problems considered in the paper are important in small area investigations. It seems necessary to continue the conducted analyses.
REFERENCES
Bracha Cz. (1996), Teoretyczne podstawy metody reprezentacyjnej, PWN, Warszawa.
Dol W. (1991), Small Area Estimation. A Synthesis between Sampling Theory and Econometrics, Wolters-Noordhoff, Groningen.
Domański Cz., Pruska K. (2001), Metody statystyki małych obszarów, Wyd. Uniwersytetu Łódzkiego, Łódź.
Kordos J. (1999), Problemy estymacji danych dla małych obszarów, Wiadomości Statystyczne, 1, 85-101.
Pruska K. (2002), Statystyczna weryfikacja możliwości zastosowania estymatorów syntetycznych w badaniach małych obszarów, paper presented in Łagów at the conference „Statystyka regionalna w jednoczącej się Europie” (2-5.09.2002).
Särndal C., Swensson B., Wretman J. (1997), Model Assisted Survey Sampling, Springer-Verlag, New York.
Krystyna Pruska
TESTY DLA STOSUNKU DWÓCH ŚREDNICH W PRZYPADKU MAŁYCH OBSZARÓW
(Tests for the ratio of two means in the case of small areas)
Summary
The relations between the characteristics of a subpopulation and of the whole population are very important in small area research.
This paper proposes test procedures for verifying the hypothesis that the ratio of the small area mean to the population mean is equal for the analysed variable and for the auxiliary variable. The properties of one of the proposed procedures are examined by means of simulation methods.