
ACTA UNIVERSITATIS LODZIENSIS

FOLIA OECONOMICA 156, 2002

Janusz Wywiał*

ON STRATIFICATION OF POPULATION ON THE BASIS OF AUXILIARY VARIABLE AND THE SELECTED SAMPLE

Abstract. Conditional methods in survey sampling are usually connected with post-stratification estimators for domains and with inference on the basis of regression models or contingency tables. These problems were considered e.g. by Rao (1985), Tillé (1998), Williams (1962). The problem of stratification of a population on the basis of observations on the variable under study in a sample was considered by e.g. Dalenius (1957).

We deal with the problem of appropriate division of a simple sample into sub-samples of equal sizes. This partition of the sample leads to clustering the population into sub-populations. Each of these sub-populations includes one and only one previously created sub-sample. A linear combination of statistics from the sub-samples is used for estimation of the population mean. The coefficients of this linear combination are proportional to the sizes of the sub-populations. This statistic is an unbiased estimator of the population mean. The variance of the estimator is derived. An example of determining the estimator's parameters is presented. Moreover, some generalisations of the proposed estimators are suggested.

Key words: stratification after sample selection, conditional estimation, conditional mean, conditional variance.

I. ESTIMATOR

Let us assume that the values of an auxiliary variable are known in a population of size $N$. Its $i$-th value is denoted by $x_i$, $i = 1, \ldots, N$. The $i$-th value of the variable under study is denoted by $y_i$, $i = 1, \ldots, N$. Let us assume that the elements of the population $U = \{1, \ldots, N\}$ are ordered in such a way that $x_i < x_j$ for each $i < j = 1, \ldots, N$. The simple sample $s$ of size $n$ is drawn without replacement from a fixed and finite population $U$.

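Let us divide each sample $s = \{i_1, \ldots, i_k, i_{k+1}, \ldots, i_n\}$, where $i_j < i_h$ if $j < h$, into the following $H$ sub-samples of size $m$: $s_h(x_k) = \{i_{m(h-1)+1}, \ldots, i_{mh}\}$, $h = 1, 2, \ldots, H < N$. Hence, $s_h(x_k) \cap s_t(x_k) = \emptyset$ for each $h \neq t = 1, \ldots, H$ and $\bigcup_{h=1}^{H} s_h(x_k) = s$. Let $U_{1,k} = \{i: x_i \le x_{k_1}\}$, $U_{h,k} = \{i: x_{k_{h-1}} < x_i \le x_{k_h}\}$, $h = 2, \ldots, H-1$, and $U_{H,k} = \{i: x_i > x_{k_{H-1}}\}$. Hence $U_{h,k} \cap U_{t,k} = \emptyset$ for each $h \neq t = 1, \ldots, H$, and $\bigcup_{h=1}^{H} U_{h,k} = U$.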

Let $\Omega = \{s\}$ be the sample space. In our case the set $\Omega$ consists of samples $s$. Let $\Omega(x_k)$ be the set of those samples $s \in \Omega$ for which $x_k = [x_{k_1}, x_{k_2}, \ldots, x_{k_{H-1}}]$ is fixed. Hence, $\Omega = \bigcup_k \Omega(x_k)$ and $\Omega(x_k) \cap \Omega(x_h) = \emptyset$ for $k \neq h$ (see general considerations e.g. in Flachsmeyer J. (1977)). The value $x_k$ can be treated as the outcome of the random vector $X = [X_1 \ldots X_{H-1}]$. Its probability distribution function is determined by the expression:

$$P(X = x_k) = \frac{\mathrm{size}(\Omega(x_k))}{\mathrm{size}(\Omega)} \qquad (1)$$

Let us assume that the simple sample $s$ is drawn without replacement and its size is $n = Hm$, where $m \ge 1$ and $n < N$. Moreover, let $s = \{i_1, \ldots, i_n\}$ with $x_{i_j} < x_{i_e}$ and $i_j < i_e$ if and only if $j < e$. The sample $s$ is divided into $H$ sub-samples $s_h = \{i_{m(h-1)+1}, \ldots, i_{mh}\}$, $h = 1, \ldots, H$. Let us assume that $i_{mh} = k_h$, $h = 1, \ldots, H-1$. Hence, $x_{k_h}$ is the sample quantile of order $mh/n$ of the auxiliary variable. The number $k_h$ identifies the position of the sample quantile in the population.
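The partition described above can be sketched in Python. This is only an illustration under stated assumptions: population units are labelled 1..N in increasing order of the auxiliary variable (so sorting labels is the same as sorting by x), and the function name `partition_sample` is hypothetical, not taken from the paper.

```python
def partition_sample(sample, N, H):
    """Split an ordered sample into H equal sub-samples and build the strata.

    Assumption of this sketch: units are labelled 1..N in increasing order
    of the auxiliary variable x, so labels double as ranks.
    """
    s = sorted(sample)
    n = len(s)
    if n % H != 0:
        raise ValueError("sample size n must equal H * m")
    m = n // H
    # Sub-samples s_h = {i_{m(h-1)+1}, ..., i_{mh}}
    subsamples = [s[m * h: m * (h + 1)] for h in range(H)]
    # k_h = i_{mh}: population position of the sample quantile of order mh/n
    k = [sub[-1] for sub in subsamples[:-1]]
    # Strata: U_1 = {1..k_1}, U_h = {k_{h-1}+1..k_h}, U_H = {k_{H-1}+1..N}
    bounds = [0] + k + [N]
    strata = [list(range(bounds[h] + 1, bounds[h + 1] + 1)) for h in range(H)]
    return subsamples, k, strata


subsamples, k, strata = partition_sample([7, 2, 9, 4], N=10, H=2)
print(subsamples, k, strata)
```

With the sample {2, 4, 7, 9} drawn from N = 10 units and H = 2, the boundary is k_1 = 4, so the first stratum is {1, ..., 4} and the second is {5, ..., 10}; each stratum contains exactly one sub-sample.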

Wilks (1962), p. 252, considered the distribution of the order statistics in a simple sample drawn without replacement from a finite population. A particular case of this distribution is the probability distribution of the random vector $K = [K_1, \ldots, K_{H-1}]$. If $m > 1$ and $n = Hm < N$:

$$P(K_1 = k_1, \ldots, K_{H-1} = k_{H-1}) = \frac{\binom{k_1-1}{m-1}\binom{k_2-k_1-1}{m-1}\cdots\binom{k_{H-1}-k_{H-2}-1}{m-1}\binom{N-k_{H-1}}{m}}{\binom{N}{Hm}} \qquad (2)$$

where $m \le k_1 < k_2 < \ldots < k_{H-1} \le N - m$, or

$$P(K_1 = k_1, \ldots, K_{H-1} = k_{H-1}) = \frac{\binom{N-k_{H-1}}{m}\prod_{h=1}^{H-1}\binom{k_h-k_{h-1}-1}{m-1}}{\binom{N}{Hm}} \qquad (3)$$

where $k_0 = 0$.

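In particular, if $H = 2$ then $k_1 = k$ is the sample median and:

$$P(K = k) = \frac{\binom{k-1}{m-1}\binom{N-k}{m}}{\binom{N}{2m}}, \quad k = m, \ldots, N-m \qquad (4)$$

If $m = 1$, $n = H$:

$$P(K_1 = k_1, \ldots, K_{H-1} = k_{H-1}) = \frac{N - k_{H-1}}{\binom{N}{H}}, \quad 1 \le k_1 < \ldots < k_{H-1} \le N-1$$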

In the case when $H = n = 3$, $m = 1$ and $N = 5$ the distribution of the variable $[K_1, K_2]$ is given in Table 1.

Table 1

(k_1, k_2)    k_1 = 1    k_1 = 2    k_1 = 3
k_2 = 2       0.3        0          0
k_2 = 3       0.2        0.2        0
k_2 = 4       0.1        0.1        0.1
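Table 1 can be checked by brute force. The sketch below enumerates every simple sample of size n = 3 from N = 5 labelled units (labels assumed to follow the order of x) and tallies the positions of the first two order statistics; the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

N, n = 5, 3  # H = n = 3 and m = 1, as in Table 1

# combinations() yields each sample as a sorted tuple of unit labels
samples = list(combinations(range(1, N + 1), n))

# With m = 1 the boundaries (k_1, k_2) are the two smallest sampled labels
counts = Counter((s[0], s[1]) for s in samples)
table = {kk: Fraction(c, len(samples)) for kk, c in counts.items()}

for (k1, k2), p in sorted(table.items()):
    print(f"k_1 = {k1}, k_2 = {k2}: {float(p):.1f}")
```

Each probability equals (N - k_2) / 10, because only the third sampled unit remains free once k_1 and k_2 are fixed.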

If $H = 2$, $m = 1$ and $n = 2$, the distribution reduces to the one determined by the equation:

$$P(K = k) = \frac{2(N - k)}{N(N-1)}, \quad k = 1, \ldots, N-1$$

If $H = 2$, $m = 1$ and $n = 3$ then

$$P(K = k) = \frac{(k-1)(N-k)}{\binom{N}{3}}, \quad k = 2, \ldots, N-1$$

For instance, if $N = 5$, $m = 1$ and $n = H = 2$ then $P_2(K = 1) = 0.4$, $P_2(K = 2) = 0.3$, $P_2(K = 3) = 0.2$, $P_2(K = 4) = 0.1$. If $N = 5$, $H = 2$, $m = 1$ and $n = 3$ then $P_2(K = 2) = 0.3$, $P_2(K = 3) = 0.4$, $P_2(K = 4) = 0.3$. If $N = 5$, $m = 2$, $H = 2$ and $n = 4$: $P_2(K = 2) = 0.6$, $P_2(K = 3) = 0.4$. For $N = 6$, $m = 2$, $H = 2$ and $n = 4$: $P_2(K = 2) = 0.4$, $P_2(K = 3) = 0.4$, $P_2(K = 4) = 0.2$.
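The probabilities quoted above for n = Hm follow from expression (3). A short sketch of that formula, using only `math.comb` from the standard library (the function name `prob_K` is hypothetical):

```python
from itertools import combinations
from math import comb


def prob_K(k, N, m):
    """P(K_1 = k_1, ..., K_{H-1} = k_{H-1}) for n = H * m, per expression (3).

    k is a strictly increasing tuple of boundary positions (k_1, ..., k_{H-1}).
    """
    H = len(k) + 1
    prev = 0  # k_0 = 0
    numerator = 1
    for k_h in k:
        # m - 1 sampled units fall strictly between consecutive boundaries
        numerator *= comb(k_h - prev - 1, m - 1)
        prev = k_h
    # the last sub-sample takes m units above the final boundary
    numerator *= comb(N - prev, m)
    return numerator / comb(N, H * m)


print(prob_K((2,), N=5, m=2), prob_K((3,), N=5, m=2))
print([prob_K((k,), N=6, m=2) for k in (2, 3, 4)])

# Sanity check: the probabilities over all boundary pairs sum to one (H = 3)
total = sum(prob_K(k, N=8, m=2) for k in combinations(range(1, 9), 2))
```

Boundary vectors that cannot occur (for example k_1 < m) receive probability zero automatically, since `comb` returns 0 when the lower index exceeds the upper one.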

Let us consider the following conditional estimator of the population average $\bar{y}$:

where: $S'_h = S_h - \{K_h\}$ and

The expected value of this statistic is derived in the following way: (6)

$$E(\bar{y}_{S/K}) = E_K(E_{S/K}(\bar{y}_{S/K} \mid K)) = \ldots = E_K(\bar{y}) = \bar{y}$$

where: $U'_h = U_h - \{K_h\}$ and

Hence:

$$E_{S/K}(\bar{y}_{S/K}) = \bar{y}, \qquad E(\bar{y}_{S/K}) = \bar{y} \qquad (8)$$

In conclusion, the statistic $\bar{y}_{S/K}$ is a conditionally and unconditionally unbiased estimator of the population mean.
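The unbiasedness claim can be illustrated by full enumeration on a toy population. The estimator formula itself is not legible in this copy of the text, so the form used below is an assumption of this sketch, not the author's stated formula: each reduced sub-sample mean over $S'_h$ is expanded to its reduced stratum $U'_h$, each boundary unit $K_h$ enters with its own value, and the last sub-sample mean is weighted by the size of the last stratum. The data `y` and the name `conditional_estimate` are hypothetical.

```python
from itertools import combinations
from statistics import mean


def conditional_estimate(sample, y, H):
    """ASSUMED estimator form (see lead-in); requires m = n / H >= 2.

    y[i-1] is the study value of unit i; units are labelled 1..N in
    increasing order of the auxiliary variable (assumption of this sketch).
    """
    N = len(y)
    s = sorted(sample)
    m = len(s) // H
    total, prev = 0.0, 0
    for h in range(H - 1):
        sub = s[m * h: m * (h + 1)]
        k_h = sub[-1]                         # boundary unit K_h
        inner = [y[i - 1] for i in sub[:-1]]  # values on S'_h = S_h - {K_h}
        total += (k_h - prev - 1) * mean(inner)  # expand S'_h to U'_h
        total += y[k_h - 1]                   # boundary unit counted once
        prev = k_h
    last = [y[i - 1] for i in s[m * (H - 1):]]
    total += (N - prev) * mean(last)          # expand S_H to U_H
    return total / N


y = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]  # toy study values, N = 8
N, H, m = len(y), 2, 2
estimates = [conditional_estimate(s, y, H)
             for s in combinations(range(1, N + 1), H * m)]
print(mean(estimates), mean(y))
```

Under simple random sampling every sample is equally likely, so the plain average of the estimates over all samples is the unconditional expectation; for this assumed form it coincides with the population mean up to rounding.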

The derivation of the variance is as follows:

$$D^2(\bar{y}_{S/K}) = E_K(D^2_{S/K}(\bar{y}_{S/K} \mid K)) + D^2_K(E_{S/K}(\bar{y}_{S/K} \mid K)) = E_K(D^2_{S/K}(\bar{y}_{S/K} \mid K)) + 0 = \ldots \qquad (9)$$

The unbiased estimator of the variance $D^2(\bar{y}_{S/K})$ is given by the equation:

$$d_S^2(\bar{y}_{S/K}) = \sum_{h=1}^{H-1} \left(\frac{K_h - K_{h-1} - 1}{N}\right)^2 \frac{K_h - K_{h-1} - m}{(K_h - K_{h-1} - 1)(m-1)} D_{S'_h} + \left(1 - \frac{K_{H-1}}{N}\right)^2 \frac{N - K_{H-1} - m}{(N - K_{H-1})\, m} D_{S_H} \qquad (10)$$

where:

$$D_{S'_h} = \frac{1}{m-2} \sum_{i \in S'_h} (y_i - \bar{y}_{S'_h})^2, \quad h = 1, \ldots, H-1, \qquad D_{S_H} = \frac{1}{m-1} \sum_{i \in S_H} (y_i - \bar{y}_{S_H})^2 \qquad (11)$$
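A sketch of how expressions (10) and (11) might be evaluated for a single sample. It reads the divisors in (11) as $m - 2$ and $m - 1$, i.e. ordinary unbiased sample variances, which is an assumption of this sketch; it therefore needs $m \ge 3$. Units are assumed to be labelled 1..N in increasing order of x, and the function name and toy data are hypothetical.

```python
from statistics import variance


def variance_estimate(sample, y, H):
    """Evaluate expression (10) for one sample; requires m = n / H >= 3.

    y[i-1] is the study value of unit i (labels ordered by x, an assumption).
    statistics.variance divides by (count - 1), matching the assumed
    divisors m - 2 for S'_h and m - 1 for S_H.
    """
    N = len(y)
    s = sorted(sample)
    m = len(s) // H
    total, prev = 0.0, 0
    for h in range(H - 1):
        sub = s[m * h: m * (h + 1)]
        k_h = sub[-1]
        size = k_h - prev - 1                       # K_h - K_{h-1} - 1
        d = variance([y[i - 1] for i in sub[:-1]])  # D_{S'_h}
        total += (size / N) ** 2 * (size - (m - 1)) / (size * (m - 1)) * d
        prev = k_h
    last = s[m * (H - 1):]
    size = N - prev                                 # N - K_{H-1}
    d = variance([y[i - 1] for i in last])          # D_{S_H}
    total += (size / N) ** 2 * (size - m) / (size * m) * d
    return total


y = [1.0, 3.0, 2.0, 5.0, 4.0, 2.0, 4.0, 6.0]  # toy values, N = 8
est = variance_estimate([1, 2, 4, 6, 7, 8], y, H=2)
print(est)
```

Here m = 3 and K_1 = 4, so the first term uses the two units below the boundary and the second term uses the three units above it.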

II. EXAMPLE OF SIMULATION STUDY OF THE ESTIMATION EFFICIENCY

Let us consider the particular case when $H = 2$. The distribution of 30 observations $(x, y)$ of a two-dimensional variable is shown in Fig. 1. The basic parameters of this variable in the population consisting of 30 elements are as follows: the average of the auxiliary variable $\bar{x} = 68.6824$, the mean of the variable under study $\bar{y} = 93.6536$, the variances of the auxiliary variable and the variable under study $v_x = (89.1094)^2$ and $v_y = (17.6015)^2$, respectively, and finally the correlation coefficient between these variables $r = 0.9940$.

Let the population average be estimated by means of the estimators $\bar{y}_S$ and $\bar{y}_{S/K}$. The simple sample drawn without replacement has 5 elements. The sample space consists of $\binom{30}{5}$ samples. On the basis of all these possible samples, the conditional (and the unconditional) expected values and variances of both estimators have been calculated. The variance of the simple sample mean is $D^2(\bar{y}_S) = 51.6356$ and $D^2(\bar{y}_{S/K}) = 42.9823$.

Fig. 1. The scatter plot for variables x and y in the population

Fig. 2. The distribution of the random variable K in the case of the estimator $\bar{y}_{S/K}$

The relative efficiency is defined by the expression: $e = (100\%)\, D^2(\bar{y}_{S/K}) / D^2(\bar{y}_S)$. In our case $e = 83.24\%$. Hence, the precision of the conditional estimator $\bar{y}_{S/K}$ is better than the precision of the simple sample mean.
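The reported figures can be cross-checked with a few lines of arithmetic. The sketch assumes that $v_y$ is computed with the $N - 1$ divisor, so that the variance of the simple sample mean under sampling without replacement is $(N - n)/(Nn) \cdot v_y$; that reading is an assumption, not something the text states.

```python
N, n = 30, 5

# Reported population standard deviation of y: v_y = (17.6015)^2
sd_y = 17.6015

# Assumption: v_y uses the N - 1 divisor, so the variance of the simple
# sample mean is (N - n) / (N * n) * v_y
var_simple_from_vy = (N - n) / (N * n) * sd_y ** 2

var_simple_mean = 51.6356   # D^2 of the simple sample mean, as reported
var_conditional = 42.9823   # D^2 of the conditional estimator, as reported

e = 100.0 * var_conditional / var_simple_mean
print(f"variance of simple mean from v_y: {var_simple_from_vy:.4f}")
print(f"e = {e:.2f}%")  # prints "e = 83.24%"
```

Under that assumption the value derived from $v_y$ agrees with the reported 51.6356 to about three decimal places, the small difference being attributable to rounding of the published standard deviation.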

As defined by expression (4), the outcome $k$ of the random variable $K$ is the number of the population element dividing the sample into two sub-samples. The probability distribution of the random variable $K$ is shown in Fig. 2. The conditional variances of the estimator $\bar{y}_{S/K}$ are represented in Fig. 3.

Fig. 3. The conditional variances of the estimator $\bar{y}_{S/K}$

The conditional method of estimation considered above can be generalised in several directions. Firstly, in the case of two auxiliary variables, the sample quantiles let us divide the population into $m^2$ non-empty and disjoint sub-populations. Secondly, instead of a one-dimensional auxiliary variable and a variable under study, multidimensional ones can be considered because, usually, the vector of population means is estimated and a vector of auxiliary variables can be available. In this case, for instance, the precision of the estimation of the mean vector can be determined by the trace of the variance-covariance matrix or by the generalised variance.

ACKNOWLEDGEMENT

The research was supported by grant number 1 H02B 008 16 from the State Committee for Scientific Research (KBN).

REFERENCES

Dalenius T. (1957), Sampling in Sweden. Contribution to Methods and Theories of Sample Survey Practice, Almqvist & Wiksell, Stockholm.

Rao J. N. K. (1985), Conditional Inference in Survey Sampling, "Survey Methodology", 11, 1, p. 15-31.

Tillé (1998), Estimation in Surveys Using Conditional Inclusion Probabilities: Simple Random Sampling, "International Statistical Review", 66, 3, p. 303-322.

Wilks S. S. (1962), Mathematical Statistics, John Wiley & Sons, Inc., New York, London.

Williams W. H. (1962), The Variance of an Estimator with Part-Stratified Weighting, "Journal of the American Statistical Association", 57, p. 622-627.

Janusz Wywiał

ON STRATIFICATION OF A POPULATION ON THE BASIS OF AN AUXILIARY VARIABLE AND THE SAMPLE AFTER ITS SELECTION

(Summary)

The problem of estimating the population mean on the basis of a simple sample drawn without replacement from a fixed and finite population is considered. We assume that the values of the auxiliary variable are observed in the whole population. After its selection, the simple sample is ordered according to increasing values of the auxiliary variable. Next, the sample is divided into H > 1 sub-samples of equal size. Then the number of population elements lying between the elements that separate the sub-samples is counted. The shares of these counts are the coefficients of a linear combination of, among others, the sub-sample means. Such a conditional estimator gives unbiased (conditionally and unconditionally) estimates of the population mean. An example of estimating the population mean is shown, together with the values of the conditional and unconditional variances of the estimator.
