
Estimation of Population Averages on the Basis of a Vector of Cluster Means


ACTA UNIVERSITATIS LODZIENSIS

FOLIA OECONOMICA 175, 2004

Janusz Wywiał*

ESTIMATION OF POPULATION AVERAGES ON THE BASIS OF A VECTOR OF CLUSTER MEANS**

Abstract. The estimation of a vector of mean values is considered. The vector estimator consists of simple cluster sample means. It is assumed that a population of a fixed size is divided into mutually disjoint clusters, each of the same size. The variance-covariance matrix of the vector estimator is derived. It is a function of a homogeneity matrix of a multidimensional variable, which describes the within-cluster spread of the multidimensional variable under research. The accuracy of estimation is measured by means of the standard deviations of particular sample cluster means as well as by means of the trace, the determinant or the maximal eigenvalue of the variance-covariance matrix of the vector estimator. The accuracy of the vector of simple sample cluster means is compared with the accuracy of the vector of simple sample means. The accuracy of the vector of simple sample cluster means increases when the degree of within-cluster spread of the distribution of a multidimensional variable increases. Hence, the population should be divided into clusters such that the within-cluster spread is as large as possible.

Key words: cluster sample, vector estimation, clustering methods, generalised variance, relative efficiency, homogeneity coefficient of multidimensional variable, eigenvalue of variance-covariance matrix.

1. THE BASIC PROPERTIES OF THE VECTOR OF CLUSTER MEANS

A fixed population of size $N$ is denoted by $\Omega$. It is convenient to treat the population as a subset of the natural numbers: $\Omega = \{1, 2, \ldots, N\}$. Let us assume that the population $\Omega$ is divided into $G$ mutually disjoint clusters $\Omega_p$ ($p = 1, \ldots, G$) such that $\bigcup_{p=1}^{G} \Omega_p = \Omega$. If each cluster is of the same size, denoted by $M$, the population $\Omega$ is of size $N = GM$. Let $S$ be the cluster sample of size $g$. The random sample $S$ is drawn according to the following design:

$$P_g(s) = \binom{G}{g}^{-1}.$$

* Prof., Department of Statistics, University of Economics, Katowice, e-mail: wywial@ae.katowice.pl.

** The research was supported by grant number 1 H02B 015 10 from the Polish Scientific Research Committee.

The $k$-th ($k = 1, \ldots, N$) outcome of the $i$-th ($i = 1, \ldots, m$) variable is denoted by $y_{ki}$. The sum of observations of the $i$-th variable in the $p$-th cluster is as follows:

$$z_{pi} = \sum_{k \in \Omega_p} y_{ki}.$$

The mean value of the $i$-th variable in the $p$-th cluster is:

$$\bar{y}_{pi} = \frac{1}{M} z_{pi}.$$

The mean value of the $i$-th variable per cluster is:

$$\bar{z}_i = \frac{1}{G} \sum_{p=1}^{G} z_{pi}.$$

The population mean of the $i$-th variable takes the following form:

$$\bar{y}_i = \frac{1}{N} \sum_{p=1}^{G} z_{pi} = \frac{1}{M} \bar{z}_i.$$
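The definitions above can be traced on a small example. The following is a minimal sketch, assuming a hypothetical toy population of $N = 6$ units with $m = 2$ variables split into $G = 3$ clusters of size $M = 2$; the data and all variable names are illustrative and do not come from the paper.

```python
from fractions import Fraction

# Hypothetical toy population: N = 6 units, m = 2 variables (y_k1, y_k2).
y = [(1, 2), (2, 4), (3, 1), (5, 3), (4, 6), (6, 5)]
# G = 3 disjoint clusters Omega_p of equal size M = 2 (0-based unit indices).
clusters = [[0, 1], [2, 3], [4, 5]]
G, M, m = len(clusters), len(clusters[0]), len(y[0])
N = G * M

# Cluster sums z_pi and cluster means ybar_pi = z_pi / M.
z = [[sum(y[k][i] for k in omega) for i in range(m)] for omega in clusters]
ybar_cluster = [[Fraction(z_pi, M) for z_pi in z_p] for z_p in z]

# Mean per cluster zbar_i and population mean ybar_i.
zbar = [Fraction(sum(z[p][i] for p in range(G)), G) for i in range(m)]
ybar = [Fraction(sum(unit[i] for unit in y), N) for i in range(m)]

print(z)     # cluster sums z_pi
print(zbar)  # mean cluster sums
print(ybar)  # population means, equal to zbar_i / M
```

Exact rational arithmetic (`fractions.Fraction`) is used only so that the identity $\bar{y}_i = \bar{z}_i / M$ can be compared without rounding error.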

The variance-covariance matrix is denoted by $\mathbf{C} = [\mathrm{cov}(y_i, y_j)]$, where:

$$\mathrm{cov}(y_i, y_j) = \frac{1}{N-1} \sum_{p=1}^{G} \sum_{k \in \Omega_p} (y_{ki} - \bar{y}_i)(y_{kj} - \bar{y}_j).$$

The variance-covariance matrix of cluster sums is denoted by $\mathbf{C}(z) = [\mathrm{cov}(z_i, z_j)]$, where:

$$\mathrm{cov}(z_i, z_j) = \frac{1}{G-1} \sum_{p=1}^{G} (z_{pi} - \bar{z}_i)(z_{pj} - \bar{z}_j).$$

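The estimator of the vector $\bar{\mathbf{y}} = [\bar{y}_1 \ldots \bar{y}_m]$ is defined as the vector $\bar{\mathbf{y}}_{gS} = [\bar{y}_{1gS} \ldots \bar{y}_{mgS}]$, where:

$$\bar{y}_{igS} = \frac{1}{gM} \sum_{p \in S} \sum_{k \in \Omega_p} y_{ki} = \frac{1}{gM} \sum_{p \in S} z_{pi}.$$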

The vector $\bar{\mathbf{y}}_{gS}$ is the unbiased estimator of the mean vector $\bar{\mathbf{y}}$.

The covariance of the estimators $\bar{y}_{igS}$, $\bar{y}_{jgS}$ ($i \neq j = 1, \ldots, m$) can be derived similarly to the variance of $\bar{y}_{igS}$ ($i = 1, \ldots, m$), see e.g. W. G. Cochran (1963) or C. E. Särndal, B. Swensson, J. Wretman (1992):

$$\mathrm{cov}(\bar{y}_{igS}, \bar{y}_{jgS}) = \frac{G-g}{GgM^2}\, \mathrm{cov}(z_i, z_j). \tag{2}$$

The variance-covariance matrix of $\bar{\mathbf{y}}_{gS}$ can be written down in the following way:

$$\mathbf{V}(\bar{\mathbf{y}}_{gS}, P_g) = \frac{G-g}{GgM^2}\, \mathbf{C}(z), \tag{3}$$

where $\mathbf{C}(z) = [\mathrm{cov}(z_i, z_j)]$.
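Unbiasedness and expression (3) can be checked by brute force on a small population. The sketch below assumes a hypothetical toy population ($N = 6$, $G = 3$, $M = 2$, $m = 2$, $g = 2$) and enumerates every equally likely cluster sample; the data and names are illustrative only.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy population and clustering (not taken from the paper).
y = [(1, 2), (2, 4), (3, 1), (5, 3), (4, 6), (6, 5)]
clusters = [[0, 1], [2, 3], [4, 5]]
G, M, m, g = 3, 2, 2, 2

z = [[sum(y[k][i] for k in omega) for i in range(m)] for omega in clusters]
zbar = [Fraction(sum(z[p][i] for p in range(G)), G) for i in range(m)]
ybar = [zb / M for zb in zbar]  # population means

def cov_z(i, j):
    """Covariance of cluster sums, divisor G - 1."""
    return sum((z[p][i] - zbar[i]) * (z[p][j] - zbar[j]) for p in range(G)) / (G - 1)

# All samples of g clusters, each drawn with probability 1 / C(G, g).
samples = list(combinations(range(G), g))
prob = Fraction(1, len(samples))
estimates = [[Fraction(sum(z[p][i] for p in s), g * M) for i in range(m)]
             for s in samples]

# Exact expectation and variance-covariance matrix over the sampling design.
expected = [sum(prob * e[i] for e in estimates) for i in range(m)]
exact_cov = [[sum(prob * (e[i] - ybar[i]) * (e[j] - ybar[j]) for e in estimates)
              for j in range(m)] for i in range(m)]

# Right-hand side of expression (3).
factor = Fraction(G - g, G * g * M**2)
formula_cov = [[factor * cov_z(i, j) for j in range(m)] for i in range(m)]

print(expected == ybar, exact_cov == formula_cov)
```

Enumeration is feasible only for tiny $G$; it is used here because it gives the exact design-based moments rather than a simulation estimate.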

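The unbiased estimator of the covariance is obtained through substitution of the following statistic for the parameter $\mathrm{cov}(z_i, z_j)$:

$$\mathrm{cov}_S(z_i, z_j) = \frac{1}{g-1} \sum_{p \in S} (z_{pi} - \bar{z}_{iS})(z_{pj} - \bar{z}_{jS}),$$

where $\bar{z}_{iS}$ is the mean of the cluster sums of the $i$-th variable in the sample $S$.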
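That the statistic $\mathrm{cov}_S(z_i, z_j)$ is unbiased for $\mathrm{cov}(z_i, z_j)$ can also be verified by enumeration. This is a sketch under the assumption of hypothetical cluster sums for $G = 3$ clusters and samples of $g = 2$ clusters; the helper `cov` is an illustrative name.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical cluster sums z_pi for G = 3 clusters and m = 2 variables.
z = [(3, 6), (8, 4), (10, 11)]
G, m, g = len(z), 2, 2

def cov(rows, i, j):
    """Covariance of columns i and j over the given rows, divisor len(rows) - 1."""
    n = len(rows)
    mean_i = Fraction(sum(r[i] for r in rows), n)
    mean_j = Fraction(sum(r[j] for r in rows), n)
    return sum((r[i] - mean_i) * (r[j] - mean_j) for r in rows) / (n - 1)

samples = list(combinations(range(G), g))

# Population parameter cov(z_i, z_j) and the design expectation of cov_S(z_i, z_j).
cov_z = [[cov(z, i, j) for j in range(m)] for i in range(m)]
expected_cov_S = [[sum(cov([z[p] for p in s], i, j) for s in samples) / len(samples)
                   for j in range(m)] for i in range(m)]

print(expected_cov_S == cov_z)
```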

2. HOMOGENEITY COEFFICIENT OF MULTIDIMENSIONAL VARIABLE

Let $\mathbf{C}_b = [\mathrm{cov}_b(y_i, y_j)]$ be the between-cluster matrix of the variances and covariances, where:

$$\mathrm{cov}_b(y_i, y_j) = \frac{1}{G-1} \sum_{p=1}^{G} (\bar{y}_{pi} - \bar{y}_i)(\bar{y}_{pj} - \bar{y}_j).$$

The within-cluster matrix of the variances and covariances is denoted by $\mathbf{C}_w = [\mathrm{cov}_w(y_i, y_j)]$, where:

$$\mathrm{cov}_w(y_i, y_j) = \frac{1}{G(M-1)} \sum_{p=1}^{G} \sum_{k \in \Omega_p} (y_{ki} - \bar{y}_{pi})(y_{kj} - \bar{y}_{pj}).$$


Similarly to the one-dimensional case (see e.g. Cochran 1963, p. 243), the variance-covariance matrix $\mathbf{C}$ can be decomposed in the following way:

$$(N-1)\,\mathbf{C} = (G-1)M\,\mathbf{C}_b + (N-G)\,\mathbf{C}_w. \tag{4}$$

The matrix $\mathbf{C}(z)$ can be rewritten as follows:

$$\mathbf{C}(z) = M^2\,\mathbf{C}_b. \tag{5}$$
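The decomposition (4) and the relation (5) can be confirmed numerically. The following sketch assumes the same kind of hypothetical toy population ($N = 6$, $G = 3$, $M = 2$, $m = 2$); the helper `matrix` and all data are illustrative.

```python
from fractions import Fraction

# Hypothetical toy population and clustering (not taken from the paper).
y = [(1, 2), (2, 4), (3, 1), (5, 3), (4, 6), (6, 5)]
clusters = [[0, 1], [2, 3], [4, 5]]
G, M, m = 3, 2, 2
N = G * M

ybar = [Fraction(sum(u[i] for u in y), N) for i in range(m)]
z = [[sum(y[k][i] for k in om) for i in range(m)] for om in clusters]
yc = [[Fraction(z[p][i], M) for i in range(m)] for p in range(G)]  # cluster means
zbar = [M * ybar[i] for i in range(m)]

def matrix(entry):
    """Build an m x m matrix from a function of the indices (i, j)."""
    return [[entry(i, j) for j in range(m)] for i in range(m)]

C = matrix(lambda i, j: sum((u[i] - ybar[i]) * (u[j] - ybar[j]) for u in y) / (N - 1))
C_b = matrix(lambda i, j: sum((yc[p][i] - ybar[i]) * (yc[p][j] - ybar[j])
                              for p in range(G)) / (G - 1))
C_w = matrix(lambda i, j: sum((y[k][i] - yc[p][i]) * (y[k][j] - yc[p][j])
                              for p in range(G) for k in clusters[p]) / (N - G))
C_z = matrix(lambda i, j: sum((z[p][i] - zbar[i]) * (z[p][j] - zbar[j])
                              for p in range(G)) / (G - 1))

# Both sides of the decomposition (4), and the right-hand side of (5).
lhs = matrix(lambda i, j: (N - 1) * C[i][j])
rhs = matrix(lambda i, j: (G - 1) * M * C_b[i][j] + (N - G) * C_w[i][j])
M2_C_b = matrix(lambda i, j: M**2 * C_b[i][j])

print(lhs == rhs, C_z == M2_C_b)
```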

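This expression and the equation (4) lead to the following results:

$$\mathbf{C}(z) = \frac{M}{G-1}\left((N-1)\,\mathbf{C} - (N-G)\,\mathbf{C}_w\right),$$

$$\mathbf{C}(z) = M\,\mathbf{C}\left(\mathbf{I} + \frac{N-G}{G-1}\,\mathbf{\Delta}\right), \tag{6}$$

where:

$$\mathbf{\Delta} = \mathbf{I} - \mathbf{C}^{-1}\mathbf{C}_w. \tag{7}$$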

In the case of a one-dimensional variable $y_i$, when $\mathbf{C}$ reduces to the variance $\mathrm{var}(y_i)$ and $\mathbf{C}_w$ is the within-cluster variance $\mathrm{var}_w(y_i)$, the matrix $\mathbf{\Delta}$ reduces to the homogeneity coefficient (see Särndal, Swensson, Wretman 1992, p. 130):

$$\delta(y_i) = 1 - \frac{\mathrm{var}_w(y_i)}{\mathrm{var}(y_i)}, \tag{8}$$

where:

$$\mathrm{var}(y_i) = \frac{1}{N-1} \sum_{p=1}^{G} \sum_{k \in \Omega_p} (y_{ki} - \bar{y}_i)^2, \qquad \bar{y}_i = \frac{1}{N} \sum_{p=1}^{G} \sum_{k \in \Omega_p} y_{ki}, \tag{9}$$

$$\mathrm{var}_w(y_i) = \frac{1}{G(M-1)} \sum_{p=1}^{G} \sum_{k \in \Omega_p} (y_{ki} - \bar{y}_{pi})^2, \qquad \bar{y}_{pi} = \frac{1}{M} \sum_{k \in \Omega_p} y_{ki}. \tag{10}$$

Then, the matrix $\mathbf{\Delta}$ can be treated as a generalization of the homogeneity coefficient $\delta$. That is why the matrix $\mathbf{\Delta}$ can be named the homogeneity matrix of a multidimensional variable.
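To make the relation between the homogeneity matrix (7) and the scalar coefficient (8) concrete, the sketch below computes both for $m = 2$. The matrices `C` and `C_w` are assumed values for a hypothetical toy population ($N = 6$, $G = 3$, $M = 2$); the helper functions are illustrative and handle only the 2 x 2 case.

```python
from fractions import Fraction

# Assumed C and C_w of a hypothetical toy population (N = 6, G = 3, M = 2, m = 2).
C = [[Fraction(7, 2), Fraction(17, 10)], [Fraction(17, 10), Fraction(7, 2)]]
C_w = [[Fraction(3, 2), Fraction(2, 3)], [Fraction(2, 3), Fraction(3, 2)]]

def inverse_2x2(a):
    """Inverse of a non-singular 2 x 2 matrix."""
    d = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / d, -a[0][1] / d], [-a[1][0] / d, a[0][0] / d]]

def matmul(a, b):
    """Product of two 2 x 2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Homogeneity matrix (7): Delta = I - C^{-1} C_w.
product = matmul(inverse_2x2(C), C_w)
Delta = [[(1 if i == j else 0) - product[i][j] for j in range(2)] for i in range(2)]

# Scalar homogeneity coefficients (8) of the individual variables.
delta = [1 - C_w[i][i] / C[i][i] for i in range(2)]

print(Delta)
print(delta)
```

In this example the diagonal of `Delta` (121/216) differs from the scalar coefficients (4/7), which illustrates that the matrix carries information about the covariances as well as the variances.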


Theorem 1. If the variance-covariance matrix $\mathbf{C}$ is non-singular then the eigenvalues $\lambda_i$ ($i = 1, \ldots, m$) of the matrix $\mathbf{\Delta}$ fulfill the following inequalities:

$$-\frac{G-1}{N-G} \le \lambda_i \le 1, \quad \text{for each } i = 1, \ldots, m. \tag{11}$$
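In the one-dimensional case ($m = 1$) the only eigenvalue of $\mathbf{\Delta}$ is the coefficient $\delta$ itself, so the bounds (11) can be illustrated directly. The sketch below assumes three hypothetical clusterings with $G = 3$ and $M = 2$, for which the lower bound is $-(G-1)/(N-G) = -2/3$; the function name `homogeneity` is illustrative.

```python
from fractions import Fraction

def homogeneity(clusters):
    """Scalar coefficient delta = 1 - var_w / var for equal-size clusters."""
    G, M = len(clusters), len(clusters[0])
    N = G * M
    values = [v for cluster in clusters for v in cluster]
    mean = Fraction(sum(values), N)
    var = sum((v - mean) ** 2 for v in values) / (N - 1)
    var_w = sum((v - Fraction(sum(c), M)) ** 2
                for c in clusters for v in c) / (N - G)
    return 1 - var_w / var

# All cluster means equal: the whole spread is within clusters (lower bound).
lower = homogeneity([[1, 6], [2, 5], [3, 4]])
# No spread inside any cluster: delta attains the upper bound.
upper = homogeneity([[1, 1], [2, 2], [3, 3]])
# Neighbouring values grouped together: an intermediate, positive value.
middle = homogeneity([[1, 2], [3, 4], [5, 6]])

print(lower, middle, upper)
```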

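Proof. The characteristic equation for the matrix $\mathbf{\Delta}$ can be transformed as follows:

$$|\mathbf{\Delta} - \lambda \mathbf{I}| = 0, \qquad |\mathbf{I} - \mathbf{C}^{-1}\mathbf{C}_w - \lambda \mathbf{I}| = 0, \tag{12}$$

$$|\mathbf{C}^{-1}\mathbf{C}_w - \kappa \mathbf{I}| = 0, \tag{13}$$

where $\kappa = 1 - \lambda$. Since the matrix $\mathbf{C}^{-1}\mathbf{C}_w$ is positive semi-definite, its eigenvalues $\kappa_i \ge 0$ for each $i = 1, \ldots, m$. Hence, the eigenvalues of the matrix $\mathbf{\Delta}$ are: $\lambda_i \le 1$ for each $i = 1, \ldots, m$.

Since the matrix $\mathbf{C}_b$ is positive semi-definite, the equation (4) leads to the matrix

$$\mathbf{\Delta}_1 = (N-1)\,\mathbf{C} - (N-G)\,\mathbf{C}_w,$$

which is positive semi-definite, too. Because the matrix $\mathbf{C}$ is positive definite, the following matrix is positive semi-definite:

$$\mathbf{\Delta}_2 = \frac{1}{N-G}\,\mathbf{C}^{-1}\mathbf{\Delta}_1 = \frac{N-1}{N-G}\,\mathbf{I} - \mathbf{C}^{-1}\mathbf{C}_w.$$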

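After simple algebraic transformations we have:

$$\mathbf{\Delta}_2 = \frac{G-1}{N-G}\,\mathbf{I} + \mathbf{\Delta}. \tag{14}$$

Let us do the following transformations:

$$|\mathbf{\Delta} - \lambda \mathbf{I}| = 0, \qquad \left|\mathbf{\Delta}_2 - \frac{G-1}{N-G}\,\mathbf{I} - \lambda \mathbf{I}\right| = 0, \qquad |\mathbf{\Delta}_2 - \xi \mathbf{I}| = 0, \tag{15}$$

where:

$$\xi = \lambda + \frac{G-1}{N-G}. \tag{16}$$

Since the matrix $\mathbf{\Delta}_2$ is positive semi-definite, the eigenvalue $\xi_i \ge 0$ for each $i = 1, \ldots, m$. Hence, on the basis of the expression (16) we have:

$$\lambda_i \ge -\frac{G-1}{N-G}$$

for $i = 1, \ldots, m$. This completes the proof.

We can say that the within-cluster spread of observations of a multidimensional variable is less than their population spread if the matrix $\mathbf{\Delta}$ is positive definite. When $\mathbf{\Delta}$ is negative definite, we say that the population spread of values of a multidimensional variable is less than the within-cluster spread.

3. ACCURACY OF A CLUSTER SAMPLE MEAN VECTOR IN RELATION TO SIMPLE SAMPLE MEAN VECTOR

Let $\bar{\mathbf{y}}_S$ be the vector of means from the simple random sample of size $n$, selected without replacement from a population of size $N$. Its variance-covariance matrix is of the following form:

$$\mathbf{V}(\bar{\mathbf{y}}_S, P_S) = \frac{N-n}{Nn}\,\mathbf{C}, \tag{17}$$

where $n = gM$, so that the simple sample and the cluster sample contain the same number of population elements.

On the basis of the equations (3) and (6) we have:

$$\mathbf{V}(\bar{\mathbf{y}}_{gS}, P_g) = \frac{N-n}{Nn}\,\mathbf{C}\left(\mathbf{I} + \frac{N-G}{G-1}\,\mathbf{\Delta}\right).$$

Hence: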

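$$\mathbf{V}(\bar{\mathbf{y}}_{gS}, P_g) - \mathbf{V}(\bar{\mathbf{y}}_S, P_S) = \frac{N-n}{Nn}\,\frac{N-G}{G-1}\,\mathbf{C}\mathbf{\Delta} \tag{20}$$

or

$$\mathbf{V}(\bar{\mathbf{y}}_{gS}, P_g) - \mathbf{V}(\bar{\mathbf{y}}_S, P_S) = \frac{N-n}{Nn}\,\frac{N-G}{G-1}\,(\mathbf{C} - \mathbf{C}_w). \tag{21}$$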

This leads to the following property:

Theorem 2. If the matrix $(\mathbf{C} - \mathbf{C}_w)$ is non-positive definite (non-negative definite) then the strategy $(\bar{\mathbf{y}}_{gS}, P_g)$ is not worse (not better) than the strategy $(\bar{\mathbf{y}}_S, P_S)$. Particularly, if the matrix $\mathbf{C}$ is non-singular and $\mathbf{\Delta}$ is non-positive definite (non-negative definite) then the strategy $(\bar{\mathbf{y}}_{gS}, P_g)$ is not worse (not better) than the strategy $(\bar{\mathbf{y}}_S, P_S)$.

Hence, the strategy $(\bar{\mathbf{y}}_{gS}, P_g)$ is not worse than the strategy $(\bar{\mathbf{y}}_S, P_S)$ if the within-cluster spread of a multidimensional variable, represented by the matrix $\mathbf{C}_w$, is larger than its population spread, represented by the matrix $\mathbf{C}$.
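The effect described by Theorem 2 can be seen in a one-dimensional sketch. It assumes a hypothetical population with values 1 to 6 clustered in two different ways ($G = 3$, $M = 2$, $g = 2$, hence $n = gM = 4$) and compares the exact variance of the cluster-sample mean with that of the simple-sample mean; the function name `variances` is illustrative.

```python
from fractions import Fraction
from itertools import combinations

def variances(clusters, g):
    """Exact variances of the cluster-sample mean and the simple-sample mean (n = g*M)."""
    G, M = len(clusters), len(clusters[0])
    N, n = G * M, g * M
    values = [v for c in clusters for v in c]
    mean = Fraction(sum(values), N)
    var = sum((v - mean) ** 2 for v in values) / (N - 1)
    # Cluster sampling: enumerate all equally likely samples of g clusters.
    samples = list(combinations(clusters, g))
    estimates = [Fraction(sum(sum(c) for c in s), n) for s in samples]
    v_cluster = sum((e - mean) ** 2 for e in estimates) / len(samples)
    # Simple random sampling without replacement of n elements.
    v_simple = Fraction(N - n, N * n) * var
    return v_cluster, v_simple

# Similar values grouped together: small within-cluster spread.
homogeneous = variances([[1, 2], [3, 4], [5, 6]], g=2)
# Dissimilar values grouped together: large within-cluster spread.
heterogeneous = variances([[1, 6], [2, 5], [3, 4]], g=2)

print(homogeneous)    # cluster sampling is worse than simple sampling
print(heterogeneous)  # cluster sampling is better than simple sampling
```

In the second clustering every cluster has the same mean, so the cluster-sample mean does not vary at all, which is the extreme case of a large within-cluster spread.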

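Let us denote the variance of a strategy, the determinant, the trace and the maximal eigenvalue of a variance-covariance matrix of a vector strategy by $D^2(., .)$, $\det(., .)$, $\mathrm{tr}(., .)$ and $\lambda_1(., .)$, respectively. The relative efficiency coefficients are defined as follows:

$$e_{0i} = \frac{D^2(\bar{y}_{igS}, P_g)}{D^2(\bar{y}_{iS}, P_S)} = 1 + \frac{N-G}{G-1}\,\delta(y_i), \quad i = 1, \ldots, m, \tag{22}$$

where $\delta(y_i)$ expresses the formulas (8)-(10),

$$e_1 = \frac{\det \mathbf{V}(\bar{\mathbf{y}}_{gS}, P_g)}{\det \mathbf{V}(\bar{\mathbf{y}}_S, P_S)} = \det\left(\mathbf{I} + \frac{N-G}{G-1}\,\mathbf{\Delta}\right), \tag{23}$$

$$e_2 = \frac{\mathrm{tr}\, \mathbf{V}(\bar{\mathbf{y}}_{gS}, P_g)}{\mathrm{tr}\, \mathbf{V}(\bar{\mathbf{y}}_S, P_S)} = 1 + \frac{N-G}{G-1}\,\bar{\delta}, \tag{24}$$

where:

$$\bar{\delta} = \sum_{i=1}^{m} \delta(y_i)\, a_i, \qquad a_i = \frac{\mathrm{var}(y_i)}{\sum_{i=1}^{m} \mathrm{var}(y_i)},$$

$$e_3 = \frac{\lambda_1(\bar{\mathbf{y}}_{gS}, P_g)}{\lambda_1(\bar{\mathbf{y}}_S, P_S)}. \tag{25}$$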
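The four coefficients can be evaluated side by side for a small case. The sketch below assumes hypothetical matrices `C` and `C_w` for a toy population with $N = 6$, $G = 3$, $m = 2$. Because both variance-covariance matrices share the factor $(N-n)/(Nn)$, it compares $\mathbf{C} + \frac{N-G}{G-1}(\mathbf{C} - \mathbf{C}_w)$ with $\mathbf{C}$ directly; the helper functions are illustrative and limited to symmetric 2 x 2 matrices.

```python
import math
from fractions import Fraction

# Assumed C and C_w of a hypothetical toy population with N = 6, G = 3, m = 2.
N, G = 6, 3
C = [[Fraction(7, 2), Fraction(17, 10)], [Fraction(17, 10), Fraction(7, 2)]]
C_w = [[Fraction(3, 2), Fraction(2, 3)], [Fraction(2, 3), Fraction(3, 2)]]
c = Fraction(N - G, G - 1)

# A is proportional to V(cluster strategy), C to V(simple strategy);
# the common factor (N - n)/(N n) cancels in every ratio below.
A = [[C[i][j] + c * (C[i][j] - C_w[i][j]) for j in range(2)] for i in range(2)]

def det(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def trace(a):
    return a[0][0] + a[1][1]

def max_eigenvalue(a):
    """Largest eigenvalue of a symmetric 2 x 2 matrix (closed form, floating point)."""
    half_trace = float(trace(a)) / 2
    return half_trace + math.sqrt(half_trace ** 2 - float(det(a)))

e0 = [A[i][i] / C[i][i] for i in range(2)]          # coefficients (22)
e1 = det(A) / det(C)                                # coefficient (23)
e2 = trace(A) / trace(C)                            # coefficient (24)
e3 = max_eigenvalue(A) / max_eigenvalue(C)          # coefficient (25)

print(e0, e1, e2, e3)
```

All coefficients exceed 1 here because the assumed clusters are internally homogeneous, so in this example the cluster strategy is less accurate than the simple one under every criterion.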

Theorem 3. If the matrix $\mathbf{C}$ is positive definite and the matrix $\mathbf{\Delta}$ is non-positive (non-negative) definite, then $e_k \le 1$ ($e_k \ge 1$) for $k = 1, 2, 3$ and $e_{0i} \le 1$ ($e_{0i} \ge 1$) for $i = 1, \ldots, m$. Particularly, if the matrix $\mathbf{\Delta}$ is negative (positive) definite, then $e_k < 1$ ($e_k > 1$) for $k = 1, 2, 3$, $e_{0i} \le 1$ ($e_{0i} \ge 1$) for $i = 1, \ldots, m$, and $e_{0j} < 1$ ($e_{0j} > 1$) for at least one index $j = 1, \ldots, m$.

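C. R. Rao (1982, p. 89) showed: if $\mathbf{B}$ is positive definite and $(\mathbf{A} - \mathbf{B})$ is non-negative definite then $\det(\mathbf{A}) \ge \det(\mathbf{B})$. This and the expression (7) lead to the inequality $e_1 \le 1$. The properties of the trace of a sum of matrices lead to the inequality $e_2 \le 1$. If the matrix $\mathbf{\Delta}$ is non-positive definite, the matrix $(\mathbf{C} - \mathbf{C}_w)$ is non-positive definite, too. Let $\lambda_1(\mathbf{A})$ be the maximal eigenvalue of a matrix $\mathbf{A}$. Hence:

$$\lambda_1(\mathbf{C}) = \max_{\alpha^T\alpha = 1} \{\alpha^T \mathbf{C} \alpha\}, \qquad \lambda_1(\mathbf{C}_w) = \max_{\beta^T\beta = 1} \{\beta^T \mathbf{C}_w \beta\}.$$

If $(\mathbf{C} - \mathbf{C}_w)$ is non-negative definite then for all non-zero vectors $\gamma$:

$$\gamma^T \mathbf{C} \gamma - \gamma^T \mathbf{C}_w \gamma \ge 0. \tag{26}$$

Hence, for the maximizing vectors $\alpha$ and $\beta$:

$$\alpha^T \mathbf{C} \alpha - \alpha^T \mathbf{C}_w \alpha = \lambda_1(\mathbf{C}) - \alpha^T \mathbf{C}_w \alpha \ge 0,$$

$$\beta^T \mathbf{C} \beta - \beta^T \mathbf{C}_w \beta = \beta^T \mathbf{C} \beta - \lambda_1(\mathbf{C}_w) \ge 0,$$

$$\lambda_1(\mathbf{C}) - \lambda_1(\mathbf{C}_w) \ge \beta^T \mathbf{C} \beta - \lambda_1(\mathbf{C}_w) \ge 0,$$

$$\lambda_1(\mathbf{C}) \ge \lambda_1(\mathbf{C}_w).$$

Applying the same argument to the variance-covariance matrices of both strategies, whose difference (21) is proportional to $(\mathbf{C} - \mathbf{C}_w)$, leads in the non-positive definite case to the inequality $e_3 \le 1$. The inequality (26) lets us derive the inequalities $e_{0i} \le 1$, $i = 1, \ldots, m$, when we assume that the elements of the

The strategy $(\bar{\mathbf{y}}_{gS}, P_g)$ can be better than the strategy $(\bar{\mathbf{y}}_S, P_S)$ if the matrix $(\mathbf{C} - \mathbf{C}_w)$ is negative definite. It means that the within-cluster spread of values of the multidimensional variable (under research) should be bigger than the population spread of observations of those variables.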

REFERENCES

Cochran W. G. (1963), Sampling Techniques, John Wiley, New York.

Rao C. R. (1982), Modele liniowe statystyki matematycznej, PWN, Warszawa.

Särndal C. E., Swensson B., Wretman J. (1992), Model Assisted Survey Sampling, Springer-Verlag, New York-Berlin-Heidelberg-London-Paris-Tokyo-Hong Kong-Barcelona-Budapest.

Janusz Wywiał

ESTIMATION OF POPULATION MEAN VALUES ON THE BASIS OF A VECTOR OF CLUSTER SAMPLE MEANS

It is assumed that a finite, fixed population is divided into disjoint groups of equal size. On the basis of a simple cluster sample a vector of means is determined, which provides estimates of the vector of population averages. The variance-covariance matrix of the vector of cluster sample means is derived. It depends on the matrix of within-cluster homogeneity of the distribution of a multidimensional variable. The precision of estimation is assessed by means of the variances of the particular cluster sample means, the trace, the determinant or the maximal eigenvalue of the variance-covariance matrix. The precision of the vector of cluster sample means is compared with the precision of the vector of means from a simple sample. It turns out that the vector of cluster sample means is more precise than the vector of means from a simple sample when the degree of within-cluster variation of the values of the variables is sufficiently large.
