
Janusz L. Wywiał

Uniwersytet Ekonomiczny w Katowicach

ON LIMIT DISTRIBUTION

OF HORVITZ-THOMPSON STATISTIC UNDER POISSON SAMPLING DESIGN

Introduction

Let $U_N$ be a fixed population of size $N$, $N = 2, 3, \ldots$. The elements of the population are identified, so the population can be represented by the set $U_N = \{1, \ldots, N\}$. The observation of a variable under study is denoted by $y_{k,N}$, $k = 1, \ldots, N$, $N = 2, 3, \ldots$, so the vector $\mathbf{y}_N = [y_{1,N} \; y_{2,N} \ldots y_{N,N}]$ is attached to the set $U_N$. In particular, when we assume that $U_N \subset U_{N+1}$, the observations of the variable in the population can be represented more simply by the vector $\mathbf{y}_{N+1} = [\mathbf{y}_N \; y_{N+1}]$, where $\mathbf{y}_N = [y_1 \; y_2 \ldots y_N]$. A more particular case is as follows. Let $(y_i, w_i N_t)$ mean that the value $y_i$ is replicated $w_i N_t$ times in the population, where $\sum_{i=1}^{k} w_i = 1$ and $0 < w_i < 1$ for $i = 1, 2, \ldots, k$, $t = 1, 2, \ldots$, and the vector $\mathbf{y}_k = [y_1 \; y_2 \ldots y_k]$ is fixed. The size $N_t$ of the population $U_{N_t}$ is determined in such a way that $w_i N_t$ is an integer for all $i = 1, \ldots, k$. So,

$$\mathbf{y}_{N_t} = [(y_1, w_1 N_t) \; (y_2, w_2 N_t) \ldots (y_k, w_k N_t)].$$

We assume that all elements of the population can be selected for the sample with different probabilities. The $k$-th population element, $k \in U_N$, is selected for the sample with the inclusion probability $0 < \pi_{k,N} < 1$, $k = 1, \ldots, N$. More precisely, let $\mathbf{S}_N = [S_{1,N} \ldots S_{N,N}]$ be the vector of independent binary random variables and

$$P(S_{k,N} = 1) = \pi_{k,N} = 1 - P(S_{k,N} = 0). \qquad (1)$$

So, $\mathbf{s}_N = [s_{1,N} \ldots s_{N,N}]$ is the realization of the random sample $\mathbf{S}_N$. The probability distribution of the random sample $\mathbf{S}_N$ is known as the Poisson sampling design [see, e.g., Tillé 2006]:

$$P(\mathbf{S}_N = \mathbf{s}_N) = \prod_{k=1}^{N} \pi_{k,N}^{s_{k,N}} \left(1 - \pi_{k,N}\right)^{1 - s_{k,N}}.$$

The population total

$$\tilde{y} = \sum_{k \in U_N} y_{k,N}$$

can be estimated on the basis of the Horvitz-Thompson statistic [1952]:

$$y_{HTS_N} = \sum_{k \in U_N} \frac{y_{k,N} S_{k,N}}{\pi_{k,N}}.$$

It is well known that $E(y_{HTS_N}) = \tilde{y}$ if $\pi_{k,N} > 0$ for all $k = 1, \ldots, N$. Because $P(S_{k,N} = 1, S_{h,N} = 1) = \pi_{k,N}\pi_{h,N}$ for all $k = 1, \ldots, N$, $h = 1, \ldots, N$ and $k \neq h$, the variance of the statistic $y_{HTS_N}$ is:

$$V(y_{HTS_N}) = \sum_{k \in U_N} y_{k,N}^2 \frac{1 - \pi_{k,N}}{\pi_{k,N}}. \qquad (2)$$

Its unbiased estimator is:

$$\hat{V}_S(y_{HTS_N}) = \sum_{k \in U_N} y_{k,N}^2 S_{k,N} \frac{1 - \pi_{k,N}}{\pi_{k,N}^2}. \qquad (3)$$
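For illustration, the design (1) and the statistics (2)-(3) can be sketched numerically as follows; the population values $y_{k,N}$ and the inclusion probabilities $\pi_{k,N}$ below are arbitrary illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: study variable y_{k,N} and inclusion probabilities pi_{k,N}.
N = 1000
y = rng.gamma(shape=2.0, scale=5.0, size=N)
pi = rng.uniform(0.05, 0.5, size=N)

# Poisson sampling design: each S_{k,N} is an independent Bernoulli(pi_{k,N}) draw, eq. (1).
S = rng.binomial(1, pi)

# Horvitz-Thompson estimator of the population total.
y_hts = np.sum(y * S / pi)

# Design variance (2) and its unbiased estimator (3).
V = np.sum(y**2 * (1.0 - pi) / pi)
V_hat = np.sum(y**2 * S * (1.0 - pi) / pi**2)

print(f"true total = {y.sum():.1f}, HT estimate = {y_hts:.1f}")
print(f"design variance (2) = {V:.1f}, estimated variance (3) = {V_hat:.1f}")
```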

Let

$$b_{3,N} = \frac{1}{N} \sum_{k \in U_N} |y_{k,N}|^3, \qquad v_{2,N} = \frac{1}{N} \sum_{k \in U_N} y_{k,N}^2, \qquad v_{4,N} = \frac{1}{N} \sum_{k \in U_N} y_{k,N}^4.$$


The original version and the proof of Lapunov's [1901] theorem, which is slightly less general, can be found in the monograph by Fisz [1963]. On the basis of the books by Billingsley [2009] or Jakubowski and Sztencel [2004], the following more general version of the theorem is presented.

Theorem 1. Let $Z_{k,N}$, $k = 1, \ldots, N$, $N = 1, 2, \ldots$ be a sequence of independent random variables and for some $\delta > 0$

$$\beta_N = \frac{B_N}{C_N^{(2+\delta)/2}} \to 0 \quad \text{if } N \to \infty, \qquad (4)$$

where:

$$B_N = \sum_{k=1}^{N} E\left|Z_{k,N} - E(Z_{k,N})\right|^{2+\delta}, \qquad C_N = \sum_{k=1}^{N} V(Z_{k,N}). \qquad (5)$$

Under this Lapunov condition, the random variable

$$Z_N = \frac{\sum_{k=1}^{N} \left(Z_{k,N} - E(Z_{k,N})\right)}{\sqrt{C_N}}$$

converges in distribution to the standard normal distribution if $N \to \infty$.

Hájek [1964] considered a limit distribution of the following statistic:

$$H_N = \sum_{k=1}^{N} r_k \left(S_k - \pi_k\right) = y_{HTS_N} - \tilde{y}, \qquad \text{where } r_k = \frac{y_k}{\pi_k}, \; k = 1, \ldots, N.$$

He proved that the probability distribution of the statistic $H_N$ tends to the normal distribution because it fulfils the well-known Lindeberg condition.

In the next section, the limit theorem for the estimator $y_{HTS_N}$ will be considered.

1. Limit theorem

Firstly, let us formulate the following statistics and the theorem.

$$T_N = \frac{y_{HTS_N} - \tilde{y}}{\sqrt{V(y_{HTS_N})}}, \qquad \hat{T}_N = \frac{y_{HTS_N} - \tilde{y}}{\sqrt{\hat{V}_S(y_{HTS_N})}}. \qquad (6)$$

We say that $\pi_{k,N} = O(N^{-\alpha})$ if for all $0 \le \alpha < 1$ there exist constants $a_1$ and $a_0$, $0 < a_1 \le a_0 < 1$, such that

$$0 < a_1 \le N^{\alpha} \max_{k=1,\ldots,N}\{\pi_{k,N}\} \le a_0 < 1 \quad \text{for all } N = 1, 2, \ldots, \qquad (7)$$

or, equivalently,

$$0 < a_1 N^{-\alpha} \le \max_{k=1,\ldots,N}\{\pi_{k,N}\} \le a_0 N^{-\alpha} < 1 \quad \text{for all } N = 1, 2, \ldots$$

In particular, if $\alpha = 0$,

$$0 < a_1 \le \max_{k=1,\ldots,N}\{\pi_{k,N}\} \le a_0 < 1 \quad \text{for all } N = 1, 2, \ldots$$

Moreover, $\pi_{k,N}^{-1} = O(N^{\alpha})$ because for all $0 \le \alpha < 1$ there exist constants $0 < c_1 \le c_0 < \infty$ (one may take $c_1 = a_0^{-1}$ and $c_0 = a_1^{-1}$) such that

$$0 < c_1 N^{\alpha} \le \max_{k=1,\ldots,N}\{\pi_{k,N}^{-1}\} \le c_0 N^{\alpha} \quad \text{for all } N = 1, 2, \ldots \qquad (8)$$

Similarly, $\pi_{k,N}^{-1} - 1 = O(N^{\alpha})$ because for all $0 \le \alpha < 1$ there exist constants $0 < d_1 \le d_0 < \infty$ such that

$$0 < d_1 N^{\alpha} \le \max_{k=1,\ldots,N}\{\pi_{k,N}^{-1} - 1\} \le d_0 N^{\alpha} \quad \text{for all } N = 1, 2, \ldots \qquad (9)$$

Furthermore, $O(N^{\alpha})\,O(N^{\gamma}) = O(N^{\alpha+\gamma})$ because for all $\alpha \ge 0$ and $\gamma \ge 0$, if $d$ and $g$ denote quantities of orders $O(N^{\alpha})$ and $O(N^{\gamma})$ respectively, the inequalities $0 < d_1 \le d\,N^{-\alpha} \le d_0$ and $0 < g_1 \le g\,N^{-\gamma} \le g_0$ result in the following one:

$$0 < e_1 \le d_1 g_1 \le d\,g\,N^{-(\alpha+\gamma)} \le d_0 g_0 \le e_0. \qquad (10)$$

Finally, $O(N^{\alpha}) / O(N^{\gamma}) = O(N^{\alpha-\gamma})$ because for all $\alpha \ge 0$ and $\gamma \ge 0$ the same inequalities result in the following one:

$$0 < l_1 \le \frac{d_1}{g_0} \le \frac{d}{g}\,N^{-(\alpha-\gamma)} \le \frac{d_0}{g_1} \le l_0. \qquad (11)$$

Theorem 2. Let $\pi_{k,N} = O(N^{-\alpha})$ for all $0 \le \alpha < 1$, and let $0 < v_0 \le v_{2,N} \le v_2$ and $b_{3,N} \le b_3 < \infty$ for $N = 1, 2, \ldots$. When $N \to \infty$, then $T_N \xrightarrow{d} T \sim N(0,1)$. When additionally $v_{4,N} \le v_4 < \infty$, then $\hat{T}_N \xrightarrow{d} T \sim N(0,1)$.

Proof. On the basis of Theorem 1, it is sufficient to assume that $\delta = 1$. The Horvitz-Thompson statistic can be rewritten in the following way:

$$y_{HTS_N} = \sum_{k=1}^{N} Z_{k,N}, \qquad \text{where } Z_{k,N} = \frac{y_{k,N} S_{k,N}}{\pi_{k,N}}, \; k = 1, \ldots, N.$$

On the basis of the expression (1), we have:

$$P(Z_{k,N} = z_{k,N}) = \begin{cases} \pi_{k,N} & \text{if } z_{k,N} = y_{k,N}/\pi_{k,N}, \\ 1 - \pi_{k,N} & \text{if } z_{k,N} = 0, \end{cases} \qquad (12)$$

and

$$E(Z_{k,N}) = y_{k,N}, \qquad E(Z_{k,N}^2) = \frac{y_{k,N}^2}{\pi_{k,N}}, \qquad V(Z_{k,N}) = y_{k,N}^2\left(\frac{1}{\pi_{k,N}} - 1\right), \qquad k = 1, \ldots, N.$$

Moreover,

$$E\left|Z_{k,N} - E(Z_{k,N})\right|^3 = |y_{k,N}|^3\, E\left|\frac{S_{k,N}}{\pi_{k,N}} - 1\right|^3 = |y_{k,N}|^3\left(\frac{(1-\pi_{k,N})^3}{\pi_{k,N}^2} + (1-\pi_{k,N})\right) = |y_{k,N}|^3 \frac{(1-\pi_{k,N})\left((1-\pi_{k,N})^2 + \pi_{k,N}^2\right)}{\pi_{k,N}^2} \le |y_{k,N}|^3 \frac{1-\pi_{k,N}}{\pi_{k,N}^2}.$$

This and the expression (5) for $\delta = 1$ lead to the following:

$$B_N = \sum_{k=1}^{N} |y_{k,N}|^3 \frac{(1-\pi_{k,N})\left((1-\pi_{k,N})^2 + \pi_{k,N}^2\right)}{\pi_{k,N}^2}, \qquad C_N = V(y_{HTS_N}) = \sum_{k=1}^{N} y_{k,N}^2 \frac{1-\pi_{k,N}}{\pi_{k,N}}.$$

On the basis of the expressions (7)-(11) we have:

$$\beta_N = \frac{B_N}{C_N^{3/2}} = \frac{\sum_{k=1}^{N} |y_{k,N}|^3 (1-\pi_{k,N})\left((1-\pi_{k,N})^2 + \pi_{k,N}^2\right)/\pi_{k,N}^2}{\left(\sum_{k=1}^{N} y_{k,N}^2 (1-\pi_{k,N})/\pi_{k,N}\right)^{3/2}} \le \frac{\sum_{k=1}^{N} |y_{k,N}|^3 \left(\pi_{k,N}^{-1} - 1\right)\pi_{k,N}^{-1}}{\left(\sum_{k=1}^{N} y_{k,N}^2 \left(\pi_{k,N}^{-1} - 1\right)\right)^{3/2}}$$

$$= \frac{O(N^{\alpha})\,O(N^{\alpha}) \sum_{k=1}^{N} |y_{k,N}|^3}{\left(O(N^{\alpha}) \sum_{k=1}^{N} y_{k,N}^2\right)^{3/2}} = \frac{O(N^{1+2\alpha})\, b_{3,N}}{O\!\left(N^{3(1+\alpha)/2}\right) v_{2,N}^{3/2}} = O\!\left(N^{(\alpha-1)/2}\right) \frac{b_{3,N}}{v_{2,N}^{3/2}}.$$
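As a numerical check of this bound, the sketch below computes $\beta_N$ from the exact third central moments for an illustrative population (an assumption of this example, with $\alpha = 0$, so the bound predicts a decay of order $N^{-1/2}$).

```python
import numpy as np

def lyapunov_ratio(y, pi):
    """beta_N = B_N / C_N^{3/2} for Z_{k,N} = y_{k,N} S_{k,N} / pi_{k,N} (delta = 1)."""
    # Exact third absolute central moment of Z_{k,N}, summed over the population.
    B = np.sum(np.abs(y)**3 * (1 - pi) * ((1 - pi)**2 + pi**2) / pi**2)
    # C_N equals the design variance V(y_HTS), eq. (2).
    C = np.sum(y**2 * (1 - pi) / pi)
    return B / C**1.5

rng = np.random.default_rng(2)
for N in (100, 1000, 10000, 100000):
    y = rng.gamma(2.0, 5.0, size=N)
    pi = rng.uniform(0.1, 0.6, size=N)      # alpha = 0: probabilities bounded away from 0 and 1
    print(N, lyapunov_ratio(y, pi))         # should shrink roughly like N^{-1/2}
```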


It is easy to see that $\beta_N \to 0$ when $N \to \infty$ and $0 \le \alpha < 1$. This and Theorem 1 lead to the conclusion that $T_N \xrightarrow{d} T \sim N(0,1)$.

In order to prove the second part of the theorem, we first show that

$$R_N = \frac{\hat{V}_S(y_{HTS_N})}{V(y_{HTS_N})}$$

converges in probability to 1. The expression (3) leads to the variance of the variance estimator of the Horvitz-Thompson statistic:

$$V\!\left(\hat{V}_S(y_{HTS_N})\right) = \sum_{k=1}^{N} \left(y_{k,N}^2 \frac{1-\pi_{k,N}}{\pi_{k,N}^2}\right)^2 V(S_{k,N}) = \sum_{k \in U_N} y_{k,N}^4 \frac{(1-\pi_{k,N})^3}{\pi_{k,N}^3} = \sum_{k \in U_N} y_{k,N}^4 \left(\frac{1}{\pi_{k,N}} - 1\right)^3 \le O(N^{3\alpha}) \sum_{k \in U_N} y_{k,N}^4 = O(N^{3\alpha+1})\, v_{4,N}.$$

Hence, on the basis of the expression (2) we have

$$V(R_N) = \frac{V\!\left(\hat{V}_S(y_{HTS_N})\right)}{\left(V(y_{HTS_N})\right)^2} = \frac{\sum_{k \in U_N} y_{k,N}^4 \left(\pi_{k,N}^{-1} - 1\right)^3}{\left(\sum_{k \in U_N} y_{k,N}^2 \left(\pi_{k,N}^{-1} - 1\right)\right)^2} \le \frac{O(N^{3\alpha}) \sum_{k \in U_N} y_{k,N}^4}{\left(O(N^{\alpha}) \sum_{k \in U_N} y_{k,N}^2\right)^2} = \frac{O(N^{3\alpha+1})\, v_{4,N}}{O\!\left(N^{2(\alpha+1)}\right) v_{2,N}^2} = O(N^{\alpha-1}) \frac{v_{4,N}}{v_{2,N}^2}.$$

Hence, $V(R_N) = O(N^{\alpha-1}) \to 0$ when $N \to \infty$ and $0 \le \alpha < 1$. So, this and the well-known Tchebyshev inequality lead to the conclusion that $R_N = \hat{V}_S(y_{HTS_N}) / V(y_{HTS_N})$ converges in probability to 1 (in short: $R_N \xrightarrow{p} 1$) if $v_0 > 0$, $v_4 < \infty$ and $N \to \infty$. Let us note that $\hat{T}_N = T_N / \sqrt{R_N}$.


Hence, when $N \to \infty$, then $T_N \xrightarrow{d} T \sim N(0,1)$ and $R_N \xrightarrow{p} 1$. So, this and the well-known Slutsky lemma, see e.g. Van der Vaart [2007], let us conclude that $\hat{T}_N \xrightarrow{d} T \sim N(0,1)$. This completes the proof of Theorem 2.
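The conclusion of Theorem 2 can also be illustrated empirically. The sketch below (an arbitrary population with bounded moments and $\alpha = 0$ inclusion probabilities, both assumptions of this example) repeatedly draws Poisson samples and standardizes the HT statistic as in (6); the empirical means and standard deviations of $T_N$ and $\hat{T}_N$ should be close to 0 and 1.

```python
import numpy as np

rng = np.random.default_rng(1)

N = 2000
y = rng.gamma(2.0, 5.0, size=N)          # bounded moments: v_{2,N}, b_{3,N}, v_{4,N} stay stable
pi = rng.uniform(0.1, 0.6, size=N)       # alpha = 0: probabilities bounded away from 0 and 1

y_total = y.sum()
V = np.sum(y**2 * (1 - pi) / pi)         # exact design variance (2)

T, T_hat = [], []
for _ in range(5000):
    S = rng.binomial(1, pi)              # one Poisson sample
    y_hts = np.sum(y * S / pi)
    V_hat = np.sum(y**2 * S * (1 - pi) / pi**2)
    T.append((y_hts - y_total) / np.sqrt(V))
    T_hat.append((y_hts - y_total) / np.sqrt(V_hat))

# Both standardized statistics should be approximately N(0,1).
print(np.mean(T), np.std(T))
print(np.mean(T_hat), np.std(T_hat))
```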

2. Applications

The Poisson sampling design is frequently used to model non-response. In this case, $\pi_{k,N}$ is the probability that the $k$-th population element will respond. The Poisson sampling design can also be treated as a model of Internet surveys; in this case, $\pi_{k,N}$ is the probability that the $k$-th Internet user will respond. Moreover, the Poisson sampling design can be considered in audit sampling.

Let us note that, in the cases mentioned, the probabilities $\pi_{k,N}$, $k = 1, \ldots, N$, $N = 2, \ldots$ are usually defined as follows:

$$\pi_{k,N} = \frac{n\, x_{k,N}}{\sum_{i=1}^{N} x_{i,N}},$$

where $n$ is the expected sample size and $x_{k,N}$ is the value of a positive auxiliary variable $x$ observed in the whole population. Let us assume that $0 < a \le x_{k,N} \le b$ and $n = wN$ for all $k = 1, \ldots, N$, $N = 2, \ldots$, where $0 < w < a/b \le 1$. So, in this case, the first assumption of Theorem 2 is fulfilled (with $\alpha = 0$), because

$$0 < a_1 = \frac{wa}{b} = \frac{na}{bN} \le \pi_{k,N} \le \frac{nb}{aN} = \frac{wb}{a} = a_0 < 1.$$
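A minimal sketch of this construction follows; the auxiliary values, the bounds $a$ and $b$, and the fraction $w$ are illustrative assumptions.

```python
import numpy as np

def poisson_inclusion_probs(x, n):
    """Inclusion probabilities pi_k = n * x_k / sum(x) for a positive auxiliary variable x
    and expected sample size n; requires n * max(x) / sum(x) < 1."""
    pi = n * x / x.sum()
    if pi.max() >= 1.0:
        raise ValueError("expected sample size too large for this auxiliary variable")
    return pi

rng = np.random.default_rng(3)
N = 5000
x = rng.uniform(1.0, 4.0, size=N)         # a <= x_k <= b with a = 1, b = 4
w = 0.2                                   # n = w * N with 0 < w < a/b
pi = poisson_inclusion_probs(x, n=w * N)

print(pi.min(), pi.max(), pi.sum())       # bounded away from 0 and 1; the sum equals n
```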

Theorem 2 lets us construct the confidence interval for the population total estimated by means of the Poisson-Horvitz-Thompson strategy. Let $\gamma$ be the confidence level and let $u_{\gamma}$ be the quantile such that $\Phi(u_{\gamma}) = (1+\gamma)/2$, where $\Phi(u)$ is the distribution function of the standard normal variable. When $N$ is sufficiently large, the confidence interval for the population total is determined by the expression:

$$P\!\left(y_{HTS_N} - u_{\gamma}\sqrt{\hat{V}_S(y_{HTS_N})} < \tilde{y} < y_{HTS_N} + u_{\gamma}\sqrt{\hat{V}_S(y_{HTS_N})}\right) = \gamma.$$
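For illustration, a sketch of the resulting interval computed from one observed Poisson sample; the function name and its inputs (the values and inclusion probabilities of the sampled units only) are assumptions of this example.

```python
import numpy as np
from scipy.stats import norm

def ht_confidence_interval(y_obs, pi_obs, gamma=0.95):
    """Confidence interval for the population total from one observed Poisson sample."""
    y_hts = np.sum(y_obs / pi_obs)                         # Horvitz-Thompson estimate
    v_hat = np.sum(y_obs**2 * (1 - pi_obs) / pi_obs**2)    # variance estimator (3)
    u = norm.ppf((1 + gamma) / 2)                          # quantile with Phi(u) = (1+gamma)/2
    half = u * np.sqrt(v_hat)
    return y_hts - half, y_hts + half
```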

It is also possible to test a hypothesis on the population total. The hypothesis $H_0 : \tilde{y} = \tilde{y}_0$ can be tested on the basis of the statistic defined by the expression (6) when $N$ is sufficiently large.
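Analogously, a hedged sketch of such a test, treating the statistic $\hat{T}_N$ from (6) as approximately standard normal (the function and its inputs are again illustrative assumptions):

```python
import numpy as np
from scipy.stats import norm

def ht_test_total(y_obs, pi_obs, total_0):
    """Two-sided test of H0: population total equals total_0, based on the statistic (6)."""
    y_hts = np.sum(y_obs / pi_obs)
    v_hat = np.sum(y_obs**2 * (1 - pi_obs) / pi_obs**2)
    t_hat = (y_hts - total_0) / np.sqrt(v_hat)
    p_value = 2 * (1 - norm.cdf(abs(t_hat)))               # approximate p-value for large N
    return t_hat, p_value
```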

Finally, let us note that if Lapunov's condition is fulfilled, then the Lindeberg condition is fulfilled too [see, e.g., Billingsley 2009]. Hence, if the assumptions of Theorem 2 above are fulfilled, the assumptions of Hájek's theorem are fulfilled as well. Moreover, it seems that in our case the assumptions of Theorem 2 are verified more easily than the Lindeberg condition.

Acknowledgements

The research was supported by the grant number N N111 434137 from the Ministry of Science and Higher Education.

Literature

Billingsley P. (2009): Prawdopodobieństwo i miara (Probability and Measure). Wydawnictwo Naukowe PWN, Warszawa.

Fisz M. (1963): Probability Theory and Mathematical Statistics. Wiley and Sons, New York.

Hájek J. (1964): Asymptotic Theory of Rejective Sampling with Varying Probabilities from a Finite Population. “The Annals of Mathematical Statistics”, Vol. 35, No. 4.

Horvitz D.G., Thompson D.J. (1952): A Generalization of Sampling without Replacement from a Finite Universe. “Journal of the American Statistical Association”, Vol. 47.

Jakubowski J., Sztencel R. (2004): Wstęp do teorii prawdopodobieństwa (Introduction to Probability Theory). SCRIPT, Warszawa.

Lapunov A.M. (1901): Nouvelle forme du théorème sur la limite de probabilité. “Mém. Acad. Sci. St. Pétersbourg”, No. 12.

Tillé Y. (2006): Sampling Algorithms. Springer, New York.

Van der Vaart A.W. (2007): Asymptotic Statistics. Cambridge University Press, Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo.


ON THE LIMIT DISTRIBUTION OF THE HORVITZ-THOMPSON STATISTIC UNDER THE POISSON SAMPLING DESIGN

Summary

In the paper, the limit probability distribution of the well-known Horvitz-Thompson (HT) statistic is derived on the basis of the Lapunov central limit theorem. It turns out that if the probabilities of selecting particular population elements for the sample, determined by the Poisson sampling design, fulfil certain assumptions and the population size grows without bound, then the distribution of the standardized form of the HT statistic tends to the standard normal distribution. The same result is obtained, under an additional assumption imposed on the selection probabilities, when in the standardized form of the HT statistic its standard deviation is replaced by the square root of the unbiased estimator of its variance.

The results of the paper find applications, e.g., in certain types of sample surveys, in particular Internet surveys, which make use of statistical inference, i.e., interval estimation or testing of statistical hypotheses.
