On Some Composite Estimator of the Population Mean

ACTA UNIVERSITATIS LODZIENSIS, FOLIA OECONOMICA 216, 2008

Wojciech Gamrot


ABSTRACT. This paper proposes an estimator of the finite population mean under unit nonresponse. It is constructed as a combination of the well-known regression estimator derived from a linear model and a reweighting-type estimator based on a logistic regression model. The combination weights depend on the goodness of fit of the respective models. Hence, the estimator whose underlying model better describes the observed sample data dominates in the combination. Some Monte Carlo simulation results revealing its properties are presented.

Key words: nonresponse, regression, weighting adjustment.

I. INTRODUCTION

Consider a finite and fixed population $U$ of size $N$. The mean value $\bar{Y} = N^{-1}\sum_{i \in U} y_i$ of some characteristic $Y$ taking values $y_1, \ldots, y_N$ is to be estimated. A sample $s$ of size $n$ is drawn from $U$ according to a sampling design $p(s)$, which determines the first-order inclusion probabilities $\pi_i$ for $i \in U$. Assume stochastic nonresponse that does not depend on the sample. Hence, an individual response probability $p_i$ may be associated with each unit, and the sample $s$ is randomly divided into subsets $s_1$ and $s_2$, containing responding and nonresponding units, respectively. Under nonresponse, the well-known Horvitz-Thompson estimator of the population mean is biased when computed solely from the responding units. Bethlehem (1988) considers the following modification of this estimator:

$$\bar{y}_{MHT} = \frac{\sum_{i \in s_1} y_i / \pi_i}{\sum_{i \in s_1} 1 / \pi_i} \qquad (1)$$

Author affiliation: Wojciech Gamrot, Ph.D., Department of Statistics, University of Economics, Katowice.


Bethlehem shows that the bias of this estimator is approximately

$$B(\bar{y}_{MHT}) \approx \frac{C_U(y_i, p_i)}{\bar{P}},$$

where $\bar{P} = N^{-1}\sum_{i \in U} p_i$ and $C_U(y_i, p_i) = N^{-1}\sum_{i \in U}(y_i - \bar{Y})(p_i - \bar{P})$. Hence, the weaker the covariance between $y_i$ and $p_i$ (in absolute value), the smaller the bias. This estimator will be denoted by the symbol MHT.
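A minimal Python sketch of estimator (1) and of Bethlehem's bias approximation follows. It assumes that respondent values, their inclusion probabilities and, for the bias formula, the population-level response probabilities are available as plain lists; the names `mht_mean` and `bethlehem_bias` are illustrative and do not come from the paper.

```python
def mht_mean(y_resp, pi_resp):
    """Estimator (1): Horvitz-Thompson sums over respondents s1, self-normalized."""
    num = sum(y / pi for y, pi in zip(y_resp, pi_resp))
    den = sum(1.0 / pi for pi in pi_resp)
    return num / den


def bethlehem_bias(y_pop, p_pop):
    """Approximate bias C_U(y, p) / P_bar of MHT (needs p_i for every population unit)."""
    N = len(y_pop)
    y_bar = sum(y_pop) / N
    p_bar = sum(p_pop) / N
    cov = sum((y - y_bar) * (p - p_bar) for y, p in zip(y_pop, p_pop)) / N
    return cov / p_bar


if __name__ == "__main__":
    # Toy population in which units with larger y are less likely to respond.
    y = [1.0, 2.0, 3.0, 4.0]
    p = [0.8, 0.6, 0.4, 0.2]
    print(bethlehem_bias(y, p))  # approximately -0.5: negative covariance, negative bias
```

In the toy population the response-weighted mean $\sum y_i p_i / \sum p_i$ equals 2.0 while $\bar{Y} = 2.5$, which is the same shortfall of -0.5 that the covariance formula returns. With equal inclusion probabilities, (1) reduces to the plain respondent mean.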

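Consider the superpopulation model $\xi$, stating that the values $y_1, \ldots, y_N$ are realizations of independent random variables $Y_1, \ldots, Y_N$ satisfying

$$E_\xi(Y_i) = \boldsymbol{\beta}'\mathbf{x}_i, \qquad V_\xi(Y_i) = \sigma^2 \qquad (2)$$

for $i = 1, \ldots, N$. The vector $\boldsymbol{\beta} = [\beta_1, \ldots, \beta_k]'$ and the scalar $\sigma^2$ are model parameters, while $\mathbf{x}_i = [x_{1i}, \ldots, x_{ki}]'$ denotes the vector of values of the auxiliary characteristics $X_1, \ldots, X_k$ associated with the $i$-th unit. Denoting $\mathbf{X} = [\mathbf{x}_1, \ldots, \mathbf{x}_N]'$ and $\mathbf{y} = [y_1, \ldots, y_N]'$ and applying ordinary least squares, we obtain the best linear unbiased (with respect to $\xi$) estimator of $\boldsymbol{\beta}$:

$$\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} \qquad (3)$$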

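The quantity $\mathbf{b}$ may be estimated from the responding units by the statistic

$$\hat{\mathbf{b}} = \left(\sum_{i \in s_1} \frac{\mathbf{x}_i \mathbf{x}_i'}{\pi_i}\right)^{-1} \sum_{i \in s_1} \frac{\mathbf{x}_i y_i}{\pi_i} \qquad (4)$$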
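The statistic (4) can be sketched in plain Python as below. The sketch assumes each $\mathbf{x}_i$ is given as a list (with a leading 1 if the model should contain an intercept) and that the weighted cross-product matrix is nonsingular; the small Gauss-Jordan solver and the names `solve` and `b_hat` are hypothetical helpers, not part of the paper.

```python
def solve(a, rhs):
    """Solve the square system a z = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    m = [list(row) + [rhs[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def b_hat(x_resp, y_resp, pi_resp):
    """Statistic (4): (sum x x'/pi)^(-1) (sum x y/pi), both sums over respondents s1."""
    k = len(x_resp[0])
    a = [[0.0] * k for _ in range(k)]
    rhs = [0.0] * k
    for x, y, pi in zip(x_resp, y_resp, pi_resp):
        for r in range(k):
            rhs[r] += x[r] * y / pi
            for c in range(k):
                a[r][c] += x[r] * x[c] / pi
    return solve(a, rhs)
```

With all $\pi_i$ equal the weights cancel, and $\hat{\mathbf{b}}$ is simply the ordinary least squares fit computed on the respondents.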

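Consider the regression estimator of the population mean

$$\bar{y}_{REG} = \bar{y}_{MHT} + \hat{\mathbf{b}}'(\bar{\mathbf{x}}_{HT} - \bar{\mathbf{x}}_{MHT}) \qquad (5)$$

where $\bar{\mathbf{x}}_{HT} = N^{-1}\sum_{i \in s} \mathbf{x}_i / \pi_i$ and $\bar{\mathbf{x}}_{MHT} = \sum_{i \in s_1}(\mathbf{x}_i / \pi_i) / \sum_{i \in s_1}(1 / \pi_i)$. It is more accurate than $\bar{y}_{MHT}$ when model (2) accurately reflects reality. It will be denoted further by the symbol REG.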
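A sketch of the regression estimator (5) follows, assuming the auxiliary vectors are known for every sampled unit (respondents and nonrespondents alike) and that $\hat{\mathbf{b}}$ has already been computed from (4); `reg_mean` is a hypothetical name.

```python
def reg_mean(y_resp, x_resp, pi_resp, x_samp, pi_samp, N, b):
    """Regression estimator (5): y_MHT + b'(x_HT - x_MHT).

    y_resp, x_resp, pi_resp: respondents s1; x_samp, pi_samp: the whole sample s;
    b: coefficient estimate from (4).
    """
    k = len(b)
    w_resp = [1.0 / pi for pi in pi_resp]
    sw = sum(w_resp)
    y_mht = sum(w * y for w, y in zip(w_resp, y_resp)) / sw
    x_mht = [sum(w * x[j] for w, x in zip(w_resp, x_resp)) / sw for j in range(k)]
    x_ht = [sum(x[j] / pi for x, pi in zip(x_samp, pi_samp)) / N for j in range(k)]
    return y_mht + sum(b[j] * (x_ht[j] - x_mht[j]) for j in range(k))


if __name__ == "__main__":
    # N = 10, four sampled units with x = 1..4, only the first two respond; y = 2 + 3x.
    x_s = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
    print(reg_mean([5.0, 8.0], x_s[:2], [0.4, 0.4], x_s, [0.4] * 4, 10, [2.0, 3.0]))
    # approximately 9.5, i.e. b'x_HT = 2 + 3 * 2.5
```

When $y_i$ is exactly linear in $\mathbf{x}_i$, the correction term removes the nonresponse distortion completely and the estimator returns $\hat{\mathbf{b}}'\bar{\mathbf{x}}_{HT}$.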

Another approach to constructing a nonresponse-corrected estimator of the population mean relies on assumed dependencies between auxiliary variables and response probabilities. These relations are represented by parametric models, such as the logistic model (see Rizzo et al. 1996, Ekholm and Laaksonen 1991), stating that units respond independently with probabilities

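$$p_i = \frac{1}{1 + e^{\boldsymbol{\lambda}'\mathbf{x}_i}} \qquad (6)$$

for $i \in U$, where $\boldsymbol{\lambda}$ is a parameter vector. Its maximum likelihood estimate $\hat{\boldsymbol{\lambda}}$ may be obtained using iterative methods, considered e.g. by Minka (2001). Consequently, by replacing the unknown $p_i$'s with the estimates $\hat{p}_i = (1 + e^{\hat{\boldsymbol{\lambda}}'\mathbf{x}_i})^{-1}$, we obtain the following weighting-adjustment estimator of the mean:

$$\bar{y}_{RHO} = \frac{1}{N}\sum_{i \in s_1} \frac{y_i}{\pi_i \hat{p}_i} \qquad (7)$$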

This estimator should be more accurate than $\bar{y}_{MHT}$ when model (6) accurately describes the behavior of the $p_i$'s. In the following study it will be denoted by the symbol RHO.
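A sketch of the response-model fit and of estimator (7) is given below. It uses plain Newton-Raphson on the unweighted sample log-likelihood, one iterative approach of the kind discussed by Minka (2001), with the parametrization $p_i = (1 + e^{\boldsymbol{\lambda}'\mathbf{x}_i})^{-1}$. The unweighted likelihood, the zero starting point, the tolerance and the names `fit_response_model` and `rho_mean` are assumptions of this sketch.

```python
import math


def solve(a, rhs):
    """Solve the square system a z = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    m = [list(row) + [rhs[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_response_model(x_samp, responded, iters=50, tol=1e-10):
    """Newton-Raphson ML fit of p_i = 1 / (1 + exp(lam'x_i)).

    x_samp: auxiliary vectors of ALL sampled units; responded: 1 for s1, 0 for s2.
    """
    k = len(x_samp[0])
    lam = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k                      # d lnL / d lam = sum (p_i - r_i) x_i
        info = [[0.0] * k for _ in range(k)]  # -Hessian = sum p_i (1 - p_i) x_i x_i'
        for x, r in zip(x_samp, responded):
            p = 1.0 / (1.0 + math.exp(sum(l * xj for l, xj in zip(lam, x))))
            for a in range(k):
                grad[a] += (p - r) * x[a]
                for c in range(k):
                    info[a][c] += p * (1.0 - p) * x[a] * x[c]
        step = solve(info, grad)              # Newton step: info^(-1) grad
        lam = [l + s for l, s in zip(lam, step)]
        if max(abs(s) for s in step) < tol:
            break
    return lam


def rho_mean(y_resp, x_resp, pi_resp, lam, N):
    """Estimator (7): N^(-1) * sum over s1 of y_i / (pi_i * p_hat_i)."""
    total = 0.0
    for y, x, pi in zip(y_resp, x_resp, pi_resp):
        p_hat = 1.0 / (1.0 + math.exp(sum(l * xj for l, xj in zip(lam, x))))
        total += y / (pi * p_hat)
    return total / N
```

With an intercept-only model the fitted $\hat{p}_i$ equal the observed response rate (three respondents out of four give $\hat{\lambda} = \ln(1/3)$ and $\hat{p}_i = 0.75$), and under simple random sampling (7) then reduces to the plain respondent mean.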

II. COMPOSITE ESTIMATOR

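The attractiveness of both the regression and the reweighting-type estimator depends on the ability of the underlying models to describe the behavior of $y_i$ and $p_i$, respectively. One may attempt to measure this ability. The goodness of fit of the regression model (2) may be measured by means of the coefficient of determination computed on the respondent subset:

$$R^2 = \frac{\sum_{i \in s_1}\left(\hat{\mathbf{b}}'\mathbf{x}_i - n_1^{-1}\sum_{j \in s_1}\hat{\mathbf{b}}'\mathbf{x}_j\right)^2}{\sum_{i \in s_1}\left(y_i - n_1^{-1}\sum_{j \in s_1} y_j\right)^2} \qquad (8)$$

where $n_1$ denotes the number of responding units.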
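Assuming (8) is the usual ratio of the explained to the total sum of squares over the respondents, with fitted values $\hat{\mathbf{b}}'\mathbf{x}_i$, it can be computed as follows; `r_squared` is a hypothetical name and $\hat{\mathbf{b}}$ is taken as given.

```python
def r_squared(y_resp, x_resp, b):
    """Coefficient of determination on the respondent subset s1.

    Sum of squares of the fitted values b'x_i around their mean, divided by
    the sum of squares of y_i around the respondent mean.
    """
    fitted = [sum(bj * xj for bj, xj in zip(b, x)) for x in x_resp]
    n1 = len(y_resp)
    y_bar = sum(y_resp) / n1
    f_bar = sum(fitted) / n1
    ess = sum((f - f_bar) ** 2 for f in fitted)
    tss = sum((y - y_bar) ** 2 for y in y_resp)
    return ess / tss
```

For an ordinary least squares fit with an intercept this equals the squared correlation between $y_i$ and the fitted values; for example $y = (1, 3, 2, 4)$ regressed on $x = (0, 1, 2, 3)$ gives $\hat{\mathbf{b}} = [1.3, 0.8]'$ and $R^2 = 0.64$.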

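Moreover, the goodness of fit of the logistic model may be measured by the log-likelihood function $\ln L = \sum_{i \in s_1} \ln \hat{p}_i + \sum_{i \in s_2} \ln(1 - \hat{p}_i)$, or by the standardized quantity

$$R_L = 2\left(\sqrt[n]{L} - 0.5\right) \qquad (9)$$

which takes values in the interval $[0, 1]$: $L \le 1$, and when model (6) contains an intercept the maximized likelihood is at least that of the constant-probability model, whose $n$-th root is never below 0.5. Let us now consider the composite estimator: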

$$\bar{y}_{COM} = \alpha_{REG}\,\bar{y}_{REG} + \alpha_{RHO}\,\bar{y}_{RHO} \qquad (10)$$

where $\alpha_{REG} = R^2/(R^2 + R_L)$ and $\alpha_{RHO} = R_L/(R^2 + R_L)$ are weights proportional to the goodness of fit of the respective models. The composite estimator should behave like the regression estimator when the auxiliary information is more suitable for the linear model, and like the weighting-adjustment estimator when it is more appropriate for the logistic model. Hence, the composite estimator should inherit the virtues of both. In the following paragraphs it will be denoted by the symbol COM.
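The index (9) and the composite estimator (10) can be sketched as follows, computing $\sqrt[n]{L}$ as $\exp(\ln L / n)$ so that $L$ itself never underflows for large $n$. The function names, and raising an error when both fit measures are zero (the weights in (10) are then undefined), are choices of this sketch rather than of the paper.

```python
import math


def logistic_fit_index(p_hat, responded):
    """Index (9): R_L = 2 * (L ** (1/n) - 0.5), with L ** (1/n) computed as exp(ln L / n).

    p_hat: fitted response probabilities of all n sampled units;
    responded: 1 for respondents (s1), 0 for nonrespondents (s2).
    """
    n = len(p_hat)
    log_lik = sum(math.log(p) if r else math.log(1.0 - p)
                  for p, r in zip(p_hat, responded))
    return 2.0 * (math.exp(log_lik / n) - 0.5)


def composite_mean(y_reg, y_rho, r2, r_l):
    """Composite estimator (10) with weights R^2/(R^2 + R_L) and R_L/(R^2 + R_L)."""
    total = r2 + r_l
    if total <= 0.0:
        # Sketch choice: with both fit measures at zero the weights are undefined.
        raise ValueError("both goodness-of-fit measures are zero")
    return (r2 * y_reg + r_l * y_rho) / total
```

When both fit measures are nonnegative, the weights are nonnegative and sum to one, so $\bar{y}_{COM}$ always lies between $\bar{y}_{REG}$ and $\bar{y}_{RHO}$. A logistic fit that predicts no better than a coin toss ($\hat{p}_i = 0.5$ for all units) gives $R_L = 0$, and COM then coincides with REG.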

III. SIMULATION RESULTS

A simulation study was carried out to examine the properties of four estimators: MHT, REG, RHO and COM. The experiments used a pseudo-random number generator for the multivariate Gaussian distribution. Four variables $Y, X_1, X_2, X_3$ were generated for 10000 population units, with $Y$ being the variable under study, $X_1$ the auxiliary variable for the regression estimator, $X_2$ the auxiliary variable for the logistic model, and $X_3$ a variable unknown to the sampler that determines the individual response probabilities according to the univariate logistic model $p_i = (1 + e^{x_{3i}})^{-1}$. Simple random samples were repeatedly drawn without replacement from the population. The survey behavior of each unit was simulated independently, with its response probability equal to $p_i$. All estimators were computed from the resulting incomplete data, and their empirical distributions were examined. Three simulation experiments were carried out, with the correlation matrices of the variables equal to, respectively:

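$$R_1 = \begin{bmatrix} 1 & 0.7 & 0.75 & 0.75 \\ 0.7 & 1 & 0.7 & 0.7 \\ 0.75 & 0.7 & 1 & 0.75 \\ 0.75 & 0.7 & 0.75 & 1 \end{bmatrix}, \quad R_2 = \begin{bmatrix} 1 & 0 & 0.9 & 0.9 \\ 0 & 1 & 0 & 0 \\ 0.9 & 0 & 1 & 0.9 \\ 0.9 & 0 & 0.9 & 1 \end{bmatrix}, \quad R_3 = \begin{bmatrix} 1 & 0.9 & 0 & 0.9 \\ 0.9 & 1 & 0 & 0.9 \\ 0 & 0 & 1 & 0 \\ 0.9 & 0.9 & 0 & 1 \end{bmatrix}$$

with rows and columns ordered as $Y, X_1, X_2, X_3$.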

All standard deviations were set to one. The mean vector was always equal to $\mu = [10, 10, 10, 0]$. Matrix $R_1$ represents the situation in which the auxiliary data are suitable for both the REG and RHO estimators. With $R_2$ they are suitable only for RHO, and with $R_3$ only for REG. All simulations were carried out for $n = 50, 100, \ldots, 1000$. The efficiency of the estimators relative to the MHT estimator is shown in graphs 1-3. In all three experiments it is computed for any estimator $T$ as

$$rMSE(T) = \frac{MSE(T)}{MSE(\bar{y}_{MHT})}.$$
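A reduced version of the data-generating mechanism described above can be sketched as follows. Correlated Gaussian variables are produced through a Cholesky factor of the correlation matrix, response is simulated with $p_i = (1 + e^{x_{3i}})^{-1}$, and only the MHT estimator is evaluated, so that its Monte Carlo bias can be compared with Bethlehem's approximation. The seed, sample size, number of replications and helper names are arbitrary assumptions; the sketch does not reproduce the paper's figures.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L L' = a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def make_population(corr, mu, N, rng):
    """Units (v, q): v = (y, x1, x2, x3) ~ N(mu, corr), q = 1 / (1 + exp(x3))."""
    low = cholesky(corr)
    pop = []
    for _ in range(N):
        z = [rng.gauss(0.0, 1.0) for _ in mu]
        v = [m + sum(low[i][k] * z[k] for k in range(i + 1)) for i, m in enumerate(mu)]
        pop.append((v, 1.0 / (1.0 + math.exp(v[3]))))
    return pop


def mht_bias_experiment(pop, n, reps, rng):
    """Monte Carlo bias of MHT under simple random sampling without replacement.

    All pi_i = n/N, so MHT reduces to the respondent mean.
    Returns (empirical bias, Bethlehem approximation C_U(y, p) / P_bar).
    """
    N = len(pop)
    y = [v[0] for v, _ in pop]
    p = [q for _, q in pop]
    y_bar, p_bar = sum(y) / N, sum(p) / N
    approx = sum((a - y_bar) * (b - p_bar) for a, b in zip(y, p)) / N / p_bar
    estimates = []
    for _ in range(reps):
        sample = rng.sample(range(N), n)
        resp = [y[i] for i in sample if rng.random() < p[i]]
        if resp:  # skip the (practically impossible here) all-nonresponse case
            estimates.append(sum(resp) / len(resp))
    return sum(estimates) / len(estimates) - y_bar, approx


if __name__ == "__main__":
    R2 = [[1, 0, 0.9, 0.9], [0, 1, 0, 0], [0.9, 0, 1, 0.9], [0.9, 0, 0.9, 1]]
    rng = random.Random(2008)
    population = make_population(R2, [10, 10, 10, 0], 10000, rng)
    print(mht_bias_experiment(population, n=100, reps=500, rng=rng))
```

Because $Y$ is positively correlated with $X_3$ while $p_i$ decreases in $x_{3i}$, the covariance $C_U(y_i, p_i)$ is negative, so both numbers returned should be negative. Extending the loop with REG, RHO and COM and accumulating squared errors would give the relative efficiencies defined above.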

[Pic. 2. The relative efficiency as a function of sample size n for correlation matrix R2]

For the REG, RHO and COM estimators the relative efficiency rMSE decreases with growing sample size and then stabilizes for large values of $n$, with the notable exception of the REG estimator under $R_2$, for which it is approximately constant. The estimators REG and RHO are more accurate than MHT when their respective models fit the data well. The estimator COM is more accurate than MHT when at least one of these models fits the data well. One may therefore say that this estimator is more robust to model misspecification than REG and RHO. Moreover, for large sample sizes it usually has the lowest MSE, although its advantage over REG and RHO is modest.

[Pic. 4. The bias as a function of sample size n for correlation matrix R1]

The bias of all estimators is shown in graphs 4-6. All of them are negatively biased. The bias is constant or slowly diminishes with growing sample size $n$, stabilizing once $n$ is large enough. In absolute terms, the bias of the MHT estimator was the highest in most cases. The estimators REG and RHO provide substantial bias reduction when the auxiliary information is suitable and the respective models fit the data well; otherwise, their bias is very close to that of the MHT estimator. The bias of the COM estimator lies within the range spanned by the biases of REG, RHO and MHT, and usually does not differ much from the lowest observed bias. In all experiments it is substantially lower in absolute value than the bias of MHT, so the composite estimator provides substantial bias reduction when at least one model fits the data well.

[Pic. 6. The bias as a function of sample size n for correlation matrix R3]

CONCLUSIONS

All simulations were carried out assuming a strong dependence between the variable under study and the response probabilities, a situation that is highly unwelcome from the estimation viewpoint. Both the regression and the weighting-adjustment estimators make it possible to reduce the bias and improve accuracy, provided that the respective model fits the data well. The proposed composite estimator reduces the bias and improves accuracy when either of these two models fits well.

REFERENCES

Bethlehem J.G. (1988), Reduction of Nonresponse Bias Through Regression Estimation, Journal of Official Statistics, Vol. 4, No. 3, 251-260.

Ekholm A., Laaksonen S. (1991), Weighting via Response Modelling in the Finnish Household Budget Survey, Journal of Official Statistics, Vol. 7, No. 3, 325-338.

Minka T.P. (2001), Algorithms for Maximum Likelihood Logistic Regression, Technical Report, URL: http://www.stat.cmu.edu/tr/tr758/tr758.pdf.

Rizzo L., Kalton G., Brick J.M. (1996), A Comparison of Some Weighting Adjustment Methods for Panel Nonresponse, Survey Methodology, Vol. 22, No. 1, 43-53.


Wojciech Gamrot

ON SOME COMPOSITE ESTIMATOR OF THE POPULATION MEAN

The paper proposes a composite estimator of the mean in a finite population under nonresponse. It is a combination of a regression estimator based on a linear model and an estimator using data weighting based on a logistic model. The combination weights depend on measures of the goodness of fit of these models to the data. Results of simulations performed to examine its properties are presented.
