
ACTA UNIVERSITATIS LODZIENSIS

FOLIA OECONOMICA 175, 2004

Janusz Wywiał*, Tomasz Żądło**

ON SOME ROBUST AGAINST OUTLIERS PREDICTOR OF THE TOTAL VALUE IN SMALL DOMAIN

Abstract. The paper considers the problem of predicting the total value in a domain under a simple regression superpopulation model (with one auxiliary variable and no intercept). It addresses estimation of the regression function's parameter that is robust against outliers. The proposed robust estimator is the median of the gradients of all straight lines, each determined by the origin and one of the n points (x, y), where n is the sample size, y is the variable of interest and x is the auxiliary variable. This estimator is a simplified form of the estimator presented by H. Theil (1979). The mean square error of the robust predictor based on the robust estimator of the regression parameter is derived under asymptotic assumptions. The best linear unbiased (BLU) predictor under the considered superpopulation model is presented and its mean square error is derived. The accuracy of these predictors is compared under the assumption that the variables of interest are normally distributed.

Key words: small area statistics, model approach, robust estimation.

1. INTRODUCTION

In survey sampling, including small area statistics, two approaches are considered: the randomization approach and the model approach. There is also the problem of robust estimation, which is extremely important for the practice of sample surveys, especially those based on the model approach. The reason is that under the model approach the statistician must assume some superpopulation model and estimate its parameters. In this paper a robust predictor of the total value in a domain is proposed and compared with the BLU predictor under the superpopulation model presented below. Robustness is considered in the context of the presence of outliers.

* Prof., Department of Statistics, University of Economics, Katowice.
** MA, Department of Statistics, University of Economics, Katowice.


2. SUPERPOPULATION MODEL

The following considerations are based on a simple regression model assumed for the entire population. It is assumed that the values of the auxiliary variable are known for all elements of the population. With regard to the ξ distribution describing the superpopulation model, it is assumed that $Y_1, \ldots, Y_N$ are independent and

$$Y_i = \mu_i + \varepsilon_i, \qquad \mu_i = E_\xi(Y_i) = \beta x_i, \qquad E_\xi(\varepsilon_i) = 0,$$

$$\sigma_i^2 = D_\xi^2(Y_i) = D_\xi^2(\varepsilon_i) = \sigma^2 v(x_i),$$

where $\beta$ and $\sigma^2$ are unknown and $x_1, \ldots, x_N$ are known ($i = 1, \ldots, N$). In the following considerations it is additionally assumed that $v(x_i) = x_i^2$.

The considerations are conducted for any sampling design. It is assumed that the sample $s$ is drawn from the entire population by a sampling design $P(s)$ with first-order inclusion probabilities $\pi_i$, where $i = 1, \ldots, N$. For any sample $s$ of size $n$ drawn from the population $\Omega$ of size $N$, $\Omega = s \cup \bar{s}$, where $\bar{s}$ denotes the elements of the population that were not drawn into the sample. Let $s_d = s \cap \Omega_d$, where $\Omega_d$ denotes the d-th domain. The size of $s_d$ equals $n_d$ (a random variable) and the size of $\Omega_d$ equals $N_d$. The set of elements of the population that belong to the d-th domain can be written as $\Omega_d = s_d \cup \bar{s}_d$, where $\bar{s}_d$ denotes the elements of the d-th domain that were not drawn into the sample.
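The sample and domain notation reduces to elementary set operations. The Python sketch below uses a hypothetical population of ten units, a hypothetical domain and a hypothetical sample (none of these values come from the paper) to show how $\bar{s}$, $s_d$ and $\bar{s}_d$ are formed.

```python
# Hypothetical toy population Omega = {0, ..., 9}; the domain d holds units 0..3.
omega = set(range(10))
omega_d = {0, 1, 2, 3}        # the d-th domain, N_d = 4
s = {1, 3, 5, 7, 9}           # a sample drawn by some design P(s), n = 5

s_bar = omega - s             # population elements not drawn into the sample
s_d = s & omega_d             # sampled elements of the domain, n_d = len(s_d)
s_bar_d = omega_d - s_d       # non-sampled elements of the domain

# The two partitions used in the text.
assert omega == s | s_bar
assert omega_d == s_d | s_bar_d
print(len(s_d), sorted(s_bar_d))
```

Note that $n_d$ is random: a different sample $s$ would generally intersect the domain in a different number of units.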

In the following considerations the notation presented below will be used. We consider the gradients of all straight lines, each determined by the origin and one of the n sample points (x, y), where n is the sample size, y is the value of the variable of interest and x is the value of the auxiliary variable:

$$h_i = \frac{Y_i}{x_i}, \qquad i = 1, \ldots, n. \tag{1}$$

3. CONSTRUCTION OF PREDICTOR

Based on the assumed superpopulation model it is known that:

$$E_\xi(h_i) = \beta, \qquad D_\xi^2(h_i) = \sigma^2. \tag{2}$$
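Equation (2) says that dividing by $x_i$ removes the heteroscedasticity when $v(x_i) = x_i^2$: every slope has mean β and the same variance σ². A quick Monte Carlo check is sketched below in Python with NumPy. The normal errors, the values of β, σ and $x_i$, and the helper name `simulate_slopes` are illustrative assumptions, not part of the paper.

```python
import numpy as np

def simulate_slopes(x, beta, sigma, reps, rng):
    """Draw `reps` realisations of h_i = Y_i / x_i under Y_i = beta*x_i + eps_i,
    with eps_i ~ N(0, sigma^2 * x_i^2), i.e. v(x_i) = x_i^2.
    Normality is an assumption of this sketch only."""
    x = np.asarray(x, dtype=float)
    eps = rng.normal(0.0, sigma * x, size=(reps, x.size))  # error sd grows with x_i
    y = beta * x + eps
    return y / x  # slopes of the lines through the origin and (x_i, y_i)

rng = np.random.default_rng(0)
x = np.array([1.0, 5.0, 20.0, 100.0])
h = simulate_slopes(x, beta=2.0, sigma=0.5, reps=200_000, rng=rng)
print(h.mean(axis=0))  # every column should be close to beta = 2.0
print(h.var(axis=0))   # every column should be close to sigma^2 = 0.25, whatever x_i is
```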


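Let us discuss two predictors of the total value in the domain:

$$T_{1s_d} = \sum_{i \in s_d} Y_i + b_{1s} \sum_{i \in \bar{s}_d} x_i, \tag{3}$$

where:

$$b_{1s} = \frac{1}{n} \sum_{i \in s} h_i = \frac{1}{n} \sum_{i \in s} \frac{Y_i}{x_i},$$

and

$$T_{2s_d} = \sum_{i \in s_d} Y_i + b_{2s} \sum_{i \in \bar{s}_d} x_i, \tag{4}$$

where:

$$b_{2s} = \mathrm{Me}\{h_i\}. \tag{5}$$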
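A minimal Python sketch of predictors (3) and (4), assuming NumPy. Here $b_{1s}$ is computed as the mean of the slopes $h_i$. This is what Royall's BLU slope estimator $\sum_{i\in s} x_i Y_i / v(x_i) \big/ \sum_{i\in s} x_i^2 / v(x_i)$ reduces to when $v(x_i) = x_i^2$. The function name, its argument layout and the toy numbers are hypothetical.

```python
import numpy as np

def domain_total_predictors(y_s, x_s, in_domain, x_nonsampled_domain):
    """Return (T1, T2) following equations (3)-(5).

    y_s, x_s            -- values for the whole sample s (length n)
    in_domain           -- boolean mask marking the sampled units that lie in s_d
    x_nonsampled_domain -- auxiliary values x_i of the non-sampled domain units
    """
    y_s = np.asarray(y_s, dtype=float)
    x_s = np.asarray(x_s, dtype=float)
    h = y_s / x_s                                   # slopes h_i, eq. (1)
    b1 = h.mean()                                   # BLU slope estimator when v(x) = x^2
    b2 = np.median(h)                               # robust estimator b_2s = Me{h_i}, eq. (5)
    observed = y_s[np.asarray(in_domain)].sum()     # sum of Y_i over s_d
    x_rest = float(np.sum(x_nonsampled_domain))     # sum of x_i over the non-sampled domain units
    return observed + b1 * x_rest, observed + b2 * x_rest

# Toy sample: slopes are 2, 2, 3, 2; the first two sampled units belong to the domain.
t1, t2 = domain_total_predictors([2, 4, 9, 8], [1, 2, 3, 4],
                                 [True, True, False, False], [5, 6])
print(t1, t2)
```

Both predictors share the observed part of the domain total; they differ only in the slope used to predict the non-sampled part.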

The estimator $b_{2s}$ is the median of the sequence of random variables $\{h_1, \ldots, h_n\}$. It is a particular form of the estimator considered by H. Theil (1979). Note that the estimator $b_{2s}$ is robust against possible outliers. From the theorem presented by R. M. Royall (1976) it is known that the statistic $T_{1s_d}$ is the BLU predictor under the superpopulation model assumed in Section 2. This means that it is a ξ-unbiased predictor of the total value in the small area, $Y_d = \sum_{i \in \Omega_d} Y_i$, and that, among linear ξ-unbiased predictors, it minimises the ξ-variance of the prediction error under the assumed superpopulation model.
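The robustness claim can be illustrated with a single gross error. In the sketch below (Python with NumPy, hypothetical data), all points lie exactly on $y = 3x$ until one value is replaced by a data-entry mistake. The mean of the slopes (the BLU estimator under $v(x) = x^2$) moves far from 3, while the median $b_{2s}$ does not.

```python
import numpy as np

x = np.array([2.0, 4.0, 5.0, 8.0, 10.0])
y_clean = 3.0 * x                  # points exactly on y = 3x, so every h_i = 3
y_dirty = y_clean.copy()
y_dirty[-1] = 300.0                # one gross error (hypothetical data-entry mistake)

results = {}
for label, y in (("clean", y_clean), ("outlier", y_dirty)):
    h = y / x                      # slopes of the lines through the origin
    results[label] = (h.mean(), np.median(h))  # (mean of slopes, median of slopes)
    print(label, results[label])
```

With the outlier the slopes are 3, 3, 3, 3 and 30, so their mean becomes 8.4 while their median stays at 3.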

Let us additionally assume that the random variables $\varepsilon_i$, $i = 1, \ldots, N$, have continuous distributions with different variances ($\sigma^2 x_i^2$), such that the standardised errors $\varepsilon_i / x_i$ are identically distributed. Hence, from equation (2), $\{h_1, \ldots, h_n\}$ is a sequence of independent random variables with the same distribution, given by a density function $f(\cdot)$, with common expected value β and common variance σ². Finally, known results on the distributions of sample quantiles (e.g. Fisz 1967) imply that, provided β is also the median of $f$, $b_{2s}$ is a consistent estimator of β and, for large sample size, the statistic $b_{2s}$ is well approximated by a normal distribution with the following parameters:

$$E_\xi(b_{2s}) \approx \beta, \qquad D_\xi^2(b_{2s}) \approx \frac{1}{4 n f^2(\beta)}.$$
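The large-sample behaviour of $b_{2s}$ can be checked by simulation. The sketch below (Python with NumPy) assumes i.i.d. normal slopes $h_i \sim N(\beta, \sigma^2)$, so that β is also the median of $f$. It compares the Monte Carlo variance of the sample median with the standard asymptotic value $1/(4 n f^2(\beta))$, which for the normal density equals $\pi\sigma^2/(2n)$. All parameter values are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
beta, sigma, n, reps = 2.0, 0.5, 201, 10_000
h = rng.normal(beta, sigma, size=(reps, n))    # i.i.d. normal slopes, one sample per row
b2 = np.median(h, axis=1)                      # b_2s for each simulated sample

f_beta = 1.0 / (sigma * np.sqrt(2.0 * np.pi))  # normal density at its median beta
asym_var = 1.0 / (4.0 * n * f_beta**2)         # equals pi * sigma^2 / (2n)
print(b2.mean(), b2.var(), asym_var)
```

For n = 201 the simulated variance of the median should lie within a few per cent of `asym_var`, and the mean of `b2` should be very close to β.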

First, the mean square error of the statistic $T_{1s_d}$ (given by equation (3)) will be analysed, assuming that the superpopulation model GR is true. From Royall's theorem (Royall 1976) it follows that:


$$E_\xi E_p (T_{1s_d} - Y_d)^2 = \frac{\sigma^2}{n} E_p\Big[\Big(\sum_{i \in \bar{s}_d} x_i\Big)^2\Big] + \sigma^2 E_p\Big(\sum_{i \in \bar{s}_d} x_i^2\Big). \tag{6}$$
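Equation (6) can be checked by simulation conditionally on one fixed sample, that is, for the expression inside $E_p$. The Python sketch below assumes NumPy, normal errors and a hypothetical sample, domain membership and set of non-sampled domain units. It compares the Monte Carlo MSE of $T_{1s_d}$ with $\frac{\sigma^2}{n}\big(\sum_{i\in\bar s_d} x_i\big)^2 + \sigma^2 \sum_{i\in\bar s_d} x_i^2$, where $b_{1s}$ is taken as the mean of the slopes.

```python
import numpy as np

rng = np.random.default_rng(2)
beta, sigma = 2.0, 0.3
x_s = np.array([1.0, 2.0, 4.0, 5.0, 8.0, 9.0, 12.0, 15.0])              # hypothetical sample s
in_d = np.array([True, True, False, False, True, False, False, False])  # units of s_d
x_rest = np.array([3.0, 6.0, 7.0])       # hypothetical non-sampled units of the domain
n, reps = x_s.size, 200_000

def draw(x, size):
    """Y_i = beta * x_i + eps_i with eps_i ~ N(0, (sigma * x_i)^2)."""
    return beta * x + rng.normal(0.0, sigma * x, size=(size, x.size))

y_s = draw(x_s, reps)                    # sampled values, one row per replicate
y_rest = draw(x_rest, reps)              # non-sampled domain values
b1 = (y_s / x_s).mean(axis=1)            # b_1s = mean of the slopes h_i
sampled_part = y_s[:, in_d].sum(axis=1)  # sum of Y_i over s_d
t1 = sampled_part + b1 * x_rest.sum()    # predictor (3)
y_d = sampled_part + y_rest.sum(axis=1)  # true domain total Y_d

mc_mse = np.mean((t1 - y_d) ** 2)
formula = sigma**2 / n * x_rest.sum() ** 2 + sigma**2 * np.sum(x_rest**2)
print(mc_mse, formula)                   # the two numbers should nearly agree
```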

Second, the mean square error of the statistic $T_{2s_d}$ (given by equation (4)) will be analysed, assuming that the superpopulation model GR is true. Note that, because of the parameters of the asymptotic distribution of $b_{2s}$, $E_\xi(b_{2s}) \approx \beta$ for large sample size. Hence it is easy to show that the predictor $T_{2s_d}$ is an approximately ξ-unbiased predictor of the total value in the small domain:

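$$E_\xi(T_{2s_d} - Y_d) = E_\xi\Big(\sum_{i \in s_d} Y_i + b_{2s} \sum_{i \in \bar{s}_d} x_i - \sum_{i \in \Omega_d} Y_i\Big) = E_\xi\Big(\sum_{i \in s_d} Y_i + b_{2s} \sum_{i \in \bar{s}_d} x_i - \sum_{i \in s_d} Y_i - \sum_{i \in \bar{s}_d} Y_i\Big)$$

$$= E_\xi(b_{2s}) \sum_{i \in \bar{s}_d} x_i - E_\xi\Big(\sum_{i \in \bar{s}_d} Y_i\Big) \approx \beta \sum_{i \in \bar{s}_d} x_i - \beta \sum_{i \in \bar{s}_d} x_i = 0. \tag{7}$$

The mean square error of the predictor $T_{2s_d}$ for a noninformative sampling design is as follows: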

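$$E_\xi E_p (T_{2s_d} - Y_d)^2 = E_p E_\xi\Big(\sum_{i \in s_d} Y_i + b_{2s} \sum_{i \in \bar{s}_d} x_i - \sum_{i \in \Omega_d} Y_i\Big)^2 = E_p E_\xi\Big(b_{2s} \sum_{i \in \bar{s}_d} x_i - \sum_{i \in \bar{s}_d} Y_i\Big)^2$$

$$= E_p E_\xi\Big[\Big(b_{2s} \sum_{i \in \bar{s}_d} x_i - \sum_{i \in \bar{s}_d} \mu_i\Big)^2 + \Big(\sum_{i \in \bar{s}_d} Y_i - \sum_{i \in \bar{s}_d} \mu_i\Big)^2 - 2\Big(b_{2s} \sum_{i \in \bar{s}_d} x_i - \sum_{i \in \bar{s}_d} \mu_i\Big)\Big(\sum_{i \in \bar{s}_d} Y_i - \sum_{i \in \bar{s}_d} \mu_i\Big)\Big].$$

Because $b_{2s}$ and $\sum_{i \in \bar{s}_d} Y_i$ are independent random variables ($b_{2s}$ depends only on the sampled values, the sum only on the non-sampled ones), we obtain:

$$E_\xi\Big[\Big(b_{2s} \sum_{i \in \bar{s}_d} x_i - \sum_{i \in \bar{s}_d} \mu_i\Big)\Big(\sum_{i \in \bar{s}_d} Y_i - \sum_{i \in \bar{s}_d} \mu_i\Big)\Big] = E_\xi\Big(b_{2s} \sum_{i \in \bar{s}_d} x_i - \sum_{i \in \bar{s}_d} \mu_i\Big)\Big(\sum_{i \in \bar{s}_d} E_\xi(Y_i) - \sum_{i \in \bar{s}_d} \mu_i\Big) = 0$$

and then, since $\sum_{i \in \bar{s}_d} \mu_i = \beta \sum_{i \in \bar{s}_d} x_i$:

$$E_\xi E_p (T_{2s_d} - Y_d)^2 = E_p E_\xi\Big[\Big(b_{2s} \sum_{i \in \bar{s}_d} x_i - \sum_{i \in \bar{s}_d} \mu_i\Big)^2 + \Big(\sum_{i \in \bar{s}_d} Y_i - \sum_{i \in \bar{s}_d} \mu_i\Big)^2\Big] = E_p E_\xi\Big[\Big(\sum_{i \in \bar{s}_d} x_i\Big)^2 (b_{2s} - \beta)^2 + \Big(\sum_{i \in \bar{s}_d} Y_i - \sum_{i \in \bar{s}_d} \mu_i\Big)^2\Big]$$

$$= E_p\Big[\Big(\sum_{i \in \bar{s}_d} x_i\Big)^2 \mathrm{MSE}_\xi(b_{2s}) + D_\xi^2\Big(\sum_{i \in \bar{s}_d} Y_i\Big)\Big] = E_p\Big[\Big(\sum_{i \in \bar{s}_d} x_i\Big)^2\Big] \mathrm{MSE}_\xi(b_{2s}) + \sigma^2 E_p\Big(\sum_{i \in \bar{s}_d} x_i^2\Big),$$

where $\mathrm{MSE}_\xi(b_{2s}) = D_\xi^2(b_{2s}) + \big(E_\xi(b_{2s}) - \beta\big)^2 \approx D_\xi^2(b_{2s})$; for a fixed sample size $n$ this quantity does not depend on which units were sampled.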
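The decomposition above can be checked by simulation for one fixed sample, where the cross term should average out to zero. The Python sketch below assumes NumPy, normal errors and a hypothetical sample and domain. It estimates $\mathrm{MSE}_\xi(b_{2s})$ empirically rather than from the asymptotic formula, so it tests the decomposition itself. It also confirms that the mean prediction error of $T_{2s_d}$ is close to zero.

```python
import numpy as np

rng = np.random.default_rng(3)
beta, sigma = 2.0, 0.3
x_s = np.array([1.0, 2.0, 4.0, 5.0, 8.0, 9.0, 12.0, 15.0])              # hypothetical sample s
in_d = np.array([True, True, False, False, True, False, False, False])  # units of s_d
x_rest = np.array([3.0, 6.0, 7.0])       # hypothetical non-sampled domain units
reps = 200_000

def draw(x, size):
    """Y_i = beta * x_i + eps_i with eps_i ~ N(0, (sigma * x_i)^2)."""
    return beta * x + rng.normal(0.0, sigma * x, size=(size, x.size))

y_s, y_rest = draw(x_s, reps), draw(x_rest, reps)
b2 = np.median(y_s / x_s, axis=1)        # b_2s = Me{h_i} for each replicate
sampled_part = y_s[:, in_d].sum(axis=1)
t2 = sampled_part + b2 * x_rest.sum()    # predictor (4)
y_d = sampled_part + y_rest.sum(axis=1)  # true domain total

err = t2 - y_d
mse_b2 = np.mean((b2 - beta) ** 2)       # Monte Carlo MSE of b_2s
decomposition = x_rest.sum() ** 2 * mse_b2 + sigma**2 * np.sum(x_rest**2)
print(err.mean(), np.mean(err**2), decomposition)
```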

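In particular, if the random variables $Y_i$ have normal distributions, then $h_i \sim N(\beta, \sigma^2)$, $f(\beta) = 1/(\sigma\sqrt{2\pi})$ and $D_\xi^2(b_{2s}) \approx \pi\sigma^2/(2n)$, so that

$$E_\xi E_p (T_{2s_d} - Y_d)^2 \approx \frac{\pi\sigma^2}{2n} E_p\Big[\Big(\sum_{i \in \bar{s}_d} x_i\Big)^2\Big] + \sigma^2 E_p\Big(\sum_{i \in \bar{s}_d} x_i^2\Big).$$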
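Under normality both mean square errors have closed forms. The sketch below evaluates them for one fixed sample, that is, without the outer $E_p$. It uses $D_\xi^2(b_{1s}) = \sigma^2/n$ and the large-sample approximation $D_\xi^2(b_{2s}) \approx \pi\sigma^2/(2n)$. The function name and the numbers are hypothetical.

```python
import math

def normal_case_mses(x_rest, n, sigma):
    """Conditional-on-s MSEs of T1 and T2 under normal errors and v(x) = x^2.

    x_rest -- auxiliary values of the non-sampled domain units
    n      -- sample size
    """
    tx = sum(x_rest)                                  # total of x over non-sampled domain units
    residual = sigma**2 * sum(v * v for v in x_rest)  # variance of the non-sampled Y total
    mse1 = sigma**2 / n * tx**2 + residual            # BLU predictor T1
    mse2 = math.pi * sigma**2 / (2 * n) * tx**2 + residual  # robust predictor T2 (asymptotic)
    return mse1, mse2

m1, m2 = normal_case_mses([3.0, 6.0, 7.0], n=8, sigma=0.3)
print(m1, m2, m2 - m1)  # the gap equals (pi/2 - 1) * sigma^2 / n * (sum of x)^2
```

Multiplying n by ten divides the gap by ten, which is the O(1/n) behaviour of the difference between the two mean square errors.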

Let us compare the mean square errors of both predictors, assuming that the $Y_i$ have normal distributions. One should remember that outliers can occur in the sample because of disturbances in the distribution or errors made during data editing. When a non-robust predictor is used, this can cause significant bias in the estimates. There is, however, also the question of how large the difference between the mean square errors of the robust predictor and the BLU predictor is. It equals:

$$\Big(\frac{\pi}{2} - 1\Big)\frac{\sigma^2}{n} E_p\Big[\Big(\sum_{i \in \bar{s}_d} x_i\Big)^2\Big]. \tag{8}$$

This difference is positive. Therefore, when the model holds and no outliers are present, the BLU predictor is more accurate than the robust one. The difference is of order $O(n^{-1})$, so it decreases as the sample size increases. Hence in large samples the two predictors have similar accuracy, but $T_{2s_d}$ is additionally robust. Note also that the difference (8) becomes smaller as the P-expected value of the squared total of the auxiliary variable over the non-sampled elements of the small domain decreases. The proposed predictor should therefore give good results for a sampling design with selection probabilities proportional to the sample total of the auxiliary variable, implemented e.g. by the Lahiri sampling scheme.
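The Lahiri scheme is not described in the text. The sketch below uses its commonly described acceptance-rejection form. A simple random sample without replacement is drawn and accepted with probability equal to its x-total divided by an upper bound on any sample total; otherwise the draw is repeated. This yields $P(s)$ proportional to $\sum_{i \in s} x_i$. The function name `lahiri_sample` and the choice of bound (the sum of the n largest $x_i$) are our assumptions, and $x_i > 0$ is required.

```python
import random

def lahiri_sample(x, n, rng=random):
    """Draw a sample of size n with P(s) proportional to sum_{i in s} x_i
    (acceptance-rejection form of Lahiri's scheme; all x_i > 0 assumed)."""
    bound = sum(sorted(x, reverse=True)[:n])   # largest possible sample total
    while True:
        s = rng.sample(range(len(x)), n)       # simple random sample without replacement
        if rng.uniform(0, bound) < sum(x[i] for i in s):
            return sorted(s)                   # accepted with probability total(s) / bound

rng = random.Random(42)
print(lahiri_sample([1.0, 2.0, 3.0, 10.0], 2, rng))
```

For x = (1, 2, 3, 10) and n = 2, the six pairs have totals 3, 4, 11, 5, 12 and 13. The pair containing the two largest units should therefore be drawn with probability 13/48.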


5. CONCLUSION

Summing up, we would like to state that similar results can be obtained by analogy under different superpopulation models for domains or strata. One cannot forget, however, that asymptotic conditions must be met before the distribution parameters of the discussed regression coefficient can be used. Although the applicability of the predictor may seem limited, its strong advantages must be underlined. It should be stressed that robust predictors different from the one presented here can be found in the book by R. Valliant et al. (2000). Their idea is based on excluding information on outliers for estimation purposes. This approach requires a subjective assessment of which sampled elements are outliers. The construction of the predictor proposed in this paper does not require such a subjective decision.

REFERENCES

Cassel C. M., Särndal C. E., Wretman J. H. (1977), Foundations of Inference in Survey Sampling, John Wiley & Sons, New York-London-Sydney-Toronto.

Fisz M. (1967), Rachunek prawdopodobieństwa i statystyka matematyczna [in Polish], PWN, Warszawa.

Royall R. M. (1976), The Linear Least Squares Prediction Approach to Two-Stage Sampling, "Journal of the American Statistical Association", 71, 657-664.

Theil H. (1979), Zasady ekonometrii [in Polish], PWN, Warszawa.

Valliant R., Dorfman A. H., Royall R. M. (2000), Finite Population Sampling and Inference. A Prediction Approach, John Wiley & Sons, New York-Chichester-Weinheim-Brisbane-Singapore-Toronto.

Janusz Wywiał, Tomasz Żądło

ON A CERTAIN PREDICTOR OF THE TOTAL VALUE IN A SMALL AREA ROBUST AGAINST OUTLIERS

The paper considers the problem of predicting the total value in a domain under a simple regression superpopulation model (a regression model with one explanatory variable and no intercept). It addresses estimation of the regression function parameter that is robust against outliers. The presented estimator of the regression function parameter is the median of the slopes of all straight lines passing through the origin and one of the n points (x, y), where n denotes the sample size, x the auxiliary variable and y the variable under study. This estimator is a simplified form of the estimator presented in H. Theil (1979).


Under asymptotic assumptions, the authors derive the formula for the mean square error of prediction of the considered robust predictor. The BLU predictor for the assumed superpopulation model is also presented, together with its mean square error of prediction. The accuracy of both predictors is compared under the assumption that the random variables under study are normally distributed.
