Properties of the Cox Consistency Test in the Case of Income Distribution Analysis

ACTA UNIVERSITATIS LODZIENSIS

FOLIA OECONOMICA 194, 2005

Alina Jędrzejczak

Abstract

Testing the consistency of theoretical income distributions with the empirical ones is a very important problem in income distribution analysis. Most of the well-known goodness-of-fit tests cannot be used to solve this problem because the parameters of the population are usually not known and the samples are very large.

In the paper we present the main properties of the Cox statistic, which is based on the likelihood ratio. The presented results were obtained by means of a Monte Carlo experiment. The theoretical distributions most often used in income distribution analysis, namely the gamma, lognormal, Dagum and Singh-Maddala distributions, were taken into consideration.

Key words: statistical inference, income distribution, consistency test.

I. INTRODUCTION

Testing the consistency of empirical distributions with the theoretical ones is a very important problem in wage and income distribution analysis. Many consistency tests have been proposed in the literature. Taking into consideration the construction of the test statistic, they can be divided into the following groups:

- tests based on the comparison of density functions,

- tests based on the comparison of cumulative distribution functions,

- tests constructed on the basis of positional statistics,

- tests based on the moments of probability distributions.

In spite of the great variety of consistency tests, one can face many problems when trying to apply them to the analysis of income distributions.

Firstly, the last two groups comprise only normality tests, so they can hardly ever be used in income distribution analysis. Secondly, the population parameters are usually not known, which limits the possibilities of applying the Kolmogorov-Smirnov statistic, based on the comparison of cumulative distribution functions. Moreover, it has been shown that for very large samples the well-known Pearson $\chi^2$ statistic rejects the null hypothesis even when the discrepancies between the distributions are negligible. The samples coming from the Household Budget or Labour Force Surveys, which are the main source of information on personal income, are usually very large.
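The large-sample behaviour of the Pearson statistic mentioned above can be illustrated with a minimal sketch. It assumes four income classes with hypothetical theoretical and empirical shares that differ by at most one percentage point, and a fully specified null distribution (so that the statistic has 3 degrees of freedom); none of these numbers come from the paper. For fixed shares the statistic equals $n \sum_i (p_i - \pi_i)^2 / \pi_i$, so it grows linearly with the sample size $n$.

```python
def pearson_chi2(n, observed_shares, expected_shares):
    """Pearson chi-square statistic for n observations with the given class shares."""
    return n * sum(
        (o - e) ** 2 / e for o, e in zip(observed_shares, expected_shares)
    )

# Hypothetical shares (assumption): a discrepancy of one percentage point
# in two of the four classes, which would usually be considered negligible.
expected = [0.25, 0.25, 0.25, 0.25]
observed = [0.26, 0.24, 0.25, 0.25]

# Chi-square critical value for alpha = 0.05 and 3 degrees of freedom.
CRITICAL_5PCT_3DF = 7.815

for n in (200, 2000, 20000, 200000):
    stat = pearson_chi2(n, observed, expected)
    print(n, round(stat, 2), "reject" if stat > CRITICAL_5PCT_3DF else "do not reject")
```

With the same relative discrepancy, the null hypothesis is retained for the two smaller samples and rejected for the two larger ones, which is the difficulty that motivates looking for a different test in very large survey samples.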

II. THE COX STATISTIC

Cox (1961) developed a general large-sample test procedure to verify the composite hypothesis about the consistency of a distribution with a theoretical one. This procedure was based on a modification of the Neyman-Pearson maximum likelihood ratio. It was designed to obtain high power against a composite alternative hypothesis that the distribution is different from the one indicated in the null hypothesis.

Suppose that the observed value of a random vector $Y = (Y_1, \ldots, Y_n)$ is to be used to test the null hypothesis, $H_f$, that the probability density function is $f(y, \theta)$, where $\theta$ is an unknown vector of parameters. Let it be required to obtain high power for the alternative hypothesis $H_g$ that the probability density function is $g(y, \eta)$, where $\eta$ is an unknown vector of parameters. It is worth mentioning that the hypothesis $H_g$ serves only to indicate the type of alternative for which high power is required. Importantly, $f(y, \theta)$ and $g(y, \eta)$ are separate families of distributions. That means that for an arbitrary parameter value $\theta_0$, the density function $f(y, \theta_0)$ cannot be approximated arbitrarily closely by $g(y, \eta)$. When the families of considered probability density functions are not separate, it is advisable to use the likelihood ratio test.

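The test statistic proposed by Cox is the following:

$$T_f = L_f(\hat{\theta}) - L_g(\hat{\eta}) - E_{\hat{\theta}}\left[L_f(\hat{\theta}) - L_g(\hat{\eta})\right] \qquad (1)$$

where $L_f(\hat{\theta})$ and $L_g(\hat{\eta})$ denote the maximized log-likelihoods under $H_f$ and $H_g$, respectively, and $E_{\hat{\theta}}$ denotes the expected value under $H_f$ evaluated at $\theta = \hat{\theta}$.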

When $H_f$ is true, the $T_f$ statistic is asymptotically normally distributed with expected value equal to zero:

$$T_f \overset{as}{\sim} N(0, D(T_f))$$

If the roles of $H_f$ and $H_g$ as null and alternative hypotheses are interchanged, we obtain a test statistic of the form:

$$T_g = L_g(\hat{\eta}) - L_f(\hat{\theta}) - E_{\hat{\eta}}\left[L_g(\hat{\eta}) - L_f(\hat{\theta})\right] \qquad (2)$$

where: $T_g \overset{as}{\sim} N(0, D(T_g))$.

The statistics $T_f$ and $T_g$ are, in general, different functions of the observations. Under $H_f$, $T_f$ should be approximately zero, whereas under $H_g$, $T_f$ should be negative. Hence, using the $T_f$ statistic, three decisions are possible:

- rejection of $H_f$ in the direction of $H_g$, when $T_f$ is significantly different from zero and negative,

- rejection of $H_f$ away from $H_g$, when $T_f$ is significantly different from zero and positive,

- no reason to reject $H_f$, when $T_f$ is near zero.

III. DISTRIBUTION AND PROPERTIES OF THE COX STATISTIC FOR SELECTED PAIRS OF DISTRIBUTIONS

The aim of our research was to investigate the properties and the distribution of the Cox statistic. We took into consideration the probability density functions most often used in the analysis of income distributions. The problems we were particularly interested in were the following:

- examining the asymptotic distribution of the test statistic for selected pairs of theoretical distributions,

- assessing the influence of sample size on the test decision,

- evaluating the test power for selected alternative distributions.

In all the experiments the $H_f$ hypothesis stated that the distribution is of the Dagum type. The cumulative distribution function of the Dagum distribution can be written in the form (Dagum, 1977):

$$F(y) = \left(1 + \lambda y^{-\delta}\right)^{-\beta}, \quad y > 0 \qquad (3)$$

where: $\lambda, \beta, \delta$ - distribution parameters.

As alternative distributions the gamma, lognormal and Burr type XII distributions were used. The gamma density function is the following:

$$f(y) = \frac{b^{p}}{\Gamma(p)}\, y^{p-1} e^{-by}, \quad y > 0 \qquad (4)$$

where: $p, b$ - distribution parameters,

while the lognormal density curve takes the form:

$$f(y) = \frac{1}{y \sigma \sqrt{2\pi}} \exp\left[-\frac{(\ln y - \mu)^2}{2\sigma^2}\right], \quad y > 0 \qquad (5)$$

where: $\mu, \sigma$ - distribution parameters.

The Burr type XII distribution function, introduced to income distribution analysis by Singh and Maddala (1976), can be written as follows:

$$F(y) = 1 - \frac{1}{\left(1 + a_1 y^{a_2}\right)^{a_3}}, \quad y > 0 \qquad (6)$$

where: $a_1, a_2, a_3$ - distribution parameters.

For the theoretical distributions mentioned above, the computation of the test statistic and its standard error is not a trivial matter. Hence, we had to carry out Monte Carlo experiments. The mean and the standard deviation of the $T_f$ statistic were obtained by means of an experiment under the assumption that $H_f$ is true and that the empirical maximum likelihood estimate is the true parameter.

The stages of the Monte Carlo experiment. The first stage consisted of generating a random sample under the hypothesis that $H_f$ is true ($f(y)$ is the density function of the Dagum distribution). The sample was generated by inverting the Dagum cumulative distribution function. We dealt with a random sample in which individual observations were grouped into intervals. The sample sizes were the following: 200, 500, 1000, 2000, 5000.
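The first stage can be sketched as follows. Solving $u = (1 + \lambda y^{-\delta})^{-\beta}$ for $y$ gives the quantile function $y = \left[\lambda / (u^{-1/\beta} - 1)\right]^{1/\delta}$, so a Dagum sample is obtained by applying it to uniform random numbers. The parameter values, the random seed and the interval edges below are hypothetical and serve only as an illustration; the paper does not report the values it used.

```python
import bisect
import math
import random

def dagum_cdf(y, lam, beta, delta):
    """Dagum CDF F(y) = (1 + lam * y**(-delta))**(-beta) for y > 0."""
    return (1.0 + lam * y ** (-delta)) ** (-beta)

def dagum_quantile(u, lam, beta, delta):
    """Inverse of the Dagum CDF for 0 < u < 1."""
    # expm1(-log(u) / beta) equals u**(-1/beta) - 1 and stays positive for u close to 1.
    return (lam / math.expm1(-math.log(u) / beta)) ** (1.0 / delta)

def dagum_sample(n, lam, beta, delta, rng):
    """Draw n values by inversion: y = F^{-1}(u), u uniform on (0, 1)."""
    sample = []
    while len(sample) < n:
        u = rng.random()
        if u > 0.0:  # random() may return exactly 0.0, where the quantile is undefined
            sample.append(dagum_quantile(u, lam, beta, delta))
    return sample

def group_into_intervals(sample, edges):
    """Count observations in [edges[i], edges[i+1]); the last class is open-ended.

    Assumes edges is increasing and edges[0] == 0.
    """
    counts = [0] * len(edges)
    for y in sample:
        counts[bisect.bisect_right(edges, y) - 1] += 1
    return counts

# Hypothetical parameters and class limits (assumptions, not taken from the paper).
rng = random.Random(2005)
incomes = dagum_sample(2000, lam=2.0, beta=0.8, delta=3.0, rng=rng)
print(group_into_intervals(incomes, edges=[0.0, 0.5, 1.0, 1.5, 2.0, 3.0]))
```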

The second stage was to calculate the maximum likelihood estimators for the Dagum and alternative distributions (gamma, lognormal, Singh-Maddala). The density of an alternative distribution is denoted by $g(y)$.

The third stage was to evaluate, on the basis of N repetitions of the first two stages, the expected value and the variance of the following statistic:

$$T_f^* = L_f(\hat{\theta}) - L_g(\hat{\eta}) \qquad (7)$$
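The three stages can be sketched in a few lines under simplifying assumptions. To keep the example self-contained, the pair of distributions is a stand-in chosen only because both maximum likelihood estimates have closed forms: the lognormal plays the role of $f$ and the exponential (a gamma with unit shape) the role of $g$. The Dagum null hypothesis of the paper would require numerical maximization of the likelihood. Dividing the log-likelihood difference by the sample size, as well as the parameter values and the number of repetitions, are also assumptions made for this illustration.

```python
import math
import random
import statistics

def lognormal_max_loglik(sample):
    """Maximized lognormal log-likelihood (closed-form MLE of mu and sigma^2)."""
    logs = [math.log(y) for y in sample]
    n = len(logs)
    mu = sum(logs) / n
    var = sum((v - mu) ** 2 for v in logs) / n
    return -sum(logs) - 0.5 * n * math.log(2.0 * math.pi * var) - 0.5 * n

def exponential_max_loglik(sample):
    """Maximized exponential log-likelihood (the MLE of the rate is 1 / mean)."""
    n = len(sample)
    mean = sum(sample) / n
    return -n * math.log(mean) - n

def simulate_t_star(n, repetitions, mu, sigma, rng):
    """Mean and standard deviation of (L_f - L_g) / n over samples drawn from f."""
    values = []
    for _ in range(repetitions):
        # Stage 1: generate a sample under the null hypothesis.
        sample = [rng.lognormvariate(mu, sigma) for _ in range(n)]
        # Stage 2: maximize both likelihoods; stage 3 collects their difference.
        diff = lognormal_max_loglik(sample) - exponential_max_loglik(sample)
        values.append(diff / n)
    return statistics.mean(values), statistics.stdev(values)

rng = random.Random(1961)
for n in (200, 500, 1000):
    mean, std = simulate_t_star(n, repetitions=200, mu=0.0, sigma=0.5, rng=rng)
    print(n, round(mean, 4), round(std, 4))
```

In this sketch the estimated mean stabilizes while the standard deviation shrinks as the sample size grows.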

On the basis of the experiment we obtained the parameters and the histograms of the Cox statistic distribution for different sample sizes. They are presented in Tables 1-3 and in the figures.

Table 1. Expected values and standard deviations of the T_f* statistic (N = 500)

Sample size   Alternative distribution
n             lognormal             gamma                 Burr type XII
              E(T_f*)    D(T_f*)    E(T_f*)    D(T_f*)    E(T_f*)    D(T_f*)
200           0.0102     0.0132     0.0384     0.0249     0.0054     0.0078
500           0.0085     0.0070     0.0361     0.0139     0.0034     0.0031
1000          0.0081     0.0050     0.0359     0.0099     0.0030     0.0019
2000          0.0078     0.0036     0.0355     0.0073     0.0028     0.0014
5000          0.0076     0.0023     0.0352     0.0046     0.0027     0.0008

Source: Author's calculations.

Table 2. Expected values and standard deviations of the T_f* statistic (N = 1000)

Sample size   Alternative distribution
n             lognormal             gamma                 Burr type XII
              E(T_f*)    D(T_f*)    E(T_f*)    D(T_f*)    E(T_f*)    D(T_f*)
200           0.0099     0.0125     0.0377     0.0239     0.0053     0.0075
500           0.0085     0.0071     0.0364     0.0140     0.0035     0.0030
1000          0.0080     0.0057     0.0358     0.0103     0.0030     0.0019
2000          0.0077     0.0042     0.0352     0.0071     0.0028     0.0013
5000          0.0076     0.0022     0.0353     0.0046     0.0027     0.0008

Source: Author's calculations.

Table 3. Standardized values of the Cox statistic T_f

Sample size   Alternative distribution
n             lognormal              gamma                  Burr type XII
              N = 500    N = 1000    N = 500    N = 1000    N = 500    N = 1000
200           0.80       0.64        -          0.61        -          -0.14
500           1.21       1.39        -          2.73        -          0.24
1000          1.62       1.82        -          1.60        -          0.64
2000          2.17       2.54        -          2.38        -          1.06
5000          3.29*      4.88*       -          3.70*       -          1.80

Figure 1. Empirical distribution of the $T_f$ statistic (n = 500)

Figure 2. Empirical distribution of the $T_f$ statistic (n = 1000)

Figure 3. Empirical distribution of the $T_f$ statistic (n = 2000)

The second Monte Carlo procedure focused on the evaluation of the Cox test power for selected alternative distributions. First, we had to generate a random sample assuming that the distribution is: a) lognormal, b) gamma, c) Burr type XII.

Then the value of the $T_f$ statistic was calculated from this sample, using the value $E(T_f^*)$ estimated in the previous experiment (Tables 1 or 2). To calculate the standardized value of the $T_f$ statistic, the standard deviations calculated in the first experiment were necessary. When $|T_f^{STAND}| > T_\alpha$, where $T_\alpha$ is the critical value at significance level $\alpha$, we rejected the null hypothesis that the general population distribution is of the Dagum form.
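The standardization and the three-way decision can be sketched as below. The expected value 0.0085 and standard deviation 0.0070 are the Table 1 entries for the lognormal alternative and sample size 500; the observed value of $T_f^*$ is hypothetical. Taking the critical value as the two-sided standard normal quantile is an assumption, since the text does not state how $T_\alpha$ was chosen.

```python
from statistics import NormalDist

def standardize(t_star, expected, std_dev):
    """Centre T* by its simulated expected value and scale by its simulated standard deviation."""
    return (t_star - expected) / std_dev

def cox_decision(t_stand, alpha=0.05):
    """Three-way decision based on the sign and size of the standardized statistic."""
    # Assumption: two-sided critical value from the standard normal distribution.
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    if t_stand < -critical:
        return "reject H_f in the direction of H_g"
    if t_stand > critical:
        return "reject H_f away from H_g"
    return "no reason to reject H_f"

# E(T_f*) and D(T_f*) from Table 1 (lognormal alternative, n = 500);
# the observed value -0.0100 is a hypothetical illustration.
t_stand = standardize(-0.0100, expected=0.0085, std_dev=0.0070)
print(round(t_stand, 2), cox_decision(t_stand))
```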

After N repetitions of the experiment we obtained the empirical test power. It was calculated as the number of correct decisions (rejections of the null hypothesis $H_f$) divided by the total number of random samples generated from the appropriate alternative distribution. The results are presented in Tables 4-5.
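The power calculation itself reduces to a rejection share, as in the following minimal sketch. The standardized values are hypothetical. The second function adds the usual binomial standard error of a simulated share, which is not reported in the paper but helps to judge how precisely a power estimated from N = 500 or N = 1000 repetitions is determined.

```python
import math

def empirical_power(standardized_values, critical_value):
    """Share of simulated samples in which |standardized T_f| exceeds the critical value."""
    rejections = sum(1 for t in standardized_values if abs(t) > critical_value)
    return rejections / len(standardized_values)

def power_standard_error(power, repetitions):
    """Binomial standard error of a rejection share based on `repetitions` samples."""
    return math.sqrt(power * (1.0 - power) / repetitions)

# Hypothetical standardized statistics from four simulated samples.
values = [-2.5, 0.3, 1.2, -3.1]
print(empirical_power(values, critical_value=1.96))
print(round(power_standard_error(0.5, repetitions=500), 4))
```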

Table 4. Cox test power for selected alternative distributions (N = 500)

n        α = 0.1   0.05     0.025    0.01     0.005

Lognormal distribution
200      0.350     0.188    0.106    0.032    0.006
500      0.758     0.616    0.504    0.336    0.246
1000     0.964     0.910    0.836    0.378    0.660
2000     1.000     1.000    1.000    1.000    1.000
5000     1.000     1.000    1.000    1.000    1.000

Gamma distribution
200      0.978     0.948    0.908    0.830    0.754
500      1.000     1.000    1.000    0.998    0.988
1000     1.000     1.000    1.000    1.000    1.000
2000     1.000     1.000    1.000    1.000    1.000
5000     1.000     1.000    1.000    1.000    1.000

Burr type XII distribution
200      0.104     0.066    0.050    0.032    0.024
500      0.182     0.142    0.086    0.050    0.038
1000     0.408     0.254    0.158    0.114    0.086
2000     0.718     0.524    0.336    0.200    0.150

Table 5. Cox test power for selected alternative distributions (N = 1000)

n        α = 0.1   0.05     0.025    0.01     0.005

Lognormal distribution
200      0.362     0.218    0.119    0.047    0.020
500      0.780     0.643    0.516    0.357    0.252
1000     0.935     0.846    0.747    0.594    0.483
2000     0.997     0.991    0.976    0.921    0.876
5000     1.000     1.000    1.000    1.000    1.000

Gamma distribution
200      0.976     0.951    0.917    0.848    0.784
500      1.000     1.000    1.000    0.999    0.998
1000     1.000     1.000    1.000    1.000    1.000
2000     1.000     1.000    1.000    1.000    1.000
5000     1.000     1.000    1.000    1.000    1.000

Burr type XII distribution
200      0.096     0.072    0.048    0.030    0.021
500      0.200     0.139    0.092    0.057    0.045
1000     0.427     0.286    0.187    0.123    0.090
2000     0.792     0.624    0.454    0.289    0.204
5000     0.995     0.985    0.959    0.901    0.826

Source: Author's calculations.

IV. CONCLUSIONS

Figures 1-4 present empirical frequencies of the $T_f$ statistic for different sample sizes. They were calculated on the basis of the Monte Carlo experiment conducted for the Dagum ($f(y)$) and lognormal ($g(y)$) distributions. It can easily be seen that with increasing sample size the shapes of the presented empirical distributions become similar to the normal distribution. At the same time, the standard deviations of the considered test statistic are very small, tending to zero as the sample size increases (Tables 1-2) and as the discrepancies between the compared distributions become relatively small (the case of the Dagum and Burr type XII distributions).

The results of the second experiment are presented in Tables 4-5. They show the empirical test power calculated for the selected pairs of distributions. The best results were obtained when the alternative distribution was the gamma curve: the empirical test power is near one for all sample sizes. Very high test power for relatively small samples was also obtained for the lognormal distribution. For the Burr type XII distribution the empirical test power approaches one only for sample sizes of 5000 and more. This can be explained by the high similarity of the compared distributions.

Summing up, one can say that the Cox statistic can be useful in the analysis of income distributions in Poland. The test power for the alternatives most often used is very high for relatively small samples.

REFERENCES

Cox D. R. (1961), Tests of Separate Families of Hypotheses, Proceedings of the 4th Berkeley Symp., pp. 105-123.

Dagum C. (1977), A new model of personal income distribution. Specification and estimation, Economie Appliquée, 413-436.

Domański Cz. (1990), Testy statystyczne, PWE, Warszawa.

Singh S., Maddala G. (1976), A function for size distribution of income, Econometrica, 44, 963-970.

Singh S., Maddala G. (1977), Estimation problems in size distribution of income, Economie Appliquée, 461-479.

Alina Jędrzejczak

WŁASNOŚCI TESTÓW ZGODNOŚCI COXA W PRZYPADKU BADANIA ZGODNOŚCI ROZKŁADÓW DOCHODÓW
(Properties of the Cox consistency tests in the case of testing the consistency of income distributions)

Summary

In the analysis of wage and income distributions, testing the consistency of empirical distributions with theoretical ones is an important problem. Most of the known goodness-of-fit tests cannot be applied to this problem because the parameters of the general population are usually unknown and the samples considered are often very large.

The paper presents the basic properties of the Cox consistency test, which is based on the likelihood ratio. The presented results were obtained by the Monte Carlo method. The theoretical distributions most often used in the analysis of wages and incomes were considered: gamma, lognormal, Dagum and Singh-Maddala.
