
ACTA UNIVERSITATIS LODZIENSIS

FOLIA OECONOMICA 141, 1997

Wiesław Wagner*

USEFULNESS OF DURBIN METHOD FOR INVESTIGATING NORMALITY IN ONE-DIMENSIONAL LINEAR MODEL

Abstract. In this paper the verification of hypotheses of univariate normality by the Durbin randomized method is presented.

The method of elimination of the nuisance parameters by calculating the residual vector and the corrected residual vector is presented, too.

Key words: linear model, Durbin randomized method, tests for normality of residuals.

1. INTRODUCTION

Investigating the normality of a one-dimensional simple sample has been undertaken by many statisticians. An extensive survey of the literature on the subject is given by Mardia (1980). Most frequently the problem reduces to constructing test statistics which are functions of sample estimators of the unknown expectation $\mu$ and variance $\sigma^2$, which in the problem of testing for normality are nuisance parameters. We eliminate them by transforming the observable random variables. One of the ways is given by Durbin (1961). We will extend it to the case of investigating the normal distribution of random errors in a linear model, which is performed on transformed residuals obtained from the least squares method (LSM) in order to get a simple sample. To such samples, which comprise independent random variables with identical distributions, we apply normality tests.

In the paper the Durbin method for investigating normality of random errors in a linear model is given. Its description is supplemented with auxiliary results.


2. LINEAR MODEL AND RESIDUALS

Let $y: n \times 1$ be a vector of $n$ observable random variables given by a linear model

$$y = X\beta + e, \tag{2.1}$$

where $X: n \times q$ is a known matrix of rank $r(X) = m \le q < n$, $\beta: q \times 1$ is a vector of unknown constant parameters, while $e: n \times 1$ is a vector of unobserved random variables called random errors. The linear model is expressed by the triple $(y, X\beta, \sigma^2 I)$, where $\sigma^2 > 0$ is unknown and $I: n \times n$ is the identity matrix. This means that $E(e) = 0$ and $D(e) = \sigma^2 I$.

The vector of residuals from the LSM is expressed by $r = \Phi y$, where $\Phi = I - X(X'X)^{-}X'$ and $(X'X)^{-}$ is a g-inverse of the matrix $X'X$.

For the vector of residuals $r$ we have the following properties, whose proofs can be found in many monographs on linear models: $r'\mathbf{1} = 0$ (when the model contains an intercept), $r'X = 0$, $r'r = y'\Phi y = e'\Phi e$, $E(r) = 0$, $D(r) = \sigma^2\Phi$, $E(rr') = \mathrm{Cov}(y, r) = \mathrm{Cov}(e, r) = \sigma^2\Phi$, and $s^2 = r'r/(n - m)$ is an unbiased estimator of $\sigma^2$. The property $D(r) = \sigma^2\Phi$ states that the components of the vector $r$ are correlated and (usually) have different variances. To present the issue of investigating normality of the distribution of the vector $e$ we give some auxiliary results which are used in applications of the Durbin method.
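The listed properties of the residual vector can be checked numerically. The following is a minimal sketch, assuming a simple regression with an intercept (so $q = m = 2$) and made-up data; the function names and the data are illustrative and do not come from the paper. For this particular design the projector $X(X'X)^{-1}X'$ has the closed form $1/n + (x_i - \bar x)(x_j - \bar x)/S_{xx}$, so no matrix inversion is needed.

```python
import random

def residual_projector(x):
    """Return Phi = I - X(X'X)^{-1}X' for the design X = [1, x]."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    phi = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            hat_ij = 1.0 / n + (x[i] - x_bar) * (x[j] - x_bar) / sxx
            phi[i][j] = (1.0 if i == j else 0.0) - hat_ij
    return phi

def mat_vec(a, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]

rng = random.Random(0)
x = [0.5, 1.0, 1.5, 2.5, 3.0, 4.0, 4.5, 6.0]
# Hypothetical model y = 1 + 2x + e with N(0, 1) errors, for illustration only.
y = [1.0 + 2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]

phi = residual_projector(x)
r = mat_vec(phi, y)                            # LSM residuals r = Phi y
n, m = len(x), 2                               # m = rank of X
s2 = sum(ri * ri for ri in r) / (n - m)        # s^2 = r'r / (n - m)

sum_r = sum(r)                                 # r'1, vanishes up to rounding
r_dot_x = sum(ri * xi for ri, xi in zip(r, x)) # r'x, vanishes up to rounding
trace_phi = sum(phi[i][i] for i in range(n))   # trace of Phi equals n - m
```

The off-diagonal entries of `phi` are generally nonzero and the diagonal entries differ, which is the numerical counterpart of the remark that the components of $r$ are correlated and have different variances.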

3. AUXILIARY RESULTS

Let $X_1, \ldots, X_n$ be a simple random sample, i.e. a sequence of $n$ independent values of a random variable $X$ which is assumed to have the distribution $N(\mu, \sigma^2)$. By $\bar X$ and $S^2$ we denote the sample mean and the unbiased estimator of variance from this sample, for which the following properties hold: $\bar X \sim N(\mu, \sigma^2/n)$, $(n-1)S^2 \sim \sigma^2\chi^2_{n-1}$, $\mathrm{Cov}(\bar X, S^2) = 0$, $\bar X$ and an arbitrary function $g(X_1 - \bar X, \ldots, X_n - \bar X)$ are independent, $\bar X$ and $(n-1)S^2$ are independent, and $(n-1)S^2$ and $(X_i - \bar X)/S$ are independent.

Let us define the standardized variables $U_i = (X_i - \bar X)/S$ from the sample $X_1, \ldots, X_n$. The variables $U_i$, $i = 1, \ldots, n$, are not independent, but we have:

$$\mathrm{Cov}(\bar X, U_i) = 0, \quad \mathrm{Cov}(S, U_i) = 0, \quad \mathrm{Cov}(S^2, U_i) = 0.$$

The density function $f(u_i) = f(u)$ of the random variable $U_i$ is (Pearson, Sekar 1936; Cramér 1958, p. 273)

$$f(u) = \frac{\sqrt{n}\, g_{n-1}}{(n-1)\sqrt{\pi}\, g_{n-2}} \left[1 - \frac{n u^2}{(n-1)^2}\right]^{(n-4)/2} \tag{3.1}$$

for $-(n-1)/\sqrt{n} \le u \le (n-1)/\sqrt{n}$, where $g_n = \Gamma(n/2)$.

Density (3.1) is a particular case of the density of a random variable $U$ with the generalized beta distribution

$$f(u) = \frac{1}{(b-a)^{p+q-1}} \cdot \frac{\Gamma(p+q)}{\Gamma(p)\Gamma(q)}\, (u-a)^{p-1}(b-u)^{q-1}, \quad a < u < b,$$

with $a = -(n-1)/\sqrt{n}$, $b = (n-1)/\sqrt{n}$ and $p = q = (n-2)/2$.

Density (3.1) is a symmetric function, so all moments about the origin of odd degree are equal to 0, while those of even degree $k$ are given by the formula

$$E(U^k) = \frac{(n-1)^k\, g_{k+1}\, g_{n-1}}{\sqrt{\pi}\, n^{k/2}\, g_{n+k-1}}. \tag{3.2}$$

Moments about the origin of degree $k$ of the standard deviation $S$ are given by (e.g. Pawłowski 1976, p. 46)

$$E(S^k) = \frac{2^{k/2}\sigma^k\, g_{n+k-1}}{(n-1)^{k/2}\, g_{n-1}}, \quad k = 0, 1, 2, \ldots \tag{3.3}$$
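Formulas (3.1) to (3.3) can be cross-checked numerically. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the integral of $u^k f(u)$ is approximated with a plain midpoint rule, and the shorthand `g(k)` stands for $g_k = \Gamma(k/2)$ from the text.

```python
import math

def g(k):
    """g_k = Gamma(k / 2), the shorthand used in the text."""
    return math.gamma(k / 2.0)

def density_u(u, n):
    """Density (3.1) of U_i = (X_i - X_bar) / S for a normal sample of size n."""
    b = (n - 1) / math.sqrt(n)
    if abs(u) >= b:
        return 0.0
    const = math.sqrt(n) * g(n - 1) / ((n - 1) * math.sqrt(math.pi) * g(n - 2))
    return const * (1.0 - n * u * u / (n - 1) ** 2) ** ((n - 4) / 2.0)

def moment_u_formula(k, n):
    """E(U^k) from formula (3.2); odd moments vanish by symmetry."""
    if k % 2 == 1:
        return 0.0
    return ((n - 1) ** k * g(k + 1) * g(n - 1)
            / (math.sqrt(math.pi) * n ** (k / 2.0) * g(n + k - 1)))

def moment_u_numeric(k, n, steps=20_000):
    """Midpoint-rule approximation of the integral of u^k f(u) over the support."""
    b = (n - 1) / math.sqrt(n)
    h = 2.0 * b / steps
    total = 0.0
    for j in range(steps):
        u = -b + (j + 0.5) * h
        total += u ** k * density_u(u, n)
    return total * h

def moment_s_formula(k, n, sigma=1.0):
    """E(S^k) from formula (3.3)."""
    return (2.0 ** (k / 2.0) * sigma ** k * g(n + k - 1)
            / ((n - 1) ** (k / 2.0) * g(n - 1)))
```

For $k = 2$ formula (3.2) reduces to $E(U^2) = (n-1)/n$ and formula (3.3) to $E(S^2) = \sigma^2$, which are the two values used later in Lemmas 4.2 and 5.1.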

4. DURBIN METHOD FOR A SIMPLE SAMPLE

Let us suppose that we verify a composite hypothesis $H_C$ that $X_1, \ldots, X_n$ is a simple sample from the distribution $N(\mu, \sigma^2)$, with unknown $\mu$ and $\sigma^2$. Let $\bar X$ and $S^2$ be the sample mean and the unbiased variance estimator. Moreover, let us denote by $\bar Y$ and $S'^2$ the sample mean and variance from the population with the standardized normal distribution $N(0, 1)$. Thus, we have $\bar Y \sim N(0, 1/n)$, $(n-1)S'^2 \sim \chi^2_{n-1}$, and $\bar Y$ and $S'^2$ are independent. What is more, after replacing $\sigma$ by 1 and $k$ by 2 in (3.3) we get $E(S'^2) = 1$.

As already mentioned in Section 1, the parameters $\mu$ and $\sigma^2$ are nuisance parameters in the problem of investigating normality. Durbin (1961) suggests a randomization process to eliminate them. The idea is to consider two further random variables $\bar Y$ and $S'^2$ which have the distributions mentioned earlier. Accordingly, we determine a random sample $Y_1, \ldots, Y_n$ such that

$$\frac{Y_i - \bar Y}{S'} = \frac{X_i - \bar X}{S}, \quad i = 1, \ldots, n. \tag{4.1}$$

We will show that the sequence $Y_1, \ldots, Y_n$ is a simple sample from the population with the distribution $N(0, 1)$. We write relation (4.1) as

$$Y_i = \bar Y + S' U_i, \quad i = 1, \ldots, n,$$

where $U_i$ is given in Section 3 and $\bar Y$ and $S'$ are random variables generated independently of $X_1, \ldots, X_n$.

Lemma 4.1. The random variables $\bar X$, $S^2$, $U_i$, $\bar Y$, $S'^2$ are pairwise independent.

Proof. The independence of $\bar X$, $S^2$, $U_i$ follows from the results given in Section 3. The other independences follow from the assumption that $\bar Y$ and $S'^2$ are independent of $X_1, \ldots, X_n$.

Lemma 4.2. $E(\bar Y + S'U_i) = 0$ and $D^2(\bar Y + S'U_i) = 1$.

Proof. From Lemma 4.1 we get

$$E(\bar Y + S'U_i) = E(\bar Y) + E(S'U_i) = E(S')E(U_i) = 0,$$

which follows from the vanishing of the odd-order moments of the variable $U_i$. From the fact that $\mathrm{Cov}(\bar Y, S'U_i) = 0$, we get for the variance

$$D^2(\bar Y + S'U_i) = D^2(\bar Y) + D^2(S'U_i) = \frac{1}{n} + E(S'^2U_i^2) - [E(S')E(U_i)]^2 = \frac{1}{n} + \frac{n-1}{n} = 1,$$

where we used $E(S'^2) = 1$ and, from (3.2) at $k = 2$, $E(U_i^2) = (n-1)/n$.

We have shown that the first two moments of the expression $Y_i = \bar Y + S'U_i$ are identical with those of a variable with the distribution $N(0, 1)$. Now we give a lemma in which we prove that the variable $S'U$ is normally distributed, although the first factor has the chi distribution and the second has the symmetric beta distribution.

Lemma 4.3. The random variable $Z = S'U$ has the normal distribution $N(0, (n-1)/n)$.

Proof. We use the result given by Fisz (1967, p. 71): if $S'$ and $U$ are independent random variables with densities $f_1(s')$ and $f_2(u)$, then the distribution of $Z = S'U$ is given by the density

$$f(z) = \int_{-\infty}^{\infty} f_1(s')\, f_2\!\left(\frac{z}{s'}\right) \frac{1}{|s'|}\, ds'.$$

Using the density $f_1(s') = C_1 s'^{\,n-2}\exp\{-(n-1)s'^2/2\}$, $s' > 0$, given, among others, by Pawłowski (1976, p. 45), and density (3.1) written as $f_2(u) = C_2\left[1 - nu^2/(n-1)^2\right]^{(n-4)/2}$, where

$$C_1 = \frac{2(n-1)^{(n-1)/2}}{2^{(n-1)/2}\, g_{n-1}}, \qquad C_2 = \frac{\sqrt{n}\, g_{n-1}}{(n-1)\sqrt{\pi}\, g_{n-2}},$$

we get

$$f(z) = C_1C_2 \int_{\sqrt{n}|z|/(n-1)}^{\infty} s'^{\,n-3} \exp\left\{-\frac{(n-1)s'^2}{2}\right\} \left[1 - \frac{nz^2}{(n-1)^2 s'^2}\right]^{(n-4)/2} ds'.$$

We change the variable $s'$ by the consecutive substitutions $t = (n-1)s'^2/2$, $v = 2(n-1)t - nz^2$, $w = v/(2(n-1))$. Then, from the definition of the gamma function, we get

$$f(z) = C_1C_2\, \frac{2^{(n-4)/2}(n-1)^{(n-4)/2}}{(n-1)^{n-3}}\, g_{n-2} \exp\left\{-\frac{nz^2}{2(n-1)}\right\}.$$

Coming back to the constants $C_1$ and $C_2$, we get

$$f(z) = \frac{2(n-1)^{(n-1)/2}}{2^{(n-1)/2}\, g_{n-1}} \cdot \frac{\sqrt{n}\, g_{n-1}}{(n-1)\sqrt{\pi}\, g_{n-2}} \cdot \frac{2^{(n-4)/2}(n-1)^{(n-4)/2}}{(n-1)^{n-3}}\, g_{n-2} \exp\left\{-\frac{nz^2}{2(n-1)}\right\} = \frac{1}{\sqrt{2\pi}} \sqrt{\frac{n}{n-1}} \exp\left\{-\frac{nz^2}{2(n-1)}\right\}.$$

Finally, we get the density function of the distribution $N(0, (n-1)/n)$, i.e. the random variable $Z$ has the normal distribution.
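The randomization of this section and the statement of Lemma 4.3 can be illustrated with a short sketch. It is written under stated assumptions: the function names are hypothetical, the chi-square variable is drawn as a gamma variable with shape $(n-1)/2$ and scale 2, and the product-density integral from the proof is approximated by a midpoint rule on a truncated range.

```python
import math
import random

def durbin_transform(x, rng):
    """Map a sample x to Y_i = Y_bar + S' * U_i with generated Y_bar and S'."""
    n = len(x)
    x_bar = sum(x) / n
    s = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    u = [(xi - x_bar) / s for xi in x]               # standardized U_i
    y_bar = rng.gauss(0.0, 1.0 / math.sqrt(n))       # Y_bar ~ N(0, 1/n)
    chi2 = rng.gammavariate((n - 1) / 2.0, 2.0)      # chi-square, n - 1 d.f.
    s_prime = math.sqrt(chi2 / (n - 1))              # (n - 1) S'^2 ~ chi-square
    return [y_bar + s_prime * ui for ui in u], y_bar, s_prime

def density_s_prime(s, n):
    """Density f_1 of S' when (n - 1) S'^2 is chi-square with n - 1 d.f."""
    nu = n - 1
    return (2.0 * (nu / 2.0) ** (nu / 2.0) / math.gamma(nu / 2.0)
            * s ** (nu - 1) * math.exp(-nu * s * s / 2.0))

def density_u(u, n):
    """Density (3.1) of the standardized variable U."""
    b = (n - 1) / math.sqrt(n)
    if abs(u) >= b:
        return 0.0
    const = (math.sqrt(n) * math.gamma((n - 1) / 2.0)
             / ((n - 1) * math.sqrt(math.pi) * math.gamma((n - 2) / 2.0)))
    return const * (1.0 - n * u * u / (n - 1) ** 2) ** ((n - 4) / 2.0)

def density_z_numeric(z, n, upper=8.0, steps=40_000):
    """Midpoint-rule value of the integral of f_1(s) f_2(z/s) / s over s."""
    b = (n - 1) / math.sqrt(n)
    lower = abs(z) / b            # f_2(z/s) is zero for s below this bound
    h = (upper - lower) / steps
    total = 0.0
    for j in range(steps):
        s = lower + (j + 0.5) * h
        total += density_s_prime(s, n) * density_u(z / s, n) / s
    return total * h

def density_z_normal(z, n):
    """Density of N(0, (n - 1)/n), the limit claimed in Lemma 4.3."""
    var = (n - 1) / n
    return math.exp(-z * z / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
```

By construction the transformed sample has sample mean equal to the generated $\bar Y$ and sample standard deviation equal to the generated $S'$, so the original $\bar X$ and $S$ no longer enter the values $Y_i$.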

5. DURBIN METHOD FOR A LINEAR MODEL

In Section 2 we gave the vector of residuals from the LSM for model (2.1). We transform it further, using Durbin's randomization procedure. Its use is connected with the elimination of the nuisance parameter $\sigma^2$, on which the covariance matrix of the vector of residuals depends. It requires a transformation which expresses the quotient of two random variables with the chi-square distribution. On the other hand, the transformation of the vector of residuals should give random variables which are uncorrelated. Both problems can be solved in two ways.

The idea of the first one is to use some of the random variables, exactly as many as there are unknown nuisance parameters, in the problem of investigating normality. The problem was developed in that direction by Sarkadi (1960, 1967), Störmer (1964), Theil (1968) and Sally and Sarkadi (1982).

The other way considered is to extend the set of random variables by a number of variables equal exactly to the number of nuisance parameters. These additional variables are treated as generated random variables, independent of the observed ones. Such reasoning was already presented in Section 4. When the elements of the sample are correlated, one should additionally generate a random vector with the $n$-dimensional $N_n(0, I)$ distribution in order to use it to eliminate the correlation between the variables. This idea was presented by Golub et al. (1973) and Wagner (1982, 1990).

In our considerations we apply the Durbin formula. We have one nuisance parameter (the variance $\sigma^2$) and $n$ correlated random variables being the components of the random vector $r$ of residuals. We generate a random variable $(n-1)S'^2 \sim \chi^2_{n-1}$ and, independently of it, a random vector $v$ with an arbitrary distribution with the moments $E(v) = 0$ and $D(v) = I$. We create a corrected vector of LSM residuals

$$r^* = \frac{S'}{S}\, r + (I - \Phi)v, \tag{5.1}$$

where $S^2 = r'r/(n - m)$.

To prove that the components of the vector $r^*$ are uncorrelated we use the results given in Sections 2, 3 and 4.

Lemma 5.1. $E(r^*) = 0$ and $D(r^*) = I$.

Proof. From the independence of $S'^2$ and the vector $v$ of the vector $y$, we get

$\mathrm{Cov}(S', r) = \mathrm{Cov}(S', v) = \mathrm{Cov}(S, r) = \mathrm{Cov}(S, v) = 0$ and $\mathrm{Cov}(r, v) = 0$. For the expectation we have $E(r^*) = E\left(\frac{S'}{S}r\right) + (I - \Phi)E(v) = E(S')E\left(\frac{r}{S}\right)$. But every component of the vector $r/S$ has a distribution with expectation equal to 0, according to Lemma 4.3, i.e. $E(r^*) = 0$. Further, due to the covariances mentioned earlier, we get

$$D(r^*) = E(S'^2)\, E\left[\frac{1}{S^2}\, rr'\right] + (I - \Phi)D(v)(I - \Phi)' = \frac{1}{E(S^2)}\, E(rr') + I - \Phi = \frac{1}{\sigma^2}\, \sigma^2\Phi + I - \Phi = I,$$

which follows from $E(S'^2) = 1$ and from (3.3) at $k = 2$.

The lemma shows that the components of the vector $r^*$ are uncorrelated and can be treated as a simple sample.
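The construction of the corrected residual vector can be sketched as follows. This is an illustration under stated assumptions, not the author's implementation: the design is a simple regression with an intercept ($m = 2$), the data are made up, the vector $v$ is drawn from $N_n(0, I)$ as one admissible choice, and the function names are hypothetical. The chi-square variable uses $n - 1$ degrees of freedom, as in the text.

```python
import math
import random

def hat_matrix(x):
    """I - Phi = X(X'X)^{-1}X' for the design X = [1, x]."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [[1.0 / n + (x[i] - x_bar) * (x[j] - x_bar) / sxx for j in range(n)]
            for i in range(n)]

def mat_vec(a, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]

def corrected_residuals(x, y, rng, m=2):
    """Return r* = (S'/S) r + (I - Phi) v together with r, S and S'."""
    n = len(x)
    hat = hat_matrix(x)
    r = [yi - fi for yi, fi in zip(y, mat_vec(hat, y))]   # r = Phi y
    s = math.sqrt(sum(ri * ri for ri in r) / (n - m))     # S^2 = r'r / (n - m)
    chi2 = rng.gammavariate((n - 1) / 2.0, 2.0)           # chi-square, n - 1 d.f.
    s_prime = math.sqrt(chi2 / (n - 1))
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]           # E(v) = 0, D(v) = I
    proj_v = mat_vec(hat, v)                              # (I - Phi) v
    r_star = [s_prime / s * ri + pv for ri, pv in zip(r, proj_v)]
    return r_star, r, s, s_prime

rng = random.Random(2)
x = [0.5, 1.0, 1.5, 2.5, 3.0, 4.0, 4.5, 6.0, 7.0, 8.5]
# Hypothetical model y = 3 - 0.5x + e with N(0, 4) errors, for illustration only.
y = [3.0 - 0.5 * xi + rng.gauss(0.0, 2.0) for xi in x]
r_star, r, s, s_prime = corrected_residuals(x, y, rng)
```

Because $\Phi(I - \Phi) = 0$ and $\Phi r = r$, projecting `r_star` with $\Phi$ returns $(S'/S)\,r$, so the first summand lives in the residual space and the second in the column space of $X$; this is what makes the two parts of $D(r^*)$ add up to $I$.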


6. TESTING FOR NORMALITY OF RANDOM ERRORS

The result of Lemma 5.1 will be used to test normality of random errors in model (2.1). Let $\mathcal{N} = \{N_n(\mu, \sigma^2 I): \mu = X\beta,\ \sigma^2 > 0\}$ be a class of $n$-dimensional normal distributions with the given parameters. We set the null hypothesis that the distribution of the vector $e$ belongs to the class $\mathcal{N}$, which we write as $H_0: e \in \mathcal{N}$, against the alternative $H_1: e \notin \mathcal{N}$.

We verify the hypothesis $H_0$ with the vector $r^*$ of corrected residuals. It is the sum of two vectors. The first is created from a transformation of the observable random vector $y$ and a generated random variable with the chi-square distribution with $n - 1$ degrees of freedom. If the hypothesis $H_0$ is true then, according to Lemma 4.3, each of its components is normally distributed. About the second vector we can assume that, in particular, it is a random vector with the distribution $N_n(0, I)$. Thus, the sum of the two independent vectors, each normally distributed, gives a normally distributed random vector. Conversely, with the help of Cramér's lemma (see e.g. Rao 1982, p. 525), if the vector $r^*$ is normally distributed and, at the same time, is composed of the two independent random vectors mentioned earlier, then each of them is normally distributed. It implies that the vector of random errors is normally distributed.

The verification of $H_0$ is done with the help of a simple sample, which is created by the components of the vector $r^*$, and with the help of a test for normality. For $n \le 50$ the Shapiro, Wilk (1965) test is recommended, and for $50 < n \le 100$ the D'Agostino (1971) test. The tests mentioned are omnibus tests, i.e. they are sensitive to departures from both the symmetry and the kurtosis of the normal distribution. They are characterized by high power and their critical values are known. The Shapiro-Wilk and Shapiro-Francia tests have left-sided critical regions. This means that large values of the test statistics do not lead to rejection of $H_0$. The D'Agostino test, in turn, has a two-sided critical region. The hypothesis $H_0$ is not rejected with this test when the value of the test statistic lies between the lower and upper critical values.
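To make the left-sided critical region concrete, the sketch below computes a Shapiro-Francia type statistic $W'$ for a sample such as the components of $r^*$. It rests on assumptions that are not part of the paper: the expected normal order statistics are replaced by Blom's approximation $\Phi^{-1}\big((i - 3/8)/(n + 1/4)\big)$, the function names are hypothetical, and the critical value has to be supplied by the user from published tables.

```python
import math
from statistics import NormalDist

def blom_scores(n):
    """Approximate expected normal order statistics (Blom's formula, an assumption)."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

def shapiro_francia_w(sample):
    """Squared correlation between the ordered sample and the normal scores."""
    n = len(sample)
    ordered = sorted(sample)
    scores = blom_scores(n)
    mean = sum(ordered) / n
    centered = [xv - mean for xv in ordered]
    num = sum(mi * ci for mi, ci in zip(scores, centered)) ** 2
    den = sum(mi * mi for mi in scores) * sum(ci * ci for ci in centered)
    return num / den

def reject_left_sided(statistic, critical_value):
    """Left-sided region: only small values of the statistic reject H0."""
    return statistic < critical_value

# A sample that matches the normal scores exactly gives a statistic close to 1;
# a strongly skewed transformation of the same scores gives a smaller value.
scores_20 = blom_scores(20)
w_normal_like = shapiro_francia_w(scores_20)
w_skewed = shapiro_francia_w([math.exp(2.0 * mi) for mi in scores_20])
```

The statistic lies between 0 and 1, and values near 1 support $H_0$, which matches the remark that large values of these test statistics do not lead to rejection.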

REFERENCES

Cramér H. (1958): Metody matematyczne w statystyce, PWN, Warszawa.

D'Agostino R. B. (1971): An omnibus test of normality for moderate and large size samples, „Biometrika”, 58, p. 341-348.

Golub G. H., Guttman I., Dutter R. (1973): Examination of pseudo-residuals of outliers for detecting spurosity in the general univariate linear model, [in:] D. G. Kabe, R. P. Gupta (eds), Multivariate statistical inference, North-Holland Publishing Company, Amsterdam.

Fisz M. (1967): Rachunek prawdopodobieństwa i statystyka matematyczna, PWN, Warszawa.

Mardia K. V. (1980): Tests of univariate and multivariate normality, [in:] Handbook of statistics, Vol. 1, p. 279-320.

Pawłowski Z. (1976): Statystyka matematyczna, PWN, Warszawa.

Pearson E. S., Sekar C. A. (1936): The efficiency of statistical tools and a criterion for the rejection of outlying observations, „Biometrika”, 28, p. 308-320.

Rao C. R. (1982): Modele liniowe statystyki matematycznej, PWN, Warszawa.

Sally L., Sarkadi K. (1982): Beurteilung der Normalität an Hand mehrerer Stichproben kleinen Umfangs, „Qualität und Zuverlässigkeit”, 27, p. 197-199.

Sarkadi K. (1960): On testing for normality, Publ. Math. Inst. Hungar. Acad. Sci., 5, p. 269-275.

Sarkadi K. (1967): On testing for normality, Proc. 5th Berkeley Symp. Math. Statist. Prob., 1, p. 373-387.

Shapiro S. S., Francia R. S. (1972): Approximate analysis of variance test for normality, JASA, 67, p. 215-216.

Shapiro S. S., Wilk M. B. (1965): An analysis of variance test for normality (complete samples), „Biometrika”, 52, p. 591-611.

Störmer H. (1964): Ein Test zum Erkennen von Normalverteilungen, Z. f. Wahrscheinlichkeitstheorie, 2, p. 420-433.

Theil H. (1968): A simplification of the BLUS procedure for analyzing regression disturbances, JASA, 63, p. 242-251.

Wagner W. (1982): Testing for normality of errors in linear models, Studia Sci. Math. Hungarica, 17, p. 393-401.

Wagner W. (1990): Test normalności wielowymiarowej Shapiro-Wilka i jego zastosowania w doświadczalnictwie rolniczym, Roczniki AR w Poznaniu, Rozprawy Naukowe, 197, Poznań.

Wiesław Wagner

USEFULNESS OF THE DURBIN METHOD FOR INVESTIGATING NORMALITY IN A ONE-DIMENSIONAL LINEAR MODEL

The paper presents the Durbin randomization method for testing the normality of random errors.

A method of eliminating nuisance parameters by computing the residual vector and the corrected residual vector is also presented.
