
Academic year: 2021


ON TWO FAMILIES OF TESTS FOR NORMALITY WITH EMPIRICAL DESCRIPTION OF THEIR PERFORMANCES

Dominik Szynal

Department of Economics, Wydział Zamiejscowy KUL

Ofiar Katynia 6, 37-450 Stalowa Wola, Poland

e-mail: szynal@poczta.umcs.lublin.pl

and

Waldemar Wołyński

Faculty of Mathematics and Computer Science, Adam Mickiewicz University of Poznań

Umultowska 87, 61-614 Poznań, Poland

e-mail: wolynski@amu.edu.pl

Abstract

We discuss two families of tests for normality based on characterizations of continuous distributions via order statistics and record values. Simulations of their powers show that they are competitive with the tests widely recommended in the literature.

Keywords: order statistics, record values, U-statistics, normal distributions, exponential distributions, characterizations, goodness-of-fit tests, powers.

2010 Mathematics Subject Classification: Primary: 62G10; Secondary: 62G30.

1. Introduction

A large literature is devoted to goodness-of-fit testing, in particular to tests for exponentiality and normality. Moreover, there are many methods and techniques for constructing goodness-of-fit tests. We are interested here in tests for normality, which were studied, among others, in D'Agostino and Stephens [3], Rayner and Best [8], Thode [12], Kallenberg and Ledwina [4], and Cabaña and Cabaña [1].

Goodness-of-fit tests based on U-statistics and on characterizations of continuous distributions via expected values of two functions of order statistics or of record values were constructed in Morris and Szynal [5] and [6], respectively. Using this construction, the tests for exponentiality and normality were presented in detail.

An empirical description of the performance of those tests was given in Szynal and Wołyński [10, 11]. In this paper we describe how to test the sample $(X_1, \ldots, X_n)$ for normality using the test statistics constructed via characterizations by order statistics (see Morris and Szynal [5]; we call them O-tests) and by record values (see Morris and Szynal [6]; we call them R-tests). We should mention that the method of characterization of distributions was applied to construct tests for normality by Csörgő, Seshadri and Yalovsky [2].

For an empirical comparison of the performance of our tests (O-tests and R-tests), we use tests and alternatives chosen from Kallenberg and Ledwina [4] and Cabaña and Cabaña [1] (Tables 1a and 1b).

We discuss the following omnibus tests:

SW: The Shapiro-Wilk test in [1],
AD: The Anderson-Darling test,
D0.5: The BHEP test Dβ for β = 0.5,
D1: The BHEP test Dβ for β = 1,
D3: The BHEP test Dβ for β = 3,
SW: The Shapiro-Wilk test W in [4],
WS: The data driven smooth test statistic,
WS1: The data driven smooth modified test statistic,
WS2: The data driven smooth test statistic without "adjustment",
LRk: The LaRiccia test focused on kurtosis,
LRs: The LaRiccia test focused on skewness,
KC: The Cabaña and Cabaña test K based on TEEP,
SC: The Cabaña and Cabaña test S based on TEEP,
K̃C: The Cabaña and Cabaña test K̃ based on TEEP,
S̃C: The Cabaña and Cabaña test S̃ based on TEEP.

BHEP refers to Baringhaus and Henze, and Epps and Pulley (see [1]).

TEEP stands for Transformed Estimated Empirical Process (see [1]).

(3)

Symmetric alternatives:

1. SB(0, 0.5).
2. Tukey(1.5).
5. Tukey(0.7).
15. Logistic(0.1).
17. Tukey(10).
20. SC(0.05, 3).
22. SC(0.2, 5).
25. SC(0.05, 5).
27. SC(0.05, 7).
28. SU(0, 1).

Skew alternatives:

40. SB(1, 1).
41. LO(0.2, 3).
44. Weibull(2).
45. LO(0.1, 3).
46. $\chi^2_{10}$.
47. LO(0.05, 3).
48. LO(0.1, 5).
49. SU(−1, 2).
50. $\chi^2_{4}$.
52. LO(0.05, 5).
54. LO(0.05, 7).
57. SU(1, 1).
58. LN(0, 1).

The alternatives considered are:

SB(γ, δ): Johnson's SB distribution, the law of $\dfrac{\exp\left(\frac{X-\gamma}{\delta}\right)}{1+\exp\left(\frac{X-\gamma}{\delta}\right)}$, $X \sim N(0,1)$,

Tukey(λ): Tukey's distribution, the law of $U^{\lambda} - (1-U)^{\lambda}$, $U$ uniform on $[0,1]$,

Logistic(θ): logistic distribution, the law of $\frac{1}{\theta}\log\frac{1-U}{U}$, $U$ uniform on $[0,1]$,

SC(p, λ): scale contaminated distribution:

$$f(x) = (2\pi)^{-1/2}\left[(1-p)\exp\left(-\frac{x^2}{2}\right) + \frac{p}{\lambda}\exp\left(-\frac{x^2\lambda^{-2}}{2}\right)\right], \qquad -\infty < x < \infty,$$

SU(γ, δ): Johnson's SU distribution, the law of $\sinh\left(\frac{X-\gamma}{\delta}\right)$, $X \sim N(0,1)$,

LO(p, µ): location contaminated distribution:

$$f(x) = (2\pi)^{-1/2}\left[(1-p)\exp\left(-\frac{x^2}{2}\right) + p\exp\left(-\frac{(x-\mu)^2}{2}\right)\right], \qquad -\infty < x < \infty,$$

Weibull(θ): Weibull distribution with parameters (1, θ),

$\chi^2_n$: chi-squared distribution with n degrees of freedom,

LN(γ, δ): lognormal distribution, the law of $\exp\left(\frac{X-\gamma}{\delta}\right)$, $X \sim N(0,1)$.
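Power comparisons like those in the tables need samplers for these alternatives. The sketch below (Python, standard library only) draws from each family as defined above; the names `sample_alternative` and `uniform_open` are hypothetical, and reading Weibull(θ) as scale 1 and shape θ is our assumption. The densities SC(p, λ) and LO(p, µ) are two-component normal mixtures, N(0, 1) with probability 1 − p and N(0, λ²) or N(µ, 1) with probability p, and are sampled that way.

```python
import math
import random


def uniform_open(rng):
    """Uniform draw on the open interval (0, 1); rng.random() may return 0.0."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u


def sample_alternative(name, params, n, rng):
    """Draw n observations from one of the alternatives defined in the text.

    name: "SB", "Tukey", "Logistic", "SC", "SU", "LO", "Weibull", "chi2" or "LN";
    params: the parameter tuple as written in the text, e.g. (0.05, 3) for SC(0.05, 3).
    """
    out = []
    for _ in range(n):
        if name == "SB":            # exp((X-g)/d) / (1 + exp((X-g)/d)), X ~ N(0, 1)
            g, d = params
            w = math.exp((rng.gauss(0.0, 1.0) - g) / d)
            out.append(w / (1.0 + w))
        elif name == "Tukey":       # U^lam - (1 - U)^lam, U uniform on [0, 1)
            (lam,) = params
            u = rng.random()
            out.append(u ** lam - (1.0 - u) ** lam)
        elif name == "Logistic":    # (1/theta) * log((1 - U) / U), U in (0, 1)
            (theta,) = params
            u = uniform_open(rng)
            out.append(math.log((1.0 - u) / u) / theta)
        elif name == "SC":          # N(0, 1) w.p. 1 - p, N(0, lam^2) w.p. p
            p, lam = params
            sd = lam if rng.random() < p else 1.0
            out.append(rng.gauss(0.0, sd))
        elif name == "SU":          # sinh((X - g) / d)
            g, d = params
            out.append(math.sinh((rng.gauss(0.0, 1.0) - g) / d))
        elif name == "LO":          # N(0, 1) w.p. 1 - p, N(mu, 1) w.p. p
            p, mu = params
            loc = mu if rng.random() < p else 0.0
            out.append(rng.gauss(loc, 1.0))
        elif name == "Weibull":     # assumption: scale 1, shape theta
            (theta,) = params
            out.append(rng.weibullvariate(1.0, theta))
        elif name == "chi2":        # chi-squared with nu d.f. = Gamma(shape nu/2, scale 2)
            (nu,) = params
            out.append(rng.gammavariate(nu / 2.0, 2.0))
        elif name == "LN":          # exp((X - g) / d)
            g, d = params
            out.append(math.exp((rng.gauss(0.0, 1.0) - g) / d))
        else:
            raise ValueError("unknown alternative: " + name)
    return out
```

For example, `sample_alternative("SC", (0.05, 3), 20, random.Random(1))` gives one sample of size 20 from alternative 20.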

Table 1a. (Source: Kallenberg and Ledwina [4] and Cabaña and Cabaña [1].) Estimated powers (in %) of SW, AD, D0.5, D1, D3, SW, WS, WS1, WS2, LRk, LRs, KC, SC, K̃C and S̃C under symmetric alternatives (Alt.), for n = 20 and n = 50; the rows Av. give average powers.

˜ SC 12022343274144363426565422412 SB(0,0.5)5088928878999679355994927933 2209202152526201914372262262 Tukey(1.5)5058714646792347426942763771 5204221714121198122131131 Tukey(0.7)502340234376213459721451471 152013101411612101311111013131313 Logistic(0.1)50241620161113132112231227202720 1720849348719282828785852659456245 Tukey(10)50991007698100999910099971891649265 202071177519171916678778 SC(0.05,3)5012710863125382410813111311 2220232422181071657465191524222423 SC(0.2,5)50463036331895929892422149364936 252012141210736333732101013121313 SC(0.05,5)5023141914862556655191525212622 272016181614845424642141318161717 SC(0.05,7)50342129231274707770292036313631 2820474442423143364738443046434743 SU(0,1)50817572766568618161824082678265 Av.2023.729.016.722.223.939.035.238.533.729.412.026.216.326.316.7 Av.5048.846.627.645.341.369.552.969.350.356.714.153.626.154.225.5

Table 1b. (Source: Kallenberg and Ledwina [4] and Cabaña and Cabaña [1].) Estimated powers (in %) of SW, AD, D0.5, D1, D3, SW, WS, WS1, WS2, LRk, LRs, KC, SC, K̃C and S̃C under skew alternatives (Alt.), for n = 20 and n = 50; the rows Av. give average powers.

˜ SC 40202428242721312917296308231021 SB(1,1)5069686973558172577112766651357 4120232526292031271928624923921 LO(0.2,3)505565656953606852696607591052 4420131315141015151016517816915 Weibull(2)5036323936194141294174611391536 4520262428251325242126112518281628 LO(0.1,3)50575161553250585158185732613160 462041402624132523182694822432241 2 10χ50858161533057614862129040864684 472020172017918171817121718221722 LO(0.05,3)50423042331632333734233636433644 4820777578755376727273336955785078 LO(0.1,5)50999799989198979897579788998499 4920222023211222192021132118241723 SU(−1,2)50453747422437434042234335463646 5020494851483253513852105626502748 2 4χ50939093917595948693149649925591 5220563457513155485449354751574857 LO(0.05,5)50897987826085798778728088888889 5420666566665365646563536363666166 LO(0.05,7)50939193918392919290909093939393 5720737173736173736873737354736173 SU(1,1)50989898989496989798749789989297 5820919190918494918592369364906488 LN(0,1)5010010010010010010010010010010094100100100100 Av.2044.742.444.443.231.744.842.538.843.523.244.831.845.631.644.7 Av.5073.970.773.470.856.371.171.967.271.839.174.051.874.553.872.9


2. The family of O-tests

Let $(X_1, \ldots, X_n)$ be a random sample from a continuous distribution function $F$. To verify the null hypothesis $H_0: X \sim N(\mu, \sigma)$, $\mu \in \mathbb{R}$, $\sigma \in \mathbb{R}_+ = (0, \infty)$, i.e.,

$$F(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{x} e^{-(t-\mu)^2/(2\sigma^2)}\,dt,$$

Morris and Szynal [5] proposed five tests, $\hat D^{(r,m)}_n$, $\hat D^{(r,m)}_{n;c1}$, $\hat D^{(r,m)}_{n;c2}$, $\hat D^{(r,m)}_{n;c3}$ and $\hat D^{(r,m)}_{n;c4}$, constructed via characterizations of continuous distributions in terms of the expectations of two functions of order statistics, where $r > -1$ and $m \in \mathbb{N}$. Write

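$$\hat\mu = \bar X_n = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad \hat\sigma^2 = S_n^2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar X_n\right)^2,$$

$$Z \sim N(0,1), \qquad \varphi(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}, \qquad \Phi(x) = \int_{-\infty}^{x}\varphi(t)\,dt.$$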
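All statistics below use the values $\Phi((X_i-\bar X_n)/S_n)$, with the divisor n (not n − 1) in $S_n^2$. A minimal Python sketch of these quantities; the function names are hypothetical, and Φ is evaluated through `math.erfc`, which stays accurate in the tails.

```python
import math


def std_normal_pdf(x):
    """phi(x) = exp(-x^2 / 2) / sqrt(2 pi)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(x):
    """Phi(x) = P(Z <= x), Z ~ N(0, 1), via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def standardized_cdf_values(sample):
    """Return Phi((X_i - Xbar_n) / S_n) with S_n^2 = (1/n) * sum (X_i - Xbar_n)^2."""
    n = len(sample)
    mean = sum(sample) / n
    s_n = math.sqrt(sum((x - mean) ** 2 for x in sample) / n)   # divisor n, not n - 1
    return [std_normal_cdf((x - mean) / s_n) for x in sample]
```

Because only standardized values enter, the output is unchanged under location shifts and positive rescaling of the sample.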

The construction of the above test statistics uses the following quantities (cf. Morris and Szynal [5]):

$$1 + R^{(r,m)}_n = \prod_{i=1}^{m}\left(1 - \frac{m}{n-i}\right) + \frac{m!\,(2m+2r+1)}{(m+1)\prod_{i=1}^{m}(n-i)}\sum_{j=2}^{m+1}\binom{m+1}{j}\binom{n-m-1}{m+1-j}\frac{j}{2m+2r+2-j}$$

(cf. Morris and Szynal [5], p. 86),

$$K^{(r,m)} = E^2\left[\varphi(Z)\,\Phi^{m+r-1}(Z)\right] + \frac{1}{2}\,E^2\left[Z\varphi(Z)\,\Phi^{m+r-1}(Z)\right]$$

(cf. Morris and Szynal [5], pp. 90, 98).
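$R^{(r,m)}_n$ is an explicit finite sum, while $K^{(r,m)}$ needs the two one-dimensional integrals $\int\varphi^2(z)\Phi^{m+r-1}(z)\,dz$ and $\int z\varphi^2(z)\Phi^{m+r-1}(z)\,dz$. The Python sketch below evaluates both, assuming that Simpson's rule on the truncated range [−12, 12] is accurate enough; this numerical choice is ours, not taken from [5], and the names `K_value` and `R_value` are hypothetical.

```python
import math


def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def simpson(f, lo, hi, steps=4000):
    """Composite Simpson rule on [lo, hi] (steps must be even)."""
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * f(lo + i * h)
    return total * h / 3.0


def K_value(r, m, lo=-12.0, hi=12.0):
    """K^{(r,m)} = E^2[phi(Z) Phi^{m+r-1}(Z)] + (1/2) E^2[Z phi(Z) Phi^{m+r-1}(Z)]."""
    p = m + r - 1

    def g(z):  # E[phi(Z) Phi^p(Z)] = int phi(z) Phi^p(z) * phi(z) dz
        return phi(z) * Phi(z) ** p * phi(z)

    e1 = simpson(g, lo, hi)
    e2 = simpson(lambda z: z * g(z), lo, hi)
    return e1 ** 2 + 0.5 * e2 ** 2


def R_value(r, m, n):
    """R^{(r,m)}_n from the displayed expression for 1 + R^{(r,m)}_n."""
    first = 1.0
    for i in range(1, m + 1):
        first *= 1.0 - m / (n - i)
    denom = (m + 1) * math.prod(n - i for i in range(1, m + 1))
    coef = math.factorial(m) * (2 * m + 2 * r + 1) / denom
    tail = sum(math.comb(m + 1, j) * math.comb(n - m - 1, m + 1 - j) * j
               / (2 * m + 2 * r + 2 - j) for j in range(2, m + 2))
    return first + coef * tail - 1.0
```

Two hand checks: for m = 1, r = 0 one has $K^{(0,1)} = 1/(4\pi)$, since $E[\varphi(Z)] = 1/(2\sqrt{\pi})$ and $E[Z\varphi(Z)] = 0$; for m = 1, r = 1, n = 10 the formula gives $1 + R^{(1,1)}_{10} = 8/9 + 5/36 = 37/36$.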

The quantities $R^{(r,m)}_n$ and $K^{(r,m)}$ appear in the following test statistics, which have a simple form even though their construction is not easy.

The test statistics contain the quantities

$$a^{(r,m)}_{n1} = \frac{(m+r)^2\left(1-(m+r+1)^2(2m+2r+1)K^{(r,m)}\right)}{n(m+r+1)^2(2m+2r+1)},$$

$$b^{(r,m)}_{n1} = \frac{r(m+1)(m+r)\left(1-(m+r+1)^2(2m+2r+1)K^{(r,m)}\right)}{n(m+r+1)^2(2m+2r+1)},$$

$$c^{(r,m)}_{n1} = \frac{r^2(m+1)^2\left(1-(m+r+1)^2(2m+2r+1)K^{(r,m)}+R^{(r,m)}_n\right)}{n(m+r+1)^2(2m+2r+1)},$$

$$\Delta^{(r,m)}_{n1} = \det\begin{bmatrix} a^{(r,m)}_{n1} & b^{(r,m)}_{n1}\\ b^{(r,m)}_{n1} & c^{(r,m)}_{n1}\end{bmatrix}$$

(cf. Morris and Szynal [5], p. 90).
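Given $K^{(r,m)}$ and $R^{(r,m)}_n$ as numbers (for instance computed numerically), the entries of the matrix follow directly. A minimal Python sketch; the name `o_test_matrix` is hypothetical.

```python
def o_test_matrix(r, m, n, K, R):
    """Return (a, b, c, Delta) = (a^{(r,m)}_{n1}, b^{(r,m)}_{n1}, c^{(r,m)}_{n1}, det)."""
    q = 1.0 - (m + r + 1) ** 2 * (2 * m + 2 * r + 1) * K   # common factor of a, b, c
    d = n * (m + r + 1) ** 2 * (2 * m + 2 * r + 1)          # common denominator
    a = (m + r) ** 2 * q / d
    b = r * (m + 1) * (m + r) * q / d
    c = r ** 2 * (m + 1) ** 2 * (q + R) / d
    return a, b, c, a * c - b * b
```

Writing $q = 1-(m+r+1)^2(2m+2r+1)K^{(r,m)}$ and $d = n(m+r+1)^2(2m+2r+1)$, the displayed formulas give $\Delta^{(r,m)}_{n1} = (m+r)^2 r^2 (m+1)^2\, q\, R^{(r,m)}_n / d^2$. In particular the matrix is singular for r = 0, which is consistent with r = 0 not being among the values of r used in the simulations.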

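The tests for normality are as follows:

$$
\begin{aligned}
\hat D^{(r,m)}_n = \frac{1}{\Delta^{(r,m)}_{n1}}\Bigg[\,& c^{(r,m)}_{n1}\left(\frac{1}{n}\sum_{i=1}^{n}\Phi^{m+r}\!\left(\frac{X_i-\bar X_n}{S_n}\right)-\frac{1}{m+r+1}\right)^{2}\\
&-2b^{(r,m)}_{n1}\left(\frac{1}{n}\sum_{i=1}^{n}\Phi^{m+r}\!\left(\frac{X_i-\bar X_n}{S_n}\right)-\frac{1}{m+r+1}\right)\\
&\qquad\times\left(\binom{n}{m+1}^{-1}\sum_{i=m+1}^{n}\binom{i-1}{m}\Phi^{r}\!\left(\frac{X_{i:n}-\bar X_n}{S_n}\right)-\frac{m+1}{m+r+1}\right)\\
&+a^{(r,m)}_{n1}\left(\binom{n}{m+1}^{-1}\sum_{i=m+1}^{n}\binom{i-1}{m}\Phi^{r}\!\left(\frac{X_{i:n}-\bar X_n}{S_n}\right)-\frac{m+1}{m+r+1}\right)^{2}\Bigg]
\end{aligned}
$$

$$\hat D^{(r,m)}_n = \hat D^{(r,m)}_{n;c1}+\hat D^{(r,m)}_{n;c2} = \hat D^{(r,m)}_{n;c3}+\hat D^{(r,m)}_{n;c4},$$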

where

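$$\hat D^{(r,m)}_{n;c1} = \frac{1}{a^{(r,m)}_{n1}}\left[\frac{1}{n}\sum_{i=1}^{n}\Phi^{m+r}\!\left(\frac{X_i-\bar X_n}{S_n}\right)-\frac{1}{m+r+1}\right]^2,$$

$$
\begin{aligned}
\hat D^{(r,m)}_{n;c2} = \frac{1}{\Delta^{(r,m)}_{n1}a^{(r,m)}_{n1}}\Bigg[\,& a^{(r,m)}_{n1}\binom{n}{m+1}^{-1}\sum_{i=m+1}^{n}\binom{i-1}{m}\Phi^{r}\!\left(\frac{X_{i:n}-\bar X_n}{S_n}\right)-b^{(r,m)}_{n1}\,\frac{1}{n}\sum_{i=1}^{n}\Phi^{m+r}\!\left(\frac{X_i-\bar X_n}{S_n}\right)\\
&-\left(a^{(r,m)}_{n1}\frac{m+1}{m+r+1}-b^{(r,m)}_{n1}\frac{1}{m+r+1}\right)\Bigg]^2,
\end{aligned}
$$

$$\hat D^{(r,m)}_{n;c3} = \frac{1}{c^{(r,m)}_{n1}}\left[\binom{n}{m+1}^{-1}\sum_{i=m+1}^{n}\binom{i-1}{m}\Phi^{r}\!\left(\frac{X_{i:n}-\bar X_n}{S_n}\right)-\frac{m+1}{m+r+1}\right]^2,$$

$$
\begin{aligned}
\hat D^{(r,m)}_{n;c4} = \frac{1}{\Delta^{(r,m)}_{n1}c^{(r,m)}_{n1}}\Bigg[\,& c^{(r,m)}_{n1}\,\frac{1}{n}\sum_{i=1}^{n}\Phi^{m+r}\!\left(\frac{X_i-\bar X_n}{S_n}\right)-b^{(r,m)}_{n1}\binom{n}{m+1}^{-1}\sum_{i=m+1}^{n}\binom{i-1}{m}\Phi^{r}\!\left(\frac{X_{i:n}-\bar X_n}{S_n}\right)\\
&-\left(c^{(r,m)}_{n1}\frac{1}{m+r+1}-b^{(r,m)}_{n1}\frac{m+1}{m+r+1}\right)\Bigg]^2.
\end{aligned}
$$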
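A hedged Python sketch of $\hat D^{(r,m)}_n$ and its four components for given $a^{(r,m)}_{n1}$, $b^{(r,m)}_{n1}$, $c^{(r,m)}_{n1}$; the function names are hypothetical. The weight $\binom{i-1}{m}$ counts the (m+1)-element subsets of the sample whose largest element is $X_{i:n}$, so the weights sum to $\binom{n}{m+1}$. Both decompositions come from completing the square: with $v_1$, $v_2$ the two centred empirical means and $\Delta = ac - b^2$, one has $(a v_2 - b v_1)^2 + \Delta v_1^2 = a\,(c v_1^2 - 2b v_1 v_2 + a v_2^2)$, and likewise with $(a, v_1)$ and $(c, v_2)$ exchanged.

```python
import math


def std_normal_cdf(z):
    """Phi(z) for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def o_statistics(sample, r, m, a, b, c):
    """Return (D, Dc1, Dc2, Dc3, Dc4) for a = a^{(r,m)}_{n1}, b = b^{(r,m)}_{n1}, c = c^{(r,m)}_{n1}."""
    n = len(sample)
    mean = sum(sample) / n
    s_n = math.sqrt(sum((x - mean) ** 2 for x in sample) / n)   # divisor n
    # Phi is increasing, so sorting yields Phi at the standardized order statistics.
    u = sorted(std_normal_cdf((x - mean) / s_n) for x in sample)
    # v1: (1/n) sum Phi^{m+r}, centred at its null value 1/(m+r+1).
    v1 = sum(ui ** (m + r) for ui in u) / n - 1.0 / (m + r + 1)
    # v2: weights binom(i-1, m), normalised by binom(n, m+1), centred at (m+1)/(m+r+1).
    weighted = sum(math.comb(i - 1, m) * u[i - 1] ** r for i in range(m + 1, n + 1))
    v2 = weighted / math.comb(n, m + 1) - (m + 1) / (m + r + 1)
    delta = a * c - b * b
    d = (c * v1 ** 2 - 2.0 * b * v1 * v2 + a * v2 ** 2) / delta
    dc1 = v1 ** 2 / a
    dc2 = (a * v2 - b * v1) ** 2 / (delta * a)
    dc3 = v2 ** 2 / c
    dc4 = (c * v1 - b * v2) ** 2 / (delta * c)
    return d, dc1, dc2, dc3, dc4
```

The values of a, b, c used in a quick check need not come from the formulas; any positive definite matrix exhibits both identities $\hat D_n = \hat D_{n;c1} + \hat D_{n;c2} = \hat D_{n;c3} + \hat D_{n;c4}$.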

3. Simulation results for powers of the O-tests

For an empirical comparison of the performance of the O-tests with widely recommended tests, we have chosen the alternatives and tests studied in Cabaña and Cabaña [1]. Samples of size 20 and 50 are taken with m = 1, 2, 3, 4, 5 and r = −0.99, −0.95, −0.9, −0.7, −0.5, −0.3, −0.1, 0.1, 0.5, 0.75, 1.0, 1.5, 2.0, 2.5. For symmetric alternatives we additionally took r = 3.5, 4.0, 4.5, 5.0, and for skew alternatives r = 0.3, 0.6. Critical values were simulated using 100 000 samples, and the associated powers were obtained using 100 000 samples; only selected results are presented here (all simulation results are available from W. Wołyński).

For samples of size n = 20 we report the simulations for those omnibus O-tests that performed favorably: under symmetric alternatives, those with average power (Av.) ≥ 36.5, and under skew alternatives, those with Av. ≥ 45.5 (Tables 2a and 2b).

For samples of size n = 50 we report, in the same way, the O-tests with Av. ≥ 68.5 under symmetric alternatives and with Av. ≥ 72.0 under skew alternatives (Tables 2a and 2b).
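The simulation scheme used here (a critical value from samples generated under $H_0$, then rejection rates under each alternative) can be sketched as follows. The O-tests depend on the data only through $(X_i-\bar X_n)/S_n$, so they are invariant under location shifts and positive rescaling, and N(0, 1) samples suffice under $H_0$. To stay short and self-contained, the sketch uses the absolute sample skewness as a stand-in statistic; it is not one of the O-tests, and the function names are hypothetical. The paper uses 100 000 samples in each step; the check below uses far fewer for speed.

```python
import math
import random


def abs_skewness(xs):
    """Stand-in statistic (not an O-test): absolute sample skewness |m3| / m2^(3/2)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return abs(m3) / m2 ** 1.5


def mc_critical_value(stat, n, reps, alpha, rng):
    """Empirical (1 - alpha)-quantile of stat over reps N(0, 1) samples of size n."""
    values = sorted(stat([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(reps))
    return values[math.ceil((1.0 - alpha) * reps) - 1]


def mc_power(stat, draw, n, reps, crit, rng):
    """Share of reps samples, each drawn by draw(n, rng), for which stat exceeds crit."""
    rejections = sum(1 for _ in range(reps) if stat(draw(n, rng)) > crit)
    return rejections / reps
```

In practice `abs_skewness` would be replaced by an O-test statistic, and `draw` by a sampler for one of the alternatives listed in the Introduction.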

4. The family of R-tests

Goodness-of-fit tests derived from characterizations of continuous distributions via record values were given, among others, in Morris and Szynal [6].

The test statistics for exponentiality and normality were discussed in Morris and Szynal [6], Szynal [9], and Szynal and Wołyński [10, 11].

The aim of Section 4 is to give an empirical description of the performance of the tests for normality presented in Morris and Szynal [6], which we call R-tests. To compare the R-tests with widely recommended tests, we have chosen the tests and alternatives studied in Cabaña and Cabaña [1] (as was done in Sections 2 and 3).

The construction of the R-tests presented in Morris and Szynal [6] is not easy, but the test statistics have simple forms. We use here the quantities and test statistics introduced in Morris and Szynal [6].

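Table 2a. Powers (in %) of 5% O-tests under symmetric alternatives, based on 100 000 samples, for tests with Av. ≥ 36.5 for n = 20 or Av. ≥ 68.5 for n = 50. The column heads give m, the test (superscript (r, m) omitted) and r.

| m | | 1 | 1 | 2 | 2 | 3 | 4 | 4 | 4 | 4 | 5 | 5 | 5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Test | | $\hat D_{n;c1}$ | $\hat D_n$ | $\hat D_{n;c1}$ | $\hat D_n$ | $\hat D_n$ | $\hat D_{n;c1}$ | $\hat D_n$ | $\hat D_{n;c1}$ | $\hat D_n$ | $\hat D_{n;c1}$ | $\hat D_n$ | $\hat D_{n;c1}$ |
| Alt. (symmetric) | n/r | 3.5 | 3.5 | 2.5 | 2.5 | 1.5 | 0.5 | 0.5 | 0.75 | 0.75 | −0.7 | −0.7 | −0.5 |
| 1. SB(0, 0.5) | 20 | 44 | 44 | 44 | 44 | 43 | 44 | 43 | 44 | 43 | 45 | 44 | 44 |
| | 50 | 93 | 93 | 93 | 94 | 94 | 93 | 94 | 91 | 92 | 94 | 94 | 93 |
| 2. Tukey(1.5) | 20 | 31 | 31 | 31 | 31 | 30 | 31 | 30 | 30 | 30 | 31 | 30 | 31 |
| | 50 | 81 | 81 | 81 | 82 | 82 | 81 | 81 | 80 | 79 | 82 | 82 | 81 |
| 5. Tukey(0.7) | 20 | 18 | 19 | 19 | 19 | 18 | 19 | 18 | 19 | 18 | 19 | 18 | 18 |
| | 50 | 57 | 58 | 57 | 58 | 58 | 57 | 57 | 56 | 56 | 57 | 57 | 57 |
| 15. Logistic(0.1) | 20 | 11 | 10 | 11 | 10 | 10 | 11 | 11 | 11 | 11 | 11 | 11 | 11 |
| | 50 | 20 | 19 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 |
| 17. Tukey(10) | 20 | 61 | 62 | 61 | 62 | 62 | 61 | 63 | 60 | 61 | 62 | 63 | 61 |
| | 50 | 95 | 97 | 95 | 97 | 97 | 95 | 97 | 94 | 96 | 96 | 98 | 95 |
| 20. SC(0.05, 3) | 20 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 18 | 18 | 17 | 17 | 17 |
| | 50 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 34 | 34 | 33 | 33 | 33 |
| 22. SC(0.2, 5) | 20 | 66 | 65 | 65 | 65 | 65 | 66 | 66 | 66 | 66 | 65 | 65 | 66 |
| | 50 | 97 | 97 | 97 | 97 | 97 | 97 | 97 | 97 | 97 | 97 | 97 | 97 |
| 25. SC(0.05, 5) | 20 | 34 | 33 | 33 | 33 | 33 | 33 | 33 | 34 | 34 | 33 | 33 | 33 |
| | 50 | 62 | 62 | 62 | 62 | 62 | 62 | 62 | 62 | 62 | 61 | 62 | 62 |
| 27. SC(0.05, 7) | 20 | 43 | 43 | 43 | 43 | 43 | 43 | 43 | 44 | 44 | 43 | 43 | 43 |
| | 50 | 74 | 74 | 74 | 74 | 74 | 74 | 74 | 74 | 74 | 73 | 74 | 74 |
| 28. SU(0, 1) | 20 | 40 | 39 | 40 | 39 | 39 | 40 | 39 | 40 | 39 | 39 | 40 | 40 |
| | 50 | 78 | 77 | 78 | 77 | 78 | 78 | 77 | 78 | 78 | 77 | 77 | 77 |
| Av. | 20 | 36.4 | 36.4 | 36.4 | 36.4 | 36.2 | 36.4 | 36.3 | 36.6 | 36.4 | 36.5 | 36.5 | 36.4 |
| | 50 | 68.9 | 69.2 | 69.0 | 69.4 | 69.4 | 68.9 | 69.2 | 68.4 | 68.8 | 68.9 | 69.2 | 68.9 |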

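Table 2b. Powers (in %) of 5% O-tests under skew alternatives, based on 100 000 samples, for tests with Av. ≥ 45.5 for n = 20 or Av. ≥ 72.0 for n = 50. The column heads give m, the test (superscript (r, m) omitted) and r.

| m | | 1 | 1 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Test | | $\hat D_{n;c1}$ | $\hat D_{n;c1}$ | $\hat D_{n;c4}$ | $\hat D_{n;c4}$ | $\hat D_{n;c1}$ | $\hat D_{n;c4}$ | $\hat D_{n;c1}$ | $\hat D_{n;c4}$ | $\hat D_n$ | $\hat D_{n;c4}$ |
| Alt. (skew) | n/r | 0.1 | 0.3 | −0.99 | −0.95 | −0.9 | −0.9 | −0.7 | −0.7 | −0.7 | −0.5 |
| 40. SB(1, 1) | 20 | 28 | 27 | 34 | 34 | 29 | 33 | 27 | 31 | 27 | 29 |
| | 50 | 68 | 63 | 55 | 52 | 69 | 50 | 64 | 42 | 63 | 35 |
| 41. LO(0.2, 3) | 20 | 29 | 29 | 34 | 34 | 29 | 33 | 29 | 33 | 29 | 31 |
| | 50 | 69 | 67 | 63 | 60 | 69 | 58 | 67 | 50 | 66 | 43 |
| 44. Weibull(2) | 20 | 17 | 17 | 22 | 22 | 17 | 22 | 17 | 22 | 17 | 22 |
| | 50 | 41 | 39 | 48 | 47 | 41 | 46 | 39 | 42 | 39 | 39 |
| 45. LO(0.1, 3) | 20 | 28 | 30 | 33 | 34 | 29 | 34 | 30 | 36 | 31 | 37 |
| | 50 | 60 | 63 | 64 | 64 | 60 | 64 | 63 | 65 | 63 | 64 |
| 46. $\chi^2_{10}$ | 20 | 27 | 28 | 33 | 34 | 27 | 33 | 28 | 33 | 28 | 33 |
| | 50 | 62 | 61 | 66 | 65 | 62 | 64 | 60 | 60 | 60 | 57 |
| 47. LO(0.05, 3) | 20 | 19 | 21 | 22 | 23 | 20 | 24 | 21 | 25 | 22 | 27 |
| | 50 | 36 | 39 | 41 | 41 | 36 | 42 | 40 | 45 | 40 | 47 |
| 48. LO(0.1, 5) | 20 | 76 | 77 | 75 | 76 | 76 | 76 | 78 | 77 | 78 | 77 |
| | 50 | 97 | 98 | 77 | 76 | 97 | 74 | 98 | 67 | 98 | 61 |
| 49. SU(−1, 2) | 20 | 23 | 24 | 26 | 27 | 23 | 27 | 24 | 28 | 24 | 29 |
| | 50 | 44 | 47 | 45 | 46 | 44 | 46 | 47 | 48 | 47 | 49 |
| 50. $\chi^2_{4}$ | 20 | 54 | 53 | 59 | 59 | 54 | 59 | 53 | 56 | 53 | 55 |
| | 50 | 93 | 91 | 78 | 76 | 93 | 74 | 91 | 69 | 91 | 64 |
| 52. LO(0.05, 5) | 20 | 53 | 55 | 53 | 54 | 53 | 55 | 55 | 57 | 55 | 58 |
| | 50 | 79 | 83 | 69 | 70 | 80 | 72 | 83 | 74 | 83 | 76 |
| 54. LO(0.05, 7) | 20 | 65 | 65 | 63 | 64 | 65 | 64 | 65 | 64 | 65 | 64 |
| | 50 | 90 | 92 | 41 | 42 | 90 | 44 | 92 | 48 | 92 | 50 |
| 57. SU(1, 1) | 20 | 73 | 69 | 66 | 65 | 73 | 64 | 69 | 59 | 69 | 54 |
| | 50 | 96 | 95 | 95 | 94 | 96 | 94 | 95 | 93 | 96 | 91 |
| 58. LN(0, 1) | 20 | 91 | 91 | 85 | 85 | 91 | 84 | 91 | 82 | 91 | 80 |
| | 50 | 100 | 100 | 21 | 21 | 100 | 21 | 100 | 22 | 100 | 23 |
| Av. | 20 | 44.8 | 45.2 | 46.6 | 46.9 | 44.9 | 46.9 | 45.2 | 46.3 | 45.2 | 45.8 |
| | 50 | 72.0 | 72.0 | 58.6 | 57.9 | 72.0 | 57.6 | 72.2 | 55.7 | 72.0 | 54.0 |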

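The quantities and test statistics are as follows:

$$
\begin{aligned}
a^{(r,k)}_n ={}& \binom{n}{k}^{-1}\sum_{j=1}^{k-1}\binom{k}{j}\binom{n-k}{k-j}\left[\frac{2\,\Gamma(2r+2)}{k^{r}(k-j)^{r}}\,B_{\frac{k-j}{2k-j}}(r+1,r+1)+\frac{j\,\Gamma(2r+1)}{(2k-j)^{2r+1}}-\frac{\Gamma^2(r+1)}{k^{2r}}\right]\\
&+\binom{n}{k}^{-1}\,\frac{\Gamma(2r+1)-\Gamma^2(r+1)}{k^{2r}},
\end{aligned}
$$

$$
\begin{aligned}
b^{(r,k)}_n ={}& \binom{n}{k}^{-1}\sum_{j=1}^{k-1}\binom{k}{j}\binom{n-k}{k-j}\left[\frac{2(2k-j)\,\Gamma(2r+2)+\Gamma(2r+3)}{k^{r+1}(k-j)^{r+1}}\,B_{\frac{k-j}{2k-j}}(r+2,r+2)+\frac{j\,\Gamma(2r+2)}{(2k-j)^{2r+2}}-\frac{\Gamma(r+1)\Gamma(r+2)}{k^{2r+1}}\right]\\
&+\binom{n}{k}^{-1}\,\frac{\Gamma(2r+2)-\Gamma(r+1)\Gamma(r+2)}{k^{2r+1}},
\end{aligned}
$$

$$
\begin{aligned}
c^{(r,k)}_n ={}& \binom{n}{k}^{-1}\sum_{j=1}^{k-1}\binom{k}{j}\binom{n-k}{k-j}\left[\frac{2\,\Gamma(2r+4)}{k^{r+1}(k-j)^{r+1}}\,B_{\frac{k-j}{2k-j}}(r+2,r+2)+\frac{j\,\Gamma(2r+3)}{(2k-j)^{2r+3}}-\frac{\Gamma^2(r+2)}{k^{2r+2}}\right]\\
&+\binom{n}{k}^{-1}\,\frac{\Gamma(2r+3)-\Gamma^2(r+2)}{k^{2r+2}},
\end{aligned}
$$

where

$$B_x(\alpha,\beta) = \int_0^x t^{\alpha-1}(1-t)^{\beta-1}\,dt, \qquad 0 < x < 1;\ \alpha, \beta > 0,$$

denotes the incomplete beta function.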
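The incomplete beta function can be evaluated by quadrature after the substitution $t = x u^{1/\alpha}$, which gives $B_x(\alpha,\beta) = \frac{x^\alpha}{\alpha}\int_0^1 (1-xu^{1/\alpha})^{\beta-1}\,du$ and removes the singularity at $t = 0$ when $\alpha < 1$. The Python sketch below uses it to evaluate $a^{(r,k)}_n$; by the displayed formulas, $c^{(r,k)}_n$ coincides with $a^{(r+1,k)}_n$. The function names are hypothetical and the quadrature (Simpson's rule) is our choice.

```python
import math


def inc_beta(x, alpha, beta, steps=20000):
    """B_x(alpha, beta) = int_0^x t^(alpha-1) (1-t)^(beta-1) dt for 0 < x < 1.

    Uses t = x * u**(1/alpha): B_x = x**alpha / alpha * int_0^1 (1 - x u**(1/alpha))**(beta-1) du,
    integrated by the composite Simpson rule (steps must be even).
    """
    def g(u):
        return (1.0 - x * u ** (1.0 / alpha)) ** (beta - 1.0)

    h = 1.0 / steps
    total = g(0.0) + g(1.0)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * g(i * h)
    return x ** alpha / alpha * total * h / 3.0


def a_n(r, k, n):
    """a^{(r,k)}_n as displayed above; c^{(r,k)}_n equals a_n(r + 1, k, n)."""
    total = 0.0
    for j in range(1, k):
        weight = math.comb(k, j) * math.comb(n - k, k - j)
        bracket = (2.0 * math.gamma(2 * r + 2) / (k ** r * (k - j) ** r)
                   * inc_beta((k - j) / (2 * k - j), r + 1, r + 1)
                   + j * math.gamma(2 * r + 1) / (2 * k - j) ** (2 * r + 1)
                   - math.gamma(r + 1) ** 2 / k ** (2 * r))
        total += weight * bracket
    # last term of the displayed formula
    total += (math.gamma(2 * r + 1) - math.gamma(r + 1) ** 2) / k ** (2 * r)
    return total / math.comb(n, k)
```

A hand check for k = 2, r = 1: the bracket for j = 1 equals $6\,B_{1/3}(2,2) + 2/27 - 1/4 = 7/27 + 2/27 - 1/4 = 1/12$, so for n = 10 one gets $a^{(1,2)}_{10} = (16/12 + 1/4)/45 = 19/540$.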

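$$E^{(r,k)}_1 = E\left[\varphi(Z)\,(1-\Phi(Z))^{k-2}\log^{r-1}\frac{1}{1-\Phi(Z)}\right], \qquad Z \sim N(0,1),$$

$$E^{(r,k)}_2 = E\left[Z\varphi(Z)\,(1-\Phi(Z))^{k-2}\log^{r-1}\frac{1}{1-\Phi(Z)}\right],$$

$$s^{(r,k)}_n = \frac{k^2r^2}{n}\left[\left(E^{(r,k)}_1\right)^2+\frac{1}{2}\left(E^{(r,k)}_2\right)^2\right],$$

$$t^{(r,k)}_n = \frac{k^2r(r+1)}{n}\left[E^{(r,k)}_1E^{(r+1,k)}_1+\frac{1}{2}E^{(r,k)}_2E^{(r+1,k)}_2\right],$$

$$u^{(r,k)}_n = \frac{k^2(r+1)^2}{n}\left[\left(E^{(r+1,k)}_1\right)^2+\frac{1}{2}\left(E^{(r+1,k)}_2\right)^2\right].$$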
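$E^{(r,k)}_1$ and $E^{(r,k)}_2$ are one-dimensional integrals against $\varphi$, so $s$, $t$ and $u$ can be computed by quadrature. In the Python sketch below, $\log(1/(1-\Phi(z)))$ is evaluated with `math.erfc` and `math.log1p` to avoid cancellation in both tails. The truncation to [−12, 12] is our assumption and suits moderate r (say r ≥ 0); for r close to −1 the left tail of the integrand decays slowly and would need a wider range and more careful evaluation. The function names are hypothetical.

```python
import math


def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def survival(z):
    """1 - Phi(z), accurate in the upper tail."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def log_inv_survival(z):
    """log(1 / (1 - Phi(z))), accurate in both tails."""
    if z < 0.0:
        return -math.log1p(-0.5 * math.erfc(-z / math.sqrt(2.0)))
    return -math.log(survival(z))


def simpson(f, lo, hi, steps=4000):
    """Composite Simpson rule on [lo, hi] (steps must be even)."""
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * f(lo + i * h)
    return total * h / 3.0


def e_pair(r, k, lo=-12.0, hi=12.0):
    """(E_1^{(r,k)}, E_2^{(r,k)}); the last phi(z) factor is the N(0, 1) density."""
    def g(z):
        return phi(z) * survival(z) ** (k - 2) * log_inv_survival(z) ** (r - 1) * phi(z)
    return simpson(g, lo, hi), simpson(lambda z: z * g(z), lo, hi)


def s_t_u(r, k, n):
    """(s^{(r,k)}_n, t^{(r,k)}_n, u^{(r,k)}_n) as displayed above."""
    e1, e2 = e_pair(r, k)
    f1, f2 = e_pair(r + 1, k)
    s = k ** 2 * r ** 2 / n * (e1 ** 2 + 0.5 * e2 ** 2)
    t = k ** 2 * r * (r + 1) / n * (e1 * f1 + 0.5 * e2 * f2)
    u = k ** 2 * (r + 1) ** 2 / n * (f1 ** 2 + 0.5 * f2 ** 2)
    return s, t, u
```

For k = 2, r = 1 the integrand reduces to $\varphi^2$, so $E^{(1,2)}_1 = 1/(2\sqrt{\pi})$, $E^{(1,2)}_2 = 0$ and $s^{(1,2)}_n = 1/(n\pi)$.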

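$$a^{(r,k)}_{n1} = a^{(r,k)}_n - s^{(r,k)}_n, \qquad b^{(r,k)}_{n1} = b^{(r,k)}_n - t^{(r,k)}_n, \qquad c^{(r,k)}_{n1} = c^{(r,k)}_n - u^{(r,k)}_n,$$

$$\Delta^{(r,k)}_{n1} = \det\begin{bmatrix} a^{(r,k)}_{n1} & b^{(r,k)}_{n1}\\ b^{(r,k)}_{n1} & c^{(r,k)}_{n1}\end{bmatrix}.$$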

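R-tests are as follows:

$$
\begin{aligned}
\hat T^{(r,k)}_n = \frac{1}{\Delta^{(r,k)}_{n1}}\Bigg[\,& c^{(r,k)}_{n1}\left(\binom{n}{k}^{-1}\sum_{i=1}^{n-k+1}\binom{n-i}{k-1}\log^{r}\frac{1}{1-\Phi\left(\frac{X_{i:n}-\bar X_n}{S_n}\right)}-\frac{\Gamma(r+1)}{k^{r}}\right)^{2}\\
&-2b^{(r,k)}_{n1}\left(\binom{n}{k}^{-1}\sum_{i=1}^{n-k+1}\binom{n-i}{k-1}\log^{r}\frac{1}{1-\Phi\left(\frac{X_{i:n}-\bar X_n}{S_n}\right)}-\frac{\Gamma(r+1)}{k^{r}}\right)\\
&\qquad\times\left(\binom{n}{k}^{-1}\sum_{i=1}^{n-k+1}\binom{n-i}{k-1}\log^{r+1}\frac{1}{1-\Phi\left(\frac{X_{i:n}-\bar X_n}{S_n}\right)}-\frac{\Gamma(r+2)}{k^{r+1}}\right)\\
&+a^{(r,k)}_{n1}\left(\binom{n}{k}^{-1}\sum_{i=1}^{n-k+1}\binom{n-i}{k-1}\log^{r+1}\frac{1}{1-\Phi\left(\frac{X_{i:n}-\bar X_n}{S_n}\right)}-\frac{\Gamma(r+2)}{k^{r+1}}\right)^{2}\Bigg]
\end{aligned}
$$

$$\hat T^{(r,k)}_n = \hat T^{(r,k)}_{n;c1}+\hat T^{(r,k)}_{n;c2} = \hat T^{(r,k)}_{n;c3}+\hat T^{(r,k)}_{n;c4},$$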

where

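$$\hat T^{(r,k)}_{n;c1} = \frac{1}{a^{(r,k)}_{n1}}\left[\binom{n}{k}^{-1}\sum_{i=1}^{n-k+1}\binom{n-i}{k-1}\log^{r}\frac{1}{1-\Phi\left(\frac{X_{i:n}-\bar X_n}{S_n}\right)}-\frac{\Gamma(r+1)}{k^{r}}\right]^2,$$

$$
\begin{aligned}
\hat T^{(r,k)}_{n;c2} = \frac{1}{\Delta^{(r,k)}_{n1}a^{(r,k)}_{n1}}\Bigg[\,& a^{(r,k)}_{n1}\binom{n}{k}^{-1}\sum_{i=1}^{n-k+1}\binom{n-i}{k-1}\log^{r+1}\frac{1}{1-\Phi\left(\frac{X_{i:n}-\bar X_n}{S_n}\right)}\\
&-b^{(r,k)}_{n1}\binom{n}{k}^{-1}\sum_{i=1}^{n-k+1}\binom{n-i}{k-1}\log^{r}\frac{1}{1-\Phi\left(\frac{X_{i:n}-\bar X_n}{S_n}\right)}\\
&-\left(a^{(r,k)}_{n1}\frac{\Gamma(r+2)}{k^{r+1}}-b^{(r,k)}_{n1}\frac{\Gamma(r+1)}{k^{r}}\right)\Bigg]^2
\end{aligned}
$$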
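The weight $\binom{n-i}{k-1}$ counts the k-element subsets of the sample whose smallest element is $X_{i:n}$, so the weights sum to $\binom{n}{k}$. A hedged Python sketch of $\hat T^{(r,k)}_n$, $\hat T^{(r,k)}_{n;c1}$ and $\hat T^{(r,k)}_{n;c2}$ for given $a^{(r,k)}_{n1}$, $b^{(r,k)}_{n1}$, $c^{(r,k)}_{n1}$; the function names are hypothetical.

```python
import math


def log_inv_survival(z):
    """log(1 / (1 - Phi(z))), evaluated without cancellation in either tail."""
    if z < 0.0:
        return -math.log1p(-0.5 * math.erfc(-z / math.sqrt(2.0)))
    return -math.log(0.5 * math.erfc(z / math.sqrt(2.0)))


def r_statistic(sample, r, k, a1, b1, c1):
    """Return (T, Tc1, Tc2) for a1 = a^{(r,k)}_{n1}, b1 = b^{(r,k)}_{n1}, c1 = c^{(r,k)}_{n1}."""
    n = len(sample)
    mean = sum(sample) / n
    s_n = math.sqrt(sum((x - mean) ** 2 for x in sample) / n)   # divisor n
    y = sorted((x - mean) / s_n for x in sample)                # standardized order statistics
    sum_r = sum_r1 = 0.0
    for i in range(1, n - k + 2):
        w = math.comb(n - i, k - 1)   # k-subsets whose smallest element is X_{i:n}
        lg = log_inv_survival(y[i - 1])
        sum_r += w * lg ** r
        sum_r1 += w * lg ** (r + 1)
    v1 = sum_r / math.comb(n, k) - math.gamma(r + 1) / k ** r
    v2 = sum_r1 / math.comb(n, k) - math.gamma(r + 2) / k ** (r + 1)
    delta = a1 * c1 - b1 * b1
    t = (c1 * v1 ** 2 - 2.0 * b1 * v1 * v2 + a1 * v2 ** 2) / delta
    tc1 = v1 ** 2 / a1
    tc2 = (a1 * v2 - b1 * v1) ** 2 / (delta * a1)
    return t, tc1, tc2
```

As with the O-tests, $\hat T^{(r,k)}_n = \hat T^{(r,k)}_{n;c1} + \hat T^{(r,k)}_{n;c2}$ holds for any positive definite matrix, and the statistic is unchanged under location shifts and positive rescaling of the sample.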
