
P. GRZEGORZEWSKI (Warszawa)

THE ROBUSTNESS AGAINST DEPENDENCE OF NONPARAMETRIC TESTS FOR THE TWO-SAMPLE LOCATION PROBLEM

Abstract. Nonparametric tests for the two-sample location problem are investigated. It is shown that the supremum of the size of any test can be arbitrarily close to 1. None of these tests is most robust against dependence.

1. Introduction. Situations with some kind of dependencies for the Mann–Whitney–Wilcoxon test were investigated by Hollander, Pledger and Lin [3], Pettit and Siskind [4], Serfling [6], and Zieliński [8], [9]. In this paper we consider the robustness against dependence of a large family of nonparametric tests for the two-sample location problem, including the test mentioned above. We take advantage of a new description of dependence, called Rüschendorf's ε-neighbourhoods, proposed in [2].

2. Problem and notation. Let $X_1, \dots, X_m$ and $Y_1, \dots, Y_n$ denote two independent random samples from populations with continuous distribution functions $F_X(x) = F(x - \Delta)$ and $F_Y(y) = F(y)$, respectively. We verify the hypothesis $H : \Delta = 0$ against $K : \Delta > 0$ by means of a test $\varphi$ of size $\alpha$. We assume that $\varphi$ belongs to some family $\Phi$ of tests (see Sec. 3 for the definition of $\Phi$).

Let $\mathcal{P}(F) = \{P : P(Z_i \le z) = F(z),\ i = 1, \dots, m+n\}$, where $(Z_1, \dots, Z_{m+n}) = (X_1, \dots, X_m, Y_1, \dots, Y_n)$ is the pooled sample, describe all possible violations of independence. We denote $\mathcal{P}(F)$ briefly by $\mathcal{P}$. Let $\mathcal{P}_C \subset \mathcal{P}$ be the subfamily of all continuous distribution functions. Moreover, let $\mathcal{C}_\varepsilon \subset \mathcal{P}_C$ be the family of all c.d.f.'s which correspond to small dependencies (for more details see Sec. 4).

1991 Mathematics Subject Classification: Primary 62G35; Secondary 62G10.

Key words and phrases: robustness of tests, robustness against dependence, nonparametric tests for the two-sample location problem, size of test.


The following problems are considered in this paper:

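(A) Given any $\varphi \in \Phi$, compute the supremum of the size of $\varphi$ under all kinds of dependencies, i.e. $\sup_{P \in \mathcal{P}} \int_{\mathcal{X}} \varphi \, dP$, where $\mathcal{X}$ denotes the sample space.

(B) Given any $\varphi \in \Phi$, evaluate the robustness of the size of $\varphi$ against small dependencies. We use the oscillation of the size over $\mathcal{C}_\varepsilon$ as a measure of robustness (see [7]):

$$r_\varepsilon(\varphi) = \sup_{P \in \mathcal{C}_\varepsilon} \int_{\mathcal{X}} \varphi \, dP - \inf_{P \in \mathcal{C}_\varepsilon} \int_{\mathcal{X}} \varphi \, dP.$$

(C) Find the most robust test in $\Phi$, i.e. a test $\varphi_0$ such that $r_\varepsilon(\varphi_0) \le r_\varepsilon(\varphi)$ for all $\varphi \in \Phi$ and all $\varepsilon$.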

3. The family Φ. We restrict our consideration to a family Φ of one-sided nonparametric tests for the two-sample location problem given as follows:

Definition. $\varphi \in \Phi$ if and only if

(i) $\varphi(x_1 + \tau, \dots, x_m + \tau, y_1 + \tau, \dots, y_n + \tau) = \varphi(x_1, \dots, x_m, y_1, \dots, y_n)$ for all $\tau$,

(ii) $\varphi(x_1, \dots, x_{i-1}, x_i + \delta, x_{i+1}, \dots, x_m, y_1, \dots, y_n) \ge \varphi(x_1, \dots, x_m, y_1, \dots, y_n)$ for all $\delta \ge 0$, $i = 1, \dots, m$,

(iii) if $X_{1:m} > Y_{n:n}$ then $\varphi(x_1, \dots, x_m, y_1, \dots, y_n) = 1$, and if $X_{m:m} < Y_{1:n}$ then $\varphi(x_1, \dots, x_m, y_1, \dots, y_n) = 0$,

where $X_{i:m}$ and $Y_{i:n}$ denote the $i$th order statistics from the first and the second sample, respectively.

The conditions (i)–(iii) seem quite natural. The Mann–Whitney–Wilcoxon test, the Fisher–Yates test, the Rosenbaum test and many other tests for the two-sample location problem belong to the family Φ (see [1]).
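As an illustration of conditions (i)–(iii), the following Python sketch implements a one-sided test based on the Mann–Whitney count of pairs with $x_i > y_j$ and evaluates the three conditions on one example. The function names, the sample values and the threshold `critical_value` are assumptions made for this illustration and are not notation from the paper. Any threshold between 1 and $mn$ yields a test satisfying (iii); in practice the threshold would be chosen from the null distribution of the statistic.

```python
from itertools import product

def mann_whitney_u(xs, ys):
    """Number of pairs (i, j) with x_i > y_j."""
    return sum(1 for x, y in product(xs, ys) if x > y)

def mww_test(xs, ys, critical_value):
    """Non-randomized one-sided test: return 1 (reject H) when U >= critical_value.

    critical_value is a hypothetical threshold with 1 <= critical_value <= m * n.
    """
    return 1 if mann_whitney_u(xs, ys) >= critical_value else 0

# Illustrative data (m = 3, n = 4) and threshold; not taken from the paper.
xs, ys, c = [0.62, 0.80, 0.71], [0.35, 0.66, 0.10, 0.52], 9
base = mww_test(xs, ys, c)

# (i) a common shift of all observations leaves the decision unchanged
shifted = mww_test([x + 2.5 for x in xs], [y + 2.5 for y in ys], c)

# (ii) increasing a single x_i cannot decrease the test function
raised = mww_test([xs[0] + 0.3] + xs[1:], ys, c)

# (iii) completely separated samples force the decision
all_above = mww_test([5.0, 6.0, 7.0], ys, c)     # min x > max y
all_below = mww_test([-3.0, -2.0, -1.0], ys, c)  # max x < min y

print(base, shifted, raised, all_above, all_below)
```

The checks cover a single data set only, so they illustrate the definition rather than prove membership in Φ.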

4. A description of dependence. By the Rüschendorf theorem (see [5]) we know that $h$ is the density of a probability measure on $[0,1]^r$ with uniform marginals and continuous w.r.t. the Lebesgue measure $\mu$ on $[0,1]^r$ if and only if $h = 1 + Sf$, where $f \in L^1([0,1]^r)$ and $S : L^1 \to L^1$ is the linear operator given by

$$Sf = f - \sum_{i=1}^{r} \int_{[0,1]^{r-1}} f \, dz_1 \dots \widehat{dz_i} \dots dz_r + (r-1) \int_{[0,1]^r} f \, dz_1 \dots dz_r$$

and $Sf \ge -1$. Here $\widehat{dz_i}$ indicates that the integration over $z_i$ is omitted.

Without loss of generality we assume that F is the uniform distribution on [0, 1].

Moreover, define $\mathcal{R} = \{f \in L^1([0,1]^r) : Sf \ge -1\}$, and let $[0,u]^r = \{x \in [0,1]^r : 0 \le x_i \le u_i,\ i = 1, \dots, r\}$.

Based on the above theorem and assumptions we may write

$$\mathcal{P}_C = \left\{ P : P(u) = \int_{[0,u]^r} (1 + Sf) \, d\mu,\ f \in \mathcal{R} \right\}.$$

In [2] a new description of dependence, called Rüschendorf's ε-neighbourhoods, was proposed and motivated. Following that paper, let $\mathcal{R}_\varepsilon = \{f \in L^1 : \|Sf\| \le \varepsilon,\ Sf \ge -1\}$, where $\|\cdot\|$ is the $L^1$ norm. Then

$$\mathcal{C}_\varepsilon = \left\{ P : P(u) = \int_{[0,u]^r} (1 + Sf) \, d\mu,\ f \in \mathcal{R}_\varepsilon \right\}$$

describes the family of distributions which correspond to small departures from independence, so-called ε-dependence: if ε is sufficiently small then the dependence measured by Spearman's $\rho$ and many other measures is small as well, and conversely.

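In order to solve our problems (A) and (B) it will be necessary, for any $\varphi \in \Phi$, to compute

$$\sup_{f \in \mathcal{R}} \int_{[0,1]^r} (1 + Sf) \varphi \, d\mu, \tag{A}$$

$$r_\varepsilon(\varphi) = \sup_{f \in \mathcal{R}_\varepsilon} \int_{[0,1]^r} (1 + Sf) \varphi \, d\mu - \inf_{f \in \mathcal{R}_\varepsilon} \int_{[0,1]^r} (1 + Sf) \varphi \, d\mu. \tag{B}$$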

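It is easy to show (see Sec. 7) that the above expressions are equivalent to the following ones, which are more convenient for further investigation:

$$\sup_{g \in \mathcal{G}} \int_{[0,1]^r} (1 + g) \varphi \, d\mu, \tag{A}$$

$$r_\varepsilon(\varphi) = \sup_{g \in \mathcal{G}_\varepsilon} \int_{[0,1]^r} (1 + g) \varphi \, d\mu - \inf_{g \in \mathcal{G}_\varepsilon} \int_{[0,1]^r} (1 + g) \varphi \, d\mu, \tag{B}$$

where

$$\mathcal{G} = \left\{ g \in L^1([0,1]^r) : g \ge -1,\ \int_{[0,1]^r} g \, d\mu = 0,\ \int_{[0,1]^{r-1}} g \, dz_1 \dots \widehat{dz_i} \dots dz_r = 0,\ i = 1, \dots, r \right\}$$

and $\mathcal{G}_\varepsilon = \{g \in \mathcal{G} : \|g\| \le \varepsilon\}$.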

5. Results. Now we can state the solutions of our problems (A)–(C).

Theorem 1. Let $\varphi \in \Phi$. Suppose that all kinds of dependencies between samples and among observations within samples are allowed. Then the size of the test $\varphi$ can be arbitrarily close to 1, i.e. $\sup_{P \in \mathcal{P}} \int_{\mathcal{X}} \varphi \, dP = 1$.

Theorem 2. The robustness of any test φ ∈ Φ against ε-dependence equals ε/2, i.e. rε(φ) = ε/2.

From this theorem we get immediately:

Corollary. In the family Φ of one-sided nonparametric tests for the two-sample location problem, no test is most robust against dependence.

6. Proofs

Proof of Theorem 1. Let $\{G_N\}_{N=2}^{\infty}$ be the sequence of $r$-dimensional subsets of $[0,1]^r$, $r = m + n$, given by

$$G_N = \bigcup \left[ \frac{i}{N}, \frac{i+1}{N} \right]^m \times \left[ \frac{j-1}{N}, \frac{j}{N} \right]^n,$$

where the union is extended over all $(i, j)$ of the form $(k \bmod N, k)$, for $k = 1, \dots, N$.

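Let $\{g_N\}_{N=2}^{\infty}$ be the sequence of real functions on $[0,1]^r$ defined as follows:

$$g_N(z) = \begin{cases} N^{r-1} - 1 & \text{for } z \in G_N, \\ -1 & \text{for } z \notin G_N. \end{cases}$$

We show that $g_N \in \mathcal{G}$ for all $N \ge 2$:

(a) $g_N \ge -1$ by the definition,

(b) $\displaystyle \int_{[0,1]^r} g_N \, d\mu = N \, \frac{N^{r-1} - 1}{N^r} + (-1) \left( 1 - N \, \frac{1}{N^r} \right) = 0$,

(c) $\displaystyle \int_{[0,1]^{r-1}} g_N \, dz_1 \dots \widehat{dz_i} \dots dz_r = \frac{N^{r-1} - 1}{N^{r-1}} + (-1) \left( 1 - \frac{1}{N^{r-1}} \right) = 0$ for $i = 1, \dots, r$.

So $g_N \in \mathcal{G}$ for every $N \ge 2$.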
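Conditions (a)–(c) can also be checked mechanically for small cases. The sketch below is an illustration under stated assumptions, not part of the original proof: it represents $g_N$ by its constant values on the $N^r$ grid cells of side $1/N$ and evaluates the total integral and all one-dimensional marginal integrals in exact rational arithmetic. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def cells_of_G(N, m, n):
    """Grid cells making up G_N, one cell index per coordinate.

    Cell index a in {0, ..., N-1} stands for the interval [a/N, (a+1)/N].
    For k = 1, ..., N the x-coordinates use i = k mod N and the y-coordinates
    use the interval [(k-1)/N, k/N], i.e. cell index k - 1.
    """
    return {(k % N,) * m + (k - 1,) * n for k in range(1, N + 1)}

def g_values(N, m, n):
    """Value of g_N on every cell of the N^r grid, r = m + n."""
    r = m + n
    G = cells_of_G(N, m, n)
    inside = Fraction(N) ** (r - 1) - 1
    return {c: (inside if c in G else Fraction(-1))
            for c in product(range(N), repeat=r)}

def total_integral(g, N, r):
    """Integral of g over [0,1]^r (each cell has volume N^-r)."""
    return sum(g.values()) / Fraction(N) ** r

def marginal_integrals(g, N, r):
    """Integral over the other r-1 coordinates, for each coordinate i and cell index a of z_i."""
    cell_volume = 1 / Fraction(N) ** (r - 1)
    return {(i, a): cell_volume * sum(v for c, v in g.items() if c[i] == a)
            for i in range(r) for a in range(N)}

# Small illustrative case: N = 3, m = 2, n = 1.
N, m, n = 3, 2, 1
g = g_values(N, m, n)
print(min(g.values()))                                  # condition (a)
print(total_integral(g, N, m + n))                      # condition (b)
print(set(marginal_integrals(g, N, m + n).values()))    # condition (c)
```

The enumeration visits all $N^r$ cells, so it is only practical for small $N$ and $r$.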

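Now take $\varphi \in \Phi$ and denote by $\alpha$ its size. Suppose that $\varphi$ is a non-randomized test with critical region $K_\alpha$. It is easy to check that for each $N \ge 2$,

$$G_N \setminus \left( \left[ 0, \frac{1}{N} \right]^m \times \left[ \frac{N-1}{N}, 1 \right]^n \right) \subset \{ (z_1, \dots, z_r) : 0 \le z_j \le z_i \le 1,\ i = 1, \dots, m;\ j = m+1, \dots, m+n \} \subset K_\alpha$$

for every $\varphi \in \Phi$ (see the conditions specified in Sec. 3). So we get

$$\begin{aligned}
\sup_{P \in \mathcal{P}} \int_{\mathcal{X}} \varphi \, dP &\ge \sup_{P \in \mathcal{P}_C} \int_{\mathcal{X}} \varphi \, dP = \sup_{g \in \mathcal{G}} \int_{[0,1]^r} (1 + g) \varphi \, d\mu \\
&= \sup_{g \in \mathcal{G}} \int_{K_\alpha} (1 + g) \, d\mu \ge \int_{K_\alpha} (1 + g_N) \, d\mu \\
&= \alpha + \left[ (N-1) \frac{N^{r-1} - 1}{N^r} + (-1) \left( \alpha - (N-1) \frac{1}{N^r} \right) \right] \\
&= \alpha + \frac{N-1}{N} - \alpha = \frac{N-1}{N}.
\end{aligned}$$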

Choosing N large enough one can come arbitrarily close to 1.
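The construction can be illustrated by simulation. Since $1 + g_N$ equals $N^{r-1}$ on $G_N$ and 0 elsewhere, the corresponding distribution is uniform on $G_N$, and each coordinate is still uniform on $[0,1]$. The sketch below draws from that distribution and estimates the rejection rate of a Mann–Whitney-type test, which stands in here for a generic member of Φ. The function names, the threshold and the sample sizes are assumptions chosen for illustration only.

```python
import random

def sample_from_GN(N, m, n, rng):
    """Draw (xs, ys) uniformly from G_N, i.e. from the density 1 + g_N."""
    k = rng.randint(1, N)          # each of the N cubes has the same probability
    i, j = k % N, k
    xs = [rng.uniform(i / N, (i + 1) / N) for _ in range(m)]
    ys = [rng.uniform((j - 1) / N, j / N) for _ in range(n)]
    return xs, ys

def mww_test(xs, ys, critical_value):
    """Reject (return 1) when the number of pairs with x > y reaches the threshold."""
    u = sum(1 for x in xs for y in ys if x > y)
    return 1 if u >= critical_value else 0

def estimated_size(N, m, n, critical_value, draws, seed=0):
    """Monte Carlo estimate of the rejection probability under the density 1 + g_N."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(draws):
        xs, ys = sample_from_GN(N, m, n, rng)
        rejections += mww_test(xs, ys, critical_value)
    return rejections / draws

# Illustrative setting: m = 3, n = 4, hypothetical threshold 9, N = 10.
rate = estimated_size(N=10, m=3, n=4, critical_value=9, draws=20000)
print(rate, (10 - 1) / 10)   # the estimate should be close to (N - 1) / N
```

In this sketch the test rejects on every cube of $G_N$ except the one with $k = N$, where all $x$-values lie below all $y$-values, which is why the rejection rate tracks $(N-1)/N$ regardless of the nominal size.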

Remark. For simplicity we have assumed in the proof that $\varphi$ is a non-randomized test. The theorem is true for randomized tests as well.

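Proof of Theorem 2. We take a non-randomized test $\varphi \in \Phi$ and denote by $\alpha$ its size and by $K_\alpha$ its critical region. Let $\{G_N\}$ be as in the proof of Theorem 1. Consider the sequence $\{g'_N\}$ of real functions on $[0,1]^r$ defined by

$$g'_N(z) = \begin{cases} \dfrac{\varepsilon}{2} N^{r-1} & \text{for } z \in G_N, \\[2mm] -\dfrac{\varepsilon}{2} \, \dfrac{N^{r-1}}{N^{r-1} - 1} & \text{for } z \notin G_N, \end{cases}$$

for $N \ge N_0 = (2/(2 - \varepsilon))^{1/(r-1)}$. It is easily seen that $g'_N \in \mathcal{G}_\varepsilon$ for all $N \ge N_0$.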

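So we get

$$\begin{aligned}
\sup_{g \in \mathcal{G}_\varepsilon} \int_{[0,1]^r} (1 + g) \varphi \, d\mu &= \sup_{g \in \mathcal{G}_\varepsilon} \int_{K_\alpha} (1 + g) \, d\mu \ge \int_{K_\alpha} (1 + g'_N) \, d\mu \\
&= \alpha + \left[ (N-1) \, \frac{\varepsilon}{2} N^{r-1} \, \frac{1}{N^r} + \left( -\frac{\varepsilon}{2} \, \frac{N^{r-1}}{N^{r-1} - 1} \right) \left( \alpha - (N-1) \frac{1}{N^r} \right) \right] \\
&\to \alpha + \frac{\varepsilon}{2} (1 - \alpha) \quad \text{as } N \to \infty.
\end{aligned}$$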

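In order to show that also $\sup_{g \in \mathcal{G}_\varepsilon} \int_{[0,1]^r} (1 + g) \varphi \, d\mu \le \alpha + \frac{\varepsilon}{2} (1 - \alpha)$, we consider the operator $Tg = \int_{K_\alpha} g \, d\mu$. It is a bounded linear operator, so we get $Tg \le \|T\| \, \|g\|$ for all $g \in \mathcal{G}$.

By the proof of Theorem 1,

$$\|T\| = \sup_{g \in \mathcal{G}} \frac{\|Tg\|}{\|g\|} = \frac{1 - \alpha}{2}$$

(this is evident, because ε cannot be greater than 2). So for all $g \in \mathcal{G}_\varepsilon$ we get

$$Tg \le \frac{1 - \alpha}{2} \|g\| \le \frac{1 - \alpha}{2} \, \varepsilon$$

and therefore

$$\sup_{g \in \mathcal{G}_\varepsilon} \int_{K_\alpha} (1 + g) \, d\mu = \sup_{g \in \mathcal{G}_\varepsilon} (\alpha + Tg) \le \alpha + \frac{1 - \alpha}{2} \, \varepsilon.$$

Hence

$$\sup_{g \in \mathcal{G}_\varepsilon} \int_{[0,1]^r} (1 + g) \varphi \, d\mu = \alpha + \frac{\varepsilon}{2} (1 - \alpha).$$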

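Now we consider $\inf_{g \in \mathcal{G}_\varepsilon} \int_{[0,1]^r} (1 + g) \varphi \, d\mu$. Let us define a sequence $\{G''_N\}$ by

$$G''_N = \bigcup \left[ \frac{i-1}{N}, \frac{i}{N} \right]^m \times \left[ \frac{j}{N}, \frac{j+1}{N} \right]^n \subset [0,1]^r,$$

where the union is extended over all $(i, j)$ of the form $(k, k \bmod N)$ for $k = 1, \dots, N$.

Let $\{g''_N\}$ be the following sequence of real functions on $[0,1]^r$:

$$g''_N(z) = \begin{cases} \dfrac{\varepsilon}{2} N^{r-1} & \text{for } z \in G''_N, \\[2mm] -\dfrac{\varepsilon}{2} \, \dfrac{N^{r-1}}{N^{r-1} - 1} & \text{for } z \notin G''_N, \end{cases}$$

where $N \ge N_0$. It is easily seen that $g''_N \in \mathcal{G}_\varepsilon$ for all $N \ge N_0$. So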

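$$\begin{aligned}
\inf_{g \in \mathcal{G}_\varepsilon} \int_{[0,1]^r} (1 + g) \varphi \, d\mu &= \inf_{g \in \mathcal{G}_\varepsilon} \int_{K_\alpha} (1 + g) \, d\mu \le \int_{K_\alpha} (1 + g''_N) \, d\mu \\
&= \alpha + \left[ \frac{\varepsilon}{2} N^{r-1} \, \frac{1}{N^r} + \left( -\frac{\varepsilon}{2} \, \frac{N^{r-1}}{N^{r-1} - 1} \right) \left( \alpha - \frac{1}{N^r} \right) \right] \\
&\to \alpha (1 - \varepsilon/2) \quad \text{as } N \to \infty.
\end{aligned}$$

Similarly, we can prove the opposite inequality:

$$\inf_{g \in \mathcal{G}_\varepsilon} \int_{[0,1]^r} (1 + g) \varphi \, d\mu \ge \alpha (1 - \varepsilon/2)$$

and therefore

$$\inf_{g \in \mathcal{G}_\varepsilon} \int_{[0,1]^r} (1 + g) \varphi \, d\mu = \alpha (1 - \varepsilon/2).$$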

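Thus finally

$$\begin{aligned}
r_\varepsilon(\varphi) &= \sup_{P \in \mathcal{C}_\varepsilon} \int_{\mathcal{X}} \varphi \, dP - \inf_{P \in \mathcal{C}_\varepsilon} \int_{\mathcal{X}} \varphi \, dP \\
&= \sup_{g \in \mathcal{G}_\varepsilon} \int_{[0,1]^r} (1 + g) \varphi \, d\mu - \inf_{g \in \mathcal{G}_\varepsilon} \int_{[0,1]^r} (1 + g) \varphi \, d\mu \\
&= \alpha + \frac{\varepsilon}{2} (1 - \alpha) - \alpha \left( 1 - \frac{\varepsilon}{2} \right) = \frac{\varepsilon}{2},
\end{aligned}$$

which completes the proof.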

As before, the theorem is also true for randomized tests.
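The two limits used in the proof can be examined numerically. The sketch below evaluates, in floating point, the integral of the near-extremal density over $K_\alpha$ for the sup-type sequence (built on $G_N$) and for the inf-type sequence (built on the second family of cubes), and compares their difference with $\varepsilon/2$. The function names and the numerical values of $r$, $\alpha$ and $\varepsilon$ are assumptions for illustration. Note that condition (iii) forces $\alpha \ge m!\,n!/(m+n)!$, so the example uses $m = n = 3$, for which $\alpha = 0.05$ is admissible.

```python
def minimal_N(r, eps):
    """Lower bound N_0 = (2 / (2 - eps))^(1 / (r - 1)) ensuring the density stays nonnegative."""
    return (2 / (2 - eps)) ** (1 / (r - 1))

def _levels(N, r, eps):
    """Values of the perturbation on and off the union of cubes."""
    on_cubes = (eps / 2) * N ** (r - 1)
    off_cubes = -(eps / 2) * N ** (r - 1) / (N ** (r - 1) - 1)
    return on_cubes, off_cubes

def upper_value(N, r, alpha, eps):
    """Integral of the perturbed density over K_alpha when N - 1 cubes lie inside K_alpha."""
    on_cubes, off_cubes = _levels(N, r, eps)
    inside = (N - 1) / N ** r          # measure of the cubes contained in K_alpha
    return alpha + on_cubes * inside + off_cubes * (alpha - inside)

def lower_value(N, r, alpha, eps):
    """Integral of the perturbed density over K_alpha when a single cube lies inside K_alpha."""
    on_cubes, off_cubes = _levels(N, r, eps)
    inside = 1 / N ** r                # measure of the one cube contained in K_alpha
    return alpha + on_cubes * inside + off_cubes * (alpha - inside)

# Illustrative values: r = 6 (m = n = 3), alpha = 0.05, eps = 0.1.
r, alpha, eps = 6, 0.05, 0.1
for N in (10, 100, 10_000):
    up, low = upper_value(N, r, alpha, eps), lower_value(N, r, alpha, eps)
    print(N, up, low, up - low)        # the difference approaches eps / 2 as N grows
```

The difference depends on $\alpha$ for finite $N$ but not in the limit, which is the content of Theorem 2.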

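7. Complements. In Section 4 we stated that in our problem we could consider the families $\mathcal{G}$ and $\mathcal{G}_\varepsilon$ instead of $\mathcal{R}$ and $\mathcal{R}_\varepsilon$. This follows from the lemma below.

Lemma. Let $S$ be the operator defined in Section 4. Then $S(\mathcal{R}) = \mathcal{G}$ and $S(\mathcal{R}_\varepsilon) = \mathcal{G}_\varepsilon$.

Proof. It suffices to show that $S(\mathcal{R}) = \mathcal{G}$. The same proof remains valid for the second assertion.

Suppose $f \in \mathcal{R}$. Then $Sf \in L^1([0,1]^r)$, $Sf \ge -1$ and

$$\int_{[0,1]^r} Sf \, d\mu = \int_{[0,1]^{r-1}} Sf \, dz_1 \dots \widehat{dz_i} \dots dz_r = 0, \quad i = 1, \dots, r.$$

So $S(\mathcal{R}) \subseteq \mathcal{G}$.

Now take any $g \in \mathcal{G}$. Then

$$Sg = g - \sum_{i=1}^{r} \int_{[0,1]^{r-1}} g \, dz_1 \dots \widehat{dz_i} \dots dz_r + (r-1) \int_{[0,1]^r} g \, dz_1 \dots dz_r = g,$$

so $Sg = g \ge -1$, i.e. $g \in \mathcal{R}$ and $g = Sg \in S(\mathcal{R})$. Hence $S(\mathcal{R}) \supseteq \mathcal{G}$.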
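The two facts used in the proof, that $Sf$ has vanishing marginal integrals and that $S$ acts as the identity on functions whose marginal integrals already vanish, can be illustrated in a discretised setting. The sketch below treats the case $r = 2$ with $f$ piecewise constant on an $N \times N$ grid, where $S$ reduces to subtracting row and column means and adding back the overall mean. The function name and the sample table are assumptions made for this illustration.

```python
from fractions import Fraction

def apply_S(f):
    """Discretised operator S for r = 2.

    f is an N x N table of cell values of a piecewise constant function on [0,1]^2.
    S f = f - (integral over z_2) - (integral over z_1) + (r - 1) * (total integral).
    """
    N = len(f)
    row_mean = [sum(row) / Fraction(N) for row in f]                  # function of z_1
    col_mean = [sum(f[a][b] for a in range(N)) / Fraction(N)          # function of z_2
                for b in range(N)]
    total = sum(row_mean) / Fraction(N)
    return [[f[a][b] - row_mean[a] - col_mean[b] + total for b in range(N)]
            for a in range(N)]

# Arbitrary illustrative table (not from the paper).
f = [[1, 4, 2],
     [0, 3, 5],
     [7, 1, 1]]
g = apply_S(f)

row_integrals = [sum(row) for row in g]
col_integrals = [sum(g[a][b] for a in range(3)) for b in range(3)]
print(row_integrals, col_integrals)   # all marginal sums of S f vanish
print(apply_S(g) == g)                # S leaves such a function unchanged
```

The constraint $Sf \ge -1$ is not enforced here; the sketch only concerns the linear part of the argument.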

References

[1] P. Grzegorzewski, Nonparametric tests for two-sample location problem, Mat. Stos. 34 (1991), 37–57 (in Polish).

[2] P. Grzegorzewski, The infinitesimal robustness of tests against dependence, Zastos. Mat. 21 (3) (1992), 455–460.

[3] M. Hollander, G. Pledger and P. E. Lin, Robustness of the Wilcoxon test to a certain dependency between samples, Ann. Statist. 2 (1974), 177–181.

[4] A. N. Pettit and V. Siskind, Effect of within-sample dependence on the MWW statistic, Biometrika 68 (1981), 437–441.

[5] L. Rüschendorf, Construction of multivariate distributions with given marginals, Ann. Inst. Statist. Math. 37 (1985), 225–233.

[6] R. J. Serfling, The Wilcoxon two-sample statistic on strongly mixing processes, Ann. Math. Statist. 39 (1968), 1202–1209.

[7] R. Zieliński, Robust statistical procedures: a general approach, in: Lecture Notes in Math. 982, Springer, 1983, 283–295.

[8] R. Zieliński, Robustness of two-sample tests to dependence of the observations, Mat. Stos. 32 (1989), 5–18 (in Polish).

[9] R. Zieliński, Robustness of the one-sided Mann–Whitney–Wilcoxon test to dependency between samples, Statist. Probab. Lett. 10 (1990), 291–295.

PRZEMYSŁAW GRZEGORZEWSKI
INSTITUTE OF MATHEMATICS
WARSAW UNIVERSITY OF TECHNOLOGY
PL. POLITECHNIKI 1
00-661 WARSZAWA, POLAND

Received on 25.10.1993
