F. CZEKAŁA (Wrocław)

NORMALIZING CONSTANTS FOR A STATISTIC BASED ON LOGARITHMS OF DISJOINT m-SPACINGS

Abstract. The paper is concerned with the asymptotic normality of a certain statistic based on the logarithms of disjoint m-spacings. The exact and asymptotic mean and variance are computed in the case of the uniform distribution on the interval [0, 1]. This result is generalized to the case when the sample is drawn from a distribution with a positive step density on [0, 1].
1. Introduction. Let X_1, …, X_N be a sample from the uniform distribution on the interval [0, 1]. Let us denote by X_{(1)}, …, X_{(N)} the order statistics derived from this sample and define X_{(0)} = 0, X_{(N+1)} = 1. Let Y_0, Y_1, …, Y_N be i.i.d. exponential random variables with unit mean. It is known that

(1)  X_{(i)} \stackrel{d}{=} \exp\Big(-\frac{Y_i}{i} - \ldots - \frac{Y_N}{N}\Big),  i = 1, …, N.

We will use the notation

(2)  X_{k:l} = \exp\Big(-\sum_{i=k}^{l} \frac{Y_i}{i}\Big),  1 ≤ k ≤ l ≤ N.

We also have

(3)  X_{(i)} \stackrel{d}{=} \frac{Y_1 + \ldots + Y_i}{Y_0 + \ldots + Y_N}.

We define the m-spacings from the sample X_1, …, X_N as the differences D^{(m)}_{n,N} = X_{(n+m)} - X_{(n)}, n = 0, …, N+1-m, m ≥ 1.

1991 Mathematics Subject Classification: 62E15, 62E20, 62G10, 62G30.
Key words and phrases: higher-order spacings; asymptotic normality; step density function.
Research supported by KBN (Committee for Scientific Research, Poland) Grant 2 1033 91 01.

Let us introduce the following notation:

(4)  G^{(m)}_n = D^{(m)}_{n,N} \log D^{(m)}_{n,N}.

We are interested in the statistic

(5)  G_N = \sum_{0 \le im \le N+1-m} G^{(m)}_{im}

based on disjoint m-spacings. It can be proved as in Cressie (1976) or deduced from Del Pino (1979) that G_N is asymptotically normal. The general results of Del Pino (1979) give the asymptotic means and variances of statistics based on disjoint m-spacings in the form of means and variances of some other random variables. In any particular case these quantities have to be computed separately to obtain the values of the normalizing constants.
This method was used by Jammalamadaka & Tiwari (1986). In this paper we will find the exact values of E(G_N) and Var(G_N) in a closed and simple form. We will also establish the asymptotic normality of G_N for alternatives with step densities.
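Representation (1) is easy to check numerically. The following sketch (an illustration added here, not part of the original paper) simulates the right-hand side of (1) and compares its empirical mean with the known value E(X_{(i)}) = i/(N+1) for a uniform sample, consistent with Lemma 1(a) below:

```python
import math
import random

random.seed(0)
N, i0, reps = 10, 3, 200_000

# By (1), exp(-Y_i/i - ... - Y_N/N) with Y_j i.i.d. unit exponentials has the
# distribution of the i-th uniform order statistic X_(i); here i = i0 = 3, N = 10.
acc = 0.0
for _ in range(reps):
    s = sum(random.expovariate(1.0) / j for j in range(i0, N + 1))
    acc += math.exp(-s)
mc_mean = acc / reps

exact = i0 / (N + 1)  # E(X_(i)) = i/(N+1) for a uniform sample of size N
print(mc_mean, exact)
```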
2. Main results
Theorem 1. Let k ≥ 1 and N+1 = km. Then the random variable G_N has the mean and variance given by

(6)  E(G_N) = -\sum_{i=m+1}^{N+1} \frac{1}{i},

(7)  Var(G_N) = \frac{1}{N+2} [F(m+1) - F(N+2)],

where

(8)  F(n) = n \sum_{i=n}^{\infty} \frac{1}{i^2}.
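The closed forms (6)–(8) can be verified by direct Monte Carlo simulation of the statistic (5). The sketch below is an added illustration (not part of the paper); it takes m = 2 and N + 1 = 10, and truncates the tail of F numerically:

```python
import math
import random

random.seed(1)
m, k = 2, 5
N = k * m - 1  # N + 1 = km

def F(n, terms=100_000):
    # F(n) = n * sum_{i >= n} 1/i^2, truncated; the neglected tail is < n/terms
    return n * sum(1.0 / i**2 for i in range(n, n + terms))

exact_mean = -sum(1.0 / i for i in range(m + 1, N + 2))  # formula (6)
exact_var = (F(m + 1) - F(N + 2)) / (N + 2)              # formula (7)

def G(sample):
    # statistic (5): sum of D log D over the k disjoint m-spacings
    xs = [0.0] + sorted(sample) + [1.0]
    total = 0.0
    for i in range(k):
        d = xs[(i + 1) * m] - xs[i * m]
        total += d * math.log(d)
    return total

reps = 100_000
vals = [G([random.random() for _ in range(N)]) for _ in range(reps)]
mc_mean = sum(vals) / reps
mc_var = sum((v - mc_mean) ** 2 for v in vals) / (reps - 1)
print(exact_mean, mc_mean, exact_var, mc_var)
```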
The method of the proof will be similar to that of Cressie (1976). To prove the main result we will need some lemmas.
Lemma 1. For α > −1 and n ≥ 1 we have

(a)  E(X_{n:N}^{α}) = \prod_{i=n}^{N} \frac{i}{i+α},

(b)  E(X_{n:N}^{α} \log X_{n:N}) = -\prod_{i=n}^{N} \frac{i}{i+α} \sum_{j=n}^{N} \frac{1}{j+α},

(c)  E(X_{n:N}^{α} \log^2 X_{n:N}) = \prod_{i=n}^{N} \frac{i}{i+α} \Big[ \sum_{j=n}^{N} \frac{1}{(j+α)^2} + \Big( \sum_{j=n}^{N} \frac{1}{j+α} \Big)^2 \Big].

Proof. The proof is straightforward using the definition of X_{n:N} and integrating with respect to the density of the exponential distribution.
Lemma 2. For n, m ≥ 1 we have

E(\log(1 - X_{n:n+m-1})) = -\sum_{k=1}^{\infty} \frac{1}{k} \prod_{i=n}^{n+m-1} \frac{i}{i+k} = -\sum_{i=m}^{n+m-1} \frac{1}{i}.

Proof. Expanding the logarithm into a power series we get

E(\log(1 - X_{n:n+m-1})) = E\Big( -\sum_{k=1}^{\infty} \frac{1}{k} X_{n:n+m-1}^{k} \Big) = -\sum_{k=1}^{\infty} \frac{1}{k} E(X_{n:n+m-1}^{k}) = -\sum_{k=1}^{\infty} \frac{1}{k} \prod_{i=n}^{n+m-1} \frac{i}{i+k}.

On the other hand, X_{n:n+m-1} has the same distribution as the nth order statistic from a sample of size n+m-1. Applying the transformation x → 1-x we get 1 - X_{n:n+m-1} \stackrel{d}{=} X_{m:n+m-1} and hence

E(\log(1 - X_{n:n+m-1})) = E(\log X_{m:n+m-1}) = -\sum_{i=m}^{n+m-1} \frac{1}{i}.

Lemma 3. For n, m ≥ 1 we have

\sum_{k=1}^{\infty} \prod_{i=n}^{n+m} \frac{i}{i+k} = \frac{n}{m}.

Proof. Using Lemma 1 and the transformation x → 1-x as in the proof of Lemma 2, we get

\sum_{k=1}^{\infty} \prod_{i=n}^{n+m} \frac{i}{i+k} = \sum_{k=1}^{\infty} E(X_{n:n+m}^{k}) = E\Big( \sum_{k=1}^{\infty} X_{n:n+m}^{k} \Big) = E\Big( \frac{X_{n:n+m}}{1 - X_{n:n+m}} \Big) = E\Big( \frac{1 - X_{m+1:n+m}}{X_{m+1:n+m}} \Big)
= E\Big( \frac{1}{X_{m+1:n+m}} - 1 \Big) = E\Big( \exp\Big( \sum_{i=m+1}^{n+m} \frac{Y_i}{i} \Big) \Big) - 1 = \prod_{i=m+1}^{n+m} \frac{i}{i-1} - 1 = \frac{n}{m}.

Lemma 4. For n, m ≥ 1 we have

\sum_{k=1}^{\infty} \prod_{i=n}^{n+m-1} \frac{i}{i+k} \sum_{i=n}^{n+m-1} \frac{1}{i(i+k)} = \sum_{k=n+m}^{\infty} \frac{1}{k^2}.

Proof. This can be proved by induction. See Cressie (1976).
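The series identities of Lemmas 2–4 lend themselves to a direct numerical check. The sketch below (an added illustration; the truncation points are arbitrary) compares the truncated series with the closed forms for n = 4, m = 3:

```python
def prod_ratio(n, hi, k):
    # product of i/(i+k) for i = n..hi
    p = 1.0
    for i in range(n, hi + 1):
        p *= i / (i + k)
    return p

n, m = 4, 3

# Lemma 2: sum_{k>=1} (1/k) prod_{i=n}^{n+m-1} i/(i+k) = sum_{i=m}^{n+m-1} 1/i
lhs2 = sum(prod_ratio(n, n + m - 1, k) / k for k in range(1, 20_000))
rhs2 = sum(1.0 / i for i in range(m, n + m))

# Lemma 3: sum_{k>=1} prod_{i=n}^{n+m} i/(i+k) = n/m
lhs3 = sum(prod_ratio(n, n + m, k) for k in range(1, 20_000))
rhs3 = n / m

# Lemma 4: sum_{k>=1} prod_{i=n}^{n+m-1} i/(i+k) * sum_{i=n}^{n+m-1} 1/(i(i+k))
#          = sum_{k>=n+m} 1/k^2
lhs4 = sum(prod_ratio(n, n + m - 1, k)
           * sum(1.0 / (i * (i + k)) for i in range(n, n + m))
           for k in range(1, 20_000))
rhs4 = sum(1.0 / k**2 for k in range(n + m, 200_000))

print(lhs2, rhs2, lhs3, rhs3, lhs4, rhs4)
```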
Proof of Theorem 1. We calculate E(G_N) first. It follows from (3) that all m-spacings have the same distribution. Hence from Lemma 1(b) we get

E(G^{(m)}_n) = E(G^{(m)}_0) = E(X_{(m)} \log X_{(m)}) = E(X_{m:N} \log X_{m:N}) = -\frac{m}{N+1} \sum_{i=m}^{N} \frac{1}{i+1}.

Thus

E(G_N) = E\Big( \sum_{i=0}^{k-1} G^{(m)}_{im} \Big) = k E(G^{(m)}_0) = \frac{N+1}{m} \cdot \Big( -\frac{m}{N+1} \Big) \sum_{i=m}^{N} \frac{1}{i+1} = -\sum_{i=m}^{N} \frac{1}{i+1}.

The variance of G_N is more complicated to obtain. We have

Var(G_N) = k\,Var(G^{(m)}_0) + k(k-1)\,Cov(G^{(m)}_0, G^{(m)}_m).
From Lemma 1 we get

Var(G^{(m)}_0) = E(X_{m:N}^2 \log^2 X_{m:N}) - E^2(X_{m:N} \log X_{m:N})
= \frac{m(m+1)}{(N+1)(N+2)} \Big[ \sum_{i=m+2}^{N+2} \frac{1}{i^2} + \Big( \sum_{i=m+2}^{N+2} \frac{1}{i} \Big)^2 \Big] - \frac{m^2}{(N+1)^2} \Big( \sum_{i=m+1}^{N+1} \frac{1}{i} \Big)^2.

To calculate Cov(G^{(m)}_0, G^{(m)}_n), n ≥ m, let us note that

\log(X_{n+m:N} - X_{n:N}) = \log X_{n+m:N} + \log\Big( 1 - \frac{X_{n:N}}{X_{n+m:N}} \Big) = \log X_{n+m:N} + \log(1 - X_{n:n+m-1}).
Hence

E(G^{(m)}_0 G^{(m)}_n) = E(X_{m:N} \log X_{m:N} (X_{n+m:N} - X_{n:N}) \log X_{n+m:N}) + E(X_{m:N} \log X_{m:N} (X_{n+m:N} - X_{n:N}) \log(1 - X_{n:n+m-1})).

The expected values above can be calculated using the identities

X_{n:N} = X_{n:n+m-1} X_{n+m:N},  X_{m:N} = X_{m:n-1} X_{n:N}

and the fact that X_{m:n-1}, X_{n:n+m-1}, X_{n+m:N} are independent. After elementary but lengthy calculations we get

(9)  E(X_{m:N} \log X_{m:N} (X_{n+m:N} - X_{n:N}) \log X_{n+m:N})
= \frac{m^2}{(N+1)(N+2)} \Big[ \sum_{i=m+n+2}^{N+2} \frac{1}{i} \sum_{i=m+1}^{N+2} \frac{1}{i} + \sum_{i=m+n+2}^{N+2} \frac{1}{i^2} \Big].
Similarly, using the expansion \log(1 - X_{n:m+n-1}) = -\sum_{k=1}^{\infty} k^{-1} X_{n:m+n-1}^{k} and Lemmas 1–4 we obtain

(10)  E(X_{m:N} \log X_{m:N} (X_{n+m:N} - X_{n:N}) \log(1 - X_{n:n+m-1}))
= \frac{m^2}{(N+1)(N+2)} \Big[ \sum_{i=m+1}^{m+n+1} \frac{1}{i} \sum_{i=m+1}^{N+2} \frac{1}{i} - \sum_{i=m+n+2}^{\infty} \frac{1}{i^2} \Big].

Combining (9) and (10) we have, for n ≥ m,
E(G^{(m)}_0 G^{(m)}_n) = \frac{m^2}{(N+1)(N+2)} \Big[ \Big( \sum_{i=m+1}^{N+2} \frac{1}{i} \Big)^2 - \sum_{i=N+3}^{\infty} \frac{1}{i^2} \Big]

and

Cov(G^{(m)}_0, G^{(m)}_n) = E(G^{(m)}_0 G^{(m)}_n) - E(G^{(m)}_0) E(G^{(m)}_n)
= \frac{m^2}{N+1} \Big[ \frac{1}{N+2} \Big( \sum_{i=m+1}^{N+2} \frac{1}{i} \Big)^2 - \frac{1}{N+1} \Big( \sum_{i=m+1}^{N+1} \frac{1}{i} \Big)^2 - \frac{1}{N+2} \sum_{i=N+3}^{\infty} \frac{1}{i^2} \Big].

Now we can calculate Var(G_N):
Var(G_N) = k\,Var(G^{(m)}_0) + k(k-1)\,Cov(G^{(m)}_0, G^{(m)}_m)

= \frac{m+1}{N+2} \Big[ \sum_{i=m+2}^{N+2} \frac{1}{i^2} + \Big( \sum_{i=m+2}^{N+2} \frac{1}{i} \Big)^2 \Big] - \frac{m}{N+1} \Big( \sum_{i=m+1}^{N+1} \frac{1}{i} \Big)^2
  + (k-1)m \Big[ \frac{1}{N+2} \Big( \sum_{i=m+1}^{N+2} \frac{1}{i} \Big)^2 - \frac{1}{N+1} \Big( \sum_{i=m+1}^{N+1} \frac{1}{i} \Big)^2 - \frac{1}{N+2} \sum_{i=N+3}^{\infty} \frac{1}{i^2} \Big]

= \frac{m+1}{N+2} \sum_{i=m+2}^{N+2} \frac{1}{i^2} - \frac{(k-1)m}{N+2} \sum_{i=N+3}^{\infty} \frac{1}{i^2}
  + \Big( \sum_{i=m+1}^{N+2} \frac{1}{i} \Big)^2 - \Big( \sum_{i=m+1}^{N+1} \frac{1}{i} \Big)^2 + \frac{m+1}{N+2} \Big[ \Big( \sum_{i=m+2}^{N+2} \frac{1}{i} \Big)^2 - \Big( \sum_{i=m+1}^{N+2} \frac{1}{i} \Big)^2 \Big]

= \frac{1}{N+2} \Big[ (m+1) \sum_{i=m+1}^{\infty} \frac{1}{i^2} - (N+2) \sum_{i=N+2}^{\infty} \frac{1}{i^2} \Big].
In the next lemma we give the expected value and the variance of G_N in the case when N+1 is not divisible by m.

Lemma 5. Assume that N+1 is not divisible by m and let n < N be the largest integer such that n+1 is divisible by m, i.e. n+1 = km. Then

(11)  E(G_N) = -\frac{n+1}{N+1} \sum_{i=m+1}^{N+1} \frac{1}{i},

(12)  Var(G_N) = \frac{(n+1)(n+2)}{(N+1)(N+2)} \Big[ \sum_{i=n+3}^{N+2} \frac{1}{i^2} + \Big( \sum_{i=n+3}^{N+2} \frac{1}{i} + \sum_{i=m+1}^{n+1} \frac{1}{i} \Big)^2 + Var(G_n) \Big] - \frac{(n+1)^2}{(N+1)^2} \Big( \sum_{i=m+1}^{N+1} \frac{1}{i} \Big)^2,

where Var(G_n) is given by (7).
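Formulas (11) and (12) can be cross-checked against simulation. The sketch below is an added numerical illustration (not from the paper), taking m = 2 and N = 10, so that n = 9 and k = 5:

```python
import math
import random

random.seed(2)
m, N = 2, 10            # N + 1 = 11 is not divisible by m
n = 9                   # largest n < N with n + 1 = km
k = (n + 1) // m

def F(j, terms=100_000):
    # F(j) = j * sum_{i >= j} 1/i^2, truncated
    return j * sum(1.0 / i**2 for i in range(j, j + terms))

# Var(G_n) from (7), valid since n + 1 = km
var_gn = (F(m + 1) - F(n + 2)) / (n + 2)

# formulas (11) and (12)
mean_exact = -((n + 1) / (N + 1)) * sum(1.0 / i for i in range(m + 1, N + 2))
a = sum(1.0 / i**2 for i in range(n + 3, N + 3))
b = sum(1.0 / i for i in range(n + 3, N + 3)) + sum(1.0 / i for i in range(m + 1, n + 2))
var_exact = ((n + 1) * (n + 2) / ((N + 1) * (N + 2))) * (a + b**2 + var_gn) \
            - ((n + 1) / (N + 1))**2 * sum(1.0 / i for i in range(m + 1, N + 2))**2

def G(sample):
    # statistic (5): disjoint m-spacings of the ordered sample, endpoints 0 and 1 added
    xs = [0.0] + sorted(sample) + [1.0]
    return sum((xs[(i + 1) * m] - xs[i * m]) * math.log(xs[(i + 1) * m] - xs[i * m])
               for i in range(k))

reps = 100_000
vals = [G([random.random() for _ in range(N)]) for _ in range(reps)]
mc_mean = sum(vals) / reps
mc_var = sum((v - mc_mean) ** 2 for v in vals) / (reps - 1)
print(mean_exact, mc_mean, var_exact, mc_var)
```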
Proof. We have G_N = \sum_{i=0}^{k-1} D^{(m)}_{im,N} \log D^{(m)}_{im,N} and

D^{(m)}_{im,N} = X_{im+m:N} - X_{im:N} = X_{n+1:N}(X_{im+m:n} - X_{im:n}) = X_{n+1:N} D^{(m)}_{im,n}.

In the equation above X_{n+1:N} is independent of D^{(m)}_{im,n}, thus

G_N = \sum_{i=0}^{k-1} X_{n+1:N} D^{(m)}_{im,n} \log(X_{n+1:N} D^{(m)}_{im,n})
= \sum_{i=0}^{k-1} X_{n+1:N} \log X_{n+1:N}\, D^{(m)}_{im,n} + \sum_{i=0}^{k-1} X_{n+1:N} D^{(m)}_{im,n} \log D^{(m)}_{im,n}
= X_{n+1:N} \log X_{n+1:N} + X_{n+1:N} G_n,

and X_{n+1:N} is independent of G_n. Using the last identity and applying Lemma 1 we easily get the statement of Lemma 5.
Theorem 2. E(G_N) and Var(G_N) given by (6), (11) and (7), (12) respectively are asymptotically equivalent to

(13)  e_N = \sum_{i=1}^{m} \frac{1}{i} - \log(N+2) - γ

and

(14)  σ_N^2 = \frac{1}{N+2} [F(m+1) - 1],

where γ = 0.577… is Euler's constant.
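The normalizing constants (13) and (14) are immediate to evaluate numerically; the sketch below is an added illustration (the truncation point of F is arbitrary, with neglected tail of order n divided by the cutoff):

```python
import math

GAMMA = 0.5772156649015329  # Euler's constant

def F(n, terms=200_000):
    # F(n) = n * sum_{i >= n} 1/i^2 (truncated tail)
    return n * sum(1.0 / i**2 for i in range(n, n + terms))

def e(N, m):
    # formula (13)
    return sum(1.0 / i for i in range(1, m + 1)) - math.log(N + 2) - GAMMA

def sigma2(N, m):
    # formula (14)
    return (F(m + 1) - 1.0) / (N + 2)

# for m = 1 this reduces to e_N = 1 - log(N+2) - gamma, as in Gebert and Kale (1969)
print(e(99, 1), sigma2(99, 1), e(99, 2), sigma2(99, 2))
```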
Proof. We want to show that Var(G_N)/σ_N^2 → 1 and (E(G_N) - e_N)/σ_N → 0. Since F(n) → 1 as n → ∞, it is easy to show that (7) and (12) are equivalent to σ_N^2. It remains to prove that \sqrt{N+2}\,(E(G_N) - e_N) → 0. We can replace (11) by (6) because

\sqrt{N+2}\, \Big| -\frac{n+1}{N+1} \sum_{i=m+1}^{N+1} \frac{1}{i} + \sum_{i=m+1}^{N+1} \frac{1}{i} \Big| \le \frac{m\sqrt{N+2}}{N+1} \sum_{i=m+1}^{N+1} \frac{1}{i} → 0.

Let us define γ_N = \sum_{i=1}^{N} i^{-1} - \log(N+1). Then

γ_N = \sum_{i=1}^{N} \Big( \frac{1}{i} - \log \frac{i+1}{i} \Big)

and γ_N → γ. Applying the inequalities

0 < \frac{1}{i} - \log \frac{i+1}{i} < \frac{1}{i} - \frac{1}{i+1} < \frac{1}{i^2},

we get

\sqrt{N+2}\, \Big| -\sum_{i=m+1}^{N+1} \frac{1}{i} - e_N \Big| = \sqrt{N+2}\,(γ - γ_{N+1}) = \sqrt{N+2} \sum_{i=N+2}^{\infty} \Big( \frac{1}{i} - \log \frac{i+1}{i} \Big) \le \sqrt{N+2} \sum_{i=N+2}^{\infty} \frac{1}{i^2} \le \sum_{i=N+2}^{\infty} \frac{1}{i^{3/2}} → 0.

It is easy to see that the expressions for e_N and σ_N^2 given by (13) and (14) agree with the results of Gebert and Kale (1969) for m = 1 and of Jammalamadaka & Tiwari (1986) for m ≥ 1.
3. The case of step densities. In this section we will find the asymptotic distribution of G_n in the case when the underlying distribution has a positive step density on [0, 1]. For m = 1 it was found by Czekała (1993).

Let X = (X_1, X_2, …, X_n, …) be a sequence of random variables which are used to form G_n. G_n is a function of n and X, which we can write as G_n = φ(n, X). To simplify the notation we assume that φ(0, X) = 0. Now let k ≥ 1 be a fixed integer and let 0 = x_0 < x_1 < … < x_k = 1 be fixed real numbers. We can define subintervals of [0, 1] as follows:

I_1 = [x_0, x_1), …, I_{k-1} = [x_{k-2}, x_{k-1}), I_k = [x_{k-1}, x_k].

The lengths of the intervals I_i will be denoted by d_i. Let f_i > 0, i = 1, …, k, be fixed numbers such that \sum_{i=1}^{k} f_i d_i = 1. These numbers together with the intervals I_i define a step density f:

(15)  f(x) = \sum_{i=1}^{k} f_i 1_{I_i}(x).
Define p_i = f_i d_i. We have \sum p_i = 1 and hence there exist numbers 0 = x'_0 < x'_1 < … < x'_k = 1 such that the intervals I'_i, defined similarly to I_i, have lengths p_i. There also exists a vector of random elements (U, Y^1, …, Y^k) such that:

(a) the coordinates of this vector are stochastically independent,
(b) U = (U_1, U_2, …) is a sequence of independent random variables uniformly distributed on [0, 1],
(c) Y^i = (Y^i_1, Y^i_2, …), for i = 1, …, k, are sequences of independent random variables with uniform distribution on I_i.

We can now define a sequence Z = (Z_1, Z_2, …) of independent random variables with density f:

(16)  Z_n = \sum_{i=1}^{k} 1_{I'_i}(U_n)\, Y^i_n,  n ≥ 1.

Let us denote by N_{i,n} the number of random variables Z_1, …, Z_n taking values in the interval I_i, that is,

(17)  N_{i,n} = \sum_{j=1}^{n} 1_{I_i}(Z_j).
It is easy to see that N_{i,n} = \sum_{j=1}^{n} 1_{I'_i}(U_j) and thus the sequence of vectors (N_{1,n}, …, N_{k,n}) is independent of (Y^1, …, Y^k). Using this we will prove in the next two lemmas that φ(n, Z) is asymptotically identically distributed as

(18)  \sum_{i=1}^{k} φ(N_{i,n}, Y^i).
In the sequel we will use the notation ψ(x) = x log x.
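The coupling (16), (17) is constructive. The sketch below (an added illustration; the breakpoints and weights are arbitrary choices, not from the paper) draws Z_n via U_n and the Y^i for k = 2 intervals and checks that the two expressions for N_{i,n} coincide:

```python
import random

random.seed(3)
x = [0.0, 0.3, 1.0]                 # breakpoints x_0 < x_1 < x_2
f_vals = [2.0, 4.0 / 7.0]           # f_1*d_1 + f_2*d_2 = 2*0.3 + (4/7)*0.7 = 1
d = [x[i + 1] - x[i] for i in range(2)]
p = [f_vals[i] * d[i] for i in range(2)]   # p_i = f_i * d_i
xp = [0.0, p[0], 1.0]                      # primed breakpoints x'_i

n = 50_000
counts_U = [0, 0]
counts_Z = [0, 0]
for _ in range(n):
    u = random.random()                    # U_j, uniform on [0, 1]
    i = 0 if u < xp[1] else 1              # which I'_i contains U_j
    z = random.uniform(x[i], x[i + 1])     # Y^i_j, uniform on I_i; Z_j by (16)
    counts_U[i] += 1                       # sum of 1_{I'_i}(U_j)
    counts_Z[0 if z < x[1] else 1] += 1    # N_{i,n} by definition (17)

# the two expressions for N_{i,n} agree, and N_{1,n}/n is close to p_1
print(counts_U, counts_Z, p)
```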
Lemma 6. If τ is an arbitrary random variable taking values in the set {1, …, m} then

\sqrt{n}\,ψ(X_{(τ)}) \xrightarrow{P} 0,

where X_{(τ)} denotes the τ-th order statistic from the sample of size n.

Proof. We have

P(\sqrt{n}\,|ψ(X_{(τ)})| ≥ ε) \le \sum_{i=1}^{m} P(\sqrt{n}\,|ψ(X_{(i)})| ≥ ε),

hence it is enough to prove that \sqrt{n}\,ψ(X_{(i)}) \xrightarrow{P} 0 for each fixed i ≥ 1. It is known that nX_{(i)} converges weakly to some random variable X (with Erlang's distribution). Since ψ is a continuous function we also have ψ(nX_{(i)}) \xrightarrow{d} ψ(X). It follows that

\sqrt{n}\,ψ(X_{(i)}) = \frac{ψ(nX_{(i)})}{\sqrt{n}} - \frac{\log n}{\sqrt{n}} (nX_{(i)}) \xrightarrow{P} 0.

Lemma 7. The statistic φ(n, Z) has the same asymptotic distribution as the statistic (18).
Proof. Let us consider the sequence Z_{α_{i,1}}, Z_{α_{i,2}}, … of those successive random variables Z_n whose values belong to the interval I_i. The sequence α_{i,n}, n ≥ 1, is determined by the sequence U, so it is independent of Y^i. Since by (16), Z_{α_{i,n}} = Y^i_{α_{i,n}}, the sequence Z_{α_{i,n}} has the same probability distribution as Y^i. It can be shown similarly that the joint distribution of the vector (Z_{α_{1,n}}, n ≥ 1, …, Z_{α_{k,n}}, n ≥ 1) is the same as that of (Y^1, …, Y^k).

It follows that the sequence of order statistics Z_{(1)}, …, Z_{(n)} has the same probability distribution as Y^1_{(1)}, …, Y^1_{(N_{1,n})}, …, Y^k_{(1)}, …, Y^k_{(N_{k,n})}. To prove our lemma we need to exclude the m-spacings that span different intervals I_i. We have N_{i,n} → ∞ with probability 1, so we can assume that each of the intervals I_i contains at least one full m-spacing. Then the m-spacings that are fully contained in I_i are built from the random variables

(19)  Y^i_{(τ_i)}, Y^i_{(τ_i+1)}, …, Y^i_{(τ_i+η_i)},

where τ_i = m + m[S_{i-1}/m] - S_{i-1}, η_i = m[S_i/m] - m[S_{i-1}/m] - m and S_i = \sum_{j=1}^{i} N_{j,n}. We have omitted the index n in the definitions of τ_i, η_i, S_i to simplify the notation. Let D = Y^{i+1}_{(τ_{i+1})} - Y^i_{(τ_i+η_i)} be the m-spacing containing x_i, which is the common endpoint of I_i and I_{i+1}. We can write

D = D_1 + D_2 = (x_i - Y^i_{(τ_i+η_i)}) + (Y^{i+1}_{(τ_{i+1})} - x_i).

We have

\sqrt{n}\,|ψ(D)| \le \sqrt{n}\,|ψ(D_1)| + \sqrt{n}\,|ψ(D_2)|

and from Lemma 6 we obtain \sqrt{n}\,ψ(D) \xrightarrow{P} 0. We have proven that the m-spacings spanning different intervals I_i are negligible.

Now τ_i and η_i are independent of Y^i, so the distribution of (19) will not change if we replace (19) by

(20)  Y^i_{(0)}, Y^i_{(1)}, …, Y^i_{(η_i)}.

It follows that φ(n, Z) has the same asymptotic distribution as \sum_{i=1}^{k} φ(η_i, Y^i). It is easy to see that the latter statistic is asymptotically identically distributed as (18) because 0 < N_{i,n} - η_i < 2m - 1 and N_{i,n} → ∞ with probability 1.
We will need some lemmas to find the asymptotic distribution of (18).
Lemma 8. Let X and X_n, n ≥ 1, be random variables such that X_n → 0 with probability 1 and \sqrt{n}\,X_n \xrightarrow{d} X. Then

(21)  \sqrt{n}\,|\log(1 + X_n) - X_n| \xrightarrow{P} 0.

Proof. (21) follows easily from the fact that for |x| ≤ 1/2 we have |\log(1+x) - x| ≤ x^2.
Lemma 9. Let X_{i,n}, i = 1, …, k, n ≥ 0, be the random variables defined as follows:

X_{i,n} = \frac{1}{σ_n} (φ(n, Y^i) - d_i(\log d_i + e_n)),

where e_n and σ_n are defined by (13) and (14). Then

(22)  \Big( X_{1,N_{1,n}}, …, X_{k,N_{k,n}}, \frac{N_{1,n} - np_1}{\sqrt{n}}, …, \frac{N_{k,n} - np_k}{\sqrt{n}} \Big) \xrightarrow{d} (d_1 X_1, …, d_k X_k, W_1, …, W_k),

where the X_i are independent and normally N(0, 1) distributed random variables, the vector (W_1, …, W_k) is independent of (X_1, …, X_k) and has the multivariate normal distribution N(0, Σ), where Σ = [σ_{i,j}] and

σ_{i,j} = -p_i p_j for i ≠ j,  σ_{i,j} = p_i - p_i^2 for i = j,  i, j = 1, …, k.
Proof. We will first show that

(23)  (X_{1,n}, …, X_{k,n}) \xrightarrow{d} (d_1 X_1, …, d_k X_k).

We have σ_n^{-1}(φ(n, Y^i/d_i) - e_n) \xrightarrow{d} X_i, where e_n and σ_n are given by (13) and (14) respectively. Transforming φ(n, Y^i/d_i) into φ(n, Y^i) we get X_{i,n} \xrightarrow{d} d_i X_i. As the X_{i,n} are independent, we also have (23). Now we show that

(24)  \Big( \frac{N_{1,n} - np_1}{\sqrt{n}}, …, \frac{N_{k,n} - np_k}{\sqrt{n}} \Big) \xrightarrow{d} (W_1, …, W_k).

This follows from the central limit theorem because N_{i,n} = \sum_{j=1}^{n} 1_{I'_i}(U_j), E(1_{I'_i}(U_1)) = p_i and Cov(1_{I'_i}(U_1), 1_{I'_j}(U_1)) = σ_{i,j}. Since the sequence N_{i,n} is independent of Y^i, (23) and (24) together give (22).
Lemma 10. We have

\Big( \sqrt{n}\,\log \frac{N_{1,n}}{np_1}, …, \sqrt{n}\,\log \frac{N_{k,n}}{np_k} \Big) \xrightarrow{d} \Big( \frac{1}{p_1} W_1, …, \frac{1}{p_k} W_k \Big).
Proof. Let α(x) = \log(1+x) - x. Then

\sqrt{n}\,\log \frac{N_{i,n}}{np_i} = \frac{N_{i,n} - np_i}{\sqrt{n}\,p_i} + \sqrt{n}\,α\Big( \frac{N_{i,n} - np_i}{np_i} \Big).

From Lemma 8 it follows that \sqrt{n}\,α((N_{i,n} - np_i)(np_i)^{-1}) \xrightarrow{P} 0. The rest of the proof follows from Lemma 9.
Now we can compute the asymptotic distribution of the statistic G_n when the underlying distribution has the density f given by (15).

Theorem 3. If X_1, X_2, …, X_n, … are i.i.d. random variables with the density f given by (15) then

(25)  \frac{G_n - e_{f,n}}{σ_{f,n}} \xrightarrow{d} N(0, 1),

where

(26)  e_{f,n} = E\Big( \frac{1}{f} \log \frac{1}{f} \Big) - \sum_{i=m+1}^{n+1} \frac{1}{i},

(27)  σ_{f,n}^2 = \frac{1}{n+2} \Big[ F(m+1)\,E\Big( \frac{1}{f^2} \Big) - 1 \Big].

An asymptotically equivalent form of e_{f,n} is

(28)  e_{f,n} = E\Big( \frac{1}{f} \log \frac{1}{f} \Big) - \log n + \sum_{i=1}^{m} \frac{1}{i} - γ.
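For a step density the expectations in (26) and (27) reduce to finite sums: E((1/f) log(1/f)) = \sum_i d_i \log(d_i/p_i) and E(1/f^2) = \sum_i d_i^2/p_i. The sketch below is an added illustration (the example density is an arbitrary choice); the uniform case f ≡ 1 recovers (6) and (14):

```python
import math

def constants(d, p, n, m, terms=100_000):
    # e_{f,n} and sigma^2_{f,n} from (26)-(27) for a step density with
    # interval lengths d_i and probabilities p_i = f_i d_i
    e1 = sum(di * math.log(di / pi) for di, pi in zip(d, p))   # E((1/f) log(1/f))
    e2 = sum(di**2 / pi for di, pi in zip(d, p))               # E(1/f^2)
    F = (m + 1) * sum(1.0 / i**2 for i in range(m + 1, m + 1 + terms))
    e_fn = e1 - sum(1.0 / i for i in range(m + 1, n + 2))
    s2_fn = (F * e2 - 1.0) / (n + 2)
    return e_fn, s2_fn

# uniform case: p_i = d_i, so e1 = 0 and e2 = 1
e_u, s2_u = constants([0.3, 0.7], [0.3, 0.7], n=99, m=2)
# a genuine step density: d = (0.3, 0.7), p = (0.6, 0.4)
e_f, s2_f = constants([0.3, 0.7], [0.6, 0.4], n=99, m=2)
print(e_u, s2_u, e_f, s2_f)
```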
Proof. As proved in Lemma 7 we can replace G_n by \sum φ(N_{i,n}, Y^i). Set C_m = F(m+1) - 1. From Lemmas 9 and 10 we get

\frac{1}{σ_n} \Big[ \sum_{i=1}^{k} φ(N_{i,n}, Y^i) - \sum_{i=1}^{k} d_i \log \frac{d_i}{np_i} - \sum_{i=1}^{m} \frac{1}{i} + γ \Big] \xrightarrow{d} \sum_{i=1}^{k} \frac{d_i}{\sqrt{p_i}} X_i - \frac{1}{\sqrt{C_m}} \sum_{i=1}^{k} \frac{d_i}{p_i} W_i,

where the W_i and X_i are defined in Lemma 9. It remains to compute the variance of the right-hand side:

Var\Big( \sum_{i=1}^{k} \frac{d_i}{\sqrt{p_i}} X_i - \frac{1}{\sqrt{C_m}} \sum_{i=1}^{k} \frac{d_i}{p_i} W_i \Big) = \sum_{i=1}^{k} \frac{d_i^2}{p_i} + \frac{1}{C_m} \sum_{i=1}^{k} \sum_{j=1}^{k} \frac{d_i d_j}{p_i p_j} σ_{i,j} = \sum_{i=1}^{k} \frac{d_i^2}{p_i} + \frac{1}{C_m} \Big( \sum_{i=1}^{k} \frac{d_i^2}{p_i} - 1 \Big).

From this we get

\frac{ \sum_{i=1}^{k} φ(N_{i,n}, Y^i) - E(f^{-1} \log f^{-1}) + \log n - \sum_{i=1}^{m} i^{-1} + γ }{ \sqrt{ (n+2)^{-1} [F(m+1)\,E(f^{-2}) - 1] } } \xrightarrow{d} N(0, 1).

Replacing \log n + γ in the expression above by \sum_{i=1}^{n+1} i^{-1}, which is asymptotically equivalent as was shown in the proof of Theorem 2, we obtain (26).
References

N. Cressie (1976), On the logarithms of high-order spacings, Biometrika 63, 343–355.
F. Czekała (1993), Asymptotic distributions of statistics based on logarithms of spacings, Zastos. Mat. 21, 511–519.
G. E. Del Pino (1979), On the asymptotic distribution of k-spacings with applications to goodness of fit tests, Ann. Statist. 7, 1058–1065.
J. R. Gebert and B. K. Kale (1969), Goodness of fit tests based on discriminatory information, Statist. Hefte 3, 192–200.
S. R. Jammalamadaka and R. C. Tiwari (1986), Efficiencies of some disjoint spacings tests relative to a χ² test, in: M. L. Puri, J. Vilaplana and W. Wertz (eds.), New Perspectives in Theoretical and Applied Statistics, Wiley, New York, 311–318.
B. K. Kale (1969), Unified derivation of tests of goodness of fit based on spacings, Sankhyā Ser. A 31, 43–48.

Franciszek Czekała
Mathematical Institute
University of Wrocław
Pl. Grunwaldzki 2/4
50-384 Wrocław, Poland
E-mail: czekala@math.uni.wroc.pl

Received on 24.1.1995