Robert Wieczorkowski, Ryszard Zieliński
Warszawa
Some properties and applications of a distribution-free quantile estimator
(Received by the Editors 30.10.1989)
Summary. In the paper Zieliński (1988) a distribution-free median-unbiased quantile estimator was proposed. We study some properties of that estimator, asymptotic properties among them, and we discuss its usefulness as a robust estimator of a quantile when the exponential model is maximally violated.
1. Distribution-free median-unbiased quantile estimator. Let $\mathcal{F}$ be the family of all continuous cumulative distribution functions (cdf's) such that if $F \in \mathcal{F}$, then there exists an interval $(a,b)$, $-\infty \le a < b \le \infty$, on which $F$ is strictly increasing, and $F(a) = 0$, $F(b) = 1$. Given a sample $X_1, X_2, \ldots, X_n$, a statistic $T = T(X_1, X_2, \ldots, X_n)$ is said to be a distribution-free median-unbiased estimator of the $\gamma$-quantile if for all $F \in \mathcal{F}$
$$P_F\{T < x_\gamma(F)\} \le \tfrac12 \le P_F\{T \le x_\gamma(F)\},$$
where $x_\gamma(F)$ is the quantile of order $\gamma$ of $F$.
Given an estimator $T$, let us define the function $C_T : [0,1] \to \mathbb{R}$ by the formula
$$C_T(\beta) = P_F\{T \le x_\beta(F)\}. \tag{1}$$
For all estimators $T$ considered in this paper the function $C_T$ does not depend on $F$.
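The estimator $T^*$ proposed in Zieliński (1988) satisfies
$$C_{T^*}(\beta) \ge C_T(\beta) \ \text{ if } \beta > \gamma, \qquad C_{T^*}(\beta) \le C_T(\beta) \ \text{ if } \beta < \gamma,$$
for all distribution-free median-unbiased estimators $T$ from a given class $\mathcal{T}$ of such estimators (that is, $T^*$ is the most concentrated estimator in the class $\mathcal{T}$). The estimator $T^*$ has the following form. Let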
$$\pi_k(q) = \sum_{j=k}^{n} \binom{n}{j} q^j (1-q)^{n-j}, \qquad 0 < q < 1.$$
Given $\gamma \in (0,1)$ and $n \ge -\log 2 / \log(\max\{\gamma, 1-\gamma\})$, let $k^*$ be the positive integer such that
$$\pi_{k^*}(\gamma) \ge \tfrac12 > \pi_{k^*+1}(\gamma). \tag{2}$$
Let
$$\lambda^* = \frac{\tfrac12 - \pi_{k^*+1}(\gamma)}{\pi_{k^*}(\gamma) - \pi_{k^*+1}(\gamma)}.$$
Then $T^* = X_{J:n}$, where $X_{j:n}$ is the $j$th order statistic from $X_1, X_2, \ldots, X_n$, and $J$ is a random variable, independent of $X_1, X_2, \ldots, X_n$, and distributed as
$$P\{J = k^*\} = \lambda^* = 1 - P\{J = k^* + 1\}.$$
The condition "$n \ge -\log 2 / \log(\max\{\gamma, 1-\gamma\})$" is intrinsic; given $\gamma \in (0,1)$, $T^*$ exists iff the sample size $n$ satisfies that inequality.
The values of $k^*$ and $\lambda^*$ for some $\gamma \le 1/2$ and $n$ are presented in Tab. 1. The last row of that table contains the minimal $n = n(\gamma)$ guaranteeing the existence of $T^*$. For $\gamma > 1/2$ we have $k^*(\gamma) = n - k^*(1-\gamma)$, $\lambda^*(\gamma) = 1 - \lambda^*(1-\gamma)$, and $n(\gamma) = n(1-\gamma)$.
The functions $C_{T^*}(\beta)$ for $T^*$, if $\gamma = 1/2$ and if $\gamma = 1/5$, for some $n$, are sketched in Fig. 1.
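The construction of $T^*$ described in this section can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the paper: the function names `pi_k`, `min_sample_size`, `k_and_lambda` and `t_star` are hypothetical, and the existence condition is checked in the equivalent form $\max\{\gamma, 1-\gamma\}^n \le 1/2$.

```python
import math
import random

def pi_k(k, n, q):
    """pi_k(q) = sum_{j=k}^{n} C(n,j) q^j (1-q)^(n-j); equals P{X_{k:n} <= x_q(F)}."""
    return sum(math.comb(n, j) * q**j * (1 - q)**(n - j) for j in range(k, n + 1))

def min_sample_size(gamma):
    """Smallest n with max{gamma, 1-gamma}^n <= 1/2 (the existence condition)."""
    n = 1
    while max(gamma, 1 - gamma) ** n > 0.5:
        n += 1
    return n

def k_and_lambda(gamma, n):
    """Return (k*, lambda*) determined by condition (2); raise if T* does not exist."""
    if max(gamma, 1 - gamma) ** n > 0.5:
        raise ValueError("sample size too small for this gamma")
    for k in range(1, n + 1):
        upper, lower = pi_k(k, n, gamma), pi_k(k + 1, n, gamma)
        if upper >= 0.5 > lower:
            return k, (0.5 - lower) / (upper - lower)
    raise RuntimeError("no k* found (floating-point edge case)")

def t_star(sample, gamma, rng=random):
    """Draw J in {k*, k*+1} with P{J = k*} = lambda* and return X_{J:n}."""
    k, lam = k_and_lambda(gamma, len(sample))
    j = k if rng.random() < lam else k + 1
    return sorted(sample)[j - 1]
```

For the median ($\gamma = 1/2$) and $n = 5$ the sketch gives $k^* = 3$, $\lambda^* = 1$, so $T^*$ is the ordinary sample median; for $n = 4$ it gives $k^* = 2$, $\lambda^* = 1/2$, so $T^*$ is $X_{2:4}$ or $X_{3:4}$ with equal probability.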
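2. Asymptotic properties of $T^*$. Let $T_n^*$ denote $T^*$ from the sample $X_1, X_2, \ldots, X_n$, and let
$$C_n(\beta) = P_F\{T_n^* \le x_\beta(F)\}. \tag{3}$$
Theorem 1 (consistency). If $T_n^*$ is an estimator of the $\gamma$-quantile, then for every $\beta \in (\gamma, 1)$
$$C_n(\beta) \to 1 \quad \text{as } n \to \infty,$$
and for every $\beta \in (0, \gamma)$
$$C_n(\beta) \to 0 \quad \text{as } n \to \infty.$$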
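Proof. By the Berry-Esseen inequality [cf. Feller (1966), Vol. 2, Ch. 16.5] we have
$$\left| \pi_k(\gamma) - \bar\Phi\!\left(\frac{k - 1 - n\gamma}{\sqrt{n\gamma(1-\gamma)}}\right) \right| \le \frac{g(\gamma)}{\sqrt{n}}, \tag{4}$$
where $\bar\Phi(t) = 1 - \Phi(t)$, $\Phi$ is the cdf of the standard normal distribution $N(0,1)$, $g(\gamma) = c[\gamma^2 + (1-\gamma)^2]/\sqrt{\gamma(1-\gamma)}$, and $c$ is the Berry-Esseen constant.
By (2) and (4)
$$\bar\Phi\!\left(\frac{k^* - n\gamma}{\sqrt{n\gamma(1-\gamma)}}\right) - \frac{g(\gamma)}{\sqrt{n}} \le \pi_{k^*+1}(\gamma) < \tfrac12 \le \pi_{k^*}(\gamma) \le \bar\Phi\!\left(\frac{k^* - 1 - n\gamma}{\sqrt{n\gamma(1-\gamma)}}\right) + \frac{g(\gamma)}{\sqrt{n}},$$
hence
$$k^* = n\gamma + \varepsilon,$$
where $\varepsilon$ is a number from $[0,1]$.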
The function $C_n(\beta)$ can be written as
$$C_n(\beta) = \lambda^* \pi_{k^*}(\beta) + (1 - \lambda^*) \pi_{k^*+1}(\beta)$$
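and by the above estimates we obtain
$$\lambda^* \bar\Phi\!\left(\frac{n(\gamma-\beta) + \varepsilon - 1}{\sqrt{n\beta(1-\beta)}}\right) + (1-\lambda^*)\, \bar\Phi\!\left(\frac{n(\gamma-\beta) + \varepsilon}{\sqrt{n\beta(1-\beta)}}\right) - \frac{g(\beta)}{\sqrt{n}} \le C_n(\beta) \le \lambda^* \bar\Phi\!\left(\frac{n(\gamma-\beta) + \varepsilon - 1}{\sqrt{n\beta(1-\beta)}}\right) + (1-\lambda^*)\, \bar\Phi\!\left(\frac{n(\gamma-\beta) + \varepsilon}{\sqrt{n\beta(1-\beta)}}\right) + \frac{g(\beta)}{\sqrt{n}}. \tag{5}$$
For $\beta > \gamma$ the arguments of $\bar\Phi$ tend to $-\infty$, and for $\beta < \gamma$ they tend to $+\infty$, which proves the theorem.
Consider the equation $C_n(\beta) = \alpha$ and let us denote its solution by $q_n(\alpha)$. The "quartile deviation" $r_n = q_n(3/4) - q_n(1/4)$ is a measure of concentration of the distribution of $T_n^*$. To estimate $q_n(1/4)$, $q_n(3/4)$, and $r_n$, we proceed as follows.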
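By (5) and the identity $\bar\Phi(t) = \Phi(-t)$ we can write $C_n(\beta)$ in the asymptotic (as $n \to \infty$) form
$$C_n(\beta) = \Phi\!\left(\frac{\sqrt{n}(\beta - \gamma)}{\sqrt{\beta(1-\beta)}}\right) + O\!\left(\frac{1}{\sqrt{n}}\right).$$
Replacing $\beta(1-\beta)$ by $\gamma(1-\gamma)$ in the denominator and substituting, in the equation $C_n(\beta) = \alpha$, the left-hand side by $\Phi\big(\sqrt{n}(\beta-\gamma)/\sqrt{\gamma(1-\gamma)}\big)$, we obtain
$$q_n(1/4) \approx \gamma - z\sqrt{\frac{\gamma(1-\gamma)}{n}}, \qquad q_n(3/4) \approx \gamma + z\sqrt{\frac{\gamma(1-\gamma)}{n}}, \tag{6}$$
where $z = \Phi^{-1}(3/4) \approx 0.674490$. Hence
$$r_n \approx 2z\sqrt{\frac{\gamma(1-\gamma)}{n}}. \tag{7}$$
The accuracy of (6) is illustrated in Tab. 2; the two numbers in the first row of each entry are the exact values of $q_n(1/4)$ and $q_n(3/4)$, and the numbers in the second row are those calculated by the formula $\gamma \pm z\sqrt{\gamma(1-\gamma)/n}$. Observe that $q_n(1/4)$ for a given $\gamma \in (0,1)$ is equal to $1 - q_n(3/4)$ for $1 - \gamma$.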
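The comparison between exact and approximate quartiles can be reproduced numerically. The sketch below is an assumption-laden illustration rather than the authors' procedure: it solves $C_n(\beta) = \alpha$ by plain bisection (using that $C_n$ is increasing in $\beta$) and evaluates the normal approximation $\gamma \pm z\sqrt{\gamma(1-\gamma)/n}$. All function names are hypothetical.

```python
import math
from statistics import NormalDist

def tail_probs(n, q):
    """t[k] = pi_k(q) = P{Bin(n, q) >= k} for k = 0, ..., n + 1."""
    pmf = [math.comb(n, j) * q**j * (1 - q)**(n - j) for j in range(n + 1)]
    t = [0.0] * (n + 2)
    for k in range(n, -1, -1):
        t[k] = t[k + 1] + pmf[k]
    return t

def k_and_lambda(gamma, n):
    """(k*, lambda*) from condition (2); T* must exist for this gamma and n."""
    if max(gamma, 1 - gamma) ** n > 0.5:
        raise ValueError("T* does not exist for this gamma and n")
    t = tail_probs(n, gamma)
    for k in range(1, n + 1):
        if t[k] >= 0.5 > t[k + 1]:
            return k, (0.5 - t[k + 1]) / (t[k] - t[k + 1])
    raise RuntimeError("no k* found (floating-point edge case)")

def c_n(beta, k, lam, n):
    """C_n(beta) = lambda* pi_{k*}(beta) + (1 - lambda*) pi_{k*+1}(beta)."""
    t = tail_probs(n, beta)
    return lam * t[k] + (1 - lam) * t[k + 1]

def q_n(alpha, gamma, n, iterations=60):
    """Solve C_n(beta) = alpha for beta by bisection on (0, 1)."""
    k, lam = k_and_lambda(gamma, n)
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if c_n(mid, k, lam, n) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def approx_quartiles(gamma, n):
    """Normal approximation gamma -/+ z sqrt(gamma (1 - gamma) / n), z = Phi^{-1}(3/4)."""
    z = NormalDist().inv_cdf(0.75)
    d = z * math.sqrt(gamma * (1 - gamma) / n)
    return gamma - d, gamma + d
```

Calling `q_n(0.25, gamma, n)` and `q_n(0.75, gamma, n)` next to `approx_quartiles(gamma, n)` gives the two rows of an entry in the style of Tab. 2; the symmetry between $\gamma$ and $1-\gamma$ noted above can be checked the same way.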
3. $T^*$ as a robust estimator in the simplest exponential model. Given $\gamma \in (0,1)$ and $n \ge 1$, let us consider the problem of estimating the $\gamma$-quantile of the exponential distribution with density $f_\theta(x) = \theta^{-1}\exp(-x/\theta)$, $x > 0$, $\theta > 0$, $\theta$ unknown. The optimal median-unbiased estimator is $c_\gamma S$, where
$$S = \sum_{i=1}^{n} X_i \quad \text{and} \quad c_\gamma = \frac{-2\log(1-\gamma)}{M_{2n}},$$
and $M_{2n}$ is the median of the chi-square distribution with $2n$ degrees of freedom (cf. Lehmann (1986), Ch. 3.5). The quartile deviation of this estimator is given by the formula
$$R_n = \frac{c_\gamma \theta}{2}\,\big(M_{2n}(3/4) - M_{2n}(1/4)\big),$$
where $M_{2n}(\alpha)$ is the $\alpha$-quantile of the chi-square distribution with $2n$ degrees of freedom.
Theorem 2. If $n \to \infty$, then
$$R_n = -\frac{2z\log(1-\gamma)}{\sqrt{n}}\,\theta + O\!\left(\frac{1}{n}\right), \tag{8}$$
where $z \approx 0.674490$ (as in (6)).
Proof. Let $\chi^2_m$ be a random variable with chi-square distribution with $m$ degrees of freedom. By the asymptotic formula
$$P\{\chi^2_m \le x\} = \Phi\!\left(\frac{x - m}{\sqrt{2m}}\right) + O\!\left(\frac{1}{\sqrt{m}}\right)$$
we obtain
$$M_{2n}(3/4) = 2n + 2z\sqrt{n} + O(1), \qquad M_{2n}(1/4) = 2n - 2z\sqrt{n} + O(1), \qquad M_{2n}(1/2) = 2n + O(1),$$
and hence the theorem.
In Tab. 3, the exact values of $R_n/\theta$ (the upper number) and the values $-2z\log(1-\gamma)/\sqrt{n}$ (the lower number) are presented. It appears that the approximation (8) is quite satisfactory even for small values of $n$.
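A comparison of this kind can be sketched with the standard library alone, because the chi-square cdf with an even number $2n$ of degrees of freedom has the closed form $1 - e^{-x/2}\sum_{j<n}(x/2)^j/j!$. The code below is a hedged illustration, not the computation used for Tab. 3: quantiles are found by bisection, the function names are hypothetical, and it assumes $S = \sum X_i$ with $c_\gamma = -2\log(1-\gamma)/M_{2n}$, so that $R_n/\theta = -\log(1-\gamma)\,(M_{2n}(3/4) - M_{2n}(1/4))/M_{2n}$.

```python
import math
from statistics import NormalDist

def chi2_cdf_even(x, n):
    """cdf of the chi-square distribution with 2n degrees of freedom at x >= 0.

    Uses the closed form for even degrees of freedom; intended for moderate n
    (exp(-x/2) underflows when x/2 is in the high hundreds).
    """
    h = x / 2.0
    term = math.exp(-h)          # j = 0 term of exp(-h) * sum_{j<n} h^j / j!
    total = term
    for j in range(1, n):
        term *= h / j
        total += term
    return 1.0 - total

def chi2_quantile_even(alpha, n, iterations=100):
    """alpha-quantile M_2n(alpha), found by bisection on a generous bracket."""
    lo, hi = 0.0, 2.0 * n + 20.0 * math.sqrt(n) + 20.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if chi2_cdf_even(mid, n) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def r_n_exact(gamma, n):
    """Exact R_n / theta under the assumptions stated in the lead-in."""
    m25, m50, m75 = (chi2_quantile_even(a, n) for a in (0.25, 0.5, 0.75))
    return -math.log(1 - gamma) * (m75 - m25) / m50

def r_n_approx(gamma, n):
    """Leading term of (8) divided by theta: -2 z log(1 - gamma) / sqrt(n)."""
    z = NormalDist().inv_cdf(0.75)
    return -2 * z * math.log(1 - gamma) / math.sqrt(n)
```

For $n = 1$ the chi-square distribution with 2 degrees of freedom is exponential, so `r_n_exact(0.5, 1)` reduces to $\log 3$, which gives a quick sanity check of the bisection.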
In applications, the exponential distribution is usually only an approximation. But if the random variable under consideration is not exactly exponentially distributed, then the estimator $c_\gamma S$ is no longer median-unbiased. If we still insist on having a median-unbiased estimator, we can use $T^*$ instead of $c_\gamma S$. One can expect, however, that the quartile deviation $r_n^{(E)}$ of $T_n^*$ in the exponential model will be greater than $R_n$. A natural question arises: how large should the sample size $N$ be for $T_N^*$ to attain the accuracy of the optimal estimator $c_\gamma S$ from a sample of size $n$?
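For the $\gamma$-quantile $x_\theta(\gamma)$ of the exponential distribution with density $f_\theta(x) = \theta^{-1}\exp(-x/\theta)$ we have
$$x_\theta(\gamma) = -\theta \log(1 - \gamma).$$
Hence the quartile deviation $r_N^{(E)}$ of $T_N^*$ in the exponential model is given by the formula
$$r_N^{(E)} = \theta \log\frac{1 - q_N(1/4)}{1 - q_N(3/4)}. \tag{9}$$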
To answer the above question we should find, given $\gamma \in (0,1)$ and $n \ge 1$, the smallest $N(n)$ such that for all $N \ge N(n)$
$$r_N^{(E)} \le R_n,$$
or the smallest $N(n)$ such that for all $N \ge N(n)$
$$\log\frac{1 - q_N(1/4)}{1 - q_N(3/4)} \le \frac{c_\gamma}{2}\,\big(M_{2n}(3/4) - M_{2n}(1/4)\big). \tag{10}$$
Let us recall that, given $\gamma \in (0,1)$, the estimator $T_N^*$ of the $\gamma$-quantile exists iff $N \ge -\log 2 / \log(\max\{\gamma, 1-\gamma\})$. This makes the estimator $T_N^*$ highly uneconomical in the exponential model, especially if $\gamma$ is close to 0 or close to 1. Minimal values $N(n)$ for some $\gamma \in (0,1)$ and $n \ge 1$ are given in Tab. 4 (the upper number in each entry).
By (6), (7), and (8), an approximation for $N = N(n)$ is given by the inequality
$$\frac{1 - \gamma + z\sqrt{\gamma(1-\gamma)/N}}{1 - \gamma - z\sqrt{\gamma(1-\gamma)/N}} \le \exp\!\left(\frac{-2z\log(1-\gamma)}{\sqrt{n}}\right).$$
A somewhat less exact but simpler approximation gives us
$$\sqrt{N} \ge -\frac{\sqrt{n}}{\log(1-\gamma)}\sqrt{\frac{\gamma}{1-\gamma}}.$$
The smallest integer $N$ satisfying this inequality can be written as $\lceil nA \rceil$, where $\lceil a \rceil$ denotes the smallest integer greater than or equal to $a$, and $A = \gamma / [(1-\gamma)\log^2(1-\gamma)]$. The values of $N(n)$ calculated by this formula are given in the second row of Tab. 4, and the values of $A$ are given in Tab. 5.
From Tab. 4 and Tab. 5 one can conclude that the price for robustness is, in the considered case, rather high, especially if one wants to estimate a $\gamma$-quantile with $\gamma$ close to zero or close to one. On the other hand, to estimate a $\gamma$-quantile for $\gamma \in (0.5, 0.94)$ by the robust $T_N^*$, it is enough to double the sample size of the optimal estimator $c_\gamma S$.
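The simpler approximation $N \approx \lceil nA \rceil$ with $A = \gamma/[(1-\gamma)\log^2(1-\gamma)]$ is easy to evaluate directly. The following is a minimal sketch with hypothetical function names; it reproduces only the approximate values, not the exact minimal $N(n)$.

```python
import math

def price_factor(gamma):
    """A = gamma / ((1 - gamma) * log^2(1 - gamma)), natural logarithm."""
    return gamma / ((1 - gamma) * math.log(1 - gamma) ** 2)

def approx_sample_size(n, gamma):
    """Smallest integer N with N >= n * A, i.e. ceil(n * A)."""
    return math.ceil(n * price_factor(gamma))
```

For example, `price_factor(0.5)` equals $1/\log^2 2 \approx 2.08$, so roughly twice as many observations are needed for the median, while for $\gamma = 0.1$ the factor is about 10.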
References
[1] W. Feller, An introduction to probability theory and its applications, Vol. 2, Wiley 1966.
[2] E. L. Lehmann, Testing statistical hypotheses, Wiley 1986.