
ASYMPTOTICALLY OPTIMAL ESTIMATION OF SMOOTH FUNCTIONALS FOR INTERVAL CENSORING, CASE 2

BY RONALD GESKUS AND PIET GROENEBOOM

Municipal Health Service, Amsterdam and Delft University of Technology

For a version of the interval censoring model, case 2, in which the observation intervals are allowed to be arbitrarily small, we consider estimation of functionals that are differentiable along Hellinger differentiable paths. The asymptotic information lower bound for such functionals can be represented as the squared $L_2$-norm of the canonical gradient in the observation space. This canonical gradient has an implicit expression as a solution of an integral equation that does not belong to one of the standard types. We study an extended version of the integral equation that can also be used for discrete distribution functions like the nonparametric maximum likelihood estimator (NPMLE), and derive the asymptotic normality and efficiency of the NPMLE from properties of the solutions of the integral equations.

1. Introduction. In the interval censoring problem, one wants to obtain information on some distribution F, often representing an event time distribution, based on a sample of random intervals $J_1,\dots,J_n$ in which unobservable $X_1,\dots,X_n\sim F$ are known to be contained. In case 1, we have a sample of observation times $T_i$ and we know whether $X_i$ is smaller or larger than the corresponding observation time $T_i$. More formally: we observe $(T_1,\Delta_1),\dots,(T_n,\Delta_n)$, with $\Delta_i=1_{\{X_i\le T_i\}}$. Case 2 is usually denoted as the situation with two observation times $(U_i,V_i)$ and the information whether $X_i$ is left of $U_i$, between $U_i$ and $V_i$, or right of $V_i$.

Received March 1996; revised February 1999.
AMS 1991 subject classifications. 60F17, 62E20, 62G05, 62G20, 45A05.
Key words and phrases. Nonparametric maximum likelihood, empirical processes, asymptotic distributions, asymptotic efficiency, integral equations.

For case 1, also denoted by current status data, quite a lot is known. It is already shown in Ayer, Brunk, Ewing, Reid and Silverman (1955) and van Eeden (1956) that there exists a one-step procedure for calculation of the nonparametric maximum likelihood estimator (NPMLE) $\hat F_n$ of the distribution function F, based on isotonic regression theory. The asymptotic distribution of $\hat F_n(t_0)$, for fixed $t_0\in\mathbb R$, is derived in Part II, Chapter 5 of Groeneboom and Wellner (1992); $n^{1/3}$ is the obtained convergence rate. The same chapter discusses convergence properties of the NPMLE $\mu(\hat F_n)$ of the mean $\mu(F)$, and it is shown that, under some extra conditions,

$$\sqrt n\,\bigl(\mu(\hat F_n)-\mu(F)\bigr)\ \xrightarrow{\ \mathcal D\ }\ N\Bigl(0,\int\frac{F(x)\{1-F(x)\}}{g(x)}\,dx\Bigr)\qquad\text{as } n\to\infty,$$

with $g(x)$ the density of the distribution of the observation times, and $\xrightarrow{\mathcal D}$ denoting convergence in distribution. Huang and Wellner (1995) prove a similar result for functionals that are linear in F: $K(F)=\int c\,dF$. Then an extra factor $c'(x)^2$ appears in the integral formula for the limit variance. Groeneboom's proof for the mean uses the convergence rate of the supremum distance between the NPMLE and the underlying distribution function; in Huang and Wellner's proof this is replaced by an easier argument based on $L_2$-distance properties of $\hat F_n$ and smoothness of $c'(x)/g(x)$.
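The isotonic-regression characterization of the case 1 NPMLE mentioned above can be sketched with the pool-adjacent-violators algorithm: after sorting by $T_i$, the values $\hat F_n(T_{(1)})\le\dots\le\hat F_n(T_{(n)})$ are the nondecreasing least squares fit to the corresponding $\Delta_i$. This is a minimal illustration, not the authors' code; the function name `current_status_npmle` is hypothetical, and ties among the $T_i$ are not treated specially.

```python
def current_status_npmle(times, deltas):
    """Pool-adjacent-violators sketch for the case 1 (current status) NPMLE.

    Returns the sorted observation times and the fitted values
    F_hat(T_(1)) <= ... <= F_hat(T_(n)), i.e. the isotonic regression of the deltas.
    """
    pairs = sorted(zip(times, deltas))
    sorted_times = [t for t, _ in pairs]
    blocks = []  # each block is [sum of deltas, number of points]; its value is the mean
    for _, d in pairs:
        blocks.append([float(d), 1])
        # Pool the last two blocks as long as they violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    values = []
    for s, c in blocks:
        values.extend([s / c] * c)
    return sorted_times, values


if __name__ == "__main__":
    t, f = current_status_npmle([0.9, 0.1, 0.5, 0.3, 0.7], [1, 0, 0, 1, 1])
    print(t)  # [0.1, 0.3, 0.5, 0.7, 0.9]
    print(f)  # [0.0, 0.5, 0.5, 1.0, 1.0]
```

Pooling adjacent violators gives the same fit as taking left derivatives of the greatest convex minorant of the cumulative sum diagram, which is the usual way the one-step procedure is described.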

In case 2 one may expect better estimation results than in case 1, since one has more information on the location of the $X_i$'s. However, both theoretical as well as practical aspects of the problem are more complicated. Only iterative procedures are available for computation of the NPMLE $\hat F_n$ of F. The iterative convex minorant algorithm, as introduced by Groeneboom in Part II of Groeneboom and Wellner (1992), converges quickly in computer experiments, and a slight modification of this algorithm is shown to converge in Jongbloed (1998).

When considering the asymptotic distribution of the NPMLE $\hat F_n(t_0)$, a distinction should be made. If the relative amount of mass of the $(U,V)$-distribution near the diagonal point $(t_0,t_0)$, compared to the amount of mass of F near $t_0$, is very small, we are more in a case 1-type situation and we still have an $n^{1/3}$ convergence rate [Wellner (1995)]. The limit distribution of $\hat F_n(t_0)$ has been established in Groeneboom (1996) for the situation that U and V remain bounded away from each other. If the observation time distribution has sufficient mass along the diagonal, the convergence rate increases to $(n\log n)^{1/3}$ [see Groeneboom and Wellner (1992)]. There is a conjecture on the asymptotic distribution of $\hat F_n(t_0)$ in this situation, but the proof is still incomplete.

Another ‘‘estimator’’ is $F_n^{(1)}(t_0)$, which is obtained by doing one step in the iterative convex minorant algorithm, with the true underlying distribution F as starting value. Of course, this procedure, which does not even lead to an estimator in the strict sense, has no practical value. However, it may be relevant for theoretical purposes, for the asymptotic distribution of $F_n^{(1)}(t_0)$ is known, and it is conjectured to have the same asymptotic distribution as the NPMLE.

Contrary to $F(t_0)$, for case 1 as well as case 2, the mean $\mu(F)$ is a smooth linear functional. This means that it is differentiable along Hellinger differentiable paths of distributions. For any functional having this property, one can derive a Hájek–LeCam convolution theorem type information lower bound, giving the best possible limit variance that can be attained under $\sqrt n$ convergence rate. See, for example, van der Vaart (1991), Part I of Groeneboom and Wellner (1992) or Bickel, Klaassen, Ritov and Wellner (1993) for the general theory and the application to case 1. For case 1, an explicit expression of the information lower bound can be derived and is given by

$$\int c'(x)\,\phi(x)\,dx,$$

with $\phi(x)$ being the function

$$\phi(x)=c'(x)\,\frac{F(x)\{1-F(x)\}}{g(x)}.$$

So this lower bound is attained by the NPMLE.
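As a numerical illustration of the case 1 formula, the sketch below evaluates $\int c'(x)\phi(x)\,dx$ with $\phi=c'F(1-F)/g$ by the midpoint rule. The inputs are illustrative assumptions, not taken from the paper: F and the observation-time distribution both uniform on [0, 1] and $c(x)=x$ (the mean), for which the bound is $\int_0^1x(1-x)\,dx=1/6$, twice the variance 1/12 of X itself.

```python
def case1_lower_bound(c_prime, F, g, a=0.0, b=1.0, n=100_000):
    """Midpoint-rule approximation of int c'(x) phi(x) dx with phi = c' F (1 - F) / g."""
    step = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * step
        phi = c_prime(x) * F(x) * (1.0 - F(x)) / g(x)
        total += c_prime(x) * phi * step
    return total


# Mean functional (c'(x) = 1), F uniform and observation times uniform (g = 1).
bound = case1_lower_bound(lambda x: 1.0, lambda x: x, lambda x: 1.0)
print(round(bound, 6))  # 0.166667
```

Replacing g by another observation-time density only changes the `g` argument; the midpoint rule never evaluates the endpoints, so densities vanishing at 0 or 1 cause no division by zero there.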

The function φ appearing in the information lower bound has an analogue in case 2. However, contrary to case 1, an explicit formula for φ is unknown and unlikely to exist, except for some very special choices of the distributions of X and $(U,V)$ [see Geskus (1997)]. Nevertheless it can be proved, without knowing φ explicitly, that the NPMLE $K(\hat F_n)$, estimating the smooth functional $K(F)$, shows asymptotically optimal behavior in case 2 as well, using properties of the integral equation for φ. Just as in the problem of estimation of $F(t_0)$, two situations can be distinguished, showing different behavior. In Geskus and Groeneboom (1996, 1997), together from now on denoted as GG, the simplest case is treated, with the $(U,V)$-distribution having no mass along the diagonal. Smoothness properties of φ are derived from the structure of the integral equation and are sufficient to give the proof. The technical report Geskus and Groeneboom (1995) contains both papers. See also Geskus (1997), which contains a rather extensive treatment of the application of the general lower bound theory to case 1 and this simple version of case 2. In the present paper, we treat the situation where the observation times can be arbitrarily close. In the long run we will have observations for which we get very close to observing X itself, a situation that never occurs in case 1 and the case 2 treated in GG. Now the analysis is much more complicated, since we really have to deal with the singularity of the integrand of the integral equation on the diagonal.

Our general outline is as follows. In Section 2, after a specification of the problem, the relevant lower bound calculations are given. In Section 3 we show asymptotic efficiency of the NPMLE; here we benefit from some recent results on the Hellinger distance between the NPMLE and the underlying distribution function in van de Geer (1996). Section 3 also contains our main result, Theorem 3.2, showing that the NPMLE $K(\hat F_n)$ of a smooth functional $K(F_0)$ of the underlying distribution function $F_0$ converges at $\sqrt n$-rate, is asymptotically normal and has an asymptotic variance that is equal to the information lower bound.

In Section 4 we present some simulation results showing that the variance of the NPMLE is already very close to the information lower bound for reasonable sample sizes. In this section we also show some pictures of the solution of the integral equation and its derivative for the case that the distributions are uniform [F uniform and the distribution of $(U,V)$ uniform on the upper triangle of the unit square] and the smooth functional corresponds to the mean. These graphs are compared to graphs of a corresponding solution of the integral equation when F is replaced by the NPMLE (and the equation is transformed into another equation in an inverse scale). Finally, in the Appendix, some technical results are proved that are needed in Section 3.

Of course, situations with more than two observation times for each unobservable event time may occur as well. This is usually denoted as the case k situation. However, only the two observation times immediately around the event time give relevant information, so it very much resembles case 2. We will only consider case 2. See also GG, where case k is briefly discussed.

Unless otherwise stated, all norms in this paper should be read as $L_2$-norms. The dominating measure varies, but is denoted by a subscript, or should be clear from the context. The $L_2^0$-space denotes the subspace of $L_2$ functions that integrate to zero.

2. The model. Lower bound considerations. We will start with a brief formulation of the problem and a list of assumptions that are needed in order to perform the lower bound calculations. Moreover, a key equation on which our analysis will be based is given. A more elaborate introduction, including a derivation of this equation, can be found in GG and Geskus (1997).

2.1. The model. General lower bound theory for smooth functionals. Let F be a distribution function. We are interested in estimation of some functional $K(F)$. However, instead of a sample $X_1,\dots,X_n\sim F$, we are only able to observe the sample $(U_1,V_1,\Delta_1,\Gamma_1),\dots,(U_n,V_n,\Delta_n,\Gamma_n)$ with $\Delta_i=1_{\{X_i\le U_i\}}$ and $\Gamma_i=1_{\{U_i<X_i\le V_i\}}$. The following model assumptions are made:

(M1) Let $K>0$ and let S be a bounded interval $\subset\mathbb R$. F is contained in the class

$$\mathcal F_S:=\Bigl\{F:\ \operatorname{support}(F)\subset S,\ F\text{ absolutely continuous},\ \sup_x|f(x)|\le K\Bigr\}.$$

Here F is the distribution on which we want to obtain information; however, we do not observe $X_i\sim F$ directly.

(M2) Instead, we observe the pairs $(U_i,V_i)$, with distribution function H. Now H is contained in $\mathcal H$, the collection of all two-dimensional distributions on $\{(u,v): u<v\}$, absolutely continuous with respect to two-dimensional Lebesgue measure and such that $(U_i,V_i)$ is independent of $X_i$. Let h denote the density of $(U_i,V_i)$, with marginal densities and distribution functions $h_1,H_1$ and $h_2,H_2$ for $U_i$ and $V_i$, respectively.

(M3) If both $H_1$ and $H_2$ put zero mass on some set A, then $F\in\mathcal F_S$ has zero mass on A as well, so $F\ll H_1+H_2$. This means that F does not have mass on sets in which no observations can occur.

Typically, under (M1), F has support $[0,M]$, and H has its mass on the triangle $\{(u,v): 0\le u<v\le M\}$. From now on we will restrict attention to this typical situation. More general choices of the support and the situation with $H_1+H_2$ having larger support than F have been treated in Geskus (1997), but are similar in essence. Without condition (M3), the functionals in which we are interested are not well defined. Moreover, if F has probability mass on a region in which no observations can occur, one cannot estimate the structure of F on this region consistently. In many survival studies, events do occur outside the domain of observation, making estimation of functionals like the mean an impossible task.

The observable random vectors $(U_i,V_i,\Delta_i,\Gamma_i)$ have density

$$q_{F,H}(u,v,\delta,\gamma)=h(u,v)\,F(u)^{\delta}\,\{F(v)-F(u)\}^{\gamma}\,\{1-F(v)\}^{1-\delta-\gamma}$$

with respect to $\lambda_2\times\nu_2$, where $\nu_2$ denotes counting measure on the set $\{(0,1),(1,0),(0,0)\}$. The letter ‘‘M’’ in conditions (M1) to (M3) stands for ‘‘model.’’
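To make the observation scheme concrete, the sketch below simulates case 2 data and evaluates the F-dependent part of $\sum_i\log q_{F,H}$, namely $\sum_i\{\delta_i\log F(u_i)+\gamma_i\log(F(v_i)-F(u_i))+(1-\delta_i-\gamma_i)\log(1-F(v_i))\}$; the factor $h(u,v)$ does not involve F and is dropped. The choices F uniform on [0, 1] and $(U,V)$ uniform on the triangle $0\le u<v\le1$ are illustrative assumptions, and the helper names are hypothetical.

```python
import math
import random


def simulate_case2(n, seed=0):
    """Draw (U, V, Delta, Gamma) with X ~ Uniform(0, 1) and (U, V) uniform on {0 <= u < v <= 1}."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        a, b = rng.random(), rng.random()
        u, v = min(a, b), max(a, b)  # order statistics of two uniforms: uniform on the triangle
        delta = int(x <= u)
        gamma = int(u < x <= v)
        data.append((u, v, delta, gamma))
    return data


def loglik_F_part(F, data):
    """F-dependent part of the case 2 log-likelihood; the factor h(u, v) is omitted."""
    total = 0.0
    for u, v, delta, gamma in data:
        if delta:
            total += math.log(F(u))
        elif gamma:
            total += math.log(F(v) - F(u))
        else:
            total += math.log(1.0 - F(v))
    return total


data = simulate_case2(1000)
print(loglik_F_part(lambda x: x, data))
```

Maximizing this expression over distribution functions F gives the case 2 NPMLE; as noted in the Introduction, that maximization has to be carried out iteratively.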

With respect to the functional to be estimated we assume

(F1) K is differentiable along Hellinger differentiable paths of distributions from $\mathcal F_S$.

The canonical gradient for this functional is denoted by $\tilde\kappa_F$. What functionals satisfy this pathwise differentiability requirement? An important class of functionals are the functionals that are linear in F,

$$K(F)=\int c(x)\,dF(x).$$

All moment functionals $F\mapsto\int x^k\,dF(x)$ belong to this class. Estimation of the distribution function at a fixed point concerns a linear functional as well: for $K(F)=F(t_0)$ we have $c(x)=1_{[0,t_0]}(x)$. In Bickel, Klaassen, Ritov and Wellner (1993), Proposition A.5.2, it is shown that linear functionals on $\mathcal F_S$ with

$$\sup_{F\in\mathcal F_S}E_F\,c(X)^2<\infty$$

are pathwise differentiable at any $F\in\mathcal F_S$, with canonical gradient

$$\tilde\kappa_F(x)=c(x)-\int c(x)\,dF(x).$$

So if we were able to observe the $X_i$'s directly, we would obtain the information lower bound

$$\bigl\|c(X)-E_F\,c(X)\bigr\|_F^2.$$
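For a linear functional both the canonical gradient and the uncensored bound are easy to evaluate numerically. The sketch below assumes, for illustration only, that F has a density f on [0, M] and uses the midpoint rule; for the mean under the uniform distribution on [0, 1] it reproduces the variance 1/12.

```python
def uncensored_bound(c, f, M=1.0, n=100_000):
    """Return (E_F c(X), ||c(X) - E_F c(X)||_F^2) for F with density f on [0, M]."""
    step = M / n
    xs = [(i + 0.5) * step for i in range(n)]
    mean = sum(c(x) * f(x) * step for x in xs)
    # The canonical gradient is kappa(x) = c(x) - mean; the bound is its squared L2(F) norm.
    bound = sum((c(x) - mean) ** 2 * f(x) * step for x in xs)
    return mean, bound


mean, bound = uncensored_bound(lambda x: x, lambda x: 1.0)
print(mean, bound)
```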

For nonlinear functionals, there is no general method that immediately establishes pathwise differentiability and supplies the formula for the canonical gradient. An example of a nonlinear functional to which our theory can be applied is

$$K(F)=\int F^2(x)\,w(x)\,dx.$$

If the function w is bounded, a slight extension of the proof in Bickel, Klaassen, Ritov and Wellner (1993), as given in Geskus (1997), shows that this functional has canonical gradient

$$\tilde\kappa_F(x)=2\int_x^M F(s)\,w(s)\,ds-2\int_0^M\!\!\int_y^M F(s)\,w(s)\,ds\,dF(y).$$

In order to show that the NPMLE $K(\hat F_n)$ asymptotically attains the lower bound, we have to make the following extra assumption:

(F2) $$K(G)-K(F)=\int\tilde\kappa_F(x)\,d(G-F)(x)+O\bigl(\|G-F\|_\lambda^2\bigr),$$

for all distribution functions G with support contained in $[0,M]$, with λ denoting Lebesgue measure on $\mathbb R$. For linear functionals (F2) holds without the O-term. However, the functional $K(F)=\int F^2(x)\,w(x)\,dx$ also satisfies (F2): since $K(G)-K(F)=2\int F(G-F)w\,dx+\int(G-F)^2w\,dx$, the first term equals $\int\tilde\kappa_F\,d(G-F)$ and the second is bounded by $\sup|w|\,\|G-F\|_\lambda^2$.
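The expansion (F2) for $K(F)=\int F^2w\,dx$ can be checked numerically. The sketch takes, as an illustrative assumption, M = 1, w ≡ 1, F(x) = x and G(x) = x², evaluates $K(G)-K(F)-\int\tilde\kappa_F\,d(G-F)$ with the canonical gradient given above (its centring constant drops out because $\int d(G-F)=0$), and compares the remainder with $\int(G-F)^2w\,dx$. For this choice both equal 1/30.

```python
import numpy as np

n = 20_000
dx = 1.0 / n
x = (np.arange(n) + 0.5) * dx     # midpoints of a uniform grid on [0, 1]

F, G = x, x ** 2                  # two distribution functions on [0, 1] (illustrative)
f, g = np.ones(n), 2.0 * x        # their densities
w = np.ones(n)                    # bounded weight function


def K(D):
    """K(D) = int D(x)^2 w(x) dx by the midpoint rule."""
    return np.sum(D ** 2 * w) * dx


# 2 * int_x^1 F(s) w(s) ds via a reversed cumulative sum; the half-cell correction
# accounts for x being the midpoint of its own cell.
tail = np.cumsum((F * w)[::-1])[::-1] * dx - 0.5 * F * w * dx
kappa = 2.0 * tail                # canonical gradient up to an additive constant

linear = np.sum(kappa * (g - f)) * dx
remainder = K(G) - K(F) - linear
quadratic = np.sum((G - F) ** 2 * w) * dx
print(remainder, quadratic)       # both close to 1/30
```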

In the interval censoring model we do not observe $X\sim F$ directly, and the estimated $K(F)$ is only implicitly defined as a functional $\Theta(Q_{F,H})$ on the class of probability measures on the observation space, with H acting as a nuisance parameter. What about differentiability and information lower bounds in this model? The score operator $L_1$, relating the censoring model to the unattainable model without censoring, is, in our situation,

$$[L_1a](u,v,\delta,\gamma)=E\{a(X)\mid U=u,V=v,\Delta=\delta,\Gamma=\gamma\}=\delta\,\frac{\int_0^u a\,dF}{F(u)}+\gamma\,\frac{\int_u^v a\,dF}{F(v)-F(u)}+(1-\delta-\gamma)\,\frac{\int_v^M a\,dF}{1-F(v)}\quad Q_{F,H}\text{-a.e.}\tag{2.1}$$

This operator may be defined on $L_2(F)$, with range in $L_2(Q_{F,H})$. However, since it relates scores, our main interest lies in the domain $L_2^0(F)$. Then its range is contained in $L_2^0(Q_{F,H})$. The adjoint of $L_1$ on $L_2^0(Q_{F,H})$ can be written as $[L_1^*b](x)=E[b(U,V,\Delta,\Gamma)\mid X=x]$ and we get

$$[L_1^*b](x)=\int_{u=x}^M\int_{v=u}^M b(u,v,1,0)\,h(u,v)\,dv\,du+\int_{u=0}^x\int_{v=x}^M b(u,v,0,1)\,h(u,v)\,dv\,du+\int_{u=0}^x\int_{v=u}^x b(u,v,0,0)\,h(u,v)\,dv\,du\quad F\text{-a.e.}\tag{2.2}$$

Now we have pathwise differentiability of $\Theta(Q_{F,H})$ if and only if $\tilde\kappa_F\in\mathcal R(L_1^*)$, and if this holds, then the canonical gradient is the unique element $\tilde\theta_{F,H}$ in $\overline{\mathcal R(L_1)}\subset L_2^0(Q_{F,H})$ satisfying

$$L_1^*\tilde\theta_{F,H}=\tilde\kappa_F\tag{2.3}$$

[see van der Vaart (1991)].
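The score operator (2.1) is simply the conditional expectation of $a(X)$ given the interval that is known to contain X. The sketch below evaluates it directly, assuming for illustration that F has a density f on [0, M] and using the midpoint rule; the function name `L1` is hypothetical. With F uniform on [0, 1] and $a(x)=x-1/2$, the three cases give the interval midpoints minus 1/2.

```python
def L1(a, f, u, v, delta, gamma, M=1.0, n=20_000):
    """Sketch of the score operator (2.1) for X with density f on [0, M].

    Returns E{a(X) | X in I}, where I = [0, u], (u, v] or (v, M] according to
    (delta, gamma) = (1, 0), (0, 1) or (0, 0).
    """
    if delta:
        lo, hi = 0.0, u
    elif gamma:
        lo, hi = u, v
    else:
        lo, hi = v, M
    step = (hi - lo) / n
    xs = [lo + (i + 0.5) * step for i in range(n)]
    num = sum(a(x) * f(x) for x in xs) * step  # int_I a dF
    den = sum(f(x) for x in xs) * step         # F-probability of I
    return num / den


a = lambda x: x - 0.5  # an element of L_2^0(F) when F is uniform on [0, 1]
f = lambda x: 1.0
print(L1(a, f, 0.2, 0.6, 1, 0), L1(a, f, 0.2, 0.6, 0, 1), L1(a, f, 0.2, 0.6, 0, 0))
```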

Many functionals that are pathwise differentiable in the model without censoring lose this property in the interval censoring model. Due to the smoothness of the adjoint operator, a canonical gradient that is not a.e. equal to a continuous function cannot belong to the range of $L_1^*$. So not all linear functionals remain pathwise differentiable. For example, the canonical gradient $1_{[0,t_0]}(\cdot)-F(t_0)$ of $K(F)=F(t_0)$ is discontinuous at $t_0$, and therefore does not belong to the range of $L_1^*$. This corresponds with $F(t_0)$ not being estimable at $\sqrt n$-rate. However, functionals with a canonical gradient that is sufficiently smooth will be shown to remain differentiable under censoring. Hence for these functionals the information lower bound theory holds.

If the canonical gradients $\tilde\theta=\tilde\theta_{F,H}$ and $\tilde\kappa_F$ satisfy some extra conditions, the information lower bound $\|\tilde\theta\|^2_{Q_{F,H}}$ has an alternative formulation.

THEOREM 2.1. Let $\tilde\theta$ be contained in $\mathcal R(L_1)$, say $\tilde\theta=L_1a_0$ for some $a_0\in L_2^0(F)$. Assume that the function $x\mapsto\tilde\kappa_F(x)$ is differentiable with bounded derivative. Then we have

$$\|\tilde\theta\|^2_{Q_{F,H}}=\langle a_0,\tilde\kappa_F\rangle_F=\int_0^M\tilde\kappa_F'(x)\,\phi_0(x)\,dx,$$

with $\phi_0(x)=\int_x^M a_0(t)\,dF(t)$.

For the proof, see Theorem 3.3 in Geskus and Groeneboom (1995).
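The last equality in Theorem 2.1 is an integration by parts, using $\phi_0(0)=\phi_0(M)=0$. A quick numerical check, under the illustrative assumptions F uniform on [0, 1] and $a_0(x)=\tilde\kappa_F(x)=x-1/2$, so that $\phi_0(x)=x(1-x)/2$ and both sides equal 1/12:

```python
n = 100_000
step = 1.0 / n
xs = [(i + 0.5) * step for i in range(n)]

a0 = lambda x: x - 0.5       # element of L_2^0(F) for F uniform on [0, 1]
kappa = lambda x: x - 0.5    # canonical gradient with bounded derivative
kappa_prime = lambda x: 1.0

# phi_0(x) = int_x^1 a0(t) dt, written in closed form for this choice of a0
phi0 = lambda x: (1.0 - x * x) / 2.0 - (1.0 - x) / 2.0

lhs = sum(a0(x) * kappa(x) for x in xs) * step           # <a0, kappa>_F
rhs = sum(kappa_prime(x) * phi0(x) for x in xs) * step   # int kappa'(x) phi_0(x) dx
print(lhs, rhs)
```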

This theorem holds more generally. However, in the interval censoring model, both case 1 and case 2, we have the extra property that the function

$$\phi(x)=\int_x^M a(t)\,dF(t),\qquad\text{with } a\in L_2^0(F)\text{ as defined},$$

also appears explicitly in the score operator $L_1$. Therefore it will play an important role. It will be called the integrated score function. From its definition we know that φ satisfies $\phi(0)=\phi(M)=0$ and that φ is continuous for $F\in\mathcal F_S$.
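Since $a\in L_2^0(F)$ integrates to zero, the three integrals in (2.1) are determined by φ: $\int_0^u a\,dF=-\phi(u)$, $\int_u^v a\,dF=\phi(u)-\phi(v)$ and $\int_v^M a\,dF=\phi(v)$. Substituting these identities into (2.1) shows explicitly how the integrated score function enters the score operator:

$$[L_1a](u,v,\delta,\gamma)=-\delta\,\frac{\phi(u)}{F(u)}+\gamma\,\frac{\phi(u)-\phi(v)}{F(v)-F(u)}+(1-\delta-\gamma)\,\frac{\phi(v)}{1-F(v)}.$$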

In Section 2.2 we will pay attention to the structure of the lower bound. Section 3 will be devoted to showing that the NPMLE $\hat\Theta_n$ of $\Theta(Q_{F,H})$ satisfies

$$\sqrt n\,\bigl(\hat\Theta_n-\Theta(Q_{F,H})\bigr)\ \xrightarrow{\ \mathcal D\ }\ N\bigl(0,\|\tilde\theta\|^2_{Q_{F,H}}\bigr).$$

2.2. Lower bounds for interval censoring case 2. We restrict ourselves to the case $\tilde\theta\in\mathcal R(L_1)$. So the case $\tilde\theta\in\overline{\mathcal R(L_1)}\setminus\mathcal R(L_1)$ will not be considered. Solvability of the equation

$$\tilde\kappa_F(x)=[L_1^*L_1a](x)\quad F\text{-a.e.}\tag{2.4}$$

in the variable $a\in L_2^0(F)$ will be investigated. The support of F may consist of a finite number of disjoint intervals. However, (2.4) is not defined on intervals where F does not put mass, and these intervals do not play any further role. So without loss of generality, we may assume the support of F to consist of one interval $[0,M]$. By the structure of the score operator $L_1$ this can be reformulated as an equation in φ. If we suppose (2.4) to hold for all $x\in[0,M]$, taking derivatives on both sides yields the following integral equation:

$$\phi(x)+d_F(x)\Bigl\{\int_{t=0}^x\frac{\phi(x)-\phi(t)}{F(x)-F(t)}\,h(t,x)\,dt-\int_{t=x}^M\frac{\phi(t)-\phi(x)}{F(t)-F(x)}\,h(x,t)\,dt\Bigr\}=k(x)\,d_F(x),\tag{2.5}$$

with $d_F(x)$ being the function

$$d_F(x)=\frac{F(x)\{1-F(x)\}}{h_1(x)\{1-F(x)\}+h_2(x)F(x)},$$

writing $k(x)$ instead of $\tilde\kappa_F'(x)$. Although k may depend on F, we do not explicitly express this dependence. This is done, since in proving asymptotic efficiency of the NPMLE we have to consider (2.5) for convex combinations $F=(1-\alpha)F_0+\alpha\hat F_n$, where $F_0\in\mathcal F_S$ (the unknown distribution) is continuous and $\hat F_n$ (the NPMLE of $F_0$) is purely discrete. Solvability and structure of the solution to (2.5) will be investigated for such combinations, with k still determined by the underlying distribution $F_0$ (so $k=\tilde\kappa_{F_0}'$). Apart from the model conditions (M1) to (M3), some extra conditions will have to be introduced in order to make the proofs in this section possible. For the distributions we assume

(D1) $h_1(x)+h_2(x)>0$ for all $x\in[0,M]$.

(D2) The function $h(u,v)$ is continuous. The partial derivatives $\Delta_{1t}(x)=(\partial/\partial x)\,h(x,t)$ and $\Delta_{2t}(x)=(\partial/\partial x)\,h(t,x)$ exist, except for at most a finite number of points x, where left and right derivatives with respect to x do exist for each t. The derivatives are bounded, uniformly in t and x.

(D3) F is a nondegenerate distribution function with at most finitely many points of jump $x_i\in(0,M)$. Let $D=\{x_0=0,x_1,\dots,x_m,x_{m+1}=M\}$ denote the ordered set of jump points of F, augmented with the endpoints of the interval $[0,M]$. We assume that F is differentiable between jumps, except for at most a finite number of points, where left and right derivatives exist. Everywhere outside D, the derivative is bounded and greater than or equal to c for some $c>0$ (so we assume $f(x)\ge c$ for all $x\in[0,M]$). The set of points of jump may be empty. Note that if F has jumps, we assume that F has derivative greater than or equal to c also on the nonempty intervals $(0,x_1)$ and $(x_m,M)$, where we allow $x_1=x_m$, though.

For the functional we should have

(F3) k is differentiable, except for at most a finite number of points x, where left and right derivatives exist. The derivative is bounded, uniformly in x.

Note that the letter ‘‘D’’ in conditions (D1) to (D3) stands for ‘‘distribution’’ and the letter ‘‘F’’ in (F1) to (F3) is for ‘‘functional.’’
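To see what the implicit solution of (2.5) looks like, one can discretize it. The sketch below is not the method of this paper but a plain collocation scheme under illustrative assumptions: F uniform on [0, 1], $(U,V)$ uniform on the triangle $0\le u<v\le1$ (so $h\equiv2$, $h_1(x)=2(1-x)$, $h_2(x)=2x$), and the mean functional, for which $k=\tilde\kappa_F'\equiv1$. Both integrals are replaced by trapezoidal sums over grid points $t_j\ne x_i$, so the singular diagonal point is simply skipped. After reordering, both difference quotients in (2.5) become $(\phi_i-\phi_j)/|F(x_i)-F(x_j)|$, so row i reads $(1+d_iS_i)\phi_i-d_i\sum_{j\ne i}c_{ij}\phi_j=k_id_i$ with $c_{ij}\ge0$ and $S_i=\sum_{j\ne i}c_{ij}$. This diagonally dominant system satisfies a discrete version of the bounds $\inf d_Fk\le\phi\le\sup d_Fk$ that appear in Lemma 2.1. By Theorem 2.1, $\int k\phi\,dx$ approximates the information lower bound; for this design the exact bound must lie between the uncensored value 1/12 and the case 1 value 1/4 obtained from $(U,\Delta)$ alone, which gives a rough sanity check.

```python
import numpy as np


def solve_phi_equation(F, h, h1, h2, k, M=1.0, n=400):
    """Collocation sketch for the integral equation (2.5) on a uniform grid of [0, M].

    F must be continuous and strictly increasing; the singular diagonal point t = x is
    left out of both integrals. Returns the grid, the trapezoidal weights and the
    approximate integrated score function phi at the grid points.
    """
    x = np.linspace(0.0, M, n + 1)
    w = np.full(n + 1, M / n)
    w[0] = w[-1] = M / (2 * n)                              # trapezoidal weights
    Fx = F(x)
    d = Fx * (1 - Fx) / (h1(x) * (1 - Fx) + h2(x) * Fx)     # d_F at the grid points

    # Kernel c_ij = h(min(x_i, x_j), max(x_i, x_j)) * w_j / |F(x_i) - F(x_j)|, j != i.
    gap = np.abs(np.subtract.outer(Fx, Fx))
    np.fill_diagonal(gap, 1.0)                              # placeholder; diagonal removed below
    c = h(np.minimum.outer(x, x), np.maximum.outer(x, x)) * w[None, :] / gap
    np.fill_diagonal(c, 0.0)

    A = -d[:, None] * c
    A[np.diag_indices(n + 1)] = 1.0 + d * c.sum(axis=1)
    phi = np.linalg.solve(A, k(x) * d)
    return x, w, phi


# Illustrative design: F uniform, (U, V) uniform on the triangle, mean functional (k = 1).
x, w, phi = solve_phi_equation(
    F=lambda s: s,
    h=lambda u, v: 2.0 * np.ones_like(u),
    h1=lambda s: 2.0 * (1.0 - s),
    h2=lambda s: 2.0 * s,
    k=lambda s: np.ones_like(s),
)
bound = np.sum(w * phi)  # approximates int k(x) phi(x) dx with k = 1
print(bound)
```

Because this design is invariant under $x\mapsto1-x$, the computed φ is symmetric about 1/2, and $\phi(0)=\phi(1)=0$ holds automatically since $d_F$ vanishes at the endpoints.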

Of course, (D2) implies continuity of $h_1$ and $h_2$. (D1) is the equivalent of $g>0$ in case 1 and is needed; it implies that $d_F$ is bounded. In case 1, the function φ has an explicit representation of the form

$$\phi(x)=k(x)\Bigl[\frac{F(1-F)}{g}\Bigr](x),$$

whereas in case 2, φ can only be expressed implicitly as a solution to (2.5). If (2.5) is solvable, its solution φ can be shown to contain a factor $F(1-F)$, just as in case 1. The structure of $d_F$ already suggests this factor to be present. Validity of the factorization is shown by inserting

$$\phi=F(1-F)\,\xi$$

in (2.5). Some reordering yields an integral equation in ξ, which will be shown to be solvable. This ξ-equation has the following form:

$$\xi(x)+c_F(x)\Bigl\{\int_{t=0}^x\frac{\xi(x)-\xi(t)}{F(x)-F(t)}\,h^*(t,x)\,dt-\int_{t=x}^M\frac{\xi(t)-\xi(x)}{F(t)-F(x)}\,h^\circ(x,t)\,dt\Bigr\}=k(x)\,c_F(x),\tag{2.6}$$

with $c_F(x)$ given by

$$c_F(x)^{-1}=\int_{t=0}^x\{1-F(t)\}\,h(t,x)\,dt+\int_{t=x}^M F(t)\,h(x,t)\,dt=h_2(x)\,E\{1-F(U)\mid V=x\}+h_1(x)\,E\{F(V)\mid U=x\}\tag{2.7}$$

and

$$h^*(t,x)=F(t)\{1-F(t)\}\,h(t,x)\ \text{ if } t\le x,\qquad h^\circ(x,t)=F(t)\{1-F(t)\}\,h(x,t)\ \text{ if } x\le t.\tag{2.8}$$

This equation is similar in structure to the φ-equation. So the lemmas and theorems in the remainder of this section apply to both the φ-equation (2.5) and the ξ-equation (2.6). Most of the proofs will only be given for the φ-equation.
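The reordering that turns (2.5) into (2.6) can be made explicit. Writing $\psi=F(1-F)$, so that $\phi=\psi\xi$, and using $\psi(x)-\psi(t)=\{F(x)-F(t)\}\{1-F(x)-F(t)\}$, one has for $t\le x$ (and in the same way for $t\ge x$, with the roles of x and t in the numerator interchanged)

$$\frac{\phi(x)-\phi(t)}{F(x)-F(t)}=\psi(t)\,\frac{\xi(x)-\xi(t)}{F(x)-F(t)}+\xi(x)\,\{1-F(x)-F(t)\}.$$

Dividing (2.5) by $d_F(x)$ and using $h_1(x)=\int_x^M h(x,t)\,dt$ and $h_2(x)=\int_0^x h(t,x)\,dt$, the coefficient of $\xi(x)$ becomes

$$h_1(x)\{1-F(x)\}+h_2(x)F(x)+\int_0^x\{1-F(x)-F(t)\}h(t,x)\,dt-\int_x^M\{1-F(x)-F(t)\}h(x,t)\,dt=\int_0^x\{1-F(t)\}h(t,x)\,dt+\int_x^M F(t)\,h(x,t)\,dt,$$

which is $c_F(x)^{-1}$ in (2.7), while the remaining integrals carry the weights $\psi(t)h(t,x)=h^*(t,x)$ and $\psi(t)h(x,t)=h^\circ(x,t)$ of (2.8). Multiplying by $c_F(x)$ gives (2.6).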

Unlike the situation treated in GG, we now assume that the observation density does have mass along the diagonal. This has the consequence that the integral equation may no longer be a Fredholm integral equation. However, we first consider a ‘‘desingularized’’ integral equation, to which the theory on Fredholm integral equations of the second kind can be applied [Kress (1989)]. If F has jumps, the solution of the integral equation will in general also have jumps. However, the key observation in analyzing the integral equation and in proving the efficiency of the NPMLE is that, even when F has discontinuities, we can make a change of scale in such a way that the solution of the integral equation can be extended to a Lipschitz function in the transformed scale.

We first introduce some notation. Let $G(t)=F^{-1}(t)$, $t\in[0,1]$, with a derivative g which exists except for at most a finite number of points, where, however, G has left and right derivatives. Furthermore, let $\bar k(t)=k(G(t))$, $\bar H(t,u)=H(G(t),G(u))$ and likewise $\bar h(t,u)=h(G(t),G(u))$, and let $\bar d_F$ be defined by

$$\bar d_F(t)=\frac{t(1-t)}{(1-t)\,\bar h_1(t)+t\,\bar h_2(t)},\tag{2.9}$$

where $\bar h_i=h_i\circ G$, $i=1,2$. Note that, if F has jumps, $\bar d_F\ne d_F\circ G$. Also note that $\bar k$, $\bar d_F$ and $\bar h$ are continuous. In a similar way, we define

$$\bar c_F(t)^{-1}=\int_0^t(1-s)\,\bar h(s,t)\,dG(s)+\int_t^1 s\,\bar h(t,s)\,dG(s)$$

and

$$\bar h^*(t,u)=t(1-t)\,\bar h(t,u)\ \text{ if } t\le u,\qquad\bar h^\circ(u,t)=t(1-t)\,\bar h(u,t)\ \text{ if } u\le t.\tag{2.10}$$
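The change of scale can be made concrete with a small example. The sketch below assumes, for illustration only, an F on [0, 2] with density 1/4 and a jump of size 1/2 at x = 1, and $(U,V)$ uniform on the triangle $0\le u<v\le2$, so that $h_1(x)=(2-x)/2$ and $h_2(x)=x/2$. The generalized inverse $G(t)=\inf\{x: F(x)\ge t\}$ is computed by bisection; it is constant (equal to 1) on the jump interval [1/4, 3/4], whereas the t-scale function defined in (2.9) still varies there. This is why that function differs from $d_F\circ G$ when F has jumps. All names are hypothetical.

```python
def F(x):
    """Illustrative distribution function on [0, 2] with a jump of size 1/2 at x = 1."""
    if x < 1.0:
        return x / 4.0
    return 0.75 + (x - 1.0) / 4.0


def G(t, lo=0.0, hi=2.0, iters=100):
    """Generalized inverse inf{x in [0, 2] : F(x) >= t}, by bisection (F is nondecreasing)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if F(mid) >= t:
            hi = mid
        else:
            lo = mid
    return hi


h1 = lambda x: (2.0 - x) / 2.0  # marginal density of U, (U, V) uniform on the triangle
h2 = lambda x: x / 2.0          # marginal density of V


def d_bar(t):
    """The t-scale function of (2.9), with h_1 and h_2 evaluated at G(t)."""
    return t * (1 - t) / ((1 - t) * h1(G(t)) + t * h2(G(t)))


def d_F(x):
    """d_F in the original scale."""
    return F(x) * (1 - F(x)) / (h1(x) * (1 - F(x)) + h2(x) * F(x))


print(G(0.3), G(0.5), d_bar(0.3), d_bar(0.5), d_F(G(0.5)))
```

Here $G(0.3)=G(0.5)=1$, while (2.9) gives 0.42 and 0.5 at these two points and $d_F(1)=3/8$.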

We have the following lemma.

LEMMA 2.1. (i) The integral equation

$$\bar\phi_\varepsilon(t)=\bar d_F(t)\Bigl\{\bar k(t)-\int_0^t\frac{\bar\phi_\varepsilon(t)-\bar\phi_\varepsilon(t')}{(t-t')\vee\varepsilon}\,\bar h(t',t)\,dG(t')+\int_t^1\frac{\bar\phi_\varepsilon(u)-\bar\phi_\varepsilon(t)}{(u-t)\vee\varepsilon}\,\bar h(t,u)\,dG(u)\Bigr\}\tag{2.11}$$

has a unique continuous solution $\bar\phi_\varepsilon$, satisfying

$$\inf_{x\in[0,M]}d_F(x)k(x)\le\bar\phi_\varepsilon(t)\le\sup_{x\in[0,M]}d_F(x)k(x),\tag{2.12}$$

for all $t\in[0,1]$ and $\varepsilon>0$. For points t in the range of F, say $t=F(x)$, we have $\bar\phi_\varepsilon(t)=\phi_\varepsilon(x)$.

(ii) The integral equation

$$\bar\xi_\varepsilon(t)=\bar c_F(t)\Bigl\{\bar k(t)-\int_0^t\frac{\bar\xi_\varepsilon(t)-\bar\xi_\varepsilon(t')}{(t-t')\vee\varepsilon}\,\bar h^*(t',t)\,dG(t')+\int_t^1\frac{\bar\xi_\varepsilon(u)-\bar\xi_\varepsilon(t)}{(u-t)\vee\varepsilon}\,\bar h^\circ(t,u)\,dG(u)\Bigr\}\tag{2.13}$$

has a unique continuous solution $\bar\xi_\varepsilon$, satisfying

$$\inf_{x\in[0,M]}c_F(x)k(x)\le\bar\xi_\varepsilon(t)\le\sup_{x\in[0,M]}c_F(x)k(x),\tag{2.14}$$

for all $t\in[0,1]$ and $\varepsilon>0$.

PROOF. (i) By the Fredholm theory, as used, for example, in Geskus and Groeneboom [(1996), Theorem 5, page 82], the $\bar\phi_\varepsilon$-equation (2.11) can be shown to have a unique continuous solution, for each $\varepsilon>0$. Note that the integration in (2.11) is only with respect to $dG(t')$ and $dG(u)$ and therefore only involves values belonging to the range of F. So for points t in the range of F we have

$$\bar\phi_\varepsilon(t)=\phi_\varepsilon(G(t)).$$

Let $m=\arg\min_{[0,1]}\bar\phi_\varepsilon$ and $s=\arg\max_{[0,1]}\bar\phi_\varepsilon$. We have

$$\bar\phi_\varepsilon(s)\le\bar d_F(s)\,\bar k(s)\le\sup_{x\in[0,M]}d_F(x)k(x),$$

since $\bar\phi_\varepsilon(s)-\bar\phi_\varepsilon(t)\ge0$, $t\in[0,1]$, so that both integral terms in (2.11) are nonpositive at $t=s$, and similarly

$$\bar\phi_\varepsilon(m)\ge\bar d_F(m)\,\bar k(m)\ge\inf_{x\in[0,M]}d_F(x)k(x),$$

since $\bar\phi_\varepsilon(m)-\bar\phi_\varepsilon(t)\le0$, $t\in[0,1]$. Hence we have (2.12).

(ii) The argument is completely similar to the argument given for (i). □

The following lemma is the crux of the proof of the existence of the solution to the original integral equation.

LEMMA 2.2. The functions $\bar\phi_\varepsilon$ are Lipschitz on $[0,1]$, uniformly in $\varepsilon>0$.

PROOF. We will use similar notation to that in Lemma 2.1. Let $x_1,\dots,x_m$ be the points of jump of F and let $x_0=0$, $x_{m+1}=M$. Furthermore, let $\tau_i=F(x_i)$, $i=0,\dots,m+1$. For $i=0,\dots,m$, the interval $[\tau_i,\tau_{i+1}]$ can be divided into two parts.

(i) The interval $[\tau_i,\tau_i')$, where $\tau_i'=F(x_{i+1}-)$. The interval $[\tau_i,\tau_i')$ corresponds to the interval $[x_i,x_{i+1})$ in the original scale. The function G is strictly increasing and differentiable on the interval $(\tau_i,\tau_i')$, and is right and left differentiable at $\tau_i$ and $\tau_i'$, respectively.

(ii) The interval $[\tau_i',\tau_{i+1}]$. This interval corresponds to the jump of F at $x_{i+1}$. Here the function G is constant, again having right and left derivatives at the respective endpoints.

If $i=m$, the second interval only consists of the point 1. Let

$$D'=\{\tau_0,\dots,\tau_{m+1}\}\cup\{\tau_0',\dots,\tau_m'\}\cup\Bigl\{\text{discontinuity points of }\bar k'(t),\ \bar d_F'(t),\ \Delta_{1t'}(t)=\frac{\partial}{\partial t}\bar h(t,t')\text{ for }t\le t',\text{ and }\Delta_{2u}(t)=\frac{\partial}{\partial t}\bar h(u,t)\text{ for }t\ge u\Bigr\}.$$

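Then $\bar\phi_\varepsilon(t)$ is differentiable for $t\notin D'$, and has left and right derivatives for $t\in D'$. Using

$$\bar k(t)-\int_0^t\frac{\bar\phi_\varepsilon(t)-\bar\phi_\varepsilon(t')}{(t-t')\vee\varepsilon}\,\bar h(t',t)\,dG(t')+\int_t^1\frac{\bar\phi_\varepsilon(u)-\bar\phi_\varepsilon(t)}{(u-t)\vee\varepsilon}\,\bar h(t,u)\,dG(u)=\frac{\bar\phi_\varepsilon(t)}{\bar d_F(t)}=\bar\xi_\varepsilon(t)\bigl\{(1-t)\bar h_1(t)+t\bar h_2(t)\bigr\},$$

and using left or right derivatives when $t\in D'$, we have

$$\begin{aligned}
\bar\phi_\varepsilon'(t)={}&\bar d_F'(t)\,\bar\xi_\varepsilon(t)\bigl\{(1-t)\bar h_1(t)+t\bar h_2(t)\bigr\}\\
&+\bar d_F(t)\Bigl\{\bar k'(t)-\int_0^t\frac{\bar\phi_\varepsilon(t)-\bar\phi_\varepsilon(t')}{(t-t')\vee\varepsilon}\,\frac{\partial}{\partial t}\bar h(t',t)\,dG(t')+\int_t^1\frac{\bar\phi_\varepsilon(u)-\bar\phi_\varepsilon(t)}{(u-t)\vee\varepsilon}\,\frac{\partial}{\partial t}\bar h(t,u)\,dG(u)\Bigr\}\\
&-\bar d_F(t)\Bigl\{\int_{t':\,t-t'>\varepsilon}\Bigl[\frac{\bar\phi_\varepsilon'(t)}{t-t'}-\frac{\bar\phi_\varepsilon(t)-\bar\phi_\varepsilon(t')}{(t-t')^2}\Bigr]d\bar H(t',t)+\int_{u:\,u-t>\varepsilon}\Bigl[\frac{\bar\phi_\varepsilon'(t)}{u-t}-\frac{\bar\phi_\varepsilon(u)-\bar\phi_\varepsilon(t)}{(u-t)^2}\Bigr]d\bar H(t,u)\Bigr\}\\
&-\bar d_F(t)\,\bar\phi_\varepsilon'(t)\,\varepsilon^{-1}\Bigl\{\int_{t-\varepsilon}^t\bar h(t',t)\,g(t')\,dt'+\int_t^{t+\varepsilon}\bar h(t,u)\,g(u)\,du\Bigr\}.
\end{aligned}\tag{2.15}$$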

Note that $(\partial/\partial t)\,\bar H(t,u)=\bar h(t,u)\,g(t)$ and similarly for the other partial derivative of $\bar H$. Moving the terms containing $\bar\phi_\varepsilon'$ to the left-hand side of (2.15) shows that $\bar\phi_\varepsilon'(t)$ has a finite upper bound, using Lemma 2.1. Moreover, $\bar\phi_\varepsilon'$ is piecewise continuous on the closed intervals from one point in $D'$ to the subsequent one. So $\bar\phi_\varepsilon'$ attains a maximum value, which may be a right or left derivative. The rest of the proof is devoted to showing that this maximum value is bounded uniformly in ε.

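Let $M_\varepsilon=\sup_{t\in[0,1]}\bar\phi_\varepsilon'(t)$ and suppose that $\bar\phi_\varepsilon'$ attains its supremum at a point s. Note that $M_\varepsilon\ge0$, since $\bar\phi_\varepsilon(0)=\bar\phi_\varepsilon(1)=0$ and $\bar\phi_\varepsilon$ is continuous. Then, if $0<t<s-\varepsilon$,

$$\frac{\bar\phi_\varepsilon'(s)}{s-t}-\frac{\bar\phi_\varepsilon(s)-\bar\phi_\varepsilon(t)}{(s-t)^2}=\frac{\int_t^s\{\bar\phi_\varepsilon'(s)-\bar\phi_\varepsilon'(u)\}\,du}{(s-t)^2}\ge0.$$

Likewise, if $1>t>s+\varepsilon$, we get

$$\frac{\bar\phi_\varepsilon'(s)}{t-s}-\frac{\bar\phi_\varepsilon(t)-\bar\phi_\varepsilon(s)}{(t-s)^2}\ge0.$$

So these parts work in the opposite direction, and are harmless in (2.15). Now let $K(t)$ be defined by

$$K(t)=\bar d_F(t)\,\bar k'(t)+\bar d_F'(t)\,\bar\xi_\varepsilon(t)\bigl\{(1-t)\bar h_1(t)+t\bar h_2(t)\bigr\}-\bar d_F(t)\Bigl\{\int_0^t\frac{\bar\phi_\varepsilon(t)-\bar\phi_\varepsilon(t')}{(t-t')\vee\varepsilon}\,\frac{\partial}{\partial t}\bar h(t',t)\,dG(t')-\int_t^1\frac{\bar\phi_\varepsilon(u)-\bar\phi_\varepsilon(t)}{(u-t)\vee\varepsilon}\,\frac{\partial}{\partial t}\bar h(t,u)\,dG(u)\Bigr\}$$

and let $C(t)$ be defined by

$$C(t)=1+\bar d_F(t)\,\varepsilon^{-1}\Bigl\{\int_{t-\varepsilon}^t\bar h(t',t)\,g(t')\,dt'+\int_t^{t+\varepsilon}\bar h(t,u)\,g(u)\,du\Bigr\},\qquad t\in[0,1].\tag{2.16}$$

Then we have

$$\bar\phi_\varepsilon'(s)\,C(s)\le K(s),\tag{2.17}$$

implying

$$M_\varepsilon\le\sup_{t\in[0,1]}K(t)/C(t).\tag{2.18}$$

In a similar way, if $m_\varepsilon=\inf_{t\in[0,1]}\bar\phi_\varepsilon'(t)$, we get

$$m_\varepsilon\ge\inf_{t\in[0,1]}K(t)/C(t).\tag{2.19}$$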

Let the function A be defined by

$$A(t)=\bar d_F(t)\Bigl\{\int_{t-\delta}^t\frac{\partial}{\partial t}\bar h(t',t)\,dG(t')+\int_t^{t+\delta}\frac{\partial}{\partial t}\bar h(t,u)\,dG(u)\Bigr\},\qquad t\in[0,1].$$

Fix $\delta>0$ such that, for all $t\in[0,1]$,

$$A(t)/C(t)\le\tfrac12.\tag{2.20}$$

Note that $\delta>0$ can be chosen independently of $\varepsilon>0$, since

$$\lim_{\varepsilon\downarrow0}C(t)=1+2\,\bar d_F(t)\,\bar h(t,t)\,g(t),\qquad t\in(0,1).$$

Then we get from (2.20), for each $t\in[0,1]$, by applying the mean value theorem on the ratios $\{\bar\phi_\varepsilon(t)-\bar\phi_\varepsilon(t')\}/(t-t')$ and $\{\bar\phi_\varepsilon(u)-\bar\phi_\varepsilon(t)\}/(u-t)$,

$$\frac{\bar d_F(t)}{C(t)}\Bigl\{\int_{t-\delta}^t\frac{\bar\phi_\varepsilon(t)-\bar\phi_\varepsilon(t')}{(t-t')\vee\varepsilon}\,\frac{\partial}{\partial t}\bar h(t',t)\,dG(t')+\int_t^{t+\delta}\frac{\bar\phi_\varepsilon(u)-\bar\phi_\varepsilon(t)}{(u-t)\vee\varepsilon}\,\frac{\partial}{\partial t}\bar h(t,u)\,dG(u)\Bigr\}\le A(t)\max\{M_\varepsilon,|m_\varepsilon|\}/C(t)\le\tfrac12\max\{M_\varepsilon,|m_\varepsilon|\}.$$

Defining $B(t)$ by

$$B(t)=|d_F(t)k'(t)|+|d_F'(t)|\bigl\{(1-t)h_1(t)+t\,h_2(t)\bigr\}\sup_{t'\in[0,1]}|c_F(t')k(t')|+\frac{2d_F(t)}{\delta}\sup_{t'\in[0,1]}|d_F(t')k(t')|\left\{\sup_{t'\in[0,t]}\left|\frac{\partial}{\partial t}h(t',t)\right|+\sup_{u\in[t,1]}\left|\frac{\partial}{\partial t}h(t,u)\right|\right\},$$

we get, for $t\in[0,1]$,

$$\begin{aligned}
&|d_F(t)k'(t)|+|d_F'(t)\xi(t)|\bigl\{(1-t)h_1(t)+t\,h_2(t)\bigr\}\\
&\qquad+d_F(t)\left\{\int_0^{t-\delta}\frac{|\varphi(t)|+|\varphi(t')|}{t-t'}\left|\frac{\partial}{\partial t}h(t',t)\right|dG(t')+\int_{t+\delta}^1\frac{|\varphi(t)|+|\varphi(u)|}{u-t}\left|\frac{\partial}{\partial t}h(t,u)\right|dG(u)\right\}\\
&\quad\le|d_F(t)k'(t)|+|d_F'(t)\xi(t)|\bigl\{(1-t)h_1(t)+t\,h_2(t)\bigr\}\\
&\qquad+\frac{2d_F(t)}{\delta}\sup_{t'\in[0,1]}|\varphi(t')|\left\{\int_0^{t-\delta}\left|\frac{\partial}{\partial t}h(t',t)\right|dG(t')+\int_{t+\delta}^1\left|\frac{\partial}{\partial t}h(t,u)\right|dG(u)\right\}\\
&\quad\le B(t)\le c,
\end{aligned}\tag{2.21}$$

for some constant $c$, independent of $\varepsilon$ and $t$. Hence, for each $t\in[0,1]$,

$$|\varphi'(t)|\le\frac{A(t)\max\{M,|m|\}}{C(t)}+\frac{B(t)}{C(t)}\le\tfrac12\max\{M,|m|\}+\frac{B(t)}{C(t)},$$

implying

$$\tfrac12\max\{M,|m|\}\le\sup_{t\in[0,1]}B(t)/C(t)\le\sup_{t\in[0,1]}c/C(t)\le c',\tag{2.22}$$

for some constant $c'$ independent of $\varepsilon$.

Hence $\varphi'(t)$ is bounded on $[0,1]$, uniformly in $\varepsilon$ and $t$, implying that $\varphi$ is Lipschitz, uniformly in $\varepsilon>0$. $\square$

We now have the following theorem.

THEOREM 2.2. Let $G(t)=F^{-1}(t)$, $t\in[0,1]$, with a derivative $g$ which exists except for at most a finite number of points, where $G$ has left and right derivatives. Furthermore, let $k(t)=k(G(t))$, $H(t,u)=H(G(t),G(u))$, $h(t,u)=h(G(t),G(u))$, and let $d_F$ be defined by

$$d_F(t)=\frac{t(1-t)}{(1-t)h_1(t)+t\,h_2(t)},\tag{2.23}$$


where $h_i$ is $h_i\circ G$, $i=1,2$. Then

(i) The integral equation

$$\varphi(t)=d_F(t)\left\{k(t)-\int_0^t\frac{\varphi(t)-\varphi(t')}{t-t'}\,dH(t',t)+\int_t^1\frac{\varphi(u)-\varphi(t)}{u-t}\,dH(t,u)\right\},\qquad t\in[0,1],\tag{2.24}$$

has a unique solution which is Lipschitz on $[0,1]$.

(ii) The Lipschitz norm in (i) has the following upper bound. Let $C(t)$ be defined by

$$C(t)=1+2d_F(t)\,g(t)\,h(t,t).\tag{2.25}$$

Moreover, let $A(t)$ and $B(t)$ be defined by

$$A(t)=d_F(t)\left\{\int_{t-\delta}^{t}\left|\frac{\partial}{\partial t}h(t',t)\right|dG(t')+\int_t^{t+\delta}\left|\frac{\partial}{\partial t}h(t,u)\right|dG(u)\right\},\tag{2.26}$$

and

$$B(t)=|d_F(t)k'(t)|+|d_F'(t)|\bigl\{(1-t)h_1(t)+t\,h_2(t)\bigr\}\sup_{t'\in[0,1]}|c_F(t')k(t')|+\frac{2d_F(t)}{\delta}\sup_{t'\in[0,1]}|d_F(t')k(t')|\left\{\sup_{t'\in[0,t]}\left|\frac{\partial}{\partial t}h(t',t)\right|+\sup_{u\in[t,1]}\left|\frac{\partial}{\partial t}h(t,u)\right|\right\}.\tag{2.27}$$

At the points in

$$D'=\{\text{discontinuity points of }g(t)\}\cup\{0,1\}\cup\Bigl\{\text{discontinuity points of }k'(t),\ d_F'(t),\ \Delta_1^{t'}(t)=\tfrac{\partial}{\partial t}h(t,t')\ (t\le t')\text{ and }\Delta_2^{u}(t)=\tfrac{\partial}{\partial t}h(u,t)\ (t\ge u)\Bigr\},$$

$A$ and $B$ have two versions, one corresponding to taking left derivatives and one corresponding to taking right derivatives.

Then there exists a $\delta>0$ such that

$$\sup_{t\in[0,1]}A(t)/C(t)\le1/2$$


and we have

$$|\varphi(u)-\varphi(t)|\le c\,(u-t),\qquad0\le t<u\le1,\tag{2.28}$$

where $c$ is given by

$$c=2\sup_{t\in[0,1]}B(t)/C(t).\tag{2.29}$$

(iii) The integral equation (2.5) has a unique solution $\varphi$.

PROOF. (i) By the preceding two lemmas, the set $\{\varphi_\varepsilon:\varepsilon\le\varepsilon_0\}$, for some $\varepsilon_0>0$, is bounded and equicontinuous. Hence, by the Arzelà–Ascoli theorem, each sequence $(\varphi_{\varepsilon_n})$, $\varepsilon_n\downarrow0$, has a subsequence $(\varphi_{\varepsilon_{n_m}})$ converging in the supremum metric to a continuous function $\varphi$ on $[0,1]$. By Lebesgue's dominated convergence theorem we get, for such a subsequence $(\varphi_m)$,

$$\varphi(x)=\lim_{m\to\infty}\varphi_m(x)=d_F(x)\left\{k(x)-\int_0^x\frac{\varphi(x)-\varphi(t)}{x-t}\,h(t,x)\,dG(t)+\int_x^1\frac{\varphi(t)-\varphi(x)}{t-x}\,h(x,t)\,dG(t)\right\}.\tag{2.30}$$

Uniqueness of the solution follows in the same way as in Lemma 2.1.

(ii) It was shown in (2.22) in the proof of Lemma 2.2 that

$$\sup_{t\in[0,1]}|\varphi'(t)|\le2\sup_{t\in[0,1]}B(t)/C(t),$$

where $C$ is defined by (2.16). But since

$$\lim_{\varepsilon\downarrow0}C(t)=1+2d_F(t)\,h(t,t)\,g(t)$$

for $t\in[0,1]$, (2.28) now follows.

(iii) We define $\bar\varphi$ by $\bar\varphi(x)=\varphi(F(x))$. If $t=F(x)$, we get, by a change of variables,

$$\begin{aligned}
\bar\varphi(x)=\varphi(t)&=d_F(t)\left\{k(t)-\int_0^t\frac{\varphi(t)-\varphi(t')}{t-t'}\,dH(t',t)+\int_t^1\frac{\varphi(u)-\varphi(t)}{u-t}\,dH(t,u)\right\}\\
&=d_F(x)\left\{k(x)-\int_0^x\frac{\bar\varphi(x)-\bar\varphi(x')}{F(x)-F(x')}\,dH(x',x)+\int_x^M\frac{\bar\varphi(y)-\bar\varphi(x)}{F(y)-F(x)}\,dH(x,y)\right\},
\end{aligned}$$

and hence $\bar\varphi$ satisfies the original integral equation. Uniqueness of $\bar\varphi$ follows from uniqueness of $\varphi$ (since a solution $\bar\varphi$ conversely defines a solution $\varphi$ on the inverse scale). $\square$

REMARK. The same arguments can be applied to prove existence of a solution to the $\xi$-equation. Hence $\varphi$ can be written as $\varphi=F(1-F)\xi$.

Solvability of $\tilde\kappa_F=L_1^*L_1a$ can now immediately be seen.

COROLLARY 2.1. The equation $\tilde\kappa_F=L_1^*L_1a$ is solvable.

PROOF. By the Lipschitz property of $\varphi$ we have, writing $\bar\varphi(x)=\varphi(F(x))$ as in the proof of Theorem 2.2(iii), for any $0\le x<y\le M$,

$$\frac{|\bar\varphi(y)-\bar\varphi(x)|}{F(y)-F(x)}=\frac{|\varphi(F(y))-\varphi(F(x))|}{F(y)-F(x)}\le K,$$

for some constant $K$. Thus the Radon–Nikodym derivative $d\bar\varphi/dF$ is $F$-a.e. bounded by $K$. $\square$

For the canonical gradient we get, if $t<u$,

$$\tilde\theta_F(t,u,\delta,\gamma)=[L_1a](t,u,\delta,\gamma)=-\delta\,\frac{\varphi(t)}{F(t)}-\gamma\,\frac{\varphi(u)-\varphi(t)}{F(u)-F(t)}+(1-\delta-\gamma)\,\frac{\varphi(u)}{1-F(u)}.\tag{2.31}$$

3. Asymptotic efficiency of the NPMLE. In this section we will denote the unknown distribution function of the unobservable random variables $X_i$ by $F_0$. As in Section 2, we will assume that $F_0$ is continuous. Let $\hat F_n$ be the NPMLE of $F_0$, based on the sample of observations $(U_1,V_1,\Delta_1,\Gamma_1),\dots,(U_n,V_n,\Delta_n,\Gamma_n)$. It is obtained by maximizing the likelihood

$$\prod_{i=1}^n F(U_i)^{\Delta_i}\bigl\{F(V_i)-F(U_i)\bigr\}^{\Gamma_i}\bigl\{1-F(V_i)\bigr\}^{1-\Delta_i-\Gamma_i}\,h(U_i,V_i)\tag{3.1}$$

over the class of piecewise constant right-continuous subdistribution functions on $[0,M]$, having jumps only at a subset of the points $U_i$ and $V_i$, $i=1,\dots,n$. The properties of the function thus obtained are discussed in Groeneboom and Wellner (1992) and GG.

A rather important property of the NPMLE is that it does not depend on $\prod_i h(U_i,V_i)$, so we do not have to perform any preliminary density estimation or bandwidth choice. The fact that we do not have to solve a bandwidth problem is one of the great advantages of the nonparametric maximum likelihood approach in the present problem.
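The factor $\prod_i h(U_i,V_i)$ in (3.1) does not involve $F$, so it only adds a constant to the log-likelihood and can be dropped when comparing candidate distribution functions. The Python sketch below evaluates the resulting objective for a given right-continuous step function. The helper names are hypothetical. The sketch only evaluates the objective and does not compute the NPMLE itself, which requires a separate optimization step.

```python
import bisect
import math

def step_cdf(jumps, values):
    """Right-continuous step function: F(x) = values[j] on [jumps[j], jumps[j+1]),
    and F(x) = 0 before the first jump."""
    def F(x):
        j = bisect.bisect_right(jumps, x)   # number of jump points <= x
        return 0.0 if j == 0 else values[j - 1]
    return F

def log_likelihood(F, data):
    """Logarithm of (3.1) with the factor prod_i h(U_i, V_i) left out.

    data: iterable of (U, V, Delta, Gamma) with U < V and Delta + Gamma <= 1.
    Returns -inf if some observed cell receives probability zero under F.
    """
    total = 0.0
    for U, V, Delta, Gamma in data:
        if Delta:                 # X <= U
            p = F(U)
        elif Gamma:               # U < X <= V
            p = F(V) - F(U)
        else:                     # X > V
            p = 1.0 - F(V)
        if p <= 0.0:
            return -math.inf
        total += math.log(p)
    return total

F = step_cdf([0.2, 0.5], [0.4, 1.0])
data = [(0.3, 0.6, 1, 0), (0.1, 0.4, 0, 1), (0.1, 0.15, 0, 0)]
print(log_likelihood(F, data))   # equals 2 * log(0.4)
```

Adding $\sum_i\log h(U_i,V_i)$ to this value would shift every candidate by the same amount. That is why the maximizer, and hence the NPMLE, is unaffected by $h$.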

By the restriction that $\hat F_n$ only has mass at the observation times, also … function. Let $x_0=0$, $x_{m+1}=M$ and let $x_1<\dots<x_m$ be the points of jump of $\hat F_n$ in the interval $(0,M)$. Then $\hat F_n$ satisfies the following properties.

PROPOSITION 3.1. Any function $\sigma$ that is constant on the same intervals $J_i=[x_{i-1},x_i)$ as $\hat F_n$ satisfies

$$\int_{t\in J_i}\sigma(t)\left\{\frac{\delta}{\hat F_n(t)}-\frac{\gamma}{\hat F_n(u)-\hat F_n(t)}\right\}dQ_n(t,u,\delta,\gamma)+\int_{u\in J_i}\sigma(u)\left\{\frac{\gamma}{\hat F_n(u)-\hat F_n(t)}-\frac{1-\delta-\gamma}{1-\hat F_n(u)}\right\}dQ_n(t,u,\delta,\gamma)=0$$

for $i=2,\dots,m$.

PROOF. See [Groeneboom and Wellner (1992), part II, Proposition 1.3, and Geskus and Groeneboom (1997), Corollary 1, page 207].

PROPOSITION 3.2.

$$\mathrm{Prob}\Bigl\{\lim_{n\to\infty}\|\hat F_n-F_0\|_\infty=0\Bigr\}=1.$$

For the proof, see [Groeneboom and Wellner (1992), part II, Sections 4.1 (case 1) and 4.3 (case 2)].

PROPOSITION 3.3.

$$\|\hat F_n-F_0\|_{H_i}=\mathcal O_p\bigl(n^{-1/3}(\log n)^{1/6}\bigr)$$

as $n\to\infty$, for $i=1,2$.

PROOF. See [Geskus and Groeneboom (1997), Corollary 2, page 209, and van de Geer (1996), Example 3.2].

The following result is needed in the proof of Lemma 3.1.

PROPOSITION 3.4.

$$\lim_{n\to\infty}\Pr\{\hat F_n\text{ is defective}\}=0.$$

See [Geskus and Groeneboom (1997), Proposition 1, page 206]. Although the conditions on $H$ are different there, the proof is the same, since the difference in conditions has no bearing on this particular property.

In addition to the smoothness conditions (D1) to (D3), given in Section 2, we assume

(D4) $h(t,t)=\lim_{u\downarrow t}h(t,u)\ge c$, for all $t\in(0,M)$ and some $c>0$.

As in GG, our definition of the canonical gradient $\tilde\theta$ will be extended to piecewise constant distribution functions $F$ with finitely many discontinuities, based on the solution $\varphi_F$ of a discrete version of the integral equation (2.5). In order to stress dependence on $F$, we will write $\varphi_F$ instead of $\varphi$.

However, since $F(v)-F(u)$ no longer remains bounded away from zero on the region where $H$ puts mass, we have to use an approach different from the one in GG. On one hand, the quotient

$$\frac{\varphi_F(v)-\varphi_F(u)}{F(v)-F(u)},$$

for $u$ and $v$ in the same interval of constancy of $F$, can only be defined correctly if $\varphi_F$ is constant on the same interval. On the other hand, $d_F$, $h$ and $\tilde\kappa_{F_0}$ are in general not constant on these intervals, which makes a completely discrete version of the integral equation impossible. Therefore, instead of one function $\varphi_F$ we now need a pair of functions $(\varphi_F,\psi_F)$, satisfying

$$\varphi_F(x)=d_F(x)\left\{k(x)-\int_0^x r_F(t,x)\,h(t,x)\,dt+\int_x^M r_F(x,t)\,h(x,t)\,dt\right\},\tag{3.2}$$

where $r_F(t,u)$ is defined by

$$r_F(t,u)=\begin{cases}\dfrac{\varphi_F(u)-\varphi_F(t)}{F(u)-F(t)}, & \text{if } F(t)<F(u),\\[2ex]\dfrac{\psi_F(u)-\psi_F(t)}{F_0(u)-F_0(t)}, & \text{if } F(t)=F(u),\ t<u,\end{cases}\tag{3.3}$$

where $\varphi_F$ is constant on the same intervals as $F$.

Since $\varphi_F$ is constant on the intervals of constancy of $F$, the only real integral part is the $\psi_F$-part; the remaining part of the integral can be written as a summation. The key to the proof of the existence of a solution pair $(\varphi_F,\psi_F)$, and also to the other proofs in this section, is a representation of the equation for $\varphi_F$ on an inverse scale and the construction of a continuous extension of the equation for $\varphi_F$ on this inverse scale (similar techniques were used in Section 2). Using a similar notation to Section 2, we denote by $G$ the inverse of $F$, where, for purely discrete distribution functions $F$, we take the right-continuous version of the inverse, defined by

$$G(t)=\inf\{x\in[0,M]:F(x)>t\},\qquad t\ge0.$$

Furthermore, we define $k_F=k\circ G$, $h_{1,F}=h_1\circ G$, $h_{2,F}=h_2\circ G$ and

$$d_F(t)=\frac{t(1-t)}{(1-t)h_{1,F}(t)+t\,h_{2,F}(t)},$$

and likewise $H(t,u)=H(G(t),G(u))$, $0\le t\le u\le1$.
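For a step function $F$, the right-continuous inverse $G(t)=\inf\{x\in[0,M]:F(x)>t\}$ reduces to a search over the jump heights. The minimal Python sketch below assumes that $F$ is given by its increasing jump points and the values it takes from each jump on. The function name is hypothetical. Returning $M$ when the set is empty (that is, when $F(x)\le t$ for all $x$) is an assumed convention that the definition above does not cover.

```python
import bisect

def right_continuous_inverse(jumps, values, M):
    """Return G with G(t) = inf{x in [0, M] : F(x) > t}, t >= 0, for a step F.

    jumps:  increasing jump points x_1 < ... < x_m in (0, M)
    values: F(x_1) < ... < F(x_m), the values F takes from each jump on
            (F = 0 on [0, x_1)).
    If F(x) <= t for every x the set is empty; this sketch then returns M
    (an assumed convention).
    """
    def G(t):
        j = bisect.bisect_right(values, t)   # first index with values[j] > t
        return jumps[j] if j < len(jumps) else M
    return G

G = right_continuous_inverse([0.2, 0.5, 0.9], [0.3, 0.7, 1.0], M=1.0)
print([G(t) for t in (0.0, 0.29, 0.3, 0.7, 0.99)])   # [0.2, 0.2, 0.5, 0.9, 0.9]
```

Here $G$ jumps at each height $z_j=F(x_j)$. For example, $G(0.3)=0.5$, while $G(t)=0.2$ for $t$ slightly below $0.3$. This is the right-continuity of the inverse referred to above.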


For part (iii) of Theorem 3.1, we will also need the following notation:

$$\Delta_ig=\int_{x_i}^{x_{i+1}}g(t)\,dt,\tag{3.4}$$

$$\Delta_{ij}h=\int_{u=x_i}^{x_{i+1}}\int_{v=x_j}^{x_{j+1}}h(u,v)\,dv\,du,\tag{3.5}$$

$$\tilde d_i=\frac{z_i(1-z_i)}{(\Delta_ih_1)(1-z_i)+(\Delta_ih_2)\,z_i}.\tag{3.6}$$

The following theorem shows the existence of the solution pair. Moreover, it gives a uniform Lipschitz condition for the functions $\varphi_F$ and $\psi_F$, which will be a crucial tool in showing the Donsker property for $\tilde\theta_F$.

THEOREM 3.1. Let the following conditions on $F_0$, $H$ and $\tilde\kappa_{F_0}$ be satisfied: (M1) to (M3); (D1) to (D4); (F1) to (F3). Furthermore, let $\mathcal F_{[0,M]}$ be the set of discrete nondefective distribution functions on $[0,M]$ with finitely many points of jump, contained in $(0,M)$. Then there exists an $\varepsilon>0$ such that, for $F\in\mathcal F$, where $\mathcal F$ is defined by $\mathcal F=\{F\in\mathcal F_{[0,M]}:\sup_{x\in[0,M]}|F(x)-F_0(x)|\le\varepsilon\}$,

(i) There exists a unique Lipschitz function $\varphi_F:[0,1]\to\mathbb R$ such that, for $t\in[0,1]\setminus D$,

$$\varphi_F(t)=d_F(t)\left\{k_F(t)-\int_{t'\in[0,t)}\frac{\varphi_F(t)-\varphi_F(t')}{t-t'}\,dH(t',t)+\int_{u\in(t,1]}\frac{\varphi_F(u)-\varphi_F(t)}{u-t}\,dH(t,u)\right\},\tag{3.7}$$

where $D$ is the finite set of discontinuities of the right-continuous inverse $G=F^{-1}$ in $(0,1)$, augmented with 0 and 1. The function $\varphi_F$ is Lipschitz, uniformly for $F\in\mathcal F$.

(ii) There exists a pair $(\varphi_F,\psi_F)$ solving the integral equation (3.2), where $\varphi_F$ is absolutely continuous with respect to $F$ and the function $\psi_F$ is Lipschitz on each interval between jumps of $F$, uniformly for $F\in\mathcal F$, with a Lipschitz norm not depending on the interval.

(iii) Let $z_i=F(x_i)$ and $y_i=\varphi_F(x_i)$, $i=1,\dots,m$. Then, using the definitions (3.4) to (3.6), we have that the vector $y=(y_1,\dots,y_m)'$ is the unique solution of the set of linear equations

$$y_i\left\{\tilde d_i^{-1}+\sum_{j<i}\frac{\Delta_{ji}h}{z_i-z_j}+\sum_{j>i}\frac{\Delta_{ij}h}{z_j-z_i}\right\}=\Delta_ik+\sum_{j<i}\frac{\Delta_{ji}h}{z_i-z_j}\,y_j+\sum_{j>i}\frac{\Delta_{ij}h}{z_j-z_i}\,y_j,\qquad i=1,\dots,m.\tag{3.8}$$
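Since (3.8) is a finite linear system, it can be assembled and solved directly once the quantities $\Delta_{ij}h$, $\Delta_ik$ and $\tilde d_i$ of (3.4) to (3.6) are available. The Python sketch below makes the following assumptions:

- the function names are hypothetical;
- the integrals are supplied as precomputed numbers;
- reading $\tilde d_i=0$ (for instance $z_i=1$ on the last interval) as $\tilde d_i^{-1}=\infty$, which forces $y_i=0$, is our own convention rather than a statement of the theorem.

```python
def d_tilde(z, dh1, dh2):
    """(3.6): z(1 - z) / ((Delta_i h_1)(1 - z) + (Delta_i h_2) z)."""
    return z * (1.0 - z) / (dh1 * (1.0 - z) + dh2 * z)

def build_system(z, dh, dk, dt):
    """Matrix form A y = b of (3.8), with 0-based indices.

    z:  strictly increasing values z_i = F(x_i)
    dh: dh[i][j] = Delta_{ij} h for i < j (entries with i >= j are not used)
    dk: Delta_i k;  dt: d~_i from (3.6); dt[i] == 0 is read as forcing y_i = 0.
    """
    m = len(z)
    A = [[0.0] * m for _ in range(m)]
    b = [float(v) for v in dk]
    for i in range(m):
        if dt[i] == 0.0:
            A[i][i], b[i] = 1.0, 0.0          # 1/d~_i = infinity gives y_i = 0
            continue
        A[i][i] = 1.0 / dt[i]
        for j in range(m):
            if j == i:
                continue
            lo, hi = min(i, j), max(i, j)
            w = dh[lo][hi] / (z[hi] - z[lo])   # Delta_{lo,hi} h / |z_i - z_j|
            A[i][i] += w                       # left-hand braces in (3.8)
            A[i][j] -= w                       # right-hand sums moved to the left
    return A, b

def solve(A, b):
    """Dense Gaussian elimination with partial pivoting (adequate for small m)."""
    n = len(b)
    aug = [list(A[r]) + [b[r]] for r in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][k] * y[k] for k in range(r + 1, n))
        y[r] = (aug[r][n] - s) / aug[r][r]
    return y

A, b = build_system([0.25, 0.75], [[0.0, 0.5], [0.0, 0.0]], [1.0, 1.0], [0.5, 0.5])
print(solve(A, b))   # approximately [0.5, 0.5]
```

Suppose all $\tilde d_i>0$ and $\Delta_{ij}h\ge0$. Then the assembled matrix is symmetric and strictly diagonally dominant, since each diagonal entry is $\tilde d_i^{-1}$ plus the sum of the absolute off-diagonal entries in its row. This is consistent with the uniqueness claim in (iii). Partial pivoting is kept only as a safeguard.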

Theorem 3.1 will be proved by approximating the purely discrete distribution function $F$ by the function $F_\alpha=(1-\alpha)F_0+\alpha F$ and by studying the behavior of the corresponding function $\varphi_{F_\alpha}$, as $\alpha\uparrow1$. The rather technical proof is given in the Appendix. By Theorem 3.1, the definition of the function $\tilde\theta_F$ can be extended to piecewise constant distribution functions $F\in\mathcal F$ by defining

$$\tilde\theta_F(t,u,\delta,\gamma)=-\delta\,\frac{\varphi_F(t)}{F(t)}-\gamma\,r_F(t,u)+(1-\delta-\gamma)\,\frac{\varphi_F(u)}{1-F(u)},\tag{3.9}$$

where $\varphi_F$ and $\psi_F$ solve (3.2), and where $\varphi_F(t)/F(t)$ and $\varphi_F(u)/[1-F(u)]$ are defined to be zero if $F(t)=0$ or if $1-F(u)=0$, respectively. Note that $\tilde\theta_F$ no longer has an interpretation as canonical gradient.
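The two-case definition (3.3) and the zero conventions in (3.9) translate directly into code. The Python sketch below assumes that $\varphi_F$, $\psi_F$, $F$ and $F_0$ are available as callables; all names are hypothetical. The step $F$, the uniform $F_0$ and the functions `phi_F` and `psi_F` in the usage lines are illustrative toys only, not solutions of (3.2).

```python
def r_F(phi_F, psi_F, F, F0, t, u):
    """(3.3) for t < u."""
    if F(t) < F(u):
        return (phi_F(u) - phi_F(t)) / (F(u) - F(t))
    # F(t) == F(u): t and u lie in the same interval of constancy of F
    return (psi_F(u) - psi_F(t)) / (F0(u) - F0(t))

def theta_F(phi_F, psi_F, F, F0, t, u, delta, gamma):
    """(3.9) for t < u, with phi_F(t)/F(t) := 0 if F(t) = 0 and
    phi_F(u)/(1 - F(u)) := 0 if F(u) = 1."""
    first = phi_F(t) / F(t) if F(t) > 0.0 else 0.0
    last = phi_F(u) / (1.0 - F(u)) if F(u) < 1.0 else 0.0
    middle = r_F(phi_F, psi_F, F, F0, t, u) if gamma else 0.0   # only used when gamma = 1
    return -delta * first - gamma * middle + (1 - delta - gamma) * last

# Toy usage: F jumps to 0.5 at 0.4 and to 1 at 0.8; phi_F is constant on the
# same intervals; psi_F is an arbitrary Lipschitz function.
def F(x):
    return 0.0 if x < 0.4 else (0.5 if x < 0.8 else 1.0)

def F0(x):
    return x

def phi_F(x):
    return 0.0 if x < 0.4 else (0.25 if x < 0.8 else 0.0)

def psi_F(x):
    return 2.0 * x

print(theta_F(phi_F, psi_F, F, F0, 0.2, 0.5, 0, 1))   # uses the phi_F branch of (3.3)
print(theta_F(phi_F, psi_F, F, F0, 0.5, 0.6, 0, 1))   # uses the psi_F branch of (3.3)
```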

In the sequel we will write $Q_F$ instead of $Q_{F,H}$. We are now ready to formulate our main result.

THEOREM 3.2. Let the conditions of Theorem 3.1 be satisfied. Then

$$\sqrt n\bigl(K(\hat F_n)-K(F_0)\bigr)\xrightarrow{\mathcal D}N\bigl(0,\|\tilde\theta_{F_0}\|_{Q_{F_0}}^2\bigr),\qquad\text{as }n\to\infty.\tag{3.10}$$

PROOF. Note that it is sufficient to show the following:

$$\sqrt n\bigl(K(\hat F_n)-K(F_0)\bigr)=\sqrt n\int\tilde\theta_{F_0}\,d(Q_n-Q_{F_0})+o_p(1).\tag{3.11}$$

Moreover, using the uniform consistency of $\hat F_n$ (see Proposition 3.2), we may assume that $\hat F_n\in\mathcal F$ for all large $n$, where $\mathcal F$ is defined as in Theorem 3.1. The proof consists of the following steps.

(i) By conditions (D1) and (F2), and Proposition 3.3, we have

$$\sqrt n\bigl(K(\hat F_n)-K(F_0)\bigr)=\sqrt n\int\tilde\kappa_{F_0}\,d(\hat F_n-F_0)+o_p(1).$$

(ii) In Lemma 3.1 the following will be shown:

$$\int\tilde\kappa_{F_0}\,d(F-F_0)=-\int\tilde\theta_F\,dQ_{F_0},\qquad\text{if }F\in\mathcal F.$$

(iii) Unlike the situation in GG, $\varphi_{\hat F_n}$ is constant on the same intervals as $\hat F_n$. Since $\gamma=0$ if $\hat F_n(u)=\hat F_n(t)$, Proposition 3.1 can be used to obtain $\int\tilde\theta_{\hat F_n}\,dQ_n=0$, yielding

$$-\sqrt n\int\tilde\theta_{\hat F_n}\,dQ_{F_0}=\sqrt n\int\tilde\theta_{\hat F_n}\,d(Q_n-Q_{F_0}).$$
