Estimation of spectral and nonlinear regression parameters


ARTHUR SIEDERS

DELFT

TR diss

1554


THESIS

submitted for the degree of doctor at Delft University of Technology, on the authority of the Rector Magnificus, Prof.dr. J.M. Dirken, to be defended in public on Thursday 25 June 1987 at 16.00 hours before a committee appointed by the Board of Deans,

by ARTHUR SIEDERS,

born in Rotterdam, doctorandus in physics.

Doctoral committee:

Prof.dr. C.L. Scheffer, promotor (TU Delft)
Dr. K. Dzjaparidze (CWI)
Dr. A. van den Bos (TU Delft)
Prof.dr. R.D. Gill (RU Leiden, CWI)
Prof.dr. M.S. Keane (TU Delft)
Prof.dr. J. Oosterhoff (VU Amsterdam)
Prof.dr. P.J.M. Rousseeuw (TU Delft)

Gratefully, I acknowledge the support given to me by the Netherlands Foundation for Mathematics SMC and the financial aid provided to me by the Dutch tax payers via the Netherlands Organization for the Advancement of Pure Research (ZWO). Also I thank my colleagues of the Delft University of Technology and the Centre for Mathematics and Computer Science for their help and encouragement; in particular I express my gratitude to Mrs. Netty Zuidervaart-Murray for the

Preface

Chapter I. A large deviation result and its application to nonlinear regression analysis.

Chapter II. Inequalities for products of Toeplitz matrices and their inverses, with applications.

Chapter III. Estimation of a function-valued parameter: Whittle's spectral estimator.

Summaries. Nederlands, English, Français, По-русски.

Curriculum. Arthur Sieders, wonderkind of total

1. In this thesis we demonstrate how the large deviation part of Ibragimov and Has'minskii's theory (1981) of parametric estimation can be extended and improved.

We apply our method to two statistical problems:

(i) least squares estimation in non-linear regression (chapter I); (ii) spectral estimation by P. Whittle's method (chapter III).

The research done on spectral estimation has had spin-off in the form of new results on the asymptotic behaviour of Toeplitz matrices; these are laid down in chapter II and partly used in chapter III.

Nonlinear regression and spectral estimation are subjects of paramount importance because they are applied in almost all branches of natural and social sciences. Some examples will be given in section 2.

Throughout we shall be dealing with models containing a finite and fixed number of parameters, of which P. Whittle (1962) remarked:

'Such models have been condemned as unrealistic by certain authors. I would agree that there has been a tendency to use oversimplified models in time series analysis, a tendency which is nevertheless understandable in a young subject. I would also agree that there are applications for which the construction of a model is impracticable. However, an unqualified rejection of finite parameter models has implications outside time series analysis: it would imply the conviction that one cannot expect to 'explain' a set of data; i.e. that one cannot set up a complete hypothesis which would account for observed phenomena. If this were true

2. In this section we provide some examples: in (i) we describe a biochemical experiment where the least squares method of nonlinear regression is applicable; in (ii) we sketch how spectral analysis can help to 'repair' a damaged compact disk.

(i) An example of nonlinear regression analysis is provided by the Michaelis Menten model, which is used to describe the relation between the velocity $v$ of an enzyme reaction and the concentration $c$ of the substrate. The parameters are $M$, the maximal reaction velocity, and $K$, the chemical affinity. The parameter set $\Theta$ of the $(K,M)$ is a bounded open set in the positive quadrant. The model is

$$v(c;K,M) = \frac{Mc}{K+c}.$$

Consider fixed designs given by concentrations $c_1, c_2, \ldots, c_n$, where $c_n \to 0$ as $n \to \infty$. At each concentration $c_t$ an independent measurement of the reaction velocity is taken, giving the data $X_1, X_2, \ldots, X_n$:

$$X_t = v_t(K,M) + e_t = \frac{M c_t}{K+c_t} + e_t,$$

where the $e_t$ are independent centered errors.

In chapter I, equation (4.21), we give a bound on the probability of a large deviation of the least squares estimator for $(M/K, K)$.
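To make the least squares criterion for the Michaelis Menten model concrete, here is a minimal Python sketch. It is an illustration only and not the method analysed in the thesis: the function names, the grid over $K$ and the test concentrations are assumptions. It uses the fact that, for fixed $K$, the model is linear in $M$, so the best $M$ has a closed form and only $K$ has to be searched.

```python
def velocity(c, K, M):
    """Michaelis Menten reaction velocity v(c; K, M) = M c / (K + c)."""
    return M * c / (K + c)


def profile_least_squares(cs, xs, K_grid):
    """Least squares fit of (K, M) by profiling out M over a grid of K values.

    cs: concentrations, xs: measured velocities, K_grid: candidate K values
    (the grid is a hypothetical choice made for this illustration).
    Returns (K_hat, M_hat, rss) for the grid point with the smallest
    residual sum of squares.
    """
    best = None
    for K in K_grid:
        g = [c / (K + c) for c in cs]                   # model is M * g for this K
        M = sum(x * gi for x, gi in zip(xs, g)) / sum(gi * gi for gi in g)
        rss = sum((x - M * gi) ** 2 for x, gi in zip(xs, g))
        if best is None or rss < best[2]:
            best = (K, M, rss)
    return best
```

With noise-free data the grid search returns the generating parameters whenever the true $K$ lies on the grid; with noisy data the result is only as fine as the grid, and a local optimiser could be started from it.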

(ii) Compact disks are becoming increasingly popular as a medium to store (digitally recorded) music. One of the advantages of this medium is that small scratches and dust do not have such a disastrous effect as they have on old-fashioned records. However, a method of restoring lost sample values in such discrete-time signals is still of interest. We assume that the positions of the lost samples are known and that they are embedded in a sufficiently large neighbourhood of known samples. A simple method like linear interpolation tends to give audible errors because it mutilates the harmonic components in a music signal. Another method is the following. We model the music signal (locally) by a stationary autoregressive stochastic process $X$ with parameter $a \in \mathbb{R}^p$:

$$X_t = a_1 X_{t-1} + a_2 X_{t-2} + \cdots + a_p X_{t-p} + e_t, \qquad t \in [-T, T].$$

This model is natural here because, for zero $e$ and 'nice' $a$, the above equation in $X$ has 'musical' solutions. From the observed values of $X$ one may estimate the power spectral density $f$ of the process, which yields estimates for $a_1, a_2, \ldots, a_p$ needed to interpolate $X$.

For a more precise description we refer to Janssen, Veldhuis and Vries (1986).
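The interpolation step can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the algorithm of Janssen, Veldhuis and Vries: the autoregressive coefficients are taken as already known, only a single sample is lost, and the function name `restore_sample` is hypothetical. The lost value is chosen to minimise the sum of squared prediction errors $e_t$ over the residuals that involve it.

```python
import math


def restore_sample(x, m, a):
    """Least squares value for the lost sample x[m] under an AR(p) model.

    x: list of samples (the entry x[m] is ignored), m: index of the lost
    sample, a: AR coefficients a_1..a_p (assumed known here).
    Requires p <= m < len(x) - p so that all residuals are defined.
    """
    p = len(a)
    b = [1.0] + [-aj for aj in a]        # e_t = sum_j b[j] * x[t - j]
    y = list(x)
    y[m] = 0.0                           # placeholder for the unknown value
    num = 0.0
    den = 0.0
    for j in range(p + 1):               # residuals e_{m+j} depend on x[m]
        t = m + j
        r = sum(b[i] * y[t - i] for i in range(p + 1))  # residual with x[m] = 0
        num += b[j] * r                  # e_{m+j} = r + b[j] * x[m]
        den += b[j] * b[j]
    return -num / den                    # minimiser of sum_j (r_j + b_j x_m)^2
```

For a pure sinusoid $x_t = \cos(\omega t)$, which satisfies $x_t = 2\cos(\omega)\,x_{t-1} - x_{t-2}$ with zero $e$, the restored value coincides with the true sample, which is one way to read the remark about 'musical' solutions.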

3. In chapter I of this thesis, we provide an extension to Ibragimov and Has'minskii's theory which allows one to treat estimators which are not per se maximum-likelihood or Bayesian but maximize 'some' other functional than the likelihood or the expected loss. The reasons for this are many: the maximum-likelihood or Bayesian estimator may be distrusted, it may be difficult to calculate, or the exact form of the likelihood may be unknown, which compels one to use another criterion, such as least squares.

This extension is easy to carry out. In fact, the notion of likelihood plays hardly a role in many arguments of Ibragimov and Has'minskii. This has been observed earlier. For example, on p. 243 and p. 250 of Serfling (1980) it is stated that one of the ways M-estimators can be handled is by extending the classical theory of maximum likelihood estimation. We have worked this out in chapter I of this thesis, where we treat a least squares estimator by a method applicable to maximum likelihood estimation.

4. Also in chapter I, we give a simple alternative for Ibragimov and Has'minskii's condition N3 (p.174) on the normalised m-th order Hellinger distance which, in its original form, is quite cumbersome to verify. Omitting the technical details condition N3 runs as follows:

N3. There exist constants $B$, $m$ and $\alpha$, where $m \ge \alpha > k$ (here $k$ is the dimension of the Euclidean parameter space), such that, for all $u$, $v$ and $\theta \in K$,

(1) $\mathbb{E}_\theta^{(\varepsilon)} \left| Z_{\varepsilon,\theta}^{1/m}(u) - Z_{\varepsilon,\theta}^{1/m}(v) \right|^m \le B\,|u-v|^{\alpha}.$

Here $\varepsilon$ is the parameter of asymptoticity and $Z_{\varepsilon,\theta}(u)$ is the normalized likelihood ratio

$Z_{\varepsilon,\theta}(u) := dP^{(\varepsilon)}_{\theta+\varphi_\varepsilon u} \,/\, dP^{(\varepsilon)}_{\theta},$

where $\varphi_\varepsilon$ is a suitable sequence of normings.

The expression on the left-hand side of (1) can be identified as the 'm-th order Hellinger distance' between the measures $P^{(\varepsilon)}_{\theta+\varphi_\varepsilon u}$ and $P^{(\varepsilon)}_{\theta+\varphi_\varepsilon v}$.

Evaluating the left-hand side of (1) by the binomial formula yields a sum of $m+1$ expectations with alternating signs. The triangle inequality is too coarse to obtain an inequality like (1): one has to expand each term up to m-th order. This may present a formidable task if the dimension $k$ of the parameter is $\ge 2$, which forces the order $m$ to be $\ge 3$, cf. Ingster (1984) and Ibragimov and Has'minskii (1981), p. 202. To illustrate this difficulty we refer to p. 56 of Ibragimov and Has'minskii (1981), where a theorem is announced which concerns the case $k > 1$ (theorem 1.5.8). The proof, however, is valid only for $k = 1$ and extension to the case $k \ge 2$ is not obvious.

In chapter I, we circumvent this difficulty by considering, instead of the normalized likelihood ratio $Z_{\varepsilon,\theta}(u)$, the logarithm of this quantity. This simple trick allows one to replace condition (1) by the moment condition

(1') $\mathbb{E}_\theta^{(\varepsilon)} \left| \log Z_{\varepsilon,\theta}(u) - \log Z_{\varepsilon,\theta}(v) \right|^m \le B\,|u-v|^{\alpha},$

which is much easier to check because $Z$ tends to be of exponential form. In the appendix to this preface we present an inequality due to Dzhaparidze and Valkeila (personal communication) which brings another possibility of simplification.

5. In chapter III, we present a large deviation theory for function-valued parameters. The motivation for this research was the following.

Let $\mathcal{P}$ denote a family of probability measures $\{P_\theta, \theta \in \Theta\}$ indexed by some subset $\Theta$ of Euclidean space $\mathbb{R}^k$.

In many statistical problems, such as (nonlinear) regression and spectral estimation, the family $\mathcal{P}$ used to model the observations has the property that $P_\theta$ depends on $\theta$ via some function $f_\theta(\cdot)$, e.g. the regression function or the spectral density function. More precisely, the map $\pi : \Theta \to \mathcal{P};\ \theta \mapsto P_\theta$ can be factorized as $\pi = \pi_1 \circ \pi_2$ with

$\pi_2 : \Theta \to F;\ \theta \mapsto f_\theta(\cdot),$

$\pi_1 : F \to \mathcal{P};\ f \mapsto P_f.$

Here $F$ is a (metric) space of functions and, with an abuse of notation, $P_\theta = P_{f_\theta}$.

Formally, such a factorization is always possible but often there are strong indications that a functional parametrization $\pi_1 : F \to \mathcal{P}$, i.e. a parametrization by a function-valued parameter, is more natural than a Euclidean parametrization $\pi : \Theta \to \mathcal{P}$. For instance, if the Hellinger distance (denoted by $h$) can be expressed through a metric $d$ on $F$, then reparametrization of $\mathcal{P}$ by $\pi_1$ leads to the elegant

$h(P_f, P_{f'}) = d(f, f'),$

whereas those insisting on Euclidean parametrization would, in general, have to content themselves with inequalities of the form

$C_1 |\theta-\theta'| \le h(P_{f_\theta}, P_{f_{\theta'}}) \le C_2 |\theta-\theta'|,$

which may cause a 'loss on exchange' by a factor $C_2/C_1$. It is obvious that a statistical theory covering parametrization by non-Euclidean metric spaces would gain sharpness and elegance. Such a theory is becoming available now, see e.g. Birgé (1983).

In chapter III of this thesis we contribute to this field by proving a large deviations result for Whittle's spectral estimator without taking recourse to Euclidean parametrization, as is done by e.g. Ingster (1984). For the nonlinear regression problem results of this type were recently obtained by Van de Geer (1986).

6. In the chapters I and III, we have proved some large deviation results for parameter estimators. Such theorems, besides being of independent statistical interest, are an important tool in the theory of Ibragimov and Has'minskii. We refer the reader to their theorem III.1.3 and corollary 1.5.2 on the asymptotic efficiency and the asymptotic risk, respectively, of an exponentially tailed estimator.

7. In chapter II, we have described some auxiliary results on the asymptotic behaviour of Toeplitz matrices. In particular, it is shown that the product of finite section Toeplitz matrices (and their inverses) associated to functions in the real Krein algebra can be approximated, in a strong sense, by the Toeplitz matrix associated to the products of these functions (and the corresponding inverses). Some of these results are needed in chapter III to calculate moments of Whittle's approximation of the likelihood of a stationary Gaussian process. As another application of these results we obtain, in chapter II, an inequality related to Szegö's strong limit theorem.

Appendix.

As an encore, we give an inequality which helps to check condition N3 discussed in section 4. Let $x > 0$, $y > 0$ and let $k$ be any positive number. Then

(2) $\left| x^{1/k} - y^{1/k} \right|^k \le k^{-k}\, \left| \ln(x/y) \right|^k \max(x,y).$

This inequality has the following corollary: if $P$, $\bar P$ are probability measures with dominating measure $R$, and if $Q := \tfrac12 (P + \bar P)$, then

(3) $\tfrac12 \int \left| \left( \frac{dP}{dR} \right)^{1/k} - \left( \frac{d\bar P}{dR} \right)^{1/k} \right|^k dR \;\le\; k^{-k}\, \mathbb{E}_Q \left| \ln \frac{dP/dQ}{d\bar P/dQ} \right|^k.$

Using this inequality, it is quite simple to get the assertion of lemma III.5.2 of Ibragimov and Has'minskii (1981). Let $P$, $\bar P$ give, respectively, drift $S$, $\bar S$ to a standard Wiener process $b(t)$, $0 \le t \le T$. Assume that $S, \bar S \in L^2([0,T])$. Using (3) and theorem 1 in appendix II of Ibragimov and Has'minskii (1981) we obtain

$\int \left| \left( \frac{dP}{dR} \right)^{1/2m} - \left( \frac{d\bar P}{dR} \right)^{1/2m} \right|^{2m} dR \;\le\; \mathrm{Const}(m)\, \mathbb{E}_Q \left| \ln \frac{dP}{d\bar P} \right|^{2m} = \mathrm{Const}(m)\, \mathbb{E} \left| \int (S - \bar S)\, db \right|^{2m} \le \mathrm{Const}(m)\, \| S - \bar S \|_2^{2m}.$
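Inequality (2) of this appendix, $|x^{1/k} - y^{1/k}|^k \le k^{-k} |\ln(x/y)|^k \max(x,y)$, is easy to probe numerically. The following Python sketch is only a sanity check on a small grid of values chosen for illustration (the grid and the function names are assumptions); it is not a proof and not part of the original argument.

```python
import math


def lhs(x, y, k):
    """Left-hand side of inequality (2): |x^(1/k) - y^(1/k)|^k."""
    return abs(x ** (1.0 / k) - y ** (1.0 / k)) ** k


def rhs(x, y, k):
    """Right-hand side of inequality (2): k^(-k) |ln(x/y)|^k max(x, y)."""
    return k ** (-k) * abs(math.log(x / y)) ** k * max(x, y)


def worst_ratio(values, ks):
    """Largest value of lhs/rhs over the grid, skipping x == y where both vanish."""
    worst = 0.0
    for k in ks:
        for x in values:
            for y in values:
                if x != y:
                    worst = max(worst, lhs(x, y, k) / rhs(x, y, k))
    return worst
```

A ratio at or below 1 on every grid point is consistent with (2); the ratio approaches 1 only as $x/y \to 1$, which is why the grid above keeps $x$ and $y$ well separated to avoid rounding effects.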

References

Birgé, L. (1983).
Approximation dans les espaces métriques et théorie de l'estimation. Z. Wahrscheinlichkeitstheorie verw. Gebiete 65, 181-237.

Draper, N.R. and Smith, H. (1981).
Applied Regression Analysis. Wiley, New York.

Geer, S. van de (1986).
On rates of convergence in least squares estimation. Preprint of the Centre for Mathematics and Computer Science, Amsterdam.

Ibragimov, I.A. and Has'minskii, R.Z. (1981).
Statistical Estimation: Asymptotic Theory. Springer, New York.

Ingster, Yu.I. (1984).
Asymptotic regularity of a family of measures corresponding to a Gaussian random process which contains a white noise component for a parametric family of spectral densities. J. Soviet Math. 25, No. 3, 1165-1181.

Janssen, A.J.E.M., Veldhuis, R.N.J. and Vries, L.B. (1986).
Adaptive Interpolation of Discrete Time Signals That Can Be Modeled as Autoregressive Processes. IEEE Trans. Acoust. Speech Signal Process. ASSP-34, No. 2, April.

Robinson, E.A. (1980).
Physical Applications of Stationary Time-Series. Griffin, London.

Serfling, R.J. (1980).
Approximation Theorems of Mathematical Statistics. Wiley, New York.

Whittle, P. (1962).
Gaussian Estimation in Stationary Time Series. Bull. Inst. Internat. Statist. 39, 105-129.

A Large Deviation Result for Parameter Estimators and its Application to Nonlinear Regression Analysis*

A. Sieders

Delft University of Technology, P.O. Box 356, 2600 AJ Delft, The Netherlands

K. Dzhaparidze

Centre for Mathematics and Computer Science, P.O. Box 4079, 1009 AB Amsterdam, The Netherlands

Elaborating on the work of Ibragimov and Has'minskii (1981) we prove a Law of Large Deviations (LLD) for M-estimators, i.e. those estimators which maximise a functional, continuous in the parameter, of the observations. This LLD is applied, using results of Petrov (1975), to the problem of parametric nonlinear regression in the situation of discrete time, independent errors and regression functions which are continuous in the parameter. This improves a result of Prakasa Rao (1984).

1980 Mathematics Subject Classification: 60F10, 62F12, 62J02.

Key Words & Phrases: M-estimators, large deviations, rate of convergence, least-squares, nonlinear regression, Michaelis Menten model.

Note: This research was supported by the Netherlands Foundation of Mathematics SMC with financial aid from the Netherlands Organisation for the Advancement of Pure Research (Z.W.O.).

1. Introduction

The main results of this paper are the theorems 3.1 and 3.2, which establish a LLD for the least-squares estimator of a nonlinear regression parameter. The proofs rely on theorem 2.1, which is a generalisation of theorem 1.5.1 of Ibragimov and Has'minskii (1981). In order to understand why generalisation is desirable, consider the following nonlinear regression model for the observations $X^n := X_1, X_2, \ldots, X_n$:

(1.1) $X_t = f_t(\theta) + e_t, \qquad t = 1, 2, \ldots, n,$

where the $f_t$ are known continuous functions on a parameter set $\Theta \subset \mathbb{R}^k$, the $e_t$ are independent, not necessarily identically distributed, errors with zero expectation, and $\theta \in \Theta$ is the true value of the parameter, which is to be estimated by some functional $\hat\theta_n(X_1, X_2, \ldots, X_n)$.

If the distributions $F_t$ of the $e_t$ are known, then we can construct a family of measures $\{ \mathbb{P}_\theta^{(n)}, \theta \in \Theta \}$ on a suitable space of events $\{ \mathcal{X}^{(n)}, \mathcal{U}^{(n)} \}$, define the family of statistical experiments $\{ \mathcal{X}^{(n)}, \mathcal{U}^{(n)}, \mathbb{P}_\theta^{(n)} \}$, $n = 1, 2, \ldots$, and proceed as Ibragimov and Has'minskii (1981) in order to describe the asymptotic behaviour of the maximum likelihood estimator $\hat\theta_n$.

For instance, we can apply theorem 1.5.1 of Ibragimov and Has'minskii (1981), which states that a Law of Large Deviations, i.e. an (exponential) inequality for the probability of a large deviation of the estimator $\hat\theta_n$ from the true value $\theta$, holds if the normalised likelihood ratio $Z_{\varepsilon,\theta}(u)$ satisfies two conditions, which, roughly stated, are that, for $n$ large enough ($\varepsilon$ small enough, in the formulation of the theorem; put $\varepsilon := 1/n$), $Z_{\varepsilon,\theta}(u)$ is, in expectation, sufficiently continuous in $u$ and that $\mathbb{E}\, Z_{\varepsilon,\theta}(u)^{1/2}$ decreases exponentially as $|u| \to \infty$.

If the $F_t$ are not known, one often resorts to the so-called least-squares estimator $\hat\theta_n^{LS}$, which minimizes the residual sum of squares

(1.2) $Q_n(X^n, \theta) := \sum_{t \le n} (X_t - f_t(\theta))^2.$

The properties of $\hat\theta_n^{LS}$ can be investigated if one restricts the $F_t$ to a sufficiently "nice" class $\{ F_t \}$. We claim that theorem 1.5.1 of Ibragimov and Has'minskii (1981), although formulated for the maximum likelihood scheme, can provide a valuable tool here. In the theory of M-estimators the idea has been developed (see, for instance, Serfling (1980)) that the classical maximum likelihood theory can be extended to estimators maximising some other functional of the observations. Indeed, inspection of the proof of the mentioned theorem reveals that it continues to hold if the likelihood is replaced by some other $\theta$-continuous, $\mathbb{P}_\theta^{(n)}$-a.s. positive functional $C_n(X^n, \theta)$, which we shall call an M-functional.

We shall try to apply this generalised version of theorem 1.5.1 to the LS-estimator for the model given by equation (1.1), which maximizes the M-functional

(1.3) $C_n(X^n, \theta) := \exp\left\{ -\tfrac12 \sum_{t \le n} (X_t - f_t(\theta))^2 \right\},$

which is, of course, the likelihood if the $e_t$ are i.i.d. standard normal. Theorem 1.5.1 (and our theorem 2.1) express the large deviation properties of the estimator in the normalised ratio $Z_{n,\theta}(u)$ and not directly in $C_n(X^n, \theta)$ (the reason for this lies in the application of lemma A2). Therefore we define, for some choice of norming constants $\varphi_n$,

(1.4) $Z_{n,\theta}(u) := C_n(X^n, \theta + \varphi_n u) \,/\, C_n(X^n, \theta).$

Unfortunately, it turns out that it is not easy at all to formulate conditions on the family of regressors $\{ f_t(\theta), \theta \in \Theta \}$ and the class of distributions $\{ F_t \}$ of $e_t$ which guarantee that the $Z_{n,\theta}(u)$ defined by (1.3) and (1.4) satisfies the conditions of the generalised theorem described above. It is perhaps for this reason that Prakasa Rao (1984) restricts himself to the case that the $e_t$ are i.i.d. Gaussian and the dimension $k$ of $\theta$ is equal to 1. The main difficulty inherent to theorem 1.5.1 seems to be that its Hölder condition (1) is quite difficult to verify, as its authors, in their comment on theorem 1.5.1, implicitly admit, especially if the dimension $k$ of $\theta$ is $> 1$. On p. 56 of Ibragimov and Has'minskii (1981), a theorem is announced which concerns the case $k > 1$ (theorem 1.5.8). The proof, however, is valid only for $k = 1$, and extension to the case $k > 1$ is not obvious. Less powerful, but more sound methods all require considerable manipulation, even in the Gaussian situation, cf. Ingster (1984), p. 1179, and Ibragimov and Has'minskii (1981), lemma III.5.2 on p. 202f.

These observations motivated us to seek a LLD in the spirit of theorem 1.5.1, which would not only apply to a much broader class of estimators than just ML, but which would also be more flexible in its conditions. This effort resulted in theorem 2.1 of this paper, which we apply, in section 3, to the nonlinear regression problem. For statistical applications of LD theorems we refer the reader to theorem 1.10.1 of Ibragimov and Has'minskii (1981), which may give an idea of the possibilities.

Dzhaparidze (1986) used a rudimentary form of theorem 2.1 to infer about intensity parameters of counting processes. Another study on theorem 1.5.1 was recently made by Vostrikova (1984), who gives conditions for a LLD for Bayesian and ML estimators in terms of variation distance and predictable terms. Large deviation results for M-estimators in an i.i.d. setting were recently obtained by Kester (1985).

Acknowledgement: we acknowledge Carel Scheffer for his helpful advice and Lieneke Lekx for her careful manipulation of the text. We thank the referee, whose remarks have substantially improved the paper.

2. A Law of Large Deviations

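Consider a family of statistical experiments $E_\varepsilon = \{ \mathcal{X}^{(\varepsilon)}, \mathcal{U}^{(\varepsilon)}, \mathbb{P}_\theta^{(\varepsilon)};\ \theta \in \Theta \}$, where the $\mathbb{P}_\theta^{(\varepsilon)}$ are not necessarily of known form (see section 1, Introduction). The parameter set $\Theta$ is a Borel subset of k-dimensional Euclidean space. We shall consider M-estimators maximizing an M-functional $C_\varepsilon : \mathcal{X}^{(\varepsilon)} \times \Theta \to [0,\infty)$, which is assumed to be, for all $X^\varepsilon \in \mathcal{X}^{(\varepsilon)}$, a positive continuous function of $\theta$ and, for each $\theta \in \Theta$, a measurable functional of $X^\varepsilon$.

Throughout we assume that, for all $\theta \in \Theta$ and $\mathbb{P}_\theta^{(\varepsilon)}$-almost all $X^\varepsilon$, a solution $\hat\theta_\varepsilon$ to the equation

(2.1) $C_\varepsilon(X^\varepsilon, \hat\theta_\varepsilon) = \sup_{\theta \in \Theta} C_\varepsilon(X^\varepsilon, \theta)$

exists (this is certainly true if $\Theta$ is compact). On the basis of the existence assumption we may demonstrate that a measurable functional $\hat\theta_\varepsilon : \mathcal{X}^{(\varepsilon)} \to \Theta$ exists which is a solution of (2.1). This is worked out in lemma A1 in the appendix. So we assume henceforth that $\hat\theta_\varepsilon$ is measurable.

All our results are of asymptotic nature, i.e. they are valid for $\varepsilon$ small enough and $R$ large enough, where $\varepsilon \to 0$ describes the approach of the 'limit experiment' $E$ and $R$ describes the normalised deviation of the estimator $\hat\theta_\varepsilon$ from the true value $\theta$.

Let, for each $\varepsilon$ and $\theta \in \Theta$, $\varphi(\varepsilon,\theta)$ be a non-singular $k \times k$ matrix and define the normalised M-ratio

(2.2) $Z_{\varepsilon,\theta}(u) := Z_{\varepsilon,\theta}(X^\varepsilon, u) = C_\varepsilon(X^\varepsilon, \theta + \varphi(\varepsilon,\theta) u) \,/\, C_\varepsilon(X^\varepsilon, \theta),$

which, for fixed observation $X^\varepsilon$, is a continuous, non-negative finite function on the set $U_{\varepsilon,\theta} := \varphi(\varepsilon,\theta)^{-1} (\Theta - \theta)$. Define $\Gamma_{\varepsilon,\theta,R} := U_{\varepsilon,\theta} \cap \{ u : R \le |u| \le R+1 \}$. We define the following sets of functions (compare Ibragimov and Has'minskii (1981), Ch. 1.5, p. 41).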

$G$ is the set of all functions $g_\varepsilon(\cdot)$ possessing the following properties:

(1) for fixed $\varepsilon$, $g_\varepsilon(\cdot)$ is a function on $[0,\infty)$ monotonically increasing to infinity;
(2) for any $N > 0$,

(2.3) $\lim_{R \to \infty,\ \varepsilon \to 0} R^N \exp\{ -g_\varepsilon(R) \} = 0.$

Let $K$ be a measurable subset of $\Theta$, then $H_K$ is the set of all functions $\eta_{\varepsilon,\theta}(\cdot)$ possessing the following properties:

(1) for fixed $\varepsilon$ and $\theta \in \Theta$, $\eta_{\varepsilon,\theta}(\cdot)$ is a function $U_{\varepsilon,\theta} \to (0,\infty)$;
(2) there exists a polynomial $\mathrm{pol}_K(R)$ in $R$ such that, for $\varepsilon$ small enough and $R$ sufficiently large, the following inequality holds:

(2.4) $\sup_{\theta \in K;\ u \in \Gamma_{\varepsilon,\theta,R}} \eta_{\varepsilon,\theta}(u)^{-1} \le \mathrm{pol}_K(R).$

Let, for each $\varepsilon$ and $\theta$, $f_{\varepsilon,\theta} : [0,\infty) \to \mathbb{R}$ be a monotonically non-decreasing continuous function and define the random functional

(2.5) $\zeta_{\varepsilon,\theta}(u) := f_{\varepsilon,\theta}(Z_{\varepsilon,\theta}(u)).$

The main result of this section is the following theorem, which gives sufficient conditions, in terms of the functionals $\zeta_{\varepsilon,\theta}(u)$, for a LLD to hold for $\hat\theta_\varepsilon$.

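Theorem 2.1.

a) Let the functionals $\zeta_{\varepsilon,\theta}(u)$ possess the following properties: given a measurable subset $K \subset \Theta \subset \mathbb{R}^k$, there correspond to it numbers $m$ and $\alpha$, where $m \ge \alpha > k$, functions $g_\varepsilon \in G$ and $\eta_{\varepsilon,\theta} \in H_K$ and a polynomial $\mathrm{pol}_K(R)$ in $R$ such that, for all $\varepsilon$ small and $R$ large enough, the following conditions hold:

M1: $\mathbb{E}_\theta^{(\varepsilon)} |\zeta_{\varepsilon,\theta}(u) - \zeta_{\varepsilon,\theta}(v)|^m \le |u-v|^{\alpha}\, \mathrm{pol}_K(R)$ for all $\theta \in K$ and $u$ and $v \in \Gamma_{\varepsilon,\theta,R}$;

M2: $\mathbb{P}_\theta^{(\varepsilon)} \{ \zeta_{\varepsilon,\theta}(u) - \zeta_{\varepsilon,\theta}(0) \ge -\eta_{\varepsilon,\theta}(u) \} \le \exp\{ -g_\varepsilon(R) \}$ for all $\theta \in K$ and $u \in \Gamma_{\varepsilon,\theta,R}$.

Then the following uniform LLD holds: there exist positive constants $B_0$ and $b_0$ such that, for all $\varepsilon$ small and $H$ large enough,

$\sup_{\theta \in K} \mathbb{P}_\theta^{(\varepsilon)} \{ |\varphi(\varepsilon,\theta)^{-1} (\hat\theta_\varepsilon - \theta)| \ge H \} \le B_0 \exp\{ -b_0\, g_\varepsilon(H) \}.$

The constant $b_0$ can be made arbitrarily close (from below) to $(\alpha-k)/(\alpha-k+mk)$ by choosing $B_0$ large enough.

b) The conclusion of part a) continues to hold if M1 is replaced by the following condition M1($\delta$):

M1($\delta$): M1 holds for all $\theta \in K$ and $u, v \in \Gamma_{\varepsilon,\theta,R}$ satisfying $|u - v| \le \delta$, where $\delta$ is a fixed positive constant,

provided one of the two following (weak) assumptions is satisfied:

M1': $\Theta$ is a convex set;

M1'': $\mathbb{E}_\theta^{(\varepsilon)} |\zeta_{\varepsilon,\theta}(u)|^m \le \mathrm{pol}_K(R)$ for all $\theta \in K$ and $u \in \Gamma_{\varepsilon,\theta,R}$.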

Remarks:

1. For applications in the method of Ibragimov and Has'minskii (1981), the set $K$ is chosen to be compact. For the above theorem this is not essential.

2. Theorem 1.5.1 of Ibragimov and Has'minskii (1981) follows from the above theorem by choosing $\zeta_{\varepsilon,\theta}(u) := Z_{\varepsilon,\theta}(u)^{1/m}$ and $\eta_{\varepsilon,\theta}(u) = \tfrac12$. In particular, condition (2) of 1.5.1 implies M2 by Markov's inequality and condition (1) implies M1.

3. Compare also the conditions of Vostrikova (1984), theorems 1 and 3.

4. If, for some $\theta$, $\varphi(\varepsilon,\theta) \to 0$ in operator norm as $\varepsilon \to 0$ then this $\theta$ is weakly consistently estimated by $\hat\theta_\varepsilon$.

The proof of theorem 2.1 proceeds via a number of propositions. The reader is advised to consult the proof of theorem 1.5.1 of Ibragimov and Has'minskii (1981), as our proof follows the same line. To avoid tedious repetitions, we assume at each stage of the proof that an initial choice of sufficiently small e and sufficiently large R (or H) has been made.

Proposition 2.2.

If there exist constants $B$ and $b$ such that

(2.6) $\sup_{\theta \in K} \mathbb{P}_\theta^{(\varepsilon)} \left\{ \sup_{u \in \Gamma_{\varepsilon,\theta,R}} \zeta_{\varepsilon,\theta}(u) \ge \zeta_{\varepsilon,\theta}(0) \right\} \le B \exp\{ -b\, g_\varepsilon(R) \}$

then
(i) the assertion of theorem 2.1 holds;
(ii) the constant $b_0$ there can be chosen arbitrarily close (from below) to $b$.

Proof. Ibragimov and Has'minskii (1981), Ch. 1.5, p. 42, prove a similar, but less precise, statement in equation (5.4). We apply lemma A2 (appendix) and estimate its right-hand side. For any small positive $\delta$ one has, using the monotonicity of $g$ and $f$,

(2.7) $\mathbb{P}_\theta^{(\varepsilon)} \left\{ \sup_{|u| \ge H} Z_{\varepsilon,\theta}(u) \ge 1 \right\} \le B \sum_{r=0}^{\infty} \exp\{ -b\, g_\varepsilon(r+H) \} \le B \exp\{ -b_0\, g_\varepsilon(H) \} \sum_{r=0}^{\infty} \exp\{ -b \delta\, g_\varepsilon(H+r) \},$

where $b_0 := b(1-\delta)$. The sum on the right-hand side is finite: relation (2.3) says that,

Proposition 2.3.

Condition M1($\delta$) together with either condition M1' or M1'' implies condition M1.

Proof.

Case 1: M1($\delta$) & M1' $\Rightarrow$ M1. From the convexity of $\Theta$ follows that any $u$ and $v$ in $\Gamma_{\varepsilon,\theta,R}$ may be connected by a path in $\Gamma_{\varepsilon,\theta,R}$ consisting of linear segments of length $\le \delta$, where the number of segments does not exceed $C \delta^{-1} |u-v|$ and $C$ is a fixed constant not depending on $\theta$ or $R$. To all the segments M1($\delta$) is applied; by Minkowski's inequality for integrals it then follows that

(2.8) $\left( \mathbb{E}\, |\zeta_{\varepsilon,\theta}(u) - \zeta_{\varepsilon,\theta}(v)|^m \right)^{1/m} \le C \delta^{-1} |u-v| \cdot \delta^{\alpha/m} \cdot \mathrm{pol}_K(R)^{1/m},$

which leads to M1 because, as $u$ and $v \in \Gamma_{\varepsilon,\theta,R}$, $|u-v| \le |u-v|^{\alpha/m} \cdot (2(R+1))^{1-\alpha/m}$, where the second factor is absorbed by the polynomial $\mathrm{pol}_K$.

Case 2: M1($\delta$) & M1'' $\Rightarrow$ M1. From M1'' follows, using Minkowski's inequality again, that the left-hand side of M1 is bounded by $2^m\, \mathrm{pol}_K(R)$, which, for any $u, v$ such that $|u-v| > \delta$, is bounded by $|u-v|^{\alpha} \cdot 2^m \delta^{-\alpha}\, \mathrm{pol}_K(R)$.

Proof of theorem 2.1.

By proposition 2.3 it suffices to prove only part a). By proposition 2.2 we need only prove relation (2.6). We subdivide the section $\{ u : R \le |u| \le R+1 \}$ into $N$ regions, each with diameter at most $h$. Such a subdivision can be accomplished such that the number of regions is bounded by

(2.9) $N \le \mathrm{Const}(k)\, (R+1)^{k-1} h^{-k},$

where $\mathrm{Const}(k)$ is a constant depending only on $k$. This subdivision induces a partition of $\Gamma_{\varepsilon,\theta,R}$ in at most $N$ sets; denote this partition by

(2.10) $\Gamma^{(1)}_{\varepsilon,\theta,R}, \ldots, \Gamma^{(N')}_{\varepsilon,\theta,R},$

where $N' \le N$, and choose in each member $\Gamma^{(i)}_{\varepsilon,\theta,R}$ a point $u_i$. Then

(2.11) $\mathbb{P}_\theta^{(\varepsilon)} \left\{ \sup_{u \in \Gamma_{\varepsilon,\theta,R}} \zeta_{\varepsilon,\theta}(u) \ge \zeta_{\varepsilon,\theta}(0) \right\} \le P_1 + P_2,$

where $P_1$ and $P_2$ are given by

(2.12) $P_1 := \sum_{i=1}^{N'} \mathbb{P}_\theta^{(\varepsilon)} \{ \zeta_{\varepsilon,\theta}(u_i) - \zeta_{\varepsilon,\theta}(0) \ge -\eta_{\varepsilon,\theta}(u_i) \},$

$P_2 := \mathbb{P}_\theta^{(\varepsilon)} \left\{ \max_{|u-v| \le h;\ u,v \in \Gamma_{\varepsilon,\theta,R}} |\zeta_{\varepsilon,\theta}(u) - \zeta_{\varepsilon,\theta}(v)| \ge \inf_{u \in \Gamma_{\varepsilon,\theta,R}} \eta_{\varepsilon,\theta}(u) \right\}.$

From condition M2 and the inequality (2.9) we have immediately

(2.13) $P_1 \le \mathrm{Const}(k)\, (R+1)^{k-1} h^{-k} \exp\{ -g_\varepsilon(R) \}.$

The second term $P_2$ is bounded as follows. Throughout the argument we let $\mathrm{pol}(R)$ denote any (not necessarily always the same) polynomial in $R$, the coefficients of which may depend on $\alpha$, $k$, $m$ and $\mathrm{pol}_K$, but not on $\varepsilon$, $R$, $\theta$, $u$ and $v$.

Now, let $u_0$ be any point in $\Gamma_{\varepsilon,\theta,R}$ and consider the random function $\zeta_{\varepsilon,\theta}(u) - \zeta_{\varepsilon,\theta}(u_0)$ on the closed set $\Gamma_{\varepsilon,\theta,R}$. Now apply to it lemma A3 in the appendix. By assumption, $\zeta$ is continuous in $u$ and hence it has a measurable and separable version (see Neveu (1970) for the notion of separability). Put

(2.14) $C(u) := \max\{ 1, |u - u_0|^{\alpha} \} \cdot \mathrm{pol}_K(R),$

then $C(u)$ is bounded by $\mathrm{pol}(R)$, as $u$ and $u_0 \in \Gamma_{\varepsilon,\theta,R}$. With this choice of $C(u)$, the conditions (1) and (2) of the lemma are fulfilled due to condition M1 of theorem 2.1. It then follows from this lemma and Markov's inequality that

(2.15) $P_2 \le h^{(\alpha-k)/m} \cdot \mathrm{pol}(R).$

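Putting the inequalities (2.11), (2.13) and (2.15) together we have

(2.16) $\mathbb{P}_\theta^{(\varepsilon)} \left\{ \sup_{u \in \Gamma_{\varepsilon,\theta,R}} \zeta_{\varepsilon,\theta}(u) \ge \zeta_{\varepsilon,\theta}(0) \right\} \le h^{-k} \cdot \mathrm{pol}(R) \exp\{ -g_\varepsilon(R) \} + h^{(\alpha-k)/m} \cdot \mathrm{pol}(R).$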

Now we put $h := \exp\{ C g_\varepsilon(R) \}$, where the constant $C$ should be chosen such that no one tail in (2.16) dominates the other. This leads to

(2.17) $C = -m/(\alpha-k+mk).$

The final result (2.6) follows from (2.16), (2.17) and the property (2.3) of $\exp g_\varepsilon$ to dominate any polynomial. The statement concerning $b_0$ is now obvious from the second part of proposition 2.2. We remark that Ibragimov and Has'minskii (1981) use, instead of (2.9), the inequality $N \le \mathrm{Const} \cdot R^{k-1}/h$, which we were unable to verify. Of course, this would lead to another bound for $b_0$ in theorem 2.1.
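The value of $C$ in (2.17) and the limiting constant $(\alpha-k)/(\alpha-k+mk)$ of theorem 2.1 can be traced by balancing the two exponents in (2.16). The following is a short worked version of that balance, with the polynomial factors left aside; it only spells out a step the text calls obvious.

```latex
% Substituting h = \exp\{C g_\varepsilon(R)\} into the two terms of (2.16),
% ignoring the factors pol(R):
\begin{align*}
h^{-k} \exp\{-g_\varepsilon(R)\} &= \exp\{-(1 + kC)\, g_\varepsilon(R)\}, \\
h^{(\alpha-k)/m} &= \exp\Bigl\{ C\,\tfrac{\alpha-k}{m}\, g_\varepsilon(R) \Bigr\}.
\end{align*}
% Neither term dominates when the exponents agree:
\begin{align*}
-(1 + kC) = C\,\frac{\alpha-k}{m}
\;\Longleftrightarrow\;
C\,\frac{\alpha-k+mk}{m} = -1
\;\Longleftrightarrow\;
C = -\frac{m}{\alpha-k+mk}.
\end{align*}
% The common exponent is then
\begin{align*}
C\,\frac{\alpha-k}{m}\, g_\varepsilon(R) = -\frac{\alpha-k}{\alpha-k+mk}\, g_\varepsilon(R),
\end{align*}
% so (2.6) holds with any b < (\alpha-k)/(\alpha-k+mk), the polynomial
% factors being absorbed by property (2.3).
```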


3. Nonlinear least-squares regression with independent errors

Let $\Theta$ be a Borel subset of $\mathbb{R}^k$ and let $f_t(\theta)$ be a continuous deterministic function from $\Theta$ to $\mathbb{R}$ for each $t \in \mathbb{N}$; all our results can easily be generalised to the case of a deterministic triangular design array $(t_{1n}, t_{2n}, \ldots, t_{nn};\ n \in \mathbb{N})$. We consider the nonlinear regression model

(3.1) $X_t = f_t(\theta) + e_t, \qquad t = 1, 2, \ldots, n,$

where $X^n := X_1, X_2, \ldots, X_n$ are the observed random variables and $\{ e_t, t \in \mathbb{N} \}$ is a sequence of real independent random variables with expectation zero.

The least-squares estimator $\hat\theta_n$ (which we assume to exist; see section 2 and lemma A1) maximises the functional

(3.2) $C_n(X^n, \theta) := \exp\left\{ -\tfrac12 \sum_{t \le n} (X_t - f_t(\theta))^2 \right\}.$

Given a sequence of non-singular matrix norming factors $\varphi_n(\theta)$ we define the ratio

(3.3) $Z_{n,\theta}(u) := C_n(X^n, \theta + \varphi_n(\theta) u) \,/\, C_n(X^n, \theta) = \exp\left\{ \sum_{t \le n} d_{tn\theta}(u)\, e_t - \tfrac12 \sum_{t \le n} d_{tn\theta}(u)^2 \right\},$

where

(3.4) $d_{tn\theta}(u) := f_t(\theta + \varphi_n(\theta) u) - f_t(\theta).$
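The closed form of the ratio in (3.3), namely $\log Z_{n,\theta}(u) = \sum_{t \le n} d_{tn\theta}(u) e_t - \tfrac12 \sum_{t \le n} d_{tn\theta}(u)^2$, follows from $X_t - f_t(\theta + \varphi_n u) = e_t - d_{tn\theta}(u)$. The Python sketch below checks this identity numerically for a one-dimensional toy model. The regression function `f`, the scalar norming `phi` and all numerical values are assumptions made purely for illustration.

```python
import math
import random


def f(t, theta):
    """Hypothetical regression function f_t(theta), chosen only for illustration."""
    return math.exp(-theta * t)


def log_C(xs, theta):
    """Logarithm of the M-functional (3.2): -1/2 * sum_t (X_t - f_t(theta))^2."""
    return -0.5 * sum((x - f(t, theta)) ** 2 for t, x in enumerate(xs, start=1))


def log_Z_direct(xs, theta, phi, u):
    """log Z_{n,theta}(u) computed directly as a difference of log M-functionals."""
    return log_C(xs, theta + phi * u) - log_C(xs, theta)


def log_Z_closed(xs, theta, phi, u):
    """log Z_{n,theta}(u) from the closed form with d_t as in (3.4)."""
    ts = range(1, len(xs) + 1)
    d = [f(t, theta + phi * u) - f(t, theta) for t in ts]
    e = [x - f(t, theta) for t, x in zip(ts, xs)]      # errors at the true theta
    return sum(dt * et for dt, et in zip(d, e)) - 0.5 * sum(dt * dt for dt in d)


def demo(seed=0, n=20, theta=0.3, phi=0.1, u=1.5):
    """Simulate data from (3.1) with standard normal errors and return both values."""
    rng = random.Random(seed)
    xs = [f(t, theta) + rng.gauss(0.0, 1.0) for t in range(1, n + 1)]
    return log_Z_direct(xs, theta, phi, u), log_Z_closed(xs, theta, phi, u)
```

The two values agree up to rounding for any error distribution, since the identity is purely algebraic; the Gaussian errors in `demo` are just a convenient way to generate data.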

Because of the many practical applications of the model (3.1), the various properties of the least-squares estimator, such as strong or weak consistency, asymptotic normality and large deviation behaviour, have been studied extensively. See e.g. Van de Geer (1986), Ivanov (1976), Lauter (1985), Prakasa Rao (1984) and Wu (1981). All these authors restrict themselves to the case that the errors $e_t$ are independent and identically distributed.

We shall study the large deviation probability of the least squares estimator in the case of independent errors. To this end, we stipulate the following assumptions which allow us to apply theorem 2.1.

Assume that, for some Borel subset $K$ of $\Theta$, there exist functions $g_n(R) \in G$, positive constants $\gamma > 0$, $\Lambda_1 \in (0,\infty]$, $\delta \in (0,\tfrac12)$, $\kappa > 0$ and $\rho \in (0,1]$, and a polynomial $\mathrm{pol}(R)$ such that, for all $n$ and $R$ large enough, the following inequalities hold:

N1: for all $t \in \mathbb{N}$ and $|\lambda| \le \Lambda_1$ (note that $\Lambda_1 = \infty$ is allowed)

$\mathbb{E} \exp\{ \lambda e_t \} \le \exp\{ \tfrac12 \gamma \lambda^2 \};$

N2: for all $\theta \in K$ and $u, v \in \Gamma_{n,\theta,R}$, where $|u-v| \le \kappa$, one has

$\sum_{t \le n} [ f_t(\theta + \varphi_n(\theta) u) - f_t(\theta + \varphi_n(\theta) v) ]^2 \le |u - v|^{2\rho} \cdot \mathrm{pol}(R)$

and

$\sum_{t \le n} [ f_t(\theta + \varphi_n(\theta) u) - f_t(\theta) ]^2 \le \mathrm{pol}(R);$

N3: for all $\theta \in K$ and $u \in \Gamma_{n,\theta,R}$ one has

$\sum_{t \le n} [ f_t(\theta + \varphi_n(\theta) u) - f_t(\theta) ]^2 \ge A_n(\theta,u)\, g_n(R),$

where

$A_n(\theta,u) := \max \{ 2\gamma\delta^{-2},\ 2\Lambda_1^{-1} \delta^{-1} \max\nolimits_n(\theta,u) \}$

and

$\max\nolimits_n(\theta,u) := \max \{ |f_t(\theta + \varphi_n(\theta) u) - f_t(\theta)| ;\ t = 1, 2, \ldots, n \}.$

The following theorem seems to us an instructive example of the application of the very general theorem 2.1.

Theorem 3.1.

Let, for some $K \subset \Theta$ and suitably chosen normings $\varphi_n(\theta)$, assumptions N1 to N3 be fulfilled. Then the following LLD holds:

there exist constants $B_0$ and $b_0$ such that, for all $n$ and $H$ large enough,

$\sup_{\theta \in K} P_\theta^{(n)} \{\, |\varphi_n(\theta)^{-1} (\hat\theta_n - \theta)| \ge H \,\} \le B_0 \exp( - b_0\, g_n(H) ).$

Moreover, for any $\beta > 0$ we can choose $B_0$ such that

(3.5) $b_0 \ge p\,(p+k)^{-1} - \beta.$

Before proving this theorem, let us discuss the significance of conditions N1 to N3 and the relation they bear to known results concerning the behaviour of the least-squares estimator.

Condition N1 prescribes that the tails of the $\varepsilon_t$ should be uniformly "thin". The uniformity is evident in the i.i.d. case. If the $\varepsilon_t$ are e.g. Gaussian or bounded then N1 holds with $\Lambda_1 = \infty$; in that case $A_n$ in N3 is constant and $|f_t(\theta+\varphi_n(\theta)u) - f_t(\theta)|$ may increase unboundedly in $t$.

Condition N2 is a Hölder type continuity condition on the parametrisation $\theta \mapsto f(\theta)$. It is directly related to condition M1 of theorem 2.1. This assures that the regression functions do not behave too wildly in $\theta$, so that uniform estimates can be obtained. Compare e.g. lemma 3 of Jennrich (1969), condition III of Ivanov (1976), assumption A(ii) of Wu (1981) and condition (2.5) of Prakasa Rao (1984), which are of a similar nature. It is easy to construct an example where the regression functions $f_t(\theta)$ are not everywhere continuous in $\theta$ but still a LLD holds. Therefore we mention the approach of Van de Geer (1986) to impose entropy instead of continuity conditions; compare also our inequality (2.9) and lemma A of Wu (1981).

Condition N3 prescribes the rate of asymptotic separation. Asymptotic separation (the regression functions keep enough apart to be statistically distinguishable) is a necessary condition for consistent estimation; see Wu (1981), theorem 1. It may be interesting to note that asymptotic separation may be viewed as a form of continuity of the inverse of the parametrisation, i.e. of the map $f(\theta) \mapsto \theta$: if $\theta$ and $\theta' := \theta + \varphi_n(\theta)u$ are "apart", i.e. if $|\varphi_n(\theta)^{-1}(\theta - \theta')| \ge R$, then also $f(\theta)$ and $f(\theta')$ are "apart" in the sense of condition N3. Logically, this is equivalent to a form of continuity. In Jennrich (1969), the separation condition is that of existence of the tail cross products (see also his lemma 3). In Wu (1981), this seems to be his (complicated) condition A(i). In the same line lie the conditions of Ivanov (1976) (condition III), Prakasa Rao (1984) (condition (2.6)) and Läuter (1985) (condition (12) to theorem 1).

Proof of theorem 3.1.

The proof consists of checking conditions M1 and M2 to theorem 2.1 with $J(Z) := \log Z$. We assume that an initial choice of sufficiently large $n$ and $R$ has been made. Let, throughout, $u,v \in \Gamma_{n,\theta,R}$, $|u-v| \le \kappa$ and $\theta \in K$.

First we check condition M1.

Condition N2 may be expressed in the $d_{tn\theta}(u)$, as defined in equation (3.4):

(3.5) $\sum_{t \le n} | d_{tn\theta}(u) - d_{tn\theta}(v) |^2 \le |u-v|^{2p}\, \mathrm{pol}(R)$

and

(3.6) $\sum_{t \le n} d_{tn\theta}(u)^2 \le \mathrm{pol}(R).$


where the factor AT ^ is absorbed by the polynomial pol(R). From (3.3) we have, choosing f « (u) := log ZR Q (U),

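(3.7) $\zeta_{n,\theta}(u) - \zeta_{n,\theta}(v) = \log Z_{n,\theta}(u) - \log Z_{n,\theta}(v) = \sum_{t \le n} ( A_t\, \varepsilon_t - B_t ),$

where

(3.8) $A_t := d_{tn\theta}(u) - d_{tn\theta}(v), \qquad 2B_t := d_{tn\theta}(u)^2 - d_{tn\theta}(v)^2.$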

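Note that, by lemma 5 in Ch. III.4 of Petrov (1975), condition N1 implies the existence and boundedness, uniform in $t$, of moments of all orders $m$ of $\varepsilon_t$. Hence, using the independence of the $\varepsilon_t$, condition N1 and $E\varepsilon_t = 0$, we find, for all even $m \ge 2$ and with $\zeta_{n,\theta} := \log Z_{n,\theta}$,

(3.9) $E\, | \zeta_{n,\theta}(u) - \zeta_{n,\theta}(v) |^m \le \mathrm{Const}(m) \cdot \sum^{*}_{l_0, l_1, \ldots, l_s} \Big| \sum_{t \le n} B_t \Big|^{l_0} \prod_{i=1}^{s} \sum_{t \le n} A_t^{\,l_i},$

where $\sum^{*}$ denotes summation over all even $l_1, l_2, \ldots, l_s \ge 2$ and even $l_0 \ge 0$ (where $s \ge 0$) having sum $m$. We have the following estimates: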

(3.10) $\Big| \sum_{t \le n} 2B_t \Big| \le \sum_{t \le n} | d_{tn\theta}(u) - d_{tn\theta}(v) | \cdot | d_{tn\theta}(u) + d_{tn\theta}(v) | \le \Big( \sum_{t \le n} | d_{tn\theta}(u) - d_{tn\theta}(v) |^2 \Big)^{1/2} \Big( \sum_{t \le n} | d_{tn\theta}(u) + d_{tn\theta}(v) |^2 \Big)^{1/2} \le |u-v|^{p}\, \mathrm{pol}(R),$

where we have used Cauchy-Schwarz, the inequality $(a+b)^2 \le 2a^2 + 2b^2$, the fact that $u,v \in \Gamma_{n,\theta,R}$ by assumption and inequalities (3.5) and (3.6).

We also have, for $l$ even and $\ge 2$, using (3.5) again,

(3.11) $0 \le \sum_{t \le n} A_t^{\,l} \le \Big( \sum_{t \le n} A_t^2 \Big)^{l/2} \le |u-v|^{pl}\, \mathrm{pol}(R).$

Consequently, (3.9) becomes, using (3.10) and (3.11),

(3.12) $E\, | \zeta_{n,\theta}(u) - \zeta_{n,\theta}(v) |^m \le |u-v|^{pm}\, \mathrm{pol}(R).$

If we choose $m$ even and larger than $k/p$, (3.12) fulfils condition M1 of theorem 2.1, with the constant $\alpha = pm$.

Now we check condition M2. We shall write, for simplicity of notation, $d_t := d_{tn\theta}(u)$ and $\max |d_t| := \max \{\, |d_{tn\theta}(u)| \,;\ t = 1,2,\ldots,n \,\}$. Choose

(3.13) $\eta_{n,\theta}(u) := (\tfrac12 - \delta) \sum_{t \le n} d_{tn\theta}(u)^2.$

By condition N3, one has the inequality

(3.14) $\sum_{t \le n} d_{tn\theta}(u)^2 \ge 8\gamma\, g_n(R),$

which shows that rj fl (u) 6 PL because, as follows from equation (2.4), g_{R)"' ^ 1 for n and R sufficiently large. By (3.7), (3.8) and (3.13) and lemma A 4 in the appendix

(3.15) $P_\theta^{(n)} \{\, \zeta_{n,\theta}(u) - \zeta_{n,\theta}(0) \ge -\eta_{n,\theta}(u) \,\} = P_\theta^{(n)} \Big\{ \sum_{t \le n} d_t\, \varepsilon_t \ge \delta \sum_{t \le n} d_t^2 \Big\} \le \exp\Big( - \sum_{t \le n} d_t^2 \,/\, A_n \Big),$

where $A_n(\theta,u)$ is defined in condition N3.

It remains to apply the inequality of N3 to (3.15), which yields

(3.16) $P_\theta^{(n)} \{\, \zeta_{n,\theta}(u) - \zeta_{n,\theta}(0) \ge -\eta_{n,\theta}(u) \,\} \le \exp( - g_n(R) ),$

thus fulfilling condition M2 of theorem 2.1.

(36)

is easily accomplished by choosing $\alpha = pm$ and letting $m \to \infty$. □

We have formulated conditions N2 and N3 in the spirit of Ibragimov and Has'minskii (1981) and our theorem 2.1. This has allowed a direct application of this theorem. From theorem 3.1 we now deduce a slightly weaker theorem of friendlier appearance, which seems to suffice for many applications. To this end, we make the following observations.

1. Problems might occur if, for some $\theta$ and $u$, $A_n(\theta,u)$ would increase to infinity in $n$. For it follows from N2 and N3 that $g_n(R) \le \mathrm{pol}(R)/A_n(\theta,u)$; if $A_n \to \infty$ then condition (2.3) on the set $G$ would be violated. Fortunately, one also has

$\max_n(\theta,u) \le \Big[ \sum_{t \le n} [\, f_t(\theta + \varphi_n(\theta)u) - f_t(\theta) \,]^2 \Big]^{1/2} \le (\, \mathrm{pol}(R) \,)^{1/2}$

for all $\theta \in K$ and $u \in \Gamma_{n,\theta,R}$ by N2 and N3, so that $A_n$ is bounded in $n$.

2. One might argue that theorem 3.1 is of little value in applications because, in practice, one never knows the exact value of $\Lambda_1$. Indeed, when analysing real data, we may as well set $\Lambda_1 = \infty$; the meaning of condition N1 is of course that it gives the theorem a certain robustness: nothing terrible happens when $\Lambda_1 < \infty$.

3. In practice, the constant $p$ will usually be equal to 1 (a counterexample is provided by $f_t(\theta) = \theta^p$, $0 < p < 1$ and $\Theta = [-1,1]$; the reparametrisation $\theta^p =: \tau$ makes $p = 1$ again).

4. The polynomial pol(R) seems to be unimportant in applications; however, it saved us the two extra constants $m_1$ and $M_1$ used in theorem 1.5.1 of Ibragimov and Has'minskii (1981).

5. Finally, a natural choice for the function $g_n(R)$ seems to be a quadratic function and for $K$ we might, out of the context of Ibragimov and Has'minskii (1981), as well choose the set $\Theta$. To obtain simple conditions, we restrict ourselves to the case that the $\varphi_n$ do not depend on $\theta$.


These considerations have motivated the following theorem:

Theorem 3.2.

Let, for a suitable sequence of normalising matrices $\varphi_n$, the following conditions be fulfilled:

N1': For some $\gamma$, condition N1 holds with $\Lambda_1 = \infty$.

N4: Let there exist positive constants $D_1$ and $D_2$ such that, for all $\theta, \theta' \in \Theta$ and $n$ large enough,

$D_1\, |\varphi_n^{-1}(\theta - \theta')|^2 \le \sum_{t \le n} ( f_t(\theta) - f_t(\theta') )^2 \le D_2\, |\varphi_n^{-1}(\theta - \theta')|^2.$

Then the following LLD holds for the LS estimator $\hat\theta_n$:

there exist constants $B_0$ and $b$ such that, for all $n$ and $H$ large enough,

$\sup_{\theta \in \Theta} P_\theta^{(n)} \{\, |\varphi_n^{-1} (\hat\theta_n - \theta)| \ge H \,\} \le B_0 \exp( -b H^2 ).$

Moreover, for any $\beta > 0$ we can choose $B_0$ such that $b \ge D_1 / (8\gamma(1+k)) - \beta$. □

Proof.

To apply theorem 3.1, let us verify its conditions. N1 holds by assumption; by N4, N2 holds with $p = 1$ and pol(R) $= D_2$. By N4 and N1', N3 holds for any $\delta \in (0,\tfrac12)$, with the choice $A_n := 2\gamma\delta^{-2}$ and $g_n(R) := (D_1 / 2\gamma\delta^{-2})\, R^2$. Now apply theorem 3.1 and let $\delta \to \tfrac12$. □

Theorem 3.2 extends a result of Ivanov (1976), namely his LD lemma 1. It generalises the result of Prakasa Rao (1984). His theorem follows immediately from ours. In section 4 we give an example to show that our generalization is not void.


4. Examples and concluding remarks.

In this section, we present some examples of the application of theorem 3.2. Recall that two sequences of positive numbers $(a_n)$ and $(b_n)$ are called (asymptotically) equivalent (write $a_n \asymp b_n$) if there exist positive constants $C_1$ and $C_2$ such that $C_1 b_n \le a_n \le C_2 b_n$ for all $n$ (large enough). In the same manner, we call a parametrised family of positive sequences $\{ (a_n(\theta)) \,;\ \theta \in \Theta \}$ (asymptotically) uniformly equivalent to a positive sequence $(b_n)$ if there exist positive constants $C_1$ and $C_2$ such that, for all $\theta \in \Theta$ and all $n$ (large enough), the inequality $C_1 b_n \le a_n(\theta) \le C_2 b_n$ holds. We shall write $a_n(\theta) \asymp b_n$ (uniformly in $\theta$). These definitions can, in an obvious manner, be generalised to sequences of positive definite symmetric matrices $(A_n \,;\ n = 1,2,\ldots)$. We say that $A \le B$ if the difference $B - A$ is a positive semidefinite matrix.

Examples 1 and 2 are provided by the Michaelis-Menten model, which is used to describe the relation between the velocity $v$ of an enzyme reaction and the concentration $c$ of the substrate. The parameters are $M$, the maximal reaction velocity, and $K$, the chemical affinity. The parameter set $\Theta$ of the $(K,M)$ is a bounded open set in the positive quadrant. The model is

(4.1) $v(c;K,M) = \dfrac{M c}{K + c}.$

We shall consider fixed designs $c^n$ given by concentrations $c_1, c_2, \ldots, c_n$, where $c_n \to 0$ as $n \to \infty$. At each concentration $c_t$ an independent measurement of the velocity is taken, giving the data $X_1, X_2, \ldots, X_n$:

(4.2) $X_t = v_t(K,M) + \varepsilon_t = \dfrac{M c_t}{K + c_t} + \varepsilon_t,$

where the $\varepsilon_t$ are independent centered errors satisfying condition N1' of theorem 3.2 for some $\gamma$.
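As an illustration of least-squares estimation in model (4.2), the sketch below simulates data on a fixed design with $c_t \to 0$ and computes the LS estimate of $(K,M)$. It is only a sketch under stated assumptions: the Gaussian errors, the design $c_t = t^{-1/4}$, the parameter values, the seed and the grid are hypothetical, and the profiling device (for fixed $K$ the model is linear in $M$) is a convenience of this example, not a method taken from the text.

```python
import numpy as np

def mm_velocity(c, K, M):
    """Michaelis-Menten velocity v(c; K, M) = M c / (K + c), as in (4.1)."""
    return M * c / (K + c)

def profile_least_squares(c, x, K_grid):
    """Least squares over (K, M): for fixed K the model is linear in M,
    so M is solved in closed form and K is searched on a grid."""
    best = None
    for K in K_grid:
        g = c / (K + c)                       # regressor multiplying M
        M = float(g @ x) / float(g @ g)       # closed-form LS solution for M
        rss = float(np.sum((x - M * g) ** 2)) # residual sum of squares
        if best is None or rss < best[2]:
            best = (float(K), M, rss)
    return best

rng = np.random.default_rng(0)                # hypothetical seed
n = 2000
t = np.arange(1, n + 1, dtype=float)
c = t ** -0.25                                # fixed design with c_t -> 0
K_true, M_true, sigma = 1.0, 2.0, 0.01        # hypothetical values
x = mm_velocity(c, K_true, M_true) + sigma * rng.standard_normal(n)

K_hat, M_hat, rss = profile_least_squares(c, x, np.linspace(0.2, 3.0, 561))
print(K_hat, M_hat)
```

With a design concentrating near $c = 0$ the two parameters are strongly correlated, which is why a small error standard deviation is used here.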

Example 1.

Consider the following simple model, which is obtained from (4.1) by assuming that $K/M$ is known (put $K/M = 1$, without loss of generality) and putting $c_t = t^{-1/4}$. This model can be written as

(4.3) $f_t(K) = \dfrac{1}{K^{-1} + t^{1/4}}, \qquad t = 1,2,3,\ldots$

Note that, for this model, the conditions of Jennrich (1969), Ivanov (1976) and, in particular, Prakasa Rao (1984), do not hold.

One has

(4.4) $\sum_{t \le n} ( f_t(K) - f_t(K') )^2 = | K^{-1} - K'^{-1} |^2\, C_n(K,K'),$

where

(4.5) $C_n(K,K') := \sum_{t \le n} 1 \,/\, [\, (K^{-1} + t^{1/4}) \cdot (K'^{-1} + t^{1/4}) \,]^2$

and it is easily shown that the sequence $C_n(K,K') \asymp \log n$, uniformly in $K, K'$. It follows in particular that, for $n$ large enough (as usual),
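The identity (4.4) and the logarithmic growth of $C_n(K,K')$ can be checked numerically. The sketch below assumes the regression function $f_t(K) = 1/(K^{-1} + t^{1/4})$ obtained from (4.1) with $K/M = 1$ and $c_t = t^{-1/4}$; the parameter values and sample sizes are hypothetical.

```python
import math
import numpy as np

def f(t, K):
    """Regression function of example 1: f_t(K) = 1 / (1/K + t^(1/4))."""
    return 1.0 / (1.0 / K + t ** 0.25)

def C_n(n, K, Kp):
    """C_n(K, K') as in (4.5)."""
    t = np.arange(1, n + 1, dtype=float)
    return float(np.sum(1.0 / ((1.0 / K + t ** 0.25) * (1.0 / Kp + t ** 0.25)) ** 2))

def sum_sq_diff(n, K, Kp):
    """Left-hand side of (4.4)."""
    t = np.arange(1, n + 1, dtype=float)
    return float(np.sum((f(t, K) - f(t, Kp)) ** 2))

pairs = [(0.5, 0.5), (0.5, 2.0), (1.0, 1.5), (2.0, 2.0)]   # hypothetical (K, K')
sizes = [10 ** 3, 10 ** 4, 10 ** 5]

# ratios C_n / log n for each parameter pair and sample size
ratios = {(K, Kp, n): C_n(n, K, Kp) / math.log(n) for K, Kp in pairs for n in sizes}
for key, r in ratios.items():
    print(key, round(r, 3))
```

Each term of $C_n$ is bounded by $1/t$ and behaves like $1/t$ for large $t$, so the ratios stay between fixed positive bounds over this parameter range, although they approach their limit only slowly.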

(4.6) $\sum_{t \le n} ( f_t(K) - f_t(K') )^2 \ge D_1\, | K - K' |^2 \cdot \log n,$

where $D_1$ can be chosen arbitrarily close (from below) to $1/(\sup K)^4$. Now we can apply theorem 3.2, which yields

(4.7) $\sup_K P_K^{(n)} \{\, (\log n)^{1/2} \cdot | \hat K_n - K | \ge H \,\} \le B_0 \exp( -b H^2 ),$

where $b$ can be chosen arbitrarily close (from below) to $1/(16\gamma(\sup K)^4)$. We remark that, in the case of i.i.d. disturbances $\varepsilon_t$, the strong consistency of the LS-estimator for this model can be demonstrated by theorem 3 of Wu (1981). By theorem 5 of the same author, it is also asymptotically normal:

(40)

s

n

(K

1

,K

2>

K

3

,K

4

) := j ^ l ^ / j ^ V

; B

i

(K

l

,K

2

)\(K

V

K

2

) y a

s

(K

3

,K

4

)b

s

(K

3

,K

4

)

(4.17)

t^n - - - ssn

for various values of the parameters K.. Hence it suffices that these sequences be

uniformly equivalent.

Using (4.13) and (4.16) it follows that

(4.18) $\sum_{t \le n} a_t^2 = \sum_{t \le n} c_t^2 \cdot (1 + o_n(1))$

and the like for $\sum_{t \le n} b_t^2$ and $\sum_{t \le n} a_t b_t$. This leads to

(4.19)

sn( Ki'K2'K3>K4> = ( |n ct3 / K3K4 j2- (rn{ 1 + 0n{ l ) )- (K3K4/ KiK2) ( 1 + 0n( 1) ) '

and together with (4.14) uniform equivalence follows: fixing arbitrary values of $K$ and $K'$, say $K_0$ and $K_0'$, we have, uniformly,

(4.20) $B_n(K,K') \asymp B_n(K_0,K_0'),$

whence condition N4 holds for some choice of constants $D_1$ and $D_2$ (which can be obtained from lemma A5(ii)) and $\varphi_n := B_n(K_0,K_0')^{-1/2}$. Application of theorem 3.2 yields

(4.21) $\sup P_{K,M}^{(n)} \{\, | \varphi_n^{-1}\, \mathrm{col}\{ \hat L_n - L,\ \hat M_n - M \} | \ge H \,\} \le B_0 \exp( -b H^2 ),$

where $b$ can be chosen arbitrarily close (from below) to $D_1/24\gamma$.

A similar inequality can be derived for the pair of estimators $(\hat K, \hat M)$ but, as in example 1, the bounds for $b$ are of poorer quality.

Example 3.

Consider the linear model

(4.22) $X_t = \theta + \varepsilon_t, \qquad t = 1,2,\ldots,n.$

(41)

(4.23) $P_\theta^{(n)} \{\, n^{1/2}\, | \hat\theta_n - \theta | \ge H \,\} \le (2/\pi)^{1/2} \exp( -b H^2/2 ).$

For $b$ we can take any value $\le 1$. Theorem 3.2 allows us to take any $b < 1/16$, which is a factor 16 too pessimistic. No other estimator can improve the value $b = 1$; see Kester (1985) chapter II, example 1.1.
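A bound of the form (4.23) can be checked directly in the simplest setting. The sketch below assumes, as a labelled assumption that is not recoverable from the surrounding text, that the errors are i.i.d. standard Gaussian and that the LS estimator is the sample mean, so that $n^{1/2}(\hat\theta_n - \theta)$ is exactly standard normal. It compares the exact two-sided tail with the right-hand side of (4.23) for $b = 1$ and for $b = 1/16$.

```python
import math

def gaussian_two_sided_tail(H):
    """P(|Z| >= H) for Z ~ N(0, 1), via the complementary error function."""
    return math.erfc(H / math.sqrt(2.0))

def bound_423(H, b=1.0):
    """Right-hand side of (4.23): (2/pi)^(1/2) * exp(-b H^2 / 2)."""
    return math.sqrt(2.0 / math.pi) * math.exp(-b * H * H / 2.0)

# Exact tail, the bound with b = 1 and the weaker bound with b = 1/16,
# for a few hypothetical values of H (the bound is only claimed for H large).
rows = [(H, gaussian_two_sided_tail(H), bound_423(H, 1.0), bound_423(H, 1.0 / 16.0))
        for H in (1.0, 2.0, 3.0, 4.0, 5.0, 6.0)]
for H, tail, sharp, weak in rows:
    print(H, tail, sharp, weak)
```

For $H \ge 1$ the standard Mills-ratio inequality $P(|Z| \ge H) \le (2/\pi)^{1/2} H^{-1} \exp(-H^2/2)$ gives the bound with $b = 1$; the bound with $b = 1/16$ is valid but much looser.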

In section 3, we applied the very general theorem 2.1 to the problem of least-squares estimation. It would be nice to try our method on other M-estimators, e.g. the Huber estimators in nonlinear regression, i.e. estimators maximizing a functional of the form

(4.24) $C(X^n,\theta) := - \sum_{t \le n} \rho( X_t - f_t(\theta) )$

and to compare our bound for $b$ with the exact rate of convergence obtained by Kester (1985) in the case that the $\varepsilon_t$ are i.i.d. and $\theta$ is a location parameter, i.e. $f_t(\theta) = \theta$. For details see Kester (1985) chapter II.4b, theorem 4.2.

However, we wish to point out that there are also situations where our theorems 2.1 and 3.1 do not apply. For instance, consider the power model $f_t(\theta) = t^{-\theta}$, $\theta \in \Theta := [0,a]$, where $a \le \tfrac12$. This model is also discussed by Wu (1981), who shows that the LS estimator is strongly consistent.

Our theorems do not apply because the rate of growth (in $n$) of $\sum_{t \le n} ( f_t(\theta) - f_t(\theta') )^2$ depends on $\theta$ and $\theta'$, whereas our theory assumes a 'uniform' growth rate in $n$. Hence a suitable norming $\varphi_n(\theta)$ does not exist for this example (Has'minskii (1986), personal communication). An extension of theorem 2.1 to a theorem with more flexible normings would meet this difficulty and would also contribute to Ibragimov and Has'minskii's theory.


Appendix.

In this appendix, we list the lemmata we used in the paper.

Lemma A 1 .

Let $(\mathcal{X},\mathcal{U})$ be a measurable space and let $\{ P_\theta,\ \theta \in \Theta \}$ be a family of probability measures on $(\mathcal{X},\mathcal{U})$, where $\Theta$ is a Borel subset of $\mathbb{R}^k$. Let $C$ be a real function from $\mathcal{X} \times \Theta$ to $[0,\infty)$ which is, for all $X \in \mathcal{X}$, a positive continuous function of $\theta$ and, for each $\theta \in \Theta$, a $(\mathcal{U},\mathcal{B})$-measurable function of $X$. Finally, let $\Theta^\circ$ be a subset of $\Theta$ which has a countable subset $D$ which is dense in $\Theta^\circ$.

Then the following assertions hold:

(i) the random variable $S(X) := \sup_{\theta \in \Theta^\circ} C(X,\theta)$ is $\mathcal{U}$-measurable;

(ii) if $\Theta$ is compact then, for any $X$, the equation in $t$

(A.1) $\sup_{\theta \in \Theta} C(X,\theta) = C(X,t)$

has a solution (which we denote $\hat\theta(X)$), which is $\mathcal{U}$-measurable;

(iii) if, for arbitrary (non-compact) $\Theta$ the existence of a solution to (A.1) is assumed, then there exists a measurable version $\hat\theta(X)$ of this solution.

Proof.

(i) See Schmetterer (1974), Ch. V.3, lemma 3.2, page 307. We observe that any subset $\Theta^\circ$ of $\mathbb{R}^k$ has a countable subset $D$ which is dense in the closure of $\Theta^\circ$.

(ii) See Schmetterer (1974), Ch. V.3, lemma 3.3, page 307f. or Jennrich (1969), lemma 2.

(iii) The set $\Theta$ is Borel, whence it is possible to approximate it by an increasing sequence of compact sets $K_i \uparrow \Theta$. Let $\hat\Theta(X)$ be the set of the $\theta$ solving (A.1).

(43)

it is also measurable, which can be seen as follows.

Let $D$ be a countable dense subset of $\Theta$. Then the event $\{ i^* > n \}$ can be written as

$\bigcap_{i=1}^{n} \bigcup_{k=1}^{\infty} \bigcup_{\tau \in D} \{\, X :\ \sup_{\theta \in K_i} C(X,\theta) \le C(X,\tau) - k^{-1} \,\}$

which is clearly measurable by part (i) of this lemma.

Then

(A.2) $\sup_{\theta \in \Theta} C(X,\theta) = \sup_{\theta \in K_{i^*}} C(X,\theta)$

and also, because the $K_i$ are compact, the equation in $t$

(A.3) $\sup_{\theta \in K_i} C(X,\theta) = C(X,t)$

has a measurable solution $t = \hat\theta_i(X)$ for each $i$, as is seen by application of part (ii) of this lemma. Combining equations (A.2) and (A.3) it follows that $\hat\theta_{i^*}(X)$ provides a solution to (A.1), which is measurable because $i^*$ is measurable. □

Lemma A2.

Let the quantities $C$, $Z$, $\Theta$ etc. be defined as in section 2. Then the following inequality holds:

(A.4) $P_\theta^{(n)} \{\, | \varphi_n(\theta)^{-1} (\hat\theta_n - \theta) | \ge H \,\} \le P_\theta^{(n)} \{\, \sup_{u \in U_{n,\theta},\ |u| \ge H} Z_{n,\theta}(u) \ge 1 \,\}.$

Proof. See Ibragimov and Has'minskii (1981), Ch. 1.5 and Wu (1981), lemma 1. □

Lemma A3.

Let $\zeta(u)$ be a real-valued random function defined on a closed subset $F$ of the Euclidean space $\mathbb{R}^k$, which is measurable and separable. Let the following condition be fulfilled: there exist numbers $m \ge \alpha > k$ and a function $C: \mathbb{R}^k \to \mathbb{R}$, bounded on compact sets, such that for all $u,v \in F$

(i) $E\, |\zeta(u)|^m \le C(u)$,

(ii) $E\, | \zeta(u) - \zeta(v) |^m \le C(u)\, |u-v|^\alpha$.

Then a.s. the realisations of $\zeta(u)$ are continuous functions on $F$. Moreover, set

$w(h,\zeta,L) := \sup | \zeta(u) - \zeta(v) |,$

where the sup is taken over all $u,v \in F$ with $|u-v| \le h$, $|u| \le L$, $|v| \le L$. Then

$E\, w(h,\zeta,L) \le B\, ( \sup C(u) )^{1/m}\, L^{k/m}\, h^{(\alpha-k)/m},$

where $B$ is a constant depending on $m$, $\alpha$ and $k$.

Proof. See Ibragimov and Has'minskii (1981), p. 372 ff, where in equation (8) $L^k$ should be replaced by $L^{k/m}$ (printing error).

Lemma A 4 .

Let $Y_1, Y_2, \ldots, Y_n$ be independent random variables. Let $d_1, \ldots, d_n$ be reals and let $S_n := \sum_{i \le n} d_i Y_i$.

Suppose there exist positive constants $\gamma_i$, $i = 1,2,\ldots,n$, and $\Lambda_1$ ($\Lambda_1$ possibly $\infty$) such that, for all $\lambda \in [-\Lambda_1, \Lambda_1]$ and $t = 1,2,\ldots,n$ one has

(A.5) $E \exp \lambda Y_t \le \exp \tfrac12 \gamma_t \lambda^2.$

Write $G := \sum_{i \le n} \gamma_i d_i^2$ and $\Lambda := \Lambda_1 / \max \{ |d_1|, \ldots, |d_n| \}$. Then

(A.6) $P \{ S_n \ge x \} \le \exp( - \min \{ x^2/2G,\ \Lambda x/2 \} ).$

The same inequalities hold if we replace $S_n$ by $-S_n$.

Proof. This lemma is a simple extension of theorem 16 of Petrov (1975), Ch. III.4. □
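The bound (A.6) is easy to evaluate. The following sketch computes it and compares it with the exact tail in one case where the exact tail is known: standard Gaussian $Y_i$, for which (A.5) holds with $\gamma_i = 1$ and $\Lambda_1 = \infty$, and $S_n$ is normal with variance $\sum d_i^2$. The function name and the weights are hypothetical.

```python
import math

def lemma_a4_bound(x, d, gamma, lam1=math.inf):
    """Right-hand side of (A.6): exp(-min{x^2 / (2G), Lambda * x / 2}),
    with G = sum_i gamma_i d_i^2 and Lambda = Lambda_1 / max_i |d_i|.
    Intended for x > 0."""
    G = sum(g * di * di for g, di in zip(gamma, d))
    quad = x * x / (2.0 * G)
    if math.isinf(lam1):
        return math.exp(-quad)          # the linear term is infinite and drops out
    lam = lam1 / max(abs(di) for di in d)
    return math.exp(-min(quad, lam * x / 2.0))

def gaussian_sum_tail(x, d):
    """Exact P{S_n >= x} when the Y_i are i.i.d. N(0, 1)."""
    var = sum(di * di for di in d)
    return 0.5 * math.erfc(x / math.sqrt(2.0 * var))

d = [0.5, -1.0, 2.0, 0.25]              # hypothetical weights d_i
gamma = [1.0] * len(d)                  # N(0,1): E exp(lambda Y) = exp(lambda^2 / 2)
checks = [(x, gaussian_sum_tail(x, d), lemma_a4_bound(x, d, gamma))
          for x in (0.5, 1.0, 2.0, 4.0, 8.0)]
for x, exact, bound in checks:
    print(x, exact, bound)
```

With a finite $\Lambda_1$ the linear term $\Lambda x/2$ takes over for large $x$, which is the regime that distinguishes (A.6) from a purely sub-Gaussian bound.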

Lemma AS.

Let $\{ \Phi_n,\ n \in \mathbb{N} \}$ be a sequence of positive definite symmetric matrices and let

(45)

symmetric matrices indexed by the parameter $K$. For all $K$ in $\mathcal{K}$ define the sequence

(A.7) $R_n(K) := \Phi_n^{-1/2}\, M_n(K)\, \Phi_n^{-1/2}, \qquad n \in \mathbb{N}.$

Then the following assertion holds: the family $M$ is uniformly equivalent (for a definition see section 4) to the sequence $\Phi$ iff there exists an interval $I := [\alpha,\beta]$, with $\beta > \alpha > 0$, such that for all $n \in \mathbb{N}$ and all $K \in \mathcal{K}$, the spectrum of $R_n(K)$ is contained in the interval $I$.

Remarks:

(i) for $\Phi_n$ we may always take $M_n(K_0)$, where $K_0$ is an arbitrary, but fixed, element in $\mathcal{K}$;

(ii) if all $M_n(K)$ are of size 2 x 2 then it is also necessary and sufficient that the trace and determinant of $R_n(K)$ remain in some fixed positive interval for all $n$ and $K$. In fact, one has, for any $K$,

(A.8) $\big[ \inf_{K,n} \det R_n(K) / \mathrm{tr}\, R_n(K) \big]\, \Phi_n \le M_n(K) \le \big[ \sup_{K,n} \mathrm{tr}\, R_n(K) \big]\, \Phi_n.$

Proof. If $M \asymp \Phi$ then there exist $\beta \ge \alpha > 0$ such that, for all $K$ and $n$,

(A.9) $\alpha\, \Phi_n \le M_n(K) \le \beta\, \Phi_n.$

Now let $x$ be any eigenvector of $R_n(K)$ and sandwich (A.9) between $\Phi_n^{-1/2} x$ and its transpose; this yields $\alpha \le \lambda \le \beta$, where $\lambda$ is the eigenvalue belonging to $x$. On the other hand, from eigenvectors of $R_n(K)$ one may form an orthonormal basis of $\mathbb{R}^k$ so the converse reasoning also holds. □
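The criterion of lemma A5 can be illustrated numerically. The sketch below forms $R = \Phi^{-1/2} M \Phi^{-1/2}$ for one pair of hypothetical 2 x 2 matrices, reads off the smallest and largest eigenvalue, and checks the sandwich (A.9) with these values together with the trace and determinant bounds behind remark (ii). The matrices and function names are illustrative assumptions.

```python
import numpy as np

def inv_sqrt_spd(Phi):
    """Phi^(-1/2) for a symmetric positive definite matrix, via eigh."""
    w, V = np.linalg.eigh(Phi)
    return (V / np.sqrt(w)) @ V.T          # V diag(w^(-1/2)) V^T

def spectrum_R(M, Phi):
    """Eigenvalues (ascending) of R = Phi^(-1/2) M Phi^(-1/2), cf. (A.7)."""
    S = inv_sqrt_spd(Phi)
    R = S @ M @ S
    return np.linalg.eigvalsh((R + R.T) / 2.0), R

def is_psd(A, tol=1e-10):
    """Positive semidefiniteness of a symmetric matrix up to rounding."""
    return bool(np.min(np.linalg.eigvalsh((A + A.T) / 2.0)) >= -tol)

Phi = np.array([[2.0, 0.5], [0.5, 1.0]])   # hypothetical Phi_n
M = np.array([[3.0, 1.0], [1.0, 2.0]])     # hypothetical M_n(K)

eig, R = spectrum_R(M, Phi)
alpha, beta = float(eig[0]), float(eig[-1])

# (A.9): alpha * Phi <= M <= beta * Phi in the positive semidefinite order
lower_ok = is_psd(M - alpha * Phi)
upper_ok = is_psd(beta * Phi - M)

# 2 x 2 case: det R / tr R <= smallest eigenvalue, largest eigenvalue <= tr R
det_over_tr = float(np.linalg.det(R) / np.trace(R))
trace_R = float(np.trace(R))
print(alpha, beta, det_over_tr, trace_R)
```

For a 2 x 2 positive definite $R$ with eigenvalues $\lambda_1 \le \lambda_2$ one has $\det R / \mathrm{tr}\, R = \lambda_1\lambda_2/(\lambda_1+\lambda_2) \le \lambda_1$ and $\lambda_2 \le \mathrm{tr}\, R$, which is what the last two quantities show.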

References

Dzhaparidze, K.O. (1986). On asymptotic inference about intensity parameters of a counting process. Report MS-R86xx of the Centre for Mathematics and Computer Science, Amsterdam.

Geer, Sara van de (1986). On rates of convergence in least squares estimation. Report MS-R86xx of the Centre for Mathematics and Computer Science, Amsterdam.

Ibragimov, I.A. and Has'minskii, R.Z. (1981). Statistical estimation: Asymptotic Theory. Springer, New York.

Ingster, Yu. I. (1984). Asymptotic regularity of a family of measures corresponding to a gaussian random process which contains a white noise component for a parametric family of spectral densities. J. Soviet Math., Vol. 25, No. 3, p. 1165-1181.

Ivanov, A.V. (1976). An asymptotic expansion for the distribution of the least-squares estimator of the nonlinear regression parameter. Theory Probab. Appl., Vol. 21, p. 557-570.

Jennrich, Robert I. (1969). Asymptotic properties of nonlinear least squares estimators. Ann. Math. Statist., Vol. 40, No. 2, p. 633-643.

Kester, A.D.M. (1985). Some large deviation results in statistics. CWI Tract no. 18, Centre for Mathematics and Computer Science, Amsterdam.

Läuter, Henning (1985). Strong consistency of the least squares estimator in nonlinear regression. Preprint Akad. der Wissensch. der DDR, Berlin.

Neveu, J. (1970). Calcul des Probabilités. Masson et Cie, Paris.

Petrov, V.V. (1975). Sums of independent random variables. Ergeb. Math. Grenzgeb., Springer, Berlin.

Prakasa Rao, B.L.S. (1984). On the exponential rate of convergence of the least squares estimator in the nonlinear regression model with gaussian errors. Statist. Probab. Lett., Vol. 2, p. 139-142.

Schmetterer, L. (1974). Introduction to Mathematical Statistics. Grundlehren Math. Wiss., Bd. 202, Springer, Berlin.

Serfling, R.J. (1980). Approximation Theorems of Mathematical Statistics. Wiley, New York.

Vostrikova, L. Ju. (1984). On criteria for c(n)-consistency of estimators. Stochastics, Vol. 11, p. 265-290.

Wu, Chien-Fu (1981). Asymptotic theory of nonlinear least squares estimation. Ann. Statist., Vol. 9, No. 3, p. 501-513.

(48)

CHAPTER II

Inequalities for products of Toeplitz matrices and their inverses, with applications.

1. Introduction

Let $f$ be any real function in $L^1([-\pi,\pi))$ with Fourier coefficients $\hat f_k$. With $f$ we associate the finite section Toeplitz matrices $T_n(f)$ with entries

$T_n(f)_{k,\ell} = \hat f_{k-\ell}, \qquad k,\ell = 1,2,\ldots,n.$

The following limit relation seems to be well-known: under smoothness conditions on the functions $f^1, \ldots, f^\ell$ one has

(1.1) $\mathrm{Tr}\,[\, T_n(f^1 f^2 \cdots f^\ell) - T_n(f^1) T_n(f^2) \cdots T_n(f^\ell) \,] = O(1)$

as $n \to \infty$ (note that the first term on the left-hand side increases as $n$). Results of this kind are given by Kac (1957) p.46-55, Grenander and Szegö (1958), Hirschmann (1971) (following Kac), Dakhmouche (1979), Coursol and Dacunha-Castelle (1982) and Taniguchi (1983, 1986). Kac and Hirschmann prove (1.1) in the case where all $f^i$ are equal and $f \in \mathcal{W} \cap B_2^{1/2}$ (for definitions, see section 2).

Grenander and Szegö (1958), Ch. 8.1, p.123, show that the left-hand side of (1.1) is $o(n)$ provided $\hat f_k = O(k^{-2})$ ($k \ne 0$).

Dakhmouche, who considers different $f^i$ in $L^\infty \cap B_2^{1/2}$, gives a proof which is obscured by many misprints. Coursol and Dacunha-Castelle in their proposition 2' also consider different $f^i$ in $\mathcal{W} \cap B_2^{1/2}$. Unfortunately, their proof could not convince us, see section 3. Taniguchi (1986) mentions (1.1) as a lemma; the proof is included in the proof of theorem 1 in his 1983 article, where he assumes that the $f^i$ are even functions whose Fourier coefficients satisfy $\sum |k|\, |\hat f_k| < \infty$.

In the case where $\ell = 2$, the right-hand side of (1.1) can be made more precise. For $f,g \in \mathcal{W} \cap B_2^{1/2}$ the following non-asymptotic result is well-known:

$|\, \mathrm{Tr}\,[\, T_n(fg) - T_n(f) T_n(g) \,] \,| \le \|f\|_{1/2}\, \|g\|_{1/2}.$
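The trace inequality for two symbols can be checked directly for trigonometric polynomials, whose Fourier coefficients are finite in number. The sketch below assumes the norm $\|f\|_{1/2} = (\sum_k |k|\, |\hat f_k|^2)^{1/2}$ and represents a symbol by a dictionary of coefficients; the two symbols and all function names are hypothetical.

```python
import numpy as np

def toeplitz_section(coef, n):
    """n x n finite section with entries T[k, l] = f_{k-l};
    `coef` maps an integer k to the k-th Fourier coefficient."""
    T = np.zeros((n, n), dtype=complex)
    for k in range(n):
        for l in range(n):
            T[k, l] = coef.get(k - l, 0.0)
    return T

def product_coefficients(a, b):
    """Fourier coefficients of the product f*g (discrete convolution)."""
    out = {}
    for j, aj in a.items():
        for k, bk in b.items():
            out[j + k] = out.get(j + k, 0.0) + aj * bk
    return out

def half_norm(coef):
    """(sum_k |k| |f_k|^2)^(1/2)."""
    return float(np.sqrt(sum(abs(k) * abs(v) ** 2 for k, v in coef.items())))

def trace_gap(f, g, n):
    """|Tr[T_n(fg) - T_n(f) T_n(g)]|."""
    fg = product_coefficients(f, g)
    D = toeplitz_section(fg, n) - toeplitz_section(f, n) @ toeplitz_section(g, n)
    return float(abs(np.trace(D)))

# Hypothetical real trigonometric polynomials: f_{-k} = conj(f_k).
f = {0: 1.0, 1: 0.5, -1: 0.5, 2: 0.25j, -2: -0.25j}
g = {0: 2.0, 1: -0.3, -1: -0.3, 3: 0.1, -3: 0.1}

bound = half_norm(f) * half_norm(g)
gaps = [trace_gap(f, g, n) for n in range(1, 9)]
print(bound, gaps)
```

The trace difference equals $\sum_k \min(n,|k|)\, \hat f_k \hat g_{-k}$, so the Cauchy-Schwarz inequality gives the bound for every $n$; the check covers only this finite family of examples.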

Applications in the spectral analysis of stationary time series, which will be discussed in the author's thesis, urge us to generalize this inequality to products of more than two Toeplitz matrices and to products containing also inverses of these matrices. We are also interested in the proximity of the off-diagonal elements of $T_n(\prod f^i)$ and $\prod T_n(f^i)$.

Results of this type are laid down in theorems 1 and 2, for functions $f^i$ in the Krein algebra $L^\infty \cap B_2^{1/2}$.

This algebra contains all functions satisfying the conditions of the aforementioned authors, in particular those of Taniguchi (1983), whose theorem 1 comes close to our results. Note that

$L^\infty \cap B_2^{1/2} \supset \mathcal{W} \cap B_2^{1/2}.$

An example proving the strict inclusion is the absolutely continuous function

$f(\lambda) := \sum_{k=2}^{\infty} \dfrac{1}{k \log k} \sin \lambda k,$

see Zygmund (1935), sections 5.13, 6.31.

As an application of theorems 1 and 2 we prove theorem 3, a Szegö-type theorem, stating that, under certain assumptions, the trace of $\phi(T_n(f))$, where $\phi$ is an analytic function, can be approximated by $\dfrac{n}{2\pi} \int \phi(f(\lambda))\, d\lambda$.

From this result we deduce a new inequality related to Szegö's strong limit theorem for functions in $L^\infty \cap B_2^{1/2}$ (theorem 4).

We like to stress that in fact many lemmata currently used in the analysis of stationary time series can be obtained from theorems 1 and 2 in a simple and unified way, e.g. lemmata A1.2 (4,7-12) and A1.3 (1,2,4,5) of Dzhaparidze (1986).

(50)

Acknowledgement.

We are indebted to professors Ph.P.J.E. Clément (Technische Universiteit Delft) and M.A. Kaashoek (Vrije Universiteit Amsterdam) who suggested some useful references.

(51)

2. Definitions and main results. .

- Throughout we consider real functions on $[-\pi,\pi)$. The real function spaces $L^p([-\pi,\pi))$ are denoted $L^p$. For $f \in L^1$, the (complex) Fourier coefficients $\hat f_n$ of $f$ are defined

$\hat f_n := \dfrac{1}{2\pi} \int_{-\pi}^{\pi} f(\lambda)\, e^{-i\lambda n}\, d\lambda.$

- With $f \in L^1$ we associate the n x n finite section Toeplitz matrix $T_n(f)$ with coefficients

$T_n(f)_{k,\ell} = \hat f_{k-\ell}, \qquad k,\ell = 1,2,\ldots,n.$

Note that $T_n(f)$ is Hermitean because $f$ is real.

Remark: some authors let $k,\ell$ run from 0 to $n$ in the definition of $T_n(f)$.

- The Besov space $B_2^{1/2}$ is the linear space of functions $f \in L^1([-\pi,\pi))$ satisfying $\sum |k|\, |\hat f_k|^2 < \infty$. The space $B_2^{1/2}$ is equipped with the norm

$\|f\|_{1/2} := \Big( \sum_k |k|\, |\hat f_k|^2 \Big)^{1/2}.$

Note that, by the Parseval equality, $B_2^{1/2} \subset L^2$.

- Consider the Banach space of functions $f \in L^\infty \cap B_2^{1/2}$ equipped with the norm ('Krein norm')

$\|f\|_K := \|f\|_\infty + \|f\|_{1/2}.$

$L^\infty \cap B_2^{1/2}$ (the 'Krein algebra') is, with the usual multiplication, a Banach algebra of functions:

(2.1) $\| f^1 f^2 \|_K \le \| f^1 \|_K\, \| f^2 \|_K$

and $\|1\|_K = 1$. For the proof of this statement (in the complex case) see Krein (1970), theorem 1.

- $\mathcal{W}$ is the linear space of functions $f$ having absolutely summable Fourier coefficients: $\sum |\hat f_k| < \infty$.

(52)

necessarily being equal to lim f(A).

The $L^2$-modulus of continuity of $f \in L^2$ is defined

$\omega(\varepsilon; f) := \sup_{|\delta| < \varepsilon} \Big( \int_{-\pi}^{\pi} | f(\lambda+\delta) - f(\lambda) |^2\, d\lambda \Big)^{1/2},$

where $f(\lambda+\delta) := f(\lambda+\delta-2\pi)$ if $\lambda+\delta > \pi$.

The main object of study will be the difference matrix

$\Delta_n(f^1, f^2, \ldots, f^\ell) := T_n\Big( \prod_{i=1}^{\ell} f^i \Big) - \prod_{i=1}^{\ell} T_n(f^i).$

$\Lambda_n$ is the set $\mathbb{Z} \setminus \{1,2,\ldots,n\}$.

Because we shall also be concerned with series which are not absolutely summable we define

$\sum_{k \in \Lambda_n} a_k := \lim_{N \to \infty} \sum_{k \in \Lambda_n,\ |k| < N} a_k$

if the limit exists.

Throughout $M$ denotes any n x n complex matrix with operator norm

$\|M\| := \max_{x \in \mathbb{C}^n,\ |x| = 1} |Mx|.$

The main results of this paper are the following.

Theorem 1

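Let $f^1, \ldots, f^\ell \in L^\infty \cap B_2^{1/2}$, $\ell \ge 1$.

Then, for any n x n complex matrix $M$,

(2.2) $|\, \mathrm{Tr}\, M\, [\, T_n(f^1 f^2 \cdots f^\ell) - T_n(f^1) T_n(f^2) \cdots T_n(f^\ell) \,] \,| \le (\ell - 1)\, \|M\|\, \|f^1\|_K\, \|f^2\|_K \cdots \|f^\ell\|_K.$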

Remarks.

(i) This theorem remains true for functions $f^i$ in the complex algebra $L^\infty \cap B_2^{1/2}$.

(ii) If $M$ has one entry equal to one and consists for the rest of zeros then an element-wise comparison is obtained.

(iii) If we set $M_{ij} = x_i x_j$, where $x$ is an arbitrary n-vector of norm 1, we obtain from (2.2)

$\| T_n(f^1 f^2 \cdots f^\ell) - T_n(f^1) T_n(f^2) \cdots T_n(f^\ell) \| \le (\ell - 1)\, \|f^1\|_K\, \|f^2\|_K \cdots \|f^\ell\|_K.$

Theorem 2

Let $f^1, f^2, \ldots, f^\ell \in L^\infty \cap B_2^{1/2}$. Let $N$ be a subset of $\{1,2,\ldots,\ell\}$. Let $p_i := 1$ if $i \notin N$, $p_i := -1$ if $i \in N$.

For those $i$ which are element of $N$ assume that $f^i > 0$ and $(f^i)^{-1} \in L^\infty$. Then

(2.3) $\Big| \mathrm{Tr}\, M \Big[ T_n\Big( \prod_{i=1}^{\ell} (f^i)^{p_i} \Big) - \prod_{i=1}^{\ell} T_n(f^i)^{p_i} \Big] \Big| \le \|M\| \prod_{i=1}^{\ell} \| (f^i)^{p_i} \|_K \Big[ \ell - 1 + \sum_{j \in N} \|f^j\|_{1/2}\, \|(f^j)^{-1}\|_{1/2} \Big].$

(54)

3. Proofs of main results

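First we prove a special case of theorem 1. Note that $f$ and $g$ are not necessarily in $L^\infty$.

Proposition 1

Let $f,g \in B_2^{1/2}$. Then

(3.1) $|\, \mathrm{Tr}\, M\, [\, T_n(fg) - T_n(f) T_n(g) \,] \,| \le \|M\|\, \|f\|_{1/2}\, \|g\|_{1/2}.$

Proof

By lemma A1 (which is used implicitly by most authors)

$\mathrm{Tr}\, M \Delta_n(f,g) = \sum_{i,j=1}^{n} M_{ji} \Big[ \sum_{\ell \in \mathbb{Z}} \hat f_{i-\ell}\, \hat g_{\ell-j} - \sum_{\ell=1}^{n} \hat f_{i-\ell}\, \hat g_{\ell-j} \Big] = \sum_{\ell \in \Lambda_n} \sum_{i,j=1}^{n} M_{ji}\, \hat f_{i-\ell}\, \hat g_{\ell-j}.$

Hence

$|\, \mathrm{Tr}\, M \Delta_n(f,g) \,| \le \|M\| \sum_{\ell \in \Lambda_n} \Big( \sum_{j=1}^{n} |\hat g_{\ell-j}|^2 \Big)^{1/2} \Big( \sum_{i=1}^{n} |\hat f_{i-\ell}|^2 \Big)^{1/2} \le \|M\| \Big\{ \sum_{\ell \in \Lambda_n} \sum_{j=1}^{n} |\hat g_{\ell-j}|^2 \Big\}^{1/2} \Big\{ \sum_{\ell \in \Lambda_n} \sum_{i=1}^{n} |\hat f_{i-\ell}|^2 \Big\}^{1/2}.$

Now

$\sum_{\ell \in \Lambda_n} \sum_{j=1}^{n} |\hat g_{\ell-j}|^2 = \sum_{k \in \mathbb{Z}} \min(n,|k|)\, |\hat g_k|^2 \le \|g\|_{1/2}^2.$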

(55)

Remarks.

(i) The inequality (3.1) is sharp: for $M = I$ and $f = g$,

$\lim_{n \to \infty} \mathrm{Tr}\,[\, T_n(f^2) - T_n(f)^2 \,] = \|f\|_{1/2}^2.$

(ii) If $f \in L^2$ and $\sum_{|i| < n} |i|\, |\hat f_i|^2 = o(n)$ and $g$ satisfies the same condition, then still

$\lim_{n \to \infty} \dfrac{1}{n}\, \mathrm{Tr}\,[\, T_n(fg) - T_n(f) T_n(g) \,] = 0.$
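The sharpness statement for $M = I$ and $f = g$ can be observed exactly for a trigonometric polynomial, where the limit is already attained once $n$ is at least the degree. The sketch below stores the coefficients for $k = -d, \ldots, d$ in an array and obtains those of $f^2$ by discrete convolution; the symbol is hypothetical and the norm is assumed to be $\|f\|_{1/2}^2 = \sum_k |k|\, |\hat f_k|^2$.

```python
import numpy as np

def toeplitz_from_array(c, n):
    """Finite section T_n with entries c_{k-l}; `c` has odd length 2d+1
    and holds the Fourier coefficients for k = -d, ..., d."""
    d = (len(c) - 1) // 2
    T = np.zeros((n, n), dtype=complex)
    for k in range(n):
        for l in range(n):
            if abs(k - l) <= d:
                T[k, l] = c[k - l + d]
    return T

def trace_gap(c, n):
    """Tr[T_n(f^2) - T_n(f)^2] for the symbol with coefficient array c."""
    c2 = np.convolve(c, c)                 # coefficients of f^2, k = -2d..2d
    Tf = toeplitz_from_array(c, n)
    return float(np.trace(toeplitz_from_array(c2, n) - Tf @ Tf).real)

def half_norm_sq(c):
    """sum_k |k| |f_k|^2."""
    d = (len(c) - 1) // 2
    k = np.arange(-d, d + 1)
    return float(np.sum(np.abs(k) * np.abs(c) ** 2))

# Hypothetical real symbol of degree 2: coefficients for k = -2..2,
# with f_{-k} = conj(f_k).
c = np.array([0.25, 0.5j, 1.0, -0.5j, 0.25])

limit = half_norm_sq(c)
gaps = [trace_gap(c, n) for n in range(1, 7)]
print(limit, gaps)
```

Here the gap equals $\sum_k \min(n,|k|)\, |\hat f_k|^2$, which increases with $n$ and coincides with $\|f\|_{1/2}^2$ for $n \ge 2$.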

Now we proceed to prove theorem 1. We assume $\ell \ge 3$.

To apply proposition 1, we use a telescoping technique for $\Delta_n(f^1, \ldots, f^\ell)$:

$\Delta_n(f^1, \ldots, f^\ell) = \sum_{j=1}^{\ell-1} \Big( \prod_{r=1}^{j-1} T_n(f^r) \Big)\, \Delta_n\Big( f^j,\ \prod_{s=j+1}^{\ell} f^s \Big).$

By the triangle inequality and proposition 1,

$|\, \mathrm{Tr}\, M \Delta_n(f^1, \ldots, f^\ell) \,| \le \sum_{j=1}^{\ell-1} \Big\| M \prod_{r=1}^{j-1} T_n(f^r) \Big\|\, \|f^j\|_{1/2}\, \Big\| \prod_{s=j+1}^{\ell} f^s \Big\|_{1/2}.$

By the submultiplicativity of the operator norm and the Krein norm, see (2.1), and using also lemma A2 (ii) the theorem follows immediately. □

Remark. The proof of proposition 2' of Coursol and Dacunha-Castelle (1982) seems to use their formula (lemma 7)

$\dfrac{d^j}{dz^j} ( \log \det T_n g_z ) = (-1)^{j-1} (j-1)!\ \mathrm{tr}\, ( T_n f\, (T_n g_z)^{-1} )^j,$

where $g_z := g + zf$. They propose to prove theorem 1 by calculating

.. ae ' ■ ' ■

— — (log det T g ) | 0.

ox dz~-■-3z 1 « 1 i

(56)

The following proposition, needed in the proof of theorem 2, shows that $T_n(f)^{-1}$ is close to $T_n(f^{-1})$.

Proposition 2

Let $f \in L^\infty \cap B_2^{1/2}$, $f > 0$ and $f^{-1} \in L^\infty$. Then

(3.2) $|\, \mathrm{Tr}\, M\, ( T_n(f^{-1}) - T_n(f)^{-1} ) \,| \le \|M\|\, \|f^{-1}\|_\infty\, \|f^{-1}\|_{1/2}\, \|f\|_{1/2}.$

Proof

By lemma A3(ii), $f^{-1} \in B_2^{1/2}$. Writing

$|\, \mathrm{Tr}\, M\, ( T_n(f^{-1}) - T_n(f)^{-1} ) \,| = |\, \mathrm{Tr}\, M\, T_n(f)^{-1} ( T_n(f) T_n(f^{-1}) - T_n(f f^{-1}) ) \,|$

the assertion follows immediately from proposition 1 and lemma A2 (iii). □

Now we prove theorem 2.

Write, for simplicity, $g^i := (f^i)^{p_i}$. By lemma A3(ii), all $g^i$ are in $L^\infty \cap B_2^{1/2}$. It is straightforward to show

$\Delta^{(1)} := \prod_{i=1}^{\ell} T_n(g^i) - \prod_{i=1}^{\ell} T_n(f^i)^{p_i} = \sum_{j=1}^{\ell} \Big[ \prod_{i=1}^{j-1} T_n(f^i)^{p_i} \Big] \big[ T_n(g^j) - T_n(f^j)^{p_j} \big] \Big[ \prod_{i=j+1}^{\ell} T_n(g^i) \Big].$

On the right-hand side the $j \notin N$ give zero contribution.

By proposition 2, the invariance of the trace under cyclic permutation and lemma A2 (iii),

(3.3) $|\, \mathrm{Tr}\, M \Delta^{(1)} \,| \le \sum_{j \in N} \|M\|\, \|g^j\|_\infty\, \|g^j\|_{1/2}\, \|f^j\|_{1/2}\, \Big\| \prod_{i=1}^{j-1} T_n(f^i)^{p_i} \Big\|\, \Big\| \prod_{i=j+1}^{\ell} T_n(g^i) \Big\| \le \|M\| \prod_{i=1}^{\ell} \|g^i\|_\infty \sum_{j \in N} \|g^j\|_{1/2}\, \|f^j\|_{1/2}.$

Put

$\Delta^{(2)} := T_n\Big( \prod_{i=1}^{\ell} g^i \Big) - \prod_{i=1}^{\ell} T_n(g^i).$

By theorem 1,

(3.4) $|\, \mathrm{Tr}\, M \Delta^{(2)} \,| \le (\ell - 1)\, \|M\| \prod_{i=1}^{\ell} \|g^i\|_K.$

From (3.3) and (3.4), the inequality of the theorem follows. □

In special cases, the constant in theorem 2 may be improved. We use a corollary to the following proposition in the proof of theorem 4.

Proposition 3

Let the conditions of theorem 2 be fulfilled, with $N = \{1,2,\ldots,\ell\}$.

Then e ( i ) |Tr M e -, e i 17 f i = l ÏÏ T ( f5) 1 - I nv ' | < e n iifln i ^ f1) i = l i - , - 1 , ( i i ) |Tr M P r o o f T ( II f1)- 1 IT T ( fJ) - I nv ' nv ' 1=1 j = l Pi „ „ ,ri , - l , | < e n iif iiKii(f ) no i=l