W. POPIŃSKI (Warszawa)

ON LEAST SQUARES ESTIMATION OF FOURIER COEFFICIENTS AND OF THE REGRESSION FUNCTION
Abstract. The problem of nonparametric function fitting with the observation model $y_i = f(x_i) + \eta_i$, $i = 1, \dots, n$, is considered, where the $\eta_i$ are independent random variables with zero mean value and finite variance, and the $x_i \in [a,b] \subset \mathbb{R}^1$, $i = 1, \dots, n$, form a random sample from a distribution with density $\varrho \in L^1[a,b]$ and are independent of the errors $\eta_i$, $i = 1, \dots, n$. The asymptotic properties of the estimator $\widehat f_{N(n)}(x) = \sum_{k=1}^{N(n)} \widehat c_k e_k(x)$ for $f \in L^2[a,b]$ and of $\widehat c^{\,N(n)} = (\widehat c_1, \dots, \widehat c_{N(n)})^T$ obtained by the least squares method, as well as the limits in probability of the estimators $\widehat c_k$, $k = 1, \dots, N$, for fixed $N$, are studied in the case when the functions $e_k$, $k = 1, 2, \dots$, forming a complete orthonormal system in $L^2[a,b]$, are analytic.
1. Introduction. Let $y_i$, $i = 1, \dots, n$, be observations at points $x_i \in [a,b] \subset \mathbb{R}^1$, according to the model $y_i = f(x_i) + \eta_i$, where $f : [a,b] \to \mathbb{R}^1$ is an unknown square integrable function ($f \in L^2[a,b]$) and $\eta_i$, $i = 1, \dots, n$, are independent identically distributed random variables with zero mean value and finite variance $\sigma_\eta^2 > 0$. Let furthermore the points $x_i$, $i = 1, \dots, n$, form a random sample from a distribution with density $\varrho$ ($\varrho \ge 0$, $\int_a^b \varrho(x)\,dx = 1$), independent of the observation errors $\eta_i$, $i = 1, \dots, n$. If the functions $e_k$, $k = 1, 2, \dots$, constitute a complete orthonormal system in $L^2[a,b]$, then $f$ has the representation
$$f = \sum_{k=1}^{\infty} c_k e_k, \qquad \text{where } c_k = \int_a^b f(x) e_k(x)\,dx, \quad k = 1, 2, \dots$$
We assume that $e_k$, $k = 1, 2, \dots$, are analytic in $(a,b)$ and continuous in $[a,b]$.
Examples of orthonormal systems satisfying these requirements are [6] the trigonometric functions in $L^2[0, 2\pi]$ and the Legendre polynomials in $L^2[-1, 1]$.

1991 Mathematics Subject Classification: Primary 62G07, 62F12.
Key words and phrases: Fourier series, least squares method, regression, consistent estimator.
As an estimator of the vector of coefficients $c^N = (c_1, \dots, c_N)^T$, for fixed $N$, we take the vector $\widehat c^{\,N}$ obtained by the least squares method:
$$\widehat c^{\,N} = \arg\min_{a^N \in \mathbb{R}^N} \sum_{i=1}^{n} \bigl(y_i - \langle a^N, e^N(x_i)\rangle\bigr)^2,$$
where $\widehat c^{\,N} = (\widehat c_1, \dots, \widehat c_N)^T$ and $e^N(x) = (e_1(x), \dots, e_N(x))^T$.
To such estimators of the Fourier coefficients $c_k$, $k = 1, \dots, N$, there corresponds an estimator of the regression function $f$ of the form
$$\widehat f_N(x) = \sum_{k=1}^{N} \widehat c_k e_k(x),$$
called a projection type estimator [4].
The vector $\widehat c^{\,N}$ can be obtained as a solution of the normal equations
$$(1)\qquad G_n \widehat c^{\,N} = g^n,$$
where
$$G_n = \frac{1}{n}\sum_{i=1}^{n} e^N(x_i)\, e^N(x_i)^T, \qquad g^n = \frac{1}{n}\sum_{i=1}^{n} y_i\, e^N(x_i).$$
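As a concrete numerical illustration (not part of the paper), the normal equations (1) can be solved directly once an orthonormal system is fixed. The sketch below assumes the trigonometric system on $[0, 2\pi]$ and a uniform sampling density; the helper names `trig_system` and `lsq_fourier` are ours.

```python
import numpy as np

def trig_system(x, N):
    """Rows are e^N(x_i)^T for the trigonometric orthonormal system in
    L^2[0, 2pi]: 1/sqrt(2pi), cos(x)/sqrt(pi), sin(x)/sqrt(pi), cos(2x)/sqrt(pi), ..."""
    cols = [np.full_like(x, 1.0 / np.sqrt(2.0 * np.pi))]
    k = 1
    while len(cols) < N:
        cols.append(np.cos(k * x) / np.sqrt(np.pi))
        if len(cols) < N:
            cols.append(np.sin(k * x) / np.sqrt(np.pi))
        k += 1
    return np.column_stack(cols)

def lsq_fourier(x, y, N):
    """Least squares estimate of the coefficient vector: solve G_n c = g^n as in (1)."""
    E = trig_system(x, N)
    G_n = E.T @ E / len(x)   # G_n = (1/n) sum_i e^N(x_i) e^N(x_i)^T
    g_n = E.T @ y / len(x)   # g^n = (1/n) sum_i y_i e^N(x_i)
    return np.linalg.solve(G_n, g_n)

rng = np.random.default_rng(0)
n, N = 20000, 5
x = rng.uniform(0.0, 2.0 * np.pi, n)             # density rho = 1/(2*pi)
f = lambda t: np.sin(t) + 0.5 * np.cos(2.0 * t)  # true regression function
y = f(x) + 0.1 * rng.standard_normal(n)          # noisy observations
c_hat = lsq_fourier(x, y, N)
# For this f the Fourier coefficients are c_3 = sqrt(pi) and c_4 = sqrt(pi)/2.
```

Under the uniform density the limits of $\widehat c_k$ are exactly the Fourier coefficients, so the third and fourth components of `c_hat` approach $\sqrt{\pi}$ and $\sqrt{\pi}/2$.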
The asymptotic properties of the least squares estimators of the regression function obtained in the same way as described above, but for the fixed point design case, were examined in [5]. The problem of choosing the regression order for least squares estimators in the case of equidistant observation points was investigated in [4].
In order to investigate the asymptotic properties of the estimators $\widehat c_k$, $k = 1, \dots, N$, we introduce the probability space $(\Omega, \mathcal{F}, P)$, where
$$\Omega = \prod_{i=1}^{\infty} [a,b], \qquad \mathcal{F} = \prod_{i=1}^{\infty} \mathcal{F}_i, \qquad P = \prod_{i=1}^{\infty} P_i,$$
where each $\mathcal{F}_i$, $i = 1, 2, \dots$, is the $\sigma$-field of Borel subsets of $[a,b]$, and $P$ is a probability measure with the property
$$P\Bigl(A_1 \times \ldots \times A_n \times \prod_{i=n+1}^{\infty} [a,b]\Bigr) = (P_1 \times \ldots \times P_n)(A_1 \times \ldots \times A_n)$$
for $A_i \in \mathcal{F}_i$, $i = 1, \dots, n$, with $P_i$, $i = 1, 2, \dots$, being the probability measure defined on $\mathcal{F}_i$ and having density $\varrho$ with respect to the Lebesgue measure $\mu$. The construction and properties of such a probability measure $P$ are described in [2]. The elements of $\Omega$ are denoted by $\omega = (x_1, x_2, \dots)$, $x_i \in [a,b]$, $i = 1, 2, \dots$
If the distribution of the observation errors $\eta_i$, $i = 1, 2, \dots$ (defined on a certain probability space $(\Psi, \Theta, \nu)$), is known, a similar probability space can be constructed, with elements of the form $\eta = (\eta_1, \eta_2, \dots)$. From the two probability spaces described above we can of course construct in the usual way the corresponding product space with elements $(\omega, \eta)$ [2].
In the following section we examine the uniqueness of the estimators $\widehat c_k(\omega, \eta)$, $k = 1, \dots, N$, for fixed $N$, and determine their limits in probability, depending on the density $\varrho$. In the third section we prove that the estimator $\widehat f_{N(n)}$ of the regression function corresponding to the Fourier coefficient estimators $\widehat c_k$, $k = 1, \dots, N(n)$, is consistent in the sense of the mean square prediction error
$$D_{N(n)} = \frac{1}{n} E_\omega E_\eta \sum_{i=1}^{n} \bigl(f(x_i) - \widehat f_{N(n)}(x_i)\bigr)^2$$
(i.e. $\lim_{n\to\infty} D_{N(n)} = 0$), on the condition that the density $\varrho$ is bounded and the sequence $N(n)$ is properly chosen.
2. Uniqueness and consistency of Fourier coefficient estimators. First we check whether the Fourier coefficient estimators $\widehat c_k$, $k = 1, \dots, N$, are uniquely determined. In order to do this we need the following two lemmas.
Lemma 2.1. Let $v_1, \dots, v_n \in \mathbb{R}^n$. The matrix $G_n = \sum_{i=1}^{n} v_i v_i^T$ is singular ($\det G_n = 0$) if and only if $v_1, \dots, v_n$ are linearly dependent.
Proof. Suppose that $G_n$ is singular and $v_1, \dots, v_n$ are linearly independent. Then there exists a vector $x \neq 0$ for which $G_n x = 0$, so that
$$\sum_{i=1}^{n} v_i (v_i^T x) = \sum_{i=1}^{n} \langle v_i, x\rangle v_i = 0.$$
Since $v_1, \dots, v_n$ are linearly independent, $\langle v_i, x\rangle = 0$ for $i = 1, \dots, n$. But $\operatorname{span}\{v_1, \dots, v_n\} = \mathbb{R}^n$ and consequently $x$ must be zero, contrary to our assumption.
Conversely, if $v_1, \dots, v_n$ are linearly dependent, then $\dim \operatorname{span}\{v_1, \dots, v_n\} < n$ and we can choose $x \neq 0$ such that $\langle v_i, x\rangle = 0$ for $i = 1, \dots, n$. Consequently, $G_n x = \sum_{i=1}^{n} \langle v_i, x\rangle v_i = 0$, which means that $G_n$ is singular.
Incidentally, observe that a matrix of the form $G_m = \sum_{i=1}^{m} v_i v_i^T$, where $m < n$, is always singular, since $\dim \operatorname{span}\{v_1, \dots, v_m\} \le m$ and there exist nonzero vectors orthogonal to $\operatorname{span}\{v_1, \dots, v_m\}$.
Lemma 2.2. If $\varrho \in L^1[a,b]$ is a density (i.e. $\varrho \ge 0$, $\int_a^b \varrho(x)\,dx = 1$), then for $n \ge N$ the matrices
$$G_n(\omega) = \frac{1}{n}\sum_{i=1}^{n} e^N(x_i)\, e^N(x_i)^T, \qquad \omega = (x_1, x_2, \dots),$$
of the normal equations (1) are positive-definite with probability one (in the probability space $(\Omega, \mathcal{F}, P)$).
Proof. From the definition of $G_n$ it follows that
$$G_{n+1}(\omega) = \frac{n}{n+1} G_n(\omega) + \frac{1}{n+1}\, e^N(x_{n+1})\, e^N(x_{n+1})^T.$$
So for $x \in \mathbb{R}^N$ we have the inequality
$$\langle G_{n+1}(\omega)x, x\rangle = \frac{n}{n+1}\langle G_n(\omega)x, x\rangle + \frac{1}{n+1}\langle e^N(x_{n+1})\, e^N(x_{n+1})^T x, x\rangle = \frac{n}{n+1}\langle G_n(\omega)x, x\rangle + \frac{1}{n+1}\langle e^N(x_{n+1}), x\rangle^2 \ge \frac{n}{n+1}\langle G_n(\omega)x, x\rangle.$$
Hence $\Omega_{n+1} = \{\omega : \det G_{n+1}(\omega) = 0\} \subset \{\omega : \det G_n(\omega) = 0\} = \Omega_n$, since the matrices $G_n(\omega)$ are nonnegative-definite for $n = 1, 2, \dots$ Thus in order to prove that $P(\Omega_n) = 0$ for $n \ge N$ it suffices to prove $P(\Omega_N) = 0$. (For $n < N$ we have $P(\Omega_n) = 1$, which is a simple consequence of our remark after the proof of Lemma 2.1.) By Lemma 2.1,
$$\det G_N(\omega) = 0 \;\Leftrightarrow\; e^N(x_1), \dots, e^N(x_N) \text{ are linearly dependent},$$
where $\omega = (x_1, x_2, \dots)$, and consequently,
$$(2)\qquad \Omega_N = \bigcup_{j=1}^{N} \{\omega : e^N(x_j) \in \operatorname{span}\{e^N(x_1), \dots, e^N(x_{j-1}), e^N(x_{j+1}), \dots, e^N(x_N)\}\}.$$
Moreover,
$$P(\{\omega : e^N(x_j) \in \operatorname{span}\{e^N(x_1), \dots, e^N(x_{j-1}), e^N(x_{j+1}), \dots, e^N(x_N)\}\}) = P(\{\omega : e^N(x_N) \in \operatorname{span}\{e^N(x_1), \dots, e^N(x_{N-1})\}\})$$
for $j = 1, \dots, N$, by the properties of the product measure $P_1 \times \ldots \times P_N$. Further,
$$P(\{\omega : e^N(x_N) \in \operatorname{span}\{e^N(x_1), \dots, e^N(x_{N-1})\}\}) = \int_a^b \ldots \int_a^b P_N(A_N)\, dP_1 \ldots dP_{N-1},$$
where $A_N = (e^N)^{-1}(\operatorname{span}\{e^N(x_1), \dots, e^N(x_{N-1})\}) \subset [a,b]$, for fixed $x_1, x_2, \dots, x_{N-1}$, is the counter-image of the closed linear subspace $\operatorname{span}\{e^N(x_1), \dots, e^N(x_{N-1})\}$ under the continuous mapping $[a,b] \ni x_N \mapsto e^N(x_N) \in \mathbb{R}^N$ (the continuity follows from the continuity of $e_k$, $k = 1, 2, \dots$).
Assume now that $P_N(A_N) > 0$ for fixed $x_1, \dots, x_{N-1}$. This means that the Lebesgue measure $\mu(A_N)$ is positive. For $x_N \in A_N$ we have
$$e^N(x_N) \in \operatorname{span}\{e^N(x_1), \dots, e^N(x_{N-1})\},$$
and $\dim \operatorname{span}\{e^N(x_1), \dots, e^N(x_{N-1})\} \le N - 1$. On the other hand, $\operatorname{span}\{e^N(x_N) : x_N \in A_N\} = \mathbb{R}^N$, since for any $v = (v_1, \dots, v_N)^T \in \mathbb{R}^N$ orthogonal to the left-hand side
$$\langle e^N(x), v\rangle = \sum_{k=1}^{N} v_k e_k(x) = 0 \quad \text{for } x \in A_N,$$
and the condition $\mu(A_N) > 0$ and the analyticity of $e_k$, $k = 1, 2, \dots$, imply immediately that $v_1 = \ldots = v_N = 0$.
Thus we obtain a contradiction. Consequently, $P_N(A_N) = 0$ for all $x_1, \dots, x_{N-1}$. This implies that
$$P(\{\omega : e^N(x_N) \in \operatorname{span}\{e^N(x_1), \dots, e^N(x_{N-1})\}\}) = 0$$
and, by (2), $P(\Omega_N) = 0$.
Lemma 2.2 assures that the estimators $\widehat c_1, \dots, \widehat c_N$ obtained from the normal equations (1) are uniquely determined with probability one in the probability space $(\Omega, \mathcal{F}, P)$, provided $n \ge N$.
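A quick numerical check of this fact and of the remark after Lemma 2.1 (our own illustration, using the Legendre system on $[-1, 1]$; the unnormalized Vandermonde-type matrix from `numpy.polynomial.legendre.legvander` suffices, since column scaling does not affect singularity):

```python
import numpy as np

def gram_matrix(x, N):
    """G_n = (1/n) sum_i e^N(x_i) e^N(x_i)^T, with e_k the Legendre
    polynomials P_0, ..., P_{N-1} (normalization constants omitted)."""
    E = np.polynomial.legendre.legvander(x, N - 1)  # rows: (P_0(x_i), ..., P_{N-1}(x_i))
    return E.T @ E / len(x)

rng = np.random.default_rng(1)
N = 4
x_few = rng.uniform(-1.0, 1.0, N - 1)   # n = 3 < N: G_n is always singular
x_many = rng.uniform(-1.0, 1.0, 50)     # n >= N: nonsingular with probability one
det_few = np.linalg.det(gram_matrix(x_few, N))
det_many = np.linalg.det(gram_matrix(x_many, N))
```

Here `det_few` vanishes up to rounding error (the rank of $G_n$ is at most $n$), while `det_many` is bounded away from zero, in line with Lemma 2.2.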
Observe now that the elements of the matrix $G_n(\omega)$ in (1) have the form
$$g^n_{ij}(\omega) = \frac{1}{n}\sum_{k=1}^{n} e_i(x_k) e_j(x_k), \qquad \omega = (x_1, x_2, \dots),\quad i, j = 1, \dots, N,$$
and we easily obtain
$$(3)\qquad E_\omega g^n_{ij}(\omega) = \frac{1}{n}\sum_{k=1}^{n} E_\omega e_i(x_k) e_j(x_k) = \int_a^b e_i(x) e_j(x) \varrho(x)\,dx = g_{ij}.$$
The expected value exists because $e_k$, $k = 1, 2, \dots$, are continuous in $[a,b]$.
Further, since $x_1, x_2, \dots$ are chosen independently,
$$E_\omega (g^n_{ij}(\omega) - g_{ij})^2 = \frac{1}{n^2}\sum_{k=1}^{n} E_\omega (e_i(x_k) e_j(x_k) - g_{ij})^2 = \frac{1}{n} \int_a^b (e_i(x) e_j(x) - g_{ij})^2 \varrho(x)\,dx,$$
and we see that the elements of $G_n(\omega)$ converge in $L^2$ to $g_{ij}$ as $n \to \infty$.
Similarly, for the elements of the right-hand side vector $g^n(\omega, \eta)$ of the normal equations we obtain
$$(4)\qquad E g^n_i(\omega, \eta) = \frac{1}{n}\sum_{k=1}^{n} E y_k e_i(x_k) = \frac{1}{n}\sum_{k=1}^{n} E_\omega E_\eta (f(x_k) + \eta_k) e_i(x_k) = \frac{1}{n}\sum_{k=1}^{n} E_\omega f(x_k) e_i(x_k) = \int_a^b f(x) e_i(x) \varrho(x)\,dx = g_i$$
for $i = 1, \dots, N$, because the observation errors $\eta_k$, $k = 1, 2, \dots$, have zero mean values; moreover,
$$E(g^n_i(\omega, \eta) - g_i)^2 = \frac{1}{n^2}\sum_{k=1}^{n} E_\omega (f(x_k) e_i(x_k) - g_i)^2 + \frac{1}{n^2}\sum_{k=1}^{n} E_\omega E_\eta \eta_k^2 e_i^2(x_k) = \frac{1}{n} \int_a^b (f(x) e_i(x) - g_i)^2 \varrho(x)\,dx + \frac{1}{n}\sigma_\eta^2 \int_a^b e_i^2(x) \varrho(x)\,dx.$$
This implies that the elements of $g^n(\omega, \eta)$ converge in $L^2$ to $g_i$ as $n \to \infty$, provided
$$\int_a^b f^2(x) \varrho(x)\,dx < \infty.$$
In that case we can determine the limits in probability of the estimators $\widehat c_1, \dots, \widehat c_N$ by applying the following lemma.
Lemma 2.3. Let $(\Omega, \mathcal{F}, P)$ be a probability space. Let $A_n(\omega)$, $n = 1, 2, \dots$, be a sequence of random matrices of fixed dimension $k$, nonsingular with probability one, and let $y_n(\omega)$ be a sequence of random vectors of dimension $k$. If

1) $\lim_{n\to\infty} A_n(\omega) = A$ in probability, where $A$ is a nonsingular matrix,

2) $\lim_{n\to\infty} y_n(\omega) = y$ in probability,

then the sequence of random vectors $x_n(\omega)$ defined with probability one by the equations
$$A_n(\omega)\, x_n(\omega) = y_n(\omega), \qquad n = 1, 2, \dots,$$
converges in probability to the vector $x$ which is the unique solution of the equation $Ax = y$.
Proof. Apply the fact that the elements of the inverse matrix $A^{-1}$ are continuous functions of the elements of the matrix $A$.
In order to use Lemma 2.3 in the case of the normal equations (1) it is enough to show that the matrix $G$ with elements $g_{ij}$ defined in (3) is positive-definite. Clearly, for any $v = (v_1, \dots, v_N)^T \in \mathbb{R}^N$,
$$\langle Gv, v\rangle = \sum_{i=1}^{N}\sum_{j=1}^{N} g_{ij} v_i v_j = \sum_{i=1}^{N}\sum_{j=1}^{N} v_i v_j \int_a^b e_i(x) e_j(x) \varrho(x)\,dx = \int_a^b \Bigl(\sum_{i=1}^{N} v_i e_i(x)\Bigr)^2 \varrho(x)\,dx \ge 0.$$
Suppose that $\langle Gv, v\rangle = 0$. Since $\varrho$ is positive on some set of positive Lebesgue measure, $\sum_{i=1}^{N} v_i e_i(x) = 0$ for $x \in \Delta$ with $\mu(\Delta) > 0$, and then $v_1 = \ldots = v_N = 0$, as already remarked in the proof of Lemma 2.2.
We can now formulate the result concerning the convergence in probability of the estimators $\widehat c_1, \dots, \widehat c_N$ for fixed $N$.
Theorem 2.1. If the density $\varrho \in L^1[a,b]$ satisfies $\int_a^b f^2(x) \varrho(x)\,dx < \infty$, then the estimators $\widehat c_1, \dots, \widehat c_N$, $N$ being fixed, are for $n \ge N$ uniquely determined with probability one and
$$(5)\qquad \lim_{n\to\infty} \widehat c^{\,N} = G^{-1} g \quad \text{(in probability)},$$
where $\widehat c^{\,N} = (\widehat c_1, \dots, \widehat c_N)^T$, $G$ is the matrix with elements
$$g_{ij} = \int_a^b e_i(x) e_j(x) \varrho(x)\,dx,$$
and $g \in \mathbb{R}^N$ is the vector with components
$$g_i = \int_a^b f(x) e_i(x) \varrho(x)\,dx, \qquad i, j = 1, \dots, N.$$
Proof. The assertion follows from earlier considerations and from Lemmas 2.2 and 2.3.
The vector $G^{-1}g$ can be characterized more precisely. Namely, consider the functional defined for $z \in \mathbb{R}^N$ by the formula
$$J(z) = \int_a^b \Bigl(f(x) - \sum_{i=1}^{N} z_i e_i(x)\Bigr)^2 \varrho(x)\,dx, \qquad z = (z_1, \dots, z_N)^T.$$
In order to find the points of extrema of $J(z)$ we set its partial derivatives with respect to $z_i$, $i = 1, \dots, N$, equal to zero, and we obtain the system of linear equations $Gz = g$, with $G$ positive-definite. So the components of $\widehat c^{\,N}$ converge in probability to the components of the vector $G^{-1}g$, which minimizes the value of $J(z)$.
In the case of constant density ($\varrho = 1/(b-a)$) we obtain, by (5),
$$\lim_{n\to\infty} \widehat c^{\,N} = c^N \quad \text{(in probability)}, \qquad c^N = (c_1, \dots, c_N)^T,$$
and so $\widehat c_1, \dots, \widehat c_N$ are then consistent estimators of the Fourier coefficients of $f \in L^2[a,b]$.
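When $\varrho$ is not constant, Theorem 2.1 says the estimators converge to $G^{-1}g$, the minimizer of $J(z)$, which in general differs from the vector of Fourier coefficients. A Monte Carlo sketch of this (our own construction: density $\varrho(x) = x/(2\pi^2)$ on $[0, 2\pi]$ sampled by inverse CDF, the first three trigonometric functions, and $f(t) = \sin 2t$, which lies outside their span):

```python
import numpy as np

def trig_system(x, N):
    """Rows are e^N(x_i)^T for 1/sqrt(2pi), cos(x)/sqrt(pi), sin(x)/sqrt(pi)."""
    cols = [np.full_like(x, 1.0 / np.sqrt(2.0 * np.pi)),
            np.cos(x) / np.sqrt(np.pi),
            np.sin(x) / np.sqrt(np.pi)]
    return np.column_stack(cols[:N])

N = 3
f = lambda t: np.sin(2.0 * t)
rho = lambda t: t / (2.0 * np.pi ** 2)   # linear density on [0, 2pi]

# Population quantities of Theorem 2.1 by Riemann sums:
# g_ij = int e_i e_j rho dx,  g_i = int f e_i rho dx.
t = np.linspace(0.0, 2.0 * np.pi, 200001)
dt = t[1] - t[0]
E = trig_system(t, N)
G = (E.T * rho(t)) @ E * dt
g = E.T @ (f(t) * rho(t)) * dt
limit = np.linalg.solve(G, g)            # the limit in probability (5)

# Sampled version: X = 2*pi*sqrt(U) has density rho (inverse CDF method).
rng = np.random.default_rng(2)
n = 200000
x = 2.0 * np.pi * np.sqrt(rng.random(n))
y = f(x) + 0.1 * rng.standard_normal(n)
Ex = trig_system(x, N)
c_hat = np.linalg.solve(Ex.T @ Ex / n, Ex.T @ y / n)
# Under the uniform density the first three Fourier coefficients of sin 2t
# all vanish, but under this rho the limit G^{-1}g is a nonzero vector.
```

`c_hat` agrees with `limit` to within sampling error, and `limit` is far from the zero vector of Fourier coefficients: with a non-uniform density the least squares estimators converge to the minimizer of $J(z)$, not to the Fourier coefficients.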
3. Mean square prediction error and choice of the order of regression. Now we deal with the asymptotic properties of the projection type estimator of the regression function $f$:
$$\widehat f_N(x) = \sum_{k=1}^{N} \widehat c_k e_k(x),$$
where the vector of Fourier coefficient estimators $\widehat c^{\,N} = (\widehat c_1, \dots, \widehat c_N)^T$ is obtained from the normal equations (1),
$$\widehat c^{\,N}(\omega, \eta) = G_n^{-1}(\omega)\, g^n(\omega, \eta) = G_n^{-1}(\omega)\, \frac{1}{n}\sum_{i=1}^{n} (f(x_i) + \eta_i)\, e^N(x_i).$$
From the above equality and the decomposition
$$f(x) = \sum_{k=1}^{N} c_k e_k(x) + r_N(x) = \langle e^N(x), c^N\rangle + r_N(x), \qquad \text{where } r_N = \sum_{k=N+1}^{\infty} c_k e_k,$$
we obtain
$$\widehat c^{\,N}(\omega, \eta) = c^N + G_n^{-1}(\omega)\, \frac{1}{n}\sum_{i=1}^{n} r_N(x_i)\, e^N(x_i) + G_n^{-1}(\omega)\, \frac{1}{n}\sum_{i=1}^{n} \eta_i\, e^N(x_i).$$
Set $a^N = (1/n)\sum_{i=1}^{n} r_N(x_i)\, e^N(x_i)$. In view of the equalities
$$G_n = \frac{1}{n}\sum_{i=1}^{n} e^N(x_i)\, e^N(x_i)^T, \qquad E_\eta(\eta_i \eta_j) = \sigma_\eta^2 \delta_{ij}, \quad i, j = 1, \dots, n,$$
$$f(x) - \widehat f_N(x) = \langle c^N - \widehat c^{\,N}, e^N(x)\rangle + r_N(x),$$
it is easy to show that
$$E_\eta(f(x) - \widehat f_N(x))^2 = E_\eta r_N^2(x) + 2 r_N(x)\, E_\eta \langle c^N - \widehat c^{\,N}, e^N(x)\rangle + E_\eta \langle c^N - \widehat c^{\,N}, e^N(x)\rangle^2 = r_N^2(x) - 2 r_N(x) \langle G_n^{-1} a^N, e^N(x)\rangle + \langle G_n^{-1} a^N, e^N(x)\rangle^2 + \frac{1}{n}\sigma_\eta^2 \langle e^N(x), G_n^{-1} e^N(x)\rangle,$$
and further,
$$\frac{1}{n}\sum_{i=1}^{n} E_\eta(f(x_i) - \widehat f_N(x_i))^2 = \frac{1}{n}\sum_{i=1}^{n} r_N^2(x_i) - 2\langle G_n^{-1} a^N, a^N\rangle + \langle G_n^{-1} a^N, a^N\rangle + \sigma_\eta^2 \frac{N}{n}.$$
Finally, we obtain the formula
$$(6)\qquad \frac{1}{n}\sum_{i=1}^{n} E_\eta(f(x_i) - \widehat f_N(x_i))^2 = \frac{1}{n}\sum_{i=1}^{n} r_N^2(x_i) - \langle G_n^{-1} a^N, a^N\rangle + \sigma_\eta^2 \frac{N}{n}.$$
Since $G_n$ is a.s. positive-definite for $n \ge N$,
$$(7)\qquad 0 \le \frac{1}{n}\sum_{i=1}^{n} E_\eta(f(x_i) - \widehat f_N(x_i))^2 \le \frac{1}{n}\sum_{i=1}^{n} r_N^2(x_i) + \sigma_\eta^2 \frac{N}{n}.$$
In the case of constant density $\varrho = 1/(b-a)$, this inequality yields
$$E\, \frac{1}{n}\sum_{i=1}^{n} (f(x_i) - \widehat f_N(x_i))^2 \le \frac{1}{n}\sum_{i=1}^{n} E_\omega r_N^2(x_i) + \sigma_\eta^2 \frac{N}{n} = \frac{1}{b-a} \int_a^b r_N^2(x)\,dx + \sigma_\eta^2 \frac{N}{n},$$
and since
$$\frac{1}{b-a} \int_a^b r_N^2(x)\,dx = \frac{1}{b-a} \sum_{k=N+1}^{\infty} c_k^2,$$
we can rewrite the last inequality in the form
$$D_N = E\, \frac{1}{n}\sum_{i=1}^{n} (f(x_i) - \widehat f_N(x_i))^2 \le \frac{p_N}{b-a} + \sigma_\eta^2 \frac{N}{n},$$
where $p_N = \sum_{k=N+1}^{\infty} c_k^2$. Since the series $\sum_{k=1}^{\infty} c_k^2$ is convergent ($f \in L^2[a,b]$), we conclude from the above inequality that in the case $\varrho = 1/(b-a)$ we have $\lim_{n\to\infty} D_{N(n)} = 0$ provided $\lim_{n\to\infty} N(n) = \infty$ and $\lim_{n\to\infty} N(n)/n = 0$. The estimator $\widehat f_{N(n)}$ is then consistent in the sense of the mean square prediction error $D_{N(n)}$. A similar result holds for the case of bounded density $\varrho$, as one can see from inequality (7).
If we define the prediction error by
$$d_{N(n)} = \frac{1}{n}\sum_{i=1}^{n} (f(x_i) - \widehat f_{N(n)}(x_i))^2,$$
then the condition $\lim_{n\to\infty} D_{N(n)} = \lim_{n\to\infty} E d_{N(n)} = 0$ of course implies $\lim_{n\to\infty} d_{N(n)} = 0$ in probability. Consequently, the previously proved facts concerning the convergence of the mean square prediction error $D_{N(n)}$ allow us to formulate the following theorem.
Theorem 3.1. If the density $\varrho \in L^1[a,b]$ is bounded and the sequence of natural numbers $N(n)$, $n = 1, 2, \dots$, satisfies
$$\lim_{n\to\infty} N(n) = \infty, \qquad \lim_{n\to\infty} \frac{N(n)}{n} = 0,$$
then the estimator of the regression function
$$\widehat f_{N(n)} = \sum_{k=1}^{N(n)} \widehat c_k e_k$$
is consistent in the sense of the prediction error $d_{N(n)}$ (i.e. $\lim_{n\to\infty} d_{N(n)} = 0$ in probability in $(\Omega, \mathcal{F}, P)$).
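A small simulation (ours, not the paper's) illustrating Theorem 3.1: with the trigonometric system, a uniform density on $[0, 2\pi]$, and the hypothetical choice $N(n) = \lceil n^{1/3} \rceil$, which satisfies both conditions, the prediction error $d_{N(n)}$ decreases as $n$ grows.

```python
import numpy as np

def trig_system(x, N):
    """Rows are e^N(x_i)^T for the trigonometric orthonormal system on [0, 2pi]."""
    cols = [np.full_like(x, 1.0 / np.sqrt(2.0 * np.pi))]
    k = 1
    while len(cols) < N:
        cols.append(np.cos(k * x) / np.sqrt(np.pi))
        if len(cols) < N:
            cols.append(np.sin(k * x) / np.sqrt(np.pi))
        k += 1
    return np.column_stack(cols)

def prediction_error(n, f, rng, sigma=0.2):
    """d_N(n) = (1/n) sum_i (f(x_i) - f_hat_N(n)(x_i))^2 with N(n) = ceil(n^(1/3))."""
    N = int(np.ceil(n ** (1.0 / 3.0)))   # N(n) -> infinity and N(n)/n -> 0
    x = rng.uniform(0.0, 2.0 * np.pi, n)
    y = f(x) + sigma * rng.standard_normal(n)
    E = trig_system(x, N)
    c_hat = np.linalg.solve(E.T @ E / n, E.T @ y / n)
    return np.mean((f(x) - E @ c_hat) ** 2)

rng = np.random.default_rng(3)
f = lambda t: np.sin(t) ** 3             # smooth regression function in L^2[0, 2pi]
err_100 = prediction_error(100, f, rng)
err_10000 = prediction_error(10000, f, rng)
```

`err_10000` falls far below `err_100`: by (7) the error is of order $p_{N(n)}/(b-a) + \sigma_\eta^2 N(n)/n$, and both terms shrink under the assumed growth of $N(n)$.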
Proof. The assertion follows from Lemma 2.2 and from the earlier considerations of Section 3.
Now we consider the problem of choosing the regression order $N$. If we know the values of $p_N$, $N = 1, 2, \dots$, and of $\sigma_\eta^2$, we can choose $N$ according to the criterion
$$(8)\qquad N^* = \arg\min_{1 \le N \le n} \Bigl(\frac{p_N}{b-a} + \sigma_\eta^2 \frac{N}{n}\Bigr).$$
Then
$$D_{N^*} \le \frac{p_{N^*}}{b-a} + \sigma_\eta^2 \frac{N^*}{n} = \min_{1 \le N \le n} \Bigl(\frac{p_N}{b-a} + \sigma_\eta^2 \frac{N}{n}\Bigr).$$
If we only know some estimates $p'_N \ge p_N$, we can replace $p_N$ by $p'_N$ in (8).
If the sequence $|c_k|$, $k = 1, 2, \dots$, is decreasing, then $p_N$ is a convex function (of $N$) and so is $A_N = p_N/(b-a) + \sigma_\eta^2 N/n$, which cannot then have local minima; we thus have $N^* = \max\{N : c_N^2 \ge (b-a)\sigma_\eta^2/n\}$ [4].
The values of $p_N$, $N = 1, 2, \dots$, can of course be unknown, but we can define the statistic
$$s_N = \frac{1}{n}\sum_{i=1}^{n} (y_i - \widehat f_N(x_i))^2,$$
for which
$$E_\eta s_N = \frac{1}{n}\sum_{i=1}^{n} E_\eta(f(x_i) - \widehat f_N(x_i) + \eta_i)^2 = \frac{1}{n}\sum_{i=1}^{n} E_\eta(f(x_i) - \widehat f_N(x_i))^2 - \frac{2}{n}\sum_{i=1}^{n} E_\eta \widehat f_N(x_i)\eta_i + \sigma_\eta^2$$
$$= \frac{1}{n}\sum_{i=1}^{n} E_\eta(f(x_i) - \widehat f_N(x_i))^2 - \frac{2}{n}\sum_{i=1}^{n} E_\eta \langle \widehat c^{\,N}, e^N(x_i)\rangle \eta_i + \sigma_\eta^2$$
$$= \frac{1}{n}\sum_{i=1}^{n} E_\eta(f(x_i) - \widehat f_N(x_i))^2 - \frac{2}{n}\sum_{i=1}^{n} E_\eta \Bigl\langle G_n^{-1}\, \frac{1}{n}\sum_{j=1}^{n} y_j\, e^N(x_j),\; e^N(x_i)\Bigr\rangle \eta_i + \sigma_\eta^2$$
$$= \frac{1}{n}\sum_{i=1}^{n} E_\eta(f(x_i) - \widehat f_N(x_i))^2 - \frac{2}{n}\sum_{i=1}^{n} E_\eta \Bigl\langle G_n^{-1}\, \frac{1}{n}\sum_{j=1}^{n} \eta_j\, e^N(x_j),\; e^N(x_i)\Bigr\rangle \eta_i + \sigma_\eta^2$$
$$= \frac{1}{n}\sum_{i=1}^{n} E_\eta(f(x_i) - \widehat f_N(x_i))^2 - \frac{2}{n^2}\sigma_\eta^2 \sum_{i=1}^{n} \langle G_n^{-1} e^N(x_i), e^N(x_i)\rangle + \sigma_\eta^2$$
$$= \frac{1}{n}\sum_{i=1}^{n} E_\eta(f(x_i) - \widehat f_N(x_i))^2 - 2\sigma_\eta^2 \frac{N}{n} + \sigma_\eta^2.$$
Hence, remembering the definition of $D_N$, we obtain
$$(9)\qquad E s_N = E_\omega E_\eta s_N = D_N - 2\sigma_\eta^2 \frac{N}{n} + \sigma_\eta^2,$$
which can be rewritten in the form
$$E\Bigl(s_N + 2\sigma_\eta^2 \frac{N}{n}\Bigr) = D_N + \sigma_\eta^2.$$
So if we choose $N$ (the order of regression) according to the criterion
$$N^* = \arg\min_{1 \le N \le n} \Bigl(s_N + 2\sigma_\eta^2 \frac{N}{n}\Bigr),$$
we can assert that in the mean we obtain those values of $N$ which minimize $D_N$ [4]. This kind of criterion for the choice of $N$ is known in the literature as the Mallows–Akaike criterion [1], [3].
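A sketch of this criterion in use (our own example, with $\sigma_\eta^2$ known; the candidate grid and helper names are assumptions, not the paper's). The true $f$ below needs exactly five functions of the trigonometric system, so the penalized residual $s_N + 2\sigma_\eta^2 N/n$ stops decreasing near $N = 5$, and by (9) its value near the minimum is close to $\sigma_\eta^2$:

```python
import numpy as np

def trig_system(x, N):
    """Rows are e^N(x_i)^T for the trigonometric orthonormal system on [0, 2pi]."""
    cols = [np.full_like(x, 1.0 / np.sqrt(2.0 * np.pi))]
    k = 1
    while len(cols) < N:
        cols.append(np.cos(k * x) / np.sqrt(np.pi))
        if len(cols) < N:
            cols.append(np.sin(k * x) / np.sqrt(np.pi))
        k += 1
    return np.column_stack(cols)

rng = np.random.default_rng(4)
n, sigma = 2000, 0.3
x = rng.uniform(0.0, 2.0 * np.pi, n)
f = lambda t: np.sin(t) + 0.3 * np.sin(2.0 * t)   # lies in span{e_1, ..., e_5}
y = f(x) + sigma * rng.standard_normal(n)

def s_N(N):
    """Residual mean square s_N for the order-N least squares fit."""
    E = trig_system(x, N)
    c_hat = np.linalg.solve(E.T @ E / n, E.T @ y / n)
    return np.mean((y - E @ c_hat) ** 2)

# Mallows-Akaike criterion: N* = argmin_N  s_N + 2*sigma^2*N/n.
scores = {N: s_N(N) + 2.0 * sigma ** 2 * N / n for N in range(1, 16)}
N_star = min(scores, key=scores.get)
```

Orders below 5 pay a large bias through $s_N$, while orders above 5 pay only the penalty $2\sigma_\eta^2 N/n$, so the selected `N_star` sits at or slightly above 5.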
4. Conclusions. It is worth remarking that we can obtain a better lower bound for the mean square prediction error than the obvious one $D_N \ge 0$. We apply the following lemma proved in [5].
Lemma 4.1. Let $h = (h_1, \dots, h_n)^T \in \mathbb{R}^n$. Then
$$\frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} h_i h_j\, e^N(x_i)^T G_n^{-1} e^N(x_j) \le \frac{1}{n}\sum_{i=1}^{n} h_i^2.$$
Since $a^N = (1/n)\sum_{i=1}^{n} r_N(x_i)\, e^N(x_i)$ and $G_n > 0$ a.s. for $n \ge N$, putting $h_i = r_N(x_i)$, $i = 1, \dots, n$, by Lemma 4.1 we obtain
$$0 \le \langle G_n^{-1} a^N, a^N\rangle \le \frac{1}{n}\sum_{i=1}^{n} r_N(x_i)^2$$
almost surely for $n \ge N$. Now, taking (6) into account, we easily obtain the lower and upper bounds for $D_N$, valid for $n \ge N$:
$$(10)\qquad \sigma_\eta^2 \frac{N}{n} \le D_N \le M_\varrho\, p_N + \sigma_\eta^2 \frac{N}{n}, \qquad \text{where } M_\varrho = \sup_{a \le x \le b} \varrho(x).$$
From (9) and (10) it follows immediately that in the case when $\varrho$ is bounded and the conditions $\lim_{n\to\infty} N(n) = \infty$ and $\lim_{n\to\infty} N(n)/n = 0$ are satisfied, $s_{N(n)}$ is an asymptotically unbiased estimator of $\sigma_\eta^2$.
The lower and upper bounds for $D_{N(n)}$ also allow us to estimate the bias of $s_{N(n)}$ for $n \ge N(n)$, namely
$$-\sigma_\eta^2 \frac{N(n)}{n} \le E s_{N(n)} - \sigma_\eta^2 \le M_\varrho\, p_{N(n)} - \sigma_\eta^2 \frac{N(n)}{n}.$$
The results presented in the two preceding sections can easily be proved in the case of regression functions $f \in L^2(A)$, $A \subset \mathbb{R}^m$, $m > 1$, $\mu(A) < \infty$, and certain complete orthonormal systems of functions (like the functions $\exp(ikx + ily)/2\pi$, $0 \le x, y \le 2\pi$, $k, l = 0, \pm 1, \pm 2, \dots$, forming a complete orthonormal system in $L^2([0, 2\pi] \times [0, 2\pi])$).
References

[1] H. Akaike, A new look at the statistical model identification, IEEE Trans. Automat. Control AC-19 (1974), 716–723.
[2] Y. S. Chow and H. Teicher, Probability Theory: Independence, Interchangeability, Martingales, Springer, Heidelberg, 1978.
[3] C. L. Mallows, Some comments on $C_p$, Technometrics 15 (1973), 661–675.
[4] B. T. Polyak and A. B. Tsybakov, Asymptotic optimality of the $C_p$ criterion in projection type estimation of a regression function, Teor. Veroyatnost. i Primenen. 35 (1990), 305–317 (in Russian).
[5] E. Rafajłowicz, Nonparametric least-squares estimation of a regression function, Statistics 19 (1988), 349–358.
[6] G. Sansone, Orthogonal Functions, Interscience, New York, 1959.
WALDEMAR POPIŃSKI
RESEARCH AND DEVELOPMENT CENTER OF STATISTICS
AL. NIEPODLEGŁOŚCI 208
00-925 WARSZAWA, POLAND