W. POPIŃSKI (Warszawa)

ON LEAST SQUARES ESTIMATION OF FOURIER COEFFICIENTS AND OF THE REGRESSION FUNCTION
Abstract. The problem of nonparametric function fitting with the observation model $y_i = f(x_i) + \eta_i$, $i = 1, \dots, n$, is considered, where the $\eta_i$ are independent random variables with zero mean value and finite variance, and the $x_i \in [a,b] \subset \mathbb{R}^1$, $i = 1, \dots, n$, form a random sample from a distribution with density $\varrho \in L^1[a,b]$ and are independent of the errors $\eta_i$, $i = 1, \dots, n$. The asymptotic properties of the estimator $\widehat f_{N(n)}(x) = \sum_{k=1}^{N(n)} \widehat c_k e_k(x)$ for $f \in L^2[a,b]$ and of $\widehat c^{\,N(n)} = (\widehat c_1, \dots, \widehat c_{N(n)})^T$ obtained by the least squares method, as well as the limits in probability of the estimators $\widehat c_k$, $k = 1, \dots, N$, for fixed $N$, are studied in the case when the functions $e_k$, $k = 1, 2, \dots$, forming a complete orthonormal system in $L^2[a,b]$, are analytic.
1. Introduction. Let $y_i$, $i = 1, \dots, n$, be observations at points $x_i \in [a,b] \subset \mathbb{R}^1$, according to the model $y_i = f(x_i) + \eta_i$, where $f : [a,b] \to \mathbb{R}^1$ is an unknown square integrable function ($f \in L^2[a,b]$) and $\eta_i$, $i = 1, \dots, n$, are independent identically distributed random variables with zero mean value and finite variance $\sigma_\eta^2 > 0$. Let furthermore the points $x_i$, $i = 1, \dots, n$, form a random sample from a distribution with density $\varrho$ ($\varrho \ge 0$, $\int_a^b \varrho(x)\,dx = 1$), independent of the observation errors $\eta_i$, $i = 1, \dots, n$. If the functions $e_k$, $k = 1, 2, \dots$, constitute a complete orthonormal system in $L^2[a,b]$, then $f$ has the representation
$$f = \sum_{k=1}^{\infty} c_k e_k, \qquad \text{where } c_k = \int_a^b f(x) e_k(x)\,dx, \quad k = 1, 2, \dots$$
We assume that $e_k$, $k = 1, 2, \dots$, are analytic in $(a,b)$ and continuous in $[a,b]$.
Examples of orthonormal systems satisfying these requirements are [6] the trigonometric functions in $L^2[0, 2\pi]$ and the Legendre polynomials in $L^2[-1, 1]$.

1991 Mathematics Subject Classification: Primary 62G07, 62F12.
Key words and phrases: Fourier series, least squares method, regression, consistent estimator.
As an estimator of the vector of coefficients $c^N = (c_1, \dots, c_N)^T$, for fixed $N$, we take the vector $\widehat c^{\,N}$ obtained by the least squares method:
$$\widehat c^{\,N} = \arg\min_{a^N \in \mathbb{R}^N} \sum_{i=1}^{n} \bigl(y_i - \langle a^N, e^N(x_i)\rangle\bigr)^2,$$
where $\widehat c^{\,N} = (\widehat c_1, \dots, \widehat c_N)^T$ and $e^N(x) = (e_1(x), \dots, e_N(x))^T$.
To such estimators of the Fourier coefficients $c_k$, $k = 1, \dots, N$, there corresponds an estimator of the regression function $f$ of the form
$$\widehat f_N(x) = \sum_{k=1}^{N} \widehat c_k e_k(x),$$
called a projection type estimator [4].
The vector $\widehat c^{\,N}$ can be obtained as a solution of the normal equations
$$(1)\qquad G_n \widehat c^{\,N} = g^n,$$
where
$$G_n = \frac{1}{n}\sum_{i=1}^{n} e^N(x_i)\, e^N(x_i)^T, \qquad g^n = \frac{1}{n}\sum_{i=1}^{n} y_i\, e^N(x_i).$$
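As a concrete numerical illustration (not part of the paper), the normal equations (1) can be solved directly once an orthonormal system is fixed. The sketch below assumes the trigonometric system on $[0, 2\pi]$ and a uniform sampling density; the helper names `trig_system` and `lsq_fourier` are ours.

```python
import numpy as np

def trig_system(x, N):
    """Rows are e^N(x_i)^T for the trigonometric orthonormal system in
    L^2[0, 2pi]: 1/sqrt(2pi), cos(x)/sqrt(pi), sin(x)/sqrt(pi), cos(2x)/sqrt(pi), ..."""
    cols = [np.full_like(x, 1.0 / np.sqrt(2.0 * np.pi))]
    k = 1
    while len(cols) < N:
        cols.append(np.cos(k * x) / np.sqrt(np.pi))
        if len(cols) < N:
            cols.append(np.sin(k * x) / np.sqrt(np.pi))
        k += 1
    return np.column_stack(cols)

def lsq_fourier(x, y, N):
    """Least squares estimate of the coefficient vector: solve G_n c = g^n as in (1)."""
    E = trig_system(x, N)
    G_n = E.T @ E / len(x)   # G_n = (1/n) sum_i e^N(x_i) e^N(x_i)^T
    g_n = E.T @ y / len(x)   # g^n = (1/n) sum_i y_i e^N(x_i)
    return np.linalg.solve(G_n, g_n)

rng = np.random.default_rng(0)
n, N = 20000, 5
x = rng.uniform(0.0, 2.0 * np.pi, n)             # density rho = 1/(2*pi)
f = lambda t: np.sin(t) + 0.5 * np.cos(2.0 * t)  # true regression function
y = f(x) + 0.1 * rng.standard_normal(n)          # noisy observations
c_hat = lsq_fourier(x, y, N)
# For this f the Fourier coefficients are c_3 = sqrt(pi) and c_4 = sqrt(pi)/2.
```

Under the uniform density the limits of $\widehat c_k$ are exactly the Fourier coefficients, so the third and fourth components of `c_hat` approach $\sqrt{\pi}$ and $\sqrt{\pi}/2$.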
The asymptotic properties of the least squares estimators of the regression function obtained in the same way as described above, but for the fixed point design case, were examined in [5]. The problem of choosing the regression order for least squares estimators in the case of equidistant observation points was investigated in [4].
In order to investigate the asymptotic properties of the estimators $\widehat c_k$, $k = 1, \dots, N$, we introduce the probability space $(\Omega, \mathcal{F}, P)$, where
$$\Omega = \prod_{i=1}^{\infty} [a,b], \qquad \mathcal{F} = \prod_{i=1}^{\infty} \mathcal{F}_i, \qquad P = \prod_{i=1}^{\infty} P_i,$$
where each $\mathcal{F}_i$, $i = 1, 2, \dots$, is the $\sigma$-field of Borel subsets of $[a,b]$, and $P$ is a probability measure with the property
$$P\Bigl(A_1 \times \ldots \times A_n \times \prod_{i=n+1}^{\infty} [a,b]\Bigr) = (P_1 \times \ldots \times P_n)(A_1 \times \ldots \times A_n)$$
for $A_i \in \mathcal{F}_i$, $i = 1, \dots, n$, with $P_i$, $i = 1, 2, \dots$, being the probability measure defined on $\mathcal{F}_i$ and having density $\varrho$ with respect to the Lebesgue measure $\mu$. The construction and properties of such a probability measure $P$ are described in [2]. The elements of $\Omega$ are denoted by $\omega = (x_1, x_2, \dots)$, $x_i \in [a,b]$, $i = 1, 2, \dots$
If the distribution of the observation errors $\eta_i$, $i = 1, 2, \dots$ (defined on a certain probability space $(\Psi, \Theta, \nu)$), is known, a similar probability space can be constructed, with elements of the form $\eta = (\eta_1, \eta_2, \dots)$. From the two probability spaces described above we can of course construct in the usual way the corresponding product space with elements $(\omega, \eta)$ [2].
In the following section we examine the uniqueness of the estimators $\widehat c_k(\omega, \eta)$, $k = 1, \dots, N$, for fixed $N$, and determine their limits in probability, depending on the density $\varrho$. In the third section we prove that the estimator $\widehat f_{N(n)}$ of the regression function corresponding to the Fourier coefficient estimators $\widehat c_k$, $k = 1, \dots, N(n)$, is consistent in the sense of the mean square prediction error
$$D_{N(n)} = \frac{1}{n} E_\omega E_\eta \sum_{i=1}^{n} \bigl(f(x_i) - \widehat f_{N(n)}(x_i)\bigr)^2$$
(i.e. $\lim_{n\to\infty} D_{N(n)} = 0$), on the condition that the density $\varrho$ is bounded and the sequence $N(n)$ is properly chosen.
2. Uniqueness and consistency of Fourier coefficient estimators. First we check whether the Fourier coefficient estimators $\widehat c_k$, $k = 1, \dots, N$, are uniquely determined. In order to do this we need the following two lemmas.
Lemma 2.1. Let $v_1, \dots, v_n \in \mathbb{R}^n$. The matrix $G_n = \sum_{i=1}^{n} v_i v_i^T$ is singular ($\det G_n = 0$) if and only if $v_1, \dots, v_n$ are linearly dependent.
Proof. Suppose that $G_n$ is singular and $v_1, \dots, v_n$ are linearly independent. Then there exists a vector $x \neq 0$ for which $G_n x = 0$, so that
$$\sum_{i=1}^{n} v_i (v_i^T x) = \sum_{i=1}^{n} \langle v_i, x\rangle v_i = 0.$$
Since $v_1, \dots, v_n$ are linearly independent, $\langle v_i, x\rangle = 0$ for $i = 1, \dots, n$. But $\operatorname{span}\{v_1, \dots, v_n\} = \mathbb{R}^n$ and consequently $x$ must be zero, contrary to our assumption.
Conversely, if $v_1, \dots, v_n$ are linearly dependent, then $\dim \operatorname{span}\{v_1, \dots, v_n\} < n$ and we can choose $x \neq 0$ such that $\langle v_i, x\rangle = 0$ for $i = 1, \dots, n$. Consequently, $G_n x = \sum_{i=1}^{n} \langle v_i, x\rangle v_i = 0$, which means that $G_n$ is singular.
Incidentally, observe that a matrix of the form $G_m = \sum_{i=1}^{m} v_i v_i^T$, where $m < n$, is always singular, since $\dim \operatorname{span}\{v_1, \dots, v_m\} \le m$ and there exist nonzero vectors orthogonal to $\operatorname{span}\{v_1, \dots, v_m\}$.
Lemma 2.2. If $\varrho \in L^1[a,b]$ is a density (i.e. $\varrho \ge 0$, $\int_a^b \varrho(x)\,dx = 1$), then for $n \ge N$ the matrices
$$G_n(\omega) = \frac{1}{n}\sum_{i=1}^{n} e^N(x_i)\, e^N(x_i)^T, \qquad \omega = (x_1, x_2, \dots),$$
of the normal equations (1) are positive-definite with probability one (in the probability space $(\Omega, \mathcal{F}, P)$).
Proof. From the definition of $G_n$ it follows that
$$G_{n+1}(\omega) = \frac{n}{n+1} G_n(\omega) + \frac{1}{n+1}\, e^N(x_{n+1})\, e^N(x_{n+1})^T.$$
So for $x \in \mathbb{R}^N$ we have the inequality
$$\langle G_{n+1}(\omega)x, x\rangle = \frac{n}{n+1}\langle G_n(\omega)x, x\rangle + \frac{1}{n+1}\langle e^N(x_{n+1})\, e^N(x_{n+1})^T x, x\rangle = \frac{n}{n+1}\langle G_n(\omega)x, x\rangle + \frac{1}{n+1}\langle e^N(x_{n+1}), x\rangle^2 \ge \frac{n}{n+1}\langle G_n(\omega)x, x\rangle.$$
Hence $\Omega_{n+1} = \{\omega : \det G_{n+1}(\omega) = 0\} \subset \{\omega : \det G_n(\omega) = 0\} = \Omega_n$, since the matrices $G_n(\omega)$ are nonnegative-definite for $n = 1, 2, \dots$ Thus in order to prove that $P(\Omega_n) = 0$ for $n \ge N$ it suffices to prove $P(\Omega_N) = 0$. (For $n < N$ we have $P(\Omega_n) = 1$, which is a simple consequence of our remark after the proof of Lemma 2.1.) By Lemma 2.1,
$$\det G_N(\omega) = 0 \;\Leftrightarrow\; e^N(x_1), \dots, e^N(x_N) \text{ are linearly dependent},$$
where $\omega = (x_1, x_2, \dots)$, and consequently,
$$(2)\qquad \Omega_N = \bigcup_{j=1}^{N} \{\omega : e^N(x_j) \in \operatorname{span}\{e^N(x_1), \dots, e^N(x_{j-1}), e^N(x_{j+1}), \dots, e^N(x_N)\}\}.$$
Moreover,
$$P(\{\omega : e^N(x_j) \in \operatorname{span}\{e^N(x_1), \dots, e^N(x_{j-1}), e^N(x_{j+1}), \dots, e^N(x_N)\}\}) = P(\{\omega : e^N(x_N) \in \operatorname{span}\{e^N(x_1), \dots, e^N(x_{N-1})\}\})$$
for $j = 1, \dots, N$, by the properties of the product measure $P_1 \times \ldots \times P_N$. Further,
$$P(\{\omega : e^N(x_N) \in \operatorname{span}\{e^N(x_1), \dots, e^N(x_{N-1})\}\}) = \int_a^b \ldots \int_a^b P_N(A_N)\, dP_1 \ldots dP_{N-1},$$
where $A_N = (e^N)^{-1}(\operatorname{span}\{e^N(x_1), \dots, e^N(x_{N-1})\}) \subset [a,b]$, for fixed $x_1, x_2, \dots, x_{N-1}$, is the counter-image of the closed linear subspace $\operatorname{span}\{e^N(x_1), \dots, e^N(x_{N-1})\}$ under the continuous mapping $[a,b] \ni x_N \mapsto e^N(x_N) \in \mathbb{R}^N$ (the continuity follows from the continuity of $e_k$, $k = 1, 2, \dots$).
Assume now that $P_N(A_N) > 0$ for fixed $x_1, \dots, x_{N-1}$. This means that the Lebesgue measure $\mu(A_N)$ is positive. For $x_N \in A_N$ we have
$$e^N(x_N) \in \operatorname{span}\{e^N(x_1), \dots, e^N(x_{N-1})\},$$
and $\dim \operatorname{span}\{e^N(x_1), \dots, e^N(x_{N-1})\} \le N - 1$. On the other hand, $\operatorname{span}\{e^N(x_N) : x_N \in A_N\} = \mathbb{R}^N$, since for any $v = (v_1, \dots, v_N)^T \in \mathbb{R}^N$ orthogonal to the left-hand side
$$\langle e^N(x), v\rangle = \sum_{k=1}^{N} v_k e_k(x) = 0 \quad \text{for } x \in A_N,$$
and the condition $\mu(A_N) > 0$ and the analyticity of $e_k$, $k = 1, 2, \dots$, imply immediately that $v_1 = \ldots = v_N = 0$.
Thus we obtain a contradiction. Consequently, $P_N(A_N) = 0$ for all $x_1, \dots, x_{N-1}$. This implies that
$$P(\{\omega : e^N(x_N) \in \operatorname{span}\{e^N(x_1), \dots, e^N(x_{N-1})\}\}) = 0$$
and, by (2), $P(\Omega_N) = 0$.
Lemma 2.2 assures that the estimators $\widehat c_1, \dots, \widehat c_N$ obtained from the normal equations (1) are uniquely determined with probability one in the probability space $(\Omega, \mathcal{F}, P)$, provided $n \ge N$.
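A quick numerical check of this fact and of the remark after Lemma 2.1 (our own illustration, using the Legendre system on $[-1, 1]$; the unnormalized Vandermonde-type matrix from `numpy.polynomial.legendre.legvander` suffices, since column scaling does not affect singularity):

```python
import numpy as np

def gram_matrix(x, N):
    """G_n = (1/n) sum_i e^N(x_i) e^N(x_i)^T, with e_k the Legendre
    polynomials P_0, ..., P_{N-1} (normalization constants omitted)."""
    E = np.polynomial.legendre.legvander(x, N - 1)  # rows: (P_0(x_i), ..., P_{N-1}(x_i))
    return E.T @ E / len(x)

rng = np.random.default_rng(1)
N = 4
x_few = rng.uniform(-1.0, 1.0, N - 1)   # n = 3 < N: G_n is always singular
x_many = rng.uniform(-1.0, 1.0, 50)     # n >= N: nonsingular with probability one
det_few = np.linalg.det(gram_matrix(x_few, N))
det_many = np.linalg.det(gram_matrix(x_many, N))
```

Here `det_few` vanishes up to rounding error (the rank of $G_n$ is at most $n$), while `det_many` is bounded away from zero, in line with Lemma 2.2.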
Observe now that the elements of the matrix $G_n(\omega)$ in (1) have the form
$$g^n_{ij}(\omega) = \frac{1}{n}\sum_{k=1}^{n} e_i(x_k) e_j(x_k), \qquad \omega = (x_1, x_2, \dots),\quad i, j = 1, \dots, N,$$
and we easily obtain
$$(3)\qquad E_\omega g^n_{ij}(\omega) = \frac{1}{n}\sum_{k=1}^{n} E_\omega e_i(x_k) e_j(x_k) = \int_a^b e_i(x) e_j(x) \varrho(x)\,dx = g_{ij}.$$
The expected value exists because $e_k$, $k = 1, 2, \dots$, are continuous in $[a,b]$.
Further, since $x_1, x_2, \dots$ are chosen independently,
$$E_\omega (g^n_{ij}(\omega) - g_{ij})^2 = \frac{1}{n^2}\sum_{k=1}^{n} E_\omega (e_i(x_k) e_j(x_k) - g_{ij})^2 = \frac{1}{n} \int_a^b (e_i(x) e_j(x) - g_{ij})^2 \varrho(x)\,dx,$$
and we see that the elements of $G_n(\omega)$ converge in $L^2$ to $g_{ij}$ as $n \to \infty$.
Similarly, for the elements of the right-hand side vector $g^n(\omega, \eta)$ of the normal equations we obtain
$$(4)\qquad E g^n_i(\omega, \eta) = \frac{1}{n}\sum_{k=1}^{n} E y_k e_i(x_k) = \frac{1}{n}\sum_{k=1}^{n} E_\omega E_\eta (f(x_k) + \eta_k) e_i(x_k) = \frac{1}{n}\sum_{k=1}^{n} E_\omega f(x_k) e_i(x_k) = \int_a^b f(x) e_i(x) \varrho(x)\,dx = g_i$$
for $i = 1, \dots, N$, because the observation errors $\eta_k$, $k = 1, 2, \dots$, have zero mean values; moreover,
$$E(g^n_i(\omega, \eta) - g_i)^2 = \frac{1}{n^2}\sum_{k=1}^{n} E_\omega (f(x_k) e_i(x_k) - g_i)^2 + \frac{1}{n^2}\sum_{k=1}^{n} E_\omega E_\eta \eta_k^2 e_i^2(x_k) = \frac{1}{n} \int_a^b (f(x) e_i(x) - g_i)^2 \varrho(x)\,dx + \frac{1}{n}\sigma_\eta^2 \int_a^b e_i^2(x) \varrho(x)\,dx.$$
This implies that the elements of $g^n(\omega, \eta)$ converge in $L^2$ to $g_i$ as $n \to \infty$, provided
$$\int_a^b f^2(x) \varrho(x)\,dx < \infty.$$
In that case we can determine the limits in probability of the estimators $\widehat c_1, \dots, \widehat c_N$ by applying the following lemma.
Lemma 2.3. Let $(\Omega, \mathcal{F}, P)$ be a probability space. Let $A_n(\omega)$, $n = 1, 2, \dots$, be a sequence of random matrices of fixed dimension $k$, nonsingular with probability one, and let $y_n(\omega)$ be a sequence of random vectors of dimension $k$. If

1) $\lim_{n\to\infty} A_n(\omega) = A$ in probability, where $A$ is a nonsingular matrix,

2) $\lim_{n\to\infty} y_n(\omega) = y$ in probability,

then the sequence of random vectors $x_n(\omega)$ defined with probability one by the equations
$$A_n(\omega)\, x_n(\omega) = y_n(\omega), \qquad n = 1, 2, \dots,$$
converges in probability to the vector $x$ which is the unique solution of the equation $Ax = y$.
Proof. Apply the fact that the elements of the inverse matrix $A^{-1}$ are continuous functions of the elements of the matrix $A$.
In order to use Lemma 2.3 in the case of the normal equations (1) it is enough to show that the matrix $G$ with elements $g_{ij}$ defined in (3) is positive-definite. Clearly, for any $v = (v_1, \dots, v_N)^T \in \mathbb{R}^N$,
$$\langle Gv, v\rangle = \sum_{i=1}^{N}\sum_{j=1}^{N} g_{ij} v_i v_j = \sum_{i=1}^{N}\sum_{j=1}^{N} v_i v_j \int_a^b e_i(x) e_j(x) \varrho(x)\,dx = \int_a^b \Bigl(\sum_{i=1}^{N} v_i e_i(x)\Bigr)^2 \varrho(x)\,dx \ge 0.$$
Suppose that $\langle Gv, v\rangle = 0$. Since $\varrho$ is positive on some set of positive Lebesgue measure, $\sum_{i=1}^{N} v_i e_i(x) = 0$ for $x \in \Delta$ with $\mu(\Delta) > 0$, and then $v_1 = \ldots = v_N = 0$, as already remarked in the proof of Lemma 2.2.
We can now formulate the result concerning the convergence in probability of the estimators $\widehat c_1, \dots, \widehat c_N$ for fixed $N$.
Theorem 2.1. If the density $\varrho \in L^1[a,b]$ satisfies $\int_a^b f^2(x) \varrho(x)\,dx < \infty$, then the estimators $\widehat c_1, \dots, \widehat c_N$, $N$ being fixed, are for $n \ge N$ uniquely determined with probability one and
$$(5)\qquad \lim_{n\to\infty} \widehat c^{\,N} = G^{-1} g \quad \text{(in probability)},$$
where $\widehat c^{\,N} = (\widehat c_1, \dots, \widehat c_N)^T$, $G$ is the matrix with elements
$$g_{ij} = \int_a^b e_i(x) e_j(x) \varrho(x)\,dx,$$
and $g \in \mathbb{R}^N$ is the vector with components
$$g_i = \int_a^b f(x) e_i(x) \varrho(x)\,dx, \qquad i, j = 1, \dots, N.$$
Proof. The assertion follows from earlier considerations and from Lemmas 2.2 and 2.3.
The vector $G^{-1}g$ can be characterized more precisely. Namely, consider the functional defined for $z \in \mathbb{R}^N$ by the formula
$$J(z) = \int_a^b \Bigl(f(x) - \sum_{i=1}^{N} z_i e_i(x)\Bigr)^2 \varrho(x)\,dx, \qquad z = (z_1, \dots, z_N)^T.$$
In order to find the points of extrema of $J(z)$ we set its partial derivatives with respect to $z_i$, $i = 1, \dots, N$, equal to zero, and we obtain the system of linear equations $Gz = g$, with $G$ positive-definite. So the components of $\widehat c^{\,N}$ converge in probability to the components of the vector $G^{-1}g$, which minimizes the value of $J(z)$.
In the case of constant density ($\varrho = 1/(b-a)$) we obtain, by (5),
$$\lim_{n\to\infty} \widehat c^{\,N} = c^N \quad \text{(in probability)}, \qquad c^N = (c_1, \dots, c_N)^T,$$
and so $\widehat c_1, \dots, \widehat c_N$ are then consistent estimators of the Fourier coefficients of $f \in L^2[a,b]$.
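When $\varrho$ is not constant, Theorem 2.1 says the estimators converge to $G^{-1}g$, the minimizer of $J(z)$, which in general differs from the vector of Fourier coefficients. A Monte Carlo sketch of this (our own construction: density $\varrho(x) = x/(2\pi^2)$ on $[0, 2\pi]$ sampled by inverse CDF, the first three trigonometric functions, and $f(t) = \sin 2t$, which lies outside their span):

```python
import numpy as np

def trig_system(x, N):
    """Rows are e^N(x_i)^T for 1/sqrt(2pi), cos(x)/sqrt(pi), sin(x)/sqrt(pi)."""
    cols = [np.full_like(x, 1.0 / np.sqrt(2.0 * np.pi)),
            np.cos(x) / np.sqrt(np.pi),
            np.sin(x) / np.sqrt(np.pi)]
    return np.column_stack(cols[:N])

N = 3
f = lambda t: np.sin(2.0 * t)
rho = lambda t: t / (2.0 * np.pi ** 2)   # linear density on [0, 2pi]

# Population quantities of Theorem 2.1 by Riemann sums:
# g_ij = int e_i e_j rho dx,  g_i = int f e_i rho dx.
t = np.linspace(0.0, 2.0 * np.pi, 200001)
dt = t[1] - t[0]
E = trig_system(t, N)
G = (E.T * rho(t)) @ E * dt
g = E.T @ (f(t) * rho(t)) * dt
limit = np.linalg.solve(G, g)            # the limit in probability (5)

# Sampled version: X = 2*pi*sqrt(U) has density rho (inverse CDF method).
rng = np.random.default_rng(2)
n = 200000
x = 2.0 * np.pi * np.sqrt(rng.random(n))
y = f(x) + 0.1 * rng.standard_normal(n)
Ex = trig_system(x, N)
c_hat = np.linalg.solve(Ex.T @ Ex / n, Ex.T @ y / n)
# Under the uniform density the first three Fourier coefficients of sin 2t
# all vanish, but under this rho the limit G^{-1}g is a nonzero vector.
```

`c_hat` agrees with `limit` to within sampling error, and `limit` is far from the zero vector of Fourier coefficients: with a non-uniform density the least squares estimators converge to the minimizer of $J(z)$, not to the Fourier coefficients.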
3. Mean square prediction error and choice of the order of regression. Now we deal with the asymptotic properties of the projection type estimator of the regression function $f$:
$$\widehat f_N(x) = \sum_{k=1}^{N} \widehat c_k e_k(x),$$
where the vector of Fourier coefficient estimators $\widehat c^{\,N} = (\widehat c_1, \dots, \widehat c_N)^T$ is obtained from the normal equations (1),
$$\widehat c^{\,N}(\omega, \eta) = G_n^{-1}(\omega)\, g^n(\omega, \eta) = G_n^{-1}(\omega)\, \frac{1}{n}\sum_{i=1}^{n} (f(x_i) + \eta_i)\, e^N(x_i).$$
From the above equality and the decomposition
$$f(x) = \sum_{k=1}^{N} c_k e_k(x) + r_N(x) = \langle e^N(x), c^N\rangle + r_N(x), \qquad \text{where } r_N = \sum_{k=N+1}^{\infty} c_k e_k,$$
we obtain
$$\widehat c^{\,N}(\omega, \eta) = c^N + G_n^{-1}(\omega)\, \frac{1}{n}\sum_{i=1}^{n} r_N(x_i)\, e^N(x_i) + G_n^{-1}(\omega)\, \frac{1}{n}\sum_{i=1}^{n} \eta_i\, e^N(x_i).$$
Set $a^N = (1/n)\sum_{i=1}^{n} r_N(x_i)\, e^N(x_i)$. In view of the equalities
$$G_n = \frac{1}{n}\sum_{i=1}^{n} e^N(x_i)\, e^N(x_i)^T, \qquad E_\eta(\eta_i \eta_j) = \sigma_\eta^2 \delta_{ij}, \quad i, j = 1, \dots, n,$$
$$f(x) - \widehat f_N(x) = \langle c^N - \widehat c^{\,N}, e^N(x)\rangle + r_N(x),$$
it is easy to show that
$$E_\eta(f(x) - \widehat f_N(x))^2 = E_\eta r_N^2(x) + 2 r_N(x)\, E_\eta \langle c^N - \widehat c^{\,N}, e^N(x)\rangle + E_\eta \langle c^N - \widehat c^{\,N}, e^N(x)\rangle^2 = r_N^2(x) - 2 r_N(x) \langle G_n^{-1} a^N, e^N(x)\rangle + \langle G_n^{-1} a^N, e^N(x)\rangle^2 + \frac{1}{n}\sigma_\eta^2 \langle e^N(x), G_n^{-1} e^N(x)\rangle,$$
and further,
$$\frac{1}{n}\sum_{i=1}^{n} E_\eta(f(x_i) - \widehat f_N(x_i))^2 = \frac{1}{n}\sum_{i=1}^{n} r_N^2(x_i) - 2\langle G_n^{-1} a^N, a^N\rangle + \langle G_n^{-1} a^N, a^N\rangle + \sigma_\eta^2 \frac{N}{n}.$$
Finally, we obtain the formula
$$(6)\qquad \frac{1}{n}\sum_{i=1}^{n} E_\eta(f(x_i) - \widehat f_N(x_i))^2 = \frac{1}{n}\sum_{i=1}^{n} r_N^2(x_i) - \langle G_n^{-1} a^N, a^N\rangle + \sigma_\eta^2 \frac{N}{n}.$$
Since $G_n$ is a.s. positive-definite for $n \ge N$,
$$(7)\qquad 0 \le \frac{1}{n}\sum_{i=1}^{n} E_\eta(f(x_i) - \widehat f_N(x_i))^2 \le \frac{1}{n}\sum_{i=1}^{n} r_N^2(x_i) + \sigma_\eta^2 \frac{N}{n}.$$
In the case of constant density $\varrho = 1/(b-a)$, this inequality yields
$$E\, \frac{1}{n}\sum_{i=1}^{n} (f(x_i) - \widehat f_N(x_i))^2 \le \frac{1}{n}\sum_{i=1}^{n} E_\omega r_N^2(x_i) + \sigma_\eta^2 \frac{N}{n} = \frac{1}{b-a} \int_a^b r_N^2(x)\,dx + \sigma_\eta^2 \frac{N}{n},$$
and since
$$\frac{1}{b-a} \int_a^b r_N^2(x)\,dx = \frac{1}{b-a} \sum_{k=N+1}^{\infty} c_k^2,$$
we can rewrite the last inequality in the form
$$D_N = E\, \frac{1}{n}\sum_{i=1}^{n} (f(x_i) - \widehat f_N(x_i))^2 \le \frac{p_N}{b-a} + \sigma_\eta^2 \frac{N}{n},$$
where $p_N = \sum_{k=N+1}^{\infty} c_k^2$. Since the series $\sum_{k=1}^{\infty} c_k^2$ is convergent ($f \in L^2[a,b]$), we conclude from the above inequality that in the case $\varrho = 1/(b-a)$ we have $\lim_{n\to\infty} D_{N(n)} = 0$ provided $\lim_{n\to\infty} N(n) = \infty$ and $\lim_{n\to\infty} N(n)/n = 0$. The estimator $\widehat f_{N(n)}$ is then consistent in the sense of the mean square prediction error $D_{N(n)}$. A similar result holds for the case of bounded density $\varrho$, as one can see from inequality (7).
If we define the prediction error by
$$d_{N(n)} = \frac{1}{n}\sum_{i=1}^{n} (f(x_i) - \widehat f_{N(n)}(x_i))^2,$$
then the condition $\lim_{n\to\infty} D_{N(n)} = \lim_{n\to\infty} E d_{N(n)} = 0$ of course implies $\lim_{n\to\infty} d_{N(n)} = 0$ in probability. Consequently, the previously proved facts concerning the convergence of the mean square prediction error $D_{N(n)}$ allow us to formulate the following theorem.
Theorem 3.1. If the density $\varrho \in L^1[a,b]$ is bounded and the sequence of natural numbers $N(n)$, $n = 1, 2, \dots$, satisfies
$$\lim_{n\to\infty} N(n) = \infty, \qquad \lim_{n\to\infty} \frac{N(n)}{n} = 0,$$
then the estimator of the regression function
$$\widehat f_{N(n)} = \sum_{k=1}^{N(n)} \widehat c_k e_k$$
is consistent in the sense of the prediction error $d_{N(n)}$ (i.e. $\lim_{n\to\infty} d_{N(n)} = 0$ in probability in $(\Omega, \mathcal{F}, P)$).
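A small simulation (ours, not the paper's) illustrating Theorem 3.1: with the trigonometric system, a uniform density on $[0, 2\pi]$, and the hypothetical choice $N(n) = \lceil n^{1/3} \rceil$, which satisfies both conditions, the prediction error $d_{N(n)}$ decreases as $n$ grows.

```python
import numpy as np

def trig_system(x, N):
    """Rows are e^N(x_i)^T for the trigonometric orthonormal system on [0, 2pi]."""
    cols = [np.full_like(x, 1.0 / np.sqrt(2.0 * np.pi))]
    k = 1
    while len(cols) < N:
        cols.append(np.cos(k * x) / np.sqrt(np.pi))
        if len(cols) < N:
            cols.append(np.sin(k * x) / np.sqrt(np.pi))
        k += 1
    return np.column_stack(cols)

def prediction_error(n, f, rng, sigma=0.2):
    """d_N(n) = (1/n) sum_i (f(x_i) - f_hat_N(n)(x_i))^2 with N(n) = ceil(n^(1/3))."""
    N = int(np.ceil(n ** (1.0 / 3.0)))   # N(n) -> infinity and N(n)/n -> 0
    x = rng.uniform(0.0, 2.0 * np.pi, n)
    y = f(x) + sigma * rng.standard_normal(n)
    E = trig_system(x, N)
    c_hat = np.linalg.solve(E.T @ E / n, E.T @ y / n)
    return np.mean((f(x) - E @ c_hat) ** 2)

rng = np.random.default_rng(3)
f = lambda t: np.sin(t) ** 3             # smooth regression function in L^2[0, 2pi]
err_100 = prediction_error(100, f, rng)
err_10000 = prediction_error(10000, f, rng)
```

`err_10000` falls far below `err_100`: by (7) the error is of order $p_{N(n)}/(b-a) + \sigma_\eta^2 N(n)/n$, and both terms shrink under the assumed growth of $N(n)$.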
Proof. The assertion follows from Lemma 2.2 and from the earlier considerations of Section 3.
Now we consider the problem of choosing the regression order $N$. If we know the values of $p_N$, $N = 1, 2, \dots$, and of $\sigma_\eta^2$, we can choose $N$ according to the criterion
$$(8)\qquad N^* = \arg\min_{1 \le N \le n} \Bigl(\frac{p_N}{b-a} + \sigma_\eta^2 \frac{N}{n}\Bigr).$$
Then
$$D_{N^*} \le \frac{p_{N^*}}{b-a} + \sigma_\eta^2 \frac{N^*}{n} = \min_{1 \le N \le n} \Bigl(\frac{p_N}{b-a} + \sigma_\eta^2 \frac{N}{n}\Bigr).$$
If we only know some estimates $p'_N \ge p_N$, we can replace $p_N$ by $p'_N$ in (8).
If the sequence $|c_k|$, $k = 1, 2, \dots$, is decreasing, then $p_N$ is a convex function (of $N$) and so is $A_N = p_N/(b-a) + \sigma_\eta^2 N/n$, which cannot then have local minima; we thus have $N^* = \max\{N : c_N^2 \ge (b-a)\sigma_\eta^2/n\}$ [4].
The values of $p_N$, $N = 1, 2, \dots$, can of course be unknown, but we can define the statistic
$$s_N = \frac{1}{n}\sum_{i=1}^{n} (y_i - \widehat f_N(x_i))^2,$$
for which
$$E_\eta s_N = \frac{1}{n}\sum_{i=1}^{n} E_\eta(f(x_i) - \widehat f_N(x_i) + \eta_i)^2 = \frac{1}{n}\sum_{i=1}^{n} E_\eta(f(x_i) - \widehat f_N(x_i))^2 - \frac{2}{n}\sum_{i=1}^{n} E_\eta \widehat f_N(x_i)\eta_i + \sigma_\eta^2$$
$$= \frac{1}{n}\sum_{i=1}^{n} E_\eta(f(x_i) - \widehat f_N(x_i))^2 - \frac{2}{n}\sum_{i=1}^{n} E_\eta \langle \widehat c^{\,N}, e^N(x_i)\rangle \eta_i + \sigma_\eta^2$$
$$= \frac{1}{n}\sum_{i=1}^{n} E_\eta(f(x_i) - \widehat f_N(x_i))^2 - \frac{2}{n}\sum_{i=1}^{n} E_\eta \Bigl\langle G_n^{-1}\, \frac{1}{n}\sum_{j=1}^{n} y_j\, e^N(x_j),\; e^N(x_i)\Bigr\rangle \eta_i + \sigma_\eta^2$$
$$= \frac{1}{n}\sum_{i=1}^{n} E_\eta(f(x_i) - \widehat f_N(x_i))^2 - \frac{2}{n}\sum_{i=1}^{n} E_\eta \Bigl\langle G_n^{-1}\, \frac{1}{n}\sum_{j=1}^{n} \eta_j\, e^N(x_j),\; e^N(x_i)\Bigr\rangle \eta_i + \sigma_\eta^2$$
$$= \frac{1}{n}\sum_{i=1}^{n} E_\eta(f(x_i) - \widehat f_N(x_i))^2 - \frac{2}{n^2}\sigma_\eta^2 \sum_{i=1}^{n} \langle G_n^{-1} e^N(x_i), e^N(x_i)\rangle + \sigma_\eta^2$$
$$= \frac{1}{n}\sum_{i=1}^{n} E_\eta(f(x_i) - \widehat f_N(x_i))^2 - 2\sigma_\eta^2 \frac{N}{n} + \sigma_\eta^2.$$
Hence, remembering the definition of $D_N$, we obtain
$$(9)\qquad E s_N = E_\omega E_\eta s_N = D_N - 2\sigma_\eta^2 \frac{N}{n} + \sigma_\eta^2,$$
which can be rewritten in the form
$$E\Bigl(s_N + 2\sigma_\eta^2 \frac{N}{n}\Bigr) = D_N + \sigma_\eta^2.$$
So if we choose $N$ (the order of regression) according to the criterion
$$N^* = \arg\min_{1 \le N \le n} \Bigl(s_N + 2\sigma_\eta^2 \frac{N}{n}\Bigr),$$
we can assert that in the mean we obtain those values of $N$ which minimize $D_N$ [4]. This kind of criterion for the choice of $N$ is known in the literature as the Mallows–Akaike criterion [1], [3].
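A sketch of this criterion in use (our own example, with $\sigma_\eta^2$ known; the candidate grid and helper names are assumptions, not the paper's). The true $f$ below needs exactly five functions of the trigonometric system, so the penalized residual $s_N + 2\sigma_\eta^2 N/n$ stops decreasing near $N = 5$, and by (9) its value near the minimum is close to $\sigma_\eta^2$:

```python
import numpy as np

def trig_system(x, N):
    """Rows are e^N(x_i)^T for the trigonometric orthonormal system on [0, 2pi]."""
    cols = [np.full_like(x, 1.0 / np.sqrt(2.0 * np.pi))]
    k = 1
    while len(cols) < N:
        cols.append(np.cos(k * x) / np.sqrt(np.pi))
        if len(cols) < N:
            cols.append(np.sin(k * x) / np.sqrt(np.pi))
        k += 1
    return np.column_stack(cols)

rng = np.random.default_rng(4)
n, sigma = 2000, 0.3
x = rng.uniform(0.0, 2.0 * np.pi, n)
f = lambda t: np.sin(t) + 0.3 * np.sin(2.0 * t)   # lies in span{e_1, ..., e_5}
y = f(x) + sigma * rng.standard_normal(n)

def s_N(N):
    """Residual mean square s_N for the order-N least squares fit."""
    E = trig_system(x, N)
    c_hat = np.linalg.solve(E.T @ E / n, E.T @ y / n)
    return np.mean((y - E @ c_hat) ** 2)

# Mallows-Akaike criterion: N* = argmin_N  s_N + 2*sigma^2*N/n.
scores = {N: s_N(N) + 2.0 * sigma ** 2 * N / n for N in range(1, 16)}
N_star = min(scores, key=scores.get)
```

Orders below 5 pay a large bias through $s_N$, while orders above 5 pay only the penalty $2\sigma_\eta^2 N/n$, so the selected `N_star` sits at or slightly above 5.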
4. Conclusions. It is worth remarking that we can obtain a better lower bound for the mean square prediction error than the obvious one $D_N \ge 0$. We apply the following lemma proved in [5].
Lemma 4.1. Let $h = (h_1, \dots, h_n)^T \in \mathbb{R}^n$. Then
$$\frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} h_i h_j\, e^N(x_i)^T G_n^{-1} e^N(x_j) \le \frac{1}{n}\sum_{i=1}^{n} h_i^2.$$
Since $a^N = (1/n)\sum_{i=1}^{n} r_N(x_i)\, e^N(x_i)$ and $G_n > 0$ a.s. for $n \ge N$, putting $h_i = r_N(x_i)$, $i = 1, \dots, n$, by Lemma 4.1 we obtain
$$0 \le \langle G_n^{-1} a^N, a^N\rangle \le \frac{1}{n}\sum_{i=1}^{n} r_N(x_i)^2$$
almost surely for $n \ge N$. Now, taking (6) into account, we easily obtain the lower and upper bounds for $D_N$, valid for $n \ge N$:
$$(10)\qquad \sigma_\eta^2 \frac{N}{n} \le D_N \le M_\varrho\, p_N + \sigma_\eta^2 \frac{N}{n}, \qquad \text{where } M_\varrho = \sup_{a \le x \le b} \varrho(x).$$
From (9) and (10) it follows immediately that in the case when $\varrho$ is bounded and the conditions $\lim_{n\to\infty} N(n) = \infty$ and $\lim_{n\to\infty} N(n)/n = 0$ are satisfied, $s_{N(n)}$ is an asymptotically unbiased estimator of $\sigma_\eta^2$.
The lower and upper bounds for $D_{N(n)}$ also allow us to estimate the bias of $s_{N(n)}$ for $n \ge N(n)$, namely
$$-\sigma_\eta^2 \frac{N(n)}{n} \le E s_{N(n)} - \sigma_\eta^2 \le M_\varrho\, p_{N(n)} - \sigma_\eta^2 \frac{N(n)}{n}.$$
The results presented in the two preceding sections can easily be proved in the case of regression functions $f \in L^2(A)$, $A \subset \mathbb{R}^m$, $m > 1$, $\mu(A) < \infty$, and certain complete orthonormal systems of functions (like the functions $\exp(ikx + ily)/2\pi$, $0 \le x, y \le 2\pi$, $k, l = 0, \pm 1, \pm 2, \dots$, forming a complete orthonormal system in $L^2([0, 2\pi] \times [0, 2\pi])$).
References

[1] H. Akaike, A new look at the statistical model identification, IEEE Trans. Automat. Control AC-19 (1974), 716–723.
[2] Y. S. Chow and H. Teicher, Probability Theory: Independence, Interchangeability, Martingales, Springer, Heidelberg, 1978.
[3] C. L. Mallows, Some comments on $C_p$, Technometrics 15 (1973), 661–675.
[4] B. T. Polyak and A. B. Tsybakov, Asymptotic optimality of the $C_p$ criterion in projection type estimation of a regression function, Teor. Veroyatnost. i Primenen. 35 (1990), 305–317 (in Russian).
[5] E. Rafajłowicz, Nonparametric least-squares estimation of a regression function, Statistics 19 (1988), 349–358.
[6] G. Sansone, Orthogonal Functions, Interscience, New York, 1959.
WALDEMAR POPIŃSKI
RESEARCH AND DEVELOPMENT CENTER OF STATISTICS
AL. NIEPODLEGŁOŚCI 208
00-925 WARSZAWA, POLAND