
LEAST SQUARES ESTIMATOR CONSISTENCY:
A GEOMETRIC APPROACH

João Tiago Mexia

Universidade Nova de Lisboa, Departamento de Matemática da Faculdade de Ciências e Tecnologia
Quinta da Torre, 2825–114 Monte de Caparica, Portugal
e-mail: parcr@fct.unl.pt

and

João Lita da Silva

Universidade Nova de Lisboa, Departamento de Matemática da Faculdade de Ciências e Tecnologia
Quinta da Torre, 2825–114 Monte da Caparica, Portugal
e-mail: jfls@fct.unl.pt

Abstract

Consistency of the least squares estimator (LSE) in linear models is studied under the assumption that the error vector has radial symmetry. Generalized polar coordinates and algebraic assumptions on the design matrix are used in the results that are established.

Keywords: linear models, least squares estimator, consistency, radial symmetry, generalized polar coordinates.

1. Introduction

Consistency of the least squares estimator (LSE) in linear models has lately been derived by several authors through distinct approaches (see for example [3], [4], [5], [15], [16], [17], [18], [19] and [20]). In our study of this problem we will assume that the random error sequence e_1, e_2, … has radial symmetry.

It is worthwhile to point out that no assumption of error independence or of identical distribution for e_1, e_2, … will be made. As a matter of fact, radial symmetry ensures, as we shall see, the independence of the new random variables that we obtain when we use generalized polar coordinates. In connection with the use of these coordinates we will obtain the distributions of the relevant random variables. These results will be useful in establishing consistency for the LSE.

We now state the following definition.

Definition 1. A random vector (X_1, …, X_n) has radial symmetry if it has joint density

f_{X_1,…,X_n}(x_1, …, x_n) = g(r),  r = √(x_1² + … + x_n²),

which depends only on the distance to the origin through some non-negative function g.

Given the linear model

(1.1) y_n = X_n β + e_n,

where the random vector e_n := (e_1, …, e_n) has radial symmetry, we will study, as mentioned above, the consistency of the LSE of the vector of (unknown) parameters

β := (β_1, …, β_κ).

2. Notations and preliminaries

Let us now recall some relevant notations and results. The spectral radius of A ∈ M_κ(ℝ) is defined by

ρ(A) := sup{ |λ| : λ ∈ Spec(A) },

where Spec(A) is the spectrum of A; the transpose of A will be written A^T. When there is no ambiguity, we will write

ρ_n := ρ((X_n^T X_n)^{−1}).

Moreover, F_X (respectively F_{X_1,…,X_n}) will be the distribution function of a given random variable X (respectively the joint distribution function of the random vector (X_1, …, X_n)) and f_X (respectively f_{X_1,…,X_n}) its probability density function (respectively joint probability density function); the symbol ∼ will be used to indicate "distributed as", ≈ will mean "asymptotically (or approximately) equal to", Ω_n will be the range space of X_n, and P_n e_n (respectively P_n^⊥ e_n) the orthogonal projection of e_n on Ω_n (respectively on Ω_n^⊥).

Our main purpose will be the study of the convergence of the LSE using a geometric approach, assuming the error vector to have radial symmetry. Thus it will be quite natural to avail ourselves of generalized polar coordinates. The transformation from Cartesian to the new coordinates corresponds to the mapping

ℝ^n ∋ (e_1, …, e_n) ↦ (r, θ_1, …, θ_{n−1}) ∈ ]0, +∞[ × ]0, π[ × … × ]0, π[ × ]0, 2π[

defined by

e_1 = r cos θ_1
e_2 = r sin θ_1 cos θ_2
⋮
e_{n−1} = r sin θ_1 sin θ_2 … sin θ_{n−2} cos θ_{n−1}
e_n = r sin θ_1 sin θ_2 … sin θ_{n−1}.

This mapping has Jacobian

J = r^{n−1} (sin θ_1)^{n−2} (sin θ_2)^{n−3} … sin θ_{n−2} > 0.

We now have a new pair of random variables (R_n, Θ^{n−1}) with

R_n := √(e_1² + … + e_n²)

and Θ^{n−1} := (Θ_1, …, Θ_{n−1}) the vector of central angles. The joint density of (R_n, Θ^{n−1}) will be given by

(2.1) f_{R_n,Θ^{n−1}}(r, θ_1, …, θ_{n−1}) = g(r) r^{n−1} (sin θ_1)^{n−2} (sin θ_2)^{n−3} … sin θ_{n−2}.

Moreover, integrating this joint density on ]0, π[ × … × ]0, π[ × ]0, 2π[ with respect to θ_1, …, θ_{n−1} leads to the probability density function of R_n,

(2.2) f_{R_n}(r) = (n π^{n/2} / Γ(n/2 + 1)) g(r) r^{n−1},  r > 0.

Likewise, if we integrate the joint density on ]0, +∞[ with respect to r, then the joint probability density function of Θ^{n−1} will be

(2.3) f_{Θ^{n−1}}(θ_1, …, θ_{n−1}) = (Γ(n/2 + 1) / (n π^{n/2})) (sin θ_1)^{n−2} (sin θ_2)^{n−3} … sin θ_{n−2},

which does not depend on the real function g. We extract now an important result.

Proposition 2.1. The random variables R n , Θ 1 , . . . , Θ n−1 are (mutually) independent.

Proof. From (2.3) it follows that the densities of the angles Θ_1, …, Θ_{n−1} are

f_{Θ_1}(x) = (sin x)^{n−2} / ∫₀^π (sin t)^{n−2} dt,  0 < x < π,
f_{Θ_2}(x) = (sin x)^{n−3} / ∫₀^π (sin t)^{n−3} dt,  0 < x < π,
⋮
f_{Θ_{n−2}}(x) = (sin x)/2,  0 < x < π,
f_{Θ_{n−1}}(x) = 1/(2π),  0 < x < 2π,

and thus

f_{Θ^{n−1}}(x_1, …, x_{n−1}) = f_{Θ_1}(x_1) … f_{Θ_{n−1}}(x_{n−1}).

Hence

f_{R_n,Θ^{n−1}}(r, x_1, …, x_{n−1}) = f_{R_n}(r) · f_{Θ^{n−1}}(x_1, …, x_{n−1}) = f_{R_n}(r) f_{Θ_1}(x_1) … f_{Θ_{n−1}}(x_{n−1}),

which proves the (mutual) independence of R_n, Θ_1, …, Θ_{n−1}.
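The coordinate change above is easy to exercise numerically. The following sketch (the helper names `from_polar` and `to_polar` are ours, not from the paper) implements the mapping and its inverse and checks a round trip:

```python
import math

def from_polar(r, thetas):
    """Map (r, theta_1, ..., theta_{n-1}) to Cartesian (e_1, ..., e_n)
    via e_k = r * sin(theta_1)...sin(theta_{k-1}) * cos(theta_k),
    with the last coordinate using sin(theta_{n-1}) in place of a cosine."""
    n = len(thetas) + 1
    e = []
    prod = r  # running product r * sin(theta_1) ... sin(theta_{k-1})
    for k in range(n - 1):
        e.append(prod * math.cos(thetas[k]))
        prod *= math.sin(thetas[k])
    e.append(prod)  # e_n = r * sin(theta_1) ... sin(theta_{n-1})
    return e

def to_polar(e):
    """Invert the mapping: recover (r, theta_1, ..., theta_{n-1})."""
    n = len(e)
    r = math.sqrt(sum(x * x for x in e))
    thetas = []
    for k in range(n - 2):
        # tail norm sqrt(e_{k+1}^2 + ... + e_n^2); theta_{k+1} in ]0, pi[
        tail = math.sqrt(sum(x * x for x in e[k:]))
        thetas.append(math.acos(e[k] / tail))
    # last angle lives in ]0, 2*pi[, recovered from the final two coordinates
    thetas.append(math.atan2(e[-1], e[-2]) % (2 * math.pi))
    return r, thetas

# Round trip on an arbitrary point of R^4
e = [0.3, -1.2, 0.7, 2.1]
r, thetas = to_polar(e)
e_back = from_polar(r, thetas)
assert all(abs(a - b) < 1e-12 for a, b in zip(e, e_back))
```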

The factorization

(2.4) ‖P_n e_n‖² = (‖P_n e_n‖² / ‖e_n‖²) ‖e_n‖² = Z_n ‖e_n‖² = Z_n R_n²,

where

(2.5) Z_n := ‖P_n e_n‖² / ‖e_n‖²,

gives us a first result about this random variable.

Lemma 2.1. The random variable Z_n is bounded.

Proof. By the Pythagorean theorem (see [13]) we have

0 ≤ Z_n := ‖P_n e_n‖² / ‖e_n‖² = ‖P_n e_n‖² / (‖P_n e_n‖² + ‖P_n^⊥ e_n‖²) ≤ 1,

which establishes the thesis.

When there is no multicollinearity, the LSE β̃ of β is given by

β̃ = (X_n^T X_n)^{−1} X_n^T y_n = β + (X_n^T X_n)^{−1} X_n^T e_n.

We now present an upper bound for the LSE error.

Lemma 2.2. In the linear model (1.1) we have

(2.6) ‖β̃ − β‖² ≤ ρ_n R_n² Z_n  a.s.

Proof. We have (see [16])

‖β̃ − β‖² ≤ ρ_n ‖P_n e_n‖²  a.s.,

so the thesis follows from factorization (2.4).
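Estimate (2.6) can be checked numerically. Below is a minimal sketch, assuming a small design of our own choosing with κ = 2 (an intercept and a trend column) and standard normal errors; it computes β̃, ρ_n, R_n² and Z_n explicitly and verifies the bound:

```python
import random, math

random.seed(7)
n, beta = 12, [1.5, -0.5]                       # kappa = 2 unknown parameters

# design matrix X (n x 2): intercept and a linear trend column
X = [[1.0, float(i)] for i in range(1, n + 1)]
e = [random.gauss(0.0, 1.0) for _ in range(n)]  # radially symmetric errors
y = [X[i][0] * beta[0] + X[i][1] * beta[1] + e[i] for i in range(n)]

# normal equations for kappa = 2, with (X^T X)^{-1} inverted explicitly
a = sum(x[0] * x[0] for x in X); b = sum(x[0] * x[1] for x in X)
c = sum(x[1] * x[1] for x in X); det = a * c - b * b
inv = [[c / det, -b / det], [-b / det, a / det]]      # (X^T X)^{-1}
Xty = [sum(X[i][j] * y[i] for i in range(n)) for j in range(2)]
bhat = [inv[j][0] * Xty[0] + inv[j][1] * Xty[1] for j in range(2)]

# spectral radius rho_n of the symmetric 2x2 matrix (X^T X)^{-1}
tr, dt = inv[0][0] + inv[1][1], inv[0][0] * inv[1][1] - inv[0][1] ** 2
rho = (tr + math.sqrt(tr * tr - 4 * dt)) / 2

# R_n^2 = ||e||^2 and Z_n = ||P_n e||^2/||e||^2, with P_n e = X (X^T X)^{-1} X^T e
Xte = [sum(X[i][j] * e[i] for i in range(n)) for j in range(2)]
coef = [inv[j][0] * Xte[0] + inv[j][1] * Xte[1] for j in range(2)]
Pe = [X[i][0] * coef[0] + X[i][1] * coef[1] for i in range(n)]
R2 = sum(x * x for x in e)
Z = sum(x * x for x in Pe) / R2

err2 = sum((bhat[j] - beta[j]) ** 2 for j in range(2))
assert err2 <= rho * R2 * Z + 1e-9              # estimate (2.6)
```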

We now state a result that will be fundamental for the last section.

Proposition 2.2. There exists an orthonormal basis {w_1(n), …, w_n(n)} of ℝ^n such that

Z_n = ⟨w_1(n), (cos Θ_1, sin Θ_1 cos Θ_2, …, sin Θ_1 … sin Θ_{n−1})⟩² + … + ⟨w_κ(n), (cos Θ_1, sin Θ_1 cos Θ_2, …, sin Θ_1 … sin Θ_{n−1})⟩²,  n > κ,

where ⟨·,·⟩ is the usual inner product defined on the vector space ℝ^n.

Proof. Since the design matrix X_n := [x_{ij}], i = 1, 2, …, n, j = 1, 2, …, κ, has rank κ, let us consider a basis of Ω_n given by

x_1 := (x_{11}, x_{21}, …, x_{n1}),
⋮
x_κ := (x_{1κ}, x_{2κ}, …, x_{nκ}).

Using Gram–Schmidt orthogonalization we can construct an orthonormal basis {w_1(n), …, w_κ(n)} of Ω_n, and it is well known that the orthogonal complement Ω_n^⊥ of Ω_n will admit an orthonormal basis {w_{κ+1}(n), …, w_n(n)}. Writing

w_1(n) := (w_{11}(n), w_{21}(n), …, w_{n1}(n)),
⋮
w_n(n) := (w_{1n}(n), w_{2n}(n), …, w_{nn}(n)),

we can take the matrix

W_n(n) = [ w_{ij}(n) ]_{i,j=1,…,n},

and the error vector e_n can be expressed in the basis {w_1(n), …, w_n(n)} by e′_n = (W_n(n))^T e_n. Therefore

‖P_n e_n‖² = ⟨w_1(n), e_n⟩² + … + ⟨w_κ(n), e_n⟩²

and the conclusion follows from the generalized polar coordinates.

Remark 1. Let us observe that from Proposition 2.1 we can conclude the independence of R_n and Z_n, since by Proposition 2.2 the random variable Z_n depends only on Θ_1, …, Θ_{n−1}.

Let us consider, on a probability space (Σ, F, P), a sequence {X_n} of random variables and a random variable X. Given p > 0 we write:

1. X_n →^{a.s.} X if X_n converges almost surely to X;

2. X_n →^P X if X_n converges in probability to X;

3. X_n →^{L^p} X if X_n converges in mean of order p to X.

Let X_n and X be random variables with distribution functions F_n and F, respectively. If

lim_{n→+∞} F_n(x) = F(x)

for every continuity point x of F, then X_n is said to converge in distribution (or in law) to X, and we write X_n →^d X.¹

¹ Clearly, convergence in distribution is a property of the distribution functions of the random variables and not of the random variables themselves. Recall that the random variables X_n may be defined on entirely different probability spaces. Moreover, given a distribution function F there always exists, on some probability space, a random variable X for which F(x) = P(X ≤ x) (see [1]).

Let (X_1, …, X_n) be a random vector induced by n observations x_1, …, x_n, whose joint distribution function depends, among others, on a vector of (unknown) parameters λ belonging to a parameter space Λ ⊆ ℝ^κ. The estimator t_n := t_n(X_1, …, X_n) will be called strongly consistent for λ if t_n →^{a.s.} λ for each fixed λ ∈ Λ. Given s > 0, the estimator t_n := t_n(X_1, …, X_n) will be called consistent in mean of order s for λ if t_n →^{L^s} λ for each fixed λ ∈ Λ. Consistency in mean of order 2 will be called mean square consistency.

3. The distribution of Z_n

In the previous section we showed that if e_n has radial symmetry we may replace it by the pair (R_n, Θ^{n−1}) with joint density given by (2.1). Now Z_n depends only on Θ^{n−1}, so its density will not depend on the real function g. From Proposition 2.2 we get

Z_n = ⟨w_1(n), (cos Θ_1, sin Θ_1 cos Θ_2, …, sin Θ_1 … sin Θ_{n−1})⟩² + … + ⟨w_κ(n), (cos Θ_1, sin Θ_1 cos Θ_2, …, sin Θ_1 … sin Θ_{n−1})⟩²

for some orthonormal basis {w_1(n), …, w_κ(n)} of Ω_n. Thus, to obtain the probability density function of Z_n, g(r) may be taken to be any admissible non-negative function. Choosing

g(r) = (2π)^{−n/2} e^{−r²/2},

we get for e_n the joint density

f_{e_1,…,e_n}(x_1, …, x_n) = (2π)^{−n/2} e^{−(x_1² + … + x_n²)/2},

which is the standard multinormal distribution, i.e. e_n ∼ N(0, I). Hence (see [22]) the components e_1, …, e_n are independent, each of them having the (univariate) standard normal distribution. By the Cochran theorem (see [8]) the random variables ‖P_n e_n‖² and ‖P_n^⊥ e_n‖² are independent and

‖P_n e_n‖² ∼ χ²(κ),  ‖P_n^⊥ e_n‖² ∼ χ²(n − κ).

We now establish

Proposition 3.1. Let X_1, …, X_m be independent random variables with densities f_{X_i}(x_i) (i = 1, …, m), and Y_1, …, Y_m random variables given by

Y_j := X_1 + … + X_j,  j = 1, …, m.

Then (Y_1, …, Y_m) has joint probability density function given by

f_{Y_1,…,Y_m}(y_1, …, y_m) = f_{X_1}(y_1) f_{X_2}(y_2 − y_1) … f_{X_m}(y_m − y_{m−1}).

Proof. The system of m linear equations in m unknowns x_1, …, x_m

x_1 = y_1
x_1 + x_2 = y_2
⋮
x_1 + x_2 + … + x_m = y_m

has a unique solution given by

x_1 = y_1
x_2 = y_2 − y_1
⋮
x_m = y_m − y_{m−1}.

Hence, the joint probability density function of (Y_1, …, Y_m) is expressed by

f_{Y_1,…,Y_m}(y_1, …, y_m) = f_{X_1,…,X_m}(x_1, …, x_m) / |J(x_1, …, x_m)| = f_{X_1}(y_1) · f_{X_2}(y_2 − y_1) … f_{X_m}(y_m − y_{m−1}),

since the transformation

φ_1(x_1, …, x_m) = x_1,
φ_2(x_1, …, x_m) = x_1 + x_2,
⋮
φ_m(x_1, …, x_m) = x_1 + x_2 + … + x_m

has Jacobian

J(x_1, …, x_m) = det[ ∂φ_i/∂x_j ]_{i,j=1,…,m} = 1,

the matrix [∂φ_i/∂x_j] being lower triangular with all diagonal entries equal to 1.

Setting X_1 := ‖P_n e_n‖², X_2 := ‖P_n^⊥ e_n‖², Y_1 := ‖P_n e_n‖² and Y_2 := ‖e_n‖², the pair of random variables (Y_1, Y_2) has, according to Proposition 3.1, the joint density

f_{Y_1,Y_2}(x, y) = f_{X_1}(x) f_{X_2}(y − x).

Hence, the density of Z_n = Y_1/Y_2 will be

f_{Z_n}(z) = ∫_{−∞}^{+∞} |t| f_{Y_1,Y_2}(zt, t) dt
= ∫_{−∞}^{+∞} |t| f_{X_1}(zt) f_{X_2}(t − zt) dt
= (1 / (Γ(κ/2) Γ((n−κ)/2))) z^{κ/2 − 1} (1 − z)^{(n−κ)/2 − 1} ∫₀^{+∞} (1/2^{n/2}) t^{n/2 − 1} e^{−t/2} dt
= (Γ(n/2) / (Γ(κ/2) Γ((n−κ)/2))) z^{κ/2 − 1} (1 − z)^{(n−κ)/2 − 1},

if 0 < z < 1, and f_{Z_n}(z) = 0 otherwise. Therefore, the random variable Z_n has the beta distribution with parameters (κ/2, (n−κ)/2).
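This beta law is easy to probe by simulation. The distribution of Z_n depends neither on g nor on the particular orthonormal basis, so the sketch below (our choice, for illustration) takes Gaussian errors and Ω_n spanned by the first κ coordinate axes, and compares the first two empirical moments of Z_n with those of the Beta(κ/2, (n−κ)/2) distribution:

```python
import random

random.seed(0)
n, k, trials = 10, 3, 20000

# With Omega_n spanned by the first k coordinate axes, Z_n reduces to
# (e_1^2+...+e_k^2)/(e_1^2+...+e_n^2); by Proposition 2.2 its law does not
# depend on the particular orthonormal basis, nor (Section 3) on g.
zs = []
for _ in range(trials):
    e = [random.gauss(0.0, 1.0) for _ in range(n)]
    num = sum(x * x for x in e[:k])
    den = num + sum(x * x for x in e[k:])
    zs.append(num / den)

m1 = sum(zs) / trials
m2 = sum(z * z for z in zs) / trials

# Beta(k/2, (n-k)/2) moments: E Z = k/n, E Z^2 = k(k+2)/(n(n+2))
assert abs(m1 - k / n) < 0.01
assert abs(m2 - k * (k + 2) / (n * (n + 2))) < 0.01
```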

We could have obtained this last result through a different approach in which Proposition 3.1 is not used. The random variable W_n defined by

W_n := (‖P_n e_n‖² / κ) / (‖P_n^⊥ e_n‖² / (n − κ))

has the F distribution with κ and n − κ degrees of freedom, i.e.

f_{W_n}(w) = (Γ(n/2) / (Γ(κ/2) Γ((n−κ)/2))) (κ/(n−κ))^{κ/2} w^{κ/2 − 1} (1 + κw/(n−κ))^{−n/2},  w > 0.

Therefore, the probability density function of the random variable V_n := (κ/(n−κ)) W_n will be

f_{V_n}(v) = (Γ(n/2) / (Γ(κ/2) Γ((n−κ)/2))) v^{κ/2 − 1} (1 + v)^{−n/2},  v > 0.

Applying the transformation u = v/(v + 1) we get the integral identity

∫_t^{+∞} v^{a−1} (1 + v)^{−a−b} dv = ∫_{t/(t+1)}^1 u^{a−1} (1 − u)^{b−1} du,  t > 0.

Thus, taking a = κ/2 and b = (n−κ)/2 we get

(3.1) F_{V_n}(t) = F_{U_n}(t/(t + 1)),  t > 0,

where U_n has the beta distribution with parameters (κ/2, (n−κ)/2). Moreover, since

(3.2) Z_n ≤ V_n,

we will have

(3.3) P(Z_n > t) ≤ P(V_n > t) = P(U_n > t/(t + 1)),  t > 0,

and we can use the distribution of U_n to obtain upper bounds for P(Z_n > t).

4. Estimator consistency

The main purpose of this section is to establish the consistency of the LSE. We will start with some preparatory results for the strong consistency of the LSE.

Lemma 4.1. For any 0 ≤ α < 1 there exists n_0 ∈ ℕ such that

P(Z_n > ε/n^α) < (Γ(n/2) / (Γ(κ/2) Γ((n−κ)/2))) (ε/n^α)^{κ/2 − 1} (1 − ε/n^α)^{(n−κ)/2}  (ε > 0)

for all n ≥ n_0.

Proof. If κ ≥ 2 and n ≥ κ + 2, then Z_n has a unique mode (see [10]) given by

z_0 = (κ − 2)/(n − 4)

(if κ = 1 and n ≥ 3, then f_{Z_n}(z) is monotonically decreasing on ]0, 1[). For all 0 ≤ α < 1 we have

lim_{n→+∞} ((κ − 2)/(n − 4)) / (ε/n^α) = 0,  ε > 0,

which implies the existence of an order n_0 ∈ ℕ such that

z_0 = (κ − 2)/(n − 4) < ε/n^α,  ∀n ≥ n_0, ε > 0.

Since f_{Z_n}(z) is monotonically decreasing on [z_0, 1[, we can write

P(Z_n > ε/n^α) < (1 − ε/n^α) f_{Z_n}(ε/n^α),  ∀n ≥ n_0, ε > 0,

completing the proof of the Lemma.

Lemma 4.2. For any 0 < α < 1 we have

lim_{n→+∞} n^ξ (1 − ε/n^α)^{(n−κ)/2} = 0,  ξ ∈ ℝ, ε > 0.

Proof. Choosing ξ ∈ ℝ we have, for all ε > 0,

n^ξ (1 − ε/n^α)^{(n−κ)/2} = e^{ξ log n + (n/2) log(1 − ε/n^α)} (1 − ε/n^α)^{−κ/2}
= e^{n^{1−α} [ ξ log n / n^{1−α} + (1/2) log(1 − ε/n^α)^{n^α} ]} (1 − ε/n^α)^{−κ/2}.

Since 0 < α < 1 it follows that

lim_{n→+∞} log(1 − ε/n^α)^{n^α} = −ε  and  lim_{n→+∞} ξ log n / n^{1−α} = 0,

so that the exponent tends to −∞ and

lim_{n→+∞} n^ξ (1 − ε/n^α)^{(n−κ)/2} = 0.
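The limit in Lemma 4.2 is easy to illustrate numerically; the parameter values below (ξ = 5, ε = 0.5, α = 0.5, κ = 3) are arbitrary sample choices of ours:

```python
# f(n) = n^xi * (1 - eps/n^alpha)^((n - kappa)/2) for sample parameters
def f(n, xi=5.0, eps=0.5, alpha=0.5, kappa=3):
    return n ** xi * (1.0 - eps / n ** alpha) ** ((n - kappa) / 2.0)

# The dominant factor behaves like exp(-(eps/2) * n^(1-alpha)), which
# eventually beats any power n^xi, so the sequence shrinks toward 0.
vals = [f(10 ** j) for j in range(2, 6)]
assert vals[-1] < vals[0]           # eventually decreasing
assert f(10 ** 6) < 1e-6            # essentially zero for large n
```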

Next, we present two important results with a direct bearing on the strong consistency of the LSE.

Proposition 4.1. For any 0 < α < 1 we have

P( lim sup_{n→+∞} { Z_n > 1/(m n^α) } ) = 0,  m = 1, 2, …

Proof. Since

lim_{n→+∞} Γ(n/2) / (Γ((n−κ)/2) n^{κ/2}) = 2^{−κ/2}

(see [14]), we obtain from Lemma 4.2, with ξ = α(1 − κ/2) + s (s > 1) and ε = 1/m (m = 1, 2, …),

lim_{n→+∞} (Γ(n/2) / (Γ(κ/2) Γ((n−κ)/2))) (1/(m n^α))^{κ/2 − 1} (1 − 1/(m n^α))^{(n−κ)/2} n^s
= (1 / (m^{κ/2 − 1} Γ(κ/2) 2^{κ/2})) lim_{n→+∞} n^{α + (κ/2)(1−α) + s} (1 − 1/(m n^α))^{(n−κ)/2} = 0.

Thus, Lemma 4.1 ensures

lim_{n→+∞} P(Z_n > 1/(m n^α)) n^s ≤ lim_{n→+∞} (Γ(n/2) / (Γ(κ/2) Γ((n−κ)/2))) (1/(m n^α))^{κ/2 − 1} (1 − 1/(m n^α))^{(n−κ)/2} n^s = 0,

and consequently

∑_{n=1}^{+∞} P(Z_n > 1/(m n^α)) < +∞,  m = 1, 2, …,

since the series ∑_{n=1}^{+∞} 1/n^s (s > 1) converges. The thesis now follows from the first Borel–Cantelli lemma.

Remark 2. If we take the alternative path described at the end of the last section it is clear that, for any 0 < α < 1, there exists an order n_0 ∈ ℕ such that

P(U_n > ε/(n^α + ε)) < (Γ(n/2) / (Γ(κ/2) Γ((n−κ)/2))) (ε/n^α)^{κ/2 − 1} (1 − ε/(n^α + ε))^{(n−κ)/2}  (ε > 0)

for all n ≥ n_0. Moreover,

lim_{n→+∞} n^ξ (1 − ε/(n^α + ε))^{(n−κ)/2} = 0,  ξ ∈ ℝ, ε > 0,

and the thesis of Proposition 4.1 follows from (3.2).

Proposition 4.2. For any 0 < α < 1 we have n^α Z_n →^{a.s.} 0 as n → +∞.

Proof. According to Proposition 4.1 we get

P( lim sup_{n→+∞} { |n^α Z_n| > 1/m } ) = P( lim sup_{n→+∞} { Z_n > 1/(m n^α) } ) = 0

for all m = 1, 2, …, so that (see [6])

P( lim_{n→+∞} n^α Z_n = 0 ) = 1.

Let us now present the first results on the strong consistency of the LSE.

Theorem 4.1. If e_n has radial symmetry and, for some 0 < α < 1,

(4.1) ∃K > 0 : lim sup_{n→+∞} n^{−α} ρ_n R_n² ≤ K  a.s.,

then β̃ is strongly consistent.

Proof. The thesis follows from estimate (2.6) and Proposition 4.2.

Corollary 4.1. If e_n has radial symmetry, R_n →^{a.s.} R and ρ_n = O(n^α) for some 0 < α < 1, then β̃ is strongly consistent.

Proof. According to the Corollary's assumptions, the term n^{−α} ρ_n R_n² is bounded almost surely, and the thesis follows from Theorem 4.1.

The mean square consistency of the LSE can be obtained from the independence of R_n and Z_n (see Remark 1). We now consider assumptions on E(R_n²). Firstly we get

Theorem 4.2. If e_n has radial symmetry, E(R_n²) = O(n^α) and ρ_n = o(n^{1−α}), then β̃ is mean square consistent.

Proof. According to estimate (2.6) we have

E(‖β̃ − β‖²) ≤ ρ_n E(R_n²) E(Z_n) ≤ C ρ_n n^α (κ/n) = κC ρ_n / n^{1−α} = o(1),

and so the thesis follows.

More generally, the independence of the random variables R_n and Z_n leads to consistency in mean of order s.

Theorem 4.3. If e_n has radial symmetry, E(R_n^s) = O(n^α) and ρ_n = o(n^{1 − 2α/s}), then β̃ is consistent in mean of order s.

Proof. Given s > 0, the estimate (2.6) yields

E(‖β̃ − β‖^s) ≤ ρ_n^{s/2} E(R_n^s) E(Z_n^{s/2})
= ρ_n^{s/2} E(R_n^s) (Γ(n/2) Γ((s+κ)/2)) / (Γ(κ/2) Γ((n+s)/2)) = o(1),

since Γ((n+s)/2) ≈ (n/2)^{s/2} Γ(n/2) for n large enough.

Remark 3. The LSE consistency in mean of order s still remains valid if e_n has radial symmetry and

ρ_n = o( n / [E(R_n^s)]^{2/s} ).

5. Extension to the case α = 1

In Section 3 we saw that, if the random error sequence has radial symmetry, then the random variable Z_n has the beta distribution with parameters (κ/2, (n−κ)/2). Therefore,

f_{nZ_n}(z) = (Γ(n/2) / (Γ(κ/2) Γ((n−κ)/2))) n^{−κ/2} z^{κ/2 − 1} (1 − z/n)^{(n−κ)/2 − 1},  0 < z < n,

and f_{nZ_n}(z) = 0 otherwise. From Proposition 2.2 we know that the random variable nZ_n can be expressed by

nZ_n = (S_{(n−1)1}(n))² + … + (S_{(n−1)κ}(n))²,

where, for each i = 1, …, κ,

S_{(n−1)i}(n) := √n cos Θ_1 w_{1i}(n) + √n sin Θ_1 cos Θ_2 w_{2i}(n) + … + √n sin Θ_1 … sin Θ_{n−2} cos Θ_{n−1} w_{(n−1)i}(n) + √n sin Θ_1 … sin Θ_{n−2} sin Θ_{n−1} w_{ni}(n).

Let us consider the triangular array T_{mi}(n), n ≥ 2, 1 ≤ m ≤ n−1, of random variables defined by

T_{1i}(2) := √2 (cos Θ_1 w_{1i}(2) + sin Θ_1 w_{2i}(2)),

T_{1i}(3) := √3 cos Θ_1 w_{1i}(3),
T_{2i}(3) := √3 sin Θ_1 (cos Θ_2 w_{2i}(3) + sin Θ_2 w_{3i}(3)),
⋮
T_{1i}(n) := √n cos Θ_1 w_{1i}(n),
T_{2i}(n) := √n sin Θ_1 cos Θ_2 w_{2i}(n),
⋮
T_{(n−2)i}(n) := √n sin Θ_1 … sin Θ_{n−3} cos Θ_{n−2} w_{(n−2)i}(n),
T_{(n−1)i}(n) := √n sin Θ_1 … sin Θ_{n−2} (cos Θ_{n−1} w_{(n−1)i}(n) + sin Θ_{n−1} w_{ni}(n)),

for each i = 1, …, κ. Thus

S_{1i}(2) = T_{1i}(2),
S_{2i}(3) = T_{1i}(3) + T_{2i}(3),
⋮
S_{(n−1)i}(n) = T_{1i}(n) + T_{2i}(n) + … + T_{(n−1)i}(n),

for each i = 1, …, κ, and since

E(T_{1i}(2)) = 0  a.s.,
E(T_{2i}(3) | T_{1i}(3)) = √3 sin Θ_1 (w_{2i}(3) E(cos Θ_2) + w_{3i}(3) E(sin Θ_2)) = 0  a.s.,
⋮
E(T_{(n−1)i}(n) | T_{1i}(n), …, T_{(n−2)i}(n)) = √n sin Θ_1 … sin Θ_{n−2} (w_{(n−1)i}(n) E(cos Θ_{n−1}) + w_{ni}(n) E(sin Θ_{n−1})) = 0  a.s.,

we can apply the extended Kolmogorov inequality and the extended Bienaymé equality (see [12]) to each term of the sequence S_{(n−1)i}(n), n ≥ 2, which gives us²

P( max_{1≤j≤1} |S_{ji}(2)| > ε ) ≤ (1/ε²) E((S_{1i}(2))²) ≤ E(2Z_2)/ε² = κ/ε²,
P( max_{1≤j≤2} |S_{ji}(3)| > ε ) ≤ (1/ε²) E((S_{2i}(3))²) ≤ E(3Z_3)/ε² = κ/ε²,
⋮
P( max_{1≤j≤n−1} |S_{ji}(n)| > ε ) ≤ (1/ε²) E((S_{(n−1)i}(n))²) ≤ E(nZ_n)/ε² = κ/ε²,

since E(nZ_n) = κ, ∀n ∈ ℕ. Hence

P( max_{1≤j≤n−1} |S_{ji}(n)| < ε ) ≥ 1 − κ/ε²,  ∀n ∈ ℕ,

and choosing the subsequence (η_n) which gives us

lim_{n→+∞} ( max_{1≤j≤η_n−1} |S_{ji}(η_n)| ) = lim sup_{n→+∞} ( max_{1≤j≤n−1} |S_{ji}(n)| ),

it follows that

P( max_{1≤j≤η_n−1} |S_{ji}(η_n)| < ε ) ≥ 1 − κ/ε²,  ∀n ∈ ℕ.

² Observe that E(S_{(n−1)i}(n)) = 0, ∀n ≥ 2.

Thus, for n sufficiently large we get

P( sup_{m≥n} max_{1≤j≤m−1} |S_{ji}(m)| < ε ) ≥ 1 − κ/ε²,

that is, the random sequence (S_{(n−1)i}(n)), i = 1, …, κ, is bounded in probability. Therefore, almost surely,

lim sup_{n→+∞} nZ_n

exists and is finite. On the other side, it is easy to check that the sequence of functions f_{nZ_n}(z) converges pointwise on ℝ as n → +∞, i.e.

f_{nZ_n}(z) → f(z) = (1/(2^{κ/2} Γ(κ/2))) z^{κ/2 − 1} e^{−z/2} if z > 0,  f(z) = 0 if z ≤ 0.

Consequently

(5.1) nZ_n →^d χ²(κ)  as n → +∞,

that is, nZ_n converges in law to a random variable which has the χ² distribution with κ degrees of freedom, which implies that lim sup_{n→+∞} nZ_n > 0 almost surely.
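The convergence (5.1) can be checked empirically. As before, the law of Z_n depends neither on g nor on the basis, so the sketch below (our illustrative choice) samples nZ_n from Gaussian errors with Ω_n spanned by the first κ coordinate axes, and compares the first two empirical moments with those of χ²(κ):

```python
import random

random.seed(1)
k, trials = 4, 5000

def n_z(n):
    """Sample nZ_n using Gaussian errors and Omega_n = span of first k axes
    (the law of Z_n depends on neither choice)."""
    e = [random.gauss(0.0, 1.0) for _ in range(n)]
    num = sum(x * x for x in e[:k])
    return n * num / (num + sum(x * x for x in e[k:]))

samples = [n_z(400) for _ in range(trials)]
mean = sum(samples) / trials
var = sum((s - mean) ** 2 for s in samples) / trials

# chi^2(k) has mean k and variance 2k; for n = 400 the scaled Beta law
# should already be close to the limit (tolerances are generous, since
# this is a modest Monte Carlo sample)
assert abs(mean - k) < 0.2
assert abs(var - 2 * k) < 1.5
```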

Natural improvements of the results of the last section now follow immediately.

Theorem 5.1. Let p > 0. If e_n has radial symmetry, ρ_n = o(n^{−p}) and

(5.2) ∃K > 0 : lim sup_{n→+∞} R_n² / n^{p+1} ≤ K  a.s.,

then β̃ is strongly consistent.

Proof. The condition (5.2) leads to

ρ_n R_n² Z_n = (ρ_n n^p) (R_n² / n^{p+1}) (nZ_n) ≤ K C ρ_n n^p  a.s.,

and the conclusion follows from the condition on the spectral radius.

Corollary 5.1. Let p > 0. If e_n has radial symmetry, R_n / n^p →^{a.s.} R and ρ_n = o(n^{1−2p}), then β̃ is strongly consistent.

Proof. We have R_n² / n^{2p} →^{a.s.} R² (see [6]) and the thesis follows from Theorem 5.1.

Remark 4. If (see [22]) we assume the radial symmetry of e_n with a differentiable function g, or continuous marginal densities, and also the independence of e_1, e_2, …, then each error has a normal distribution with zero mean and equal variance.

6. Examples

To complete this geometric approach to the consistency of the LSE we present a few examples of application.

1. Multivariate normal distribution. Suppose

g(r) = (2πσ²)^{−n/2} e^{−r²/(2σ²)},  σ > 0;

then each component of the random vector e_n has distribution N(0, σ²). Hence, the sequence e_1, e_2, … is i.i.d. (see [22]) and, according to the classical Kolmogorov strong law of large numbers, we get

R_n² / n = (e_1² + … + e_n²)/n →^{a.s.} σ²  as n → +∞,

since E(e_i²) = σ² for all i. According to Corollary 5.1 the LSE is strongly consistent if ρ_n = o(1). Moreover, R_n²/σ² has the χ² distribution with n degrees of freedom, so that E(R_n²) = nσ² and the LSE mean square consistency holds for ρ_n = o(1).
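The almost sure limit R_n²/n → σ² in this example can be observed directly; the value σ = 2 below is an arbitrary sample choice of ours:

```python
import random

random.seed(42)
sigma = 2.0

def r2_over_n(n):
    """R_n^2 / n for i.i.d. N(0, sigma^2) errors e_1, ..., e_n."""
    return sum(random.gauss(0.0, sigma) ** 2 for _ in range(n)) / n

# By Kolmogorov's SLLN, R_n^2/n should settle near sigma^2 = 4 as n grows
value = r2_over_n(100000)
assert abs(value - sigma ** 2) < 0.1
```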

2. Kotz type distribution. Let a, b > 0 and n + 2q > 2. If

g(r) = (a Γ(n/2) b^{(2q+n−2)/(2a)} / (π^{n/2} Γ((2q+n−2)/(2a)))) r^{2q−2} e^{−b r^{2a}},

then the random vector e_n has a Kotz type distribution³ (see [23]). For example, taking q = a − n/2 + 1 we obtain

f_{R_n²}(r) = (1/(2√r)) f_{R_n}(√r) = a b r^{a−1} e^{−b r^a},  r > 0,

that is, R_n² has the Weibull–Gnedenko distribution with parameters (a, b). Since

E(R_n²) = b^{−1/a} Γ(1 + 1/a),

the LSE mean square consistency holds if ρ_n = o(n). The strong consistency of the LSE also holds if ρ_n = o(n), since

sup_{m≥n} E|R_m² − R_n²| = sup_{m≥n} ( E(R_m²) − E(R_n²) ) = 0

implies R_n² →^{L¹} R for some R ∈ L¹. Therefore, R_n² →^{a.s.} R, since R_n² is an increasing sequence and R_n² →^P R.

3. Multivariate uniform distribution. Let a > 0 and consider

g(r) = Γ(n/2 + 1) / (a√π)^n,  r < a,

which corresponds to the situation where the random vector e_n is distributed uniformly on an open ball centered at the origin with radius a. It is easy to check that R_n has density

f_{R_n}(r) = (n/a^n) r^{n−1},  0 < r < a,

and distribution function

F_{R_n}(r) = 0 when r ≤ 0,  F_{R_n}(r) = (r/a)^n when 0 < r < a,  F_{R_n}(r) = 1 when r ≥ a.

Thus, as n → +∞,

F_{R_n}(r) → F(r) = 1 when r ≥ a,  F(r) = 0 when r < a,

and so R_n →^P a, which implies R_n² →^{a.s.} a², since R_n² is an increasing sequence.

From Corollary 5.1 the strong consistency of the LSE is ensured if ρ_n = o(n). On the other hand, the random variable R_n² has density

f_{R_n²}(r) = (n/(2a^n)) r^{n/2 − 1},  0 < r < a²,

which implies E(R_n²) = n a²/(n + 2), and the LSE mean square consistency holds if ρ_n = o(n).

³ Note that if q = a = 1 and b = 1/(2σ²) (σ > 0), then we recover the multivariate normal distribution of the last example.

4. Multivariate t distribution. Let q ∈ ℕ and consider

g(r) = (Γ((n+q)/2) n^{n/2} / (Γ(q/2) (qπ)^{n/2})) (1 + (n/q) r²)^{−(n+q)/2},

which corresponds to the situation in which e_n := (e_1, …, e_n) has the multivariate t distribution with q degrees of freedom, precision matrix T = diag(n, …, n) and null skewness. The density of R_n² will be

f_{R_n²}(r) = (1/(2√r)) f_{R_n}(√r) = (n π^{n/2} / (2 Γ(n/2 + 1))) g(√r) r^{n/2 − 1},  r > 0,

so that

f_{R_n²}(r) = (Γ((n+q)/2) / (Γ(n/2) Γ(q/2))) (n/q)^{n/2} r^{n/2 − 1} (1 + nr/q)^{−(n+q)/2},  r > 0.

Therefore, R_n² has the F distribution with n and q degrees of freedom, so

E(R_n²) = q/(q − 2)  (q > 2).

Hence, LSE mean square consistency is guaranteed if ρ_n = o(n). The strong consistency of the LSE also holds if ρ_n = o(n): as a matter of fact,

sup_{m≥n} E|R_m² − R_n²| = sup_{m≥n} ( E(R_m²) − E(R_n²) ) = 0

implies R_n² →^{L¹} R_∞ for some R_∞ ∈ L¹. Thus, R_n² →^{a.s.} R_∞, since R_n² is an increasing sequence and R_n² →^P R_∞.

References

[1] P. Billingsley, Probability and Measure (third edition), John Wiley & Sons 1995.

[2] S. Cambanis, S. Huang and G. Simons, On the theory of elliptically contoured distributions, Journal of Multivariate Analysis 11 (1981), 368–385.

[3] X. Chen, Some results on consistency of LS estimates, Chin. Sci. Bull. 39 (22) (1994), 1872–1876.

[4] X. Chen, Consistency of LS estimates of multiple regression under a lower order moment condition, Sci. Chin. 38 (12) (1995), 1420–1431.

[5] X. Chen, A note on the consistency of LS estimates in linear models, Chin. Ann. Math. Ser. B 22 (4) (2001), 471–474.

[6] Y.S. Chow and H. Teicher, Probability Theory: Independence, Interchangeability, Martingales (third edition), Springer 1997.

[7] K.L. Chung, A Course in Probability Theory (third edition), Academic Press 2001.

[8] D. Dacunha-Castelle and M. Duflo, Probabilités et Statistiques: Problèmes à Temps Fixe, Masson 1982.

[9] K. Fang, S. Kotz and K. Ng, Symmetric Multivariate and Related Distributions, Monographs on Statistics and Applied Probability 36, Chapman & Hall 1990.

[10] V. Koroliouk, N. Portenko, A. Skorokhod and A. Tourbine, Aide-Mémoire de Théorie des Probabilités et de Statistique Mathématique, Mir 1983.

[11] M. Loève, Probability Theory I (fourth edition), Springer 1977.

[12] M. Loève, Probability Theory II (fourth edition), Springer 1978.

[13] L.T. Magalhães, Álgebra Linear como Introdução à Matemática Aplicada, Texto Editora 1992.

[14] B.M. Makarov, M.G. Goluzina, A.A. Lodkin and A.N. Podkorytov, Selected Problems in Real Analysis, American Mathematical Society 1992.

[15] J.T. Mexia and P. Corte Real, Extension of Kolmogorov's strong law to multiple regression, Revista de Estatística 24 (2001), 277–278.

[16] J.T. Mexia, P. Corte Real, M.L. Esquível and J. Lita da Silva, Convergência do estimador dos mínimos quadrados em modelos lineares, Estatística Jubilar, Actas do XII Congresso da Sociedade Portuguesa de Estatística, Edições SPE (2005), 455–466.

[17] J.T. Mexia and J. Lita da Silva, A consistência do estimador dos mínimos quadrados em domínios de atracção maximais, (to appear) 2005.

[18] J.T. Mexia and J. Lita da Silva, Least squares estimator consistency: on error stability, (to appear) 2005.

[19] J. Mingzhong, Some new results of the strong consistency of multiple regression coefficients, Proceedings of the Second Asian Mathematical Conference 1995 (S. Tangmanee and E. Schulz, eds.), World Scientific (1995), 514–519.

[20] J. Mingzhong and X. Chen, Strong consistency of least squares estimate in multiple regression when the error variance is infinite, Stat. Sin. 9 (1) (1999), 289–296.

[21] W. Pestman, Mathematical Statistics, Walter de Gruyter, Berlin 1998.

[22] C.R. Rao, Linear Statistical Inference and Its Applications (second edition), John Wiley & Sons 1973.

[23] R. Schmidt, Tail dependence for elliptically contoured distributions, Mathematical Methods of Operations Research 55 (2002), 301–327.

[24] D. Williams, Probability with Martingales, Cambridge University Press, Cambridge 1991.

Received 9 September 2005
