A. CZAPKIEWICZ and A. L. DAWIDOWICZ (Kraków)
THE TWO-DIMENSIONAL LINEAR RELATION IN THE ERRORS-IN-VARIABLES MODEL WITH REPLICATION OF ONE VARIABLE
Abstract. We present a two-dimensional linear regression model in which both variables are subject to error. We discuss a model where one variable of each pair of observables is replicated. We suggest two methods of constructing consistent estimators: the maximum likelihood method and a method based on variance components theory. We study the asymptotic properties of these estimators and prove that the asymptotic variances of the regression slope estimators are comparable for the two methods.

2000 Mathematics Subject Classification: Primary 62J05; Secondary 62F10, 62J12.
Key words and phrases: linear regression, consistent estimator.
1. Introduction. A problem sometimes encountered in data analysis is to find a relation between two or more variables. In this paper we discuss the two-dimensional case, in which neither observable is measured precisely.
Thus let us consider the model
(1) $X_i = s_i + \varepsilon_i$, $Y_i = a s_i + b + \delta_i$, $i = 1, \dots, n$,
where the disturbances $\varepsilon_i$ and $\delta_i$ are independent random variables with mean zero and variances $\sigma_\varepsilon^2$, $\sigma_\delta^2$, respectively. We assume the $s_i$ to be unknown constants. This case is known in the literature as a functional model (Kendall and Stuart 1979). It is well known (Reiersol 1950) that this model, with normally distributed errors of unknown variances, is nonidentifiable. To overcome this difficulty we need an additional assumption, for example that the distribution of the errors is nonnormal, that one of the error variances is known, or that the ratio of the variances is known.
Another approach to constructing consistent estimators of the regression slope in model (1) is to replicate the random variables $X_i$, $Y_i$ $m_i$ times (Cox 1976, Dolby 1976, Bunke and Bunke 1989). In this case we have
(2) $X_{ij} = s_i + \varepsilon_{ij}$, $Y_{ij} = a s_i + b + \delta_{ij}$, $i = 1, \dots, n$, $j = 1, \dots, m_i$.

In this paper we consider a particular case of the model with replications.
We prove that replicating only one variable, for example $Y_i$, enables us to construct consistent estimators of the unknown parameters of the linear relation.
We discuss the model
(3) $X_i = s_i + \varepsilon_i$, $Y_{ij} = a s_i + b + \delta_{ij}$, $i = 1, \dots, n$, $j = 1, \dots, m$.
The variables $X_i$, $Y_{ij}$ are the observables, the $s_i$ are unknown constants, and $\varepsilon_i$, $\delta_{ij}$ are assumed to be independent and normally distributed with mean zero and unknown variances $\sigma_\varepsilon^2$ and $\sigma_\delta^2$:
$$\varepsilon_i \sim N(0, \sigma_\varepsilon^2), \qquad \delta_{ij} \sim N(0, \sigma_\delta^2).$$
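To fix ideas, here is a minimal simulation sketch of model (3) in Python; all numerical values ($n$, $m$, $a$, $b$ and the error scales) are illustrative assumptions, not quantities taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 50, 5                    # number of pairs and replicates (illustrative)
a, b = 2.0, 1.0                 # regression parameters (assumed values)
sig_eps, sig_delta = 0.3, 0.5   # error standard deviations (assumed values)

s = rng.uniform(0.0, 10.0, n)                  # the unknown constants s_i
X = s + rng.normal(0.0, sig_eps, n)            # X_i: each observed once
Y = (a * s + b)[:, None] + rng.normal(0.0, sig_delta, (n, m))  # Y_ij: m replicates
```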
To construct consistent estimators of the unknown parameters we present two methods: the maximum likelihood method and a method based on variance components theory (Czapkiewicz 1999). We compare the two methods through their mean squared errors.
2. Maximum likelihood method
2.1. Methodology. We can express the observations $X_i$, $Y_{ij}$ in (3) as $z_i = [X_i, Y_{i1}, \dots, Y_{im}]'$, $i = 1, \dots, n$.
The independent random vectors $z_i$ have means depending on $i$,
$$\mu_i = [s_i, a s_i + b, \dots, a s_i + b]',$$
and a common $(m+1) \times (m+1)$ covariance matrix
$$\Sigma = \begin{pmatrix} \sigma_\varepsilon^2 & 0 & \dots & 0 \\ 0 & \sigma_\delta^2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \sigma_\delta^2 \end{pmatrix}.$$
The log-likelihood function has the form
$$L(\theta) = \mathrm{const} - n \ln \sigma_\varepsilon - nm \ln \sigma_\delta - \frac{1}{2}\Big(\sum_{i=1}^{n} \frac{(X_i - s_i)^2}{\sigma_\varepsilon^2} + \sum_{i=1}^{n}\sum_{j=1}^{m} \frac{(Y_{ij} - a s_i - b)^2}{\sigma_\delta^2}\Big),$$
where $L(\theta) = L(a, s_1, \dots, s_n, b, \sigma_\varepsilon, \sigma_\delta)$. Solving the log-likelihood equations is not easy. Cox (1976) gives the solutions for model (2), where both $X_i$ and $Y_i$ are replicated $m$ times. When we assume that $X_{ij} = X_i$ for each $j$ in Cox's model, we can use his solutions for our purposes.
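For reference, this log-likelihood translates directly into code. A minimal sketch (the argument layout of `theta` and the function name are our own conventions):

```python
import numpy as np

def loglik(theta, X, Y):
    """L(theta) up to an additive constant, for X of shape (n,) and Y of shape (n, m).
    theta = (a, s, b, sig_eps, sig_delta), where s holds the constants s_i."""
    a, s, b, sig_eps, sig_delta = theta
    n, m = Y.shape
    return (-n * np.log(sig_eps) - n * m * np.log(sig_delta)
            - ((X - s) ** 2).sum() / (2 * sig_eps ** 2)
            - ((Y - (a * s + b)[:, None]) ** 2).sum() / (2 * sig_delta ** 2))
```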
To write down the estimators, set
$$s_{yy} = \frac{1}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m} (Y_{ij} - \overline{Y}_{i.})^2, \qquad b_{yy} = \frac{1}{n}\sum_{i=1}^{n} (\overline{Y}_{i.} - \overline{Y})^2,$$
$$b_{xx} = \frac{1}{n}\sum_{i=1}^{n} (X_i - \overline{X})^2, \qquad b_{xy} = \frac{1}{n}\sum_{i=1}^{n} (X_i - \overline{X})(\overline{Y}_{i.} - \overline{Y}),$$
and
$$B(a) = b_{yy} - 2a b_{xy} + a^2 b_{xx}.$$
Solving the likelihood equations we get the estimators in terms of $a$:

(4)
$$\hat b = \overline{Y} - a\overline{X},$$
$$\hat\sigma_\varepsilon^2 = (a b_{xx} - b_{xy})^2/B(a), \qquad \hat\sigma_\delta^2 = s_{yy} + (b_{yy} - a b_{xy})^2/B(a),$$
$$\hat s_i = [(a b_{xx} - b_{xy})(\overline{Y}_{i.} - \overline{Y} + a\overline{X}) + (b_{yy} - a b_{xy}) X_i]/B(a).$$

(The within-replicate variance of the $X_{ij}$, which appears in Cox's formula for $\hat\sigma_\varepsilon^2$, vanishes here, since $X_{ij} = X_i$ for each $j$.)
But to get an estimator of $a$ we must solve an equation of the fourth degree in $a$:

(5) $-s_{yy}(a b_{xx} - b_{xy})B(a) - (b_{yy} - a b_{xy})(a b_{xx} - b_{xy})(b_{yy} - a^2 b_{xx}) = 0.$
When $m > 2$, we solve (5) numerically and then check whether the absolute maximum of the likelihood has been found.
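As an illustration of this step, the following NumPy sketch computes the statistics of Section 2.1, expands the left-hand side of (5) by polynomial arithmetic, and among the real roots keeps the one at which the log-likelihood, evaluated at the estimates (4), is largest. The function name and the tolerance for discarding complex roots are our own choices.

```python
import numpy as np

def fit_ml(X, Y):
    """Estimate (a, b) in model (3) from X of shape (n,) and Y of shape (n, m)."""
    n, m = Y.shape
    Yi = Y.mean(axis=1)                       # the means of Y over replicates
    Xbar, Ybar = X.mean(), Yi.mean()
    s_yy = ((Y - Yi[:, None]) ** 2).sum() / (n * m)
    b_yy = ((Yi - Ybar) ** 2).mean()
    b_xx = ((X - Xbar) ** 2).mean()
    b_xy = ((X - Xbar) * (Yi - Ybar)).mean()

    p = np.poly1d([b_xx, -b_xy])              # a*b_xx - b_xy
    B = np.poly1d([b_xx, -2 * b_xy, b_yy])    # B(a)
    q = np.poly1d([-b_xy, b_yy])              # b_yy - a*b_xy
    r = np.poly1d([-b_xx, 0.0, b_yy])         # b_yy - a^2*b_xx
    quartic = -s_yy * (p * B) - q * p * r     # left-hand side of (5)

    best = None
    for a in quartic.roots[np.abs(quartic.roots.imag) < 1e-8].real:
        Ba = B(a)
        bh = Ybar - a * Xbar                              # b hat, from (4)
        ve = (a * b_xx - b_xy) ** 2 / Ba                  # sigma_eps^2 hat
        vd = s_yy + (b_yy - a * b_xy) ** 2 / Ba           # sigma_delta^2 hat
        if ve <= 0 or vd <= 0:
            continue
        sh = ((a * b_xx - b_xy) * (Yi - Ybar + a * Xbar)
              + (b_yy - a * b_xy) * X) / Ba               # s_i hat
        ll = (-n * np.log(ve) / 2 - n * m * np.log(vd) / 2
              - ((X - sh) ** 2).sum() / (2 * ve)
              - ((Y - (a * sh + bh)[:, None]) ** 2).sum() / (2 * vd))
        if best is None or ll > best[2]:
            best = (a, bh, ll)
    return best                               # (a hat, b hat, max log-likelihood)
```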
2.2. Asymptotic behaviour of maximum likelihood estimators. In this section we study the asymptotic properties of the maximum likelihood estimators in the model discussed in the previous section. The random vectors $z_i$ are independent and normal, but not identically distributed: the expectations of their distributions depend on $i$. Moreover, the number of unknown parameters increases with $n$.
Assume that the $s_i$, $i = 1, \dots, n$, belong to a bounded set as $n$ tends to infinity and that the following two limits exist:
$$\lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} s_i \quad\text{and}\quad \lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} s_i^2.$$
Then we can prove:
Lemma 1. When $n \to \infty$ and $m \to \infty$, the solutions of the likelihood equations give strongly consistent estimators of the unknown parameters $a$, $b$, $\sigma_\delta$, $\sigma_\varepsilon$. For sufficiently large $n$ and $m$, the covariance matrix of the vector

(6) $[\hat a - a, \hat s_1 - s_1, \dots, \hat s_n - s_n, \hat b - b, \hat\sigma_\varepsilon - \sigma_\varepsilon, \hat\sigma_\delta - \sigma_\delta]$

can be approximated by

(7) $\Big[-E\Big(\dfrac{\partial^2}{\partial\xi\,\partial\varphi} L(\theta)\Big)\Big]^{-1},$

where $\xi$, $\varphi$ range over the set of unknown parameters.
This lemma may be proved by a method analogous to that described in Lehmann's monograph (1983, p. 404, Th. 4.1). We thus obtain the following asymptotic variances of the regression parameter estimators:
Theorem 1. When $n$ and $m$ are large, the asymptotic variances of $\hat a$ and $\hat b$, $\mathrm{avar}(\hat a)$ and $\mathrm{avar}(\hat b)$, are
$$\mathrm{avar}(\hat a) = \frac{m a^2 \sigma_\varepsilon^2 + \sigma_\delta^2}{m \sum_{i=1}^{n} (s_i - \bar s)^2}, \leqno(8)$$
$$\mathrm{avar}(\hat b) = \frac{m a^2 \sigma_\varepsilon^2 + \sigma_\delta^2}{mn} \cdot \frac{\sum_{i=1}^{n} s_i^2}{\sum_{i=1}^{n} (s_i - \bar s)^2}. \leqno(9)$$
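Formulas (8) and (9) transcribe directly into code; a small hypothetical helper (in practice the $s_i$ are unknown and would have to be replaced by estimates):

```python
import numpy as np

def avar_ml(s, m, a, sig_eps, sig_delta):
    """Asymptotic variances (8) and (9) of a-hat and b-hat, given the s_i."""
    n = len(s)
    num = m * a ** 2 * sig_eps ** 2 + sig_delta ** 2
    css = ((s - s.mean()) ** 2).sum()          # sum of (s_i - s-bar)^2
    return num / (m * css), num / (m * n) * (s ** 2).sum() / css
```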
Proof. To obtain the formula for $\mathrm{avar}(\hat a)$, let us calculate $\partial^2 L/(\partial\xi\,\partial\varphi)$ for $\xi, \varphi \in \{a, s_1, \dots, s_n, b, \sigma_\varepsilon, \sigma_\delta\}$. The matrix (7) has the form $\Theta_n^{-1}$, where
$$\Theta_n = \begin{pmatrix}
\frac{m}{\sigma_\delta^2}\sum s_i^2 & \frac{ma}{\sigma_\delta^2}\,s' & \frac{m}{\sigma_\delta^2}\sum s_i & 0 & 0 \\
\frac{ma}{\sigma_\delta^2}\,s & \frac{m a^2 \sigma_\varepsilon^2 + \sigma_\delta^2}{\sigma_\varepsilon^2 \sigma_\delta^2} I_n & \frac{am}{\sigma_\delta^2}\mathbf{1}_n & 0 & 0 \\
\frac{m}{\sigma_\delta^2}\sum s_i & \frac{am}{\sigma_\delta^2}\mathbf{1}_n' & \frac{mn}{\sigma_\delta^2} & 0 & 0 \\
0 & 0' & 0 & \frac{2n}{\sigma_\varepsilon^2} & 0 \\
0 & 0' & 0 & 0 & \frac{2nm}{\sigma_\delta^2}
\end{pmatrix},$$
with $s = [s_1, \dots, s_n]'$ and $\mathbf{1}_n$ the vector of $n$ ones.
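Since the proof continues beyond this excerpt, a quick numerical cross-check of (8) may be useful: assembling $\Theta_n$ exactly as displayed and inverting it, the entry of $\Theta_n^{-1}$ corresponding to $a$ reproduces formula (8). A NumPy sketch with illustrative values (the helper name and the parameter values are ours):

```python
import numpy as np

def theta_n(s, m, a, sig_eps, sig_delta):
    """Theta_n for the parameter order (a, s_1, ..., s_n, b, sig_eps, sig_delta)."""
    n = len(s)
    ve, vd = sig_eps ** 2, sig_delta ** 2
    T = np.zeros((n + 4, n + 4))
    T[0, 0] = m * (s ** 2).sum() / vd
    T[0, 1:n + 1] = T[1:n + 1, 0] = m * a * s / vd
    T[0, n + 1] = T[n + 1, 0] = m * s.sum() / vd
    T[1:n + 1, 1:n + 1] = (m * a ** 2 * ve + vd) / (ve * vd) * np.eye(n)
    T[1:n + 1, n + 1] = T[n + 1, 1:n + 1] = a * m / vd
    T[n + 1, n + 1] = m * n / vd
    T[n + 2, n + 2] = 2 * n / ve
    T[n + 3, n + 3] = 2 * n * m / vd
    return T

s = np.linspace(0.0, 10.0, 50)                # illustrative s_i
m, a, sig_eps, sig_delta = 5, 2.0, 0.3, 0.5   # illustrative parameter values
cov = np.linalg.inv(theta_n(s, m, a, sig_eps, sig_delta))
lhs = cov[0, 0]                               # avar(a-hat) read off Theta_n^{-1}
rhs = (m * a ** 2 * sig_eps ** 2 + sig_delta ** 2) / (m * ((s - s.mean()) ** 2).sum())
print(lhs, rhs)                               # agree up to floating-point error
```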