DOI: 10.2478/v10006-007-0017-0
ITERATIVE ESTIMATORS OF PARAMETERS IN LINEAR MODELS WITH PARTIALLY VARIANT COEFFICIENTS

SHAOLIN HU*,**, KARL MEINKE**, RUSHAN CHEN*, OUYANG HUAJIANG***

* Nanjing University of Science and Technology, 210071 Nanjing, China
e-mail: hshaolin@ustc.edu.cn

** Department of Computer Science, Royal Institute of Technology, 100-44 Stockholm, Sweden
e-mail: karlm@csc.kth.se

*** University of Liverpool, Liverpool L69 3GH, United Kingdom
e-mail: h.ouyang@liverpool.ac.uk
A new kind of linear model with partially variant coefficients is proposed, and a series of iterative algorithms is introduced and verified. The new generalized linear model includes the ordinary linear regression model as a special case. The iterative algorithms efficiently overcome the computational difficulties caused by multidimensional inputs and a continually growing set of parameters. An important application is described at the end of this article, which shows that this new model is reasonable and applicable in practical fields.
Keywords: linear model, parameter estimation, iterative algorithms, variant coefficients
1. Introduction
Over the last decades, many statisticians and mathematicians have considered the following kind of linear regression model:

  Y_i = B_i \beta + \varepsilon_i,    (1)

where Y_i \in R^p, B_i \in R^{p \times r} (i = 1, 2, \dots, n), the vector \beta \in R^r is a constant parameter vector to be estimated, and \varepsilon_i \in R^p are errors from measurements or stochastic noise from disturbances.
Some excellent theories and practical results were published for statistical inference and stochastic decisions under this model. This model was also successfully used in many different kinds of practical fields (see Draper and Smith, 1981; Frank and Harrell, 2002; Graybill and Iyer, 1994; Hu and Sun, 2001).
Further research into this model shows that the restriction to constant coefficients in (1) is severe and critical. In other words, there are practical situations in which this linear model cannot be applied (see Brown, 1964; Hu and Sun, 2001). Although statisticians and researchers (see Fahrmeier and Tutz, 2001; Dodge and Kova, 2000) have done a lot to generalize and/or to adapt the linear model (1), the restriction to constant coefficients has not been essentially removed.
In order to overcome this limitation on the model structure, we set up a new linear model with partially variant coefficients as follows:
  Y_i = A_i X_i + B_i \beta + \varepsilon_i,    (2)

where Y_i \in R^p, A_i \in R^{p \times q}, B_i \in R^{p \times r}, and {X_i \in R^q} is a variant vector series. Generally, the dimension p of the measurement output must be greater than the dimension q of the variant coefficient, i.e., p > q, so as to ensure that the structure of the time-variant multidimensional linear system is identifiable.
Obviously, the ordinary linear regression model (1) is just a special case of the generalized model (2). If there are no time-variant parts, namely q = 0, the model (2) degenerates to the ordinary linear model (1).
In Section 2, much attention is paid to estimating all the model coefficients in (2) under the Gauss-Markov assumptions (Kala and Kłaczyński, 1988). A series of iterative algorithms is built to estimate the coefficients, which include the constant parameter vector \beta \in R^r and the variant vector series {X_i \in R^q}. In Section 3, a practical application is described, and computational results show that this new model is valuable.
2. Iterative Estimators of Variant Coefficients
In order to make the results of this section universal, we first assume that the coefficient series {X_i \in R^q} are not related at different sampling points. For simplicity, we adopt the notation of stacked vectors \Phi_n = (\beta^\tau, X_1^\tau, \dots, X_n^\tau)^\tau \in R^{r+nq}, where the superscript \tau denotes the matrix transpose. Moreover,

  H_n = \begin{pmatrix} B_1 & A_1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ B_n & 0 & \cdots & A_n \end{pmatrix} \in R^{np \times (r+nq)}.    (3)
Under the following famous Gauss-Markov assumptions (Kala and Kłaczyński, 1988; Rencher, 2000) on the random errors {\varepsilon_i \in R^p}:

(i) each error \varepsilon_i has expected value 0,

(ii) the error series {\varepsilon_i, i = 1, 2, \dots} are uncorrelated, and

(iii) the error series {\varepsilon_i, i = 1, 2, \dots} are homoscedastic, i.e., they all have the same variance,

the least-squares (LS) estimators of the coefficients in the model (2) can be expressed as follows:

  \hat{\Phi}_n^{LS(n)} = \arg\min_{\beta \in R^r, X_i \in R^q} \sum_{i=1}^{n} \| Y_i - (A_i X_i + B_i \beta) \|^2.    (4)
We can directly deduce a compact formula, which is very similar to the LS estimator of the model (1). The compact formula of the LS estimator of the coefficients in the model (2) is as follows:
  \hat{\Phi}_n^{LS(n)} = (H_n^\tau H_n)^{-1} H_n^\tau \bar{Y}_n,    (5)

where \bar{Y}_n = (Y_1^\tau, \dots, Y_n^\tau)^\tau \in R^{np}.
In order to guarantee that the matrix H_n^\tau H_n is invertible, the dimensions of the model (2) must satisfy the requirement p > q and the sample cardinality must satisfy n > r/(p - q). Otherwise, the matrix inversion in (5) must be replaced by the pseudo-inverse operator "+", namely, (H_n^\tau H_n)^+.
Theorem 1. Assume that the number of sampling points satisfies n > r/(p - q). Then the LS estimators of the coefficients in the model (2) can be computed iteratively by

  \begin{cases}
  \hat{\beta}^{LS(n+1)} = \hat{\beta}^{LS(n)} + (L_n + B_{n+1}^\tau B_{n+1})^{-1} B_{n+1}^\tau (R_{n+1}^{-1} - \Xi_{n+1}) R_{n+1} (Y_{n+1} - B_{n+1} \hat{\beta}^{LS(n)}), \\
  \hat{X}_i^{LS(n+1)} = \hat{X}_i^{LS(n)} + (A_i^\tau A_i)^{-1} A_i^\tau B_i (\hat{\beta}^{LS(n)} - \hat{\beta}^{LS(n+1)}), \quad i = 1, \dots, n, \\
  \hat{X}_{n+1}^{LS(n+1)} = (A_{n+1}^\tau R_{n+1} A_{n+1})^{-1} A_{n+1}^\tau R_{n+1} (Y_{n+1} - B_{n+1} \hat{\beta}^{LS(n)}),
  \end{cases}    (6)
where

  L_n = \sum_{i=1}^{n} B_i^\tau [I - A_i (A_i^\tau A_i)^{-1} A_i^\tau] B_i \in R^{r \times r},

  \Xi_{n+1} = A_{n+1} (A_{n+1}^\tau R_{n+1} A_{n+1})^{-1} A_{n+1}^\tau \in R^{p \times p},

  R_{n+1} = I - B_{n+1} (L_n + B_{n+1}^\tau B_{n+1})^{-1} B_{n+1}^\tau \in R^{p \times p}.

Here the superscript in \hat{X}_i^{LS(n)} denotes the LS estimator of X_i, i = 1, 2, \dots, n, based on the first n samples.
The proof of Theorem 1 is given in the Appendix.
Obviously, the algorithm (6) is iterative and well suited to practical engineering applications. Some obvious advantages are the following:
• \hat{\beta}^{LS(n+1)} is a linear combination of the estimator \hat{\beta}^{LS(n)} and the innovation Y_{n+1} - B_{n+1} \hat{\beta}^{LS(n)} from the new sampling datum.

• \hat{\beta}^{LS(n+1)} can be computed from the estimator \hat{\beta}^{LS(n)} without directly involving the old sampling data {Y_i, i = 1, \dots, n} or the estimates {\hat{X}_i^{LS(n)}}.

• It is reasonable that the estimator \hat{X}_{n+1}^{LS(n+1)} is determined by the innovation Y_{n+1} - B_{n+1} \hat{\beta}^{LS(n)}.

• The estimate \hat{X}_i^{LS(n+1)} of X_i (i \le n) can be adjusted accurately in succession using the estimation error of the constant parameter vector \beta.
In order to use these iterative algorithms to effectively solve practical problems, the initial estimates for the iterative algorithms (6) must be carefully selected. Generally, the initial estimators can be chosen as the LS estimators computed in batch as follows:
  \hat{\Phi}_{n_0}^{LS(n_0)} = (H_{n_0}^\tau H_{n_0})^{-1} H_{n_0}^\tau \bar{Y}_{n_0},    (7)

where n_0 \in N must satisfy the constraint n_0 > r/(p - q). If the disturbance {\varepsilon_n \in R^p, n \le n_0} is a stationary Gaussian white noise process with zero mean, then it can easily be shown that the ordinary LS estimators given by (4) are unbiased.
Theorem 2. Given the LS estimators (7) as an initial estimate of the coefficients of (2), if the disturbance {\varepsilon_n \in R^p, n \in N} is a stationary Gaussian white noise process with zero mean, then the iterative estimators (6) are unbiased.
Proof. We just have to show that E\hat{\beta}^{LS(n)} = \beta. We get

  E\{\hat{\beta}^{LS(n)}\} = E\{\hat{\beta}^{LS(n-1)}\} + \tilde{L}_n^{-1} B_n^\tau [R_n^{-1} - \Xi_n] R_n (A_n X_n + B_n (\beta - E\{\hat{\beta}^{LS(n-1)}\}))
                          = \beta + \tilde{L}_n^{-1} B_n^\tau [I - A_n (A_n^\tau R_n A_n)^{-1} A_n^\tau R_n] A_n X_n = \beta,    (8)

where

  \tilde{L}_n = \sum_{i=1}^{n-1} B_i^\tau [I - A_i (A_i^\tau A_i)^{-1} A_i^\tau] B_i + B_n^\tau B_n.

Using (6), we get

  E\{\hat{X}_i^{LS(n)}\} = E\{\hat{X}_i^{LS(n-1)}\} + (A_i^\tau A_i)^{-1} A_i^\tau B_i (E\{\hat{\beta}^{LS(n-1)}\} - E\{\hat{\beta}^{LS(n)}\})
                        = X_i + (A_i^\tau A_i)^{-1} A_i^\tau B_i (\beta - \beta) = X_i, \quad i = 1, 2, \dots, n-1,    (9)

  E\{\hat{X}_n^{LS(n)}\} = (A_n^\tau R_n A_n)^{-1} A_n^\tau R_n (E\{Y_n\} - B_n E\{\hat{\beta}^{LS(n-1)}\})
                        = (A_n^\tau R_n A_n)^{-1} A_n^\tau R_n A_n X_n = X_n.    (10)

Mathematical induction is implicitly used in the above proof, which justifies its correctness.
3. Applications
The new linear model (2) can be widely used in many different fields, e.g., in data fusion, in modeling and identifying a computer-controlled system, in signal processing, in spacecraft control engineering, etc. In this section, we present an application of the model (2) to the trajectory of a rocket.
Suppose that there are m transducers suitably located at different sites. These m devices are simultaneously used to track a carrier rocket M in space. Using these transducers, we get a series of measurement data {(A_j(t_i), E_j(t_i)) | i = 1, 2, \dots, n; j = 1, 2, \dots, m}, where A_j(t_i) denotes the azimuth and E_j(t_i) the elevation of the rocket M at time t_i with respect to a reference frame fixed at the center of Transducer j.
In order to simplify the expressions below, we use the shorthand notation A_{ij} = A_j(t_i), E_{ij} = E_j(t_i), etc. Accordingly, the error decomposition models used in determining the location of the spacecraft M can be set up as follows (Brown, 1964; Hu and Sun, 2001):
  \begin{cases}
  A_{ij} = \tan^{-1}\dfrac{x - x_{0j}}{y - y_{0j}} + \alpha_{j1} + \alpha_{j3} \tan(E_{ij}) \sin(A_{ij}) + \alpha_{j4} \tan(E_{ij}) \cos(A_{ij}) + \alpha_{j5} \tan(E_{ij}) + \alpha_{j6} \sec(E_{ij}) + \varepsilon_{A_{ij}}, \\
  E_{ij} = \tan^{-1}\dfrac{z - z_{0j}}{[(x - x_{0j})^2 + (y - y_{0j})^2]^{1/2}} + \alpha_{j2} + \alpha_{j3} \cos(A_{ij}) - \alpha_{j4} \sin(A_{ij}) + \varepsilon_{E_{ij}},
  \end{cases}    (11)

where the coefficients (\alpha_{j1}, \alpha_{j2}) are the non-zero (bias) errors of Transducer j in measuring the azimuth and elevation of the spacecraft, the coefficients (\alpha_{j3}, \dots, \alpha_{j6}) are non-orthogonality coefficients representing measurement errors arising from departures from right angles between each pair of axes in the measurement equipment (the mechanical axis, the laser axis and the electrical axis), and (\varepsilon_A, \varepsilon_E) are stochastic errors included in the measurement data.
Assuming that we get a series of imprecise location data P_i^* = (x_i^*, y_i^*, z_i^*) for the spacecraft M at different sampling times t_i (i = 1, 2, \dots), what we want to do is to estimate all of the instrument error coefficients as well as the precise location of the spacecraft M.
According to the geometrical relationship between the coordinates and the measurement data from the radars, two functions are defined as follows:

  f_j(x, y, z) = \tan^{-1}\dfrac{x - x_{0j}}{y - y_{0j}},
  g_j(x, y, z) = \tan^{-1}\dfrac{z - z_{0j}}{[(x - x_{0j})^2 + (y - y_{0j})^2]^{1/2}},

and the design matrix is

  \Theta_{ij} = \begin{pmatrix} 1 & 0 & \tan(E_{ij})\sin(A_{ij}) & \tan(E_{ij})\cos(A_{ij}) & \tan(E_{ij}) & \sec(E_{ij}) \\ 0 & 1 & \cos(A_{ij}) & -\sin(A_{ij}) & 0 & 0 \end{pmatrix}.
Then we get the following linear model:

  \begin{pmatrix} \Delta\tilde{A}_{ij} \\ \Delta\tilde{E}_{ij} \end{pmatrix} = J_j(P_i)\big|_{P = P_i^*} \begin{pmatrix} \Delta x_i \\ \Delta y_i \\ \Delta z_i \end{pmatrix} + \Theta_{ij} \begin{pmatrix} \alpha_{j1} \\ \vdots \\ \alpha_{j6} \end{pmatrix} + \begin{pmatrix} \varepsilon_{A_{ij}} \\ \varepsilon_{E_{ij}} \end{pmatrix},    (12)

for j = 1, \dots, m and i = 1, 2, \dots, where

  \Delta\tilde{A}_{ij} = A_{ij} - f_j(P_i^*), \quad \Delta\tilde{E}_{ij} = E_{ij} - g_j(P_i^*), \quad J_j(P) = \frac{\partial(f_j, g_j)}{\partial(x, y, z)}.
Integrating all m instruments, we get the following integrated error decomposition model:

  \begin{pmatrix} \Delta\tilde{A}_{i1} \\ \Delta\tilde{E}_{i1} \\ \vdots \\ \Delta\tilde{A}_{im} \\ \Delta\tilde{E}_{im} \end{pmatrix} = \begin{pmatrix} J_1(P) \\ \vdots \\ J_m(P) \end{pmatrix}\bigg|_{P = P_i^*} \begin{pmatrix} \Delta x_i \\ \Delta y_i \\ \Delta z_i \end{pmatrix} + \bar{B}_i \begin{pmatrix} \alpha_{11} \\ \vdots \\ \alpha_{m6} \end{pmatrix} + \begin{pmatrix} \varepsilon_{A_{i1}} \\ \varepsilon_{E_{i1}} \\ \vdots \\ \varepsilon_{A_{im}} \\ \varepsilon_{E_{im}} \end{pmatrix}, \quad i = 1, 2, \dots,    (13)

where \bar{B}_i = diag{\Theta_{i1}, \dots, \Theta_{im}}.
Obviously, the model (13) is very similar to the linear model (2) with partially variant parameters. Consequently, we can use the iterative algorithm (6) to calibrate the error coefficients of the transducers and, at the same time, to accurately determine the rocket trajectory.
In our simulations, four transducers track a rocket in space. Selecting the computation parameter n_0 = 100 [s], we use (6) to get the modification values. Table 1 contains the values of the error coefficients estimated at 110 seconds. Table 2 includes the increments in the rocket trajectories after n_0 = 100 [s].
The computation results given in Tables 1 and 2 show that the iterative algorithms given in Section 2 not only decrease the computation time, but also efficiently improve the precision of the rocket trajectory. What is more, this practical application shows that this new kind of linear model with variant coefficients is reasonable and valuable not only in theory, but also in various engineering fields.
Table 1. Estimation of error coefficients [mrad].

Transducer   α_j1      α_j2      α_j3      α_j4      α_j5      α_j6
j = 1         1.203     0.686     0.018    −0.006     0.006     0.003
j = 2        −0.058    −0.070    −0.000     0.000    −0.000     0.001
j = 3         1.087    −1.837     0.041    −0.016    −0.019    −0.004
j = 4        −0.514     1.449    −0.024     0.009     0.015     0.001
Table 2. Increments in values of trajectories [m].

i = 100+    Δx_i         Δy_i         Δz_i
1          −0.662445     0.631514     0.001622
2          −0.763551     0.687524    −0.006546
3          −0.760541     0.677017    −0.012729
4          −0.752472     0.673932    −0.017189
5          −0.793005     0.699980    −0.023827
6           0.835997     0.730210    −0.029760
7          −0.832480     0.731190    −0.036693
8          −0.802443     0.710644    −0.037840
9          −0.739471     0.661947    −0.036633
10         −0.739483     0.660732    −0.038780
4. Discussion
The paper not only presents a new kind of linear model, but also builds a series of convenient algorithms. This new model usefully generalizes the widely used ordinary linear regression model. The new model can be used in many different kinds of fields, e.g., in data fusion, process monitoring, control engineering, etc.
As for the new algorithms, their advantages are evident. Obviously, if we use the old LS algorithm (5), we must compute the inverse of a large matrix (H_n^\tau H_n)^{-1} \in R^{(r+nq) \times (r+nq)}. What is more, its dimension continually increases as the number n of samples grows or the process moves on. On the other hand, if we use the new iterative algorithm (6), we just need to deal with a series of lower-dimensional inverse matrices, the highest dimension of which is equal to max{p, q, r}. In fact, the iterative algorithm (6) involves three inverse matrices: (A_i^\tau A_i)^{-1} \in R^{q \times q}, R_{n+1}^{-1} \in R^{p \times p} and (L_n + B_{n+1}^\tau B_{n+1})^{-1} \in R^{r \times r}.
Acknowledgments

We gratefully acknowledge the partial financial support from the National Natural Science Foundation of China (NSFC 90305007), the SI Project (SI-210-05483) of the Swedish Institute, the Jiangsu Natural Science Fund (BK-06200) and the NSFC-RS Joint Project (NSFC-RS/0510881-207043).
References

Brown D.C. (1964): The Error Model Best Estimation Trajectory. — Tech. Rep. AD 602799, http://stinet.dtic.mil/oai/oai?&verb=getRecorg&metadataPrefix=html&identifier=AD0602799

Dodge Y. and Kova J. (2000): Adaptive Regression. — Berlin: Springer.

Draper N.R. and Smith H. (1981): Applied Regression Analysis. — New York: Wiley.

Eubank R., Chunfeng H., Maldonado Y., Naisyin W., Suojin W. and Buchanan R.J. (2004): Smoothing spline estimation in varying coefficient models. — J. Roy. Stat. Soc. B, Vol. 66, No. 3, pp. 653–667.

Fahrmeier L. and Tutz G. (2001): Multivariate Statistical Modeling Based on Generalized Linear Models. — Berlin: Springer.

Frank E. and Harrell J. (2002): Regression Modeling Strategies with Applications to Linear Models, Logistic Regression, and Survival Analysis. — New York: Springer.

Graybill F.A. and Iyer H.K. (1994): Regression Analysis: Concepts and Applications. — Massachusetts: Duxbury Press.

Hu Shaolin and Sun Guoji (2001): Process Monitoring Technique and Applications. — Beijing: National Defense Industry Press.

Kala R. and Kłaczyński K. (1988): Recursive improvement of estimates in a Gauss-Markov model with linear restrictions. — Canad. J. Stat., Vol. 16, No. 3, pp. 301–305.

Rencher A. (2000): Linear Models in Statistics. — New York: Wiley.
Appendix

In order to prove Theorem 1, we recall two lemmas without proofs. They are fundamental in linear algebra.

Lemma 1. If a block matrix A and its submatrix A_{11} are invertible, then we have

  \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}^{-1} = \begin{pmatrix} A_{11}^{-1} & 0 \\ 0 & 0 \end{pmatrix} + \begin{pmatrix} A_{11}^{-1} A_{12} \\ -I \end{pmatrix} (A_{22} - A_{21} A_{11}^{-1} A_{12})^{-1} \begin{pmatrix} A_{21} A_{11}^{-1} & -I \end{pmatrix}.    (A1)

Similarly, if the matrix A and its submatrix A_{22} are invertible, then we have

  \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}^{-1} = \begin{pmatrix} 0 & 0 \\ 0 & A_{22}^{-1} \end{pmatrix} + \begin{pmatrix} -I \\ A_{22}^{-1} A_{21} \end{pmatrix} (A_{11} - A_{12} A_{22}^{-1} A_{21})^{-1} \begin{pmatrix} -I & A_{12} A_{22}^{-1} \end{pmatrix}.    (A2)

Lemma 2. If the matrices F and G are invertible and the inverse matrix (F - H G^{-1} K)^{-1} exists, then

  (F - H G^{-1} K)^{-1} = F^{-1} + F^{-1} H (G - K F^{-1} H)^{-1} K F^{-1}.    (A3)

The proofs of these two lemmas can be found in the references (Draper and Smith, 1981).
Proof of Theorem 1. With the model (2) and n samples, Eqn. (5) shows that the LS estimator is

  \hat{\Phi}_n^{LS(n)} = (H_n^\tau H_n)^{-1} H_n^\tau \bar{Y}_n.

If another sampling datum

  Y_{n+1} = A_{n+1} X_{n+1} + B_{n+1} \beta + \varepsilon_{n+1}

is added to the sampling set, the LS estimators of all the coefficients in the model (2) must be changed in accordance with the following expressions:

  \hat{\Phi}_{n+1}^{LS(n+1)} = (\Psi_n^\tau \Psi_n)^{-1} \Psi_n^\tau \begin{pmatrix} \bar{Y}_n \\ Y_{n+1} \end{pmatrix} = \begin{pmatrix} D_{11} & C_{n+1}^\tau A_{n+1} \\ A_{n+1}^\tau C_{n+1} & A_{n+1}^\tau A_{n+1} \end{pmatrix}^{-1} \Psi_n^\tau \begin{pmatrix} \bar{Y}_n \\ Y_{n+1} \end{pmatrix},    (A4)

where

  \hat{\Phi}_{n+1}^{LS(n+1)} = \begin{pmatrix} \hat{\Phi}_n^{LS(n+1)} \\ \hat{X}_{n+1}^{LS(n+1)} \end{pmatrix}, \quad \Psi_n = \begin{pmatrix} H_n & 0 \\ C_{n+1} & A_{n+1} \end{pmatrix},

  C_{n+1} = (B_{n+1}, 0) \in R^{p \times (r+nq)}, \quad D_{11} = H_n^\tau H_n + C_{n+1}^\tau C_{n+1}.
Now, using the notation

  D_{22} = A_{n+1}^\tau A_{n+1}, \quad D_{12} = C_{n+1}^\tau A_{n+1} = D_{21}^\tau, \quad \Omega = D_{22} - D_{21} D_{11}^{-1} D_{12},
the following formula can be directly derived from Lemma 1:

  \begin{pmatrix} \hat{\Phi}_n^{LS(n+1)} \\ \hat{X}_{n+1}^{LS(n+1)} \end{pmatrix} = \left[ \begin{pmatrix} D_{11}^{-1} & 0 \\ 0 & 0 \end{pmatrix} + \begin{pmatrix} D_{11}^{-1} D_{12} \\ -I \end{pmatrix} \Omega^{-1} \begin{pmatrix} D_{21} D_{11}^{-1} & -I \end{pmatrix} \right] \begin{pmatrix} H_n^\tau \bar{Y}_n + C_{n+1}^\tau Y_{n+1} \\ A_{n+1}^\tau Y_{n+1} \end{pmatrix} = E_n H_n^\tau \bar{Y}_n + F_n Y_{n+1},    (A5)

where
  F_n = \begin{pmatrix} D_{11}^{-1} [C_{n+1}^\tau + D_{12} \Omega^{-1} D_{21} D_{11}^{-1} C_{n+1}^\tau - D_{12} \Omega^{-1} A_{n+1}^\tau] \\ -\Omega^{-1} [D_{21} D_{11}^{-1} C_{n+1}^\tau - A_{n+1}^\tau] \end{pmatrix}

and

  E_n = \begin{pmatrix} D_{11}^{-1} + D_{11}^{-1} D_{12} \Omega^{-1} D_{21} D_{11}^{-1} \\ -\Omega^{-1} D_{21} D_{11}^{-1} \end{pmatrix}.
Step 1. We analyze the expression E_n. From the expression for the block matrix D_{11} and Lemma 1 we have

  D_{11}^{-1} = \begin{pmatrix} \sum_{i=1}^{n+1} B_i^\tau B_i & B_1^\tau A_1 & \cdots & B_n^\tau A_n \\ A_1^\tau B_1 & A_1^\tau A_1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ A_n^\tau B_n & 0 & \cdots & A_n^\tau A_n \end{pmatrix}^{-1}.

It can be shown that the first block column (the first r columns) of the matrix D_{11}^{-1} can be expressed as follows:

  D_{11}^{-1} = \begin{pmatrix} \left( \sum_{i=1}^{n+1} B_i^\tau B_i - \sum_{i=1}^{n} B_i^\tau U_i A_i^\tau B_i \right)^{-1} & * \\ -T_n \left( \sum_{i=1}^{n+1} B_i^\tau B_i - \sum_{i=1}^{n} B_i^\tau U_i A_i^\tau B_i \right)^{-1} & * \end{pmatrix},    (A6)

where

  U_i = A_i (A_i^\tau A_i)^{-1}, \quad T_n = (B_1^\tau U_1, \dots, B_n^\tau U_n)^\tau,

and the asterisk '*' denotes an omitted matrix block which is rather complicated and does not affect the following deduction.
Analyzing the formulas for the matrix blocks D_{22} and D_{12} = D_{21}^\tau, we get

  D_{11}^{-1} + D_{11}^{-1} D_{12} \Omega^{-1} D_{21} D_{11}^{-1} = [I + D_{11}^{-1} C_{n+1}^\tau V_{n+1} C_{n+1}] D_{11}^{-1},    (A7)

where

  V_{n+1} = A_{n+1} (A_{n+1}^\tau A_{n+1} - A_{n+1}^\tau C_{n+1} D_{11}^{-1} C_{n+1}^\tau A_{n+1})^{-1} A_{n+1}^\tau,

and

  D_{11}^{-1} C_{n+1}^\tau V_{n+1} C_{n+1} = D_{11}^{-1} \begin{pmatrix} B_{n+1}^\tau A_{n+1} W_{n+1}^{-1} A_{n+1}^\tau B_{n+1} & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} \tilde{L}_{n+1}^{-1} B_{n+1}^\tau A_{n+1} W_{n+1}^{-1} A_{n+1}^\tau B_{n+1} & 0 \\ -T_n \tilde{L}_{n+1}^{-1} B_{n+1}^\tau A_{n+1} W_{n+1}^{-1} A_{n+1}^\tau B_{n+1} & 0 \end{pmatrix},    (A8)

where

  W_{n+1} = A_{n+1}^\tau (I - B_{n+1} \tilde{L}_{n+1}^{-1} B_{n+1}^\tau) A_{n+1},

  \tilde{L}_{n+1} = \sum_{i=1}^{n} B_i^\tau [I - U_i A_i^\tau] B_i + B_{n+1}^\tau B_{n+1}.
The matrix D_{11}^{-1} can also be expressed as follows:

  D_{11}^{-1} = (H_n^\tau H_n)^{-1} - (H_n^\tau H_n)^{-1} C_{n+1}^\tau [I + C_{n+1} (H_n^\tau H_n)^{-1} C_{n+1}^\tau]^{-1} C_{n+1} (H_n^\tau H_n)^{-1}.    (A9)

Next, using the notation
  L_n = \sum_{i=1}^{n} B_i^\tau [I - A_i (A_i^\tau A_i)^{-1} A_i^\tau] B_i,

we have

  (H_n^\tau H_n)^{-1} = \begin{pmatrix} \sum_{i=1}^{n} B_i^\tau B_i & B_1^\tau A_1 & \cdots & B_n^\tau A_n \\ A_1^\tau B_1 & A_1^\tau A_1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ A_n^\tau B_n & 0 & \cdots & A_n^\tau A_n \end{pmatrix}^{-1}