Academic year: 2021

Share "Decoupled Kalman filter based identification of time-varying FIR systems"

Copied!
10
0
0

Pełen tekst

(1)

Decoupled Kalman Filter Based Identification of Time-Varying FIR Systems

MARCIN CIOŁEK (Member, IEEE), MACIEJ NIEDŹWIECKI (Senior Member, IEEE), AND ARTUR GAŃCZA (Member, IEEE)

Department of Automatic Control, Faculty of Electronics, Telecommunications and Informatics, Gdańsk University of Technology, 80-233 Gdańsk, Poland Corresponding author: Marcin Ciołek (marciole@pg.edu.pl)

This work was supported in part by the National Science Center under Grant UMO-2018/29/B/ST7/00325.

ABSTRACT When system parameters vary at a fast rate, identification schemes based on model-free local estimation approaches do not yield satisfactory results. In cases like this, more sophisticated parameter tracking procedures must be used, based on explicit models of parameter variation (often referred to as hypermodels), either deterministic or stochastic. Kalman filter trackers, which belong to the second category, are seldom used in practice due to difficulties in adjusting their internal parameters such as the smoothness coefficient and the order of the hypermodel. The paper presents a new solution to this problem, based on the concept of preestimation of system parameters. The resulting identification algorithms, which can be characterized as decoupled Kalman trackers, are computationally attractive, easy to tune and can be optimized in an adaptive fashion using the parallel estimation approach. The decoupled KF algorithms can be regarded as an attractive alternative to the state-of-the-art algorithms which are much more computationally demanding.

INDEX TERMS Kalman filter, parallel estimation, preestimation of system parameters, system identification.

I. INTRODUCTION

Consider the problem of identification/tracking of a time-varying finite impulse response (FIR) system governed by

y(t) = ϕ^T(t) θ(t) + e(t)   (1)

where t = ..., −1, 0, 1, ... denotes discrete (normalized, i.e., dimensionless) time, y(t) denotes the system output, ϕ(t) = [u(t − 1), ..., u(t − n)]^T is the regression vector made up of past measurements of the observable (locally) wide sense stationary input signal {u(t)}, {e(t)} denotes white measurement noise, and θ(t) = [θ_1(t), ..., θ_n(t)]^T is the parameter vector made up of unknown time-varying system coefficients, independent of {u(t)} and {e(t)}.

Linear time-varying FIR models are used, among others, to describe rapidly fading mobile communication channels.

Their identification allows one to efficiently solve the channel equalization (inverse filtering) problem [1], [2] or to mitigate self-interference in full-duplex communication systems [3], [4]. The FIR structure describes well the so-called multi-path effect: due to scattering, the transmitted signal reaches the receiver along different paths, i.e., with different time delays; the values of the FIR coefficients depend on the strength of ''natural reflectors'' and their time variation is caused by the receiver motion [1], [5].

The associate editor coordinating the review of this manuscript and approving it for publication was Claudia Raibulet.

When system parameters vary slowly with time, their estimation can be successfully carried out using the local estimation approach, such as the method of weighted least squares (WLS) [6], [7]. Since local methods fail in the presence of fast parameter changes, in cases like this more sophisticated frameworks must be used, such as those incorporating explicit models (hypermodels) of parameter variation.

Hypermodels can be deterministic or stochastic. In the first case, the parameter trajectory is modeled as a linear combination of known functions of time, called basis functions (BF), such as powers of time (polynomial basis) or sine and cosine functions (harmonic basis). The BF approach can be traced back to the paper of Subba Rao [8], which was followed by a large number of contributions exploring different aspects of BF-based estimation [1]–[4], [7], [9]–[23].

This research covered different types of system/signal models, such as FIR [1]–[4], [22], [23], AR/ARX (autoregressive/autoregressive with exogenous inputs) [8], [18]–[21] and ARMA (autoregressive moving average) [12], [16], [19], and their various applications in telecommunications [1]–[4], biomedical signal analysis [18], [20], speech analysis [11] and diagnostics of mechanical systems [19]. It involved different functional bases, such as polynomial [3], [4], [8]–[11], [15], [16], [21], [22], harmonic [1], [2], prolate spheroidal [12] and wavelets [17], [20]. For FIR systems and a free choice of basis functions, some fairly general analytical results, describing both static and dynamic parameter tracking characteristics of BF algorithms, were presented in [7], [14], [15], [22] and [23].

In the case of stochastic hypermodels it is assumed that system parameters change in a random way, e.g., that their evolution can be described by a generalized random walk equation.

Then the problem of estimation of θ(t) can be expressed and solved as the problem of estimation of a state vector of an associated dynamical system. In such a setup parameter estimation can be carried out using appropriately designed Kalman filtering (KF) algorithms [24]–[31]. The available research results are focused on identification of ARX [24], [25] and FIR [31] systems, and AR signals [28]–[30], and on different approaches to tuning of KF trackers, such as the smoothness priors technique [27]–[29] or cross-validation [31]. Some interesting applications of KF-based identification algorithms can be found in [26], [29] and [30].

In spite of qualitative similarities, pointed out in Section II, the KF approach is less frequently used than the BF approach, mainly because of problems with tuning its design parameters. In the current contribution we propose decoupled versions of the KF-based identification algorithms, free of the drawback mentioned above. Decoupling means that identification is carried out independently for each coefficient of the analyzed system. Such a solution is possible owing to a new estimation paradigm for identification of nonstationary stochastic systems, based on the concept of preestimation.

Preestimates are very noisy but unbiased estimates of parameter trajectories. They can be used to ''X-ray'' the structure of system parameter variation without making any assumptions about its functional form or speed. Due to their large variability, preestimates must be postprocessed. We will use Kalman filters/smoothers for this purpose.

It will be shown that, unlike in the case of the classical KF-based parameter trackers, the estimation memory span of the proposed algorithm is data-independent and hence can be easily controlled by the user. It will also be shown that optimization of the tracking performance of the decoupled KF algorithm, i.e., tuning of its design parameters to the unknown and possibly time-varying form and rate of parameter changes, can be achieved by means of parallel estimation and cross-validation. Finally, we will derive a simplified, steady state version of the proposed identification algorithm with reduced computational load (depending linearly on the number of estimated parameters), which is a computationally attractive alternative to the current state-of-the-art.

II. CLASSICAL KALMAN FILTER BASED IDENTIFICATION ALGORITHMS

The integrated random walk (IRW) model of parameter variation is the most frequently used stochastic hypermodel [7].

The IRW model [25], [28] of order m is governed by

∇^m θ(t) = w(t)   (2)

where ∇^m θ(t) denotes the m-th order difference of θ(t):

∇^m θ(t) = (1 − q^{−1})^m θ(t) = Σ_{i=0}^{m} f_i θ(t − i),  f_i = (−1)^i C(m, i),  i = 0, ..., m

(C(m, i) denotes the binomial coefficient and q^{−1} is the backward shift operator), and {w(t)} denotes a zero-mean i.i.d. sequence, independent of {e(t)} and {ϕ(t)}. To reduce the number of degrees of freedom it is usually assumed that cov[w(t)] = σ_w² I, i.e., that the rate of variation is the same for all system coefficients. Generally, the larger the order m of the IRW model, the smoother the corresponding parameter trajectory.
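To make the hypermodel concrete, the sketch below (Python; function names and settings are illustrative, not taken from the paper) generates the difference coefficients f_i and simulates a scalar IRW(m) trajectory:

```python
import numpy as np
from math import comb

def irw_coeffs(m):
    # f_i = (-1)^i * binom(m, i), i = 0, ..., m, so that
    # sum_{i=0}^{m} f_i * theta(t - i) is the m-th order difference of theta.
    return [(-1) ** i * comb(m, i) for i in range(m + 1)]

def simulate_irw(m, sigma_w, T, rng):
    # Scalar IRW(m) trajectory: the m-th difference of theta equals w(t).
    f = irw_coeffs(m)
    theta = np.zeros(T + m)                    # m zero initial conditions
    w = rng.normal(0.0, sigma_w, T)
    for t in range(m, T + m):
        # theta(t) = -f_1 theta(t-1) - ... - f_m theta(t-m) + w(t)
        theta[t] = -sum(f[i] * theta[t - i] for i in range(1, m + 1)) + w[t - m]
    return theta[m:]

rng = np.random.default_rng(0)
traj = simulate_irw(2, 0.01, 500, rng)         # smooth, locally linear trajectory
```

With m = 2 the trajectory drifts in a locally linear fashion; increasing m yields visibly smoother realizations, in line with the remark above.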

Equations (1) and (2) can be put in the state space form [25], [28]

x(t) = F_m x(t − 1) + C_m w(t)
y(t) = ϕ^T(t) C_m^T x(t) + e(t)   (3)

where x(t) = [θ^T(t), θ^T(t − 1), ..., θ^T(t − m + 1)]^T is the mn × 1 state vector,

F_m = [ −f_1 I   −f_2 I   ...   −f_{m−1} I   −f_m I ]
      [    I        O     ...        O          O   ]
      [    :        :                :          :   ]
      [    O        O     ...        I          O   ]

is the mn × mn state transition matrix, and C_m = [I, O, ..., O]^T is the mn × n output matrix (I and O denote the n × n identity and null matrices, respectively).
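A small helper (a Python sketch, our own construction) that assembles F_m and C_m may clarify the block structure:

```python
import numpy as np
from math import comb

def state_matrices(m, n):
    # Build the mn x mn transition matrix F_m and the mn x n output
    # matrix C_m of the state space model (3) for an IRW(m) hypermodel.
    f = [(-1) ** i * comb(m, i) for i in range(m + 1)]
    I = np.eye(n)
    F = np.zeros((m * n, m * n))
    for i in range(1, m + 1):
        F[:n, (i - 1) * n:i * n] = -f[i] * I   # first block row: -f_1 I ... -f_m I
    F[n:, :-n] = np.eye((m - 1) * n)           # shifted identity blocks below
    C = np.zeros((m * n, n))
    C[:n, :] = I                               # C_m = [I, O, ..., O]^T
    return F, C

F, C = state_matrices(2, 1)   # m = 2, scalar system
```

For m = 2 and n = 1 this gives F = [[2, −1], [1, 0]], since f_1 = −2 and f_2 = 1.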

Since θ(t) = C_m^T x(t), the problem of estimation of θ(t) can be formulated and solved as the problem of estimation of the state vector x(t) of the dynamical system (3). The optimal, in the mean squared sense, estimate of x(t) based on the available observation history Ω(t) = {y(i), ϕ(i), i ≤ t} has the form x̂(t|t) = E[x(t)|Ω(t)]. Under Gaussian assumptions the conditional mean can be computed recursively using the Kalman filtering algorithm. Whenever noncausal estimation – based on the prerecorded data set Ω(N), containing both ''past'' and ''future'' measurements (relative to t) – is feasible, the minimum-variance estimate has the form x̂(t|N) = E[x(t)|Ω(N)] and can be evaluated using the algorithm known as the Kalman smoother. The corresponding causal and noncausal estimates of θ(t) have the form

θ̂(t|t) = C_m^T x̂(t|t),  θ̂(t|N) = C_m^T x̂(t|N).

It is well known [7] that the parameter tracking properties of the Kalman filtering/smoothing algorithms based on the hypermodel (2) depend on the variance quotient ξ = σ_w²/σ_e², further referred to as the smoothness coefficient (both the Kalman filtering and the Kalman smoothing algorithms can be written down in a normalized form which depends on ξ, rather than separately on σ_w² and σ_e²). Different values of ξ and m correspond to different estimation memory settings of KF algorithms. In practice ξ and m serve as user-adjustable ''knobs'', allowing one to tune the KF tracker/smoother to the rate of system nonstationarity, or in statistical terms – to trade off the bias and variance components of the mean squared parameter estimation error. Tuning can be achieved by running several KF algorithms, equipped with different hypermodel orders and smoothness coefficients, and choosing at each time instant the estimates yielded by the algorithm which proves to be ''locally the best'' [32].

It is straightforward to show that when w(t) in (2) is set to zero, i.e., when ∇^m θ(t) = 0, the system parameters must obey

θ_j(t) = Σ_{l=1}^{m} a_{jl} t^{l−1},  j = 1, ..., n.   (4)

Hence, the IRW model can be regarded as a local, or perturbed, power series model of parameter variation, which corresponds to the deterministic BF hypermodel incorporating the polynomial basis {1, t, t², ..., t^{m−1}}. This is an obvious point of tangency of the deterministic BF approach and the stochastic KF one. The difference lies in the way the polynomial description of parameter variation is time-localized. In the deterministic case the memory of the estimation algorithm is controlled by restricting the model to a local interval only. In the stochastic case the same goal is achieved by adding a random perturbation to the polynomial generating function. However, while the estimation memory of the BF algorithm can be easily evaluated and controlled by changing the local approximation range (see Section VII), the dependence of the estimation memory of the KF algorithm on ξ is not known and is obscured by the fact that it also depends on the characteristics of the regression vector ϕ(t) [7]. This significantly complicates the design of KF algorithms and of the KF-based parallel estimation schemes mentioned above.

III. PREESTIMATION TECHNIQUE

Preestimation is a technique introduced in [33] and further developed in [34]–[36]. Preestimates are raw parameter estimates, unbiased but with a very large variability. For this reason, to obtain reliable parameter estimates providing a satisfactory bias-variance trade-off, preestimates must be further postfiltered. As shown in [34], preestimates, further denoted by θ̃(t), can be obtained by ''inverse filtering'' short-memory exponentially weighted least squares (EWLS) estimates, namely

θ̃(t) = L_t θ̂_EWLS(t) − λ_0 L_{t−1} θ̂_EWLS(t − 1)   (5)

where λ_0, 0 < λ_0 < 1, denotes the so-called forgetting constant and L_t = Σ_{i=0}^{t−1} λ_0^i = λ_0 L_{t−1} + 1 denotes the effective width of the exponential window. EWLS estimates can be computed recursively [6]

ε(t) = y(t) − θ̂^T(t − 1) ϕ(t)
g(t) = R(t − 1) ϕ(t) / [λ_0 + ϕ^T(t) R(t − 1) ϕ(t)]
θ̂(t) = θ̂(t − 1) + g(t) ε(t)
R(t) = (1/λ_0) { R(t − 1) − [R(t − 1) ϕ(t) ϕ^T(t) R(t − 1)] / [λ_0 + ϕ^T(t) R(t − 1) ϕ(t)] }.

For large values of t the effective window width reaches its steady state value L_∞ = 1/(1 − λ_0). In this case the preestimate (5) can be evaluated using the following simplified formula

θ̃(t) = [θ̂_EWLS(t) − λ_0 θ̂_EWLS(t − 1)] / (1 − λ_0).   (6)

It can be shown that if the input signal {u(t)} is (locally) stationary, and the measurement noise {e(t)} is white, the preestimates defined in this way are approximately unbiased [34], namely

θ̃(t) = θ(t) + z(t)   (7)

where z(t) denotes (approximately) zero-mean white noise with a large covariance matrix Σ_z.
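The EWLS recursion and the steady-state preestimation step (6) can be sketched as follows (Python; the initial covariance, the toy noiseless data and all names are ours, not from the paper):

```python
import numpy as np

def ewls_preestimates(y, phi, lam0):
    # EWLS recursion followed by the steady-state "inverse filtering"
    # step (6) that turns EWLS estimates into preestimates.
    T, n = phi.shape
    theta = np.zeros(n)
    R = 1e4 * np.eye(n)                 # large initial covariance (diffuse prior)
    est = np.zeros((T, n))
    for t in range(T):
        p = phi[t]
        eps = y[t] - theta @ p
        g = R @ p / (lam0 + p @ R @ p)
        theta = theta + g * eps
        R = (R - np.outer(R @ p, p @ R) / (lam0 + p @ R @ p)) / lam0
        est[t] = theta
    pre = np.zeros_like(est)
    pre[1:] = (est[1:] - lam0 * est[:-1]) / (1.0 - lam0)   # formula (6)
    pre[0] = est[0]                     # no predecessor at t = 0
    return est, pre

# sanity check on a noiseless constant system: preestimates recover theta
rng = np.random.default_rng(1)
phi = rng.normal(size=(400, 2))
theta_true = np.array([0.5, -1.0])
y = phi @ theta_true
est, pre = ewls_preestimates(y, phi, lam0=0.9)
```

On noisy data the preestimates are, as stated above, approximately unbiased but very erratic; the point of the sketch is only the mechanics of (5)-(6).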

Fig. 1 shows the preestimated parameter trajectories obtained for a nonstationary two-tap FIR system governed by

y(t) = θ_1(t) u(t − 1) + θ_2(t) u(t − 2) + e(t)   (8)

excited by a zero-mean stationary autoregressive Gaussian process with autocorrelation function E[u(t) u(t − i)] = (0.8)^i, and corrupted by white Gaussian noise with variance σ_e² = 0.0025 (SNR = 25 dB). Parameter θ_1(t) was changing in a chirp-like way, and parameter θ_2(t) was piecewise constant – see Fig. 1. The forgetting constant λ_0 was set to 0.9.

Note that preestimates provide interesting insights into the structure of parameter variation, and they do so without making any assumptions about the speed and mode of parameter variation. Additionally, such a ''prescreening'' is provided separately for each system coefficient, which allows one to individually adjust the postprocessing scheme (as different components of the parameter vector may require different algorithm settings).

FIGURE 1. Preestimated parameter trajectories (blue lines) of a nonstationary two-tap FIR system, superimposed on the true trajectories (red lines).

IV. DECOUPLED KALMAN FILTER BASED IDENTIFICATION ALGORITHMS

Consider the j-th component of the parameter vector θ(t). As shown above, the preestimate of θ_j(t) can be written down in the form

θ̃_j(t) = θ_j(t) + z_j(t)   (9)

where z_j(t) denotes zero-mean white ''noise'' with a large variance σ_z² (for a stationary input signal the variance of z_j(t) does not depend on j).

The proposed postprocessing will be based on Kalman filtering/smoothing. Similarly to the classical case, we will assume that θ_j(t) obeys the IRW model of order m

∇^m θ_j(t) = w_j(t)   (10)

where w_j(t) denotes white perturbation noise with variance σ_w².

Rewriting (9) and (10) in the state space form, one obtains

x_j(t) = F_m x_j(t − 1) + c_m w_j(t)
θ̃_j(t) = c_m^T x_j(t) + z_j(t)   (11)

where x_j(t) = [θ_j(t), θ_j(t − 1), ..., θ_j(t − m + 1)]^T and c_m = [1, 0, ..., 0]^T are m × 1 vectors and

F_m = [ −f_1   −f_2   ...   −f_{m−1}   −f_m ]
      [   1      0    ...       0        0  ]
      [   :      :              :        :  ]
      [   0      0    ...       1        0  ]

is an m × m matrix.

Let π = {m, ξ}, where ξ = σ_w²/σ_z² is a smoothness coefficient, which can be used for tuning purposes. The equations of the normalized version of the Kalman filter, which constitute the causal identification algorithm, have the form

x̂_{j|π}(t|t − 1) = F_m x̂_{j|π}(t − 1|t − 1)
P_π(t|t − 1) = F_m P_π(t − 1|t − 1) F_m^T + ξ c_m c_m^T
ε_{j|π}(t) = θ̃_j(t) − c_m^T x̂_{j|π}(t|t − 1)
q_π(t) = c_m^T P_π(t|t − 1) c_m + 1
k_π(t) = P_π(t|t − 1) c_m / q_π(t)
x̂_{j|π}(t|t) = x̂_{j|π}(t|t − 1) + k_π(t) ε_{j|π}(t)
P_π(t|t) = P_π(t|t − 1) − k_π(t) k_π^T(t) q_π(t)
θ̂_{j|π}(t|t) = c_m^T x̂_{j|π}(t|t)

t = 1, 2, ...   (12)

where P_π(t|t − 1) = cov[x̂_{j|π}(t|t − 1)]/σ_z² and P_π(t|t) = cov[x̂_{j|π}(t|t)]/σ_z² denote the normalized a priori and a posteriori covariance matrices, respectively. Note that for fixed values of m and ξ, the quantities P_π(t|t − 1), P_π(t|t), k_π(t) and q_π(t) can be computed once for all values of j = 1, ..., n.
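A direct transcription of (12) into code may make the recursion concrete. The sketch below (Python; variable names, the diffuse initialization and the constant-parameter test signal are ours, not from the paper) filters the preestimate sequence of a single coefficient:

```python
import numpy as np
from math import comb

def decoupled_kf(pre, m, xi):
    # Decoupled Kalman tracker (12): filters the preestimate sequence
    # of one coefficient; pre is a 1-D array of preestimates.
    f = [(-1) ** i * comb(m, i) for i in range(m + 1)]
    F = np.zeros((m, m))
    F[0, :] = [-fi for fi in f[1:]]          # first row: -f_1 ... -f_m
    if m > 1:
        F[1:, :-1] = np.eye(m - 1)           # shift structure
    c = np.zeros(m); c[0] = 1.0
    x = np.zeros(m)
    P = 100.0 * np.eye(m)                    # diffuse prior
    out = np.zeros(len(pre))
    for t, z in enumerate(pre):
        x = F @ x                            # time update
        P = F @ P @ F.T + xi * np.outer(c, c)
        eps = z - c @ x                      # innovation
        q = c @ P @ c + 1.0
        k = P @ c / q
        x = x + k * eps                      # measurement update
        P = P - np.outer(k, k) * q
        out[t] = c @ x
    return out

# constant parameter buried in large preestimation noise
rng = np.random.default_rng(2)
pre = 1.0 + rng.normal(0.0, 1.0, 2000)
smoothed = decoupled_kf(pre, m=2, xi=1e-6)
```

With ξ = 10⁻⁶ and m = 2 the tracker averages over roughly 30 samples (cf. (15)), so the unit-variance preestimation noise is strongly attenuated while the constant level 1 is preserved.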

The estimation memory of the KF tracker can be quantified in terms of the noise reduction rate observed when θ_j(t) = θ_{j0} = const, i.e., when θ̃_j(t) = θ_{j0} + z_j(t). Its steady state value is given by

N_∞ = lim_{t→∞} σ_{z_j}² / var[θ̂_j(t|t)]   (13)

and can be easily evaluated numerically for different values of ξ and m. The quantity N_∞ can be interpreted as the width (number of taps) of the averaging filter which, in the constant excitation case, provides the same noise reduction rate as the KF algorithm. Unlike in the classical estimation case, the noise-equivalent width N_∞ does not depend on the characteristics of the regression vector ϕ(t).

For m = 1 the following expressions

N_∞ ≅ 2/√ξ,  ξ ≅ (2/N_∞)²   (14)

can be easily derived by analyzing the equations of the steady state Kalman filter – see Appendix A.

For m > 1 such calculations become cumbersome, but there are some analytical clues suggesting a possible inverse dependence of N_∞ on ξ^{1/(2m)}. For m = 1, 2, 3 the empirical formulas which provide a pretty good approximation of N_∞(ξ) and ξ(N_∞) have the form

N_∞ ≅ [2 − 0.1(m − 1)] / [m ξ^{1/(2m)}]   (15)

ξ ≅ { [2 − 0.1(m − 1)] / (m N_∞) }^{2m}.   (16)
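The empirical formulas (15)-(16) are a one-liner each in code (a direct Python transcription; the function names are ours):

```python
def xi_from_memory(N, m):
    # Empirical formula (16): smoothness coefficient giving a causal
    # estimation memory span of roughly N for an IRW(m) hypermodel.
    return ((2.0 - 0.1 * (m - 1)) / (m * N)) ** (2 * m)

def memory_from_xi(xi, m):
    # Empirical formula (15), the inverse mapping.
    return (2.0 - 0.1 * (m - 1)) / (m * xi ** (1.0 / (2 * m)))
```

For m = 1 these reduce exactly to (14): e.g., a memory span of N_∞ = 20 corresponds to ξ = (2/20)² = 0.01, and the two functions invert each other for any m.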

When both ''past'' and ''future'' measurements are available, as in the channel identification case [1], more accurate parameter estimates can be obtained using the fixed interval Kalman smoother. Smoothing is achieved by means of backward-time processing of the KF estimates, which in the so-called Bryson-Frazier realization [7], [31] takes the form

B_π(t) = F_m [I − k_π(t) c_m^T]
r_π(t − 1) = B_π^T(t) r_π(t) + c_m ε_{j|π}(t) / q_π(t)
R_π(t − 1) = B_π^T(t) R_π(t) B_π(t) + c_m c_m^T / q_π(t)
x̂_{j|π}(t|N) = x̂_{j|π}(t|t − 1) + P_π(t|t − 1) r_π(t − 1)
P_π(t|N) = P_π(t|t − 1) − P_π(t|t − 1) R_π(t − 1) P_π(t|t − 1)
θ̂_{j|π}(t|N) = c_m^T x̂_{j|π}(t|N)

t = N − 1, ..., 1   (17)

with initial conditions r_π(N) = 0 and R_π(N) = O. The steady state value of the estimation memory span is in this case equal to M_∞ = lim_{t→∞} σ_{z_j}² / var[θ̂_j(t|2t)] = 2mN_∞. The values of the smoothness coefficient ξ corresponding to the selected values of the estimation memory spans N_∞ and M_∞, and to the order m of the IRW model, are listed in Table 1.


TABLE 1. The values of the smoothness coefficient ξ corresponding to the selected values of the estimation memory spans N_∞ (upper table) and M_∞ (lower table), and to the order m of the IRW model.

V. SIMPLIFIED ALGORITHMS

To further reduce the computational load of the KF tracker, the algorithm (12) can be replaced by its steady state version.

Denote by P_π and k_π the steady state values of the a priori covariance matrix and the Kalman gain, respectively:

P_π = lim_{t→∞} P_π(t|t − 1),  k_π = P_π c_m / (1 + c_m^T P_π c_m).

The steady state version of (12) takes the form

x̂_{j|π}(t|t − 1) = F_m x̂_{j|π}(t − 1|t − 1)
ε_{j|π}(t) = θ̃_j(t) − c_m^T x̂_{j|π}(t|t − 1)
x̂_{j|π}(t|t) = x̂_{j|π}(t|t − 1) + k_π ε_{j|π}(t)
θ̂_{j|π}(t|t) = c_m^T x̂_{j|π}(t|t)

t = 1, 2, ...   (18)

The steady state values P_π and k_π can be obtained by solving the associated Riccati equation, or by iterating the KF equations for a sufficiently long time (in the second case the so-called doubling algorithms can be used to reduce the number of iterations [37]).
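The sketch below (Python, our own construction) obtains the steady state quantities by plain iteration of the Riccati recursion of (12) and, for m = 1, checks the result against the closed-form solution derived in Appendix A:

```python
import numpy as np
from math import comb, sqrt

def steady_state_gain(m, xi, iters=100000, tol=1e-12):
    # Iterate the covariance recursion of (12) until the a posteriori
    # covariance stops changing; returns steady-state P (a priori) and k.
    # (A doubling algorithm would need far fewer iterations.)
    f = [(-1) ** i * comb(m, i) for i in range(m + 1)]
    F = np.zeros((m, m))
    F[0, :] = [-fi for fi in f[1:]]
    if m > 1:
        F[1:, :-1] = np.eye(m - 1)
    c = np.zeros(m); c[0] = 1.0
    P = np.eye(m)
    for _ in range(iters):
        P_prior = F @ P @ F.T + xi * np.outer(c, c)
        q = c @ P_prior @ c + 1.0
        k = P_prior @ c / q
        P_new = P_prior - np.outer(k, k) * q
        if np.max(np.abs(P_new - P)) < tol:
            P = P_new
            break
        P = P_new
    P_prior = F @ P @ F.T + xi * np.outer(c, c)
    k = P_prior @ c / (1.0 + c @ P_prior @ c)
    return P_prior, k

# m = 1 sanity check against the closed-form Riccati solution of Appendix A
xi = 1e-4
P1, k1 = steady_state_gain(1, xi)
p_closed = (xi + sqrt(xi ** 2 + 4 * xi)) / 2.0
```

For m = 1 the iteration converges to p_π = [ξ + √(ξ² + 4ξ)]/2, the fixed point of the scalar Riccati equation of Appendix A.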

Similarly, the smoothing algorithm (17) can be reduced to

r_π(t − 1) = B_π^T r_π(t) + c_m ε_{j|π}(t) / q_π
x̂_{j|π}(t|N) = x̂_{j|π}(t|t − 1) + P_π r_π(t − 1)
θ̂_{j|π}(t|N) = c_m^T x̂_{j|π}(t|N)

t = N − 1, ..., 1   (19)

where

B_π = F_m [I − k_π c_m^T],  q_π = 1 + c_m^T P_π c_m.

Exploiting the symmetry of all covariance matrices and the special structure of the matrix F_m and the vector c_m, the computational cost of running n filtering algorithms (12) can be reduced to (3/2)m² + (5/2)m + 2mn multiply-add operations per time update. The analogous count for the smoothing algorithm (17) yields 3m³ + m² + m + (2m² + 1)n operations per time update. When the required matrices/vectors are precomputed, or when their steady state values are used, the computational load is further reduced to 2mn (filtering) and (2m² + 1)n (smoothing) operations, respectively. Note that in both cases the number of operations depends linearly on the number of estimated coefficients.

VI. PARALLEL ESTIMATION

As already mentioned in Section II, tuning of the identification algorithm to the unknown, and possibly time-varying, form and rate of parameter changes can be achieved by means of parallel estimation. Consider L KF-based identification algorithms, equipped with different settings π = {m, ξ} ∈ Π = {π_1, ..., π_L}, π_i = {m_i, ξ_i}, i = 1, ..., L, yielding the estimates

θ̂_π(t|t) = [θ̂_{1|π}(t|t), ..., θ̂_{n|π}(t|t)]^T
θ̂_π(t|N) = [θ̂_{1|π}(t|N), ..., θ̂_{n|π}(t|N)]^T,  π ∈ Π.

The algorithms are run simultaneously and at each time instant only one of the competing estimates is selected, i.e., the estimated parameter trajectory has the form

θ̂(t|t) = θ̂_{π̂(t)}(t|t),  θ̂(t|N) = θ̂_{π̂(t)}(t|N)

where

π̂(t) = {m̂(t), ξ̂(t)} = arg min_{π∈Π} J_π(t)   (20)

and J_π(t) denotes the local performance index.

In the causal (filtering) case, selection can be based on minimization of the sum of squared one-step-ahead output prediction errors observed in the recent past [32]

J_π(t) = Σ_{i=0}^{I} ε_π²(t − i)   (21)

where

ε_π(t) = y(t) − ϕ^T(t) θ̂_π(t|t − 1)   (22)

and

θ̂_π(t|t − 1) = [θ̂_{1|π}(t|t − 1), ..., θ̂_{n|π}(t|t − 1)]^T
θ̂_{j|π}(t|t − 1) = c_m^T x̂_{j|π}(t|t − 1),  j = 1, ..., n.

In the noncausal (smoothing) case, one can use the approach known as cross-validation. In this approach prediction errors are replaced with leave-one-out output interpolation errors (sometimes referred to as deleted residuals)

η_π(t) = y(t) − ϕ^T(t) θ̂°_π(t|N)   (23)

where

θ̂°_π(t|N) = [θ̂°_{1|π}(t|N), ..., θ̂°_{n|π}(t|N)]^T

denotes the holey estimate of θ(t), i.e., the one that excludes the interpolated sample y(t) from the estimation process:

θ̂°_{j|π}(t|N) = c_m^T E[x_j(t)|Ω°(N)],  Ω°(N) = Ω(N) − {y(t)}.


TABLE 2. The values of the width K = 2k + 1 of the analysis interval corresponding to the selected values of the estimation memory span M_∞, and to the number of basis functions m.

It can be shown (see Appendix B) that

θ̂°_{j|π}(t|N) = [θ̂_{j|π}(t|N) − β_π(t|N) θ̃_j(t)] / [1 − β_π(t|N)]   (24)

where β_π(t|N) = c_m^T P_π(t|N) c_m. More generally,

θ̂°_π(t|N) = [θ̂_π(t|N) − β_π(t|N) θ̃(t)] / [1 − β_π(t|N)]   (25)

which means that there is no need to implement the holey estimation scheme in order to evaluate the interpolation errors (23). When the steady state version of the KF algorithm is used, the formula (24) can be further simplified by replacing the time-dependent coefficient β_π(t|N) with its asymptotic (constant) version β_π = c_m^T P̃_π c_m, where P̃_π = lim_{t→∞} P_π(t|2t).

The local cross-validation decision statistic can be defined in the following way:

J_π(t) = Σ_{i=−I/2}^{I/2} [η_π(t + i)]².   (26)

The recommended values of I are those from the interval [20, 50] for the filtering algorithm, and from the interval [50, 100] for the smoothing algorithm. Another practical piece of advice concerns the selection of smoothness coefficients: for a fixed value of m the competing values of ξ should be chosen in such a way that the corresponding memory spans N_∞ and M_∞ form a geometric progression [32], [38], e.g., N_∞, M_∞ = 20, 40, 80, 160 etc.
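The selection rule (20)-(21) reduces to an arg min over sliding sums of squared errors. A minimal sketch of the decision logic (Python; the two synthetic error sequences stand in for the prediction errors of L competing trackers, and all names are ours):

```python
import numpy as np

def select_best(pred_errors, I=40):
    # Competitive selection (20)-(21): at each t pick the index of the
    # algorithm with the smallest sliding sum of squared prediction
    # errors over the last I + 1 samples. pred_errors has shape (L, T).
    L, T = pred_errors.shape
    sq = pred_errors ** 2
    choice = np.zeros(T, dtype=int)
    for t in range(T):
        lo = max(0, t - I)
        J = sq[:, lo:t + 1].sum(axis=1)   # J_pi(t), formula (21)
        choice[t] = int(np.argmin(J))
    return choice

# two hypothetical trackers: the second has half the error power,
# so the selector should pick it almost always once the window fills
rng = np.random.default_rng(3)
errs = np.vstack([rng.normal(0, 1.0, 300), rng.normal(0, 0.5, 300)])
choice = select_best(errs)
```

The noncausal rule (26) is identical in structure, with the causal window replaced by a symmetric one and prediction errors replaced by the interpolation errors η_π.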

VII. COMPUTER SIMULATIONS

In our simulation study we will focus on a comparison of the noncausal KF algorithm (18)-(19), combined with the simplified cross-validation decision rule incorporating β_π, with the state-of-the-art BF smoother proposed recently in [23]. The noncausal BF estimates can be obtained by minimizing the local sum of squared output modeling errors

θ̂_{m|k}(·) = arg min_{θ(·)} Σ_{i=−k}^{k} [y(t + i) − ϕ^T(t + i) θ(t + i)]²   (27)

under the constraints

θ_j(t + i) = Σ_{l=1}^{m} a_{jl} i^{l−1},  i ∈ T_k(t), j = 1, ..., n   (28)

TABLE 3. Mean squared parameter estimation errors obtained for 12 noncausal basis function and Kalman filter estimators with different memory settings (M) and different orders of the hypermodel (m), and for the corresponding adaptive parallel estimation schemes (A). Simulations were carried out for 2 signal-to-noise ratios (SNR) and 3 speeds of parameter variation (SoV). All averages were computed for 100 process realizations.


where T_k(t) = [t − k, t + k] denotes the local analysis interval of width K = 2k + 1, centered at t. Even though such an identification scheme allows one to estimate the entire segment of the parameter trajectory {θ(s), s ∈ T_k(t)}, it is used to generate a sequence of point estimates instead of interval ones, i.e., only the ''central'' estimate θ̂_{m|k}(t) is utilized at the instant t, and the estimation process is repeated for a new position of the sliding analysis window T_k(t). The resulting local basis function (LBF) algorithm can be regarded as a generalization – to the system identification case – of the classical signal smoothing technique known as Savitzky-Golay filtering [39], [40].

The estimation memory of the LBF algorithm can be obtained from

M_∞ = [ Σ_{i=−k}^{k} h²_{m|k}(i) ]^{−1}   (29)

where h_{m|k}(i) denotes the so-called impulse response associated with the LBF estimator [23]

h_{m|k}(i) = f_m^T(0) [ Σ_{l=−k}^{k} f_m(l) f_m^T(l) ]^{−1} f_m(i)   (30)

and f_m(i) = [1, i, ..., i^{m−1}]^T. The values of the width K = 2k + 1 of the analysis interval corresponding to the selected values of M_∞ and to the order m of the polynomial basis are shown in Table 2.
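Formulas (29)-(30) can be evaluated directly (a Python sketch with our own helper names); for m = 1 the LBF smoother degenerates to a moving average of width K, so M_∞ = K:

```python
import numpy as np

def lbf_memory(m, k):
    # Impulse response (30) and equivalent memory span (29) of the
    # LBF (Savitzky-Golay type) smoother with basis {1, i, ..., i^(m-1)}
    # on the analysis window [-k, k].
    i = np.arange(-k, k + 1)
    Fm = np.vander(i, m, increasing=True).astype(float)  # rows f_m(i)^T
    G = Fm.T @ Fm                                        # sum_l f_m(l) f_m(l)^T
    f0 = np.zeros(m); f0[0] = 1.0                        # f_m(0)
    h = Fm @ np.linalg.solve(G, f0)                      # h(i) = f_m(0)^T G^{-1} f_m(i)
    return 1.0 / np.sum(h ** 2)                          # formula (29)

M = lbf_memory(1, 10)   # m = 1, K = 21: plain moving average, M = 21
```

Note that m = 2 also gives M_∞ = K on a symmetric window (the fitted line evaluated at the window center coincides with the sample mean), while for m ≥ 3 the weights are non-uniform and M_∞ < K, which is why Table 2 requires wider windows for higher-order bases.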

The identified nonstationary system was the one presented earlier in Section III. In addition to the medium speed parameter variation scenario, depicted in Fig. 1, two times faster and two times slower variations were considered, obtained by ''resampling'' the parameter trajectory without changing its shape. As a result, the length of the simulation interval T_s was equal to 1000, 2000 and 4000 for fast, medium speed and slow changes, respectively. To check the behavior of the compared algorithms under different noise conditions, two average signal-to-noise ratios were considered: 25 dB (σ_e² = 0.0025) and 15 dB (σ_e² = 0.025).

The KF-based parallel estimation scheme was made up of 12 smoothers designed for 3 IRW models of orders m = 1, 2, 3 and 4 estimation memory spans M_∞ = 20, 40, 80, 160. Analogously, the adaptive BF algorithm combined 12 LBF smoothers designed for 3 polynomial bases of order m = 1, 2, 3 and 4 estimation memory spans M_∞ = 20, 40, 80, 160. In both cases the width of the decision window was set to I + 1 = 61. Under such conditions the KF and BF approaches can be expected to behave comparably, as the corresponding hypermodels and their memory settings are statistically compatible. Data generation was started 1000 instants prior to t = 1 and was continued for 1000 instants after t = T_s, so that, irrespective of the settings, the estimation process and the evaluation of its results could, for all algorithms, be started at the instant 1 and ended at the instant T_s. For t < 1 and t > T_s the system parameters were constant and equal to θ(1) and θ(T_s), respectively.

FIGURE 2. Identification results obtained using the adaptive LBF algorithm (two upper plots) and the adaptive decoupled KF algorithm (two lower plots). The estimated trajectories (blue lines), obtained by postprocessing the preestimates shown in Fig. 1, are superimposed on the true trajectories (red lines).

Table 3 presents the estimation results – the squared parameter estimation errors ||θ̂(t) − θ(t)||² averaged over time and over 100 process realizations. The best results in each group are shown in boldface. Noticeably, in all cases considered, the adaptive parallel estimation algorithms yielded results that were better than those provided by their component algorithms with fixed settings. In the case of low SNR the adaptive KF algorithm yielded slightly better results than the adaptive LBF algorithm, while for the higher SNR the adaptive LBF performed slightly better than the KF one. Overall, however, for the same values of M_∞ and m both types of algorithms yielded a similar tracking performance. Typical identification results yielded by the adaptive scheme are shown in Fig. 2. As expected, the results yielded by the deterministic BF approach are comparable with those obtained for the stochastic KF approach. This makes the decoupled KF algorithm a computationally attractive alternative to the LBF algorithm. The high computational requirements of the LBF algorithm, of order O(m³n³), are due to the fact that it requires evaluation and inversion, at every time instant t, of the mn × mn-dimensional generalized regression matrix G_{m|k}(t) = Σ_{i=−k}^{k} ψ_m(t, i) ψ_m^T(t, i), where ψ_m(t, i) = ϕ(t + i) ⊗ f_m(i) and ⊗ denotes the Kronecker product.

Remark: A meaningful comparison of the proposed decoupled KF approach with the classical one is not presented, since it would require equalization of the estimation memory spans of the compared algorithms, which cannot be performed due to the limitations of the classical scheme mentioned in Section II.

VIII. CONCLUSION

The problem of identification of a nonstationary FIR system was considered and solved using the decoupled Kalman filtering/smoothing approach. Unlike the classical KF-based solutions, the proposed estimation algorithms are easy to tune. It was shown that optimization of their tracking performance can be carried out using parallel estimation and cross-validation. Due to their low computational requirements (achieved without compromising good tracking capabilities), the decoupled KF algorithms can be regarded as an attractive alternative to the state-of-the-art algorithms based on functional series approximation, known as basis function algorithms.

APPENDIX A DERIVATION OF (14)

Let m = 1. The steady state value of the a priori variance p_π = lim_{t→∞} p_π(t|t − 1) is a solution of the Riccati equation

p_π = p_π/(1 + p_π) + ξ

leading to

p_π = [ξ + √(ξ² + 4ξ)]/2 ≅ √ξ

where the approximation holds for ξ ≪ 1 (cf. Table 1).

Without any loss of generality, one can assume that θ

j0

= 0.

In this case the steady state relationship between b θ

j|π

(t|t) = b x

j|π

(t|t) and z

j

(t) has the form

b θ

j|π

(t|t) = (1 − k

π

)b θ

j|π

(t − 1|t − 1) + k

π

z

j

(t) where k

π

= p

π

/(1 + p

π

) ∼ =

√ ξ denotes the steady state Kalman gain. Based on this relationship, one obtains

σ

θ2j

= k

π

2 − k

π

σ

z2j

∼ = k

π

2 σ

z2j

which leads to (14).
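The scalar steady-state relations above are easy to check numerically. The sketch below (the value of $\xi$ is purely illustrative) iterates the Riccati recursion $p \leftarrow p/(1+p) + \xi$ to its fixed point, compares it with the closed-form solution and the $\sqrt{\xi}$ approximation, and verifies the algebraic identity $k^2/[1-(1-k)^2] = k/(2-k)$ behind the variance formula.

```python
import math

xi = 1e-4                                # illustrative small noise ratio, xi << 1
p = xi                                   # initialize the a priori variance
for _ in range(5000):                    # iterate p <- p/(1+p) + xi to convergence
    p = p / (1.0 + p) + xi

p_closed = (xi + math.sqrt(xi**2 + 4.0 * xi)) / 2.0  # closed-form Riccati solution
k_gain = p_closed / (1.0 + p_closed)                 # steady-state Kalman gain

# variance gain of the first-order filter driven by k*z: k^2/(1-(1-k)^2) = k/(2-k)
ratio = k_gain**2 / (1.0 - (1.0 - k_gain) ** 2)

print(p, p_closed, math.sqrt(xi), k_gain, ratio)
```

For $\xi = 10^{-4}$ both $p_\pi$ and $k_\pi$ agree with $\sqrt{\xi} = 0.01$ to within about half a percent, consistent with the $\xi_j \ll 1$ approximation used above.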

APPENDIX B DERIVATION OF (24)

Let $\Omega(t) = \{y(i), \varphi(i),\ 1 \le i \le t\}$ and $\Omega^{+}(t) = \{y(i), \varphi(i),\ t \le i \le N\}$. Using the Mayne-Fraser two-filter formula, the smoothed estimate $\widehat{x}_{j|\pi}(t|N)$ can be expressed in the form [41]
$$\widehat{x}_{j|\pi}(t|N) = P_\pi(t|N)\Big\{ [P_\pi(t|t)]^{-1}\,\widehat{x}_{j|\pi}(t|t) + [P_\pi^{+}(t|t+1)]^{-1}\,\widehat{x}_{j|\pi}^{+}(t|t+1) \Big\}$$
$$P_\pi(t|N) = \Big\{ [P_\pi(t|t)]^{-1} + [P_\pi^{+}(t|t+1)]^{-1} \Big\}^{-1} \tag{31}$$
where $\widehat{x}_{j|\pi}^{+}(t|t+1) = {\rm E}[x_j(t)\,|\,\Omega^{+}(t+1)]$ denotes the backward-time (anticausal) predictor, designed for the backwards Markovian model of $x_j(t)$ and operated in reverse time. Similarly, $P_\pi^{+}(t|t+1)$ denotes the associated covariance matrix.

The analogous expression for the holey estimator of $x_j(t)$ is
$$\widehat{x}_{j|\pi}^{\circ}(t|N) = P_\pi^{\circ}(t|N)\Big\{ [P_\pi(t|t-1)]^{-1}\,\widehat{x}_{j|\pi}(t|t-1) + [P_\pi^{+}(t|t+1)]^{-1}\,\widehat{x}_{j|\pi}^{+}(t|t+1) \Big\}$$
$$P_\pi^{\circ}(t|N) = \Big\{ [P_\pi(t|t-1)]^{-1} + [P_\pi^{+}(t|t+1)]^{-1} \Big\}^{-1}. \tag{32}$$

In order to prove (24), we first notice that
$$P_\pi(t|t) = P_\pi(t|t-1) - \frac{P_\pi(t|t-1)\, c_m c_m^{\rm T}\, P_\pi(t|t-1)}{1 + c_m^{\rm T} P_\pi(t|t-1) c_m} = \Big[ P_\pi^{-1}(t|t-1) + c_m c_m^{\rm T} \Big]^{-1} \tag{33}$$
where the transition follows from the well-known matrix inversion lemma [6]. Combining (31)-(33) and using the matrix inversion lemma again, one arrives at
$$P_\pi^{\circ}(t|N) = \Big[ P_\pi^{-1}(t|N) - c_m c_m^{\rm T} \Big]^{-1} = P_\pi(t|N) + \frac{P_\pi(t|N)\, c_m c_m^{\rm T}\, P_\pi(t|N)}{1 - \beta_\pi(t|N)} \tag{34}$$
where $\beta_\pi(t|N) = c_m^{\rm T} P_\pi(t|N) c_m$. Furthermore, since [cf. (33)]
$$P_\pi^{-1}(t|t) = P_\pi^{-1}(t|t-1) + c_m c_m^{\rm T} \tag{35}$$
and [cf. (12)]
$$\widehat{x}_{j|\pi}(t|t) = \widehat{x}_{j|\pi}(t|t-1) + \frac{P_\pi(t|t-1)\, c_m}{1 + c_m^{\rm T} P_\pi(t|t-1) c_m}\,\big[\theta_j(t) - c_m^{\rm T}\widehat{x}_{j|\pi}(t|t-1)\big] \tag{36}$$
after straightforward calculations, one arrives at
$$P_\pi^{-1}(t|t)\,\widehat{x}_{j|\pi}(t|t) = P_\pi^{-1}(t|t-1)\,\widehat{x}_{j|\pi}(t|t-1) + c_m \theta_j(t). \tag{37}$$
Substituting (34) and (37) into (32), one obtains
$$\widehat{x}_{j|\pi}^{\circ}(t|N) = \bigg[ P_\pi(t|N) + \frac{P_\pi(t|N)\, c_m c_m^{\rm T}\, P_\pi(t|N)}{1 - \beta_\pi(t|N)} \bigg] \Big[ P_\pi^{-1}(t|N)\,\widehat{x}_{j|\pi}(t|N) - c_m \theta_j(t) \Big]$$
$$= \widehat{x}_{j|\pi}(t|N) + \frac{P_\pi(t|N)\, c_m c_m^{\rm T}}{1 - \beta_\pi(t|N)}\,\widehat{x}_{j|\pi}(t|N) - P_\pi(t|N)\, c_m \theta_j(t) - \frac{\beta_\pi(t|N)\, P_\pi(t|N)\, c_m}{1 - \beta_\pi(t|N)}\,\theta_j(t). \tag{38}$$
Finally, after multiplying both sides of (38) by $c_m^{\rm T}$ and noting that $c_m^{\rm T}\widehat{x}_{j|\pi}(t|N) = \widehat{\theta}_{j|\pi}(t|N)$ and $c_m^{\rm T}\widehat{x}_{j|\pi}^{\circ}(t|N) = \widehat{\theta}_{j|\pi}^{\circ}(t|N)$, one obtains (24).
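The two matrix-inversion-lemma steps used in the derivation can be verified numerically. In the sketch below, randomly generated quantities stand in for $P_\pi(t|N)$, $c_m$, the smoothed estimate and $\theta_j(t)$ (all names and values are illustrative; $\beta_\pi(t|N)$ is rescaled below 1 so that the rank-one downdate in (34) remains positive definite); the script checks the identity (34) and the scalar relation obtained by multiplying (38) by $c_m^{\rm T}$, namely $\widehat{\theta}^{\circ} = (\widehat{\theta} - \beta\theta)/(1-\beta)$.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 4
A = rng.standard_normal((m, m))
P = A @ A.T + m * np.eye(m)          # stand-in for the smoothed covariance P_pi(t|N)
c = rng.standard_normal(m)           # stand-in for c_m
P = P / (2.0 * (c @ P @ c))          # rescale so that beta = c^T P c = 0.5 < 1
beta = c @ P @ c

# identity (34): (P^{-1} - c c^T)^{-1} = P + P c c^T P / (1 - beta)
lhs = np.linalg.inv(np.linalg.inv(P) - np.outer(c, c))
rhs = P + (P @ np.outer(c, c) @ P) / (1.0 - beta)

# (38) and its scalar consequence after multiplying by c^T
x_hat = rng.standard_normal(m)       # stand-in for the smoothed state estimate
theta = 0.7                          # stand-in for theta_j(t)
x_holey = (x_hat + (P @ c) * (c @ x_hat) / (1.0 - beta)
           - (P @ c) * theta - beta * (P @ c) * theta / (1.0 - beta))
theta_holey = c @ x_holey
theta_expected = (c @ x_hat - beta * theta) / (1.0 - beta)

print(np.max(np.abs(lhs - rhs)), abs(theta_holey - theta_expected))
```

Both printed discrepancies are at the level of machine precision, confirming that the downdate (34) and the final scalar step are algebraically consistent.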

ACKNOWLEDGMENT

Computer simulations were carried out at the Academic Computer Centre in Gdańsk.

REFERENCES

[1] M. K. Tsatsanis and G. B. Giannakis, ''Modelling and equalization of rapidly fading channels,'' Int. J. Adapt. Control Signal Process., vol. 10, nos. 2-3, pp. 159-176, 1996.

[2] J. Bakkoury, D. Roviras, M. Ghogho, and F. Castanie, ''Adaptive MLSE receiver over rapidly fading channels,'' Signal Process., vol. 80, no. 7, pp. 1347-1360, Jul. 2000.

[3] L. Shen, Y. Zakharov, B. Henson, N. Morozs, and P. D. Mitchell, ''Adaptive filtering for full-duplex UWA systems with time-varying self-interference channel,'' IEEE Access, vol. 8, pp. 187590-187604, Oct. 2020.

[4] L. Shen, B. Henson, Y. Zakharov, and P. Mitchell, ''Digital self-interference cancellation for full-duplex underwater acoustic systems,'' IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 67, no. 1, pp. 192-196, Jan. 2020.

[5] M. Stojanovic and J. Preisig, ''Underwater acoustic communication channels: Propagation models and statistical characterization,'' IEEE Commun. Mag., vol. 47, no. 1, pp. 84-89, Jan. 2009.

[6] T. Söderström and P. Stoica, System Identification. Englewood Cliffs, NJ, USA: Prentice-Hall, 1988.

[7] M. Niedźwiecki, Identification of Time-varying Processes. New York, NY, USA: Wiley, 2000.

[8] T. S. Rao, ''The fitting of nonstationary time-series models with time-dependent parameters,'' J. Roy. Stat. Soc., Ser. B (Methodol.), vol. 32, no. 2, pp. 312-322, 1970.

[9] J. M. Mendel, Discrete Techniques of Parameter Estimation: The Equation Error Formulation. New York, NY, USA: Marcel Dekker, 1973.

[10] L. A. Liporace, ''Linear estimation of nonstationary signals,'' J. Acoust. Soc. Amer., vol. 58, no. 6, pp. 1288-1295, Dec. 1975.

[11] M. G. Hall, A. V. Oppenheim, and A. S. Willsky, ''Time-varying parametric modeling of speech,'' Signal Process., vol. 5, no. 3, pp. 267-285, May 1983.

[12] Y. Grenier, ''Time-dependent ARMA modeling of nonstationary signals,'' IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-31, no. 4, pp. 899-911, Aug. 1983.

[13] R. Charbonnier, M. Barlaud, G. Alengrin, and J. Menez, ''Results on AR-modelling of nonstationary signals,'' Signal Process., vol. 12, no. 2, pp. 143-151, Mar. 1987.

[14] M. Niedzwiecki, ''Functional series modeling approach to identification of nonstationary stochastic systems,'' IEEE Trans. Autom. Control, vol. 33, no. 10, pp. 955-961, Oct. 1988.

[15] M. Niedzwiecki, ''Recursive functional series modeling estimators for identification of time-varying plants-more bad news than good?'' IEEE Trans. Autom. Control, vol. 35, no. 5, pp. 610-616, May 1990.

[16] R. B. Mrad, S. D. Fassois, and J. A. Levitt, ''A polynomial-algebraic method for non-stationary TARMA signal analysis-Part I: The method,'' Signal Process., vol. 65, pp. 1-19, Feb. 1998, doi: 10.1016/S0165-1684(97)00145-X.

[17] H.-L. Wei and S. A. Billings, ''Identification of time-varying systems using multiresolution wavelet models,'' Int. J. Syst. Sci., vol. 33, no. 15, pp. 1217-1228, Jan. 2003.

[18] R. Zou, H. Wang, and K. H. Chon, ''A robust time-varying identification algorithm using basis functions,'' Ann. Biomed. Eng., vol. 31, no. 7, pp. 840-853, Jul. 2003.

[19] A. G. Poulimenos and S. D. Fassois, ''Parametric time-domain methods for non-stationary random vibration modelling and analysis: A critical survey and comparison,'' Mech. Syst. Signal Process., vol. 20, no. 4, pp. 763-816, May 2006.

[20] H.-L. Wei, S. A. Billings, and J. J. Liu, ''Time-varying parametric modelling and time-dependent spectral characterisation with applications to EEG signals using multiwavelets,'' Int. J. Model., Identificat. Control, vol. 9, no. 3, pp. 215-224, 2010.

[21] S. C. Chan and Z. G. Zhang, ''Local polynomial modeling and variable bandwidth selection for time-varying linear systems,'' IEEE Trans. Instrum. Meas., vol. 60, no. 3, pp. 1102-1117, Mar. 2011.

[22] M. Niedzwiecki and S. Gackowski, ''New approach to noncausal identification of nonstationary stochastic FIR systems subject to both smooth and abrupt parameter changes,'' IEEE Trans. Autom. Control, vol. 58, no. 7, pp. 1847-1853, Jul. 2013.

[23] M. Niedzwiecki and M. Ciołek, ''Generalized Savitzky-Golay filters for identification of nonstationary systems,'' Automatica, vol. 108, Oct. 2019, Art. no. 108477.

[24] D. Q. Mayne, ''Optimal non-stationary estimation of the parameters of a linear system with Gaussian inputs,'' J. Electron. Control, vol. 14, no. 1, pp. 101-112, Jan. 1963.

[25] J. P. Norton, ''Optimal smoothing in the identification of linear time-varying systems,'' Proc. Inst. Elect. Eng., vol. 122, pp. 663-668, Jun. 1975.

[26] T. Bohlin, ''Four cases of identification of changing systems,'' in Mathematics in Science and Engineering, vol. 126, R. K. Mehra and D. G. Lainiotis, Eds. Amsterdam, The Netherlands: Elsevier, 1976, pp. 441-518.

[27] H. Akaike, ''Likelihood and the Bayes procedure,'' in Bayesian Statistics, J. M. Bernado, M. H. DeGroot, D. V. Lindley, and A. F. M. Smith, Eds. Valencia, Spain: Univ. Press, 1980, pp. 143-166.

[28] G. Kitagawa and W. Gersch, ''A smoothness priors time-varying AR coefficient modeling of nonstationary covariance time series,'' IEEE Trans. Autom. Control, vol. AC-30, no. 1, pp. 48-56, Jan. 1985.

[29] X.-Q. Jiang and G. Kitagawa, ''A time varying coefficient vector AR modeling of nonstationary covariance time series,'' Signal Process., vol. 33, no. 3, pp. 315-331, Sep. 1993.

[30] M. Arnold, X. H. R. Milner, H. Witte, R. Bauer, and C. Braun, ''Adaptive AR modeling of nonstationary time series by means of Kalman filtering,'' IEEE Trans. Biomed. Eng., vol. 45, no. 5, pp. 553-562, May 1998.

[31] M. Niedzwiecki, ''Locally adaptive cooperative Kalman smoothing and its application to identification of nonstationary stochastic systems,'' IEEE Trans. Signal Process., vol. 60, no. 1, pp. 48-59, Jan. 2012.

[32] M. Niedzwiecki, ''Identification of nonstationary stochastic systems using parallel estimation schemes,'' IEEE Trans. Autom. Control, vol. 35, no. 3, pp. 329-334, Mar. 1990.

[33] M. Niedźwiecki and T. Kłaput, ''Fast recursive basis function estimators for identification of nonstationary systems,'' IEEE Trans. Signal Process., vol. 50, pp. 1925-1934, 2002.

[34] M. J. Niedzwiecki, M. Ciołek, and A. Gancza, ''A new look at the statistical identification of nonstationary systems,'' Automatica, vol. 118, Aug. 2020, Art. no. 109037.

[35] M. Niedzwiecki, M. Ciolek, A. Gancza, and P. Kaczmarek, ''A new method of noncausal identification of time-varying systems,'' in Proc. Signal Process., Algorithms, Archit., Arrangements, Appl. (SPA), Poznań, Poland, Sep. 2020, pp. 48-52.

[36] M. Niedzwiecki, A. Gancza, and M. Ciolek, ''On the preestimation technique and its application to identification of nonstationary systems,'' in Proc. 59th IEEE Conf. Decis. Control (CDC), Jeju Island, Republic of Korea, Dec. 2020, pp. 286-293.

[37] N. Assimakis and M. Adam, ''Iterative and algebraic algorithms for the computation of the steady state Kalman filter gain,'' Int. Scholarly Res. Notices, vol. 2014, May 2014, Art. no. 417623.

[38] M. Niedzwiecki, M. Ciołek, and Y. Kajikawa, ''On adaptive covariance and spectrum estimation of locally stationary multivariate processes,'' Automatica, vol. 82, pp. 1-12, Aug. 2017.

[39] A. Savitzky and M. J. E. Golay, ''Smoothing and differentiation of data by simplified least squares procedures,'' Anal. Chem., vol. 36, no. 8, pp. 1627-1639, Jul. 1964.

[40] R. W. Schafer, ''What is a Savitzky-Golay filter?'' IEEE Signal Process. Mag., vol. 28, no. 4, pp. 111-117, Jul. 2011.

[41] B. D. O. Anderson and J. B. Moore, Optimal Filtering. Englewood Cliffs, NJ, USA: Prentice-Hall, 1979.


MARCIN CIOŁEK (Member, IEEE) received the M.Sc. and Ph.D. degrees from the Gdańsk University of Technology (GUT), Gdańsk, Poland, in 2010 and 2017, respectively. Since 2017, he has been working as an Adjunct Professor with the Department of Automatic Control, Faculty of Electronics, Telecommunications and Informatics, GUT. His research interests include speech, music, and biomedical signal processing.

MACIEJ NIEDŹWIECKI (Senior Member, IEEE) received the M.Sc. and Ph.D. degrees from the Technical University of Gdańsk, Gdańsk, Poland, in 1977 and 1981, respectively, and the Dr.Hab. (D.Sc.) degree from the Technical University of Warsaw, Warsaw, Poland, in 1991. From 1986 to 1989, he was a Research Fellow with the Department of Systems Engineering, Australian National University. From 1990 to 1993, he worked as the Vice Chairman of the Technical Committee on Theory of the International Federation of Automatic Control (IFAC). He currently works as a Professor and the Head of the Department of Automatic Control, Faculty of Electronics, Telecommunications and Informatics, Gdańsk University of Technology. He is the author of the book Identification of Time-varying Processes (Wiley, 2000). His main research interests include system identification, statistical signal processing, and adaptive systems. He is also a member of the IFAC committees on Modeling, Identification and Signal Processing and on Large Scale Complex Systems, and a member of the Automatic Control and Robotics Committee of the Polish Academy of Sciences (PAN).

ARTUR GAŃCZA (Member, IEEE) received the M.Sc. degree from the Gdańsk University of Technology (GUT), Gdańsk, Poland, in 2019, where he is currently pursuing the Ph.D. degree with the Department of Automatic Control, Faculty of Electronics, Telecommunications and Informatics. His research interests include speech recognition, system identification, and adaptive signal processing.
