Reliable LDA-spectra by resampling and ARMA-modeling

(1)

Reliable LDA-Spectra by

Resampling and ARMA-Modeling

Stijn de Waele and Piet M. T. Broersen

Abstract—Laser-Doppler Anemometry (LDA) is used to mea-sure the velocity of gases and liquids with observations ir-regularly spaced in time. Equidistant resampling turns out to be better than slotting techniques. After resampling, two ways of spectral estimation are compared. The first estimate is a windowed periodogram and the second is the spectrum of a time series model. That is an estimated autoregressive moving average (ARMA) process whose orders are automatically selected from the data with an objective statistical criterion. Typically, the ARMA spectrum is better than the best windowed periodogram. Index Terms— Fluid flow measurement, irregular sampling, resampling, spectral analysis, time series analysis.

I. INTRODUCTION

L

ASER-DOPPLER anemometry (LDA) is a nonintrusive way to measure the velocity of gases and liquids. As a consequence of the measurement technique, the samples of the velocity signal are irregularly spaced in time. Moreover, the time between samples may depend on the current value of the velocity [1], [2]. This causes bias in the estimation of many signal properties, like the average velocity. The average of the data is biased upwards, because more samples are taken when the velocity is high. The characteristics of the process can be estimated properly when the sampling rate is high enough to reconstruct the original signal by interpolation [3]. Therefore, the discussion is restricted to signals with a sufficiently high sampling rate.

The main methods for spectral analysis are parametric (or time series models) and nonparametric (or tapered and win-dowed Fourier transform) [4]. Generally time series models are to be preferred if the true model type and order of the process under investigation are known. However, application of the wrong model type or the wrong order can produce completely erroneous spectral estimates. Continuous autoregressive (AR) processes have been related to irregularly sampled data with a stochastic integral [5]. Recently, a time series alternative to a windowed periodogram has been developed for the description of equidistant stationary stochastic processes [6]. With estimation algorithms and order selection criteria an AR model is selected [7] as well as a moving average (MA) model [8] and a combined autoregressive moving average (ARMA) model [9]. From those three models, the single model with the best predictive properties is selected using statistical criteria. ARMA-modeling as well as the

Manuscript received April 27, 1998; revised September 2, 1999. The authors are with the Department of Applied Physics, Delft University of Technology, 2600 GA Delft, The Netherlands (e-mail: broersen@tn.tudelft.nl).

Publisher Item Identifier S 0018-9456(99)09236-0.

windowed periodogram require regularly sampled data, also when applied to LDA-data. Two ways of extracting regularly sampled data from the irregularly sampled LDA-measurements are slotting [10] and resampling [1].

A simulation study has been carried out to compare alter-native solutions to two problems:

• slotting or resampling for the extraction of regularly sampled data;

• ARMA-modeling or windowed periodogram for the esti-mation of the spectrum.

The paper describes a reliable way to estimate LDA-spectra. The spectra estimated with resampling followed by ARMA-modeling are a close fit to the true spectrum when the sampling rate is high enough. ARMA modeling also extracts the most accurate estimate of the integral of the covariance function [11]. This quantity is the integral time scale in turbulent flow [1].

II. SLOTTING OR RESAMPLING

A major problem in LDA signal processing is the occur-rence of velocity bias [1]. This bias is caused by the higher probability that particles with a higher velocity are detected in the measurement volume. Simple solutions called controlled processor or saturable detector [1], discard observations to improve the regularity of the measurements at the cost of loss of statistical accuracy.

A slotting technique to obtain a covariance estimate on a regular sampling basis which can be transformed with the usual efficient FFT algorithms [10] has been developed. Also direct transforms for spectral estimation have been studied [12]. An important conclusion is that the statistical accuracy of irregularly sampled data is inferior to that obtained if regular sampling intervals could be used [10]. The first generation of slotting algorithms produce a very rough covariance estimate. Moreover, the estimated covariance function is not positive definite, which is a requirement for obtaining useful spectra. The Fourier transform of a function that is not positive definite can become negative and negative spectral densities cannot exist.

In the proposed slotting technique, use of a variable spectral window and of local normalization gives much better results in some simulations [13]. Use of these modifications separately gives no improvement; only their combination is shown to be beneficial. However, with this new slotting technique, the estimated covariance function is not automatically positive definite, as is easily demonstrated with an example. With

(2)

local normalization [13], which takes into account the variance of the observations that contribute to the correlation in a particular slot, the correlation estimate for the last possible slot equals 1, as is the case for every slot where only a single pair of observations contribute. When this estimate is used, the estimated correlation function no longer positive definite. In other words, the choice of the window length determines whether the spectral estimate produced by this improved slotting technique is an acceptable spectrum.

The positive definite property is also very important for turbulence estimation. If the covariance is not positive definite, the spectral estimate can become zero for a certain frequency, its logarithm approaches and the spectral estimate around that frequency is completely determined by the actions that are taken to find some nonzero value. In particular, the slopes of the spectral estimate in that frequency region have no relation with the physical spectrum. No good variant of the slotting technique guaranteeing a valid positive definite covariance estimate is known. The use of a variable window in estimating a spectral representation has the disadvantage that the integral of that spectrum is not normalized to 1 or to the power of the signal. As the integral is not equal to the total power, the spectral estimates cannot be considered to be valid estimates for the distribution of power over frequency bands.

By treating the irregular intervals as if they were regular, the pseudo-regular FFT can be computed directly from the observations. A pseudo-regular estimate using time series will be discussed in the next section. However, the resulting spectral estimate is very sensitive to velocity bias because all observations are treated the same although high velocity particles are over represented. Moreover, it is less accurate than the result obtained by interpolation of the irregular data followed by equidistant resampling.

Interpolation schemes for LDA signal reconstruction have been studied extensively. Some complicated interpolation al-gorithms give good reconstructions in some cases but no scheme offered advantages over simpler techniques [3]. Mean square comparisons using Poisson sampling schemes showed a slight preference for linear interpolation [14]. In this method, the resampled output signal is

(1) Other equidistant resampling schemes use a zero order hold which takes the last true observation at time or use nearest neighbor resampling by taking the closest true observation. Exact sampling distributions are difficult to ob-tain, but for a statistical interpretation of individual samples, the maximum resampling rate should not be higher than the average irregular sampling rate. Otherwise the number of resampled observations exceeds the available DOF.

An example of a simulation result is given in Fig. 1. Slotting results in a noise floor while resampling approximates the true spectrum. This result is independent of the window type used to estimate the spectrum. The spectral representation becomes more like the true spectrum at higher frequencies only if

Fig. 1. Comparison of slotting and resampling for spectra estimated as a windowed Fourier transform of the covariance function.

the variable windows technique is used [13]. Resampling is preferred over slotting in this example.

III. TIME SERIES MODELS

Time series models are of three different types: autoregres-sive or AR, moving average or MA and combined ARMA.

An process can be written as [4]

(2) where is a purely random process, so a sequence of independent identically distributed stochastic variables. This process is AR for and MA for . Any stationary stochastic process with a continuous spectral density can

be written as an unique or process [4],

independent of the origin of the process, i.e., it may be the sum of a true AR process and a colored noise signal.

An estimated model is given by

(3) where and may be arbitrary and is some colored noise signal that is not purely random in general. Parameters for different model orders are estimated by minimization of the residual variance, which is the variance of . Afterwards, orders and are selected. Selection criteria add a penalty for each estimated parameter to the logarithm of the residual variance [7] of each estimated model and search for the minimum of the selection criterion. Moreover, finite order models are adequate for describing actual processes, because the higher order terms decrease rapidly for many processes.

The usual method of spectral estimation consists of win-dowed periodograms of tapered data that can be computed efficiently by FFT algorithms. However, the choices for the type and the length of spectral or lag windows in Fourier spectra have only been developed for known spectra [4]. No statistical theory is available to select optimal windows for unknown data.

New developments in time series modeling can be used for

LDA. A time series model, or or

is the best possible description of the structure of the data if the true model type and the true model order are known.

(3)

However, model type and model order must be found from the data. This is possible if certain combinations of estimating algorithms and order selection criteria are used.

• AR model parameters depend on the estimation method and model order selection depends on the highest order considered. A new order selection criterion [7] and the use of Burg’s algorithm [15] solve these problems. • MA estimation: Durbin’s method for MA estimation [16]

uses the parameters of a long intermediate autoregressive model to compute MA parameters. A new element in Durbin’s methods is that a theoretically optimal order has been defined for the intermediate AR model. By carefully selecting an order for the intermediate AR model, the Cram´er-Rao lower bound for the parameter accuracy is approximated for reasonable sample sizes [8].

• ARMA estimation is more problematic. Only Durbin’s second algorithm has no convergence problems, but initial conditions and the order of an intermediate AR model were open questions [17]. A solution has been given [9], but details still need to be improved.

Unknown stationary stochastic equidistant data can now be analyzed and a single time series model type and a model order can be selected automatically [6]. The quality of this selected model is, in numerous simulations, comparable to the quality that could be obtained if the true process type and order were known in advance. In other words, not knowing the true process is not an impediment for automatically finding the best model. The final single model is denoted the ARMAsel model. The covariance function and the power spectral density can be computed from the parameters. The power spectrum

of the model is given by [4]

(4) For all stationary stochastic processes, at least one of the three time series model types gives a good spectral description of the data.

To select the optimal window size windowed periodograms require a knowledge of the true spectrum. Trying different win-dows and making a selection afterwards gives disappointing results. The ARMAsel algorithm requires only the measured data and performs better than windowed periodograms even when the optimal window is used. In practical situations the true spectrum is not known and therefore a sub-optimal window must be used. Then the difference in performance is even more distinct.

In Fig. 2, resampling after interpolation is compared to the pseudo-regular approach where the randomly sampled obser-vations are treated as if they were equidistant. The resampling result is hardly distinguishable from the true spectrum. The pseudo-regular approach is inferior and is not investigated further. The global influence of treating original data as pseudo-regular is the same for windowed periodograms and for time series models.

Fig. 2. Comparison of resampling and the pseudo-regular approach for a simulated AR(2) signal. The spectra have been estimated with the ARMAsel algorithm.

Simulations are considered to be the best way to obtain an objective comparison of different algorithms since it is only in controlled situations that an exact knowledge of the process is available. For equidistant data, the quality can be evaluated using the model error ME [18] which can be computed for ARMAsel models and for periodograms. It has an interpretation in the time domain, in the frequency domain and as a difference between probability densities. In the time domain, it describes a scaled version of the one step ahead squared prediction error PE, which is found with an estimated model. The relationship between the model error ME and PE is given by [18]

(5) where is the number of observations used to estimate the model concerned. The one step ahead prediction error PE is a measure for how well can be predicted using all previous

observations The prediction error includes

the additional error caused by estimating the mean. This contribution can become quite significant when the estimate of the mean is biased.

The prediction is derived from the estimated correlations between the observations. In the case of simulated LDA-data, irregular observations are generated by picking observations at random from a dense regular grid. The average irregular sam-pling time is chosen 10 times higher than the regular interval. This means that only the correlation between observations at large lag times can be estimated properly. Therefore, a new error measure is introduced which is based on the prediction error with lag time , denoted PE . This is a measure of how well an observation can be predicted using the

previous observations at lag time The

error measure which is used to compare the various spectral estimators is a scaled version of PE , denoted ME

(6) where is the minimal value of PE that increases with ; for larger it becomes more difficult to have good predictions. The well-established theory and calculation methods of the one step ahead prediction error in time series can be transposed to

(4)

this new error measure. The prediction error with lag time for an ARMA-process can be calculated when the structure of the process sampled at times is known.

IV. SIMULATIONS

A simulation study has been carried out to compare the various algorithms. For the resampling and pseudo-regular techniques, the spectrum is estimated with Windowed Peri-odogram estimation as well as with ARMAsel estimation. Slotting can not be used in conjunction with ARMAsel es-timation because ARMAsel eses-timation requires a regularly sampled signal while slotting only provides a regularly sam-pled estimate of the correlation function. To obtain a positive slotting spectral estimate for all frequencies, it was necessary to take the absolute value of the slotting spectrum. The original velocity signal in the simulations consists of

observations of the following AR(2)-process:

(8)

with and .

The power spectrum of this process has a peak at frequency

. The width of the peak is .

This original velocity signal is sampled at irregularly spaced intervals with mean sampling time . must be considerably greater than one in order to reduce the influence of the fact that the time between observations is discrete. In these simulations is set equal to 10. The probability that a sample is taken can depend on the current value of the velocity. This dependency is expressed in the conditional probability

. The model used in the simulations is that

is proportional to the absolute value of .

The value of is chosen so that the average sampling time . By choosing equal to the bias in the estimated signal properties will be most prominent. When is very small compared to bias effects will be practically absent. This situation is covered by simulating LDA-data where the probability that a sample is taken does not depend on the current value of the velocity.

The values of the error measure ME averaged over 25 simulation runs are given in Table I, with and without bias. Resampling and slotting results turn out to be slightly better with bias where it would have been expected that bias has no influence. The difference is not considered significant; it is caused by using different runs for biased and unbiased simulations. The influence of bias is stronger in the pseudo-regular spectrum, as it should be. The automatically selected ARMAsel model yields a more accurate spectrum than the best Windowed Periodogram. A Parzen window has been used to estimate the Periodograms in Table I and the figures where the length of the window is a fraction of the number of observations . It is clear that the quality of the windowed periodogram depends on the length of the window as is evident in Table I, but no rules can be given to determine the best window length without the a priori knowledge of the true spectrum. The ARMAsel estimate and the best Windowed Periodogram from one typical simulation run are given in

TABLE I

ACCURACYME60 OF THEESTIMATEDSIGNALPROPERTIES FORDATA WITH AND

WITHOUTVELOCITYBIAS. SPECTRAHAVEBEENESTIMATED WITHARMASEL AND WITHFOURPERIODOGRAMS WITH APARZENWINDOW OFLENGTHM

Fig. 3. The best windowed periodogram and the ARMAsel estimate of the spectrum after resampling for a simulated AR(2) signal.

Fig. 3. Both the appearance in Fig. 3 and the objective quality ME show the advantage of the ARMAsel model.

As ARMAsel is the best way to estimate the spectrum [6], this is the preferred spectral estimator for a comparison of the methods for the extraction of regularly sampled data from the LDA-data. It is clear from Fig. 2 and Table I that resampling performs much better than pseudo-regular. For pseudo-regular, the difference between ARMAsel and periodograms is small. As ARMAsel cannot be used with slotting, resampling and slotting have been compared using the Windowed Periodogram as the spectral estimator. The best Windowed Periodogram after resampling yields a more accurate spectrum than the best Windowed Periodogram after slotting. This means that under the given circumstances resampling is much better than slotting for extracting regularly sampled data from the LDA-data. In this simulation example, linear interpolation gives the best result. Other examples have been simulated where zero order hold (take the last true observation for equidistant resampling) or nearest neighbor resampling (take the closest true observation) performed better. But in all simulations, the quality of the ARMAsel spectrum obtained from resampled data was better than all other estimates.

V. PRACTICAL DATA

Practical data which were measured in mixing layers of water and which had been investigated previously [19] were processed. The spectrum of those LDA-measurements was estimated with ARMAsel-modeling and Window Pe-riodogram after linear interpolation and resampling. The automatically selected ARMAsel model is compared to two

(5)

Fig. 4. Two windowed periodograms and the ARMAsel spectrum of prac-tical data.

Windowed Periodograms in Fig. 4. The ARMAsel model is an ARMA -model. The first windowed periodogram (Perio1 ) is almost as smooth as the ARMAsel estimate for higher frequencies. However, this estimate does not include the sharp peak at 0.3 Hz. The second periodogram estimate (Perio2 ) is able to track the peak only at the cost of irregular behavior at the high-frequency end of the spectrum. Qualitatively, the behavior of the spectral estimates is similar to the AR(2) simulations but no objective quality ME can be given here because the true process is unknown for the measured data.

VI. CONCLUDING REMARKS

Resampling is a much better technique for regularization than slotting. Treating the data as pseudo-regular gives high distortions in the high frequency range.

Spectra that are computed from the automatically selected ARMAsel time series model are better than that obtained with the best windowed periodogram. This follows from a subjective inspection of plots and from a new objective measure: the model error ME at time scale .

REFERENCES

[1] R. V. Edwards, “Report of the special panel on statistical particle bias problems in laser anemometry,” Trans. ASME J. Fluids Eng., vol. 109, pp. 89–93, 1987.

[2] L. H. J. Absil, “Analysis of the laser Doppler measurement technique for application in turbulent flows,” Ph.D. dissertation, Delft Univ. Technol., Delft, The Netherlands, 1995.

[3] C. Tropea, “Laser Doppler anemometry: Recent developments and future challenges,” Meas. Sci. Technol., vol. 6, pp. 605–619, 1995.

[4] M. B. Priestley, Spectral Analysis and Time Series. London, U.K.: Academic, 1981.

[5] R. J. Martin, “Autoregressive modeling of irregularly-sampled data,” in Signal Processing VIII, Proc. Eusipco Conf., Trieste, Italy, 1996, pp. 2033–2036.

[6] P. M. T. Broersen, “Facts and fiction in spectral analysis,” in Proc. IMTC’98 Conf., St. Paul, MN, 1998.

[7] , “The ABC of autoregressive order selection criteria,” in Preprints Sysid’97 Conf., Kitakyushu, Japan, July 8–11, 1997, pp. 231–236.

[8] , “The best order of long autoregressive models for moving av-erage estimation,” Signal Processing VIII, Proc. Eusipco Conf., Trieste, Italy, 1996, pp. 799–802.

[9] , “On orders of long AR models for ARMA estimation,” in Proc. ECSAP’97 Conf., Prague, Czech Republic, June 24–27, 1997, pp. 83–86. [10] M. Gaster and J. B. Roberts, “Spectral analysis of randomly sampled

signals,” J. Inst. Math. Applicat., vol. 15, pp. 195–216, 1975. [11] P. M. T. Broersen, “Estimation of the accuracy of mean and variance

of correlated data,” in Proc. IMTC’98 Conf., St. Paul, MN, 1998. [12] M. Gaster and J. B. Roberts, “The spectral analysis of randomly sampled

signals by a direct transform,” in Proc. R. Soc. Lond. A, 1977, vol. 354, pp. 27–58.

[13] M. J. Tummers and D. M. Passchier, “Spectral estimation using a variable window and the slotting technique with local normalization,” Meas. Sci. Technol., vol. 7, pp. 1541–1546, 1996.

[14] A. Ouabi, C. Depollier, L. Simon, D. Kouam´e, and M. Lethiecq, “Ran-dom sampling: Spectrum of fluid measured by Doppler velocimetry,” in Proc. IMTC Conf., Brussels, Belgium, 1996, pp. 531–536.

[15] J. P. Burg, “Maximum likelihood spectral analysis,” in Proc. 37th Meeting Society Exploration Geophysicists, Oklahoma City, OK, 1967, pp. 1–6.

[16] J. Durbin, “Efficient estimation of parameters in moving average mod-els,” Biometrika, vol. 46, pp. 306–316, 1959.

[17] , “The fitting of time series models,” Rev. Inst. Int. de Stat., vol. 28, pp. 233–243, 1960.

[18] P. M. T. Broersen, “The quality of models for ARMA processes,” IEEE Trans. Signal Processing, vol. 46, June 1998.

[19] J. Tukker, “Turbulence structures in shallow free-surface mixing layers,” Ph.D. dissertation, Delft Univ. Technol., Delft, The Netherlands, 1995. [20] G. Wilson, “Factorization of the covariance generating function of a pure moving average process,” SIAM J. Numer. Anal., vol. 6, pp. 1–7, 1969.

Stijn de Waele was born in Eindhoven, The

Nether-lands, in 1973. He received the M.Sc. degree in applied physics in 1998 from the Delft University of Technology, Delft, The Netherlands, where he is currently pursuing the Ph.D. degree.

His research is aimed at the development and application of new time series analysis algorithms.

Piet M. T. Broersen was born in Zijdewind, The Netherlands, in 1944. He

received the M.Sc. degree in applied physics in 1968 and the Ph.D. degree in 1976, both from Delft University of Technology, Delft,The Netherlands.

He is currently with the Department of Applied Physics, Delft University. His research interests are the selection of order and type of time series models and the application to spectral analysis, model building, and feature extraction.