(1)

REVIEW OF METHODS OF FORECASTING OF TIME SERIES

LESZEK KLUKOWSKI IBS PAN, WARSZAWA, 21.03.2014

The last several decades have been a period of intensive development of forecasting methods. The following factors stimulate this development:

- the development of the theory of forecasting – mainly in the area of stochastic methods and artificial neural networks,

- the dynamic increase of the capacity of computing systems,

- the complexity of economic phenomena, especially as reflected in financial markets, and the availability of huge data sets in computer systems.

Existing methods allow the modeling and forecasting of phenomena of significant complexity, variability and variety. However, further work – in theory and application – is needed in this field.

(2)

Monographs:

C.W.J. Granger;

J.D. Hamilton;

D.C. Montgomery, Ch.L. Jennings, M. Kulahci;

G.E.P. Box, G.M. Jenkins;

D.C. Montgomery, L.A. Johnson;

H. Tong;

P.H. Franses, D. van Dijk

Journals: forecasting, time series, econometrics

Nobel prizes: Jan Tinbergen, C.W.J. Granger

(3)

PLAN OF PRESENTATION

1. Introduction to the theory of forecasting.

2. Main directions of development of forecasting methods.

- univariate linear time series models,

- combination of forecasts,

- multiple linear time series models,

- univariate nonlinear time series models.

3. Some comparisons of empirical forecasts.

4. Summary and conclusions.

(4)

1. Introduction to the theory of forecasting.

2. Main directions of development of forecasting methods:

- univariate linear time series models:

trend (regression), exponential smoothing, ARIMA;

- combination of forecasts:

linear combination and others (nonparametric approach, artificial neural networks);

- multiple linear time series models:

bivariate ARMA;

- univariate nonlinear time series models (having multivariate extensions):

the Kalman filter, ARCH and GARCH, regime switching, artificial neural networks.

3. Some comparisons of empirical forecasts.

4. Summary and conclusions.

5. Basic literature.

(5)

1. Introduction to the theory of forecasting

Reasons for the application of time series forecasting

• speed and low costs,

• formalization of forecasting process and known properties of forecasts,

• limited requirements about data,

• broad spectrum of methods,

• reference point to other methods,

• component for combining forecasts,

• often, the only formalized method of forecasting.

(6)

Background for forecasting

• Forecasting – the prediction of future events or processes on rational grounds.

• Assumptions – there exists a "mechanism" of the predicted process (usually a stochastic process) which can be identified, with the use of statistical methods, and extrapolated into the future. A stochastic process (SP) is a sequence of random variables that are dependent in some way. Identification is done on the basis of a time series (TS), i.e. a set of observations forming a finite realization of the process.

• The TS is used for: determining the type of SP (the model), estimating its parameters, validating the model (ex ante, ex post), extrapolating the model (forecasting) and determining the precision of forecasts.

• The forecast can take point or interval form, analogous to point and interval estimators; for point forecasts it is possible to determine measures of accuracy, usually the variance (or SD) of the prediction error.

• The optimal forecast (usually) minimizes the variance of the error.

(7)

• Areas of application of forecasting methods:

- Financial markets,

- Economy, monetary policy, public finance,

- Operational management,

- Industrial processes,

- Demography.

• Main groups of linear methods:

- Regression models,

- Exponential smoothing methods,

- Autoregressive integrated moving average models (ARIMA),

- Transfer function and intervention models (ARMAX; not discussed),

- Combining forecasts.

(8)

• Main groups of non-linear methods:

- Autoregressive conditional heteroscedastic models (ARCH),

- Generalized autoregressive conditional heteroscedastic models (GARCH),

- State space models,

- Regime-switching models: threshold autoregressive, Markov-switching (TAR, SETAR, STAR, MSW),

- Artificial neural network models.

(9)

• The forecasting process (in general):

- Problem definition,

- Data collection,

- Data analysis,

- Model selection and fitting (estimation),

- Model validation,

- Forecasting model deployment,

- Monitoring forecasting model performance.

(10)

• Basic statistical tools

Time series: $y_t,\ t = 1,\dots,T$; forecast: $\hat{y}_{T+h},\ h \ge 1$; forecast error: $e_{t+h} = y_{t+h} - \hat{y}_{t+h}$.

Stationary TS:

- strictly – if the joint probability distribution of $y_t, y_{t+1},\dots,y_{t+n}$ is the same as that of $y_{t+k}, y_{t+k+1},\dots,y_{t+k+n}$;

- weakly – if the expected value $E(y_t) = \mu_y$ and variance $\sigma_y^2 = E(y_t - \mu_y)^2$ are constant and the autocovariance function $\gamma_k = \mathrm{Cov}(y_t, y_{t+k}),\ k = 0, 1, 2,\dots$ is a function of the lag $k$ only.

Sample autocovariance $c_k$ and autocorrelation $r_k$ functions:

$c_k = \frac{1}{T}\sum_{t=1}^{T-k}(y_t - \bar{y})(y_{t+k} - \bar{y}),\quad k = 0, 1, 2,\dots,K;\qquad \bar{y} = \frac{1}{T}\sum_{t=1}^{T} y_t,$

$r_k = \hat{\rho}_k = c_k / c_0.$
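The sample functions above translate directly into code; a minimal numpy sketch (the function name and the cutoff K are illustrative choices, not from the source):

import numpy as np

def sample_acf(y, K):
    # c_k = (1/T) * sum_{t=1}^{T-k} (y_t - ybar)(y_{t+k} - ybar); r_k = c_k / c_0
    y = np.asarray(y, dtype=float)
    T = len(y)
    ybar = y.mean()
    c = np.array([np.sum((y[:T - k] - ybar) * (y[k:] - ybar)) / T
                  for k in range(K + 1)])
    return c, c / c[0]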

(11)

Forecasting errors (for comparisons of models):

Mean error (one-step-ahead): $ME = \frac{1}{n}\sum_{t=1}^{n} e_t$,

Mean absolute deviation: $MAD = \frac{1}{n}\sum_{t=1}^{n} |e_t|$,

Mean square error: $MSE = \frac{1}{n}\sum_{t=1}^{n} e_t^2$,

Mean absolute percentage error: $MAPE = \frac{100}{n}\sum_{t=1}^{n}\left|\frac{e_t}{y_t}\right|$.

Criteria for model adequacy:

Mean square error of residuals:

$s^2 = \frac{1}{T-p}\sum_{t=1}^{T} e_t^2$

(p – number of estimated parameters of the model),

R-square statistic (within (0, 1)):

$R^2 = 1 - \frac{\sum_{t=1}^{T} e_t^2}{\sum_{t=1}^{T}(y_t - \bar{y})^2}.$
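A minimal numpy sketch of these accuracy measures (assuming y and yhat are aligned arrays of actual values and one-step-ahead forecasts; the function name is illustrative):

import numpy as np

def forecast_accuracy(y, yhat):
    # ME, MAD, MSE and MAPE of one-step-ahead forecast errors e_t = y_t - yhat_t
    y = np.asarray(y, float)
    e = y - np.asarray(yhat, float)
    return {"ME": e.mean(),
            "MAD": np.abs(e).mean(),
            "MSE": (e ** 2).mean(),
            "MAPE": 100.0 * np.abs(e / y).mean()}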

(12)

Adjusted R-square:

$R_{adj}^2 = 1 - \frac{s^2}{\frac{1}{T-1}\sum_{t=1}^{T}(y_t - \bar{y})^2},$

Akaike information criterion:

$AIC = \ln(s^2) + \frac{2p}{T},$

Schwarz information criterion:

$SIC = \ln(s^2) + \frac{p\ln(T)}{T},$

Corrected (consistent) AICC:

$AICC = \ln(s^2) + \frac{2(p+1)}{T-p-2}.$

SIC, AICC – consistent criteria of model selection (they detect the true model as T gets large); AIC – asymptotically efficient (approaches the true model as fast as any other criterion).
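A sketch of the three criteria for a fitted model (an assumption of this sketch: the s² inside the logarithm is taken as SSE/T; texts differ on whether SSE/T or SSE/(T−p) is used):

import numpy as np

def information_criteria(residuals, p):
    # AIC, SIC and AICC; p = number of estimated parameters
    e = np.asarray(residuals, float)
    T = len(e)
    s2 = np.sum(e ** 2) / T                  # variance estimate used in ln(.)
    aic = np.log(s2) + 2.0 * p / T
    sic = np.log(s2) + p * np.log(T) / T
    aicc = np.log(s2) + 2.0 * (p + 1) / (T - p - 2)
    return aic, sic, aicc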

(13)

• The basis for time series forecasting – the theory of stochastic processes: identification (of type), estimation, verification and optimal extrapolation for an assumed criterion function (variance of error).

* The first models were based on the formula:

$Y_t = f(t) + \varepsilon_t,\qquad t = 1,\dots;$

$Y_t$ – forecasted series (random variables), $f(t)$ – some (deterministic) function of time t, $\varepsilon_t$ – (Gaussian) white noise $N(0, \sigma_\varepsilon^2)$; the stochastic assumptions can be significantly weaker.

• The basis of the model – the representation of the stochastic process $Y_t$ as the sum of (uncorrelated) deterministic $D_t$ and stochastic $Z_t$ components (Wold, Cramér):

$Y_t = D_t + Z_t\ (t = 1,\dots);\qquad Z_t = \sum_{j=0}^{\infty} c_j\varepsilon_{t-j},\qquad \sum_{j=0}^{\infty} c_j^2 < \infty.$

$D_t$ – deterministic, $Z_t$ – purely non-deterministic, $c_j$ – parameters, $D_t$, $Z_t$ – uncorrelated.

* Examples:

$f(t) = b_0 + b_1 t + \dots + b_k t^k,\qquad f(t) = b_0 + b_1\sin\frac{2\pi t}{12} + b_2\cos\frac{2\pi t}{12}.$

(14)

• The estimation determines the optimal estimates of the parameters of the model; the verification examines the accuracy of the obtained form; the extrapolation determines optimal forecasts, i.e. forecasts minimizing an assumed criterion, e.g. the variance of the forecast error (the optimal predictor is the conditional expected value of the forecasted variable, for a given information set).

• The estimation applies statistical methods: least squares, maximum likelihood, non-parametric methods, robust methods.

• The verification is usually based on statistical tests, which verify: the adequacy of the model to the time series (e.g. R²), the significance of its parameters (e.g. Student's t-test) and the properties of forecasts (e.g. chi-square tests). The verification can confirm the model or suggest its modification: a change of class, analytical form or parameters (e.g. the order of autoregression).

• The extrapolation generates optimal forecasts and measures of their precision. The main forms of forecasts are: point ($\hat{Y}_{t+h}$) and interval (general form for linear models: $\hat{Y}_{t+h} \mp t_{\alpha/2,\,T-p}\,\mathrm{const}(\hat{\sigma}_\varepsilon)$, with confidence coefficient $(1-\alpha)$ from the Student's t distribution), for horizon $h \ge 1$.

(15)

* Simple example of a forecast and its error:

$\hat{y}_{T+h} = f(T+h),\ h \ge 1;\qquad \mathrm{Var}(Y_{T+h} - \hat{Y}_{T+h}) = \mathrm{Var}(Y_{T+h}) + \mathrm{Var}(\hat{Y}_{T+h}).$

• The process of building the model (a stochastic process) is not always easy to operate; typical steps include (P.H. Franses, D. van Dijk):

- calculate certain statistics indicating the type of model (for ARIMA: the autocovariance functions),

- compare these statistics with the theoretical values to check whether the type of model is adequate,

- estimate the parameters of the model suggested in the previous steps, on the basis of the information set,

- evaluate the model using diagnostic measures,

- re-specify the model if necessary,

- use the model for forecasting (or analytic) purposes.

• Final model selection is typically realized by comparison of different forms of models, using statistical tests and other measures, e.g. the AIC or BIC criterion.

A model which satisfies the verification requirements is reliable and useful in practice!

(16)

All models are wrong, some are useful.

G.E.P. Box

2. Main directions of development of forecasting methods

• Simple models, like the trend, have a well-developed (and by now simple) theory of estimation, verification and forecasting, but are applicable under restrictive assumptions, especially:

* a simple, constant form of the trend, and

* restrictive assumptions about the random variables of the stochastic component (constant parameters, independence).

They can therefore be applied only to simple phenomena.

• The models developed in recent decades have eliminated these disadvantages in the following main directions (an overlapping classification):

- relaxing the assumptions about the stability of the structure and parameters of the model,

- allowing for complex forms of stochastic dependencies between variables of the stochastic process,

- advancing the analytical form of the model.

• An important feature of these models is parsimonious parametrization (methods efficient for a moderate number of observations).

(17)

Directions for univariate time series

• Simple trend models

$Y_t = f(t) + \varepsilon_t,\ t = 1,\dots\qquad (\varepsilon_t \sim \text{i.i.d. } N(0, \sigma_\varepsilon^2))$

$f(t) = b_0 + b_1 t + \dots + b_k t^k,$

$f(t) = b_0 + b_1\sin\frac{2\pi t}{12} + b_2\cos\frac{2\pi t}{12};$

Estimation of the polynomial trend, k = 1:

$\hat{b} = (Z'Z)^{-1}Z'y,\qquad Z = \begin{bmatrix} 1 & 1 \\ 1 & 2 \\ \vdots & \vdots \\ 1 & T \end{bmatrix},\qquad y = (y_1,\dots,y_T)',$

$E(\hat{b}) = b,\qquad V = E[(\hat{b}-b)(\hat{b}-b)'] = G^{-1}\sigma_\varepsilon^2,$

$G = Z'Z = \begin{bmatrix} T & \frac{T(T+1)}{2} \\ \frac{T(T+1)}{2} & \frac{T(T+1)(2T+1)}{6} \end{bmatrix}.$

Estimates equivalent to ML (unbiased, consistent, minimal variance, asymptotic Gaussian distributions).

(18)

$\hat{\sigma}_\varepsilon^2 = \frac{\hat{\varepsilon}'\hat{\varepsilon}}{n-k},\qquad \hat{\varepsilon} = y - \hat{y},\qquad E(\hat{\sigma}_\varepsilon^2) = \sigma_\varepsilon^2,$

$\mathrm{Var}(\hat{b}_0) = \frac{2(2T+1)}{T(T-1)}\sigma_\varepsilon^2,\qquad \mathrm{Var}(\hat{b}_1) = \frac{12}{T(T^2-1)}\sigma_\varepsilon^2,$

$\mathrm{Cov}(\hat{b}_0, \hat{b}_1) = -\frac{6}{T(T-1)}\sigma_\varepsilon^2.$

Verification of the model:

- verification of the identical, independent, Gaussian distribution of $\varepsilon_t$ using appropriate statistical tests,

- significance of $b_1$ – the hypotheses $H_0\colon b_1 = 0$, $H_1\colon b_1 > 0\ (<0)$, Student's t-test,

- significance of the model: F-test (Snedecor's F, under the Gaussian assumption).

(19)

Forecasts:

Point: $\hat{y}_{T+h} = z_{T+h}\hat{b},\qquad z_{T+h} = (1,\ T+h),$

$\mathrm{Var}(e_T(h)) = \sigma_\varepsilon^2\left[1 + \frac{2[(2T-1)(T-1) + 6h(T+h-1)]}{T(T^2-1)}\right].$

Interval (symmetric, $100(1-\alpha)\%$):

$\hat{y}_{T+h} \pm t_{\alpha/2,\,n-k}\,\hat{\sigma}_\varepsilon\left[1 + z_{T+h}(Z'Z)^{-1}z'_{T+h}\right]^{1/2}.$

Main properties of the forecasts:

- unbiasedness, minimal variances (optimal precision), Gaussian distributions with known parameters.

Computations: EXCEL, MINITAB, TSP, STATISTICA, SPSS
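The estimation and forecasting formulas of the last three slides combine into a short numpy/scipy sketch (the function name and its defaults are illustrative; y is assumed to be a one-dimensional series):

import numpy as np
from scipy import stats

def linear_trend_forecast(y, h, alpha=0.05):
    # OLS fit of y_t = b0 + b1*t + eps_t; 100(1-alpha)% interval for y_{T+h}
    y = np.asarray(y, float)
    T = len(y)
    Z = np.column_stack([np.ones(T), np.arange(1, T + 1)])   # design matrix
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)                # b = (Z'Z)^{-1} Z'y
    e = y - Z @ b
    s2 = e @ e / (T - 2)                                     # sigma_eps^2, k = 2
    z = np.array([1.0, T + h])
    point = z @ b                                            # point forecast
    half = stats.t.ppf(1 - alpha / 2, T - 2) * np.sqrt(
        s2 * (1 + z @ np.linalg.inv(Z.T @ Z) @ z))
    return point, (point - half, point + half)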

(20)

If you have to forecast, forecast often.

E.R. Fiedler

• Relaxing the assumptions about the stability of the parameters of the model; some developments:

- exponential smoothing (Brown, Holt-Winters) – the idea: re-estimate the model parameters each period in order to incorporate the most recent period's data (with weights decreasing exponentially);

the simplest case – the constant model, first-order exponential smoothing (e.g. prices of a fuel):

$Y_t = b + \varepsilon_t\qquad (t = 1,\dots,T),$

$\tilde{y}_T = \lambda\sum_{t=0}^{T-1}(1-\lambda)^t y_{T-t},\qquad \tilde{y}_t = \lambda y_t + (1-\lambda)\tilde{y}_{t-1},$

$\lambda \in (0, 1)$ – smoothing constant (typically $\lambda \in [0.1,\ 0.4]$), with the variance

$\mathrm{Var}(\tilde{y}_T) = \frac{\lambda}{2-\lambda}\mathrm{Var}(y_t);$

(21)

The estimate $\hat{b}_0$ is obtained from the minimization of

$\sum_{t=0}^{T-1}(1-\lambda)^t(y_{T-t} - b_0)^2;$

for large T

$\hat{b}_0 = \lambda\sum_{t=0}^{T-1}(1-\lambda)^t y_{T-t}.$

Typical models:

Linear trend model: $Y_t = b_0 + b_1 t + \varepsilon_t$

General trend model: $Y_t = b_0 + b_1 t + \dots + \frac{b_k}{k!}t^k + \varepsilon_t$

Sinusoidal trend: $Y_t = b_0 + b_1\sin\frac{2\pi t}{d} + b_2\cos\frac{2\pi t}{d} + \varepsilon_t.$

(22)

General approach

• The theoretical basis (Brown):

THEOREM OF EXPONENTIAL SMOOTHING for the general n-th degree polynomial

$y_t = b_0 + b_1 t + \dots + \frac{b_n}{n!}t^n + \varepsilon_t,\qquad \varepsilon_t \sim \text{i.i.d. } N(0, \sigma_\varepsilon^2).$

However, for n > 2 the calculations get complicated and ARIMA models can be considered instead.

The process $\{Y_t,\ t = 1,\dots\}$ (constant or trend) changes slowly (in the parameters $b_k$). The system of weights is heuristic, but has useful statistical (practical) properties.

(23)

The point forecast (constant model, with constant precision):

$\hat{y}_{T+h} = \hat{b}_0 = \tilde{y}_T,\qquad \mathrm{Var}(\tilde{y}_T) = \frac{\lambda}{2-\lambda}\mathrm{Var}(y_t),\qquad \hat{\varepsilon}_T = y_T - \hat{y}_T.$

The interval forecast (of constant width – often unrealistic):

$\hat{y}_T \mp u_{\alpha/2}\hat{\sigma}_e,$

where $u_{\alpha/2}$ is the appropriate percentile of the standard Gaussian distribution and

$\hat{\sigma}_e^2 = \frac{1}{T-1}\sum_{t=1}^{T-1}(y_{t+1} - \hat{y}_{t+1})^2,$

$\hat{y}_{t+1}$ – the one-step-ahead forecast (on historic data).

Choice of $\lambda$ – minimization of $SS(\lambda) = \sum_{t=1}^{T}(y_t - \hat{y}_t)^2$.

Point forecast for the linear trend model:

$\hat{y}_{T+h} = \hat{\beta}_{0,T} + \hat{\beta}_{1,T}\,h,$

$\hat{\beta}_{0,T+1} = [1-(1-\lambda)^2]\,y_{T+1} + (1-\lambda)^2(\hat{\beta}_{0,T} + \hat{\beta}_{1,T}),$

$\hat{\beta}_{1,T+1} = \lambda^2(y_{T+1} - \hat{\beta}_{0,T} - \hat{\beta}_{1,T}) + \hat{\beta}_{1,T},$

with the doubly smoothed series $\hat{y}_T^{(2)} = \lambda\hat{y}_T^{(1)} + (1-\lambda)\hat{y}_{T-1}^{(2)}$ used to obtain the coefficients.

Interval forecast:

$\hat{y}_{T+h} \mp u_{\alpha/2}\left(\frac{c_h}{c_1}\right)^{1/2}\hat{\sigma}_e,$

$c_i = 1 + \frac{\lambda}{(2-\lambda)^3}\left[(10 - 14\lambda + 5\lambda^2) + 2i\lambda(4-3\lambda) + 2i^2\lambda^2\right].$
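A minimal sketch of first-order exponential smoothing with λ chosen by minimizing SS(λ) (the grid of λ values and the initialization ỹ₁ = y₁ are assumptions of this example):

import numpy as np

def smooth(y, lam):
    # ytil_t = lam*y_t + (1-lam)*ytil_{t-1}; one-step forecast yhat_{t+1} = ytil_t
    y = np.asarray(y, float)
    ytil = np.empty(len(y))
    ytil[0] = y[0]                         # initialization choice (assumption)
    for t in range(1, len(y)):
        ytil[t] = lam * y[t] + (1 - lam) * ytil[t - 1]
    return ytil

def choose_lambda(y, grid=np.arange(0.05, 1.0, 0.05)):
    # SS(lam) = sum of squared one-step-ahead errors (y_{t+1} - ytil_t)^2
    y = np.asarray(y, float)
    ss = [np.sum((y[1:] - smooth(y, lam)[:-1]) ** 2) for lam in grid]
    return grid[int(np.argmin(ss))]

For the constant model the point forecast is then ŷ_{T+h} = ỹ_T for every h ≥ 1.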

(24)

Validation of the model

The basis: the sample autocorrelation function of the one-step-ahead forecasting errors, $r_k,\ k = 1,\dots,\kappa T$ ($\kappa \in (0,1)$), should be around 0 (zero) with standard deviation $1/\sqrt{T}$. Values lying outside the $\pm 2/\sqrt{T}$ limits require examination.

Another approach: monitoring and modifying the discount factor $\lambda$, e.g. Trigg and Leach (1967), Chow (1965).

There exist models for seasonal data: additive and multiplicative.

(25)

Summary assessment

• The models are optimal (minimize the mean square error) for some ARIMA(0, k, k) processes; otherwise the forecasts are typically not optimal (similarly to the estimates of the parameters); the benefit – a simple model and simple computations.

• The features of the methods:

- they allow for some changes of the phenomena forecasted,

- possibility of computerization (automation) of model building and forecasting,

- short horizon of forecasts,

- empirical forecasts are typically biased and suboptimal, with often correlated errors.

(26)

Allowing for (more) complex forms of stochastic dependencies between variables of the stochastic process

- ARIMA(p, d, q) models (Box, Jenkins):

$\Phi(B)(1-B)^d Y_t = \Theta(B)\varepsilon_t$

where:

$(1-B)^d y_t = \Delta^d y_t$ – d-fold differencing,

$\Phi(B) = 1 - \phi_1 B - \dots - \phi_p B^p,$

$\Theta(B) = 1 - \theta_1 B - \dots - \theta_q B^q,$

$\Phi(B)$, $\Theta(B)$ – lag-polynomial operators (with roots outside the unit circle, providing stationarity of $\Phi(B)$ and invertibility of $\Theta(B)$),

d – differencing order providing a stationary process, $\varepsilon_t$ – Gaussian white noise.

• The (linear) model reflects finite autoregressive and moving-average dependencies of the variables $y_t$, with the d-fold differencing providing (weak) stationarity. There exist optimal estimators and predictors; the estimators are non-linear in the case of $q \ge 1$.
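For illustration, a sketch using the ARIMA implementation in statsmodels – one possible tool, not the method of the source; the series y and the order (1, 1, 1) are assumptions of the example:

from statsmodels.tsa.arima.model import ARIMA  # statsmodels >= 0.12

# y: an observed series as a numpy array; order chosen from ACF/PACF analysis
res = ARIMA(y, order=(1, 1, 1)).fit()          # (approximate) maximum likelihood
print(res.aic, res.bic)                        # criteria for model comparison
fc = res.get_forecast(steps=12)
point = fc.predicted_mean                      # optimal point forecasts
interval = fc.conf_int(alpha=0.05)             # interval forecasts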

(27)

• The basis:

THEORY OF ARIMA PROCESSES

The main assumptions:

- weak stationarity of the TS: the expected value $\mu_y = E(y_t)$ is constant, not dependent on time,

- invertibility of the ARMA process: roots of $\Theta(B)$ outside the unit circle (the process has an infinite AR representation),

- the autocovariance function $\gamma_y(k) = \mathrm{Cov}(y_t, y_{t+k})$ $(k = 1, 2,\dots)$ for any lag k is only a function of k and not of time.

The form of the ARMA(p, q) model:

$y_t = \delta + \sum_{i=1}^{p}\phi_i y_{t-i} - \sum_{i=1}^{q}\theta_i\varepsilon_{t-i} + \varepsilon_t$

or

$\Phi(B)y_t = \delta + \Theta(B)\varepsilon_t.$

(28)

The main tools for model building (identification, estimation, validation):

- Identification: based on the autocorrelation function and the partial autocorrelation function,

- Estimation: least squares, maximum likelihood – non-linear for q > 0,

- Validation based on:

* residuals (a realization of Gaussian white noise):

$\hat{\varepsilon}_t = y_t - \left(\hat{\delta} + \sum_{i=1}^{p}\hat{\phi}_i y_{t-i} - \sum_{i=1}^{q}\hat{\theta}_i\hat{\varepsilon}_{t-i}\right),$

* statistics based on autocorrelations (chi-square with K−p−q degrees of freedom):

$Q = T\sum_{k=1}^{K} r_k^2(\hat{\varepsilon})\qquad\text{or}\qquad Q = T(T+2)\sum_{k=1}^{K}\frac{r_k^2(\hat{\varepsilon})}{T-k}.$
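A sketch of the second (Ljung-Box type) statistic computed from the residuals (the helper name is illustrative):

import numpy as np
from scipy import stats

def portmanteau(residuals, K, p, q):
    # Q = T(T+2) * sum_{k=1}^{K} r_k^2 / (T-k); chi-square with K-p-q df
    e = np.asarray(residuals, float)
    T = len(e)
    e = e - e.mean()
    c0 = np.sum(e ** 2) / T
    r = np.array([np.sum(e[:T - k] * e[k:]) / (T * c0) for k in range(1, K + 1)])
    Q = T * (T + 2) * np.sum(r ** 2 / (T - np.arange(1, K + 1)))
    return Q, 1 - stats.chi2.cdf(Q, K - p - q)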

(29)

Optimal predictor:

Point forecast (made at time T for time T+h, h ≥ 1) – the conditional expectation of $y_{T+h}$ given $y_T, y_{T-1},\dots$:

$\hat{y}_{T+h} = E[y_{T+h} \mid y_T, y_{T-1},\dots],$

$e_T(h) = y_{T+h} - \hat{y}_{T+h},\qquad E(e_T(h)) = 0,$

$\mathrm{Var}(e_T(h)) = \sigma^2(h) = \sigma_\varepsilon^2\sum_{i=0}^{h-1}\psi_i^2,$

with the coefficients $\psi_i$ determined by the relationship

$\Psi(B) = \Phi(B)^{-1}\Theta(B);$

The interval forecast: $\hat{y}_{T+h} \pm z_{\alpha/2}\,\sigma(h)$.

(30)

Example – ARIMA(1, 1, 1) process:

$(1-\phi B)(1-B)y_{T+h} = (1-\theta B)\varepsilon_{T+h},$

$y_{T+h} = (1+\phi)y_{T+h-1} - \phi y_{T+h-2} + \varepsilon_{T+h} - \theta\varepsilon_{T+h-1},$

$\hat{y}_T(1) = (1+\phi)y_T - \phi y_{T-1} - \theta\hat{\varepsilon}_T,\qquad \hat{\varepsilon}_T = y_T - \hat{y}_{T-1}(1),$

$\hat{y}_T(2) = (1+\phi)\hat{y}_T(1) - \phi y_T,$

$\hat{y}_T(h) = (1+\phi)\hat{y}_T(h-1) - \phi\hat{y}_T(h-2),\qquad h > 2.$

The rules of the forecasting computations (applied in the sketch below):

- unknown values $y_{T+k},\ k > 0$ are replaced by their forecasts $\hat{y}_{T+k}$,

- the "forecasts" of $y_{T+k},\ k \le 0$ are the known values,

- the optimal forecast of $\varepsilon_{T+k},\ k > 0$ is zero; the "forecasts" of $\varepsilon_{T+k},\ k \le 0$ are the known (estimated) values $\hat{\varepsilon}_{T+k}$.
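A direct sketch of these rules for the ARIMA(1, 1, 1) example (φ, θ and the last residual ε̂_T are assumed to be already estimated; the function name is illustrative):

import numpy as np

def arima111_forecasts(y, phi, theta, eps_T, H):
    # y[-1] = y_T, y[-2] = y_{T-1}; future shocks are forecast as zero
    f = []
    for h in range(1, H + 1):
        if h == 1:
            f.append((1 + phi) * y[-1] - phi * y[-2] - theta * eps_T)
        elif h == 2:
            f.append((1 + phi) * f[0] - phi * y[-1])
        else:
            f.append((1 + phi) * f[h - 2] - phi * f[h - 3])
    return np.array(f)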

(31)

Practical approach to model building:

- To determine "potential" models on the basis of the autocorrelation and partial autocorrelation functions and to estimate these models,

- To choose the best version on the basis of the estimates of the variance $\mathrm{Var}(\varepsilon_t)$, the values of the AIC and BIC criteria and the tests used in validation.

Sometimes more than one version satisfies the criteria used during validation; their forecasts are usually similar.

• The features of the methods:

- optimal properties of the statistical tools: identification, estimation, verification and prediction,

- broad area of application,

- successful applications,

- a "rigid" assumption about the stable form of the model.

(32)

- Combination of forecasts

• (Unbiased) forecasts can often be obtained from two or more sources (models). Typically a combination of such forecasts, e.g. a weighted average, is more precise (has a lower variance) than the individual forecasts (Granger). This is analogous to a linear combination of estimators; the weights can be determined in an optimal way (Rao, Serfling).

• In the case of two forecasts the optimal weight $k_0$ ($0 < k_0 < 1$), providing the minimum error variance $\sigma_{c,0}^2$, has the form:

$k_0 = \frac{\sigma_2^2 - \rho\sigma_1\sigma_2}{\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2},\qquad \hat{y}_{comb} = k_0\hat{y}_1 + (1-k_0)\hat{y}_2,$

$\sigma_{c,0}^2 = \frac{\sigma_1^2\sigma_2^2(1-\rho^2)}{\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2},$

$\sigma_1$, $\sigma_2$ – standard deviations of the errors of the forecasts $\hat{y}_1$, $\hat{y}_2$, $\rho$ – their correlation coefficient.
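A small numerical sketch of the optimal weight (the values σ₁ = 2, σ₂ = 3, ρ = 0.5 are illustrative):

import numpy as np

def optimal_combination(s1, s2, rho):
    # k0 and the combined error s.d. for two unbiased forecasts
    den = s1**2 + s2**2 - 2 * rho * s1 * s2
    k0 = (s2**2 - rho * s1 * s2) / den
    var_c = s1**2 * s2**2 * (1 - rho**2) / den
    return k0, np.sqrt(var_c)

k0, sc = optimal_combination(2.0, 3.0, 0.5)   # k0 = 6/7, sc ~ 1.96 < min(2, 3)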

(33)

• In practice the weights are determined in many ways, e.g. on the basis of past forecasting errors (Granger) – via the ML estimator (under the assumption that the forecasts have a bivariate Gaussian distribution with known variances and covariance) $k_n$:

$k_n = \frac{\sum_{t=n-v}^{n-1}\left(e_t^{(2)}\right)^2}{\sum_{t=n-v}^{n-1}\left[\left(e_t^{(1)}\right)^2 + \left(e_t^{(2)}\right)^2\right]}.$

• Combined forecasts can also be obtained with the use of artificial neural networks or a non-parametric approach.

• Typically, combined forecasts significantly outperform individual forecasts, but they require more than one forecast (model).

(34)

• The features of the method:

- it requires versatility of the forecaster,

- typically the best empirical precision, because it makes use of the positive properties of the individual, usually suboptimal, forecasts,

- it allows for the application of forecasts from different sources (e.g. institutions).

(35)

- Multiple time series models

• The majority of univariate models have multivariate extensions (in theory) that are applied in practice. The following models have particular importance:

linear:

* ARIMA (ARMAX);

nonlinear:

* the Kalman filter,

* ARCH and GARCH,

* threshold and Markov switching.

• The multivariate models allow for causality and feedback. Typically the number of variables is low; the exception is the Kalman model.

(36)

• The bivariate (vector) ARMA model with feedback (Granger's approach):

$Y_{1,t} = \frac{\omega_1^*(B)}{\delta_1^*(B)}Y_{2,t} + \frac{\theta_1^*(B)}{\varphi_1^*(B)}\eta_{1,t}^*,$

$Y_{2,t} = \frac{\omega_2^*(B)}{\delta_2^*(B)}Y_{1,t} + \frac{\theta_2^*(B)}{\varphi_2^*(B)}\eta_{2,t}^*,$

$(Y_{1,t}, Y_{2,t})$ – bivariate ARMA (with the means subtracted), $(\eta_{1,t}^*, \eta_{2,t}^*)$ – bivariate white noises, mutually uncorrelated ($\omega_2^*(0) \equiv 0$),

$\omega_1^*(B)$, $\delta_1^*(B)$, $\omega_2^*(B)$, $\delta_2^*(B)$, $\theta_i^*(B)$, $\varphi_i^*(B)$ – lag polynomials (parameters);

The above relationships result from:

$\Phi(B)Y_t = \Theta(B)\eta_t$ ($\eta_t$ – vector white noise),

$Y_{i,t} = \sum_{j \ne i}\frac{\omega_{ij}^*(B)}{\delta_{ij}^*(B)}Y_{j,t} + \frac{\theta_i^*(B)}{\varphi_i^*(B)}\eta_{i,t}^*\qquad (i, j = 1, 2),$

example:

$Y_{1,t} = \frac{\omega_{1,0}^* + \omega_{1,4}^*B^4}{1 - \delta_{1,1}^*B}Y_{2,t} + \frac{1 - \theta_{1,1}^*B - \theta_{1,2}^*B^2}{1 - \varphi_{1,1}^*B}\eta_{1,t}^*,$

$Y_{2,t} = \frac{\omega_{2,1}^*B + \omega_{2,3}^*B^3}{1 - \delta_{2,1}^*B}Y_{1,t} + (1 - \theta_{2,1}^*B - \theta_{2,2}^*B^2)\eta_{2,t}^*.$

(37)

The model building steps, simplified (Granger, Newbold):

- fit single-series models for $Y_{j,t}$ $(j = 1, 2)$:

$\varphi_j(B)Y_{j,t} = \theta_j(B)\varepsilon_{j,t}\qquad (j = 1, 2),$

calculate the residuals and the standardized residuals,

- calculate the cross-correlogram between the residuals of the univariate models and use it to identify the transfer functions $\omega_j(B)/\delta_j(B)$ of the models linking the standardized residuals $(\varepsilon_{1,t}, \varepsilon_{2,t})$, i.e.:

$\varepsilon_{1,t} = \frac{\omega_1(B)}{\delta_1(B)}\varepsilon_{2,t} + \frac{\theta_1(B)}{\varphi_1(B)}\eta_{1,t},$

$\varepsilon_{2,t} = \frac{\omega_2(B)}{\delta_2(B)}\varepsilon_{1,t} + \frac{\theta_2(B)}{\varphi_2(B)}\eta_{2,t},$

(standardized to have unit variances),

- identify the error structures of $(\varepsilon_{1,t}, \varepsilon_{2,t})$, i.e. the forms $\theta_j(B)/\varphi_j(B)$, and estimate them; check the adequacy of the fitted model,

- amalgamate the bivariate model fitted to the residuals $(\varepsilon_{1,t}, \varepsilon_{2,t})$ with the two univariate models to suggest the bivariate model for $(Y_{1,t}, Y_{2,t})$; estimate the model relating the original series,

- check the adequacy of the model and, if necessary, re-estimate it.

• Multivariate model building is much more complex and time-consuming. In return, the model can capture the dependencies between the two variables and provide forecasts with lower errors than the univariate models.

(38)

• Main groups of non-linear methods:

- Autoregressive conditional heteroscedastic models (ARCH),

- Generalized autoregressive conditional heteroscedastic models (GARCH),

- State space models,

- Regime-switching models: threshold autoregressive, Markov-switching (TAR, SETAR, STAR, MSW),

- Artificial neural network models.

(39)

- ARCH & GARCH models (Engle, Bollerslev), (generalized) autoregressive conditional heteroscedasticity (introductory facts)

• Models which are capable of describing (not only) the feature of volatility clustering, but also other properties of financial time series, e.g. excess kurtosis

ˆ ) ˆ (

ˆ 2

1 1 2 1

1 4

ˆ = ∑ ∑

=

=

n t n t n

t n t

Kε ε ε (greater than 3) or fat-tailedness.

Consistent with theory (e.g. Capital Asset-Pricing Model) and empirical evidence; some of the properties important at financial market have models with non-linear form of the predictor.

• ARCH model (capture the volatility of clustering of TS – large shocks tend to be followed by large shocks)

εt

t t

t E Y

Y = ( Ω 1)+ ,

) ( ))

( ( )

(

; 2 E 2 E E 2 1 E h

h

zt t t t t t

t = σ ≡ ε = ε Ω =

ε

zt - iid. standard Gaussian random variable, )

(Ω 1

t t

t h

h - a nonnegative function,

t−1 - information set up to and including time t −1

(distribution of εt conditional upon t−1 is N(0, ht ), unconditional expectation of ht is constant).

(40)

An alternative representation of the ARCH(1) model (the conditional variance of the shock at time t is a linear function of the squares of past shocks):

$h_t = \omega + \alpha_1\varepsilon_{t-1}^2\qquad (\omega > 0,\ 0 \le \alpha_1 < 1),$

$\varepsilon_t^2 = \omega + \alpha_1\varepsilon_{t-1}^2 + v_t,\qquad v_t = \varepsilon_t^2 - h_t = h_t(z_t^2 - 1),$

i.e. an AR(1) model for $\varepsilon_t^2$ (stationary for $0 \le \alpha_1 < 1$). Some features of the model:

$\sigma^2 \equiv E(\varepsilon_t^2) = \frac{\omega}{1-\alpha_1},\qquad E(v_t \mid \Omega_{t-1}) = 0,$

$\varepsilon_t^2 - \sigma^2 = \alpha_1(\varepsilon_{t-1}^2 - \sigma^2) + v_t.$

If $\varepsilon_{t-1}^2$ is larger (smaller) than its unconditional expected value $\sigma^2$, then $\varepsilon_t^2$ is expected to be larger (smaller) than $\sigma^2$ as well.

The kurtosis of $\varepsilon_t$ always exceeds the kurtosis of $z_t$.
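A minimal simulation sketch of the ARCH(1) recursion (the parameter values are illustrative; α₁ = 0.5 keeps the fourth moment finite, so the sample kurtosis settles above 3):

import numpy as np

def simulate_arch1(T, omega, alpha1, seed=0):
    # eps_t = z_t * sqrt(h_t), h_t = omega + alpha1 * eps_{t-1}^2
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(T)
    eps = np.zeros(T)
    h = np.zeros(T)
    h[0] = omega / (1 - alpha1)        # start at the unconditional variance
    eps[0] = z[0] * np.sqrt(h[0])
    for t in range(1, T):
        h[t] = omega + alpha1 * eps[t - 1] ** 2
        eps[t] = z[t] * np.sqrt(h[t])
    return eps, h

eps, _ = simulate_arch1(100_000, omega=0.2, alpha1=0.5)
kurt = np.mean(eps**4) / np.mean(eps**2)**2    # sample kurtosis, > 3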
