Institute of Econometrics – Department of Applied Econometrics


Spatial Econometrics

Lecture 5: Single-source model of spatial regression. Combining GIS and regional analysis

Andrzej Torój


Outline

1 Linear model vs SAR/SLM (Spatial Lag)
   Linear model
   SAR (Spatial Lag, SLM)

2 SEM model (Spatial Error)
   SEM model with global error dependence
   SEM model with local error dependence

3 SLX model

4 Combining point GIS data with regional statistics
   Example: location of Biedronka markets
   Homework

Presentation plan

1 Linear model vs SAR/SLM (Spatial Lag)

2 SEM model (Spatial Error)

3 SLX model

4 Combining point GIS data with regional statistics


Linear regression model – specification

The well-known linear regression model:

y = Xβ + ε

Its parameters can be estimated by Ordinary Least Squares (OLS), which is unbiased, consistent and efficient under the classical assumptions.

This model is appropriate when spatial links in y are fully (implicitly) captured by the spatial autocorrelation of the regressors included in X (spatial clustering of X).


Flow of impacts in the linear model


Flow of impacts in SAR model


SAR model – relation to other models


SAR model – specification

Spatial autoregression with additional regressors.

y = ρWy + Xβ + ε

Without any explanatory variables X, the model reduces to the pure SAR model.

This model does not assume spatial clustering of the causes; instead, it assumes spatial interactions in the outcomes (global spatial spillovers).

Problem with OLS estimation: Wy is endogenous (as in pure SAR).
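To make the data generating process and the source of endogeneity concrete, here is a minimal base-R sketch. The ring-shaped neighbourhood (each of n units has two neighbours with weight 0.5), the sample size and the parameter values are hypothetical choices, not taken from the lecture.

```r
# Minimal sketch: simulate y = rho*W*y + X*beta + eps (SAR with regressors).
# Hypothetical setup: n units on a ring, each with two neighbours (weight 0.5),
# so W is row-standardised with a zero diagonal.
set.seed(1)
n <- 200
W <- matrix(0, n, n)
W[cbind(1:n, c(n, 1:(n - 1)))] <- 0.5   # left neighbour (wraps around)
W[cbind(1:n, c(2:n, 1))] <- 0.5         # right neighbour (wraps around)

rho  <- 0.5
beta <- c(1, 2)                          # intercept and slope (hypothetical)
X    <- cbind(1, rnorm(n))
eps  <- rnorm(n)

# Reduced form y = (I - rho W)^{-1} (X beta + eps); solve() avoids forming the inverse
y  <- drop(solve(diag(n) - rho * W, X %*% beta + eps))
Wy <- drop(W %*% y)                      # spatial lag: depends on eps through y

# The structural equation holds exactly, but Wy contains the eps of all units,
# which is why treating Wy as an ordinary regressor is problematic
max(abs(y - (rho * Wy + X %*% beta + eps)))
```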


Consequences of omitting the SAR spatial structure (1)

True data generating process: y = ρWy + Xβ + ε. Estimated linear model omitting Wy (method: OLS):

$$y = X\beta_{\text{OLS}} + \varepsilon$$

According to the general principles of econometrics, omitting a relevant variable biases the estimate of β; the bias converges to the product of:

the (true) parameter of the omitted variable,

the slope of the regression of the omitted variable on the included variables.

In our case:

$$\operatorname{plim}\hat{\beta}_{\text{OLS}} = \beta + \rho\,\frac{\operatorname{Cov}(Wy, X)}{\operatorname{Var}(X)}$$
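The omitted-variable formula can be checked by simulation. The base-R sketch below fits OLS without Wy to SAR data and compares the slope with the limit implied by the formula. The ring-shaped W, the iid regressor and ρ = 0.6 are hypothetical choices. For an iid x, Cov(WᵏX, X)/Var(X) = tr(Wᵏ)/N, so the limit simplifies to β·tr[(I − ρW)⁻¹]/N.

```r
# Sketch: OLS omitting Wy when the true DGP is SAR (hypothetical settings).
set.seed(2)
n <- 2000
W <- matrix(0, n, n)
W[cbind(1:n, c(n, 1:(n - 1)))] <- 0.5   # ring: left neighbour
W[cbind(1:n, c(2:n, 1))] <- 0.5         # ring: right neighbour
rho  <- 0.6
beta <- 1
x <- rnorm(n)                            # iid, so x itself is not spatially clustered

M <- solve(diag(n) - rho * W)            # spatial multiplier (I - rho W)^{-1}
y <- drop(M %*% (x * beta + rnorm(n)))   # SAR data generating process

b_ols  <- unname(coef(lm(y ~ x))["x"])   # linear model that omits Wy
b_plim <- beta * sum(diag(M)) / n        # beta plus the bias implied by the formula
c(ols = b_ols, plim = b_plim)
```

With this ring, b_plim is about 1.25, so the slope is overstated by roughly a quarter even though x is not spatially autocorrelated.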


Consequences of omitting the SAR spatial structure (2)

Can Cov (Wy, X) possibly be 0? If the true data generating process is SAR, then...

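$$y = (I - \rho W)^{-1}X\beta + (I - \rho W)^{-1}\varepsilon$$

$$y = X\beta + \rho WX\beta + \rho^2 W^2 X\beta + \ldots + \varepsilon + \rho W\varepsilon + \rho^2 W^2\varepsilon + \ldots$$

$$Wy = WX\beta + \rho W^2 X\beta + \rho^2 W^3 X\beta + \ldots + W\varepsilon + \rho W^2\varepsilon + \rho^2 W^3\varepsilon + \ldots$$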

Thus (skipping the components related to ε, which, as we know, are uncorrelated with X):

$$\operatorname{plim}\hat{\beta}_{\text{OLS}} - \beta = \rho\,\frac{\operatorname{Cov}(WX\beta, X)}{\operatorname{Var}(X)} + \rho\,\frac{\operatorname{Cov}(\rho W^2 X\beta, X)}{\operatorname{Var}(X)} + \rho\,\frac{\operatorname{Cov}(\rho^2 W^3 X\beta, X)}{\operatorname{Var}(X)} + \ldots = \frac{\rho}{\operatorname{Var}(X)}\left[\operatorname{Cov}(WX, X) + \rho\operatorname{Cov}(W^2X, X) + \rho^2\operatorname{Cov}(W^3X, X) + \ldots\right]\beta$$

Even if X is not spatially autocorrelated, so that Cov(WX, X) = 0, the further components are in general not zero.

W² and higher powers of W no longer have zero diagonal elements.

Interpretation: W² is the matrix of connections to the neighbours of the neighbours. But one of your neighbour's neighbours is you! (And you are always correlated with yourself.)
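A quick numerical check of the last point, using the three-country matrix (US, CA, MX) from the global vs local SEM example later in this lecture. This is a base-R sketch.

```r
# Sketch: W has a zero diagonal, but W^2 does not.
units <- c("US", "CA", "MX")
W <- matrix(c(0, 0.5, 0.5,
              1, 0,   0,
              1, 0,   0),
            nrow = 3, byrow = TRUE, dimnames = list(units, units))
W2 <- W %*% W
diag(W)    # zeros: no unit is its own neighbour
diag(W2)   # US = 1, CA = 0.5, MX = 0.5: each unit is a neighbour of its neighbours
# With an iid regressor, Cov(W^k X, X) / Var(X) = tr(W^k) / N, so the k = 2 term
# of the bias is driven by tr(W^2) > 0 even though tr(W) = 0
c(trW = sum(diag(W)), trW2 = sum(diag(W2)))
```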


Spatial OLS (1)

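If omitting the spatial lag makes the OLS estimator biased, we should include it.

This seems easy: if W is predetermined, one can construct the spatial lag variable Wy upfront and estimate the SAR model y = ρWy + Xβ + ε with OLS (this method is referred to as Spatial OLS):

$$y = \begin{bmatrix} Wy & X \end{bmatrix}\begin{bmatrix} \rho \\ \beta \end{bmatrix} + \varepsilon$$

From the properties of OLS, we know that:

$$E\begin{bmatrix} \hat{\rho} \\ \hat{\beta} \end{bmatrix} = \begin{bmatrix} \rho \\ \beta \end{bmatrix} + \left(\begin{bmatrix} Wy & X \end{bmatrix}^T\begin{bmatrix} Wy & X \end{bmatrix}\right)^{-1}E\left(\begin{bmatrix} Wy & X \end{bmatrix}^T\varepsilon\right)$$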


Spatial OLS (2)

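In the linear regression model, we assume that the error terms are independent of the regressors, i.e. $E\left(\begin{bmatrix} Wy & X \end{bmatrix}^T\varepsilon\right) = 0$, and this is what guarantees the unbiasedness of the OLS estimator in such a model. Here $E\left(X^T\varepsilon\right) = 0$ holds, but:

$$E\left[(Wy)^T\varepsilon\right] = E\left\{\left[W(I - \rho W)^{-1}X\beta + W(I - \rho W)^{-1}\varepsilon\right]^T\varepsilon\right\} = E\left\{\left[W(I - \rho W)^{-1}X\beta\right]^T\varepsilon + \left[W(I - \rho W)^{-1}\varepsilon\right]^T\varepsilon\right\} = E\left\{\varepsilon^T\left[W(I - \rho W)^{-1}\right]^T\varepsilon\right\} \neq 0$$

Our model is not the classical regression model, because the observations depend on one another ($y_i$ depends on its neighbour $y_j$ and vice versa).

The situation is similar to that of simultaneous equation models.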
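The last expectation can be evaluated explicitly: for ε with zero mean and covariance σ²I, E(εᵀAᵀε) = σ²·tr(A), where A = W(I − ρW)⁻¹. The base-R sketch below computes this trace for the three-country W used later in the lecture, with a hypothetical ρ = 0.4 and σ² = 1, and confirms it by Monte Carlo.

```r
# Sketch: E[(Wy)' eps] = sigma^2 * tr(W (I - rho W)^{-1}) > 0 (hypothetical rho, sigma^2).
set.seed(4)
W <- matrix(c(0, 0.5, 0.5,
              1, 0,   0,
              1, 0,   0), nrow = 3, byrow = TRUE)
rho    <- 0.4
sigma2 <- 1
A <- W %*% solve(diag(3) - rho * W)        # Wy = A X beta + A eps
analytic <- sigma2 * sum(diag(A))          # only the eps' A' eps part has non-zero mean

# Monte Carlo: draw many eps vectors (columns of E) and average eps' A' eps
R <- 50000
E <- matrix(rnorm(3 * R, sd = sqrt(sigma2)), nrow = 3)
quad <- colSums(E * (t(A) %*% E))          # eps' (A' eps) for each draw
c(analytic = analytic, monte_carlo = mean(quad))
```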


Spatial OLS (3)

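For simplicity, consider the SAR model with a single explanatory variable x:

$$E\begin{bmatrix} \hat{\rho} \\ \hat{\beta} \end{bmatrix} = \begin{bmatrix} \rho \\ \beta \end{bmatrix} + \underbrace{\frac{1}{\det\left(\begin{bmatrix} Wy & x \end{bmatrix}^T\begin{bmatrix} Wy & x \end{bmatrix}\right)}}_{\equiv\,\gamma\,>\,0}\begin{bmatrix} x^Tx & -(Wy)^Tx \\ -x^T(Wy) & (Wy)^T(Wy) \end{bmatrix}\begin{bmatrix} (Wy)^T\varepsilon \\ \underbrace{x^T\varepsilon}_{=0} \end{bmatrix} = \begin{bmatrix} \rho \\ \beta \end{bmatrix} + \begin{bmatrix} \overbrace{\gamma\,x^Tx\,(Wy)^T\varepsilon}^{\text{usually}\,>\,0} \\ \underbrace{-\gamma\,x^T(Wy)\,(Wy)^T\varepsilon}_{\text{usually}\,<\,0} \end{bmatrix}$$

So Spatial OLS delivers biased estimates! (ρ is usually biased upwards, β downwards.)

In the multivariate case, the bias is concentrated on the parameters of those variables in X whose spatial pattern most resembles the spatial pattern of y.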
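A single large simulated sample illustrates the upward bias of ρ̂ under Spatial OLS. The ring W, the sample size and the parameter values below are hypothetical. With these settings the bias does not shrink as N grows, because (Wy)ᵀε/N does not converge to zero.

```r
# Sketch: 'Spatial OLS' (Wy treated as an ordinary regressor) on SAR data.
set.seed(5)
n <- 2000
W <- matrix(0, n, n)
W[cbind(1:n, c(n, 1:(n - 1)))] <- 0.5   # ring: left neighbour
W[cbind(1:n, c(2:n, 1))] <- 0.5         # ring: right neighbour
rho  <- 0.5
beta <- 1
x   <- rnorm(n)
eps <- rnorm(n)
y   <- drop(solve(diag(n) - rho * W, x * beta + eps))
Wy  <- drop(W %*% y)

fit <- lm(y ~ 0 + Wy + x)                # no intercept, matching the DGP
coef(fit)                                # rho-hat noticeably above 0.5
# Source of the bias: the sample analogue of E[(Wy)' eps] / N is clearly positive
sum(Wy * eps) / n
```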


Spatial 2SLS (1)

The simultaneous equation bias in y = ρWy + Xβ + ε can be treated as in the case of endogenous regressors, i.e. with the instrumental variables method. This estimator is consistent (though, like other IV estimators, not unbiased in finite samples) and is referred to as spatial two-stage least squares (S2SLS).

A valid instrumental variable is correlated with the problematic regressor (Wy), but uncorrelated with the error term (ε). Recall that for the SAR model:

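$$Wy = \underbrace{WX\beta + \rho W^2 X\beta + \rho^2 W^3 X\beta + \ldots}_{\text{ideal instruments!}} + W\varepsilon + \rho W^2\varepsilon + \rho^2 W^3\varepsilon + \ldots$$

Step 1:

Linear regression (OLS) of Wy on the matrix of exogenous variables and a certain number of instruments: $\Pi = \begin{bmatrix} X & WX & W^2X & \ldots \end{bmatrix}$.

Fitted values:

$$\widehat{Wy} = \underbrace{\Pi\left(\Pi^T\Pi\right)^{-1}\Pi^T}_{P}\,Wy$$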


Spatial 2SLS (2)

Step 2:

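OLS estimation of the SAR model parameters after replacing Wy with $\widehat{Wy}$:

$$\begin{bmatrix} \hat{\rho} \\ \hat{\beta} \end{bmatrix} = \left(\begin{bmatrix} \widehat{Wy} & X \end{bmatrix}^T\begin{bmatrix} \widehat{Wy} & X \end{bmatrix}\right)^{-1}\begin{bmatrix} \widehat{Wy} & X \end{bmatrix}^T y$$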

Spatial 2SLS (S2SLS)

library(spatialreg)  # stsls(), lagsarlm(), errorsarlm(), GMerrorsar(); W must be a listw object
model <- stsls(y ~ x, listw = W)
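The two steps can also be written out by hand. The base-R sketch below uses X, WX and W²X as instruments. Only the non-constant column is lagged, because W1 = 1 for a row-standardised W would duplicate the intercept. The ring W and the parameter values are hypothetical. The sketch reproduces only the point estimates; the naive second-stage OLS standard errors would not be valid, which is one reason to rely on stsls() in practice.

```r
# Sketch: spatial two-stage least squares computed "by hand" (hypothetical data).
set.seed(6)
n <- 2000
W <- matrix(0, n, n)
W[cbind(1:n, c(n, 1:(n - 1)))] <- 0.5   # ring: left neighbour
W[cbind(1:n, c(2:n, 1))] <- 0.5         # ring: right neighbour
rho  <- 0.5
beta <- c(1, 2)
X  <- cbind(1, rnorm(n))
y  <- drop(solve(diag(n) - rho * W, X %*% beta + rnorm(n)))
Wy <- drop(W %*% y)

# Instruments Pi = [X, WX, W^2 X] (spatial lags of the non-constant column only)
WX  <- drop(W %*% X[, 2])
W2X <- drop(W %*% WX)
Pi  <- cbind(X, WX, W2X)

# Step 1: fitted values Wy_hat = Pi (Pi'Pi)^{-1} Pi' Wy
Wy_hat <- drop(Pi %*% solve(crossprod(Pi), crossprod(Pi, Wy)))

# Step 2: OLS of y on [Wy_hat, X]
Z <- cbind(Wy_hat, X)
theta <- drop(solve(crossprod(Z), crossprod(Z, y)))
names(theta) <- c("rho", "const", "x")
theta
```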


Spatial ML (1)

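Variant 2: maximum likelihood method

$$y = \overbrace{(I - \rho W)^{-1}}^{M}X\beta + \overbrace{(I - \rho W)^{-1}}^{M}u, \qquad u \sim N\left(0, \sigma^2 I\right)$$

$$L(u) = \left(\frac{1}{2\pi\sigma^2}\right)^{N/2}\exp\left(-\frac{u^Tu}{2\sigma^2}\right)$$

By the change of variables theorem (multivariate case):

$$L(y) = \left|\det\underbrace{\frac{\partial u}{\partial y^T}}_{M^{-1}}\right| L\left[u(y)\right]$$

$$L(y) = \left|\det M^{-1}\right|\left(\frac{1}{2\pi\sigma^2}\right)^{N/2}\exp\left(-\frac{(y - MX\beta)^T\left(M^{-1}\right)^T\left(M^{-1}\right)(y - MX\beta)}{2\sigma^2}\right)$$

$$\left(\hat{\rho}, \hat{\beta}, \hat{\sigma}^2\right) = \arg\max_{\rho,\,\beta,\,\sigma^2} L(y)$$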
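In practice the likelihood is usually maximised after concentrating out β and σ². For a fixed ρ, both have closed-form OLS solutions on (I − ρW)y, which leaves a one-dimensional search over ρ. The base-R sketch below does this with optimize(). The ring W and the parameters are hypothetical, and only point estimates are computed (no Hessian-based standard errors).

```r
# Sketch: concentrated log-likelihood of the SAR model (hypothetical data).
set.seed(7)
n <- 500
W <- matrix(0, n, n)
W[cbind(1:n, c(n, 1:(n - 1)))] <- 0.5   # ring: left neighbour
W[cbind(1:n, c(2:n, 1))] <- 0.5         # ring: right neighbour
X  <- cbind(1, rnorm(n))
y  <- drop(solve(diag(n) - 0.5 * W, X %*% c(1, 2) + rnorm(n)))   # true rho = 0.5
Wy <- drop(W %*% y)

loglik_sar <- function(rho) {
  Ay <- y - rho * Wy                               # M^{-1} y = (I - rho W) y
  b  <- solve(crossprod(X), crossprod(X, Ay))      # beta(rho)
  u  <- Ay - X %*% b                               # u = M^{-1} (y - M X beta)
  s2 <- sum(u^2) / n                               # sigma^2(rho)
  logdet <- as.numeric(determinant(diag(n) - rho * W, logarithm = TRUE)$modulus)
  logdet - n / 2 * log(2 * pi * s2) - n / 2        # log|det M^{-1}| + log L[u(y)]
}

opt <- optimize(loglik_sar, interval = c(-0.99, 0.99), maximum = TRUE)
rho_hat  <- opt$maximum
beta_hat <- drop(solve(crossprod(X), crossprod(X, y - rho_hat * Wy)))
c(rho = rho_hat, beta = beta_hat)
```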


Spatial ML (2)

Standard errors are computed from the Hessian matrix at the maximum of the likelihood function (as is typical for ML).

If M = I (i.e. ρ = 0), the likelihood function is identical to that of the linear model.

ML for the SAR model in R

model <- lagsarlm(y ~ x, listw = W)

Note that supplying additional regressors to the formula of spautolm (pure SAR) does not estimate this model: spautolm places the spatial process in the error term, so with regressors it estimates the SEM model instead.


Demand for computing power

The most burdensome operations:

matrix determinant: det M⁻¹

matrix inversion: M⁻¹
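One common way to reduce the cost of the determinant in repeated likelihood evaluations is to compute the eigenvalues ωᵢ of W once and then use log|det(I − ρW)| = Σ log|1 − ρωᵢ|. The base-R sketch below compares this with a direct determinant. It assumes a symmetric W so that the eigenvalues are real; the ring-shaped W with weights 0.5 used here is hypothetical.

```r
# Sketch: log-determinant via eigenvalues vs direct computation (hypothetical W).
n <- 300
W <- matrix(0, n, n)
W[cbind(1:n, c(n, 1:(n - 1)))] <- 0.5   # ring: left neighbour
W[cbind(1:n, c(2:n, 1))] <- 0.5         # ring: right neighbour (W is symmetric)

omega <- eigen(W, symmetric = TRUE, only.values = TRUE)$values   # computed once

logdet_eigen  <- function(rho) sum(log(abs(1 - rho * omega)))
logdet_direct <- function(rho)
  as.numeric(determinant(diag(n) - rho * W, logarithm = TRUE)$modulus)

# Same value, but each eigenvalue-based evaluation costs O(n) instead of O(n^3)
c(direct = logdet_direct(0.5), eigen = logdet_eigen(0.5))
```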

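I recommend using Microsoft R Open instead of standard R, since it ships with multi-threaded mathematical libraries. (Microsoft R Open has since been retired; linking standard R to a multi-threaded BLAS such as OpenBLAS gives a similar speed-up.)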

Test the following script on your computer with standard R and MS R Open:


Tests: linear model vs SAR (1)

This illustration demonstrates the univariate case (θ – scalar).


Tests: linear model vs SAR (2)

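$$LM_\rho = \frac{N^2}{\operatorname{tr}\left[\left(W^T + W\right)W\right] + N\,\dfrac{\left(WX\hat{\beta}\right)^T\left[I - X\left(X^TX\right)^{-1}X^T\right]\left(WX\hat{\beta}\right)}{\hat{\varepsilon}^T\hat{\varepsilon}}}\left(\frac{\hat{\varepsilon}^T Wy}{\hat{\varepsilon}^T\hat{\varepsilon}}\right)^2 \sim \chi^2(1)$$

H₀: linear model (ρ = 0); H₁: SAR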
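The statistic can be computed directly from the OLS residuals. The base-R sketch below follows the formula above term by term, using hypothetical data generated from a SAR process with ρ = 0.5 on a ring W.

```r
# Sketch: LM test of the linear model (rho = 0) against SAR, from OLS residuals.
set.seed(9)
n <- 500
W <- matrix(0, n, n)
W[cbind(1:n, c(n, 1:(n - 1)))] <- 0.5   # ring: left neighbour
W[cbind(1:n, c(2:n, 1))] <- 0.5         # ring: right neighbour
X <- cbind(1, rnorm(n))
y <- drop(solve(diag(n) - 0.5 * W, X %*% c(1, 2) + rnorm(n)))  # hypothetical SAR DGP

b  <- solve(crossprod(X), crossprod(X, y))     # OLS beta-hat
e  <- drop(y - X %*% b)                        # OLS residuals
ee <- sum(e^2)
Tr <- sum(diag((t(W) + W) %*% W))              # tr[(W' + W) W]

WXb  <- drop(W %*% X %*% b)                    # W X beta-hat
MWXb <- WXb - drop(X %*% solve(crossprod(X), crossprod(X, WXb)))  # [I - X(X'X)^{-1}X'] WXb
Q    <- sum(WXb * MWXb)                        # (WXb)' M (WXb), M idempotent

LM_rho  <- n^2 / (Tr + n * Q / ee) * (sum(e * (W %*% y)) / ee)^2
p_value <- pchisq(LM_rho, df = 1, lower.tail = FALSE)
c(LM_rho = LM_rho, p_value = p_value)
```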

Presentation plan

1 Linear model vs SAR/SLM (Spatial Lag)

2 SEM model (Spatial Error)

3 SLX model

4 Combining point GIS data with regional statistics


Flow of impacts in SEM model


SEM model – relation to other models


SEM model – specification

It is not the dependent variable, but the error term, that is subject to spatial autocorrelation – the difference is analogous to the difference between AR and MA models.

y = Xβ + ε, ε = λWε + u

In the absence of regressors X, the model would be equivalent to (pure) SAR.

Spatial clustering in unobservables (shocks).


SEM model – estimation (1)

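The OLS estimator is inefficient (and its standard errors are biased), because:

$$y = X\beta + \varepsilon, \qquad \varepsilon = \lambda W\varepsilon + u, \qquad \text{i.e.} \quad \varepsilon = (I - \lambda W)^{-1}u$$

$$\operatorname{Var}(\varepsilon) = E\left(\varepsilon\varepsilon^T\right) = (I - \lambda W)^{-1}E\left(uu^T\right)\left[(I - \lambda W)^{-1}\right]^T = \sigma^2(I - \lambda W)^{-1}\left[(I - \lambda W)^{-1}\right]^T \neq \sigma^2 I$$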

Variant 1: as usual with non-spherical errors, the solution is Generalised Least Squares (GLS) estimation:

$$\hat{\beta} = \left(X^T\Omega^{-1}X\right)^{-1}X^T\Omega^{-1}y \qquad \text{with given} \qquad \Omega = (I - \lambda W)^{-1}\left[(I - \lambda W)^{-1}\right]^T$$

W is known; λ is estimated from the residuals of the consistent OLS estimation (for details of the procedure, see Kelejian and Prucha, 1998; Arbia, 2014).

$$\operatorname{Var}\left(\hat{\beta}\right) = \hat{\sigma}^2\left(X^T\Omega^{-1}X\right)^{-1}$$
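For a given λ, GLS with this Ω is the same as OLS on spatially filtered data, because Ω⁻¹ = (I − λW)ᵀ(I − λW). The base-R sketch below checks this equivalence numerically. Here λ is treated as known (the GM step of Kelejian and Prucha that estimates it is not shown), and the ring W and parameters are hypothetical.

```r
# Sketch: GLS for SEM with known lambda vs OLS on filtered data (hypothetical data).
set.seed(10)
n <- 100
W <- matrix(0, n, n)
W[cbind(1:n, c(n, 1:(n - 1)))] <- 0.5   # ring: left neighbour
W[cbind(1:n, c(2:n, 1))] <- 0.5         # ring: right neighbour
lambda <- 0.4
A <- diag(n) - lambda * W                # eps = A^{-1} u
X <- cbind(1, rnorm(n))
y <- drop(X %*% c(1, 2) + solve(A, rnorm(n)))

Omega_inv <- crossprod(A)                # Omega^{-1} = A'A
XtOi  <- t(X) %*% Omega_inv
b_gls <- drop(solve(XtOi %*% X, XtOi %*% y))

# Same estimator: OLS of A y on A X ("spatially filtered" variables)
AX <- A %*% X
b_filtered <- drop(solve(crossprod(AX), crossprod(AX, A %*% y)))

# Var(beta-hat) = sigma2-hat (X' Omega^{-1} X)^{-1}
u_hat  <- A %*% (y - X %*% b_gls)
s2_hat <- sum(u_hat^2) / (n - ncol(X))
V_gls  <- s2_hat * solve(XtOi %*% X)
rbind(gls = b_gls, filtered = b_filtered)
```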


SEM model – estimation (2)

Spatial GLS in R

model4 <- GMerrorsar(y ~ x, listw = W)


SEM model – estimation (3)

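Variant 2: maximum likelihood method

$$y = X\beta + \overbrace{(I - \lambda W)^{-1}}^{M}u, \qquad u \sim N\left(0, \sigma^2 I\right)$$

$$L(u) = \left(\frac{1}{2\pi\sigma^2}\right)^{N/2}\exp\left(-\frac{u^Tu}{2\sigma^2}\right)$$

By the change of variables theorem (multivariate case):

$$L(y) = \left|\det\underbrace{\frac{\partial u}{\partial y^T}}_{M^{-1}}\right| L\left[u(y)\right]$$

$$L(y) = \left|\det M^{-1}\right|\left(\frac{1}{2\pi\sigma^2}\right)^{N/2}\exp\left(-\frac{(y - X\beta)^T\left(M^{-1}\right)^T\left(M^{-1}\right)(y - X\beta)}{2\sigma^2}\right)$$

$$\left(\hat{\lambda}, \hat{\beta}, \hat{\sigma}^2\right) = \arg\max_{\lambda,\,\beta,\,\sigma^2} L(y)$$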
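As for SAR, β and σ² can be concentrated out: for a fixed λ, β is the GLS estimate, i.e. OLS on the filtered data (I − λW)y and (I − λW)X. A base-R sketch with hypothetical data (ring W, λ = 0.5), computing point estimates only:

```r
# Sketch: concentrated log-likelihood of the (global) SEM model (hypothetical data).
set.seed(11)
n <- 500
W <- matrix(0, n, n)
W[cbind(1:n, c(n, 1:(n - 1)))] <- 0.5   # ring: left neighbour
W[cbind(1:n, c(2:n, 1))] <- 0.5         # ring: right neighbour
X <- cbind(1, rnorm(n))
y <- drop(X %*% c(1, 2) + solve(diag(n) - 0.5 * W, rnorm(n)))   # true lambda = 0.5

loglik_sem <- function(lambda) {
  A  <- diag(n) - lambda * W                       # M^{-1} = I - lambda W
  Ay <- A %*% y
  AX <- A %*% X
  b  <- solve(crossprod(AX), crossprod(AX, Ay))    # beta(lambda): GLS
  u  <- Ay - AX %*% b                              # u = M^{-1} (y - X beta)
  s2 <- sum(u^2) / n                               # sigma^2(lambda)
  logdet <- as.numeric(determinant(A, logarithm = TRUE)$modulus)
  logdet - n / 2 * log(2 * pi * s2) - n / 2
}

opt <- optimize(loglik_sem, interval = c(-0.99, 0.99), maximum = TRUE)
lambda_hat <- opt$maximum
lambda_hat
```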


SEM model – estimation (4)

Standard errors are computed from the Hessian matrix at the maximum of the likelihood function (as is typical for ML).

If M = I (i.e. λ = 0), the likelihood function is identical to that of the linear model.

ML for SEM model in R

model <- errorsarlm(y ~ x, listw = W)

The same model is also estimated if the formula in the function spautolm (pure SAR) is supplied with regressors.


Both SAR and SEM collapse to pure SAR without regressors X

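SAR:

$$y = \rho Wy + X\beta + \varepsilon \;\xrightarrow{\beta = 0}\; y = \rho Wy + \varepsilon \;\Leftrightarrow\; y - \rho Wy = \varepsilon \;\Leftrightarrow\; (I - \rho W)y = \varepsilon \;\Leftrightarrow\; y = (I - \rho W)^{-1}\varepsilon$$

SEM:

$$y = X\beta + (I - \lambda W)^{-1}u \;\xrightarrow{\beta = 0}\; y = (I - \lambda W)^{-1}u$$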


LM tests: linear model vs SEM

$$LM_\lambda = \frac{N^2}{\operatorname{tr}\left[\left(W^T + W\right)W\right]}\left(\frac{\hat{u}^T W\hat{u}}{\hat{u}^T\hat{u}}\right)^2 \sim \chi^2(1)$$

H₀: linear model (λ = 0); H₁: SEM


Robust LM tests (1)

In the LM tests for the SAR and SEM specifications (respectively):

1 H₀: linear model (ρ = 0), H₁: SAR

2 H₀: linear model (λ = 0), H₁: SEM

Problem: each pair of hypotheses ignores the alternative hypothesis of the other test.

Consequence: test 1 may reject H₀ even when its H₁ is false (but the H₁ of test 2 is true), and vice versa.

RLMlag and RLMerr

Anselin et al. (1996) propose robust test statistics LM*_ρ and LM*_λ, which by construction guard against the alternative hypothesis picking up the wrong spatial process (see Arbia, 2014):

$$LM_\rho^* = LM_{\rho\lambda} - LM_\lambda, \qquad LM_\lambda^* = LM_{\rho\lambda} - LM_\rho$$

where LM_ρλ is the joint test statistic for H₀: ρ = λ = 0.
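The sketch below computes LM_λ, LM_ρ and the robust versions from one set of OLS residuals, using the commonly cited closed forms of Anselin et al. (1996) for LM*_ρ and LM*_λ (these closed forms are not on the slide). It then checks the decomposition above numerically: LM*_ρ + LM_λ and LM*_λ + LM_ρ must both equal the joint statistic LM_ρλ. The data are hypothetical (ring W, SEM with λ = 0.5).

```r
# Sketch: LM_lambda, LM_rho and their robust versions (hypothetical SEM data).
set.seed(12)
n <- 500
W <- matrix(0, n, n)
W[cbind(1:n, c(n, 1:(n - 1)))] <- 0.5   # ring: left neighbour
W[cbind(1:n, c(2:n, 1))] <- 0.5         # ring: right neighbour
X <- cbind(1, rnorm(n))
y <- drop(X %*% c(1, 2) + solve(diag(n) - 0.5 * W, rnorm(n)))   # true model: SEM

b  <- solve(crossprod(X), crossprod(X, y))
e  <- drop(y - X %*% b)                          # OLS residuals (u-hat)
s2 <- sum(e^2) / n
Tr <- sum(diag((t(W) + W) %*% W))                # tr[(W' + W) W]

d_rho <- sum(e * (W %*% y)) / s2                 # score for rho
d_lam <- sum(e * (W %*% e)) / s2                 # score for lambda
WXb   <- drop(W %*% X %*% b)
MWXb  <- WXb - drop(X %*% solve(crossprod(X), crossprod(X, WXb)))
D     <- sum(WXb * MWXb) / s2 + Tr               # (WXb)' M (WXb) / s2 + tr[...]

LM_rho  <- d_rho^2 / D
LM_lam  <- d_lam^2 / Tr                          # = N^2 / tr[...] * (u'Wu / u'u)^2
RLM_rho <- (d_rho - d_lam)^2 / (D - Tr)
RLM_lam <- (d_lam - Tr / D * d_rho)^2 / (Tr * (1 - Tr / D))

c(LM_lam = LM_lam, LM_rho = LM_rho, RLM_lam = RLM_lam, RLM_rho = RLM_rho,
  joint_1 = RLM_rho + LM_lam, joint_2 = RLM_lam + LM_rho)
```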


Global vs local SEM model (1)

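The previously presented SEM model stipulated a global dependence between unobservables:

$$y = X\beta + \varepsilon, \qquad \varepsilon = \lambda W\varepsilon + u$$

The local SEM version:

$$y = X\beta + \varepsilon, \qquad \varepsilon = \lambda Wu + u$$

What is the difference? Consider the spatial multiplier matrices of y with respect to u in both cases:

local SEM: $y = X\beta + \varepsilon$, $\varepsilon = (I + \lambda W)u$, $M = \frac{\partial y}{\partial u^T} = I + \lambda W$

global SEM: $y = X\beta + \varepsilon$, $\varepsilon = (I - \lambda W)^{-1}u$, $M = \frac{\partial y}{\partial u^T} = (I - \lambda W)^{-1}$

Algebraically, note that:

$$\overbrace{(I - \lambda W)^{-1}}^{\text{multiplier, global SEM}} = \overbrace{I + \lambda W}^{\text{multiplier, local SEM}} + \lambda^2 W^2 + \lambda^3 W^3 + \ldots$$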


Global vs local SEM model (2)

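Example: Canada, USA, Mexico; λ = 0.4; a shock u = 1 occurs in Mexico. With rows and columns ordered US, CA, MX:

$$W = \begin{bmatrix} 0 & 0.5 & 0.5 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix}$$

Spatial multipliers for the local SEM:

$$\left(I + 0.4\begin{bmatrix} 0 & 0.5 & 0.5 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix}\right)\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0.2 & 0.2 \\ 0.4 & 1 & 0 \\ 0.4 & 0 & 1 \end{bmatrix}\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 0.2 \\ 0 \\ 1 \end{bmatrix}$$

Δy_MX = 1, Δy_US = 0.2, no effect for Canada.

The shock in u affects y only in the unit where it occurs and in the directly linked units.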


Global vs local SEM model (3)

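Spatial multipliers for the global SEM:

$$\left(I - 0.4\begin{bmatrix} 0 & 0.5 & 0.5 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix}\right)^{-1}\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & -0.2 & -0.2 \\ -0.4 & 1 & 0 \\ -0.4 & 0 & 1 \end{bmatrix}^{-1}\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \approx \begin{bmatrix} 1.19 & 0.24 & 0.24 \\ 0.48 & 1.10 & 0.10 \\ 0.48 & 0.10 & 1.10 \end{bmatrix}\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \approx \begin{bmatrix} 0.24 \\ 0.10 \\ 1.10 \end{bmatrix}$$

Δy_MX > 1, Δy_US > 0.2, and there is a (weak but positive) effect for Canada.

The impulse spills over to the linked units, then to their own linked units, etc. (including feedback into the region where the impulse originated).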
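Both multipliers can be reproduced in a few lines of base R. The last part of the sketch also checks numerically that the truncated power series I + λW + λ²W² + … approaches (I − λW)⁻¹, as stated in the algebraic note on global vs local SEM above.

```r
# Sketch: local vs global SEM multipliers for the US/CA/MX example.
units <- c("US", "CA", "MX")
W <- matrix(c(0, 0.5, 0.5,
              1, 0,   0,
              1, 0,   0),
            nrow = 3, byrow = TRUE, dimnames = list(units, units))
lambda <- 0.4
u <- c(0, 0, 1)                               # unit shock in Mexico

M_local  <- diag(3) + lambda * W              # I + lambda W
M_global <- solve(diag(3) - lambda * W)       # (I - lambda W)^{-1}
dy_local  <- drop(M_local %*% u)
dy_global <- drop(M_global %*% u)

# Truncated power series I + lambda W + ... + lambda^50 W^50
M_series <- diag(3)
term <- diag(3)
for (k in 1:50) {
  term <- lambda * (term %*% W)
  M_series <- M_series + term
}
# local: about (0.2, 0, 1); global: about (0.24, 0.10, 1.10)
round(rbind(local = dy_local, global = dy_global), 2)
```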

Presentation plan

1 Linear model vs SAR/SLM (Spatial Lag)

2 SEM model (Spatial Error)

3 SLX model

4 Combining point GIS data with regional statistics


Flow of impacts in the SLX model


SLX model – relation to other models


SLX model – specification

Direct impact of causes in the neighbourhood on the outcome in the observed region (spatial spillovers):

y = Xβ + WXθ + ε

OLS estimation is unbiased, consistent and efficient (under the classical assumptions, since WX is as exogenous as X).
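Because WX is built from the exogenous X before estimation, SLX needs nothing beyond lm(). A base-R sketch on hypothetical data (ring W, β = 2, θ = 1.5):

```r
# Sketch: SLX estimated by ordinary OLS (hypothetical data).
set.seed(14)
n <- 1000
W <- matrix(0, n, n)
W[cbind(1:n, c(n, 1:(n - 1)))] <- 0.5   # ring: left neighbour
W[cbind(1:n, c(2:n, 1))] <- 0.5         # ring: right neighbour
x  <- rnorm(n)
Wx <- drop(W %*% x)                      # spatial lag of the regressor
y  <- 1 + 2 * x + 1.5 * Wx + rnorm(n)    # hypothetical beta = 2, theta = 1.5
fit <- lm(y ~ x + Wx)                    # plain OLS: Wx is exogenous because x is
coef(fit)
```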


Presentation plan

1 Linear model vs SAR/SLM (Spatial Lag)

2 SEM model (Spatial Error)

3 SLX model

4 Combining point GIS data with regional statistics


GIS data on Biedronka markets

Source: poiplaza.com

POI: points of interest (usually published for users of car GPS navigation devices)

Point data on the locations of individual markets in Poland.

Longitude and latitude.

Question: what regional criteria do the managers / owners of Biedronka use when locating their markets?

In other words, is there a relationship between the number of Biedronka markets (per capita) and local socio-economic characteristics from the Local Data Bank, e.g. at the level of poviats (counties)?
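Relating point data to poviat statistics requires counting the markets that fall inside each poviat polygon. GIS packages do this directly (for example with spatial joins in the sf package). The base-R sketch below only illustrates the idea, using a ray-casting point-in-polygon test on two hypothetical square regions with made-up coordinates and populations.

```r
# Ray casting: a point is inside a polygon if a horizontal ray to its right
# crosses the polygon boundary an odd number of times.
point_in_polygon <- function(px, py, vx, vy) {
  inside <- FALSE
  j <- length(vx)
  for (i in seq_along(vx)) {
    crosses <- ((vy[i] > py) != (vy[j] > py)) &&
      (px < (vx[j] - vx[i]) * (py - vy[i]) / (vy[j] - vy[i]) + vx[i])
    if (crosses) inside <- !inside
    j <- i
  }
  inside
}

# Hypothetical regions (unit squares) and market locations (lon, lat)
regions <- list(
  A = list(x = c(0, 1, 1, 0), y = c(0, 0, 1, 1)),
  B = list(x = c(1, 2, 2, 1), y = c(0, 0, 1, 1))
)
shops <- data.frame(lon = c(0.2, 0.8, 1.5, 3), lat = c(0.5, 0.1, 0.5, 3))

counts <- sapply(regions, function(r)
  sum(mapply(point_in_polygon, shops$lon, shops$lat,
             MoreArgs = list(vx = r$x, vy = r$y))))
population <- c(A = 20000, B = 10000)        # hypothetical residents
per_10k <- counts / population * 1e4         # markets per 10 thousand residents
counts
per_10k
```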


Aggregation of points on the predefined map


Function classIntervals: watch out for the style parameter. So far, we have been using style = "quantile" (classes with equal numbers of observations) for presentation purposes.

This may not make much sense for a count variable: there are many "ties" around the class limits, so colours are allocated more or less accidentally.
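The problem is easy to see on a small, made-up vector of counts. Quantile-based breaks coincide, and cut() then rejects them as non-unique. Breaks placed between integers, by contrast, assign every count to exactly one class. In the classInt package, style = "fixed" with user-supplied breaks is the analogous choice; the sketch below uses base R only.

```r
counts <- c(0, 0, 0, 0, 1, 1, 1, 2, 2, 5)      # hypothetical markets per poviat

q_breaks <- quantile(counts, probs = seq(0, 1, by = 0.2), names = FALSE)
q_breaks                      # the first two breaks are both 0
any(duplicated(q_breaks))     # TRUE: cut(counts, q_breaks) would fail

# Breaks halfway between integers: each count value lands in exactly one class
int_breaks <- c(-0.5, 0.5, 1.5, 2.5, 5.5)
table(cut(counts, breaks = int_breaks))
```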


Estimates of 3 models

Fit the following regression models of the number of Biedronka markets per 10 thousand residents on labour market statistics (unemployment, wages):

linear, SLM, SEM, SLX

Compare the models with respect to the significance of the variables, the AIC criterion, the maximised log-likelihood and the presence of remaining spatial autocorrelation. Which model is the best?
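For the last criterion, Moran's I of the residuals, I = (N/S₀)·eᵀWe/eᵀe with S₀ the sum of all weights, can be computed directly (spdep's lm.morantest() adds the inference). The base-R sketch below uses hypothetical SAR data on a ring and compares the linear model with SLX. It also shows AIC(), which works for lm fits.

```r
# Sketch: Moran's I of regression residuals and AIC (hypothetical data).
morans_I <- function(e, W) {
  e <- e - mean(e)
  length(e) / sum(W) * sum(e * (W %*% e)) / sum(e^2)
}

set.seed(17)
n <- 500
W <- matrix(0, n, n)
W[cbind(1:n, c(n, 1:(n - 1)))] <- 0.5   # ring: left neighbour
W[cbind(1:n, c(2:n, 1))] <- 0.5         # ring: right neighbour
x  <- rnorm(n)
Wx <- drop(W %*% x)
y  <- drop(solve(diag(n) - 0.6 * W, 1 + 2 * x + rnorm(n)))   # hypothetical SAR DGP

fit_lin <- lm(y ~ x)
fit_slx <- lm(y ~ x + Wx)
c(I_linear = morans_I(resid(fit_lin), W), I_slx = morans_I(resid(fit_slx), W))
AIC(fit_lin, fit_slx)
```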


Exercise

Derive the likelihood function for the SEM model with local error dependence.

Write R code to estimate such a model.
