Spatial Econometrics
Lecture 11: Spatial models of binary variables
Andrzej Torój
Institute of Econometrics – Department of Applied Econometrics
1 Binary variable models: what’s special about spatial
  - Nonspatial models of binary variable
  - Binary variable models: spatial version
2 Estimation of parameters for Probit-SAR model
  - Numerical evaluation of the likelihood function
  - RIS
3 Interpretation of coefficients
Observable dependent variable with binary outcomes: y_i ∈ {0; 1}, i = 1, ..., N.
We assume that the occurrence of 0s and 1s is determined by an unobservable continuous variable y_i* – the propensity of the i-th observation to take the value of 1, materialising itself above the threshold of 0 (this value comes without any loss of generality, as the propensity depends on an estimated constant). The propensity depends on a systematic component (x_i β – factors increasing the probability of 1s) and a random component (ε_i):

\[ y_i^* = x_i \beta + \varepsilon_i, \qquad y_i = \begin{cases} 1 & \text{for } y_i^* > 0 \\ 0 & \text{for } y_i^* \le 0 \end{cases} \]

In general (for both the spatial and nonspatial case): the likelihood function is an N-dimensional integral of the N-dimensional joint density of y*:

\[ L(\beta) = P(y_1, y_2, \ldots, y_N) = \underbrace{\int_{-\infty}^{0} \int_{-\infty}^{0} \cdots}_{y_i=0} \; \underbrace{\int_{0}^{\infty} \int_{0}^{\infty} \cdots}_{y_i=1} f_N(\mathbf{y}^* \mid \mathbf{X}\beta)\, dy_N^* \cdots dy_1^* \]
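A minimal numerical illustration of the latent-variable mechanism above, assuming standard-normal errors (the probit case) and hypothetical coefficient values:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
N = 200_000
beta = np.array([0.5, 1.0])  # hypothetical: constant + one regressor
x = np.column_stack([np.ones(N), rng.normal(size=N)])

# latent propensity y* = x beta + eps, observed y = 1(y* > 0)
y_star = x @ beta + rng.normal(size=N)
y = (y_star > 0).astype(int)

# the empirical share of 1s matches the average of P(y_i = 1 | x_i) = Phi(x_i beta)
p_hat = y.mean()
p_theory = norm.cdf(x @ beta).mean()
print(p_hat, p_theory)
```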
Under independent observations and the Bernoulli distribution, we can factorize the integral into a product of one-dimensional integrals and write the likelihood function as:

\[ L(\beta) = \prod_{i:\,y_i=0} P(y_i^* \le 0) \cdot \prod_{i:\,y_i=1} \left[1 - P(y_i^* \le 0)\right] = \prod_{i:\,y_i=0} P(\varepsilon_i \le -x_i\beta) \cdot \prod_{i:\,y_i=1} \left[1 - P(\varepsilon_i \le -x_i\beta)\right] = \]
\[ = \prod_{i=1}^{N} \left[F(-x_i\beta)\right]^{1-y_i} \left[1 - F(-x_i\beta)\right]^{y_i} = \prod_{i=1}^{N} \left[\int_{-\infty}^{-x_i\beta} f(\varepsilon_i)\, d\varepsilon_i\right]^{1-y_i} \left[1 - \int_{-\infty}^{-x_i\beta} f(\varepsilon_i)\, d\varepsilon_i\right]^{y_i} \]

Further derivation depends on the choice of the density function f(ε_i) and the integration technique. Two common options (both easily subject to an analytical treatment):
- Logistic density – logit model.
- Normal density – probit model.
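The factorized probit likelihood can be coded directly. A sketch with simulated data; the function and variable names are illustrative, and `logcdf` is used for numerical stability:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def probit_negloglik(beta, y, X):
    """Negative log-likelihood of the nonspatial probit:
    -sum_i [ (1-y_i) log F(-x_i b) + y_i log(1 - F(-x_i b)) ]."""
    xb = X @ beta
    # log F(-x_i b) = norm.logcdf(-xb); log[1 - F(-x_i b)] = norm.logcdf(xb)
    return -np.sum(y * norm.logcdf(xb) + (1 - y) * norm.logcdf(-xb))

# simulated check against known coefficients
rng = np.random.default_rng(1)
N = 5000
X = np.column_stack([np.ones(N), rng.normal(size=N)])
beta_true = np.array([0.3, -0.8])
y = ((X @ beta_true + rng.normal(size=N)) > 0).astype(int)

res = minimize(probit_negloglik, x0=np.zeros(2), args=(y, X), method="BFGS")
print(res.x)  # close to beta_true
```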
Firstly, to be decided at the specification level: interdependence within the neighbourhood for the binary observable or for the unobservable propensity?

\[ y_1^* = \beta_1 y_2 + \beta_2 x_1 + \varepsilon_1 \quad \text{or} \quad y_1^* = \beta_1 y_2^* + \beta_2 x_1 + \varepsilon_1 \]

Logical problem (with the first specification):
\[ y_2 = 1 \;\xrightarrow{\beta_1}\; y_1^* \uparrow \;\rightarrow\; P(y_1 = 1) \uparrow \;\xrightarrow{\beta_1}\; y_2^* \uparrow \]

That is: from the fact that event 2 occurred, it could be inferred that its own probability should increase (?!).
For this reason, we normally assume spatial interdependence at the level of the unobservable variable y_j*.
Secondly, spatial interdependence implies heteroskedasticity and spatial autocorrelation – e.g. for SAR:

\[ \mathbf{y}^* = \rho \mathbf{W} \mathbf{y}^* + \mathbf{X}\beta + \varepsilon, \qquad y_i = \begin{cases} 1 & \text{for } y_i^* > 0 \\ 0 & \text{for } y_i^* \le 0 \end{cases} \]
\[ \mathbf{y}^* = (\mathbf{I} - \rho\mathbf{W})^{-1}\mathbf{X}\beta + \underbrace{(\mathbf{I} - \rho\mathbf{W})^{-1}\varepsilon}_{\upsilon} \]
\[ \mathrm{Var}(\upsilon) = (\mathbf{I} - \rho\mathbf{W})^{-1}\, E(\varepsilon\varepsilon')\, \left[(\mathbf{I} - \rho\mathbf{W})^{-1}\right]' = \sigma_\varepsilon^2 \left[(\mathbf{I} - \rho\mathbf{W})'(\mathbf{I} - \rho\mathbf{W})\right]^{-1} \]

Heteroskedasticity is due to a varying degree of network connectivity from individual to individual.
Likewise for SEM.
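The implied covariance of υ can be computed explicitly for a small, hypothetical row-standardised W, making the heteroskedasticity visible on the diagonal:

```python
import numpy as np

# hypothetical 4-unit chain neighbourhood, row-standardised
W = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
W /= W.sum(axis=1, keepdims=True)

rho, sigma2 = 0.6, 1.0
A = np.eye(4) - rho * W

# Var(upsilon) = sigma2 * [(I - rho W)' (I - rho W)]^{-1}
Sigma = sigma2 * np.linalg.inv(A.T @ A)

# diagonal entries differ across units (heteroskedasticity from connectivity),
# off-diagonal entries are nonzero (spatial autocorrelation)
print(np.diag(Sigma))
```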
Thirdly, the likelihood function is a multidimensional integral:

\[ L(\beta, \rho, \sigma_\varepsilon^2) = P(y_1, y_2, \ldots, y_N \mid \beta, \rho, \sigma_\varepsilon^2) = \underbrace{\int_{-\infty}^{0}\int_{-\infty}^{0}\cdots}_{y_i=0}\; \underbrace{\int_{0}^{\infty}\int_{0}^{\infty}\cdots}_{y_i=1} f_N(\mathbf{y}^* \mid \mathbf{X}\beta)\, dy_N^* \cdots dy_1^* \]

In the absence of independent observations, further analytical simplifications are impossible.
Numerical difficulties consist in:
- multidimensionality
- a sometimes unknown function f_N (but for SAR-probit it is known: MVN)
- the truncated MVN distribution (individual dimensions: below or above zero)
1 Binary variable models: what’s special about spatial
2 Estimation of parameters for Probit-SAR model
3 Interpretation of coefficients
The above difficulties boil down to a single evaluation of the likelihood function value for given parameter values (β, ρ and σ²_ε) and data (y, X, W).
Besides, the standard scheme applies:
1. Parameter starting values: β^(0), ρ^(0) and σ²_ε^(0).
2. Evaluation of L(β^(0), ρ^(0), σ²_ε^(0)).
3. Iterative update of the parameters β^(i), ρ^(i) and σ²_ε^(i) within the selected maximization algorithm for L...
4. ...until convergence of L to the maximum.
Setting the direction of the parameter change (point 3) also requires evaluations of L (e.g. for numerical gradient evaluation).
\[ L(\beta, \rho, \sigma_\varepsilon^2) = P(y_1, y_2, \ldots, y_N \mid \beta, \rho, \sigma_\varepsilon^2) = \underbrace{\int_{-\infty}^{0}\int_{-\infty}^{0}\cdots}_{y_i=0}\; \underbrace{\int_{0}^{\infty}\int_{0}^{\infty}\cdots}_{y_i=1} f_N(\mathbf{y}^* \mid \mathbf{X}\beta)\, dy_N^* \cdots dy_1^* \]

Since we cannot integrate analytically, we shall use numerical methods.
The proposed estimation method is Maximum Simulated Likelihood (MSL), based on R simulation draws.
For R growing more quickly than √N: consistent and efficient estimation (Train, 2009 – a free e-book about MSL).
Since we integrate only over a truncated part of the domain, let us transform the problem:

\[ L(\beta,\rho,\sigma_\varepsilon^2) = \underbrace{\int_{-\infty}^{0}\int_{-\infty}^{0}\cdots}_{y_i=0}\;\underbrace{\int_{0}^{\infty}\int_{0}^{\infty}\cdots}_{y_i=1} f_N(\mathbf{y}^*\mid\mathbf{X}\beta)\, dy_N^* \cdots dy_1^* = \]
\[ = \underbrace{\int_{-\infty}^{\infty} I_0(y_i)\int_{-\infty}^{\infty} I_0(y_i)\cdots}_{y_i=0}\;\underbrace{\int_{-\infty}^{\infty} I_1(y_i)\int_{-\infty}^{\infty} I_1(y_i)\cdots}_{y_i=1} f_N(\mathbf{y}^*\mid\mathbf{X}\beta)\, dy_N^* \cdots dy_1^* = \]
\[ = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} I_{01}(\mathbf{y}^*)\, f_N(\mathbf{y}^*\mid\mathbf{X}\beta)\, dy_N^* \cdots dy_1^* \]

where \( I_{01}(\mathbf{y}^*) = \prod_{i:\,y_i=0} I_{<0}(y_i^*) \cdot \prod_{i:\,y_i=1} I_{>0}(y_i^*) \) (i.e. 1 when the multivariate draw y* exactly reflects the set of 0s and 1s in the sample, and 0 otherwise).
The indicator function I_01(y*) will be named the importance function, and the method – importance sampling.
The method is frequently applied in Bayesian econometrics, when we cannot draw from a given distribution, but we can draw from a different, approximate one.
Typical application: drawing from a truncated N / MVN / t / MVt distribution using its non-truncated counterpart.
In practice, the method boils down to the rejection of the draws located in the truncated parts of the domain.
For students/graduates of Bayesian Econometrics:
- Importance sampling e.g. with prior distributions only indicating the sign of the parameters.
- See the lecture materials (in Polish).
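The rejection idea can be sketched for a univariate truncated normal: draw from the non-truncated counterpart and discard draws in the cut-off part (an illustrative sketch; the function name is hypothetical):

```python
import numpy as np

def truncated_normal_rejection(mu, sigma, lower, n, rng):
    """Sample n values from N(mu, sigma^2) truncated to (lower, inf)
    by rejecting draws that fall below the threshold."""
    out = np.empty(0)
    while out.size < n:
        draws = rng.normal(mu, sigma, size=2 * n)
        out = np.concatenate([out, draws[draws > lower]])
    return out[:n]

rng = np.random.default_rng(2)
sample = truncated_normal_rejection(mu=0.0, sigma=1.0, lower=0.0, n=100_000, rng=rng)
# all draws above 0; mean close to E[N(0,1) | > 0] = sqrt(2/pi)
print(sample.min(), sample.mean())
```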
The density f_N is N-dimensional (in the case of probit – MVN):

\[ L(\beta,\rho,\sigma_\varepsilon^2) = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} I_{01}(\mathbf{y}^*)\, f_N(\mathbf{y}^*\mid\mathbf{X}\beta)\, dy_N^* \cdots dy_1^* = \]
\[ = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} I_{01}(\mathbf{y}^*)\, f_N\!\left[(\mathbf{I}-\rho\mathbf{W})^{-1}\mathbf{X}\beta + \upsilon\right] d\upsilon_N \cdots d\upsilon_1 \]
\[ \upsilon \sim \mathrm{MVN}\!\left(\mathbf{0},\; \sigma_\varepsilon^2\left[(\mathbf{I}-\rho\mathbf{W})'(\mathbf{I}-\rho\mathbf{W})\right]^{-1}\right) \equiv \mathrm{MVN}(\mathbf{0}, \Sigma_\varepsilon) \]
\[ \mathbf{y}^* \sim \mathrm{MVN}\!\left[(\mathbf{I}-\rho\mathbf{W})^{-1}\mathbf{X}\beta,\; \Sigma_\varepsilon\right] \]
Solution (for the r-th draw, r = 1, ..., R):
1. Draw independently \( \tilde{\upsilon}_i^{(r)} \sim N(0, \sigma_i^2) \) for i = 1, ..., N, where σ_i² is the i-th diagonal element of the matrix Σ_ε.
2. The Cholesky decomposition Σ_ε = VV' allows us to write \( \upsilon^{(r)} = \mathbf{V} \cdot \tilde{\upsilon}^{(r)} \). Matrix V is upper triangular, which means:
   1. an independent draw of the N-th (last) element of υ^(r),
   2. a draw of element N − 1 for a given draw of element N and a given (by the matrix Σ_ε) correlation of the last one with the last-but-one,
   3. a draw of the last-but-two conditional upon the two last ones, etc.
3. Shift of the mean: \( \mathbf{y}^{*(r)} = \upsilon^{(r)} + (\mathbf{I} - \rho\mathbf{W})^{-1}\mathbf{X}\beta \).
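The Cholesky-based draw of correlated y*^(r) can be sketched with numpy. The covariance matrix and mean are hypothetical; note that numpy returns the lower-triangular factor (the upper-triangular variant only reverses the order of the recursion), and standard-normal innovations are used in the usual Σ = VV′ construction:

```python
import numpy as np

rng = np.random.default_rng(3)

# hypothetical target covariance Sigma_eps and mean m = (I - rho W)^{-1} X beta
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.2, 0.4],
                  [0.2, 0.4, 0.9]])
m = np.array([0.3, -0.1, 0.5])

V = np.linalg.cholesky(Sigma)   # lower-triangular V with Sigma = V V'
R = 200_000
u = rng.normal(size=(3, R))     # independent N(0, 1) innovations
y_star = (V @ u).T + m          # correlated draws, shifted by the mean

# the sample covariance of the draws recovers Sigma
print(np.cov(y_star.T))
```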
For each r-th sample, r = 1, ..., R, we have y*^(r). The evaluation of:

\[ L(\beta,\rho,\sigma_\varepsilon^2) = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} I_{01}(\mathbf{y}^*)\, f_N(\mathbf{y}^*\mid\mathbf{X}\beta)\, dy_N^* \cdots dy_1^* \]

resembles the computation of the expected value of I_01(y*) over the density f_N(y* | Xβ), that is:
1. for each draw y*^(r), evaluate I_01(y*^(r)) as 0 or 1
2. compute the mean of the obtained sequence of 0s and 1s
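This indicator-mean evaluation can be sketched for a small hypothetical case. Independent latent variables are assumed here only so that an exact value is available for comparison; the pattern y and the means are illustrative:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)

# hypothetical: 3 independent latent variables with means mu, unit variance,
# and observed binary pattern y = (1, 0, 1)
mu = np.array([0.5, -0.4, 0.8])
y = np.array([1, 0, 1])

R = 100_000
y_star = mu + rng.normal(size=(R, 3))

# I_01: the draw reproduces EXACTLY the sample's pattern of 0s and 1s
indicator = np.all((y_star > 0) == (y == 1), axis=1)
L_hat = indicator.mean()

# exact value in the independent case: P(y1*>0) * P(y2*<=0) * P(y3*>0)
L_exact = norm.cdf(0.5) * norm.cdf(0.4) * norm.cdf(0.8)
print(L_hat, L_exact)
```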
The above procedure is correct for R → ∞.
I_01(y*) – the multivariate indicator function – rarely takes the value of 1 (the drawn vector y* would have to imply EXACTLY the same sequence of 0s and 1s as in the sample).
Extremely inefficient numerically, especially since we do not know even approximate values of β.
Sometimes referred to as the brute force method (cf. Lerman and Manski, 1981).
Solution: Recursive Importance Sampling (RIS).
Transform the initial problem:

\[ L(\beta,\rho,\sigma_\varepsilon^2) = \underbrace{\int_{-\infty}^{0}\int_{-\infty}^{0}\cdots}_{y_i=0}\;\underbrace{\int_{0}^{\infty}\int_{0}^{\infty}\cdots}_{y_i=1} f_N\!\left[(\mathbf{I}-\rho\mathbf{W})^{-1}\mathbf{X}\beta+\upsilon\right] d\upsilon_N \cdots d\upsilon_1 = \]
\[ = \int_{-\infty}^{\mathbf{0}} f_N\!\left[\mathbf{Q}\left((\mathbf{I}-\rho\mathbf{W})^{-1}\mathbf{X}\beta+\upsilon\right)\right] d\upsilon = P\!\left[\mathbf{Q}\left((\mathbf{I}-\rho\mathbf{W})^{-1}\mathbf{X}\beta+\upsilon\right) \le \mathbf{0}\right] = \]
\[ = P\Big[\upsilon \le \underbrace{-\mathbf{Q}\,(\mathbf{I}-\rho\mathbf{W})^{-1}\mathbf{X}\beta}_{\equiv\,\mu}\Big] \]

where \( \mathbf{Q}(\mathbf{y}) = \mathrm{diag}(1-2y_1, \ldots, 1-2y_N) \) (this notation serves the purpose of setting the diagonal elements as {−1; 1} and transforming all the inequalities to the form ≤ by using the symmetry of the normal distribution).
\[ L(\beta,\rho,\sigma_\varepsilon^2) = P[\upsilon \le \mu], \qquad \upsilon \sim \mathrm{MVN}(\mathbf{0},\Sigma_\varepsilon) = \mathrm{MVN}(\mathbf{0},\mathbf{V}\mathbf{V}') \]

Since \( \upsilon^{(r)} = \mathbf{V}\cdot\tilde{\upsilon}^{(r)} \), then:

\[ P\!\left[\upsilon^{(r)} \le \mu\right] = P\!\left[\mathbf{V}\cdot\tilde{\upsilon}^{(r)} \le \mu\right] = P\!\left[\tilde{\upsilon}^{(r)} \le \mathbf{V}^{-1}\mu\right] \equiv P\!\left[\tilde{\upsilon}^{(r)} \le \tilde{\mu}\right] \]

The method is called recursive because of the triangularity of the V matrix.
We can exploit the independence between the individual dimensions of \( \tilde{\upsilon}^{(r)} \) to write:

\[ L^{(r)}(\beta,\rho,\sigma_\varepsilon^2) = P\!\left[\tilde{\upsilon}^{(r)} \le \tilde{\mu}\right] = \prod_{i=1}^{N} P\!\left[\tilde{\upsilon}_i^{(r)} \le \tilde{\mu}_i\right] = \prod_{i=1}^{N} \Phi\!\left(\frac{\tilde{\mu}_i}{\sigma_i}\right) \]

where Φ(·) – the standard normal distribution function.
Typically for importance sampling: \( L(\beta,\rho,\sigma_\varepsilon^2) = \frac{1}{R}\sum_{r=1}^{R} L^{(r)}(\beta,\rho,\sigma_\varepsilon^2) \).
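The per-draw factor Π Φ(µ̃_i) can be sketched as follows. This is a simplified sketch of the transformed-bounds product only (the full RIS/GHK recursion additionally conditions each dimension on previously drawn truncated values); standard-normal innovations are used, so the σ_i scaling is absorbed into V. In the diagonal case the formula is exact, which the example exploits:

```python
import numpy as np
from scipy.stats import norm

def ris_factor(mu, Sigma):
    """Evaluate P[upsilon <= mu] for upsilon ~ MVN(0, Sigma) via the
    transformed bounds mu_tilde = V^{-1} mu (Sigma = V V') and the
    product of univariate normal CDFs (simplified sketch)."""
    V = np.linalg.cholesky(Sigma)       # lower-triangular Cholesky factor
    mu_tilde = np.linalg.solve(V, mu)   # V^{-1} mu
    return np.prod(norm.cdf(mu_tilde))

# diagonal Sigma: the product of CDFs is the exact orthant probability
mu = np.array([0.5, -0.2, 1.0])
Sigma = np.diag([1.0, 4.0, 0.25])
L = ris_factor(mu, Sigma)
exact = norm.cdf(0.5 / 1.0) * norm.cdf(-0.2 / 2.0) * norm.cdf(1.0 / 0.5)
print(L, exact)
```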
MSL with RIS: McMillen (1992), spprobitml {McSpatial}
GMM variant: Klier, McMillen (2008), gmmprobit {McSpatial}
Due to a high level of complication, some authors propose using Bayesian methods: LeSage and Pace (2009), sarprobit {spatialprobit}
For students/graduates of Bayesian econometrics:
- Prior distribution: non-informative normal-gamma-uniform (normal-gamma for β and 1/σ²_ε, and uniform for ρ).
- Posterior sampling method – Metropolis-within-Gibbs. Conditional posterior distributions:
  1. P(σ²_ε | ρ, β) = P(σ²_ε) ∼ InvGamma
  2. P(β | ρ, σ²_ε) ∼ N (known parameters)
  3. P(ρ | β, σ²_ε) ∼ ? (evaluation by the Metropolis–Hastings algorithm)
1 Binary variable models: what’s special about spatial
2 Estimation of parameters for Probit-SAR model
3 Interpretation of coefficients
Recall that in non-spatial probit models: \( \frac{\partial P(y_i=1)}{\partial x_{i,k}} = f(\beta, x_i) \).
Marginal effects depend not only on the coefficients, but also on the level of the independent variable (for which we compute the effects) and the levels of all the other independent variables for a given unit.
In spatial probit models, it additionally holds that \( \frac{\partial P(y_i=1)}{\partial x_{i,k}} = f(\beta, \rho, \mathbf{W}, \mathbf{X}) \).
Apart from all the abovementioned factors, as well as the spatial parameters and weights, the effects for a given unit depend on the levels of all explanatory variables for all the units.
Andrzej Torój Institute of Econometrics – Department of Applied Econometrics
Recall that in non-spatial probit models: ∂P(y ∂x
i=1)
i ,k
= f (β, x i ).
Marginal effects depend not only on the coefficients, but on the level of the independent variable (for which we compute the effects) and the levels of all the other independent variables for a given unit.
In spatial probit models, it holds additionally that
∂P(y
i=1)
∂x
i ,k= f (β, ρ, W, X) .
Apart from all the abovementioned factors, as well as spatial parameters and weights, the effects for a given unit depend on the levels of all explanatory variables for all the units.
Andrzej Torój Institute of Econometrics – Department of Applied Econometrics
\[ M^k_{i,j} = \frac{\partial P(y_i=1)}{\partial x_{j,k}} = \underbrace{\frac{\partial P(y_i=1)}{\partial y_i^*} \cdot \frac{\partial y_i^*}{\partial x_{j,k}}}_{\text{as before}} = \frac{\partial P(y_i^* > 0)}{\partial y_i^*} \cdot \left[(\mathbf{I}-\rho\mathbf{W})^{-1}\right]_{i,j}\beta_k = \]
\[ = \frac{\partial P\!\left(\frac{y_i^* - \left[(\mathbf{I}-\rho\mathbf{W})^{-1}\mathbf{X}\beta\right]_i}{\sqrt{[\Sigma_\varepsilon]_{i,i}}} > -\frac{\left[(\mathbf{I}-\rho\mathbf{W})^{-1}\mathbf{X}\beta\right]_i}{\sqrt{[\Sigma_\varepsilon]_{i,i}}}\right)}{\partial y_i^*} \cdot \left[(\mathbf{I}-\rho\mathbf{W})^{-1}\right]_{i,j}\beta_k = \]
\[ = \frac{\partial P\!\left(\frac{y_i^* - \left[(\mathbf{I}-\rho\mathbf{W})^{-1}\mathbf{X}\beta\right]_i}{\sqrt{[\Sigma_\varepsilon]_{i,i}}} < \frac{\left[(\mathbf{I}-\rho\mathbf{W})^{-1}\mathbf{X}\beta\right]_i}{\sqrt{[\Sigma_\varepsilon]_{i,i}}}\right)}{\partial y_i^*} \cdot \left[(\mathbf{I}-\rho\mathbf{W})^{-1}\right]_{i,j}\beta_k = \]
\[ = \Phi'\!\left(\frac{\left[(\mathbf{I}-\rho\mathbf{W})^{-1}\mathbf{X}\beta\right]_i}{\sqrt{[\Sigma_\varepsilon]_{i,i}}}\right) \cdot \frac{1}{\sqrt{[\Sigma_\varepsilon]_{i,i}}} \cdot \left[(\mathbf{I}-\rho\mathbf{W})^{-1}\right]_{i,j}\beta_k \]
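The marginal-effect matrix M^k for the SAR-probit can be evaluated directly. A sketch with a hypothetical 3-unit weight matrix and coefficients; it assumes the standardisation by the square root of [Σ_ε]_{i,i}, with Φ′ being the standard normal density:

```python
import numpy as np
from scipy.stats import norm

# hypothetical small SAR-probit setup
W = np.array([
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
], dtype=float)
W /= W.sum(axis=1, keepdims=True)

rho, sigma2 = 0.4, 1.0
beta = np.array([0.2, 0.7])   # column 0 is the constant; k = 1 is the regressor
rng = np.random.default_rng(5)
X = np.column_stack([np.ones(3), rng.normal(size=3)])

A_inv = np.linalg.inv(np.eye(3) - rho * W)
Sigma = sigma2 * A_inv @ A_inv.T          # [(I-rho W)'(I-rho W)]^{-1} scaled
mean = A_inv @ (X @ beta)                 # (I - rho W)^{-1} X beta
sd = np.sqrt(np.diag(Sigma))

# M^k_{i,j} = phi(mean_i / sd_i) / sd_i * [(I - rho W)^{-1}]_{i,j} * beta_k
k = 1
M = (norm.pdf(mean / sd) / sd)[:, None] * A_inv * beta[k]

# diagonal: direct effects; off-diagonal: spatial spillovers
print(M)
```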