Spatial Econometrics
Lecture 11: Spatial models of binary variables
Andrzej Torój
Institute of Econometrics – Department of Applied Econometrics
1 Binary variable models: what’s special about spatial
  - Nonspatial models of binary variable
  - Binary variable models: spatial version
2 Estimation of parameters for Probit-SAR model
  - Numerical evaluation of the likelihood function
  - RIS
3 Interpretation of coefficients
Observable dependent variable with binary outcomes: y_i ∈ {0; 1}, i = 1, ..., N.
We assume that the occurrence of 0s and 1s is determined by an unobservable continuous variable y_i* – the propensity of the i-th observation to take the value of 1, materialising itself above the threshold of 0 (this value comes without any loss of generality, as the propensity depends on an estimated constant). The propensity depends on a systematic component (x_i β – factors increasing the probability of 1s) and a random component (ε_i):

\[ y_i^* = x_i \beta + \varepsilon_i, \qquad y_i = \begin{cases} 1 & \text{for } y_i^* > 0 \\ 0 & \text{for } y_i^* \le 0 \end{cases} \]

In general (for both the spatial and nonspatial case): the likelihood function is an N-dimensional integral of the N-dimensional joint density of y*:

\[ L(\beta) = P(y_1, y_2, \ldots, y_N) = \underbrace{\int_{-\infty}^{0} \int_{-\infty}^{0} \cdots}_{y_i=0} \; \underbrace{\int_{0}^{\infty} \int_{0}^{\infty} \cdots}_{y_i=1} f_N(\mathbf{y}^* \mid \mathbf{X}\beta)\, dy_N^* \cdots dy_1^* \]
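A minimal numerical illustration of the latent-variable mechanism above, assuming standard-normal errors (the probit case) and hypothetical coefficient values:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
N = 200_000
beta = np.array([0.5, 1.0])  # hypothetical: constant + one regressor
x = np.column_stack([np.ones(N), rng.normal(size=N)])

# latent propensity y* = x beta + eps, observed y = 1(y* > 0)
y_star = x @ beta + rng.normal(size=N)
y = (y_star > 0).astype(int)

# the empirical share of 1s matches the average of P(y_i = 1 | x_i) = Phi(x_i beta)
p_hat = y.mean()
p_theory = norm.cdf(x @ beta).mean()
print(p_hat, p_theory)
```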
Under independent observations and the Bernoulli distribution, we can factorize the integral into a product of one-dimensional integrals and write the likelihood function as:

\[ L(\beta) = \prod_{i:\,y_i=0} P(y_i^* \le 0) \cdot \prod_{i:\,y_i=1} \left[1 - P(y_i^* \le 0)\right] = \prod_{i:\,y_i=0} P(\varepsilon_i \le -x_i\beta) \cdot \prod_{i:\,y_i=1} \left[1 - P(\varepsilon_i \le -x_i\beta)\right] = \]
\[ = \prod_{i=1}^{N} \left[F(-x_i\beta)\right]^{1-y_i} \left[1 - F(-x_i\beta)\right]^{y_i} = \prod_{i=1}^{N} \left[\int_{-\infty}^{-x_i\beta} f(\varepsilon_i)\, d\varepsilon_i\right]^{1-y_i} \left[1 - \int_{-\infty}^{-x_i\beta} f(\varepsilon_i)\, d\varepsilon_i\right]^{y_i} \]

Further derivation depends on the choice of the density function f(ε_i) and the integration technique. Two common options (both easily subject to an analytical treatment):
- Logistic density – logit model.
- Normal density – probit model.
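The factorized probit likelihood can be coded directly. A sketch with simulated data; the function and variable names are illustrative, and `logcdf` is used for numerical stability:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def probit_negloglik(beta, y, X):
    """Negative log-likelihood of the nonspatial probit:
    -sum_i [ (1-y_i) log F(-x_i b) + y_i log(1 - F(-x_i b)) ]."""
    xb = X @ beta
    # log F(-x_i b) = norm.logcdf(-xb); log[1 - F(-x_i b)] = norm.logcdf(xb)
    return -np.sum(y * norm.logcdf(xb) + (1 - y) * norm.logcdf(-xb))

# simulated check against known coefficients
rng = np.random.default_rng(1)
N = 5000
X = np.column_stack([np.ones(N), rng.normal(size=N)])
beta_true = np.array([0.3, -0.8])
y = ((X @ beta_true + rng.normal(size=N)) > 0).astype(int)

res = minimize(probit_negloglik, x0=np.zeros(2), args=(y, X), method="BFGS")
print(res.x)  # close to beta_true
```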
Firstly, to be decided at the specification level: interdependence within the neighbourhood for the binary observable or for the unobservable propensity?

\[ y_1^* = \beta_1 y_2 + \beta_2 x_1 + \varepsilon_1 \quad \text{or} \quad y_1^* = \beta_1 y_2^* + \beta_2 x_1 + \varepsilon_1 \]

Logical problem (with the first specification):
\[ y_2 = 1 \;\xrightarrow{\beta_1}\; y_1^* \uparrow \;\rightarrow\; P(y_1 = 1) \uparrow \;\xrightarrow{\beta_1}\; y_2^* \uparrow \]

That is: from the fact that event 2 occurred, it could be inferred that its own probability should increase (?!).
For this reason, we normally assume spatial interdependence at the level of the unobservable variable y_j*.
Secondly, spatial interdependence implies heteroskedasticity and spatial autocorrelation – e.g. for SAR:

\[ \mathbf{y}^* = \rho \mathbf{W} \mathbf{y}^* + \mathbf{X}\beta + \varepsilon, \qquad y_i = \begin{cases} 1 & \text{for } y_i^* > 0 \\ 0 & \text{for } y_i^* \le 0 \end{cases} \]
\[ \mathbf{y}^* = (\mathbf{I} - \rho\mathbf{W})^{-1}\mathbf{X}\beta + \underbrace{(\mathbf{I} - \rho\mathbf{W})^{-1}\varepsilon}_{\upsilon} \]
\[ \mathrm{Var}(\upsilon) = (\mathbf{I} - \rho\mathbf{W})^{-1}\, E(\varepsilon\varepsilon')\, \left[(\mathbf{I} - \rho\mathbf{W})^{-1}\right]' = \sigma_\varepsilon^2 \left[(\mathbf{I} - \rho\mathbf{W})'(\mathbf{I} - \rho\mathbf{W})\right]^{-1} \]

Heteroskedasticity is due to a varying degree of network connectivity from individual to individual.
Likewise for SEM.
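The implied covariance of υ can be computed explicitly for a small, hypothetical row-standardised W, making the heteroskedasticity visible on the diagonal:

```python
import numpy as np

# hypothetical 4-unit chain neighbourhood, row-standardised
W = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
W /= W.sum(axis=1, keepdims=True)

rho, sigma2 = 0.6, 1.0
A = np.eye(4) - rho * W

# Var(upsilon) = sigma2 * [(I - rho W)' (I - rho W)]^{-1}
Sigma = sigma2 * np.linalg.inv(A.T @ A)

# diagonal entries differ across units (heteroskedasticity from connectivity),
# off-diagonal entries are nonzero (spatial autocorrelation)
print(np.diag(Sigma))
```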
Thirdly, the likelihood function is a multidimensional integral:

\[ L(\beta, \rho, \sigma_\varepsilon^2) = P(y_1, y_2, \ldots, y_N \mid \beta, \rho, \sigma_\varepsilon^2) = \underbrace{\int_{-\infty}^{0}\int_{-\infty}^{0}\cdots}_{y_i=0}\; \underbrace{\int_{0}^{\infty}\int_{0}^{\infty}\cdots}_{y_i=1} f_N(\mathbf{y}^* \mid \mathbf{X}\beta)\, dy_N^* \cdots dy_1^* \]

In the absence of independent observations, further analytical simplifications are impossible.
Numerical difficulties consist in:
- multidimensionality
- a sometimes unknown function f_N (but for SAR-probit it is known: MVN)
- the truncated MVN distribution (individual dimensions: below or above zero)
1 Binary variable models: what’s special about spatial
2 Estimation of parameters for Probit-SAR model
3 Interpretation of coefficients
The above difficulties boil down to a single evaluation of the likelihood function value for given parameter values (β, ρ and σ²_ε) and data (y, X, W).
Besides, the standard scheme applies:
1. Parameter starting values: β^(0), ρ^(0) and σ²_ε^(0).
2. Evaluation of L(β^(0), ρ^(0), σ²_ε^(0)).
3. Iterative update of the parameters β^(i), ρ^(i) and σ²_ε^(i) within the selected maximization algorithm for L...
4. ...until convergence of L to the maximum.
Setting the direction of the parameter change (point 3) also requires evaluations of L (e.g. for numerical gradient evaluation).
\[ L(\beta, \rho, \sigma_\varepsilon^2) = P(y_1, y_2, \ldots, y_N \mid \beta, \rho, \sigma_\varepsilon^2) = \underbrace{\int_{-\infty}^{0}\int_{-\infty}^{0}\cdots}_{y_i=0}\; \underbrace{\int_{0}^{\infty}\int_{0}^{\infty}\cdots}_{y_i=1} f_N(\mathbf{y}^* \mid \mathbf{X}\beta)\, dy_N^* \cdots dy_1^* \]

Since we cannot integrate analytically, we shall use numerical methods.
The proposed estimation method is Maximum Simulated Likelihood (MSL), based on R simulation draws.
For R growing more quickly than √N: consistent and efficient estimation (Train, 2009 – a free e-book about MSL).
Since we integrate only over a truncated part of the domain, let us transform the problem:

\[ L(\beta,\rho,\sigma_\varepsilon^2) = \underbrace{\int_{-\infty}^{0}\int_{-\infty}^{0}\cdots}_{y_i=0}\;\underbrace{\int_{0}^{\infty}\int_{0}^{\infty}\cdots}_{y_i=1} f_N(\mathbf{y}^*\mid\mathbf{X}\beta)\, dy_N^* \cdots dy_1^* = \]
\[ = \underbrace{\int_{-\infty}^{\infty} I_0(y_i)\int_{-\infty}^{\infty} I_0(y_i)\cdots}_{y_i=0}\;\underbrace{\int_{-\infty}^{\infty} I_1(y_i)\int_{-\infty}^{\infty} I_1(y_i)\cdots}_{y_i=1} f_N(\mathbf{y}^*\mid\mathbf{X}\beta)\, dy_N^* \cdots dy_1^* = \]
\[ = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} I_{01}(\mathbf{y}^*)\, f_N(\mathbf{y}^*\mid\mathbf{X}\beta)\, dy_N^* \cdots dy_1^* \]

where \( I_{01}(\mathbf{y}^*) = \prod_{i:\,y_i=0} I_{<0}(y_i^*) \cdot \prod_{i:\,y_i=1} I_{>0}(y_i^*) \) (i.e. 1 when the multivariate draw y* exactly reflects the set of 0s and 1s in the sample, and 0 otherwise).
The indicator function I_01(y*) will be named the importance function, and the method – importance sampling.
The method is frequently applied in Bayesian econometrics, when we cannot draw from a given distribution, but we can draw from a different, approximate one.
Typical application: drawing from a truncated N / MVN / t / MVt distribution using its non-truncated counterpart.
In practice, the method boils down to the rejection of the draws located in the truncated parts of the domain.
For students/graduates of Bayesian Econometrics:
- Importance sampling e.g. with prior distributions only indicating the sign of the parameters.
- See the lecture materials (in Polish).
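The rejection idea can be sketched for a univariate truncated normal: draw from the non-truncated counterpart and discard draws in the cut-off part (an illustrative sketch; the function name is hypothetical):

```python
import numpy as np

def truncated_normal_rejection(mu, sigma, lower, n, rng):
    """Sample n values from N(mu, sigma^2) truncated to (lower, inf)
    by rejecting draws that fall below the threshold."""
    out = np.empty(0)
    while out.size < n:
        draws = rng.normal(mu, sigma, size=2 * n)
        out = np.concatenate([out, draws[draws > lower]])
    return out[:n]

rng = np.random.default_rng(2)
sample = truncated_normal_rejection(mu=0.0, sigma=1.0, lower=0.0, n=100_000, rng=rng)
# all draws above 0; mean close to E[N(0,1) | > 0] = sqrt(2/pi)
print(sample.min(), sample.mean())
```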
The density f_N is N-dimensional (in the case of probit – MVN):

\[ L(\beta,\rho,\sigma_\varepsilon^2) = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} I_{01}(\mathbf{y}^*)\, f_N(\mathbf{y}^*\mid\mathbf{X}\beta)\, dy_N^* \cdots dy_1^* = \]
\[ = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} I_{01}(\mathbf{y}^*)\, f_N\!\left[(\mathbf{I}-\rho\mathbf{W})^{-1}\mathbf{X}\beta + \upsilon\right] d\upsilon_N \cdots d\upsilon_1 \]
\[ \upsilon \sim \mathrm{MVN}\!\left(\mathbf{0},\; \sigma_\varepsilon^2\left[(\mathbf{I}-\rho\mathbf{W})'(\mathbf{I}-\rho\mathbf{W})\right]^{-1}\right) \equiv \mathrm{MVN}(\mathbf{0}, \Sigma_\varepsilon) \]
\[ \mathbf{y}^* \sim \mathrm{MVN}\!\left[(\mathbf{I}-\rho\mathbf{W})^{-1}\mathbf{X}\beta,\; \Sigma_\varepsilon\right] \]
Solution (for the r-th draw, r = 1, ..., R):
1. Draw independently \( \tilde{\upsilon}_i^{(r)} \sim N(0, \sigma_i^2) \) for i = 1, ..., N, where σ_i² is the i-th diagonal element of the matrix Σ_ε.
2. The Cholesky decomposition Σ_ε = VV' allows us to write \( \upsilon^{(r)} = \mathbf{V} \cdot \tilde{\upsilon}^{(r)} \). Matrix V is upper triangular, which means:
   1. an independent draw of the N-th (last) element of υ^(r),
   2. a draw of element N − 1 for a given draw of element N and a given (by the matrix Σ_ε) correlation of the last one with the last-but-one,
   3. a draw of the last-but-two conditional upon the two last ones, etc.
3. Shift of the mean: \( \mathbf{y}^{*(r)} = \upsilon^{(r)} + (\mathbf{I} - \rho\mathbf{W})^{-1}\mathbf{X}\beta \).
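The Cholesky-based draw of correlated y*^(r) can be sketched with numpy. The covariance matrix and mean are hypothetical; note that numpy returns the lower-triangular factor (the upper-triangular variant only reverses the order of the recursion), and standard-normal innovations are used in the usual Σ = VV′ construction:

```python
import numpy as np

rng = np.random.default_rng(3)

# hypothetical target covariance Sigma_eps and mean m = (I - rho W)^{-1} X beta
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.2, 0.4],
                  [0.2, 0.4, 0.9]])
m = np.array([0.3, -0.1, 0.5])

V = np.linalg.cholesky(Sigma)   # lower-triangular V with Sigma = V V'
R = 200_000
u = rng.normal(size=(3, R))     # independent N(0, 1) innovations
y_star = (V @ u).T + m          # correlated draws, shifted by the mean

# the sample covariance of the draws recovers Sigma
print(np.cov(y_star.T))
```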
For each r-th sample, r = 1, ..., R, we have y*^(r). The evaluation of:

\[ L(\beta,\rho,\sigma_\varepsilon^2) = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} I_{01}(\mathbf{y}^*)\, f_N(\mathbf{y}^*\mid\mathbf{X}\beta)\, dy_N^* \cdots dy_1^* \]

resembles the computation of the expected value of I_01(y*) over the density f_N(y* | Xβ), that is:
1. for each draw y*^(r), evaluate I_01(y*^(r)) as 0 or 1
2. compute the mean of the obtained sequence of 0s and 1s
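This indicator-mean evaluation can be sketched for a small hypothetical case. Independent latent variables are assumed here only so that an exact value is available for comparison; the pattern y and the means are illustrative:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)

# hypothetical: 3 independent latent variables with means mu, unit variance,
# and observed binary pattern y = (1, 0, 1)
mu = np.array([0.5, -0.4, 0.8])
y = np.array([1, 0, 1])

R = 100_000
y_star = mu + rng.normal(size=(R, 3))

# I_01: the draw reproduces EXACTLY the sample's pattern of 0s and 1s
indicator = np.all((y_star > 0) == (y == 1), axis=1)
L_hat = indicator.mean()

# exact value in the independent case: P(y1*>0) * P(y2*<=0) * P(y3*>0)
L_exact = norm.cdf(0.5) * norm.cdf(0.4) * norm.cdf(0.8)
print(L_hat, L_exact)
```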
The above procedure is correct for R → ∞.
I_01(y*) – the multivariate indicator function – rarely takes the value of 1 (the drawn vector y* would have to imply EXACTLY the same sequence of 0s and 1s as in the sample).
Extremely inefficient numerically, especially since we do not know even approximate values of β.
Sometimes referred to as the brute force method (cf. Lerman and Manski, 1981).
Solution: Recursive Importance Sampling (RIS).
Transform the initial problem:

\[ L(\beta,\rho,\sigma_\varepsilon^2) = \underbrace{\int_{-\infty}^{0}\int_{-\infty}^{0}\cdots}_{y_i=0}\;\underbrace{\int_{0}^{\infty}\int_{0}^{\infty}\cdots}_{y_i=1} f_N\!\left[(\mathbf{I}-\rho\mathbf{W})^{-1}\mathbf{X}\beta+\upsilon\right] d\upsilon_N \cdots d\upsilon_1 = \]
\[ = \int_{-\infty}^{\mathbf{0}} f_N\!\left[\mathbf{Q}\left((\mathbf{I}-\rho\mathbf{W})^{-1}\mathbf{X}\beta+\upsilon\right)\right] d\upsilon = P\!\left[\mathbf{Q}\left((\mathbf{I}-\rho\mathbf{W})^{-1}\mathbf{X}\beta+\upsilon\right) \le \mathbf{0}\right] = \]
\[ = P\Big[\upsilon \le \underbrace{-\mathbf{Q}\,(\mathbf{I}-\rho\mathbf{W})^{-1}\mathbf{X}\beta}_{\equiv\,\mu}\Big] \]

where \( \mathbf{Q}(\mathbf{y}) = \mathrm{diag}(1-2y_1, \ldots, 1-2y_N) \) (this notation serves the purpose of setting the diagonal elements as {−1; 1} and transforming all the inequalities to the form ≤ by using the symmetry of the normal distribution).
\[ L(\beta,\rho,\sigma_\varepsilon^2) = P[\upsilon \le \mu], \qquad \upsilon \sim \mathrm{MVN}(\mathbf{0},\Sigma_\varepsilon) = \mathrm{MVN}(\mathbf{0},\mathbf{V}\mathbf{V}') \]

Since \( \upsilon^{(r)} = \mathbf{V}\cdot\tilde{\upsilon}^{(r)} \), then:

\[ P\!\left[\upsilon^{(r)} \le \mu\right] = P\!\left[\mathbf{V}\cdot\tilde{\upsilon}^{(r)} \le \mu\right] = P\!\left[\tilde{\upsilon}^{(r)} \le \mathbf{V}^{-1}\mu\right] \equiv P\!\left[\tilde{\upsilon}^{(r)} \le \tilde{\mu}\right] \]

The method is called recursive because of the triangularity of the V matrix.
We can exploit the independence between the individual dimensions of \( \tilde{\upsilon}^{(r)} \) to write:

\[ L^{(r)}(\beta,\rho,\sigma_\varepsilon^2) = P\!\left[\tilde{\upsilon}^{(r)} \le \tilde{\mu}\right] = \prod_{i=1}^{N} P\!\left[\tilde{\upsilon}_i^{(r)} \le \tilde{\mu}_i\right] = \prod_{i=1}^{N} \Phi\!\left(\frac{\tilde{\mu}_i}{\sigma_i}\right) \]

where Φ(·) – the standard normal distribution function.
Typically for importance sampling: \( L(\beta,\rho,\sigma_\varepsilon^2) = \frac{1}{R}\sum_{r=1}^{R} L^{(r)}(\beta,\rho,\sigma_\varepsilon^2) \).
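The per-draw factor Π Φ(µ̃_i) can be sketched as follows. This is a simplified sketch of the transformed-bounds product only (the full RIS/GHK recursion additionally conditions each dimension on previously drawn truncated values); standard-normal innovations are used, so the σ_i scaling is absorbed into V. In the diagonal case the formula is exact, which the example exploits:

```python
import numpy as np
from scipy.stats import norm

def ris_factor(mu, Sigma):
    """Evaluate P[upsilon <= mu] for upsilon ~ MVN(0, Sigma) via the
    transformed bounds mu_tilde = V^{-1} mu (Sigma = V V') and the
    product of univariate normal CDFs (simplified sketch)."""
    V = np.linalg.cholesky(Sigma)       # lower-triangular Cholesky factor
    mu_tilde = np.linalg.solve(V, mu)   # V^{-1} mu
    return np.prod(norm.cdf(mu_tilde))

# diagonal Sigma: the product of CDFs is the exact orthant probability
mu = np.array([0.5, -0.2, 1.0])
Sigma = np.diag([1.0, 4.0, 0.25])
L = ris_factor(mu, Sigma)
exact = norm.cdf(0.5 / 1.0) * norm.cdf(-0.2 / 2.0) * norm.cdf(1.0 / 0.5)
print(L, exact)
```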
MSL with RIS: McMillen (1992), spprobitml {McSpatial}
GMM variant: Klier, McMillen (2008), gmmprobit {McSpatial}
Due to a high level of complication, some authors propose using Bayesian methods: LeSage and Pace (2009), sarprobit {spatialprobit}
For students/graduates of Bayesian econometrics:
- Prior distribution: non-informative normal-gamma-uniform (normal-gamma for β and 1/σ²_ε, and uniform for ρ).
- Posterior sampling method – Metropolis-within-Gibbs. Conditional posterior distributions:
  1. P(σ²_ε | ρ, β) = P(σ²_ε) ∼ InvGamma
  2. P(β | ρ, σ²_ε) ∼ N (known parameters)
  3. P(ρ | β, σ²_ε) ∼ ? (evaluation by the Metropolis–Hastings algorithm)
1 Binary variable models: what’s special about spatial
2 Estimation of parameters for Probit-SAR model
3 Interpretation of coefficients
Recall that in non-spatial probit models: \( \frac{\partial P(y_i=1)}{\partial x_{i,k}} = f(\beta, x_i) \).
Marginal effects depend not only on the coefficients, but also on the level of the independent variable (for which we compute the effects) and the levels of all the other independent variables for a given unit.
In spatial probit models, it additionally holds that \( \frac{\partial P(y_i=1)}{\partial x_{i,k}} = f(\beta, \rho, \mathbf{W}, \mathbf{X}) \).
Apart from all the abovementioned factors, as well as the spatial parameters and weights, the effects for a given unit depend on the levels of all explanatory variables for all the units.
Andrzej Torój Institute of Econometrics – Department of Applied Econometrics
Recall that in non-spatial probit models: ∂P(y ∂x
i=1)
i ,k
= f (β, x i ).
Marginal effects depend not only on the coefficients, but on the level of the independent variable (for which we compute the effects) and the levels of all the other independent variables for a given unit.
In spatial probit models, it holds additionally that
∂P(y
i=1)
∂x
i ,k= f (β, ρ, W, X) .
Apart from all the abovementioned factors, as well as spatial parameters and weights, the effects for a given unit depend on the levels of all explanatory variables for all the units.
Andrzej Torój Institute of Econometrics – Department of Applied Econometrics
\[ M^k_{i,j} = \frac{\partial P(y_i=1)}{\partial x_{j,k}} = \underbrace{\frac{\partial P(y_i=1)}{\partial y_i^*} \cdot \frac{\partial y_i^*}{\partial x_{j,k}}}_{\text{as before}} = \frac{\partial P(y_i^* > 0)}{\partial y_i^*} \cdot \left[(\mathbf{I}-\rho\mathbf{W})^{-1}\right]_{i,j}\beta_k = \]
\[ = \frac{\partial P\!\left(\frac{y_i^* - \left[(\mathbf{I}-\rho\mathbf{W})^{-1}\mathbf{X}\beta\right]_i}{\sqrt{[\Sigma_\varepsilon]_{i,i}}} > -\frac{\left[(\mathbf{I}-\rho\mathbf{W})^{-1}\mathbf{X}\beta\right]_i}{\sqrt{[\Sigma_\varepsilon]_{i,i}}}\right)}{\partial y_i^*} \cdot \left[(\mathbf{I}-\rho\mathbf{W})^{-1}\right]_{i,j}\beta_k = \]
\[ = \frac{\partial P\!\left(\frac{y_i^* - \left[(\mathbf{I}-\rho\mathbf{W})^{-1}\mathbf{X}\beta\right]_i}{\sqrt{[\Sigma_\varepsilon]_{i,i}}} < \frac{\left[(\mathbf{I}-\rho\mathbf{W})^{-1}\mathbf{X}\beta\right]_i}{\sqrt{[\Sigma_\varepsilon]_{i,i}}}\right)}{\partial y_i^*} \cdot \left[(\mathbf{I}-\rho\mathbf{W})^{-1}\right]_{i,j}\beta_k = \]
\[ = \Phi'\!\left(\frac{\left[(\mathbf{I}-\rho\mathbf{W})^{-1}\mathbf{X}\beta\right]_i}{\sqrt{[\Sigma_\varepsilon]_{i,i}}}\right) \cdot \frac{1}{\sqrt{[\Sigma_\varepsilon]_{i,i}}} \cdot \left[(\mathbf{I}-\rho\mathbf{W})^{-1}\right]_{i,j}\beta_k \]
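The marginal-effect matrix M^k for the SAR-probit can be evaluated directly. A sketch with a hypothetical 3-unit weight matrix and coefficients; it assumes the standardisation by the square root of [Σ_ε]_{i,i}, with Φ′ being the standard normal density:

```python
import numpy as np
from scipy.stats import norm

# hypothetical small SAR-probit setup
W = np.array([
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
], dtype=float)
W /= W.sum(axis=1, keepdims=True)

rho, sigma2 = 0.4, 1.0
beta = np.array([0.2, 0.7])   # column 0 is the constant; k = 1 is the regressor
rng = np.random.default_rng(5)
X = np.column_stack([np.ones(3), rng.normal(size=3)])

A_inv = np.linalg.inv(np.eye(3) - rho * W)
Sigma = sigma2 * A_inv @ A_inv.T          # [(I-rho W)'(I-rho W)]^{-1} scaled
mean = A_inv @ (X @ beta)                 # (I - rho W)^{-1} X beta
sd = np.sqrt(np.diag(Sigma))

# M^k_{i,j} = phi(mean_i / sd_i) / sd_i * [(I - rho W)^{-1}]_{i,j} * beta_k
k = 1
M = (norm.pdf(mean / sd) / sd)[:, None] * A_inv * beta[k]

# diagonal: direct effects; off-diagonal: spatial spillovers
print(M)
```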