
(1)

Simultaneous equations

Katarzyna Bech

09.03.2017


(2)

Motivation

In Economics, as in most other social sciences, theoretical models cannot usually be exposed to experimental testing and calibration:

people (regrettably!) know that they are being experimented upon, and it is usually unreasonable to assume their behaviour would be the same in a non-experimental situation.

Many of the statistical methods that have been developed for use in experimental situations cannot be applied, at least not without modification and extension.

Specifically, in a 'hard' science application, a relationship between several variables can usually be physically isolated from all other sources of variation → it is unnecessary for the statistician to model all of the possibly many factors that, in the absence of laboratory facilities, might influence the relation under study.

(3)

Motivation

In the social sciences we do not have this luxury: even when interested in perhaps a relatively simple relationship, it is usually necessary to model a much larger system of variables (the larger systems that actually generated the observed data).

Such systems are called Structural models.

We shall deal here with only the simplest forms of structural models:

those in which the relationships entering the system are linear in parameters.

(4)

Fundamental Issues: Illustrative example

We are interested in studying the demand function relating the number of digital radios sold in the UK to their price:

Q_t^D = α + β P_t + {terms involving other variables} + ε_{1t}.

The price we observe should be the market-clearing price, which equates the demand for radios with the supply:

Q_t^S = δ + γ P_t − φ C_t + {terms involving other variables} + ε_{2t}.

Ignoring the other variables, economic theory suggests the following system:

Q_t^D = α + β P_t + ε_{1t}   (demand)
Q_t^S = δ + γ P_t − φ C_t + ε_{2t}   (supply)
Q_t^D = Q_t^S = Q_t   (equilibrium).

(5)

Fundamental Issues: Illustrative example

As the model jointly determines prices and quantities, P_t and Q_t are endogenous (shortly to be called 'y'). The variables that are determined by other relationships independent of the system (here C_t) are exogenous (shortly to be called 'x').

Assumption: we know beforehand which variables are endogenous and which are exogenous.

Both of the equations involve variables of both kinds, although some variables may not appear in some equations.

(6)

Notation and Assumptions

The sample data consists of T observations.

The endogenous variables for a single observation are labelled

y_{1t}, y_{2t}, ..., y_{Mt},   t = 1, ..., T,

and the exogenous variables are

x_{1t}, x_{2t}, ..., x_{kt},   t = 1, ..., T.

The system contains M endogenous variables and k exogenous variables.

Assumption: there must be exactly M equations in the complete system.

(7)

Klein’s Model I

A widely used example of a simultaneous equations model of the economy:

C_t = α_0 + α_1 P_t + α_2 P_{t−1} + α_3 (W_t^p + W_t^g) + ε_{1t}   (consumption)
I_t = β_0 + β_1 P_t + β_2 P_{t−1} + β_3 K_{t−1} + ε_{2t}   (investment)
W_t^p = γ_0 + γ_1 X_t + γ_2 X_{t−1} + γ_3 A_t + ε_{3t}   (private wages)
X_t = C_t + I_t + G_t   (equilibrium demand)
P_t = X_t − T_t − W_t^p   (private profits)
K_t = K_{t−1} + I_t   (capital stock)

(8)

Notation and Assumptions

For a single observation:

y_t = (y_{1t}, y_{2t}, ..., y_{Mt})'   (M × 1)   and   x_t = (x_{1t}, x_{2t}, ..., x_{kt})'   (k × 1).

For all observations, stacking the transposed vectors as rows:

Y = (y_1, y_2, ..., y_T)'   (T × M)   and   X = (x_1, x_2, ..., x_T)'   (T × k).

(9)

Notation and Assumptions

Write the M equations of the system:

β_{11} y_{1t} + β_{12} y_{2t} + ... + β_{1M} y_{Mt} + γ_{11} x_{1t} + ... + γ_{1k} x_{kt} = ε_{1t}
...
β_{M1} y_{1t} + β_{M2} y_{2t} + ... + β_{MM} y_{Mt} + γ_{M1} x_{1t} + ... + γ_{Mk} x_{kt} = ε_{Mt}.

Define the parameter matrices:

B' = [ β_{11} ... β_{1M} ; ... ; β_{M1} ... β_{MM} ]   and   Γ' = [ γ_{11} ... γ_{1k} ; ... ; γ_{M1} ... γ_{Mk} ].

Write the system in compact notation:

B' y_t + Γ' x_t = ε_t

(10)

Notation and Assumptions

Or even better:

Y B + X Γ = E,

which is often referred to as the structural form; the parameters in B and Γ are called the structural parameters.

In practice, not all y's and x's will appear in each equation → many of the β's and γ's will be known a priori to be zero. We will see later that this is in fact crucial: if it is not so, the structural parameters cannot be estimated at all.

(11)

Aside: SUR model

If B were a diagonal matrix, each equation would contain only one endogenous variable, and we would have the Seemingly Unrelated Regressions model.

Equations are only linked through their disturbances.

Consistent estimation: OLS; consistent and efficient estimation: GLS.

(12)

Going back to market equilibrium

We have:

Q_t − β P_t − α = ε_{1t}
Q_t − γ P_t − δ + φ C_t = ε_{2t}.

Thus,

y_t = (Q_t, P_t)',   x_t = (1, C_t)',

B' = [ 1  −β ; 1  −γ ]   and   Γ' = [ −α  0 ; −δ  φ ].

(13)

Assumptions on B matrix

If B is upper triangular, then the system is said to be triangular and the joint determination of the variables in the model is recursive.

The solution of the system of equations determining y in terms of x and the errors is the reduced form:

y_t = Π' x_t + u_t,   where u_t = (B')^{-1} ε_t and Π = −Γ B^{-1}.

Completeness condition: B is nonsingular. Note that this is an assumption about the unknown matrix B, so it cannot be tested empirically.

In our example, this assumption holds only if β − γ ≠ 0, i.e. β ≠ γ (as β is the slope of the demand curve and γ is the slope of the supply curve, it is pretty obvious that there would be a problem if the two curves had the same slope: they would not intersect!).

(14)

Reduced form

The parameters in Π are called the reduced form parameters.

It is extremely important to keep clearly in mind that it is these reduced form equations that are, by assumption, the equations generating the observed data. Thus, the data can only inform us about the reduced form parameters and the properties of the vector of transformed errors.

In our example:

Π = ? and u_t = ?

The reduced form equations read:

Q_t = π_{11} + π_{12} C_t + u_{1t}
P_t = π_{21} + π_{22} C_t + u_{2t}.
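
As a check on this algebra, the reduced form of the market model can be worked out symbolically. The sketch below is an illustration only (the use of sympy and the variable names are my own choices, not part of the slides); it computes Π = −ΓB^{-1} and u_t = (B')^{-1}ε_t for the demand-supply system above.

```python
# Illustrative sketch (not from the slides): reduced form of the demand-supply model.
import sympy as sp

alpha, beta, gamma, delta, phi = sp.symbols('alpha beta gamma delta phi')
eps1, eps2 = sp.symbols('epsilon_1t epsilon_2t')

# Structural form B'y_t + Gamma'x_t = eps_t, with y_t = (Q_t, P_t)' and x_t = (1, C_t)'
Bp = sp.Matrix([[1, -beta],
                [1, -gamma]])        # B'
Gp = sp.Matrix([[-alpha, 0],
                [-delta, phi]])      # Gamma'

B, G = Bp.T, Gp.T

Pi = sp.simplify(-G * B.inv())                        # Pi = -Gamma * B^{-1}
u = sp.simplify(Bp.inv() * sp.Matrix([eps1, eps2]))   # u_t = (B')^{-1} * eps_t
print(Pi)   # e.g. pi_22 = -phi/(beta - gamma), assuming beta != gamma
print(u)    # e.g. u_2t = (epsilon_2t - epsilon_1t)/(beta - gamma)
```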

(15)

Reduced form in matrix notation

Given that the structural form is Y B + X Γ = E, the reduced form is:

Y = −X Γ B^{-1} + E B^{-1} = X Π + U,

which is exactly the form of a multivariate linear model, except that the matrix of coefficients Π is a fairly complicated function of the underlying matrices of structural parameters.

(16)

Disturbances

Assume that the structural disturbances are randomly drawn from an M-variate distribution with

E[ε_t] = 0 and E[ε_t ε_t'] = Σ.

The reduced form disturbances have

E[u_t] = 0 and E[u_t u_t'] = (B')^{-1} Σ B^{-1} ≡ Ω.

(17)

Identification

We have observed above that, since the observed data are generated by the reduced form equations, only the parameters involved in those equations, namely

Π = −Γ B^{-1} and Ω = (B')^{-1} Σ B^{-1},

can be learned from the data.

Question: if we learn the values of Π and Ω, can we deduce from these the values of the structural parameters? This is the question of identification.

(18)

Identification

Consider two different structures characterized by parameters (B_1, Γ_1, Σ_1) and (B_2, Γ_2, Σ_2) for which B_2 = B_1 P, Γ_2 = Γ_1 P and Σ_2 = P'Σ_1 P for some nonsingular matrix P. It can be shown that two structures related in this way have exactly the same reduced form, and hence are empirically indistinguishable, no matter how much data were available!

The converse can also be shown to hold: two structures that have identical reduced forms must be related in the manner described above.

It follows that, if we are to be able to deduce (B, Γ, Σ) from knowledge of (Π, Ω), i.e. from the data, we must have enough information about the structure, known a priori to be true, to be able to rule out the possibility that both structures (B_1, Γ_1, Σ_1) and (B_2, Γ_2, Σ_2) satisfy the restrictions we know hold.

We shall see soon that this means that we must know quite a lot about B and Γ before any statistical analysis of the model can begin.

(19)

Identification

The information required to ensure that the system of structural equations is identified can take a number of forms, but by far the most common is:

Each equation in the system contains only some of the endogenous and exogenous variables; in other words, some endogenous variables and some exogenous variables are known not to appear in each equation.

In that case, the matrices B and Γ will contain many elements that are known to be zero from the beginning.

(20)

Single Equation Analysis

Consider a single equation, say the first, of the structural model, and assume that only some (say n+1) of the M endogenous variables and only some (say k_1) of the k exogenous variables appear in that equation. Partition the data accordingly:

Y = [Y_1  Y_2],   X = [X_1  X_2],

such that Y_1 and X_1 are included in the first equation and Y_2 and X_2 are excluded.

The first structural equation then reads:

Y_1 β_M + X_1 γ_1 = ε_1,

with the parameters β_M and γ_1 attached to the variables remaining in the equation.

The part of the reduced form which applies to the endogenous variables appearing in the first equation has the form:

Y_1 = X_1 Π_{11} + X_2 Π_{12} + U_1.

(21)

Single Equation Analysis

These reduced form equations are, by assumption, the equations that generated the data appearing in the first structural equation.

Note that X_2 does not appear in the structural equation, but it does appear in the reduced form equation. This will prove to be crucial.

We have to establish the connections between the structural and reduced form parameters.

Multiply the reduced form by the vector β_M:

Y_1 β_M = X_1 Π_{11} β_M + X_2 Π_{12} β_M + U_1 β_M.

Compare with the structural equation:

Y_1 β_M = ε_1 − X_1 γ_1.

(22)

Single Equation Analysis

It must be true that:

Π_{11} β_M = −γ_1
Π_{12} β_M = 0
ε_1 = U_1 β_M.

The first equation is just a definition. The second condition is the key!
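
For the market example these relations can be verified directly. The sketch below is my own illustration (it plugs in the reduced form coefficients derived earlier, treating the demand equation as "the first equation", so Y_1 = (Q_t, P_t), X_1 = (1) and X_2 = (C_t)):

```python
# Illustrative check (not from the slides): Pi_11*beta_M = -gamma_1 and Pi_12*beta_M = 0
# for the demand equation of the demand-supply model.
import sympy as sp

alpha, beta, gamma, delta, phi = sp.symbols('alpha beta gamma delta phi')
d = beta - gamma   # nonzero by the completeness condition

# Reduced form blocks: rows of Pi for the included (constant) and excluded (C_t) exogenous vars
Pi11 = sp.Matrix([[(beta*delta - alpha*gamma)/d, (delta - alpha)/d]])
Pi12 = sp.Matrix([[-beta*phi/d, -phi/d]])

beta_M = sp.Matrix([1, -beta])   # demand: Q_t - beta*P_t - alpha = eps_1t
gamma_1 = sp.Matrix([-alpha])    # coefficient on the constant in the demand equation

print(sp.simplify(Pi11 * beta_M + gamma_1))  # Matrix([[0]]): Pi_11*beta_M = -gamma_1
print(sp.simplify(Pi12 * beta_M))            # Matrix([[0]]): the key exclusion condition
```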

(23)

Identification cont.

Suppose for the moment that Π_{11} and Π_{12} were known. Can we learn about the structural parameters from these reduced form matrices?

Only under certain conditions on Π_{12}:

if rank(Π_{12}) = n, the equations Π_{12} β_M = 0 have a solution for β_M that is unique up to multiplication by a constant. In this case we can learn β_M from Π_{12}: the equation is identified. The condition rank(Π_{12}) = n is called the rank condition for identification. Note that it involves the unknown matrix.

if rank(Π_{12}) < n, the equations Π_{12} β_M = 0 have a solution for β_M, but in fact there are many solutions. In this case the equation is unidentified.

Note that the matrix Π_{12} is k_2 × (n+1), so the rank condition cannot possibly hold unless k_2 ≥ n. This condition is necessary (but not sufficient) for identification and is known as the order condition.
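
One might check the order and rank conditions numerically along the following lines; the function name and the example matrix are hypothetical, for illustration only:

```python
# Illustrative sketch (not from the slides): order and rank condition check for one equation.
import numpy as np

def check_identification(Pi12):
    """Pi12 is k2 x (n+1): rows = excluded exogenous vars, columns = included endogenous vars."""
    k2, n_plus_1 = Pi12.shape
    n = n_plus_1 - 1
    if k2 < n:
        return "order condition fails (k2 < n): unidentified"
    if np.linalg.matrix_rank(Pi12) == n:
        return "rank condition holds (rank = n): identified"
    return "rank condition fails (rank < n): unidentified"

# Hypothetical numbers: k2 = 2 excluded exogenous vars, n + 1 = 2 included endogenous vars.
Pi12 = np.array([[0.8, -0.4],
                 [1.6, -0.8]])
print(check_identification(Pi12))   # rank 1 = n, so the equation is identified
```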

(24)

Identification cont.

The order condition says, in essence, that the number of exogenous variables excluded from the equation must be at least as large as the number of endogenous variables that still appear in the equation, minus 1.

Note that this condition is always satisfied if there is only one endogenous variable in the equation, as is the case, for instance, in a SUR model.

(25)

Exercise

Consider a system of equations:

S_i = β_0 + β_1 A_i + ε_{Si}   (sales equation)
A_i = γ_1 + γ_2 S_i + γ_3 SZ_i + ε_{Ai}   (advertising equation)

and assume that E[ε_{Si}] = E[ε_{Ai}] = 0 and Cov[ε_{Si}, ε_{Ai}] = 0.

Find the reduced form equations. Check whether the system is identified.

(26)

Aside: partial identification

The class of partially identified models contains two types of models:

models that are identified in some parts, but unidentified in others;

models in which the parameter of interest is not uniquely determined by the distribution of the observed data.

(27)

Aside: partial identification

It is clear that in applied econometrics data alone are not sufficient to deduce meaningful conclusions about the population of interest.

Inference typically requires making assumptions about the population behaviour and the data generating process. Therefore, researchers combine the available data with a set of assumptions to yield conclusions. Until the late 1980s, parameters were only considered to be either point identified or not identified at all. Point identification was typically achieved by using assumptions that were strong enough to identify the exact value of the parameters of interest.

However, by imposing weaker and more credible restrictions, researchers are able to partially identify some features of the model.

Hence, the partial identification approach states that even if the model cannot point identify the parameters, it frequently contains some relevant message, which enables researchers to bound the parameters in informative ways.

(28)

Normalisation

The key equations linking the structural equation with the reduced form have (if the rank condition is satisfied) a solution for β_M that is unique only up to multiplication by a constant (that is, if β_M satisfies Π_{12} β_M = 0, so does α β_M for any α).

In order to determine β_M completely, we need to impose some (arbitrary) normalisation rule. The most common rule is to set one element of β_M (say the first) equal to 1, leaving only n elements unknown. We set

β_M = (1, −β')'.

(29)

Normalisation

Under this normalisation we can write Y_1 = (y_1, Y_{11}), so that Y_1 β_M = y_1 − Y_{11} β,

which implies that the first structural equation can be written as

y_1 = Y_{11} β + X_1 γ_1 + ε_1,

which clearly looks like a standard linear regression model.

The first question one might ask is: why not apply OLS to this normalised equation?

(30)

OLS

Applying OLS directly gives:

β̂_OLS = (Y_{11}' M_{X_1} Y_{11})^{-1} Y_{11}' M_{X_1} y_1,
γ̂_1 = (X_1' X_1)^{-1} X_1' (y_1 − Y_{11} β̂_OLS),

which can clearly be calculated, but both estimators are biased and inconsistent.

They converge in probability to values that differ from the true values by a term that depends on the covariance between Y_{11} and ε_1. Try to apply OLS to the sales equation from the advertisement/sales exercise.
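
A small Monte Carlo sketch of this inconsistency, using the demand-supply model from the earlier slides with assumed parameter values (all numbers are illustrative, not from the slides):

```python
# Illustrative Monte Carlo (not from the slides): OLS on Q_t = alpha + beta*P_t + eps_1t
# is inconsistent because P_t is correlated with eps_1t.
import numpy as np

rng = np.random.default_rng(0)
T = 100_000
alpha, beta = 2.0, -1.0              # demand parameters (assumed values)
delta, gamma, phi = 1.0, 1.5, 0.8    # supply parameters (assumed values)

C = rng.normal(size=T)                            # exogenous cost shifter
e1, e2 = rng.normal(size=T), rng.normal(size=T)   # structural disturbances

# Reduced form: solve the demand and supply equations for P_t, then Q_t
P = (delta - alpha - phi * C + e2 - e1) / (beta - gamma)
Q = alpha + beta * P + e1

X = np.column_stack([np.ones(T), P])
b_ols = np.linalg.lstsq(X, Q, rcond=None)[0]
print("true beta:", beta, " OLS estimate:", b_ols[1])   # stays far from -1 even for large T
```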

(31)

2SLS

To avoid the inconsistency (but not the bias), use 2SLS.

The idea is to replace Y_{11} in the OLS formula by its least squares fitted value Ŷ_{11} = X(X'X)^{-1} X' Y_{11}.

Try to use firm size as an instrument for A_i in the sales equation.
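
A minimal numpy sketch of 2SLS for the demand equation, reusing the simulated demand-supply data pattern above and using C_t as the (excluded) instrument for P_t; again, all values are assumptions for illustration:

```python
# Illustrative 2SLS sketch (not from the slides): instrument the endogenous P_t with C_t.
import numpy as np

rng = np.random.default_rng(1)
T = 100_000
alpha, beta = 2.0, -1.0
delta, gamma, phi = 1.0, 1.5, 0.8

C = rng.normal(size=T)
e1, e2 = rng.normal(size=T), rng.normal(size=T)
P = (delta - alpha - phi * C + e2 - e1) / (beta - gamma)
Q = alpha + beta * P + e1

Z = np.column_stack([np.ones(T), C])   # instruments: included exogenous vars plus excluded C_t
X = np.column_stack([np.ones(T), P])   # regressors of the normalised demand equation

# Stage 1: fitted values X_hat = Z (Z'Z)^{-1} Z'X; Stage 2: OLS of Q on X_hat
X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
b_2sls = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ Q)
print("true beta:", beta, " 2SLS estimate:", b_2sls[1])   # close to -1 in large samples
```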

(32)

Limited Information Maximum Likelihood

The most obvious method of estimation would be maximum likelihood.

The maximum likelihood for the reduced form is complicated, but obtainable. The log-likelihood function is maximized subject to the rank restriction rank(Π_{12}) ≤ n.

The estimators are called limited information ML estimators because they incorporate the complete specification details for only one equation of the system.

The LIML estimators are again not unbiased (in fact, their mean does not exist), but they are consistent. Many studies have shown that, from a number of different points of view, the LIML estimates are the most accurate in this model (better than 2SLS).

(33)

Full Information Maximum Likelihood

Not a single-equation approach, but a method for the whole system.

Advantages: consistent and asymptotically efficient.

Disadvantages: cannot be done analytically, not feasible for very large systems, difficult to verify whether the solution is a global maximum, very sensitive to the specification.

(34)

Aside: Nonparametric IV, discrete case

Nonparametric additive error model:

Y = h(X) + ε,   E[ε | Z = z_j] = 0, ∀ j,

where we have i.i.d. data (x_i, y_i, z_i) on (X, Y, Z), and

Y is a continuous scalar dependent variable,

X is a single discrete regressor with support {x_k, k = 1, ..., K}, that may be endogenous, with associated probabilities p_k > 0,

Z is a discrete instrumental variable with support {z_j, j = 1, ..., J}, with associated probabilities q_j > 0.


(35)

Hypothesis of interest

Null hypothesis (exogeneity):

H_0 : E[ε | X = x_k] = 0,   k = 1, ..., K.

Under the null, h(·) can be consistently estimated using standard nonparametric techniques.

Under the alternative, the IV solution to endogeneity is only possible under point identification.


(36)

Identification

Since

Y = Σ_{k=1}^K h(x_k) I(X = x_k) + ε,

the conditional expectation of Y given Z = z_j is

E[Y | Z = z_j] = Σ_{k=1}^K Pr[X = x_k | Z = z_j] h(x_k).

⇒ the instrument Z supplies the system of equations π = Π β,

where β_k = h(x_k), π_j = E[Y | Z = z_j], Π_{jk} = Pr[X = x_k | Z = z_j].

h(·) is identified at ALL support points of X iff J ≥ K.
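
A small simulated illustration (all values assumed, not from the slides): estimate π and Π by sample frequencies and solve π = Πβ for h(·) in a just-identified case with J = K = 2.

```python
# Illustrative sketch (not from the slides): recover h(x_k) from pi = Pi*beta with discrete X, Z.
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
h = np.array([1.0, 3.0])            # true h(x_1), h(x_2) (assumed values)

Z = rng.integers(0, 2, size=n)      # instrument with support {0, 1}
U = rng.normal(size=n)              # unobservable with E[U | Z] = 0
X = ((0.2 + 0.6 * Z + 0.5 * U + rng.normal(size=n)) > 0.5).astype(int)   # endogenous X
Y = h[X] + U

K = J = 2
pi_hat = np.array([Y[Z == j].mean() for j in range(J)])              # pi_j = E[Y | Z = z_j]
Pi_hat = np.array([[np.mean(X[Z == j] == k) for k in range(K)]       # Pi_jk = P[X = x_k | Z = z_j]
                   for j in range(J)])

beta_hat = np.linalg.solve(Pi_hat, pi_hat)   # requires J >= K and Pi of full column rank
print("true h:", h, " estimated h:", beta_hat)
```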


(37)

Identification when J < K

h(·) is partially identified when J < K:

Let L(β) = c'β be a linear functional of the elements of β. When rank(Π) = J < K, the following are true:

(1) for any c orthogonal to the null space of Π, L(β) is point-identified; the dimension of this set is J;

(2) for c not orthogonal to the null space of Π, L(β) is completely unconstrained; the dimension of this set is K − J.

That is: when J < K, some linear functionals are point-identified, and some are completely arbitrary (not even set-identified!).

Point-identifiability of L(β) can be tested (for a given choice of c):

G_n = n (c_2' − c_1' Π̂_1^{-1} Π̂_2) V̂_P^{-1} (c_2' − c_1' Π̂_1^{-1} Π̂_2)'  →_d  χ²_{K−J}.


(38)

Linear Model Setup

We define the (0,1) matrix L_X (n × K) with elements (L_X)_{ik} = I(x_i = x_k). Likewise L_Z (n × J). Then H_0 says:

y = L_X β + ε,   E[ε | X = x_k] = 0 ∀ k.

β can be consistently estimated by OLS under exogeneity:

β̂ = (L_X' L_X)^{-1} L_X' y = ( Σ_i y_i I(x_i = x_1) / Σ_i I(x_i = x_1), ..., Σ_i y_i I(x_i = x_K) / Σ_i I(x_i = x_K) )',

i.e. the vector of within-cell sample means of y at each support point of X.

If X is exogenous, then the nonparametric (OLS) estimator β̂ is consistent and

√n (β̂ − β) →_d N(0, σ² D_X^{-1}),

where D_X = diag(p_k).


(39)

Linear Model Setup

or by IV (with L_Z as instruments) under endogeneity, when J ≥ K:

β̂_IV = (L_X' P_{L_Z} L_X)^{-1} L_X' P_{L_Z} y,

where P_{L_Z} = L_Z (L_Z' L_Z)^{-1} L_Z'.

Under the assumptions above, the IV estimator β̂_IV is consistent and

√n (β̂_IV − β) →_d N(0, σ² (P' D_Z^{-1} P)^{-1}),

where P is the matrix of joint probabilities with elements p_{jk} = Pr[Z = z_j, X = x_k], j = 1, ..., J, k = 1, ..., K, and D_Z = diag(q_j).

BUT no consistent estimator exists for K − J linear functionals if X is endogenous and J < K.
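
The two estimators can be written directly with the indicator matrices. The sketch below (illustrative, with assumed simulated values and J = K = 2) contrasts the cell-mean OLS estimator, which is inconsistent when X is endogenous, with β̂_IV:

```python
# Illustrative sketch (not from the slides): cell-mean OLS vs. the indicator-matrix IV estimator
# beta_IV = (L_X' P_LZ L_X)^{-1} L_X' P_LZ y.
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
h = np.array([1.0, 3.0])            # true h(x_1), h(x_2) (assumed values)
Z = rng.integers(0, 2, size=n)
U = rng.normal(size=n)
X = ((0.2 + 0.6 * Z + 0.5 * U + rng.normal(size=n)) > 0.5).astype(int)
y = h[X] + U

LX = np.eye(2)[X]                   # n x K indicator matrix, (L_X)_ik = I(x_i = x_k)
LZ = np.eye(2)[Z]                   # n x J indicator matrix

beta_ols = np.linalg.solve(LX.T @ LX, LX.T @ y)       # within-cell means of y; biased here
PLZ_LX = LZ @ np.linalg.solve(LZ.T @ LZ, LZ.T @ LX)   # P_LZ @ L_X without forming an n x n matrix
beta_iv = np.linalg.solve(LX.T @ PLZ_LX, PLZ_LX.T @ y)
print("cell-mean OLS:", beta_ols, " IV:", beta_iv, " truth:", h)
```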


(40)

Test for exogeneity

Test statistics differ depending on whether J ≥ K or J < K. For J ≥ K, (a modified version of) the Wu-Hausman test is used:

Under H_0 and the assumptions above, T_n →_d χ²_{K−1}.

Under the sequence of local alternatives and the assumptions above, the test statistic

T_n →_d Gamma(α, λ, θ),

with shape parameter α = (K−1)/2, scale parameter θ, and noncentrality parameter λ = δ²/2, where

δ² = ξ' Σ_{11}^{-1} ξ / σ².


(41)

Test for exogeneity

For J < K, the test is based on two sums of squared errors (SSE):

Unrestricted: in the model y = L_X β + ε, i.e. y' M_{L_X} y, and

Restricted: minimising the SSE in this model subject to π̂ = Π̂ β.

Test statistic:

R_n = [ y' M_{L_X} L_Z (L_Z' P_{L_X} L_Z)^{-1} L_Z' M_{L_X} y ] / [ n^{-1} y' M_{L_X} y ].


(42)

Test for exogeneity

Theorem 1

Under H_0 and the assumptions above,

R_n →_d z' Ω^{-1} z = Σ_{j=1}^{J−1} ω_j χ²_j(1),

where z ~ N(0, Σ), with Σ as defined above,

Ω := C_J' (P D_X^{-1} P' − p_Z p_Z') C_J,

and the ω_j are positive eigenvalues satisfying

det[Σ − ω Ω] = 0,

with the χ²_j(1) variables independent copies of a χ²(1) random variable.


(43)

Critical value computation

using consistent estimates ω̂_j, simulate the distribution of Σ_{j=1}^{J−1} ω̂_j χ²_j(1) to get the appropriate 1 − α quantiles,

simulate the quadratic form z' Ω̂^{-1} z, with z ~ N(0, Σ̂), and compute the quantiles,

approximate by the distribution of a χ²(v) + b, choosing (a, b, v) to match the first three cumulants.
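
The first of these options can be sketched as follows (illustrative only; the weights ω̂_j are taken as given, and the example values are hypothetical):

```python
# Illustrative sketch (not from the slides): simulate sum_j omega_hat_j * chi2_j(1)
# and read off its 1 - alpha quantile as the critical value.
import numpy as np

def weighted_chi2_quantile(omega_hat, alpha=0.05, reps=200_000, seed=0):
    """Approximate the (1 - alpha) quantile of sum_j omega_hat[j] * chi2(1)."""
    rng = np.random.default_rng(seed)
    omega_hat = np.asarray(omega_hat, dtype=float)
    draws = rng.chisquare(df=1, size=(reps, omega_hat.size)) @ omega_hat
    return np.quantile(draws, 1.0 - alpha)

print(weighted_chi2_quantile([0.7, 0.25]))   # hypothetical weight estimates
```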

