Panel Data Econometrics

(1)


Katarzyna Bech

18.05.2017


(2)

What is ’Panel Data’?

Traffic fatality rate for the 48 contiguous U.S. states for each of the seven years from 1982 to 1988:

year   Alabama    Arkansas   ...   Wyoming
1982   0.000213   0.00025    ...   0.000394
1983   0.000235   0.000227   ...   0.000335
...    ...        ...        ...   ...
1988   0.000249   0.000271   ...   0.000324

Two-dimensional: observations on different objects at different points in time.

Both time-series and cross sectional data can be treated as special cases of panels.

Interesting: time does not have to be the second dimension!

(3)

A couple of definitions

N - the number of (cross-sectional) objects (individuals, organizations, countries) in the sample

T - the number of time periods (years, quarters, months, days) in the sample, i.e. the number of waves

If N = 1 and T is large: time series.

If T = 1 and N is large: cross section.

Panel data are those with N > 1 and T > 1.

Balanced panel: each of the N objects is observed in exactly the same T time periods.

Short panel: N > T. Long panel: T > N.

Micro panel: N >> T. Macro panel: N ≈ T.
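The definitions above can be checked mechanically on a data set stored in "long" format. The sketch below is only illustrative: the function name `describe_panel` and the labels "short", "long" and "square" are assumptions made for this example, not part of any library.

```python
from collections import Counter

def describe_panel(observations):
    """Summarise a panel given (unit, period) pairs, one pair per observed row."""
    rows = list(observations)
    per_unit = Counter(unit for unit, _ in rows)      # T_i for every unit i
    n_units = len(per_unit)                           # N
    n_periods = len({period for _, period in rows})   # T (distinct waves)
    # Balanced: every unit is observed in all T periods (assumes no duplicate rows).
    balanced = all(count == n_periods for count in per_unit.values())
    if n_units > n_periods:
        shape = "short"
    elif n_periods > n_units:
        shape = "long"
    else:
        shape = "square"
    return {"N": n_units, "T": n_periods, "balanced": balanced, "shape": shape}

# Layout of the traffic fatality example: 48 states observed in 1982-1988.
states = [f"state_{k}" for k in range(48)]
rows = [(s, year) for s in states for year in range(1982, 1989)]
summary = describe_panel(rows)
print(summary)
```

Dropping a single row (one state-year) from `rows` would leave N and T unchanged but make the panel unbalanced.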

(4)

Why ’Panel data’?

Consider an empirical application: what are the effects of alcohol taxes and drunk driving laws on traffic fatalities?

Panel data sets let us control for unobserved variables that differ from one state to the next but do not change over time.

It also allows us to control for (unobserved) variables that vary through time, but do not vary across states.

Another advantage is increased precision in estimation, by pooling several time periods of data for each individual.

Panel data also give "more variability, less collinearity among variables, more degrees of freedom and more efficiency".

Better suited to study the dynamics of change.

(5)

Some well-known examples of panel data sets:

The Panel Study of Income Dynamics (PSID): constructed by the Institute for Social Research (University of Michigan), collected from 1968 (each year), about 5,000 families, socioeconomic and demographic variables

Survey of Income and Program Participation (SIPP): conducted by the Bureau of the Census of the US Department of Commerce, four times a year, individual level, economic conditions

The German Socio-Economic Panel (GSOEP): every year from 1984 to 2014, individual level

National Longitudinal Survey of Youth (NLSY): collected by the US Department of Labor, individual level, labour market activities

Other: LFS, BHPS, CFPS

(6)

Problems solved by panels:

Labour supply: Ben-Porath (1973) observes that at a certain point in time, in a cohort of women, 50% may appear to be working. It is ambiguous whether this implies that, in this cohort, one-half of the women on average will be working in any period, or that the same one-half will be working in every period.

Production function: inability to separate economies of scale from technological change. Cross-sectional data only provide information about the former; time series muddle the two effects with no prospect of separation (e.g. it is common to assume constant returns to scale (CRS) in order to reveal the technical change). Greene (1983) uses a panel of a large number of firms over several years and provides estimates of both technological change and economies of scale.

(7)

Illustrative example: charitable giving

Data: 47 individuals over the period 1979-1988. From Frees (2004), Longitudinal and Panel Data Analysis and Applications in the Social Sciences, Cambridge University Press.

Variables:

Charity - sum of cash contributions

Income - gross income

Price - (1 - marginal income tax rate)

Age - dummy, 1 for individuals over 64

MS - dummy, 1 for married

DEPS - number of dependents

Model

C_it = β₀ + β₁Age_it + β₂Income_it + β₃Price_it + β₄DEPS_it + β₅MS_it + ε_it

Estimate by OLS

Requirements for unbiasedness and consistency are the same as for the standard linear regression in a large cross-section sample (standard Gauss-Markov).

Important: consistency holds for N → ∞ when T is finite ("fixed-T asymptotics"). If T → ∞ as well, treat the problem as a multivariate time series.

Time series characteristics are irrelevant (the series may be nonstationary).

Because we ignore the information on the structure of the sample, OLS is not efficient.

(14)

Option 3: Pooled OLS

Estimates

β̂₀    β̂₁    β̂₂    β̂₃    β̂₄    β̂₅
4.67  1.55  1.04  0.48  0.18  0.008

Endogeneity?

(15)

Pooled OLS

Heterogeneity bias
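Heterogeneity bias can be illustrated with a small simulation. This is a sketch on synthetic data, not the charity data set: the design (N = 200, T = 5, true slope 1.0, an individual effect that also shifts the regressor) is an assumption chosen only to make the bias visible.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, beta = 200, 5, 1.0                       # hypothetical design

alpha = rng.normal(size=N)                     # unobserved individual effects
x = alpha[:, None] + rng.normal(size=(N, T))   # regressor correlated with alpha_i
y = alpha[:, None] + beta * x + rng.normal(size=(N, T))

# Pooled OLS: stack all NT rows and ignore the panel structure.
X = np.column_stack([np.ones(N * T), x.ravel()])
b_pooled, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
print("true slope:", beta, "pooled OLS slope:", round(b_pooled[1], 3))
```

In this design the omitted effect has covariance 1 with x and x has variance 2, so the pooled slope settles near 1.5 rather than 1.0 however many individuals are added.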

(16)

Option 4: Fixed effect model

The basic framework for the discussion is the regression model of the form

y_it = α_i + β′x_it + ε_it.

The individual effect α_i is constant over t and specific to the individual cross-section unit i. The α_i are unknown parameters to be estimated.

It is also possible to allow the slopes to vary across i, but this introduces methodological issues and complexity in the calculations. We can go over it next week if you wish.

How to make this model operational?

(17)

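Option 4: Fixed effect model

For each individual we have:

y_i = l α_i + X_i β + ε_i

where l is a T × 1 vector of ones.

Collecting all individuals we have:

\[
\begin{bmatrix} y_1 \\ \vdots \\ y_N \end{bmatrix}
=
\begin{bmatrix} l & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & l \end{bmatrix}
\begin{bmatrix} \alpha_1 \\ \vdots \\ \alpha_N \end{bmatrix}
+
\begin{bmatrix} X_1 \\ \vdots \\ X_N \end{bmatrix} \beta
+
\begin{bmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_N \end{bmatrix}
\]

or

\[
y = \begin{bmatrix} d_1 & \cdots & d_N & X \end{bmatrix}
\begin{bmatrix} \alpha \\ \beta \end{bmatrix} + \varepsilon
\]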

(18)

Option 4: Fixed effect model

Let D be the NT × N matrix

D = [d_1 ... d_N], where d_i is the dummy variable indicating the i-th unit. Assembling all NT rows together gives:

y = Dα + Xβ + ε

referred to as the Least Squares Dummy Variable (LSDV) model.
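The stacked form y = Dα + Xβ + ε can be built directly with a Kronecker product. The sketch below uses a tiny noiseless data set (N = 3, T = 4, with made-up values for α and β) so that least squares recovers the parameters exactly; all numbers are assumptions for illustration.

```python
import numpy as np

N, T = 3, 4
alpha_true = np.array([0.0, 5.0, 10.0])       # hypothetical individual effects
beta_true = 1.5

# Data stacked unit by unit: the first T rows belong to unit 1, the next T to unit 2, ...
unit = np.repeat(np.arange(N), T)
time = np.tile(np.arange(T), N)
x = 2.0 * unit + time
y = alpha_true[unit] + beta_true * x          # no noise, so LSDV recovers the truth

l = np.ones((T, 1))                           # T x 1 vector of ones
D = np.kron(np.eye(N), l)                     # NT x N block-diagonal dummy matrix
Z = np.column_stack([D, x])                   # [d_1 ... d_N X]

coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
alpha_hat, beta_hat = coef[:N], coef[N]
print(alpha_hat, beta_hat)
```

Because D has one column per unit and no separate intercept is included, the first N coefficients are the individual intercepts themselves.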

(19)

Option 4: Fixed effect model

Why fixed? Intercepts, although different across individuals, do not vary in time (they are time-invariant).

Model:

C_it = β_0i + β₁Age_it + β₂Income_it + β₃Price_it + β₄DEPS_it + β₅MS_it + ε_it.

β_0i controls for unobserved heterogeneity. If this heterogeneity is correlated with the other regressors, pooled OLS is biased (as this heterogeneity is omitted in the pooled model).

Practical issue: if individual characteristics do not vary enough over time, the fixed effect model (FEM) might not work.

(20)

Fixed effect model: differential intercept dummies

Define a dummy variable D1_i, which takes value 1 if i = 1 and 0 otherwise. Similarly D2_i, D3_i, ..., D47_i.

Then our model might be written as:

C_it = α₁ + α₂D2_i + α₃D3_i + ... + α₄₇D47_i + β₁Age_it + β₂Income_it + β₃Price_it + β₄DEPS_it + β₅MS_it + ε_it.

Each individual intercept is then: β₀₁ = α₁ and β_0i = α₁ + α_i for i ≥ 2. We have a classical model with K + (N − 1) variables.

Remember about the dummy variable trap: with an intercept in the model, only N − 1 of the N dummies can be included.

If we estimate by OLS, then we call the estimators Least Squares Dummy Variable (LSDV) estimators.
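The dummy variable trap is a rank problem: the N dummies sum to the constant column. A minimal check, with N = 4 and T = 3 chosen arbitrarily for the illustration:

```python
import numpy as np

N, T = 4, 3
D = np.kron(np.eye(N), np.ones((T, 1)))       # all N unit dummies, NT x N
const = np.ones((N * T, 1))

trap = np.hstack([const, D])                  # intercept + all N dummies
ok = np.hstack([const, D[:, 1:]])             # intercept + N - 1 dummies (unit 1 is the reference)

print(np.linalg.matrix_rank(trap), "of", trap.shape[1], "columns")   # 4 of 5 columns
print(np.linalg.matrix_rank(ok), "of", ok.shape[1], "columns")       # 4 of 4 columns
```

The first design matrix has five columns but rank four, so its OLS coefficients are not uniquely determined; dropping one dummy (or the intercept) restores full column rank.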

(21)

Fixed effect model: differential intercept dummies

Problems

The choice of the reference group is typically arbitrary, so the α coefficients have no interesting interpretation unless you estimate the model without an intercept.

Every additional dummy costs you a degree of freedom.

Remember about the assumptions on the error term: ε_it ~ (0, σ²). These may have to be modified, e.g.

you might assume constant variance, or allow for heteroskedasticity and correct the standard errors;

you might assume no serial correlation, or allow for some AR structure in the error (and correct the standard errors);

you might assume that at any time the error term of one individual is uncorrelated with the errors of the others, or allow for such correlation (treat it as a SURE model).

(22)

Fixed effect model: time effect

Similarly, we can have models in which we allow for a specific effect not for each object, but for each wave (time):

Y_it = β_0t + β₁X_it + ε_it,

or even on both dimensions (two-way fixed effect model):

Y_it = β_0it + β₁X_it + ε_it.

The parameters might be estimated by LSDV in:

Y_it = α₁ + α₂D2_i + ... + α_N DN_i + γ₂B2_t + ... + γ_T BT_t + β₁X_it + ε_it.

We could also introduce differential slope coefficients by interacting the individual dummies with the regressors; in the charity example (46 dummies × 5 regressors) this consumes 230 degrees of freedom. If we additionally interact the time dummies with the five regressors (50 degrees of freedom), we have almost no observations left for meaningful conclusions.

(23)

Fixed effect model: within transformation

We go back to the model:

y = Dα + Xβ + ε.

There is an easier way to estimate its parameters than to go through LSDV.

Use the results for a partitioned regression and write the OLS estimator of β as β̂ = [X′M_d X]⁻¹[X′M_d y], where M_d = I − D(D′D)⁻¹D′.

This amounts to a least squares regression using the transformed data:

X* = M_d X and y* = M_d y.

(24)

Fixed effect model: within transformation

The structure of D is particularly convenient (its columns are orthogonal), so

\[
M_d = \begin{bmatrix} M^0 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & M^0 \end{bmatrix}
\]

where M⁰ = I_T − (1/T) l l′.

Premultiplying any T × 1 vector z_i by M⁰ gives M⁰z_i = z_i − z̄_i l,

where the mean z̄_i is taken over the T observations for unit i.

This implies that the regression of M_d y on M_d X is equivalent to the regression of y_it − ȳ_i on x_it − x̄_i.
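The equivalence can be verified numerically. The sketch below builds M⁰ and M_d explicitly on simulated data (the sizes N = 6, T = 5, K = 2 and the coefficient values are arbitrary assumptions) and compares the within estimate with the slopes from a regression on [D X].

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, K = 6, 5, 2
X = rng.normal(size=(N * T, K))                        # stacked unit by unit
alpha = rng.normal(size=N)
y = np.repeat(alpha, T) + X @ np.array([1.0, -2.0]) + rng.normal(size=N * T)

l = np.ones((T, 1))
M0 = np.eye(T) - (l @ l.T) / T                         # M0 = I_T - (1/T) l l'
Md = np.kron(np.eye(N), M0)                            # block-diagonal M_d

# M_d turns each observation into a deviation from its unit mean.
unit_means = X.reshape(N, T, K).mean(axis=1)
X_dev = X - np.repeat(unit_means, T, axis=0)
assert np.allclose(Md @ X, X_dev)

# Within estimator: OLS on the transformed data, no intercept.
Xs, ys = Md @ X, Md @ y
beta_within = np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)

# LSDV estimator: OLS on [D X]; the last K coefficients are the slopes.
D = np.kron(np.eye(N), l)
coef, *_ = np.linalg.lstsq(np.hstack([D, X]), y, rcond=None)
beta_lsdv = coef[N:]

print(beta_within, beta_lsdv)
```

In practice one would demean by group rather than form the NT × NT matrix M_d; the explicit matrix is used here only to mirror the algebra.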

(25)

Fixed effect model: within transformation

We centre the observations around their means (calculated over time):

\[
C_{it} - \bar{C}_i = \beta_1 (Age_{it} - \overline{Age}_i) + \beta_2 (Income_{it} - \overline{Income}_i) + \beta_3 (Price_{it} - \overline{Price}_i) + \beta_4 (DEPS_{it} - \overline{DEPS}_i) + \beta_5 (MS_{it} - \overline{MS}_i) + u_{it}
\]

"Within", because we consider changes in individual characteristics over time, but not across individuals (each period enters as a deviation from the individual's mean).

The estimates show the impact of changes in an individual's characteristics over time.

The LSDV and within-group (WG) estimators of the slopes are identical, as mathematically both models are exactly the same.

(26)

Fixed effect model: within transformation

We pay a price for the simplicity of this approach:

the model needs to be estimated without the intercept

all regressors that do not vary over time for each individual need to be removed (or they will be eliminated automatically)

losing the intercept is fine, but losing important explanatory variables may lead to endogeneity and omitted variable bias.

For obvious reasons, we cannot use the within method when we want to study, e.g., wage discrimination.

Within-group estimators are consistent, but not efficient: because the variables are expressed as deviations from their means, they vary less than the original data, so the error term is relatively more variable, which leads to higher variance estimates.
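The loss of time-invariant regressors is easy to see numerically. In this sketch the dummy `female` and the regressor `wage_x` are hypothetical variables invented for the illustration, as is the helper `within`.

```python
import numpy as np

N, T = 4, 3
rng = np.random.default_rng(2)
female = np.repeat(np.array([1.0, 0.0, 1.0, 0.0]), T)  # hypothetical time-invariant dummy
wage_x = rng.normal(size=N * T)                         # hypothetical time-varying regressor

def within(z, N, T):
    """Subtract each unit's time mean (data stacked unit by unit)."""
    z = z.reshape(N, T)
    return (z - z.mean(axis=1, keepdims=True)).ravel()

female_w = within(female, N, T)
x_w = within(wage_x, N, T)
X_w = np.column_stack([x_w, female_w])

print(female_w)                              # every entry is zero
print(np.linalg.matrix_rank(X_w))            # 1: the dummy's coefficient is not identified
```

After the transformation the dummy is a column of zeros, so no coefficient on it can be estimated from the within regression.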

(27)

Aside: between transformation

Alternatively (to pooled OLS) we may write the model for the averages (across time) of the different individuals, i.e. we estimate

\[
\bar{C}_i = \beta_0 + \beta_1 \overline{Age}_i + \beta_2 \overline{Income}_i + \beta_3 \overline{Price}_i + \beta_4 \overline{DEPS}_i + \beta_5 \overline{MS}_i + u_i
\]

"Between", because we consider differences in 'average' characteristics between individuals.

Use it if you want to measure e.g. the gender wage gap (GWG) or the impact of any characteristics that are time-invariant.

Problem: we lose observations (only N are left).

Σ_i Σ_t (x_it − x̄)² = Σ_i Σ_t (x_it − x̄_i)² + Σ_i T_i (x̄_i − x̄)²

Total variation = within variation + between variation
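The decomposition of total variation into within and between parts can be confirmed on any balanced panel. A minimal sketch on simulated data (N = 5 and T = 4 are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 5, 4
x = rng.normal(size=(N, T)) + np.arange(N)[:, None]   # rows = units, columns = periods

x_bar_i = x.mean(axis=1)                      # unit means
x_bar = x.mean()                              # overall mean

total = ((x - x_bar) ** 2).sum()
within = ((x - x_bar_i[:, None]) ** 2).sum()
between = (T * (x_bar_i - x_bar) ** 2).sum()  # T_i = T for every unit in a balanced panel

print(total, within + between)
```

The within estimator uses only the first component and the between estimator only the second, which is why each discards part of the information in the sample.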

(29)

Fixed effect model: better than pooled OLS?

H₀: α₁ = α_j for all j ≥ 2 ((N − 1) restrictions)

Unrestricted model: FE; Restricted model: pooled OLS

F = [(RRSS − URSS)/m] / [URSS/(n − (k+1) − m)] ~ F(m, n − (k+1) − m),

where m = N − 1 and n − (k+1) − m = N(T − 1) − k.
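The F statistic can be computed from two residual sums of squares. The sketch below uses simulated data with deliberately strong individual effects (all design values are assumptions) and only computes the statistic and its degrees of freedom; obtaining a p-value would require an F distribution function, which is left out here.

```python
import numpy as np

rng = np.random.default_rng(4)
N, T, k = 20, 5, 1
n = N * T
alpha = rng.normal(scale=3.0, size=N)                  # strong individual effects
x = rng.normal(size=n)
y = np.repeat(alpha, T) + 0.5 * x + rng.normal(size=n)

def rss(Z, y):
    """Residual sum of squares from an OLS regression of y on Z."""
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    return resid @ resid

D = np.kron(np.eye(N), np.ones((T, 1)))
RRSS = rss(np.column_stack([np.ones(n), x]), y)        # restricted: pooled OLS
URSS = rss(np.column_stack([D, x]), y)                 # unrestricted: FE (LSDV)

m = N - 1
df2 = n - (k + 1) - m                                  # equals N(T - 1) - k
F = ((RRSS - URSS) / m) / (URSS / df2)
print(F, m, df2)
```

With effects this large the statistic comes out far above conventional critical values, so the pooled model would be rejected in favour of FE.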

(30)

Fixed effect model: diff in diff (FD - First Difference)

LSDV and WG estimators are not the only ways of dealing with FE models.

If the model is true at time t,

Y_it = β_0i + β₁X_it + ε_it,

it is also true at time t − 1:

Y_i,t−1 = β_0i + β₁X_i,t−1 + ε_i,t−1.

Subtracting one from the other we get:

ΔY_it = β₁ΔX_it + u_it,

where u_it = Δε_it. This transformation eliminates all time-invariant variables (including the individual β_0i) (first "diff"). Additionally, if the model has a linear trend (t as a regressor), the trend is also removed, since differencing turns it into a constant (second "diff").

Unfortunately, for T > 2 the error term u_it is serially correlated (u_it and u_i,t−1 share ε_i,t−1) → OLS provides consistent but inefficient estimates.
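A first-difference estimator takes only a few lines. This sketch simulates a panel with T = 2 (design values are assumptions), where the individual intercepts are correlated with the regressor, and also computes the within estimate on the same data; with two periods the two are known to coincide.

```python
import numpy as np

rng = np.random.default_rng(5)
N, T, beta1 = 300, 2, 2.0
beta0_i = rng.normal(scale=2.0, size=(N, 1))           # individual intercepts
X = beta0_i + rng.normal(size=(N, T))                  # regressor correlated with beta0_i
Y = beta0_i + beta1 * X + rng.normal(size=(N, T))

# First differences along time remove beta0_i.
dX = np.diff(X, axis=1).ravel()
dY = np.diff(Y, axis=1).ravel()
beta_fd = (dX @ dY) / (dX @ dX)                        # OLS without intercept

# Within estimator on the same data.
Xw = (X - X.mean(axis=1, keepdims=True)).ravel()
Yw = (Y - Y.mean(axis=1, keepdims=True)).ravel()
beta_wg = (Xw @ Yw) / (Xw @ Xw)

print(beta_fd, beta_wg)
```

For T = 2 each deviation from the unit mean is plus or minus half the first difference, which is why the two ratios agree exactly.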

(31)

FE or FD?

If N is large and T is small: FE is more efficient when the errors ε_it are serially uncorrelated; FD is better when they are strongly autocorrelated (close to a random walk), i.e. for nonstationary series.

If T is large and N is small:

we prefer FD for processes with strong positive correlation in the error term (AR(1) parameter close to 1), as FD eliminates the nonstationarity problem

FE is more sensitive to non-normality, heteroskedasticity and autocorrelation in the error

but FE is less sensitive to the endogeneity of regressors.

If T = 2, then all three estimators (LSDV, WG and FD) are exactly the same.

(32)

Option 5: Random effect model

In the FE models we assume that the "individual specific" parameters β_0i are constant (time-invariant) for each individual i. This is fine if we believe that differences between units can be viewed as parametric shifts of the regression function.

In the Random Effect Model we assume that β_0i is a random variable with mean β₀ (no index i), which means that the intercept for each individual is

β_0i = β₀ + u_i, where u_i ~ (0, σ²_u).

In our example this means that the 47 individuals were randomly drawn from a large population of individuals with a constant expected value of the intercept. Differences across individuals are therefore expressed by the error component u_i.

(33)

Random effect model: General formulation

Model:

y_it = α + β′x_it + u_i + ε_it

where u_i is constant through time.

Standard assumptions are:

u_i ~ (0, σ²_u), ε_it ~ (0, σ²_ε)

E[u_i ε_it] = 0, E[u_i u_j] = 0 (i ≠ j)

E[ε_it ε_is] = E[ε_it ε_jt] = E[ε_it ε_js] = 0 (i ≠ j, t ≠ s).

(34)

Random effect model: General formulation

Define

w_it = u_i + ε_it

and

w_i = [w_i1, w_i2, ..., w_iT]′.

Given the assumptions listed above:

E[w_it] = 0, E[w_it²] = σ²_u + σ²_ε, E[w_it w_is] = σ²_u (t ≠ s).

(35)

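Random effect model: General formulation

Define

\[
\Omega = E[w_i w_i'] =
\begin{bmatrix}
\sigma_u^2 + \sigma_\varepsilon^2 & \sigma_u^2 & \cdots & \sigma_u^2 \\
\sigma_u^2 & \sigma_u^2 + \sigma_\varepsilon^2 & \cdots & \sigma_u^2 \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_u^2 & \sigma_u^2 & \cdots & \sigma_u^2 + \sigma_\varepsilon^2
\end{bmatrix}
= \sigma_\varepsilon^2 I_T + \sigma_u^2 l l'
\]

Since observations i and j are independent, the disturbance covariance for the full NT observations is

\[
V = \begin{bmatrix} \Omega & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \Omega \end{bmatrix} = I_N \otimes \Omega.
\]

Apply GLS.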
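The GLS step can be sketched as follows. The variance components σ_u and σ_ε are treated as known here, which is an assumption made only for the illustration (in applied work they have to be estimated first); the data are simulated with u_i independent of the regressor.

```python
import numpy as np

rng = np.random.default_rng(6)
N, T = 100, 4
sigma_u, sigma_e = 1.5, 1.0                            # treated as known (assumption)

l = np.ones((T, 1))
Omega = sigma_e**2 * np.eye(T) + sigma_u**2 * (l @ l.T)
V = np.kron(np.eye(N), Omega)                          # V = I_N (Kronecker) Omega

# Simulate y_it = alpha + beta x_it + u_i + eps_it with u_i independent of x_it.
x = rng.normal(size=N * T)
u = np.repeat(rng.normal(scale=sigma_u, size=N), T)
y = 1.0 + 2.0 * x + u + rng.normal(scale=sigma_e, size=N * T)

X = np.column_stack([np.ones(N * T), x])
Vinv_X = np.linalg.solve(V, X)                         # V^{-1} X without forming V^{-1}
beta_gls = np.linalg.solve(X.T @ Vinv_X, Vinv_X.T @ y) # (X'V^{-1}X)^{-1} X'V^{-1}y
print(Omega[0, 0], Omega[0, 1], beta_gls)
```

Forming the full NT × NT matrix V is convenient for a small example but wasteful for real panels, where the block structure is normally exploited instead.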

(36)

Random effect model

Our charity function might be expressed as:

C_it = β₀ + β₁Age_it + β₂Income_it + β₃Price_it + β₄DEPS_it + β₅MS_it + w_it, where w_it = u_i + ε_it.

Composite error: an individual-specific component u_i and an "idiosyncratic term" ε_it, varying across time and individuals.

RE also known as Error Components Model (ECM).

Super important: w_it should not be correlated with the regressors.

(37)

FE vs. RE

The practical choice of the model depends on the assumptions on the correlation between u_i and the set of regressors X. If corr(u_i, X) = 0 the correct model is RE; if corr(u_i, X) ≠ 0 the correct model is FE.

How to decide?

Hausman test: the null says that there is no systematic difference between the FE and RE estimates. If we reject H₀ we conclude that RE is inappropriate, as u_i is probably correlated with one or more regressors → we should choose the FE model instead.

Alternatively, we can still stick to the RE model, but estimate its parameters by IV (panel IV, e.g. the Hausman-Taylor estimator or Arellano-Bond).
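For a single regressor the Hausman statistic reduces to H = (b_FE − b_RE)² / (Var(b_FE) − Var(b_RE)), compared with a chi-square distribution with one degree of freedom. The sketch below is one possible implementation under stated assumptions: data are simulated so that u_i is correlated with x, and the variance components are estimated from the within and between regressions, which is one common choice among several.

```python
import numpy as np

rng = np.random.default_rng(7)
N, T = 200, 5
u = rng.normal(size=(N, 1))
x = u + rng.normal(size=(N, T))                        # u_i correlated with x_it
y = 1.0 + 1.0 * x + u + rng.normal(size=(N, T))

# FE (within) estimate and its variance.
xw = x - x.mean(axis=1, keepdims=True)
yw = y - y.mean(axis=1, keepdims=True)
b_fe = (xw * yw).sum() / (xw ** 2).sum()
s2_e = ((yw - b_fe * xw) ** 2).sum() / (N * (T - 1) - 1)
v_fe = s2_e / (xw ** 2).sum()

# Between regression on unit means gives an estimate of sigma_u^2.
xb, yb = x.mean(axis=1), y.mean(axis=1)
Zb = np.column_stack([np.ones(N), xb])
cb, *_ = np.linalg.lstsq(Zb, yb, rcond=None)
s2_b = ((yb - Zb @ cb) ** 2).sum() / (N - 2)
s2_u = max(s2_b - s2_e / T, 0.0)

# RE (GLS) via quasi-demeaning with theta = 1 - sqrt(s2_e / (s2_e + T * s2_u)).
theta = 1.0 - np.sqrt(s2_e / (s2_e + T * s2_u))
xs = (x - theta * xb[:, None]).ravel()
ys = (y - theta * yb[:, None]).ravel()
Zs = np.column_stack([np.full(N * T, 1.0 - theta), xs])
cs, *_ = np.linalg.lstsq(Zs, ys, rcond=None)
b_re = cs[1]
v_re = s2_e * np.linalg.inv(Zs.T @ Zs)[1, 1]

H = (b_fe - b_re) ** 2 / (v_fe - v_re)                 # chi-square(1) under H0
print(b_fe, b_re, H)
```

The 5% critical value of the chi-square distribution with one degree of freedom is about 3.84; in this design, where the RE assumption is violated by construction, H should exceed it by a wide margin.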

(38)

FE vs. RE practicalities

If T is large and N is small (and the standard assumptions are satisfied), both models should deliver similar results (the choice then depends on computational complexity).

If N is large and T is small and all assumptions of the RE model are satisfied, RE is more efficient than FE.

RE can estimate the impact of time-invariant variables that 'disappear' in FE.

If the true model is pooled: all estimators (FE, RE, pooled OLS) are consistent.

If the true model is FE: pooled OLS and RE are inconsistent.

If the true model is RE: FE is still consistent (FE is always consistent!), although less efficient than RE.

(39)

Panel data Econometrics: further topics

Hypothesis verification

Heteroskedasticity and autocorrelation

Unbalanced panels

Dynamic panel data models

Multivariate panels

Limited dependent variable panels

Nonstationarity in panel data