Panel Data Econometrics
Katarzyna Bech
18.05.2017
What is 'Panel Data'?
Traffic fatality rate for the 48 contiguous U.S. states for each of the seven years from 1982 to 1988:

year   Alabama    Arkansas   ...   Wyoming
1982   0.000213   0.00025    ...   0.000394
1983   0.000235   0.000227   ...   0.000335
...    ...        ...        ...   ...
1988   0.000249   0.000271   ...   0.000324
Two-dimensional: observations on different objects at different points in time.
Both time-series and cross-sectional data can be treated as special cases of panels.
Interesting: time does not have to be the second dimension!
A couple of definitions
N - the number of (cross-sectional) objects (individuals, organizations, countries) in the sample
T - the number of time periods (years, quarters, months, days) in the sample, i.e. the number of waves
If N = 1 and T is large: time series.
If T = 1 and N is large: cross section.
Panel data are those with N > 1 and T > 1.
Balanced panel: every individual is observed for exactly the same T time periods.
Short panel: N > T. Long panel: T > N.
Micro panel: N ≫ T. Macro panel: N ≈ T.
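These definitions can be checked mechanically. A minimal sketch in plain Python, assuming the panel is given as a list of (unit, period) pairs with one entry per observation; `describe_panel` is a hypothetical helper, not part of any library, and the label for the N = T tie case is our own choice:

```python
from collections import defaultdict


def describe_panel(obs):
    """Summarise a panel from a list of (unit, period) pairs, one per observation.

    Returns N, T (number of distinct periods), whether the panel is balanced
    (every unit observed in every period exactly once) and the short/long label.
    """
    periods_by_unit = defaultdict(list)
    for unit, period in obs:
        periods_by_unit[unit].append(period)
    all_periods = {period for _, period in obs}
    n_units, n_periods = len(periods_by_unit), len(all_periods)
    balanced = all(
        len(periods) == n_periods and set(periods) == all_periods
        for periods in periods_by_unit.values()
    )
    if n_units > n_periods:
        shape = "short"
    elif n_periods > n_units:
        shape = "long"
    else:
        shape = "N = T"  # tie case: our own label, not from the slides
    return {"N": n_units, "T": n_periods, "balanced": balanced, "shape": shape}


# The charity data used later: 47 individuals observed every year 1979-1988.
charity_obs = [(i, year) for i in range(1, 48) for year in range(1979, 1989)]
summary = describe_panel(charity_obs)
# summary == {"N": 47, "T": 10, "balanced": True, "shape": "short"}
```

Dropping a single observation (for example `charity_obs[1:]`) makes the panel unbalanced, because one individual then has only nine of the ten periods.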
Why 'Panel data'?
Consider an empirical application: what are the effects of alcohol taxes and drunk driving laws on traffic fatalities?
A panel data set lets us control for unobserved variables that differ from one state to the next but do not change over time.
It also allows us to control for (unobserved) variables that vary over time but do not vary across states.
Another advantage is increased precision in estimation, from pooling several time periods of data for each individual.
Panel data also give "more variability, less collinearity among variables, more degrees of freedom and more efficiency".
Panels are better suited to studying the dynamics of change.
Some well known examples of panel data sets:
The Panel Study of Income Dynamics (PSID): constructed by the Institute for Social Research (University of Michigan), collected annually from 1968 (biennially since 1997), initially about 4,800 families, socioeconomic and demographic variables
Survey of Income and Program Participation (SIPP): conducted by the Bureau of the Census of the US Department of Commerce, interviews every four months, individual level, economic conditions
The German Socio-Economic Panel (GESOEP): every year from 1984 to 2014, individual level
National Longitudinal Survey of Youth (NLSY): collected by the US Department of Labor, individual level, labour market activities
Others: LFS, BHPS, CFPS
Problems that panels help to solve:
Labour supply: Ben-Porath (1973) observes that at a certain point in time, in a cohort of women, 50% may appear to be working. A cross section cannot tell whether this means that, in this cohort, one-half of the women are working on average in any period (with turnover between periods), or that the same one-half are working in every period.
Production function: it is difficult to separate economies of scale from technological change. Cross-sectional data only provide information about the former, while time series muddle the two effects with no prospect of separating them; it is common to assume constant returns to scale (CRS) in order to reveal technical change. Greene (1983) uses a panel of a large number of firms over several years and provides estimates of both technological change and economies of scale.
Illustrative example: charitable giving
Data: 47 individuals over the period 1979-1988. From Frees (2004), Longitudinal and Panel Data Analysis and Applications in the Social Sciences, Cambridge University Press.
Variables:
Charity - sum of cash contributions
Income - gross income
Price - (1 - marginal income tax rate)
Age - dummy, 1 for individuals over 64
MS - dummy, 1 for married
DEPS - number of dependents
From the Panel of Individual Tax Returns.
Goal: study the effect (if any!) of the marginal tax rate on charitable giving. Prior expectations?
How can we estimate the parameters of the charity function? Five options:
Individual time series of charity functions.
Cross-sectional charity functions.
Pooled OLS (constant coefficient model).
Fixed effects model.
Random effects model.
Option 1: Time series
Model
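$$C_t = \beta_0 + \beta_1 Age_t + \beta_2 Income_t + \beta_3 Price_t + \beta_4 DEPS_t + \beta_5 MS_t + \varepsilon_t,$$
estimated separately for each individual i = 1, ..., 47.
Estimates
i    β̂0     β̂1    β̂2    β̂3    β̂4    β̂5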
1 35.14 2.70 2.32
2 7.95 0.18 0.05 1.16 0.57
...
47 13.75 1.75 0.40 0.13
Option 2: Cross section
Model
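$$C_i = \beta_0 + \beta_1 Age_i + \beta_2 Income_i + \beta_3 Price_i + \beta_4 DEPS_i + \beta_5 MS_i + \varepsilon_i,$$
estimated separately for each year t = 1, ..., 10.
Estimates
t    β̂0     β̂1    β̂2    β̂3    β̂4    β̂5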
1 7.93 1.17 1.33 0.02 0.11 0.12
2 13.16 1.15 1.08 7.07 0.24 1.05
...
10 9.46 1.77 7.39 0.35 1.92
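Options 1 and 2 both amount to running separate OLS regressions on slices of the stacked panel: one per individual (over time) or one per period (across individuals). A minimal NumPy sketch on simulated data (not the charity data set); the helper names `ols` and `ols_by_group` are hypothetical:

```python
import numpy as np


def ols(y, X):
    """OLS coefficients; X is assumed to already contain a constant column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def ols_by_group(y, X, groups):
    """Fit a separate OLS regression for every distinct value in `groups`.

    groups = individual ids -> Option 1 (one time-series regression per i)
    groups = period labels  -> Option 2 (one cross-section regression per t)
    """
    return {g: ols(y[groups == g], X[groups == g]) for g in np.unique(groups)}


# Simulated balanced panel (NOT the charity data): N = 5 individuals, T = 10 years.
rng = np.random.default_rng(0)
N, T = 5, 10
ids = np.repeat(np.arange(N), T)                 # i for each stacked row
years = np.tile(np.arange(1979, 1979 + T), N)    # t for each stacked row
X = np.column_stack([np.ones(N * T), rng.normal(size=(N * T, 2))])
beta_true = np.array([1.0, 0.5, -0.3])
y = X @ beta_true + rng.normal(scale=0.1, size=N * T)

by_individual = ols_by_group(y, X, ids)   # N separate coefficient vectors
by_year = ols_by_group(y, X, years)       # T separate coefficient vectors
```

In the charity application, each individual regression in Option 1 would use T = 10 observations to estimate six coefficients, leaving only 4 residual degrees of freedom.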
Option 3: Pooled OLS
Model
$$C_{it} = \beta_0 + \beta_1 Age_{it} + \beta_2 Income_{it} + \beta_3 Price_{it} + \beta_4 DEPS_{it} + \beta_5 MS_{it} + \varepsilon_{it}$$
Estimate by OLS
The requirements for unbiasedness and consistency are the same as for the standard linear regression on a large cross-section (standard Gauss-Markov assumptions).
Important: consistency is for N → ∞ with T finite ("fixed-T asymptotics"). If T → ∞ as well, treat the problem as a multivariate time series.
Under fixed-T asymptotics the time-series characteristics of the data are irrelevant (the series may even be nonstationary).
Because pooled OLS ignores the information on the panel structure of the sample, it is not efficient.
Option 3: Pooled OLS
Estimates
β̂0    β̂1    β̂2    β̂3    β̂4    β̂5
4.67 1.55 1.04 0.48 0.18 0.008
Endogeneity?
[Figure: pooled OLS regression line vs. individual-specific relationships, illustrating heterogeneity bias]
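The heterogeneity bias can be illustrated with a small simulation. This is a sketch under assumed data-generating values (not the charity data): y_it = α_i + β x_it + ε_it with β = 1, where x_it = a_i + e_it and the individual effect is α_i = ρ a_i + noise. When ρ ≠ 0 the omitted α_i is correlated with x, and the pooled OLS slope converges to β + ρ Var(a_i)/Var(x_it) = 1 + ρ/2 instead of β. `pooled_ols` and `simulate` are hypothetical helpers:

```python
import numpy as np


def pooled_ols(y, x):
    """Pooled OLS of y on a constant and x, ignoring the panel structure."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta  # [intercept, slope]


def simulate(rho, N=2000, T=5, beta=1.0, seed=0):
    """Balanced panel stacked by individual: y_it = alpha_i + beta * x_it + eps_it.

    x_it = a_i + e_it, and alpha_i = rho * a_i + noise, so rho controls how
    strongly the unobserved individual effect is correlated with x.
    """
    rng = np.random.default_rng(seed)
    a = rng.normal(size=N)
    alpha = rho * a + rng.normal(size=N)            # individual effect
    x = np.repeat(a, T) + rng.normal(size=N * T)    # regressor
    y = np.repeat(alpha, T) + beta * x + rng.normal(size=N * T)
    return y, x


slope_uncorrelated = pooled_ols(*simulate(rho=0.0))[1]  # close to the true beta = 1
slope_correlated = pooled_ols(*simulate(rho=2.0))[1]    # close to 1 + 2/2 = 2: biased
```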
Option 4: Fixed effect model
The basic framework for the discussion is the regression model of the form
$$y_{it} = \alpha_i + \beta' x_{it} + \varepsilon_{it}.$$
The individual effect α_i is constant over t and specific to the individual cross-section unit i. The α_i are unknown parameters to be estimated.
It is also possible to allow the slopes to vary across i, but this introduces methodological issues and computational complexity. We can go over it next week if you wish.
How can we make this model operational?
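Option 4: Fixed effect model
For each individual we have:
$$y_i = l\,\alpha_i + X_i \beta + \varepsilon_i,$$
where l is a T × 1 vector of ones.
Collecting all individuals we have:
$$\begin{bmatrix} y_1 \\ \vdots \\ y_N \end{bmatrix} = \begin{bmatrix} l & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & l \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \vdots \\ \alpha_N \end{bmatrix} + \begin{bmatrix} X_1 \\ \vdots \\ X_N \end{bmatrix} \beta + \begin{bmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_N \end{bmatrix}$$
or
$$y = [d_1 \; \cdots \; d_N \;\; X] \begin{bmatrix} \alpha \\ \beta \end{bmatrix} + \varepsilon$$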
Option 4: Fixed effect model
Let D be the NT × N matrix D = [d_1 ... d_N]. Assembling all NT rows together gives:
$$y = D\alpha + X\beta + \varepsilon,$$
referred to as the Least Squares Dummy Variable (LSDV) model.
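The LSDV model can be sketched directly in NumPy for a balanced panel stacked by individual, where D = I_N ⊗ l. This is an illustrative sketch on simulated, noise-free data, so the estimates should reproduce the true parameters exactly; `lsdv` is a hypothetical helper:

```python
import numpy as np


def lsdv(y, X, N, T):
    """Least Squares Dummy Variable estimator for a balanced panel.

    Rows are stacked by individual (all T periods of unit 1, then unit 2, ...),
    so D = I_N kron l, with l a T x 1 vector of ones. X must NOT contain a
    constant column: the N dummies replace the common intercept.
    Returns (alpha, beta).
    """
    D = np.kron(np.eye(N), np.ones((T, 1)))   # NT x N dummy matrix
    Z = np.hstack([D, X])                     # regressors [d_1 ... d_N X]
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef[:N], coef[N:]


rng = np.random.default_rng(1)
N, T = 4, 6
alpha_true = np.array([1.0, -2.0, 0.5, 3.0])
beta_true = np.array([0.7, -1.2])
X = rng.normal(size=(N * T, 2))
y = np.repeat(alpha_true, T) + X @ beta_true   # no noise, for illustration
alpha_hat, beta_hat = lsdv(y, X, N, T)
```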
Option 4: Fixed effect model
Why "fixed"? The intercepts, although different across individuals, do not vary over time (they are time-invariant).
Model:
$$C_{it} = \beta_{0i} + \beta_1 Age_{it} + \beta_2 Income_{it} + \beta_3 Price_{it} + \beta_4 DEPS_{it} + \beta_5 MS_{it} + \varepsilon_{it}.$$
β0i controls for unobserved heterogeneity. If this heterogeneity is correlated with the other regressors, pooled OLS is biased (because the heterogeneity is omitted from the pooled model).
Practical issue: if individual characteristics do not vary enough over time, FEM might not work.
Fixed effect model: differential intercept dummies
Define a dummy variable D1i, which takes the value 1 if i = 1 and 0 otherwise. Similarly D2i, D3i, ..., D47i.
Then our model can be written as:
$$C_{it} = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \dots + \alpha_{47} D_{47i} + \beta_1 Age_{it} + \beta_2 Income_{it} + \beta_3 Price_{it} + \beta_4 DEPS_{it} + \beta_5 MS_{it} + \varepsilon_{it}.$$
Each individual intercept is then β01 = α1 and β0i = α1 + αi for i ≥ 2. We have a classical regression model with K + (N - 1) variables.
Remember the dummy variable trap!
If we estimate by OLS, we call these Least Squares Dummy Variable estimators.
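The dummy variable trap and the mapping β01 = α1, β0i = α1 + αi can be verified numerically. A sketch on simulated, noise-free data with N = 4 and individual 1 as the (arbitrary) reference group; all variable names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 4, 5
b0 = np.array([2.0, 3.5, 1.0, -0.5])    # individual intercepts beta_0i
x = rng.normal(size=N * T)
y = np.repeat(b0, T) + 0.8 * x          # noise-free, for illustration

ids = np.repeat(np.arange(N), T)
dummies = (ids[:, None] == np.arange(N)).astype(float)   # D_1i ... D_Ni
const = np.ones((N * T, 1))

# Dummy variable trap: constant + all N dummies are perfectly collinear,
# so this N + 2 column matrix has rank N + 1 only.
trap = np.hstack([const, dummies, x[:, None]])
rank_trap = np.linalg.matrix_rank(trap)

# Drop D_1 (reference individual 1): constant + D_2..D_N + x has full rank.
Z = np.hstack([const, dummies[:, 1:], x[:, None]])
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
a1, a_rest, slope = coef[0], coef[1:N], coef[N]
intercepts = np.concatenate([[a1], a1 + a_rest])   # beta_01 = a1, beta_0i = a1 + a_i
```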
Fixed effect model: differential intercept dummies
Problems
The choice of the reference group is typically arbitrary, so the α's have no interesting interpretation unless you estimate the model without an intercept.
Every additional dummy costs you a degree of freedom.
Remember the assumptions on the error term: εit ~ (0, σ²). These may have to be modified. For example, you might assume constant variance, or allow for heteroskedasticity and correct the standard errors; you might assume no serial correlation, or allow for some AR structure in the error (and correct the standard errors); you might assume that at any time the error term of one individual is uncorrelated with the errors of the others, or allow for such correlation (treating it as a SURE model).
Fixed effect model: time effect
Similarly, we can have models in which we allow for an individual effect not for objects but for waves (time):
$$Y_{it} = \beta_{0t} + \beta_1 X_{it} + \varepsilon_{it},$$
or even along both dimensions (two-way fixed effect model):
$$Y_{it} = \beta_{0it} + \beta_1 X_{it} + \varepsilon_{it}.$$
The parameters can be estimated by LSDV in:
$$Y_{it} = \alpha_1 + \alpha_2 D_{2i} + \dots + \alpha_N D_{Ni} + \gamma_2 B_{2t} + \dots + \gamma_T B_{Tt} + \beta_1 X_{it} + \varepsilon_{it},$$
where B_{2t}, ..., B_{Tt} are time dummies.
If we also wish to introduce differential slope coefficients by interacting the regressors with the individual intercept dummies, this consumes 230 degrees of freedom in the charity model (46 dummies × 5 regressors). If we additionally interact the time dummies with the five regressors (50 degrees of freedom), we have almost no observations left for meaningful conclusions.
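For a balanced panel, the two-way LSDV slope can also be obtained by double demeaning, y_it - ȳ_i - ȳ_t + ȳ, a two-way analogue of the within transformation. A sketch on simulated data (the double-demeaning shortcut assumes a balanced panel stacked by individual); `two_way_demean` is a hypothetical helper:

```python
import numpy as np


def two_way_demean(v, N, T):
    """v_it - mean over t (per i) - mean over i (per t) + grand mean."""
    M = v.reshape(N, T)
    out = M - M.mean(axis=1, keepdims=True) - M.mean(axis=0, keepdims=True) + M.mean()
    return out.reshape(-1)


rng = np.random.default_rng(3)
N, T = 8, 6
ind = np.repeat(rng.normal(size=N), T)   # individual effects
tim = np.tile(rng.normal(size=T), N)     # time effects
x = rng.normal(size=N * T) + ind         # regressor correlated with the individual effect
y = 1.5 * x + ind + tim + rng.normal(scale=0.1, size=N * T)

# Two-way within estimator: OLS on doubly demeaned data (no intercept).
xd, yd = two_way_demean(x, N, T), two_way_demean(y, N, T)
beta_within = (xd @ yd) / (xd @ xd)

# LSDV with N individual dummies and T - 1 time dummies (one dropped to avoid the trap).
D = np.kron(np.eye(N), np.ones((T, 1)))
B = np.kron(np.ones((N, 1)), np.eye(T))[:, 1:]
coef, *_ = np.linalg.lstsq(np.column_stack([D, B, x]), y, rcond=None)
beta_lsdv = coef[-1]
```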
Fixed effect model: within transformation
We go back to the model:
$$y = D\alpha + X\beta + \varepsilon.$$
There is an easier way to estimate its parameters than going through LSDV.
Use the results for partitioned regression and write the OLS estimator of β as
$$\hat\beta = [X' M_d X]^{-1} [X' M_d y], \qquad M_d = I - D(D'D)^{-1}D'.$$
This amounts to a least squares regression using the transformed data:
$$X_* = M_d X \quad \text{and} \quad y_* = M_d y.$$
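The partitioned-regression result can be checked numerically: with M_d = I - D(D'D)^{-1}D', the formula [X'M_d X]^{-1}[X'M_d y] gives the same slopes as the full LSDV regression on [D X]. A sketch on simulated data with illustrative variable names:

```python
import numpy as np

rng = np.random.default_rng(4)
N, T, K = 5, 4, 2
D = np.kron(np.eye(N), np.ones((T, 1)))   # NT x N dummy matrix, stacked by individual
X = rng.normal(size=(N * T, K))
y = D @ rng.normal(size=N) + X @ np.array([0.4, -0.9]) + rng.normal(scale=0.1, size=N * T)

# Residual-maker for D: M_d = I - D (D'D)^{-1} D'
Md = np.eye(N * T) - D @ np.linalg.solve(D.T @ D, D.T)

# Partitioned-regression (Frisch-Waugh-Lovell) formula for beta
beta_fwl = np.linalg.solve(X.T @ Md @ X, X.T @ Md @ y)

# Same slopes from the full LSDV regression on [D X]
coef, *_ = np.linalg.lstsq(np.hstack([D, X]), y, rcond=None)
beta_lsdv = coef[N:]
```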
Fixed effect model: within transformation
The structure of D is particularly convenient (its columns are orthogonal), so
$$M_d = \begin{bmatrix} M^0 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & M^0 \end{bmatrix},$$
where $M^0 = I_T - \frac{1}{T} l l'$.
Premultiplying any T × 1 vector z_i by M^0 gives $M^0 z_i = z_i - \bar z_i l$, where the mean $\bar z_i$ is taken over the T observations for unit i.
This implies that the regression of M_d y on M_d X is equivalent to the regression of $y_{it} - \bar y_i$ on $x_{it} - \bar x_i$.
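The block structure of M_d and its demeaning interpretation can be confirmed on a small example (N = 3, T = 4, arbitrary data; variable names are illustrative):

```python
import numpy as np

N, T = 3, 4
l = np.ones((T, 1))
M0 = np.eye(T) - l @ l.T / T             # M0 = I_T - (1/T) l l'
Md = np.kron(np.eye(N), M0)              # block-diagonal, one M0 per individual

D = np.kron(np.eye(N), l)
Md_direct = np.eye(N * T) - D @ np.linalg.inv(D.T @ D) @ D.T

z = np.arange(N * T, dtype=float) ** 2   # any stacked variable z_it
Z = z.reshape(N, T)
demeaned = (Z - Z.mean(axis=1, keepdims=True)).ravel()   # z_it - zbar_i
```

Both checks hold: `Md` equals `Md_direct`, and `Md @ z` equals `demeaned`.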
Fixed effect model: within transformation
We centre the observations around their individual means (calculated over time):
$$C_{it} - \bar C_i = \beta_1(Age_{it} - \overline{Age}_i) + \beta_2(Income_{it} - \overline{Income}_i) + \beta_3(Price_{it} - \overline{Price}_i) + \beta_4(DEPS_{it} - \overline{DEPS}_i) + \beta_5(MS_{it} - \overline{MS}_i) + u_{it}$$
"Within", because we consider changes in individual characteristics over time but not across individuals (in each period, the deviation from the individual's mean).
The estimators show the impact of changes in an individual's characteristics over time.
The LSDV and within-group (WG) estimators are identical, as mathematically both models are exactly the same.
Fixed effect model: within transformation
We pay a price for the simplicity of this approach:
the model needs to be estimated without the intercept;
all regressors that do not vary over time for a given individual need to be removed (otherwise they are eliminated automatically);
losing the intercept is fine, but losing important explanatory variables may lead to endogeneity and omitted variable bias.
For obvious reasons, we cannot use the within method when we want to study, e.g., wage discrimination.
Within-group estimators are consistent but not efficient: because the variables are expressed as deviations from their means, the variation in the transformed data is relatively smaller than in the original data, so the error variance is relatively large compared with the regressor variation, which leads to higher variance estimates.
Aside: between transformation
Alternatively (as a variant of pooled OLS), we may write the model for the averages (over time) of the different individuals, i.e. we estimate
$$\bar C_i = \beta_0 + \beta_1 \overline{Age}_i + \beta_2 \overline{Income}_i + \beta_3 \overline{Price}_i + \beta_4 \overline{DEPS}_i + \beta_5 \overline{MS}_i + \bar u_i$$
"Between", because we consider differences in 'average' characteristics between individuals.
Use it if you want to measure, e.g., the gender wage gap (gwg) or the impact of any time-invariant characteristics.
Problem: we lose observations (only N are left).
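A minimal sketch of the between estimator on simulated data, including a time-invariant dummy (a hypothetical `female` indicator, in the spirit of the gender wage gap example) whose coefficient the within estimator could not identify. The individual effect is generated independently of the regressors, which is an assumption of this simulation; `between_estimator` is a hypothetical helper:

```python
import numpy as np


def between_estimator(y, X, ids):
    """OLS of individual time-averages ybar_i on a constant and Xbar_i (only N observations)."""
    units = np.unique(ids)
    ybar = np.array([y[ids == i].mean() for i in units])
    Xbar = np.array([X[ids == i].mean(axis=0) for i in units])
    Z = np.column_stack([np.ones(len(units)), Xbar])
    coef, *_ = np.linalg.lstsq(Z, ybar, rcond=None)
    return coef  # [beta_0, beta_1, ..., beta_K]


rng = np.random.default_rng(5)
N, T = 1000, 5
ids = np.repeat(np.arange(N), T)
female = np.repeat(rng.integers(0, 2, size=N), T).astype(float)   # time-invariant
exper = rng.normal(size=N * T)                                    # time-varying
X = np.column_stack([female, exper])
u = np.repeat(rng.normal(scale=0.5, size=N), T)   # individual effect, independent of X
y = 1.0 - 0.3 * female + 0.5 * exper + u + rng.normal(scale=0.5, size=N * T)
b = between_estimator(y, X, ids)   # roughly [1.0, -0.3, 0.5]
```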
Useful variance analysis
The variance (total variation) arises because each observation x_it differs from the overall average calculated over both dimensions, T and N.
We can decompose it into a part coming from changes over time (within) and a part coming from differences between individuals (between):
$$\sum_i \sum_t (x_{it} - \bar x)^2 = \sum_i \sum_t (x_{it} - \bar x_i)^2 + \sum_i T_i (\bar x_i - \bar x)^2$$
Total variation = within variation + between variation
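The decomposition holds exactly, including for unbalanced panels (hence the weights T_i). A quick numerical check on simulated data with illustrative names:

```python
import numpy as np

rng = np.random.default_rng(6)
T_i = [3, 5, 4, 6]                           # unbalanced: T_i periods for unit i
ids = np.repeat(np.arange(len(T_i)), T_i)
x = rng.normal(size=ids.size) + ids          # unit-specific level shifts

xbar = x.mean()                              # grand mean over all i, t
xbar_i = np.array([x[ids == i].mean() for i in range(len(T_i))])

total = np.sum((x - xbar) ** 2)
within = np.sum((x - xbar_i[ids]) ** 2)
between = np.sum(np.array(T_i) * (xbar_i - xbar) ** 2)
```

Here `total` equals `within + between` up to floating-point rounding.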
Fixed effect model: better than pooled OLS?
$H_0: \alpha_1 = \alpha_j$ for all $j \ge 2$ (N - 1 restrictions)
Unrestricted model: FE; restricted model: pooled OLS
$$F = \frac{(RRSS - URSS)/m}{URSS/[n - (k+1) - m]} \sim F_{m,\; n-(k+1)-m},$$
where m = N - 1, n = NT and n - (k+1) - m = N(T - 1) - k.
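A sketch of the F test as a function, computing URSS from the LSDV regression and RRSS from pooled OLS on a balanced panel stacked by individual. The data are simulated (47 × 10 with five slope regressors, mimicking only the dimensions of the charity panel); `poolability_F` is a hypothetical helper. A p-value could then be obtained from the F distribution, e.g. with `scipy.stats.f.sf(F, m, df2)`, which this sketch does not import:

```python
import numpy as np


def rss(y, Z):
    """Residual sum of squares from OLS of y on Z."""
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    return resid @ resid


def poolability_F(y, X, N, T):
    """F test of H0: alpha_1 = ... = alpha_N (pooled OLS vs. fixed effects).

    X excludes the constant; k is the number of slope regressors.
    """
    n, k = N * T, X.shape[1]
    m = N - 1                                          # number of restrictions
    D = np.kron(np.eye(N), np.ones((T, 1)))
    urss = rss(y, np.hstack([D, X]))                   # unrestricted: FE (LSDV)
    rrss = rss(y, np.column_stack([np.ones(n), X]))    # restricted: pooled OLS
    df2 = n - (k + 1) - m                              # equals N(T - 1) - k
    F = ((rrss - urss) / m) / (urss / df2)
    return F, m, df2


rng = np.random.default_rng(7)
N, T = 47, 10
X = rng.normal(size=(N * T, 5))
y = np.repeat(rng.normal(scale=2.0, size=N), T) + X @ np.full(5, 0.5) + rng.normal(size=N * T)
F, df1, df2 = poolability_F(y, X, N, T)   # df1 = 46, df2 = 47 * 9 - 5 = 418
```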
Fixed effect model: first differences (FD)
LSDV and WG estimators are not the only ways of dealing with FE models.
If the model is true at time t,
$$Y_{it} = \beta_{0i} + \beta_1 X_{it} + \varepsilon_{it},$$
it is also true at time t - 1:
$$Y_{i,t-1} = \beta_{0i} + \beta_1 X_{i,t-1} + \varepsilon_{i,t-1}.$$
Subtracting one from the other we get:
$$\Delta Y_{it} = \beta_1 \Delta X_{it} + u_{it},$$
where $u_{it} = \Delta \varepsilon_{it}$. This transformation eliminates all time-invariant variables, including the individual intercept β0i (first "diff"). Additionally, if the model has a linear trend (t as a regressor), the first difference turns it into a constant, which a second difference eliminates (second "diff").
Unfortunately, for T > 2 the transformed error term is serially correlated ($u_{it}$ and $u_{i,t-1}$ share $\varepsilon_{i,t-1}$) → OLS provides consistent but inefficient estimates.
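A sketch of the FD estimator for a single regressor, alongside the within-group estimator, on simulated data where x_it is correlated with β0i. With T = 2 the two estimates coincide exactly; with T = 6 both are consistent but numerically different. `first_difference` and `within_group` are hypothetical helpers:

```python
import numpy as np


def first_difference(y, x, N, T):
    """FD estimator of beta1 in y_it = beta0_i + beta1 * x_it + eps_it.

    Balanced panel stacked by individual. Differencing within each individual
    removes beta0_i; then OLS without intercept on the N*(T-1) differences.
    """
    dy = np.diff(y.reshape(N, T), axis=1).ravel()
    dx = np.diff(x.reshape(N, T), axis=1).ravel()
    return (dx @ dy) / (dx @ dx)


def within_group(y, x, N, T):
    """WG estimator of beta1, for comparison: OLS on deviations from individual means."""
    yd = (y.reshape(N, T) - y.reshape(N, T).mean(axis=1, keepdims=True)).ravel()
    xd = (x.reshape(N, T) - x.reshape(N, T).mean(axis=1, keepdims=True)).ravel()
    return (xd @ yd) / (xd @ xd)


rng = np.random.default_rng(8)
N, beta1 = 500, 2.0
results = {}
for T in (2, 6):
    b0 = np.repeat(rng.normal(size=N), T)   # beta0_i, repeated over t
    x = rng.normal(size=N * T) + b0         # x correlated with beta0_i
    y = b0 + beta1 * x + rng.normal(size=N * T)
    results[T] = (first_difference(y, x, N, T), within_group(y, x, N, T))
```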
FE or FD?
If N is large and T small: FE is more efficient when the idiosyncratic errors are serially uncorrelated (the differenced errors are then autocorrelated); FD is better for nonstationary series.
If T is large and N small:
we prefer FD to model processes with strong positive correlation in the error term (AR(1) parameter close to 1), as FD eliminates the nonstationarity problem;
FE is more sensitive to non-normality, heteroskedasticity and autocorrelation in the error;
but FE is less sensitive to endogeneity of the regressors.
If T = 2, all three estimators (LSDV, WG and FD) are exactly the same.
Option 5: Random effect model
In the FE models we assume that the individual-specific parameters β0i are constant (time-invariant) for each individual i. This is fine if we believe that differences between units can be viewed as parametric shifts of the regression function.
In the Random Effect Model we assume that β0i is a random variable with mean β0 (no index i), which means that the intercept for each individual is
$$\beta_{0i} = \beta_0 + u_i, \qquad u_i \sim (0, \sigma_u^2).$$
In our example this means that the 47 individuals were randomly drawn from a large population of individuals with a constant expected value of the intercept. Differences across individuals are therefore captured by the error component u_i.
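Random effect model: general formulation
Model:
$$y_{it} = \alpha + \beta' x_{it} + u_i + \varepsilon_{it},$$
where u_i is constant over time.
Standard assumptions are:
$$u_i \sim (0, \sigma_u^2), \qquad \varepsilon_{it} \sim (0, \sigma_\varepsilon^2),$$
$$E[u_i \varepsilon_{it}] = 0, \qquad E[u_i u_j] = 0 \ (i \ne j),$$
$$E[\varepsilon_{it}\varepsilon_{is}] = E[\varepsilon_{it}\varepsilon_{jt}] = E[\varepsilon_{it}\varepsilon_{js}] = 0 \ (i \ne j,\ t \ne s).$$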
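Random effect model: general formulation
Define
$$w_{it} = u_i + \varepsilon_{it}$$
and
$$w_i = [w_{i1}, w_{i2}, \dots, w_{iT}]'.$$
Given the assumptions listed above:
$$E[w_{it}] = 0, \qquad E[w_{it}^2] = \sigma_u^2 + \sigma_\varepsilon^2, \qquad E[w_{it} w_{is}] = \sigma_u^2 \ (t \ne s).$$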
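Random effect model: general formulation
Define
$$\Omega = E[w_i w_i'] = \begin{bmatrix} \sigma_u^2 + \sigma_\varepsilon^2 & \sigma_u^2 & \cdots & \sigma_u^2 \\ \sigma_u^2 & \sigma_u^2 + \sigma_\varepsilon^2 & \cdots & \sigma_u^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_u^2 & \sigma_u^2 & \cdots & \sigma_u^2 + \sigma_\varepsilon^2 \end{bmatrix} = \sigma_\varepsilon^2 I_T + \sigma_u^2 l l'.$$
Since observations i and j are independent, the disturbance covariance matrix for the full NT observations is
$$V = \begin{bmatrix} \Omega & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \Omega \end{bmatrix} = I_N \otimes \Omega.$$
Apply GLS.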
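A sketch of RE GLS with the variance components σ_u and σ_ε treated as known (in practice they must be estimated first, i.e. feasible GLS). It builds Ω and V = I_N ⊗ Ω explicitly and checks that GLS equals OLS on quasi-demeaned data z_it - θ z̄_i with θ = 1 - σ_ε / sqrt(σ_ε² + T σ_u²), a standard way of computing the RE estimator. Simulated data; all numerical values are assumptions:

```python
import numpy as np

rng = np.random.default_rng(9)
N, T = 30, 5
sigma_u, sigma_e = 1.0, 0.5                                 # assumed known variance components
l = np.ones((T, 1))
Omega = sigma_e**2 * np.eye(T) + sigma_u**2 * (l @ l.T)    # sigma_e^2 I + sigma_u^2 l l'
V = np.kron(np.eye(N), Omega)                               # I_N kron Omega

X = np.column_stack([np.ones(N * T), rng.normal(size=N * T)])
u = np.repeat(rng.normal(scale=sigma_u, size=N), T)         # u_i, constant over t
y = X @ np.array([1.0, 0.5]) + u + rng.normal(scale=sigma_e, size=N * T)

# GLS: (X' V^{-1} X)^{-1} X' V^{-1} y
Vinv = np.linalg.inv(V)
beta_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)

# Equivalent: OLS on quasi-demeaned data z_it - theta * zbar_i
theta = 1 - sigma_e / np.sqrt(sigma_e**2 + T * sigma_u**2)


def quasi_demean(Z):
    """Subtract theta times the individual mean from each column (rows stacked by i)."""
    Zm = Z.reshape(N, T, -1)
    return (Zm - theta * Zm.mean(axis=1, keepdims=True)).reshape(N * T, -1)


beta_qd, *_ = np.linalg.lstsq(quasi_demean(X), quasi_demean(y[:, None]).ravel(), rcond=None)
```

Setting θ = 0 would give pooled OLS and θ = 1 the within estimator, which is one way to see RE as lying between the two.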
Random effect model
Our charity function can be expressed as:
$$C_{it} = \beta_0 + \beta_1 Age_{it} + \beta_2 Income_{it} + \beta_3 Price_{it} + \beta_4 DEPS_{it} + \beta_5 MS_{it} + w_{it}, \qquad w_{it} = u_i + \varepsilon_{it}.$$
Composite error: the individual-specific component u_i and ε_it, the "idiosyncratic term", which varies across time and individuals.
RE is also known as the Error Components Model (ECM).
Super important: w_it should not be correlated with the regressors.
FE vs. RE
The practical choice of model depends on the assumption about the correlation between u_i and the set of regressors X. If corr(u_i, X) = 0 the correct model is RE; if corr(u_i, X) ≠ 0 the correct model is FE.
How to decide?
Hausman test: the null hypothesis is that there is no systematic difference between the FE and RE estimates. If we reject H0, we conclude that RE is inappropriate, as u_i is probably correlated with one or more regressors → we should choose the FE model instead.
Alternatively, we can still stick to the RE model but estimate its parameters by IV (panel IV, e.g. the Hausman-Taylor estimator or Arellano-Bond).
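A minimal sketch of the Hausman statistic, H = (b_FE - b_RE)'[Var(b_FE) - Var(b_RE)]^{-1}(b_FE - b_RE), which under H0 is asymptotically χ² with as many degrees of freedom as compared coefficients. The inputs below are made-up numbers, not estimates from the charity data; `hausman` is a hypothetical helper, and a p-value could be obtained with `scipy.stats.chi2.sf(H, df)` (not imported here):

```python
import numpy as np


def hausman(b_fe, b_re, V_fe, V_re):
    """Hausman statistic H = d' (V_fe - V_re)^{-1} d with d = b_fe - b_re.

    Compare only coefficients estimated by both models (time-varying regressors).
    Returns (H, degrees of freedom).
    """
    d = np.asarray(b_fe, dtype=float) - np.asarray(b_re, dtype=float)
    V_diff = np.asarray(V_fe, dtype=float) - np.asarray(V_re, dtype=float)
    # In finite samples V_diff need not be positive definite; a generalized
    # inverse (np.linalg.pinv) is sometimes used instead of solve() in that case.
    H = float(d @ np.linalg.solve(V_diff, d))
    return H, d.size


# Hypothetical numbers, for illustration only.
H, df = hausman(
    b_fe=[1.0, 0.5],
    b_re=[0.8, 0.5],
    V_fe=np.diag([0.02, 0.01]),
    V_re=np.diag([0.01, 0.005]),
)
# H = 0.2**2 / 0.01 = 4 (up to rounding), df = 2; the 5% critical value of
# chi-squared(2) is about 5.99, so H0 would not be rejected in this made-up case.
```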
FE vs. RE: practicalities
If T is large and N small (and the standard assumptions are satisfied), both models should deliver similar results (the choice then depends on computational complexity).
If N is large and T small and all assumptions of the RE model are satisfied, RE is more efficient than FE.
RE can estimate the impact of time-invariant variables that 'disappear' in FE.
If the true model is pooled: all estimators (FE, RE, pooled OLS) are consistent.
If the true model is FE: pooled OLS and RE are inconsistent.
If the true model is RE: FE is consistent (FE is always consistent!).
Panel data econometrics: further topics
Hypothesis verification
Heteroskedasticity and autocorrelation
Unbalanced panels
Dynamic panel data models
Multivariate panels
Limited dependent variable panels
Nonstationarity in panel data