Advanced Econometrics Topic 6: Bayesian inference
Michał Rubaszek
SGH Warsaw School of Economics
Themes
A. Introduction to Bayesian inference
B. Bayesian regression
C. Bayesian model averaging
Theme A.
Introduction to Bayesian inference
Bayes theorem
For events A and B the Bayes theorem is:
P(A|B) = P(B|A) P(A) / P(B)
Explanation:
P(A|B) = P(A ∩ B) / P(B),  P(B|A) = P(A ∩ B) / P(A),  hence P(A|B) P(B) = P(B|A) P(A)
Bayes theorem in econometrics
For parameters θ and data y the Bayes theorem implies:
p(θ|y) = p(y|θ) p(θ) / p(y)
p(θ) – prior pdf of the parameters
p(y|θ) – probability of the data given θ
p(θ|y) – posterior pdf
p(y) – marginal likelihood of the data (does not depend on θ)
To derive the posterior of θ we substitute p(y|θ) by the likelihood L(θ; y):
p(θ|y) ∝ L(θ; y) p(θ)
Bayes theorem in econometrics
In a Bayesian framework, the parameters θ are considered random variables, whereas in a frequentist approach θ is assumed to be a constant.
The distribution of θ before observing the data is called the prior distribution and is denoted by p(θ).
The distribution of θ after observing the data is called the posterior distribution and is denoted by p(θ|y).
Bayes rule in econometrics: illustration
[Figure: the prior and the likelihood combine into the posterior, p(θ|y) ∝ L(θ; y) p(θ)]
Conjugate prior
p(θ|y) ∝ L(θ; y) p(θ)
For some classes of models the posterior distribution is in the same family as the prior distribution. In this case we speak of a conjugate prior.
Example:
Beta distribution (prior) + Binomial distribution (likelihood) → Beta distribution (posterior)
Conjugate prior: Beta + Binomial
θ ∼ Beta(α, β)
E(θ) = α / (α + β)
Var(θ) = αβ / ((α + β)² (α + β + 1))
p(θ) = Γ(α + β) / (Γ(α) Γ(β)) · θ^(α−1) (1 − θ)^(β−1),  0 ≤ θ ≤ 1
Link to beta distribution in Wikipedia
Conjugate prior: example
Example:
Two students (A and B) like to play chess. They have already played N times and student A won k times (and lost N − k). Let θ be the parameter that describes the probability of student A's success.
Prior: p(θ) = Γ(α₀ + β₀) / (Γ(α₀) Γ(β₀)) · θ^(α₀−1) (1 − θ)^(β₀−1),  θ ∼ Beta(α₀, β₀)
Likelihood: L(θ; y) = C(N, k) θ^k (1 − θ)^(N−k),  y|θ ∼ B(N, θ)
Posterior: p(θ|y) ∝ θ^(α₀+k−1) (1 − θ)^(β₀+N−k−1),  θ|y ∼ Beta(α₁, β₁)
α₁ = α₀ + k,  β₁ = β₀ + N − k
p(θ|y) = Γ(α₁ + β₁) / (Γ(α₁) Γ(β₁)) · θ^(α₁−1) (1 − θ)^(β₁−1)
Notice: in the formula for p(θ|y) we omitted Γ(α₀ + β₀) / (Γ(α₀) Γ(β₀)) and C(N, k). Why?
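The conjugate update above can be checked numerically. Below is a minimal Python sketch; the concrete numbers (α₀ = 4, β₀ = 6, N = 10 games, k = 7 wins) are illustrative assumptions, since the slide gives only the symbolic update:

```python
# Beta-Binomial conjugate update: a Beta(a0, b0) prior combined with a
# Binomial(N, theta) likelihood gives a Beta(a0 + k, b0 + N - k) posterior.
a0, b0 = 4, 6      # prior parameters (assumed for illustration)
N, k = 10, 7       # N games played, k wins for student A (assumed)

a1, b1 = a0 + k, b0 + (N - k)                          # posterior parameters
post_mean = a1 / (a1 + b1)                             # E(theta | y)
post_var = a1 * b1 / ((a1 + b1) ** 2 * (a1 + b1 + 1))  # Var(theta | y)
print(a1, b1, post_mean)
```

Note that the posterior mean and variance follow directly from the Beta(α, β) moment formulas on the previous slide, applied to (α₁, β₁).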
Metropolis-Hastings
Markov chain Monte Carlo (MCMC) algorithm
In most cases we can calculate p(θ) and L(θ; y) but do not know the analytical formula for p(θ|y).
We resort to numerical methods, e.g. the Metropolis-Hastings MCMC algorithm:
1. Set the initial value of the parameter θ⁽⁰⁾ for i = 0
2. Draw θ* = θ⁽ⁱ⁾ + cε, where ε ∼ N(0, Σ) and c is a step length
3. Draw u ∼ U(0, 1)
4. Calculate a = p(θ*|y) / p(θ⁽ⁱ⁾|y) and compare it to u:
If a < u then θ⁽ⁱ⁺¹⁾ = θ⁽ⁱ⁾
If a ≥ u then θ⁽ⁱ⁺¹⁾ = θ*
5. Repeat steps 2-4 N_sim times
6. Using the sample θ⁽ⁱ⁾ for i = N_burn + 1, …, N_sim calculate descriptive statistics for p(θ|y)
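The six steps can be sketched in Python for the Beta-Binomial posterior from the chess example, where the analytical answer Beta(α₀ + k, β₀ + N − k) is available as a check. The prior, data, and tuning settings (c, N_sim, N_burn) are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: unnormalised Beta-Binomial posterior (all settings assumed)
a0, b0, N, k = 4, 6, 25, 12   # prior Beta(4, 6); data: k = 12 wins in N = 25 games

def log_post(theta):
    # log of theta^(a0+k-1) * (1-theta)^(b0+N-k-1); -inf outside (0, 1)
    if not 0.0 < theta < 1.0:
        return -np.inf
    return (a0 + k - 1) * np.log(theta) + (b0 + N - k - 1) * np.log(1.0 - theta)

n_sim, n_burn, c = 20000, 2000, 0.1    # step length c
theta = 0.5                            # step 1: initial value
draws = []
for _ in range(n_sim):
    theta_star = theta + c * rng.normal()    # step 2: candidate draw
    u = rng.uniform()                        # step 3
    # step 4: acceptance ratio (capped at 1 to avoid overflow)
    a = np.exp(min(0.0, log_post(theta_star) - log_post(theta)))
    if a >= u:                               # accept; otherwise keep theta
        theta = theta_star
    draws.append(theta)

draws = np.array(draws[n_burn:])       # steps 5-6: drop burn-in, summarise
print(draws.mean(), draws.std())       # compare with Beta(16, 19): mean 16/35
```

Working with log-densities is the usual numerical safeguard; the comparison a versus u is exactly step 4 of the algorithm.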
Exercise 1
I. Sample y|θ ∼ B(N, θ) from the binomial distribution.
Set N = 25 and θ = 0.5
II. Assume the prior θ ∼ Beta(α₀, β₀). Set α₀ = 4 and β₀ = 6
III. Calculate the posterior distribution parameters α₁ and β₁.
Calculate the posterior mean and standard deviation with the analytical method
IV. Perform MH MCMC simulations and calculate the posterior mean and standard deviation. Compare the values to the results from point III
V. Plot the posterior density for θ using two methods:
analytical
numerical (MCMC)
Exercise 2
I. Sample yₜ|μ ∼ N(μ, 1²) from the normal distribution.
Set N = 10 and μ = 2
Notice: our model is yₜ = μ + εₜ, εₜ ∼ N(0, 1²)
II. Assume the prior μ ∼ N(μ₀, τ₀²). Set μ₀ = 1 and τ₀² = 0.5²
III. Perform MH MCMC simulations and calculate the posterior mean and standard deviation for μ
Notice: the likelihood for the normal distribution model is:
L(μ; y) = (2π)^(−N/2) exp(−(y − μι)′(y − μι) / 2)
Here y = [y₁ y₂ … y_N]′ and ι = [1 1 … 1]′
Theme B. Linear Bayesian model
Bayesian regression
Consider a linear model:
yₜ = xₜ′β + εₜ,  εₜ ∼ N(0, σ²)
Likelihood:
L(β, σ²; y, X) = (2πσ²)^(−T/2) exp(−(y − Xβ)′(y − Xβ) / (2σ²))
where X = [x₁ x₂ … x_T]′ and y = [y₁ y₂ … y_T]′
ML estimates:
β̂ = (X′X)⁻¹ X′y
σ̂² = (y − Xβ̂)′(y − Xβ̂) / ν,  where ν = T − k
Vector of parameters:
θ = (β, σ²)
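A quick numerical check of the ML formulas on simulated data (the design, sample size, and true parameters below are assumptions chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data for y = X beta + e (illustrative assumptions)
T, k = 200, 2
X = np.column_stack([np.ones(T), rng.normal(size=T)])
beta_true = np.array([1.0, 0.5])
sigma = 0.3
y = X @ beta_true + sigma * rng.normal(size=T)

# ML / OLS estimates: beta_hat = (X'X)^{-1} X'y, s2 = e'e / (T - k)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
s2 = resid @ resid / (T - k)
print(beta_hat, s2)
```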
Bayesian regression: known variance
Linear model:
y = Xβ + ε,  ε ∼ N(0, σ²I)
Likelihood: L(β; y) = (2πσ²)^(−T/2) exp(−(y − Xβ)′(y − Xβ) / (2σ²))
Prior: p(β) ∝ exp(−0.5 (β − β₀)′ Ω₀⁻¹ (β − β₀))
β ∼ N(β₀, Ω₀)
Posterior: p(β|y) ∝ exp(−0.5 (β − β̄)′ Ω̄⁻¹ (β − β̄))
β|y ∼ N(β̄, Ω̄)
Ω̄ = (Ω₀⁻¹ + σ⁻² X′X)⁻¹ = (Ω₀⁻¹ + Ω̂_ML⁻¹)⁻¹
β̄ = Ω̄ (Ω₀⁻¹ β₀ + σ⁻² X′y) = Ω̄ (Ω₀⁻¹ β₀ + Ω̂_ML⁻¹ β̂),  where Ω̂_ML = σ² (X′X)⁻¹
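The known-variance posterior formulas can be sketched as follows; the simulated data and prior settings are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated data (design and true parameters are illustrative assumptions)
T = 100
X = np.column_stack([np.ones(T), rng.normal(size=T)])
y = X @ np.array([0.0, 1.0]) + rng.normal(size=T)

sigma2 = 1.0                             # error variance, treated as known
beta0 = np.array([0.0, 1.5])             # prior mean
Omega0 = np.diag([10.0 ** 2, 0.1 ** 2])  # prior covariance

# Posterior: beta | y ~ N(beta_bar, Omega_bar)
Omega0_inv = np.linalg.inv(Omega0)
Omega_bar = np.linalg.inv(Omega0_inv + X.T @ X / sigma2)
beta_bar = Omega_bar @ (Omega0_inv @ beta0 + X.T @ y / sigma2)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # ML estimate, for comparison
print(beta_hat, beta_bar)
```

The posterior mean is a precision-weighted compromise between the prior mean and the ML estimate, and the posterior variance is smaller than both the prior variance and the ML sampling variance.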
Example
Let us consider a model:
irₜ = [1, infEAₜ] β + εₜ
Prior:
β ∼ N( [0.00; 1.50], diag(10², 0.1²) )
Posterior mean:
         ML     Prior   Posterior
const    2.58   0.00    1.50
infEA    0.71   1.50    0.99
Bayesian regression: random variance
Linear model:
y = Xβ + ε,  ε ∼ N(0, σ²I)
Likelihood: L(β, σ²; y) = (2πσ²)^(−T/2) exp(−(y − Xβ)′(y − Xβ) / (2σ²))
Prior: σ² ∼ IG(ν₀, s₀²)
β|σ² ∼ N(β₀, σ²Ω₀)
(β, σ²) ∼ NIG(β₀, Ω₀, ν₀, s₀²) – Normal Inverse Gamma distribution
Posterior: σ²|y ∼ IG(ν̄, s̄²)
β|σ², y ∼ N(β̄, σ²Ω̄)
(β, σ²)|y ∼ NIG(β̄, Ω̄, ν̄, s̄²)
Ω̄ = (Ω₀⁻¹ + X′X)⁻¹,  β̄ = Ω̄ (Ω₀⁻¹ β₀ + X′y)
ν̄ = ν₀ + T,  ν̄ s̄² = ν₀ s₀² + ν σ̂² + (β₀ − β̂)′ (Ω₀ + (X′X)⁻¹)⁻¹ (β₀ − β̂)
Conjugate prior: Normal Gamma
Inverse gamma distribution:
x ∼ IG(α, β)
E(x) = β / (α − 1)
Var(x) = β² / ((α − 1)² (α − 2))
p(x) = β^α / Γ(α) · x^(−α−1) exp(−β / x),  x > 0
Link to inverse gamma distr. in Wikipedia
Gibbs sampling
In the above example we know that σ²|y ∼ IG(ν̄, s̄²) and β|σ², y ∼ N(β̄, σ²Ω̄), but we don't know the marginal distribution p(β|y).*
To derive it we can use the Gibbs sampler:
1. Draw σ²⁽ⁱ⁾ from IG(ν̄, s̄²)
2. Draw β⁽ⁱ⁾ from N(β̄, σ²⁽ⁱ⁾ Ω̄)
3. Repeat steps 1-2 N_sim times
4. Using the sample β⁽ⁱ⁾ for i = 1, …, N_sim calculate descriptive statistics for p(β|y)
* In this case we can derive that the marginal distribution is Student's t
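The two Gibbs steps can be sketched in Python. The data and prior settings are illustrative assumptions; ν̄s̄² is computed via the algebraically equivalent form ν₀s₀² + y′y + β₀′Ω₀⁻¹β₀ − β̄′Ω̄⁻¹β̄, and the IG(ν, s²) draw assumes the shape ν/2, rate νs²/2 convention:

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated data (all settings are illustrative assumptions)
T = 100
X = np.column_stack([np.ones(T), rng.normal(size=T)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=T)

# Normal-Inverse-Gamma prior
beta0 = np.zeros(2)
Omega0 = np.eye(2) * 10.0
nu0, s02 = 4.0, 1.0

# Conjugate posterior quantities
Omega0_inv = np.linalg.inv(Omega0)
Omega_bar = np.linalg.inv(Omega0_inv + X.T @ X)
beta_bar = Omega_bar @ (Omega0_inv @ beta0 + X.T @ y)
nu_bar = nu0 + T
nus2_bar = (nu0 * s02 + y @ y + beta0 @ Omega0_inv @ beta0
            - beta_bar @ (Omega0_inv + X.T @ X) @ beta_bar)

# Gibbs sampler: step 1 draws sigma2 | y, step 2 draws beta | sigma2, y
n_sim, n_burn = 5000, 500
chol = np.linalg.cholesky(Omega_bar)
betas = []
for _ in range(n_sim):
    # IG(nu_bar, s2_bar) drawn as the reciprocal of a Gamma variate
    sigma2 = 1.0 / rng.gamma(nu_bar / 2.0, 2.0 / nus2_bar)
    beta = beta_bar + np.sqrt(sigma2) * (chol @ rng.normal(size=2))
    betas.append(beta)
betas = np.array(betas[n_burn:])
print(betas.mean(axis=0), betas.std(axis=0))
```

The resulting draws of β approximate the Student-t marginal mentioned in the footnote.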
Exercises
I. Estimate a model in which the interest rate in a given country depends on:
inflation
GDP growth rate
exchange rate depreciation
II. Set the prior centered at [0 1.5 0.5 0]′ with standard deviations [10 0.1 0.1 0.1]
III. Derive the posterior distribution
IV. Make a plot (prior / likelihood / posterior) for each parameter
V. Repeat the above using the Normal-Gamma prior and compare the posterior mean with the values from point III
Theme C. Bayesian model averaging
Bayes rule in econometrics
Let us consider a model for y with K potential regressors from a set:
X = {x₁, x₂, …, x_K}
There are 2^K different subsets X_m ⊆ X, hence 2^K potential models M_m:
y = α_m + X_m β_m + ε
Which specification M_m should be selected? This problem is especially difficult for large K!
Bayesian inference helps to tackle this problem
BMA – Bayesian model averaging / BMS – Bayesian model selection
Bayes theorem in model selection
The Bayes theorem implies:
P(M_m|y) = p(y|M_m) P(M_m) / p(y)
where
p(y) = Σ_{m=1}^{2^K} p(y|M_m) P(M_m)
P(M_m|y) – posterior probability of model M_m
P(M_m) – prior probability of model M_m
p(y|M_m) – marginal likelihood of model M_m
We need a method to calculate p(y|M_m) and choose P(M_m)
Marginal likelihood
For model M_m:
y = α_m + X_m β_m + ε,  ε ∼ N(0, σ_m² I)
the marginal likelihood is
p(y|M_m) = ∫ L(θ_m; y, M_m) p(θ_m|M_m) dθ_m
where θ_m = (α_m, β_m, σ_m²)
Marginal likelihood, Zellner g-prior
Zellner g-prior:
p(α_m) ∝ 1,  p(σ_m) ∝ σ_m⁻¹
β_m|g ∼ N(0, g σ_m² (X_m′ X_m)⁻¹)
Posterior distribution:
E(β_m|y) = g/(1+g) · β̂_m,  where β̂_m is the ML estimate of the parameters
For g → 0 the posterior is equal to the prior
For g = 1 the posterior puts equal weight on the prior and the likelihood
For g = T the prior has the equivalent weight of 1 observation
For g → ∞ the prior is uniform (non-informative)
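The bullet points above follow from the posterior mean E(β_m|y) = g/(1+g) · β̂_m; a tiny illustration with a hypothetical ML estimate:

```python
import numpy as np

beta_hat = np.array([2.0, -1.0])   # hypothetical ML estimate (assumed)

# Shrinkage factor g/(1+g): 0 pulls the posterior mean to the prior mean
# (zero), 1 is recovered as g grows and the prior becomes non-informative.
shrinkage = {g: g / (1.0 + g) for g in [0.0, 1.0, 100.0]}
post_means = {g: s * beta_hat for g, s in shrinkage.items()}
print(post_means)
```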
Marginal likelihood: Zellner-g prior
Marginal likelihood of data:
p(y|M_m, g) = Γ((T−1)/2) π^(−(T−1)/2) T^(−1/2) ((y − ȳ)′(y − ȳ))^(−(T−1)/2) (1 + g(1 − R_m²))^(−(T−1)/2) (1 + g)^((T−1−k_m)/2)
p(y|M_m, g) = const · (1 + g(1 − R_m²))^(−(T−1)/2) (1 + g)^((T−1−k_m)/2)
Bayes factor (relative marginal likelihood) for M_i and M_j:
B(M_i, M_j) = p(y|M_i, g) / p(y|M_j, g) = [ (1 + g(1 − R_i²)) / (1 + g(1 − R_j²)) ]^(−(T−1)/2) (1 + g)^((k_j − k_i)/2)
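Since the Bayes factor depends on the data only through T, each model's R², and the model sizes, it can be sketched directly from the formula above; the comparison below uses hypothetical R² values and model sizes:

```python
from math import exp, log

def log_ml(r2, T, k, g):
    # Log marginal likelihood under the Zellner g-prior, dropping the
    # model-independent constant (it cancels in Bayes factors).
    return 0.5 * (T - 1 - k) * log(1 + g) - 0.5 * (T - 1) * log(1 + g * (1 - r2))

# Hypothetical comparison: M_i fits better (R^2 = 0.40) with fewer
# regressors (k = 2) than M_j (R^2 = 0.35, k = 3).
T, g = 100, 100.0
bf = exp(log_ml(0.40, T, 2, g) - log_ml(0.35, T, 3, g))
print(bf)
```

Working in logs avoids overflow when T is large; a Bayes factor above 1 favours M_i.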
Prior of the model:
Methods of choosing the prior P(M_m):
Uniform prior: P(M_m) = 2^(−K)
[each model is equally probable, expected model size K/2]
Binomial prior: P(M_m) = π^(k_m) (1 − π)^(K − k_m)
[π – fixed probability of including each regressor, expected model size Kπ]
Custom prior inclusion probabilities: π_j is individual for each variable
Beta-binomial prior: π ∼ Beta(a, b) is a random variable
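The expected model sizes quoted in brackets can be verified by enumerating all 2^K inclusion patterns for a small K; the values of K and π below are assumptions:

```python
from itertools import product

K = 4                                       # number of candidate regressors (assumed)
models = list(product([0, 1], repeat=K))    # all 2^K inclusion patterns

# Uniform prior: P(M) = 2^{-K}; expected model size = K/2
p_uni = [2.0 ** -K for m in models]
size_uni = sum(p * sum(m) for p, m in zip(p_uni, models))

# Binomial prior with inclusion probability pi: P(M) = pi^k (1-pi)^{K-k}
pi = 0.25                                   # assumed; expected model size = K * pi
p_bin = [pi ** sum(m) * (1 - pi) ** (K - sum(m)) for m in models]
size_bin = sum(p * sum(m) for p, m in zip(p_bin, models))
print(size_uni, size_bin)
```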
BMS: example
Exercises
Exercise 1.
Select a country and a variable from the dataJIE.csv file. Evaluate the factors that were most important for this variable using the BMA/BMS methodology
Exercise 2.
Download the data on GDP growth over 1960-1992 and 41 other variables in 72 countries with the commands:
data(datafls)
help(datafls)
Evaluate the factors that were most important for economic growth using the BMA/BMS methodology