Advanced Econometrics Topic 6: Bayesian inference
Michał Rubaszek
SGH Warsaw School of Economics
Themes
A. Introduction to Bayesian inference
B. Bayesian regression
C. Bayesian model averaging
Theme A.
Introduction to Bayesian inference
Bayes theorem
For events A and B the Bayes theorem is:
P(A|B) = P(B|A) P(A) / P(B)
Explanation:
P(A|B) = P(A ∩ B) / P(B),  P(B|A) = P(A ∩ B) / P(A),  hence P(A|B) P(B) = P(B|A) P(A)
Bayes theorem in econometrics
For parameters θ and data y the Bayes theorem implies:
p(θ|y) = p(y|θ) p(θ) / p(y)
p(θ) – prior pdf of the parameters
p(y|θ) – probability of the data given θ
p(θ|y) – posterior pdf
p(y) – marginal likelihood of the data (does not depend on θ)
To derive the posterior of θ we substitute p(y|θ) by the likelihood L(θ; y):
p(θ|y) ∝ L(θ; y) p(θ)
Bayes theorem in econometrics
In a Bayesian framework, the parameters θ are considered random variables, whereas in a frequentist approach θ is assumed to be a constant.
The distribution of θ before observing the data is called the prior distribution and is denoted by p(θ).
The distribution of θ after observing the data is called the posterior distribution and is denoted by p(θ|y).
Bayes rule in econometrics: illustration
[Figure: the prior and the likelihood combine into the posterior, p(θ|y) ∝ L(θ; y) p(θ)]
Conjugate prior
p(θ|y) ∝ L(θ; y) p(θ)
For some classes of models the posterior distribution is in the same family as the prior distribution. In this case we speak of a conjugate prior.
Example:
Beta distribution (prior) + Binomial distribution (likelihood) → Beta distribution (posterior)
Conjugate prior: Beta + Binomial
θ ∼ Beta(α, β)
E(θ) = α / (α + β)
Var(θ) = αβ / ((α + β)² (α + β + 1))
p(θ) = Γ(α + β) / (Γ(α) Γ(β)) · θ^(α−1) (1 − θ)^(β−1),  0 ≤ θ ≤ 1
Link to beta distribution in Wikipedia
Conjugate prior: example
Example:
Two students (A and B) like to play chess. They have already played N times and student A won k times (and lost N − k). Let θ be the parameter that describes the probability of student A's success.
Prior: p(θ) = Γ(α₀ + β₀) / (Γ(α₀) Γ(β₀)) · θ^(α₀−1) (1 − θ)^(β₀−1),  θ ∼ Beta(α₀, β₀)
Likelihood: L(θ; y) = C(N, k) θ^k (1 − θ)^(N−k),  y|θ ∼ B(N, θ)
Posterior: p(θ|y) ∝ θ^(α₀+k−1) (1 − θ)^(β₀+N−k−1),  θ|y ∼ Beta(α₁, β₁)
α₁ = α₀ + k,  β₁ = β₀ + N − k
p(θ|y) = Γ(α₁ + β₁) / (Γ(α₁) Γ(β₁)) · θ^(α₁−1) (1 − θ)^(β₁−1)
Notice: in the formula for p(θ|y) we omitted Γ(α₀ + β₀) / (Γ(α₀) Γ(β₀)) and C(N, k). Why?
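The conjugate update above can be checked numerically. Below is a minimal Python sketch; the concrete numbers (α₀ = 4, β₀ = 6, N = 10 games, k = 7 wins) are illustrative assumptions, since the slide gives only the symbolic update:

```python
# Beta-Binomial conjugate update: a Beta(a0, b0) prior combined with a
# Binomial(N, theta) likelihood gives a Beta(a0 + k, b0 + N - k) posterior.
a0, b0 = 4, 6      # prior parameters (assumed for illustration)
N, k = 10, 7       # N games played, k wins for student A (assumed)

a1, b1 = a0 + k, b0 + (N - k)                          # posterior parameters
post_mean = a1 / (a1 + b1)                             # E(theta | y)
post_var = a1 * b1 / ((a1 + b1) ** 2 * (a1 + b1 + 1))  # Var(theta | y)
print(a1, b1, post_mean)
```

Note that the posterior mean and variance follow directly from the Beta(α, β) moment formulas on the previous slide, applied to (α₁, β₁).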
Metropolis-Hastings
Markov chain Monte Carlo (MCMC) algorithm
In most cases we can calculate p(θ) and L(θ; y) but do not know the analytical formula for p(θ|y).
We resort to numerical methods, e.g. the Metropolis-Hastings MCMC algorithm:
1. Set the initial value of the parameter θ⁽⁰⁾ for i = 0
2. Draw θ* = θ⁽ⁱ⁾ + cε, where ε ∼ N(0, Σ) and c is a step length
3. Draw u ∼ U(0, 1)
4. Calculate a = p(θ*|y) / p(θ⁽ⁱ⁾|y) and compare it to u:
If a < u then θ⁽ⁱ⁺¹⁾ = θ⁽ⁱ⁾
If a ≥ u then θ⁽ⁱ⁺¹⁾ = θ*
5. Repeat steps 2-4 N_sim times
6. Using the sample θ⁽ⁱ⁾ for i = N_burn + 1, …, N_sim calculate descriptive statistics for p(θ|y)
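The six steps can be sketched in Python for the Beta-Binomial posterior from the chess example, where the analytical answer Beta(α₀ + k, β₀ + N − k) is available as a check. The prior, data, and tuning settings (c, N_sim, N_burn) are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: unnormalised Beta-Binomial posterior (all settings assumed)
a0, b0, N, k = 4, 6, 25, 12   # prior Beta(4, 6); data: k = 12 wins in N = 25 games

def log_post(theta):
    # log of theta^(a0+k-1) * (1-theta)^(b0+N-k-1); -inf outside (0, 1)
    if not 0.0 < theta < 1.0:
        return -np.inf
    return (a0 + k - 1) * np.log(theta) + (b0 + N - k - 1) * np.log(1.0 - theta)

n_sim, n_burn, c = 20000, 2000, 0.1    # step length c
theta = 0.5                            # step 1: initial value
draws = []
for _ in range(n_sim):
    theta_star = theta + c * rng.normal()    # step 2: candidate draw
    u = rng.uniform()                        # step 3
    # step 4: acceptance ratio (capped at 1 to avoid overflow)
    a = np.exp(min(0.0, log_post(theta_star) - log_post(theta)))
    if a >= u:                               # accept; otherwise keep theta
        theta = theta_star
    draws.append(theta)

draws = np.array(draws[n_burn:])       # steps 5-6: drop burn-in, summarise
print(draws.mean(), draws.std())       # compare with Beta(16, 19): mean 16/35
```

Working with log-densities is the usual numerical safeguard; the comparison a versus u is exactly step 4 of the algorithm.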
Exercise 1
I. Sample y|θ ∼ B(N, θ) from the binomial distribution.
Set N = 25 and θ = 0.5
II. Assume the prior θ ∼ Beta(α₀, β₀). Set α₀ = 4 and β₀ = 6
III. Calculate the posterior distribution parameters α₁ and β₁.
Calculate the posterior mean and standard deviation with the analytical method
IV. Perform MH MCMC simulations and calculate the posterior mean and standard deviation. Compare the values to the results from point III
V. Plot the posterior density for θ using two methods:
analytical
numerical (MCMC)
Exercise 2
I. Sample yₜ|μ ∼ N(μ, 1²) from the normal distribution.
Set N = 10 and μ = 2
Notice: our model is yₜ = μ + εₜ, εₜ ∼ N(0, 1²)
II. Assume the prior μ ∼ N(μ₀, τ₀²). Set μ₀ = 1 and τ₀² = 0.5²
III. Perform MH MCMC simulations and calculate the posterior mean and standard deviation for μ
Notice: the likelihood for the normal distribution model is:
L(μ; y) = (2π)^(−N/2) exp(−(y − μι)′(y − μι) / 2)
Here y = [y₁ y₂ … y_N]′ and ι = [1 1 … 1]′
Theme B. Linear Bayesian model
Bayesian regression
Consider a linear model:
yₜ = xₜ′β + εₜ,  εₜ ∼ N(0, σ²)
Likelihood:
L(β, σ²; y, X) = (2πσ²)^(−T/2) exp(−(y − Xβ)′(y − Xβ) / (2σ²))
where X = [x₁ x₂ … x_T]′ and y = [y₁ y₂ … y_T]′
ML estimates:
β̂ = (X′X)⁻¹ X′y
σ̂² = (y − Xβ̂)′(y − Xβ̂) / ν,  where ν = T − k
Vector of parameters:
θ = (β, σ²)
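A quick numerical check of the ML formulas on simulated data (the design, sample size, and true parameters below are assumptions chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data for y = X beta + e (illustrative assumptions)
T, k = 200, 2
X = np.column_stack([np.ones(T), rng.normal(size=T)])
beta_true = np.array([1.0, 0.5])
sigma = 0.3
y = X @ beta_true + sigma * rng.normal(size=T)

# ML / OLS estimates: beta_hat = (X'X)^{-1} X'y, s2 = e'e / (T - k)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
s2 = resid @ resid / (T - k)
print(beta_hat, s2)
```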
Bayesian regression: known variance
Linear model:
y = Xβ + ε,  ε ∼ N(0, σ²I)
Likelihood: L(β; y) = (2πσ²)^(−T/2) exp(−(y − Xβ)′(y − Xβ) / (2σ²))
Prior: p(β) ∝ exp(−0.5 (β − β₀)′ Ω₀⁻¹ (β − β₀))
β ∼ N(β₀, Ω₀)
Posterior: p(β|y) ∝ exp(−0.5 (β − β̄)′ Ω̄⁻¹ (β − β̄))
β|y ∼ N(β̄, Ω̄)
Ω̄ = (Ω₀⁻¹ + σ⁻² X′X)⁻¹ = (Ω₀⁻¹ + Ω̂_ML⁻¹)⁻¹
β̄ = Ω̄ (Ω₀⁻¹ β₀ + σ⁻² X′y) = Ω̄ (Ω₀⁻¹ β₀ + Ω̂_ML⁻¹ β̂),  where Ω̂_ML = σ² (X′X)⁻¹
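The known-variance posterior formulas can be sketched as follows; the simulated data and prior settings are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated data (design and true parameters are illustrative assumptions)
T = 100
X = np.column_stack([np.ones(T), rng.normal(size=T)])
y = X @ np.array([0.0, 1.0]) + rng.normal(size=T)

sigma2 = 1.0                             # error variance, treated as known
beta0 = np.array([0.0, 1.5])             # prior mean
Omega0 = np.diag([10.0 ** 2, 0.1 ** 2])  # prior covariance

# Posterior: beta | y ~ N(beta_bar, Omega_bar)
Omega0_inv = np.linalg.inv(Omega0)
Omega_bar = np.linalg.inv(Omega0_inv + X.T @ X / sigma2)
beta_bar = Omega_bar @ (Omega0_inv @ beta0 + X.T @ y / sigma2)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # ML estimate, for comparison
print(beta_hat, beta_bar)
```

The posterior mean is a precision-weighted compromise between the prior mean and the ML estimate, and the posterior variance is smaller than both the prior variance and the ML sampling variance.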
Example
Let us consider a model:
irₜ = [1, infEAₜ] β + εₜ
Prior:
β ∼ N( [0.00; 1.50], diag(10², 0.1²) )
Posterior mean:
         ML     Prior   Posterior
const    2.58   0.00    1.50
infEA    0.71   1.50    0.99
Bayesian regression: random variance
Linear model:
y = Xβ + ε,  ε ∼ N(0, σ²I)
Likelihood: L(β, σ²; y) = (2πσ²)^(−T/2) exp(−(y − Xβ)′(y − Xβ) / (2σ²))
Prior: σ² ∼ IG(ν₀, s₀²)
β|σ² ∼ N(β₀, σ²Ω₀)
(β, σ²) ∼ NIG(β₀, Ω₀, ν₀, s₀²) – Normal Inverse Gamma distribution
Posterior: σ²|y ∼ IG(ν̄, s̄²)
β|σ², y ∼ N(β̄, σ²Ω̄)
(β, σ²)|y ∼ NIG(β̄, Ω̄, ν̄, s̄²)
Ω̄ = (Ω₀⁻¹ + X′X)⁻¹,  β̄ = Ω̄ (Ω₀⁻¹ β₀ + X′y)
ν̄ = ν₀ + T,  ν̄ s̄² = ν₀ s₀² + ν σ̂² + (β₀ − β̂)′ (Ω₀ + (X′X)⁻¹)⁻¹ (β₀ − β̂)
Conjugate prior: Normal Gamma
Inverse gamma distribution:
x ∼ IG(α, β)
E(x) = β / (α − 1)
Var(x) = β² / ((α − 1)² (α − 2))
p(x) = β^α / Γ(α) · x^(−α−1) exp(−β / x),  x > 0
Link to inverse gamma distr. in Wikipedia
Gibbs sampling
In the above example we know that σ²|y ∼ IG(ν̄, s̄²) and β|σ², y ∼ N(β̄, σ²Ω̄), but we don't know the marginal distribution p(β|y).*
To derive it we can use the Gibbs sampler:
1. Draw σ²⁽ⁱ⁾ from IG(ν̄, s̄²)
2. Draw β⁽ⁱ⁾ from N(β̄, σ²⁽ⁱ⁾ Ω̄)
3. Repeat steps 1-2 N_sim times
4. Using the sample β⁽ⁱ⁾ for i = 1, …, N_sim calculate descriptive statistics for p(β|y)
* In this case we can derive that the marginal distribution is Student's t
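The two Gibbs steps can be sketched in Python. The data and prior settings are illustrative assumptions; ν̄s̄² is computed via the algebraically equivalent form ν₀s₀² + y′y + β₀′Ω₀⁻¹β₀ − β̄′Ω̄⁻¹β̄, and the IG(ν, s²) draw assumes the shape ν/2, rate νs²/2 convention:

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated data (all settings are illustrative assumptions)
T = 100
X = np.column_stack([np.ones(T), rng.normal(size=T)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=T)

# Normal-Inverse-Gamma prior
beta0 = np.zeros(2)
Omega0 = np.eye(2) * 10.0
nu0, s02 = 4.0, 1.0

# Conjugate posterior quantities
Omega0_inv = np.linalg.inv(Omega0)
Omega_bar = np.linalg.inv(Omega0_inv + X.T @ X)
beta_bar = Omega_bar @ (Omega0_inv @ beta0 + X.T @ y)
nu_bar = nu0 + T
nus2_bar = (nu0 * s02 + y @ y + beta0 @ Omega0_inv @ beta0
            - beta_bar @ (Omega0_inv + X.T @ X) @ beta_bar)

# Gibbs sampler: step 1 draws sigma2 | y, step 2 draws beta | sigma2, y
n_sim, n_burn = 5000, 500
chol = np.linalg.cholesky(Omega_bar)
betas = []
for _ in range(n_sim):
    # IG(nu_bar, s2_bar) drawn as the reciprocal of a Gamma variate
    sigma2 = 1.0 / rng.gamma(nu_bar / 2.0, 2.0 / nus2_bar)
    beta = beta_bar + np.sqrt(sigma2) * (chol @ rng.normal(size=2))
    betas.append(beta)
betas = np.array(betas[n_burn:])
print(betas.mean(axis=0), betas.std(axis=0))
```

The resulting draws of β approximate the Student-t marginal mentioned in the footnote.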
Exercises
I. Estimate a model in which the interest rate in a given country depends on:
inflation
GDP growth rate
exchange rate depreciation
II. Set the prior centered at [0 1.5 0.5 0]′ with standard deviations [10 0.1 0.1 0.1]
III. Derive the posterior distribution
IV. Make a plot (prior / likelihood / posterior) for each parameter
V. Repeat the above using the Normal-Gamma prior and compare the posterior mean with the values from point III
Theme C. Bayesian model averaging
Bayes rule in econometrics
Let us consider a model for y with K potential regressors from a set:
X = {x₁, x₂, …, x_K}
There are 2^K different subsets X_m ⊆ X, hence 2^K potential models M_m:
y = α_m + X_m β_m + ε
Which specification M_m should be selected? This problem is especially difficult for large K!
Bayesian inference helps to tackle this problem
BMA – Bayesian model averaging / BMS – Bayesian model selection
Bayes theorem in model selection
The Bayes theorem implies:
P(M_m|y) = p(y|M_m) P(M_m) / p(y)
where
p(y) = Σ_{m=1}^{2^K} p(y|M_m) P(M_m)
P(M_m|y) – posterior probability of model M_m
P(M_m) – prior probability of model M_m
p(y|M_m) – marginal likelihood of model M_m
We need a method to calculate p(y|M_m) and choose P(M_m)
Marginal likelihood
For model M_m:
y = α_m + X_m β_m + ε,  ε ∼ N(0, σ_m² I)
the marginal likelihood is
p(y|M_m) = ∫ L(θ_m; y, M_m) p(θ_m|M_m) dθ_m
where θ_m = (α_m, β_m, σ_m²)
Marginal likelihood, Zellner g-prior
Zellner g-prior:
p(α_m) ∝ 1,  p(σ_m) ∝ σ_m⁻¹
β_m|g ∼ N(0, g σ_m² (X_m′ X_m)⁻¹)
Posterior distribution:
E(β_m|y) = g/(1+g) · β̂_m,  where β̂_m is the ML estimate of the parameters
For g → 0 the posterior is equal to the prior
For g = 1 the posterior puts equal weight on the prior and the likelihood
For g = T the prior has the equivalent weight of 1 observation
For g → ∞ the prior is uniform (non-informative)
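The bullet points above follow from the posterior mean E(β_m|y) = g/(1+g) · β̂_m; a tiny illustration with a hypothetical ML estimate:

```python
import numpy as np

beta_hat = np.array([2.0, -1.0])   # hypothetical ML estimate (assumed)

# Shrinkage factor g/(1+g): 0 pulls the posterior mean to the prior mean
# (zero), 1 is recovered as g grows and the prior becomes non-informative.
shrinkage = {g: g / (1.0 + g) for g in [0.0, 1.0, 100.0]}
post_means = {g: s * beta_hat for g, s in shrinkage.items()}
print(post_means)
```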
Marginal likelihood: Zellner-g prior
Marginal likelihood of data:
p(y|M_m, g) = Γ((T−1)/2) π^(−(T−1)/2) T^(−1/2) ((y − ȳ)′(y − ȳ))^(−(T−1)/2) (1 + g(1 − R_m²))^(−(T−1)/2) (1 + g)^((T−1−k_m)/2)
p(y|M_m, g) = const · (1 + g(1 − R_m²))^(−(T−1)/2) (1 + g)^((T−1−k_m)/2)
Bayes factor (relative marginal likelihood) for M_i and M_j:
B(M_i, M_j) = p(y|M_i, g) / p(y|M_j, g) = [ (1 + g(1 − R_i²)) / (1 + g(1 − R_j²)) ]^(−(T−1)/2) (1 + g)^((k_j − k_i)/2)
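Since the Bayes factor depends on the data only through T, each model's R², and the model sizes, it can be sketched directly from the formula above; the comparison below uses hypothetical R² values and model sizes:

```python
from math import exp, log

def log_ml(r2, T, k, g):
    # Log marginal likelihood under the Zellner g-prior, dropping the
    # model-independent constant (it cancels in Bayes factors).
    return 0.5 * (T - 1 - k) * log(1 + g) - 0.5 * (T - 1) * log(1 + g * (1 - r2))

# Hypothetical comparison: M_i fits better (R^2 = 0.40) with fewer
# regressors (k = 2) than M_j (R^2 = 0.35, k = 3).
T, g = 100, 100.0
bf = exp(log_ml(0.40, T, 2, g) - log_ml(0.35, T, 3, g))
print(bf)
```

Working in logs avoids overflow when T is large; a Bayes factor above 1 favours M_i.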
Prior of the model:
Methods of choosing the prior P(M_m):
Uniform prior: P(M_m) = 2^(−K)
[each model is equally probable, expected model size K/2]
Binomial prior: P(M_m) = π^(k_m) (1 − π)^(K − k_m)
[π – fixed probability of including each regressor, expected model size Kπ]
Custom prior inclusion probabilities: π_j is individual for each variable
Beta-binomial prior: π ∼ Beta(a, b) is a random variable
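The expected model sizes quoted in brackets can be verified by enumerating all 2^K inclusion patterns for a small K; the values of K and π below are assumptions:

```python
from itertools import product

K = 4                                       # number of candidate regressors (assumed)
models = list(product([0, 1], repeat=K))    # all 2^K inclusion patterns

# Uniform prior: P(M) = 2^{-K}; expected model size = K/2
p_uni = [2.0 ** -K for m in models]
size_uni = sum(p * sum(m) for p, m in zip(p_uni, models))

# Binomial prior with inclusion probability pi: P(M) = pi^k (1-pi)^{K-k}
pi = 0.25                                   # assumed; expected model size = K * pi
p_bin = [pi ** sum(m) * (1 - pi) ** (K - sum(m)) for m in models]
size_bin = sum(p * sum(m) for p, m in zip(p_bin, models))
print(size_uni, size_bin)
```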
BMS: example
Exercises
Exercise 1.
Select a country and a variable from the dataJIE.csv file. Evaluate the factors that were most important for this variable using the BMA/BMS methodology
Exercise 2.
Download the data on GDP growth over 1960-1992 and 41 other variables in 72 countries with the commands:
data(datafls)
help(datafls)
Evaluate the factors that were most important for economic growth using the BMA/BMS methodology