Mathematical Statistics
Anna Janicka
Lecture XV, 3.06.2019
BAYESIAN STATISTICS – CONT.
Plan for Today
1. Bayesian Statistics
a priori and a posteriori distributions
Bayesian estimation:
Maximum a posteriori probability (MAP)
Bayes Estimator
Bayesian Model – reminder
$X_1, \dots, X_n$ come from distribution $P_\theta$, with density $f_\theta(x)$ – the conditional density given a specific value of $\theta$
$\mathcal{P}$ – family of probability distributions $P_\theta$
General knowledge: distribution $\Pi$ over the parameter space $\Theta$, given by $\pi(\theta)$ (prior)
Additional knowledge (specific): data values
Based on this distribution, we find the conditional distribution of $\theta$ given the data:
$$\pi(\theta \mid x_1, \dots, x_n) = \frac{f(x_1, \dots, x_n \mid \theta)\,\pi(\theta)}{m(x_1, \dots, x_n)},$$
where
$$m(x_1, \dots, x_n) = \int_\Theta f(x_1, \dots, x_n \mid \theta)\,\pi(\theta)\,d\theta.$$
Bayesian Model – posterior distribution – reminder
The distribution $\pi(\theta \mid x_1, \dots, x_n)$ is called the a posteriori / posterior distribution, denoted $\Pi_x$.
The posterior distribution reflects all knowledge: general (initial) and specific (based on the observed data).
It is the grounds for Bayesian inference and modeling.
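To make the formula concrete, here is a minimal numerical sketch of the prior-times-likelihood update over a discrete parameter grid; the grid, prior, and data values below are hypothetical illustrations, not from the lecture.

```python
import numpy as np

# Hypothetical illustration: posterior over a grid of theta values
# for Bernoulli data, computed directly from Bayes' formula.
theta = np.linspace(0.01, 0.99, 99)       # grid over the parameter space
prior = np.ones_like(theta) / len(theta)  # discretized uniform prior pi(theta)
x = np.array([1, 0, 1, 1, 0])             # made-up 0-1 observations

# Likelihood f(x_1,...,x_n | theta) for IID Bernoulli observations
likelihood = theta ** x.sum() * (1 - theta) ** (len(x) - x.sum())

# Posterior = likelihood * prior / m(x), with m(x) the normalizing sum
posterior = likelihood * prior
posterior /= posterior.sum()
print(theta[np.argmax(posterior)])  # grid maximizer, close to 3/5
```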
A priori and a posteriori distributions: examples (reminder)
1. Let $X_1, \dots, X_n$ be IID r.v. from a 0-1 distr. with prob. of success $\theta$; let
$$\pi(\theta) = \frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha,\beta)} \quad \text{for } \theta \in (0,1),$$
where
$$B(\alpha,\beta) = \int_0^1 u^{\alpha-1}(1-u)^{\beta-1}\,du = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}$$
and
$$\Gamma(\alpha) = \int_0^\infty u^{\alpha-1}\exp(-u)\,du.$$
Then the posterior distribution is
$$\mathrm{Beta}\Big(\alpha + \sum_{i=1}^n x_i,\ \beta + n - \sum_{i=1}^n x_i\Big)$$
(conjugate prior for the Bernoulli distr.).
Note: the $\mathrm{Beta}(\alpha,\beta)$ distr. has mean $\alpha/(\alpha+\beta)$.
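As a sanity check, the conjugate update can be carried out in a few lines; a minimal sketch, with data and prior parameters assumed for illustration:

```python
import numpy as np
from scipy import stats

# Hypothetical data: 7 successes in 10 Bernoulli trials
x = np.array([1, 1, 0, 1, 1, 1, 0, 1, 0, 1])
alpha, beta = 2.0, 2.0                    # assumed Beta(2, 2) prior

# Conjugate update: Beta(alpha + sum x_i, beta + n - sum x_i)
post = stats.beta(alpha + x.sum(), beta + len(x) - x.sum())
print(post.mean())   # posterior mean (2 + 7) / (2 + 2 + 10) = 9/14
```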
A priori and a posteriori distributions: examples (2)
2. Let $X_1, \dots, X_n$ be IID r.v. from $N(\theta, \sigma^2)$, with $\sigma^2$ known; $\theta \sim N(m, \tau^2)$ for $m, \tau$ known.
Then the posterior distribution for $\theta$ is
$$N\left(\frac{\frac{n}{\sigma^2}\bar{X} + \frac{1}{\tau^2}m}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}},\ \frac{1}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}}\right)$$
(conjugate prior for a normal distr.).
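A minimal sketch of this precision-weighted update; the sample and prior values below are assumed for illustration:

```python
import numpy as np

# Hypothetical setup: sigma^2 known, prior N(m, tau^2)
x = np.array([0.8, 1.3, 2.0, 1.1])   # made-up sample
sigma2, m, tau2 = 1.0, 0.0, 4.0      # assumed values

prec = len(x) / sigma2 + 1.0 / tau2  # posterior precision: precisions add
post_mean = (len(x) / sigma2 * x.mean() + m / tau2) / prec
post_var = 1.0 / prec
print(post_mean, post_var)           # precision-weighted mean, its variance
```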
Bayesian Statistics
Based on the Bayes approach, we can:
find estimates
find an equivalent of confidence intervals
verify hypotheses
make predictions
Bayesian Most Probable (BMP) / Maximum a posteriori Probability (MAP) estimate
Similar to ML estimation: the argument which maximizes the posterior distribution:
$$\hat{\theta}_{BMP}(x) = \arg\max_\theta \pi(\theta \mid x_1, \dots, x_n),$$
i.e.
$$\pi(\hat{\theta}_{BMP} \mid x_1, \dots, x_n) = \max_\theta \pi(\theta \mid x_1, \dots, x_n).$$
BMP: examples
1. Let $X_1, \dots, X_n$ be IID r.v. from a Bernoulli distr. with prob. of success $\theta$, for $\theta \in (0,1)$, with prior
$$\pi(\theta) = \frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha,\beta)}.$$
We know the posterior distribution:
$$\mathrm{Beta}\Big(\alpha + \sum_{i=1}^n x_i,\ \beta + n - \sum_{i=1}^n x_i\Big)$$
(the mode of a $\mathrm{Beta}(\alpha,\beta)$ distr. is $(\alpha-1)/(\alpha+\beta-2)$ for $\alpha>1, \beta>1$),
so we have a maximum for
$$\hat{\theta}_{BMP} = \frac{\sum_{i=1}^n x_i + \alpha - 1}{n + \alpha + \beta - 2},$$
i.e. for 5 successes in 10 trials with a U(0,1) prior (i.e. the Beta(1,1) distr.) we have BMP($\theta$) = 5/10 = ½; if the prior were Beta(5,5), then BMP($\theta$) = 9/18 = ½; if the prior were Beta(1,5), then BMP($\theta$) = 5/14;
and for 9 successes in 10 trials with a U(0,1) prior we have BMP($\theta$) = 9/10; if the prior were Beta(5,5), then BMP($\theta$) = 13/18; if the prior were Beta(1,5), then BMP($\theta$) = 9/14.
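These mode values can be cross-checked by maximizing the posterior density on a grid; a small sketch of that check for the 5-successes case:

```python
import numpy as np
from scipy import stats

# 5 successes in 10 trials; compare grid MAP with the closed-form mode
n, s = 10, 5
for a, b in [(1, 1), (5, 5), (1, 5)]:
    theta = np.linspace(0.001, 0.999, 999)
    post = stats.beta(a + s, b + n - s).pdf(theta)   # posterior density
    closed_form = (s + a - 1) / (n + a + b - 2)      # mode formula
    print(theta[np.argmax(post)], closed_form)       # grid max vs formula
```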
BMP: examples (2)
2. Let $X_1, \dots, X_n$ be IID r.v. from $N(\theta, \sigma^2)$, with $\sigma^2$ known; $\theta \sim N(m, \tau^2)$ for $m, \tau$ known.
Then the posterior distr. for $\theta$ is
$$N\left(\frac{\frac{n}{\sigma^2}\bar{X} + \frac{1}{\tau^2}m}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}},\ \frac{1}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}}\right),$$
so
$$BMP(\theta) = \frac{\frac{n}{\sigma^2}\bar{X} + \frac{1}{\tau^2}m}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}},$$
i.e. if we have a sample of 5 obs. 1.2; 1.7; 1.9; 2.1; 3.1 from distr. $N(\theta, 4)$ and the prior distr. is $\theta \sim N(1, 1)$, then
BMP($\theta$) = (5/4 · 2 + 1)/(5/4 + 1) = 14/9 ≈ 1.56,
and if the prior distr. were $\theta \sim N(3, 1)$, then
BMP($\theta$) = (5/4 · 2 + 1·3)/(5/4 + 1) = 22/9 ≈ 2.44.
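A quick numerical check of both BMP values from this example:

```python
import numpy as np

x = np.array([1.2, 1.7, 1.9, 2.1, 3.1])   # the 5 observations, sigma^2 = 4
sigma2 = 4.0
for m, tau2 in [(1.0, 1.0), (3.0, 1.0)]:   # the two priors N(1,1) and N(3,1)
    prec = len(x) / sigma2 + 1.0 / tau2
    bmp = (len(x) / sigma2 * x.mean() + m / tau2) / prec
    print(bmp)  # 14/9 ~ 1.556 and 22/9 ~ 2.444
```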
Bayes Estimator
An estimation rule which minimizes the
posterior expected value of a loss function
L( θ , a) – loss function, depends on the true value of θ and the decision a.
e.g. if we want to estimate g( θ ):
$L(\theta, a) = (g(\theta) - a)^2$ – quadratic loss function
$L(\theta, a) = |g(\theta) - a|$ – modulus (absolute value) loss function
Bayes Estimator – cont.
We can also define the accuracy of an estimate for a given loss function:
$$\mathrm{acc}_\Pi(x, \hat{g}(x)) = E\big(L(\theta, \hat{g}(x)) \mid X = x\big) = \int_\Theta L(\theta, \hat{g}(x))\,\pi(\theta \mid x)\,d\theta$$
(the average loss of the estimator for a given a priori distribution and data, i.e. for a specific posterior distribution).
Bayes Estimator – cont. (2)
The Bayes Estimator $\hat{g}_B$ for a given loss function $L(\theta, a)$ is such that
$$\forall x \quad \mathrm{acc}_\Pi(x, \hat{g}_B(x)) = \min_a \mathrm{acc}_\Pi(x, a).$$
For a quadratic loss function $(\theta - a)^2$:
$$\hat{\theta}_B(x) = E(\theta \mid X = x) = E_{\Pi_x}\theta$$
(more generally: $E(g(\theta) \mid x)$).
For a modulus loss function $|\theta - a|$:
$$\hat{\theta}_B(x) = \mathrm{Med}_{\Pi_x}\theta.$$
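A minimal simulation sketch of these two facts, using an assumed Beta(2,6) posterior sample: the posterior mean minimizes the expected quadratic loss, and the posterior median minimizes the expected absolute loss.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.beta(2, 6, size=100_000)   # draws from an assumed posterior

# Expected posterior loss as a function of the decision a
a_grid = np.linspace(0.05, 0.5, 451)
quad = [np.mean((theta - a) ** 2) for a in a_grid]
absl = [np.mean(np.abs(theta - a)) for a in a_grid]

print(a_grid[np.argmin(quad)], theta.mean())       # both near the mean
print(a_grid[np.argmin(absl)], np.median(theta))   # both near the median
```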
Bayes Estimator: Example (1)
1. Let $X_1, \dots, X_n$ be IID r.v. from a Bernoulli distr. with prob. of success $\theta$, for $\theta \in (0,1)$, with prior
$$\pi(\theta) = \frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha,\beta)}.$$
We know the posterior distribution:
$$\mathrm{Beta}\Big(\alpha + \sum_{i=1}^n x_i,\ \beta + n - \sum_{i=1}^n x_i\Big)$$
(the $\mathrm{Beta}(\alpha,\beta)$ distr. has mean $\alpha/(\alpha+\beta)$),
so the Bayes Estimator is
$$\hat{\theta}_B = \frac{\sum_{i=1}^n x_i + \alpha}{n + \alpha + \beta},$$
i.e. for 5 successes in 10 trials with a U(0,1) prior (i.e. the Beta(1,1) distr.) we have $\hat{\theta}_B$ = 6/12 = ½; if the prior were Beta(5,5), then $\hat{\theta}_B$ = 10/20 = ½; if the prior were Beta(1,5), then $\hat{\theta}_B$ = 6/16;
and for 9 successes in 10 trials with a U(0,1) prior we have $\hat{\theta}_B$ = 10/12; if the prior were Beta(5,5), then $\hat{\theta}_B$ = 14/20; if the prior were Beta(1,5), then $\hat{\theta}_B$ = 10/16.
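These posterior means can again be checked numerically; a brief sketch covering all six cases above:

```python
from scipy import stats

# Posterior mean (alpha + s) / (alpha + beta + n) for the cases above
for s in (5, 9):                      # successes out of n = 10
    for a, b in [(1, 1), (5, 5), (1, 5)]:
        post = stats.beta(a + s, b + 10 - s)
        print(s, (a, b), post.mean())
```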
Bayes Estimator: examples (2)
2. Let $X_1, \dots, X_n$ be IID r.v. from $N(\theta, \sigma^2)$, with $\sigma^2$ known; $\theta \sim N(m, \tau^2)$ for $m, \tau$ known.
Then the posterior distr. for $\theta$ is
$$N\left(\frac{\frac{n}{\sigma^2}\bar{X} + \frac{1}{\tau^2}m}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}},\ \frac{1}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}}\right),$$
so
$$\hat{\theta}_B = \frac{\frac{n}{\sigma^2}\bar{X} + \frac{1}{\tau^2}m}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}},$$
i.e. if we have a sample of 5 obs. 1.2; 1.7; 1.9; 2.1; 3.1 from distr. $N(\theta, 4)$ and the a priori distr. is $\theta \sim N(1, 1)$, then
$\hat{\theta}_B$ = (5/4 · 2 + 1)/(5/4 + 1) = 14/9 ≈ 1.56,
and if the a priori distr. were $\theta \sim N(3, 1)$, then
$\hat{\theta}_B$ = (5/4 · 2 + 1·3)/(5/4 + 1) = 22/9 ≈ 2.44.
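Note that the normal posterior is symmetric, so its mean, median, and mode coincide; hence the Bayes estimator here (for quadratic or modulus loss) equals the BMP estimate. A one-line check for the posterior $N(14/9, 4/9)$ from this example:

```python
from scipy import stats

# Posterior N(14/9, 4/9): mean = median = mode for a normal distribution
post = stats.norm(loc=14/9, scale=2/3)
print(post.mean(), post.median(), 14/9)  # all three coincide
```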
Example problem
Let 0.38, 0.65, 0.72, 1.00 be independent realizations of a random variable from a uniform distribution over the interval (0, θ ), where θ >0 is an unknown parameter.
We initially assume that θ is uniformly distributed over the interval [1/2, 2].
Find the posterior distribution.
Find the Bayesian most probable estimator.
Find the Bayes estimator for a quadratic loss function.
Find the Bayes estimator for a modulus loss function.
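A numerical sketch of this problem (a grid approximation only; the answers can also be derived in closed form): the likelihood of a U(0, θ) sample is $\theta^{-4}$ for $\theta \ge \max_i x_i = 1.00$ and 0 otherwise, so with the uniform prior the posterior is proportional to $\theta^{-4}$ on $[1, 2]$.

```python
import numpy as np

x = np.array([0.38, 0.65, 0.72, 1.00])   # observed sample
theta = np.linspace(0.5, 2.0, 15_001)    # grid over the prior support [1/2, 2]

# Likelihood of U(0, theta) data: theta^(-n) when theta >= max(x), else 0
lik = np.where(theta >= x.max(), theta ** (-len(x)), 0.0)
post = lik / np.trapz(lik, theta)        # normalize (uniform prior cancels)

print(theta[np.argmax(post)])            # BMP/MAP estimate: 1.0
print(np.trapz(theta * post, theta))     # posterior mean ~ 9/7 ~ 1.286
cdf = np.cumsum(post) * (theta[1] - theta[0])
print(theta[np.searchsorted(cdf, 0.5)])  # posterior median ~ 1.211
```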
Example exam problem (1)
Example exam problem (2)
Highest Posterior Density Credible Interval
A $1-\alpha$ HPD (Highest Posterior Density) credible interval (Bayesian Confidence Interval) for parameter $\theta$ is a set $A \subseteq \Theta$ such that
$$\Pi(A \mid x) \ge 1 - \alpha$$
and
$$A = \{\theta : \pi(\theta \mid x) > k_\alpha\}$$
for the highest $k_\alpha$ such that the first condition is fulfilled.
The HPD credible interval has an intuitive interpretation – $\theta$ belongs to it with (posterior) probability $1-\alpha$ – which the frequentist CI does not have.
HPD Credible Interval: example
Let $X_1, \dots, X_n$ be IID r.v. from $N(\theta, \sigma^2)$, with $\sigma^2$ known; $\theta \sim N(m, \tau^2)$ for $m, \tau$ known.
Then the posterior distr. for $\theta$ is
$$N\left(\frac{\frac{n}{\sigma^2}\bar{X} + \frac{1}{\tau^2}m}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}},\ \frac{1}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}}\right),$$
so for $\alpha = 0.05$ we get the HPD CI:
$$\left[\frac{\frac{n}{\sigma^2}\bar{X} + \frac{1}{\tau^2}m}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}} - 1.96\cdot\frac{1}{\sqrt{\frac{n}{\sigma^2} + \frac{1}{\tau^2}}},\ \frac{\frac{n}{\sigma^2}\bar{X} + \frac{1}{\tau^2}m}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}} + 1.96\cdot\frac{1}{\sqrt{\frac{n}{\sigma^2} + \frac{1}{\tau^2}}}\right],$$
i.e. if we have a sample of 5 obs. from distr. $N(\theta, 4)$ with mean 2 and the a priori distr. is $\theta \sim N(1, 1)$, then, since $u_{0.975} \approx 1.96$, the posterior is $N(14/9, 4/9)$ and the HPD CI is
$$\left[\tfrac{14}{9} - 1.96\cdot\tfrac{2}{3},\ \tfrac{14}{9} + 1.96\cdot\tfrac{2}{3}\right] \approx [0.25,\ 2.86].$$
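A closing sketch computing this interval numerically (same example values):

```python
import numpy as np
from scipy import stats

n, xbar, sigma2 = 5, 2.0, 4.0     # sample size, sample mean, known variance
m, tau2, alpha = 1.0, 1.0, 0.05   # N(1, 1) prior, 95% credibility

prec = n / sigma2 + 1.0 / tau2
post = stats.norm(loc=(n / sigma2 * xbar + m / tau2) / prec,
                  scale=prec ** -0.5)
# For a symmetric (normal) posterior, the HPD interval is the central
# interval between the alpha/2 and 1 - alpha/2 quantiles.
print(post.ppf([alpha / 2, 1 - alpha / 2]))  # ~ [0.25, 2.86]
```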