Mathematical Statistics
Anna Janicka
Lecture XIV, 1.06.2020
BAYESIAN STATISTICS
Plan for Today
1. Bayesian Statistics
a priori and a posteriori distributions
Bayesian estimation:
Maximum a posteriori probability (MAP)
Bayes Estimator
Bayesian Statistics vs. traditional statistics
Frequentist: unknown parameters are given (fixed), observed data are random
Bayesian: observed data are given (fixed), parameters are random
Bayesian Statistics
Our knowledge about the unknown parameters is described by means of probability distributions, and additional knowledge may affect our description.
Knowledge:
general
specific
Example: coin toss
Bayesian Model
X1, ..., Xn come from distribution Pθ, with density fθ(x) – the conditional density given a specific value of θ (the likelihood function).
P – family of probability distributions Pθ, indexed by the parameter θ∈Θ
General knowledge: distribution Π over the parameter space Θ, given by π(θ) – the so-called a priori/prior distribution of θ:
θ ~ Π
Bayesian Model – cont.
Additional knowledge (specific, contextual) is based on observation. We have a joint distribution of the observations and θ:

f(x1, ..., xn, θ) = f(x1, ..., xn | θ) π(θ)

On this basis we can derive the conditional distribution of θ given the observed data:

π(θ | x1, ..., xn) = f(x1, ..., xn | θ) π(θ) / m(x1, ..., xn),

where m(x1, ..., xn) = ∫_Θ f(x1, ..., xn | θ) π(θ) dθ is the marginal distribution of the observations.
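The recipe above (posterior ∝ likelihood × prior, normalized by the marginal m) can be sketched numerically on a discrete grid of θ values. This is an illustration only, not part of the lecture; the Bernoulli likelihood and all function names are assumed for the example.

```python
# Sketch (illustration, not from the lecture): computing a posterior on a
# discrete grid of theta values, following posterior = likelihood * prior / m.
def grid_posterior(thetas, prior, likelihood):
    joint = [likelihood(t) * p for t, p in zip(thetas, prior)]  # f(x|theta) * pi(theta)
    m = sum(joint)                        # marginal m(x), up to the grid spacing
    return [j / m for j in joint]         # pi(theta | x)

# Assumed example: 3 successes in 5 Bernoulli trials, uniform prior on the grid.
thetas = [i / 100 for i in range(1, 100)]
prior = [1.0] * len(thetas)
post = grid_posterior(thetas, prior, lambda t: t**3 * (1 - t)**2)
print(max(zip(post, thetas))[1])  # posterior mode on the grid, 0.6
```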
Bayesian Model – a posteriori distribution
π(θ | x1, ..., xn) is called the a posteriori/posterior distribution, denoted Πx. The posterior distribution reflects all knowledge: general (initial, from the prior) and specific (based on the observed data). It is the grounds for Bayesian inference and modeling.
Prior and posterior distributions: examples
1. Let X1, ..., Xn be IID r.v. from a 0-1 (Bernoulli) distribution with probability of success θ; let

π(θ) = θ^(α−1) (1−θ)^(β−1) / B(α, β) for θ∈(0,1),

where B(α, β) = ∫_0^1 u^(α−1) (1−u)^(β−1) du = Γ(α)Γ(β) / Γ(α+β)

and Γ(α) = ∫_0^∞ u^(α−1) exp(−u) du = (α−1)Γ(α−1).

Then the posterior distribution is

Beta(Σ_{i=1}^n x_i + α, n − Σ_{i=1}^n x_i + β).

The Beta distribution is the conjugate prior for the Bernoulli distribution. The Beta(α, β) distribution has mean α/(α+β).
[Figures: prior vs. posterior densities]
For a Beta(1,1) prior and data: n=10 and 1, 5, 9 successes
For a Beta(1,1) prior and data: n=100 and 10, 50, 90 successes
For a Beta(10,10) prior and data: n=10 and 1, 5, 9 successes
For a Beta(10,10) prior and data: n=100 and 10, 50, 90 successes
For a Beta(1,5) prior and data: n=10 and 1, 5, 9 successes
For a Beta(1,5) prior and data: n=100 and 10, 50, 90 successes
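The conjugate update behind these figures is a one-liner: a Beta(α, β) prior and s successes in n trials give a Beta(α + s, β + n − s) posterior. A minimal sketch (function names are mine, not from the lecture):

```python
# Sketch: Beta-Bernoulli conjugate update. Under a Beta(alpha, beta) prior,
# s successes in n Bernoulli trials give a Beta(alpha + s, beta + n - s) posterior.
def beta_posterior(alpha, beta, n, s):
    return alpha + s, beta + n - s

def beta_mean(a, b):
    # mean of the Beta(a, b) distribution
    return a / (a + b)

# E.g. the Beta(1,1) (uniform) prior with n=10 trials, as in the first figure:
for s in (1, 5, 9):
    a, b = beta_posterior(1, 1, 10, s)
    print(f"s={s}: posterior Beta({a},{b}), mean {beta_mean(a, b):.3f}")
```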
Prior and posterior distributions: examples (2)
2. Let X1, ..., Xn be IID r.v. from N(θ, σ²), with σ² known; θ ~ N(m, τ²) for m, τ known.
Then the posterior distribution for θ is

N( (n/σ² · X̄ + m/τ²) / (n/σ² + 1/τ²), 1 / (n/σ² + 1/τ²) ),

i.e. the normal distribution is the conjugate prior for a normal distribution (with known variance).
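The Normal-Normal posterior parameters can be computed directly from this formula; a minimal sketch (function name is mine), checked against the numbers of the worked example later in the lecture (X̄ = 2, n = 5, σ² = 4, prior N(1, 1)):

```python
# Sketch: posterior mean and variance for the Normal-Normal conjugate pair,
# with sigma2 known and a N(m, tau2) prior on theta.
def normal_posterior(xbar, n, sigma2, m, tau2):
    precision = n / sigma2 + 1 / tau2                  # posterior precision
    mean = (n / sigma2 * xbar + m / tau2) / precision  # precision-weighted average
    return mean, 1 / precision                         # (posterior mean, posterior variance)

print(normal_posterior(2.0, 5, 4.0, 1.0, 1.0))  # mean 14/9, variance 4/9
```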
Bayesian Statistics
Based on the Bayes approach, we can
find estimates
find an equivalent of confidence intervals
verify hypotheses
make predictions
Bayesian Most Probable (BMP) / Maximum a posteriori Probability (MAP) estimate
Similar to ML estimation: the argument which maximizes the posterior distribution:

π(θ̂_BMP | x1, ..., xn) = max_θ π(θ | x1, ..., xn),

i.e.

BMP(θ) = θ̂_BMP = argmax_θ π(θ | x1, ..., xn)
BMP: examples
1. Let X1, ..., Xn be IID r.v. from a Bernoulli distr. with prob. of success θ; for θ∈(0,1) take the prior

π(θ) = θ^(α−1) (1−θ)^(β−1) / B(α, β).

We know the posterior distribution: Beta(Σ_{i=1}^n x_i + α, n − Σ_{i=1}^n x_i + β). The mode of a Beta(α, β) distr. is (α−1)/(α+β−2) for α>1, β>1, so we have the maximum for

BMP(θ) = (Σ_{i=1}^n x_i + α − 1) / (n + α + β − 2),

i.e. for 5 successes in 10 trials with a prior U(0,1) (i.e. the Beta(1,1) distr.), we have BMP(θ) = 5/10 = ½,
and for 9 successes in 10 trials with the same prior distr., we have BMP(θ) = 9/10.
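This BMP formula is easy to check in code; a minimal sketch (function name is mine), valid when both posterior Beta parameters exceed 1:

```python
# Sketch: BMP/MAP estimate as the posterior Beta mode. Requires the posterior
# parameters a = alpha + s and b = beta + n - s to both exceed 1.
def bmp_bernoulli(alpha, beta, n, s):
    a, b = alpha + s, beta + n - s      # posterior Beta(a, b)
    return (a - 1) / (a + b - 2)        # mode of Beta(a, b) for a, b > 1

print(bmp_bernoulli(1, 1, 10, 5))  # 0.5
print(bmp_bernoulli(1, 1, 10, 9))  # 0.9
```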
BMP: examples (2)
2. Let X1, ..., Xn be IID r.v. from N(θ, σ²), with σ² known; θ ~ N(m, τ²) for m, τ known.
Then the posterior distr. for θ is

N( (n/σ² · X̄ + m/τ²) / (n/σ² + 1/τ²), 1 / (n/σ² + 1/τ²) ),

so

BMP(θ) = (n/σ² · X̄ + m/τ²) / (n/σ² + 1/τ²),

i.e. if we have a sample of 5 obs. 1.2; 1.7; 1.9; 2.1; 3.1 from the distr. N(θ, 4) and the prior distr. is θ ~ N(1, 1), then
BMP(θ) = (5/4 · 2 + 1·1)/(5/4 + 1) = 14/9 ≈ 1.56,
and if the prior distr. were θ ~ N(3, 1), then
BMP(θ) = (5/4 · 2 + 1·3)/(5/4 + 1) = 22/9 ≈ 2.44.
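These arithmetic steps can be reproduced directly (a sketch; the function name is mine):

```python
# Sketch: BMP estimate in the normal model, using the lecture's sample
# (sigma2 = 4 known, N(m, 1) prior).
xs = [1.2, 1.7, 1.9, 2.1, 3.1]
n, sigma2, tau2 = len(xs), 4.0, 1.0
xbar = sum(xs) / n  # = 2.0

def bmp_normal(xbar, n, sigma2, m, tau2):
    # posterior mean = mode for a normal posterior
    return (n / sigma2 * xbar + m / tau2) / (n / sigma2 + 1 / tau2)

print(bmp_normal(xbar, n, sigma2, 1.0, tau2))  # 14/9 ≈ 1.556
print(bmp_normal(xbar, n, sigma2, 3.0, tau2))  # 22/9 ≈ 2.444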
Bayes Estimator
An estimation rule which minimizes the posterior expected value of a loss function.
L(θ, a) – loss function; depends on the true value of θ and the decision a.
E.g. if we want to estimate g(θ):
L(θ, a) = (g(θ) − a)² – quadratic loss function
L(θ, a) = |g(θ) − a| – absolute (modulus) loss function
Bayes Estimator – cont.
We can also define the accuracy of an estimate for a given loss function:

acc(Π, ĝ(x)) = E[ L(θ, ĝ(x)) | X = x ] = ∫_Θ L(θ, ĝ(x)) π(θ|x) dθ

(the average loss of the estimator for a given prior distribution and data, i.e. for a specific posterior distribution).
Bayes Estimator – cont. (2)
The Bayes Estimator ĝ_B for a given loss function L(θ, a) is such that

∀x: acc(Π, ĝ_B(x)) = min_a acc(Π, a).

For the quadratic loss function (θ − a)²:

θ̂_B = E(θ | X = x) = E(Πx) (more generally: E(g(θ)|x)).

For the absolute loss function |θ − a|:

θ̂_B = Med(Πx) (the posterior median).
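To see why quadratic loss leads to the posterior mean and absolute loss to the posterior median, one can minimize the expected posterior loss numerically. A sketch under assumed inputs: the posterior is discretized on a grid, and Beta(10, 2) is chosen only as an illustrative (skewed) posterior.

```python
# Sketch: minimizing expected posterior loss on a grid, for an assumed
# Beta(10, 2) posterior. Quadratic loss -> posterior mean (10/12 ≈ 0.833);
# absolute loss -> posterior median (≈ 0.853). Values differ because the
# distribution is skewed.
a, b = 10, 2
grid = [i / 1000 for i in range(1, 1000)]
w = [t**(a - 1) * (1 - t)**(b - 1) for t in grid]  # unnormalized posterior weights
W = sum(w)

def expected_loss(est, loss):
    # acc(Pi, est): average loss under the (discretized) posterior
    return sum(loss(t, est) * wi for t, wi in zip(grid, w)) / W

quad = min(grid, key=lambda e: expected_loss(e, lambda t, e_: (t - e_)**2))
absl = min(grid, key=lambda e: expected_loss(e, lambda t, e_: abs(t - e_)))
print(quad, absl)
```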
Bayes Estimator: Example (1)
1. Let X1, ..., Xn be IID r.v. from a Bernoulli distr. with prob. of success θ; for θ∈(0,1) take the prior

π(θ) = θ^(α−1) (1−θ)^(β−1) / B(α, β).

We know the posterior distribution: Beta(Σ_{i=1}^n x_i + α, n − Σ_{i=1}^n x_i + β), and the Beta(α, β) distr. has mean α/(α+β), so the Bayes Estimator (for quadratic loss) is

θ̂_B = (Σ_{i=1}^n x_i + α) / (n + α + β),

i.e. for 5 successes in 10 trials with a prior U(0,1) (i.e. the Beta(1,1) distr.), we have θ̂_B = 6/12 = ½,
and for 9 successes in 10 trials with the same prior distr., we have θ̂_B = 10/12 = 5/6.
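A minimal sketch of this estimator (function name is mine), reproducing both worked values:

```python
# Sketch: Bayes Estimator under quadratic loss = posterior mean,
# for a Beta(alpha, beta) prior and s successes in n Bernoulli trials.
def bayes_bernoulli(alpha, beta, n, s):
    return (s + alpha) / (n + alpha + beta)

print(bayes_bernoulli(1, 1, 10, 5))  # 6/12 = 0.5
print(bayes_bernoulli(1, 1, 10, 9))  # 10/12 ≈ 0.833
```

Note how the prior pseudo-counts α and β pull the estimate away from the raw proportion s/n, unlike the BMP estimate with a uniform prior.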
Bayes Estimator: examples (2)
2. Let X1, ..., Xn be IID r.v. from N(θ, σ²), with σ² known; θ ~ N(m, τ²) for m, τ known.
Then the posterior distr. for θ is

N( (n/σ² · X̄ + m/τ²) / (n/σ² + 1/τ²), 1 / (n/σ² + 1/τ²) ),

so (as the mean of the posterior)

θ̂_B = (n/σ² · X̄ + m/τ²) / (n/σ² + 1/τ²),

i.e. if we have a sample of 5 obs. 1.2; 1.7; 1.9; 2.1; 3.1 from the distr. N(θ, 4) and the prior distr. is θ ~ N(1, 1), then
θ̂_B = (5/4 · 2 + 1·1)/(5/4 + 1) = 14/9 ≈ 1.56,
and if the prior distr. were θ ~ N(3, 1), then
θ̂_B = (5/4 · 2 + 1·3)/(5/4 + 1) = 22/9 ≈ 2.44.