
(1)

Mathematical Statistics

Anna Janicka

Lecture XIV, 1.06.2020

BAYESIAN STATISTICS

(2)

Plan for Today

1. Bayesian Statistics

a priori and a posteriori distributions

Bayesian estimation:

Maximum a posteriori probability (MAP)

Bayes Estimator

(3)

Bayesian Statistics vs. traditional statistics

Frequentist: unknown parameters are given (fixed), observed data are random

Bayesian: observed data are given (fixed), parameters are random

(4)

Bayesian Statistics

Our knowledge about the unknown parameters is described by means of probability distributions, and additional knowledge may affect our description.

Knowledge:

general

specific

Example: coin toss

(5)

Bayesian Model

X1, ..., Xn come from distribution Pθ, with density fθ(x) – the conditional density given a specific value of θ (the likelihood function).

P – family of probability distributions Pθ, indexed by the parameter θ ∈ Θ.

General knowledge: a distribution Π over the parameter space Θ, given by π(θ) – the so-called a priori/prior distribution of θ:

θ ~ Π

(6)

Bayesian Model – cont.

Additional knowledge (specific, contextual): based on observation. We have a joint distribution of the observations and θ:

$$f(x_1, x_2, \ldots, x_n, \theta) = f(x_1, x_2, \ldots, x_n \mid \theta)\,\pi(\theta)$$

On this basis we can derive the conditional distribution of θ given the observed data:

$$\pi(\theta \mid x_1, \ldots, x_n) = \frac{f(x_1, \ldots, x_n \mid \theta)\,\pi(\theta)}{m(x_1, \ldots, x_n)},$$

where

$$m(x_1, \ldots, x_n) = \int_\Theta f(x_1, \ldots, x_n \mid \theta)\,\pi(\theta)\,d\theta$$

is the marginal distribution of the observations.

(7)

Bayesian Model – a posteriori distribution

$\pi(\theta \mid x_1, \ldots, x_n)$ is called the a posteriori/posterior distribution, denoted $\Pi_x$.

The posterior distribution reflects all knowledge: general (initial) and specific (based on the observed data).

It is the grounds for Bayesian inference and modeling.
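As a sketch of this update, the posterior can be computed numerically on a grid; here assuming a Bernoulli likelihood, a uniform prior, and made-up data (all values illustrative):

```python
import numpy as np

# Grid approximation of pi(theta | x_1, ..., x_n) for a Bernoulli sample.
theta = np.linspace(0.001, 0.999, 999)   # grid over the parameter space Theta
prior = np.ones_like(theta)              # pi(theta): uniform prior (assumption)
x = np.array([1, 0, 1, 1, 0])            # observed 0-1 data (made up)

# Likelihood f(x_1, ..., x_n | theta) of the IID sample
likelihood = theta ** x.sum() * (1 - theta) ** (len(x) - x.sum())

# m(x_1, ..., x_n): marginal density of the data (Riemann-sum approximation)
m = np.sum(likelihood * prior) * (theta[1] - theta[0])

posterior = likelihood * prior / m       # pi(theta | x_1, ..., x_n)
```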

(8)

Prior and posterior distributions: examples

1. Let X1, ..., Xn be IID r.v. from a 0-1 distribution with probability of success θ; let

$$\pi(\theta) = \frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha, \beta)}$$

for θ ∈ (0,1), where

$$B(\alpha, \beta) = \int_0^1 u^{\alpha-1}(1-u)^{\beta-1}\,du = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}$$

and

$$\Gamma(\alpha) = \int_0^\infty u^{\alpha-1}\exp(-u)\,du = (\alpha-1)\Gamma(\alpha-1).$$

Then the posterior distribution is

$$\mathrm{Beta}\left(\sum_{i=1}^{n} x_i + \alpha,\ n - \sum_{i=1}^{n} x_i + \beta\right).$$

Beta(α, β) is the conjugate prior for the Bernoulli distribution; the Beta(α, β) distribution has mean α/(α+β).
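A minimal sketch of this conjugate update in scipy (the prior parameters and data are illustrative):

```python
from scipy import stats

alpha, beta = 1.0, 1.0        # Beta(alpha, beta) prior (here: uniform)
x = [1, 0, 1, 1, 0]           # illustrative sample: n = 5 trials, 3 successes

# Conjugacy: the posterior is Beta(sum(x) + alpha, n - sum(x) + beta)
posterior = stats.beta(sum(x) + alpha, len(x) - sum(x) + beta)
print(posterior.mean())       # (sum(x) + alpha)/(n + alpha + beta) = 4/7
```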

(9)

For a Beta(1,1) prior and data: n=10 and 1, 5, 9 successes

(10)

For a Beta(1,1) prior and data: n=100 and 10, 50, 90 successes

(11)

For a Beta(10,10) prior and data: n=10 and 1, 5, 9 successes

(12)

For a Beta(10,10) prior and data: n=100 and 10, 50, 90 successes

(13)

For a Beta(1,5) prior and data: n=10 and 1, 5, 9 successes

(14)

For a Beta(1,5) prior and data: n=100 and 10, 50, 90 successes
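The prior/posterior plots on slides (9)–(14) can be approximately reproduced with a sketch along these lines (a hypothetical reconstruction; the styling is guesswork):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

theta = np.linspace(0, 1, 500)
for a, b in [(1, 1), (10, 10), (1, 5)]:          # the three priors shown
    for n in (10, 100):                          # the two sample sizes shown
        plt.figure()
        plt.plot(theta, stats.beta(a, b).pdf(theta), "k--", label="prior")
        for s in (n // 10, n // 2, 9 * n // 10): # 1/5/9 or 10/50/90 successes
            post = stats.beta(s + a, n - s + b)  # conjugate posterior
            plt.plot(theta, post.pdf(theta), label=f"{s} successes")
        plt.title(f"Beta({a},{b}) prior, n={n}")
        plt.legend()
plt.show()
```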

(15)

Prior and posterior distributions: examples (2)

2. Let X1, ..., Xn be IID r.v. from N(θ, σ²), with σ² known; θ ~ N(m, τ²) for m, τ known.

Then the posterior distribution for θ is

$$N\left(\frac{\frac{n}{\sigma^2}\bar{X} + \frac{1}{\tau^2}m}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}},\ \frac{1}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}}\right).$$

N(m, τ²) is the conjugate prior for a normal distribution (with known variance).
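A sketch computing the posterior parameters from this formula (the sample below anticipates the example on slide (19)):

```python
import numpy as np

def normal_posterior(x, sigma2, m, tau2):
    """Posterior N(mean, var) for theta, given N(theta, sigma2) data
    and an N(m, tau2) prior (sigma2, m, tau2 known)."""
    n = len(x)
    precision = n / sigma2 + 1 / tau2            # posterior precision
    mean = (n / sigma2 * np.mean(x) + m / tau2) / precision
    return mean, 1 / precision

# N(theta, 4) data, N(1, 1) prior -> (14/9, 4/9)
print(normal_posterior([1.2, 1.7, 1.9, 2.1, 3.1], 4.0, 1.0, 1.0))
```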

(16)

Bayesian Statistics

Based on the Bayes approach, we can:

• find estimates

• find an equivalent of confidence intervals

• verify hypotheses

• make predictions

(17)

Bayesian Most Probable (BMP) / Maximum a posteriori Probability (MAP) estimate

Similar to ML estimation: the argument which maximizes the posterior distribution:

$$\pi(\hat\theta_{BMP} \mid x_1, \ldots, x_n) = \max_\theta \pi(\theta \mid x_1, \ldots, x_n),$$

i.e.

$$BMP(\theta) = \hat\theta_{BMP} = \operatorname*{argmax}_\theta \pi(\theta \mid x_1, \ldots, x_n).$$

(18)

BMP: examples

1. Let X1, ..., Xn be IID r.v. from a Bernoulli distribution with probability of success θ; for θ ∈ (0,1) let

$$\pi(\theta) = \frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha, \beta)}.$$

We know the posterior distribution:

$$\mathrm{Beta}\left(\sum_{i=1}^{n} x_i + \alpha,\ n - \sum_{i=1}^{n} x_i + \beta\right);$$

the mode of a Beta(α, β) distribution is (α−1)/(α+β−2) for α > 1, β > 1, so we have the maximum for

$$BMP(\theta) = \frac{\sum_{i=1}^{n} x_i + \alpha - 1}{n + \beta + \alpha - 2},$$

i.e. for 5 successes in 10 trials with a U(0,1) prior (i.e. a Beta(1,1) distribution) we have BMP(θ) = 5/10 = ½, and for 9 successes in 10 trials with the same prior we have BMP(θ) = 9/10.
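A quick numerical check of this mode formula (a sketch; the grid argmax should agree with the closed form):

```python
import numpy as np
from scipy import stats

n, successes, a, b = 10, 9, 1, 1          # U(0,1) = Beta(1,1) prior
theta = np.linspace(0.001, 0.999, 9999)   # grid over (0,1)
post = stats.beta(successes + a, n - successes + b).pdf(theta)
print(theta[np.argmax(post)])             # ~0.9 = (9 + 1 - 1)/(10 + 1 + 1 - 2)
```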

(19)

BMP: examples (2)

2. Let X1, ..., Xn be IID r.v. from N(θ, σ²), with σ² known; θ ~ N(m, τ²) for m, τ known.

Then the posterior distribution for θ is

$$N\left(\frac{\frac{n}{\sigma^2}\bar{X} + \frac{1}{\tau^2}m}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}},\ \frac{1}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}}\right),$$

so

$$BMP(\theta) = \frac{\frac{n}{\sigma^2}\bar{X} + \frac{1}{\tau^2}m}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}}.$$

I.e. if we have a sample of 5 observations 1.2, 1.7, 1.9, 2.1, 3.1 from the distribution N(θ, 4) (so X̄ = 2) and the prior is θ ~ N(1, 1), then

BMP(θ) = (5/4 · 2 + 1·1)/(5/4 + 1) = 14/9 ≈ 1.56,

and if the prior were θ ~ N(3, 1), then

BMP(θ) = (5/4 · 2 + 1·3)/(5/4 + 1) = 22/9 ≈ 2.44.
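A minimal sketch checking this arithmetic:

```python
x = [1.2, 1.7, 1.9, 2.1, 3.1]    # the sample from the slide
n, sigma2, tau2 = len(x), 4.0, 1.0
xbar = sum(x) / n                # = 2.0

for m in (1.0, 3.0):             # the two prior means considered
    bmp = (n / sigma2 * xbar + m / tau2) / (n / sigma2 + 1 / tau2)
    print(m, bmp)                # 14/9 ~ 1.56 for m=1, 22/9 ~ 2.44 for m=3
```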

(20)

Bayes Estimator

An estimation rule which minimizes the posterior expected value of a loss function.

L(θ, a) – the loss function; depends on the true value of θ and the decision a.

E.g., if we want to estimate g(θ):

L(θ, a) = (g(θ) − a)² – quadratic loss function

L(θ, a) = |g(θ) − a| – absolute (modulus) loss function

(21)

Bayes Estimator – cont.

We can also define the accuracy of an estimate for a given loss function:

$$acc(\Pi, \hat g(x)) = E\left[L(\theta, \hat g(x)) \mid X = x\right] = \int_\Theta L(\theta, \hat g(x))\,\pi(\theta \mid x)\,d\theta$$

(the average loss of the estimator for a given prior distribution and data, i.e. for a specific posterior distribution).

(22)

Bayes Estimator – cont. (2)

The Bayes estimator $\hat g_B$ for a given loss function L(θ, a) is such that

$$\forall x \quad acc(\Pi, \hat g_B(x)) = \min_a acc(\Pi, a).$$

For the quadratic loss function (θ − a)²:

$$\hat\theta_B = E(\theta \mid X = x) = E(\Pi_x)$$

(more generally: E(g(θ) | x)).

For the absolute loss function |θ − a|:

$$\hat\theta_B = \mathrm{Med}(\Pi_x).$$
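A numerical sketch of both facts, assuming an illustrative Beta(6,6) posterior and a grid of candidate actions: the action minimizing expected quadratic loss matches the posterior mean, and the one minimizing expected absolute loss matches the posterior median.

```python
import numpy as np
from scipy import stats

post = stats.beta(6, 6)                   # an illustrative posterior Pi_x
theta = np.linspace(0.001, 0.999, 2001)   # grid of candidate actions a
w = post.pdf(theta)
w /= w.sum()                              # discretized posterior weights

quad = [(w * (theta - a) ** 2).sum() for a in theta]   # E[(theta - a)^2 | x]
absl = [(w * np.abs(theta - a)).sum() for a in theta]  # E[|theta - a| | x]

print(theta[np.argmin(quad)], post.mean())    # argmin ~ posterior mean
print(theta[np.argmin(absl)], post.median())  # argmin ~ posterior median
```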

(23)

Bayes Estimator: Example (1)

1. Let X1, ..., Xn be IID r.v. from a Bernoulli distribution with probability of success θ; for θ ∈ (0,1) let

$$\pi(\theta) = \frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha, \beta)}.$$

We know the posterior distribution:

$$\mathrm{Beta}\left(\sum_{i=1}^{n} x_i + \alpha,\ n - \sum_{i=1}^{n} x_i + \beta\right);$$

a Beta(α, β) distribution has mean α/(α+β), so the Bayes estimator (for quadratic loss) is

$$\hat\theta_B = \frac{\sum_{i=1}^{n} x_i + \alpha}{n + \beta + \alpha},$$

i.e. for 5 successes in 10 trials with a U(0,1) prior (i.e. a Beta(1,1) distribution) the Bayes estimator is 6/12 = ½, and for 9 successes in 10 trials with the same prior it is 10/12 = 5/6.
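The two values above can be checked with scipy (a sketch; the Bayes estimator is the posterior Beta mean):

```python
from scipy import stats

for successes in (5, 9):                  # out of n = 10 trials, Beta(1,1) prior
    post = stats.beta(successes + 1, 10 - successes + 1)
    print(post.mean())                    # 6/12 = 0.5 and 10/12 ~ 0.833
```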

(24)

BMP: examples (repeated for comparison)

1. Let X1, ..., Xn be IID r.v. from a Bernoulli distribution with probability of success θ; for θ ∈ (0,1) let

$$\pi(\theta) = \frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha, \beta)}.$$

We know the posterior distribution:

$$\mathrm{Beta}\left(\sum_{i=1}^{n} x_i + \alpha,\ n - \sum_{i=1}^{n} x_i + \beta\right);$$

the mode of a Beta(α, β) distribution is (α−1)/(α+β−2) for α > 1, β > 1, so we have the maximum for

$$BMP(\theta) = \frac{\sum_{i=1}^{n} x_i + \alpha - 1}{n + \beta + \alpha - 2},$$

i.e. for 5 successes in 10 trials with a U(0,1) prior (i.e. a Beta(1,1) distribution) we have BMP(θ) = 5/10 = ½, and for 9 successes in 10 trials with the same prior we have BMP(θ) = 9/10.

(25)

Bayes Estimator: examples (2)

2. Let X1, ..., Xn be IID r.v. from N(θ, σ²), with σ² known; θ ~ N(m, τ²) for m, τ known.

Then the posterior distribution for θ is

$$N\left(\frac{\frac{n}{\sigma^2}\bar{X} + \frac{1}{\tau^2}m}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}},\ \frac{1}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}}\right),$$

so

$$\hat\theta_B = \frac{\frac{n}{\sigma^2}\bar{X} + \frac{1}{\tau^2}m}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}}.$$

I.e. if we have a sample of 5 observations 1.2, 1.7, 1.9, 2.1, 3.1 from the distribution N(θ, 4) (so X̄ = 2) and the prior is θ ~ N(1, 1), then the Bayes estimator is

(5/4 · 2 + 1·1)/(5/4 + 1) = 14/9 ≈ 1.56,

and if the prior were θ ~ N(3, 1), then it would be

(5/4 · 2 + 1·3)/(5/4 + 1) = 22/9 ≈ 2.44.

(26)

BMP: examples (2) (repeated for comparison)

2. Let X1, ..., Xn be IID r.v. from N(θ, σ²), with σ² known; θ ~ N(m, τ²) for m, τ known.

Then the posterior distribution for θ is

$$N\left(\frac{\frac{n}{\sigma^2}\bar{X} + \frac{1}{\tau^2}m}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}},\ \frac{1}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}}\right),$$

so

$$BMP(\theta) = \frac{\frac{n}{\sigma^2}\bar{X} + \frac{1}{\tau^2}m}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}}.$$

I.e. if we have a sample of 5 observations 1.2, 1.7, 1.9, 2.1, 3.1 from the distribution N(θ, 4) (so X̄ = 2) and the prior is θ ~ N(1, 1), then

BMP(θ) = (5/4 · 2 + 1·1)/(5/4 + 1) = 14/9 ≈ 1.56,

and if the prior were θ ~ N(3, 1), then

BMP(θ) = (5/4 · 2 + 1·3)/(5/4 + 1) = 22/9 ≈ 2.44.
