
(1)

Mathematical Statistics

Anna Janicka

Lecture XIV, 27.05.2019

BAYESIAN STATISTICS

(2)

Plan for Today

1. Chi-squared tests – cont.

2. Bayesian Statistics

a priori and a posteriori distributions
Bayesian estimation:
Maximum a posteriori probability (MAP)
Bayes Estimator

(3)

Chi-squared goodness-of-fit test – reminder.

General form of the test:

$$\chi^2 = \sum \frac{(\text{observed value} - \text{expected value})^2}{\text{expected value}}$$

here:

$$\chi^2 = \sum_{i=1}^{k} \frac{(N_i - np_i)^2}{np_i} \quad \text{or} \quad \chi^2(\theta) = \sum_{i=1}^{k} \frac{(N_i - np_i(\hat{\theta}))^2}{np_i(\hat{\theta})}$$

Theorem. If H₀ is true, for n→∞ the distribution of the χ² statistic converges to a chi-squared distribution with k−1 degrees of freedom, χ²(k−1), or to a chi-squared distribution with k−d−1 degrees of freedom, χ²(k−d−1) (depending on the dimension d of the unknown parameter θ).

(4)

Chi-squared goodness-of-fit test – version for continuous distributions

Kolmogorov tests are better, but the chi-squared test may also be used.

Model: X₁, X₂, ..., Xₙ are an IID sample from a continuous distribution.

H₀: The distribution is given by F
H₁: ¬H₀ (i.e. the distribution is different)

It suffices to divide the range of values of the random variable into classes and count the observations. The expected values are known (they result from F). Then: apply the chi-squared test.

(5)

Chi-squared goodness-of-fit test – practical notes

The test should be used for large samples. The expected counts can't be too small (< 5); if they are smaller, observations should be grouped.

The classes in the "continuous" version may be chosen arbitrarily, but it is best if the theoretical probabilities are balanced.
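As an illustration, a minimal sketch of the binned chi-squared goodness-of-fit test for a continuous H₀ (Python with scipy; the sample, the choice of N(0,1) as F, and the number of classes are all assumptions made for the example):

```python
import numpy as np
from scipy import stats

# Hypothetical data: test H0 "the sample comes from N(0, 1)".
rng = np.random.default_rng(0)
x = rng.normal(size=200)  # assumed sample; replace with real observations
n = len(x)

# Choose classes so the theoretical probabilities are balanced:
# boundaries at the quantiles of F (here N(0,1)), k = 8 classes.
k = 8
edges = stats.norm.ppf(np.linspace(0, 1, k + 1))  # -inf, ..., +inf
observed, _ = np.histogram(x, bins=edges)
expected = np.full(k, n / k)  # equal probabilities => equal expected counts

chi2 = ((observed - expected) ** 2 / expected).sum()
# k - 1 degrees of freedom (no parameters estimated from the data here)
p_value = stats.chi2.sf(chi2, df=k - 1)
print(chi2, p_value)
```

With the quantile-based classes, each expected count is n/k = 25 ≥ 5, so the practical rule above is satisfied.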

(6)

Chi-squared test of independence

Model: (X₁,Y₁), ..., (Xₙ,Yₙ) are an IID sample from a two-dimensional distribution with r·s values (denoted by the set {1, ..., r} × {1, ..., s}).

Let the theoretical distribution be

$$p_{ij} = P(X = i, Y = j), \quad i = 1, \ldots, r, \; j = 1, \ldots, s.$$

Denote

$$p_{i\bullet} = \sum_{j=1}^{s} p_{ij}, \qquad p_{\bullet j} = \sum_{i=1}^{r} p_{ij}.$$

We want to verify independence of X and Y:

H₀: p_{ij} = p_{i•} p_{•j} for i = 1, ..., r, j = 1, ..., s
H₁: ¬H₀

(7)

Chi-squared test of independence – cont.

The empirical distribution may be summarized by a table (the so-called contingency table, or crosstab):

i \ j   1     2     ...   s     Ni•
1       N11   N12   ...   N1s   N1•
2       N21   N22   ...   N2s   N2•
...
r       Nr1   Nr2   ...   Nrs   Nr•
N•j     N•1   N•2   ...   N•s   n

(8)

Chi-squared test of independence – cont. (2)

This is a special case of a goodness-of-fit test with (r−1) + (s−1) parameters to be estimated: the marginal probabilities, estimated by p̂i• = Ni•/n and p̂•j = N•j/n.

The test statistic:

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{s} \frac{(N_{ij} - N_{i\bullet} N_{\bullet j}/n)^2}{N_{i\bullet} N_{\bullet j}/n}$$

has a chi-squared distribution with (r−1)(s−1) degrees of freedom (if H₀ is true).

(9)

Chi-squared test of independence – example

We verify independence of political and musical preferences, for significance level α = 0.05.

Source: W. Niemiro

                    Support X   Do not support X   Total
Listen to jazz      25          10                 35
Listen to rock      20          20                 40
Listen to hip-hop   15          10                 25
Total               60          40                 100

$$\chi^2 = \frac{(25 - 60 \cdot 35/100)^2}{60 \cdot 35/100} + \frac{(20 - 60 \cdot 40/100)^2}{60 \cdot 40/100} + \frac{(15 - 60 \cdot 25/100)^2}{60 \cdot 25/100} + \frac{(10 - 40 \cdot 35/100)^2}{40 \cdot 35/100} + \frac{(20 - 40 \cdot 40/100)^2}{40 \cdot 40/100} + \frac{(10 - 40 \cdot 25/100)^2}{40 \cdot 25/100} \approx 3.57$$

$$\chi^2_{1-0.05}\big((2-1)(3-1)\big) = \chi^2_{0.95}(2) \approx 5.99$$

Since 3.57 < 5.99 → no grounds to reject H₀.
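For reference, a quick sketch reproducing this example in Python (assuming scipy is available); scipy.stats.chi2_contingency computes the same statistic, degrees of freedom, and the p-value directly:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Contingency table from the example (rows: jazz, rock, hip-hop;
# columns: support X, do not support X).
table = np.array([[25, 10],
                  [20, 20],
                  [15, 10]])

chi2, p_value, df, expected = chi2_contingency(table, correction=False)
print(chi2)      # ~3.571
print(df)        # (3-1)*(2-1) = 2
print(p_value)   # ~0.168 > 0.05 => no grounds to reject H0
```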

(10)

Bayesian Statistics vs. traditional statistics

Frequentist: unknown parameters are given (fixed), observed data are random.

Bayesian: observed data are given (fixed), parameters are random.

(11)

Bayesian Statistics

Our knowledge about the unknown parameters is described by means of probability distributions, and additional knowledge may affect our description.

Knowledge:

general
specific

Example: coin toss

(12)

Bayesian Model

X₁, ..., Xₙ come from distribution P_θ, with density f_θ(x) – the conditional density given a specific value of θ (the likelihood function).

P – family of probability distributions P_θ, indexed by the parameter θ ∈ Θ.

General knowledge: distribution Π over the parameter space Θ, given by π(θ) – the so-called a priori/prior distribution of θ:

θ ~ Π

(13)

Bayesian Model – cont.

Additional knowledge (specific, contextual): based on observation. We have a joint distribution of observations and θ:

$$f(x_1, x_2, \ldots, x_n, \theta) = f(x_1, x_2, \ldots, x_n \mid \theta)\, \pi(\theta)$$

On this basis we can derive the conditional distribution of θ (given the observed data):

$$\pi(\theta \mid x_1, \ldots, x_n) = \frac{f(x_1, \ldots, x_n \mid \theta)\, \pi(\theta)}{m(x_1, \ldots, x_n)}$$

where

$$m(x_1, \ldots, x_n) = \int_{\Theta} f(x_1, \ldots, x_n \mid \theta)\, \pi(\theta)\, d\theta$$

is the marginal distribution of the observations.

(14)

Bayesian Model – a posteriori distribution

π(θ | x₁, ..., xₙ) is called the a posteriori/posterior distribution, denoted Π_x.

The posterior distribution reflects all knowledge: general (initial) and specific (based on the observed data).

It is the grounds for Bayesian inference and modeling.

(15)

A priori and a posteriori distributions: examples

1. Let X₁, ..., Xₙ be IID r.v. from a 0-1 distribution with probability of success θ; let

$$\pi(\theta) = \frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha, \beta)} \quad \text{for } \theta \in (0,1)$$

(the conjugate prior for the Bernoulli distribution), where

$$B(\alpha, \beta) = \int_0^1 u^{\alpha-1}(1-u)^{\beta-1}\, du = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}$$

and

$$\Gamma(\alpha) = \int_0^{\infty} u^{\alpha-1} \exp(-u)\, du, \qquad \Gamma(\alpha+1) = \alpha\,\Gamma(\alpha), \qquad \Gamma(1) = 1.$$

Then the posterior distribution is

$$\text{Beta}\left(\alpha + \sum_{i=1}^{n} x_i,\; \beta + n - \sum_{i=1}^{n} x_i\right),$$

a Beta distribution; recall that the Beta(α, β) distribution has mean α/(α+β).
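A minimal numeric sketch of this conjugate update (Python with scipy; the prior parameters and the 0-1 sample are assumptions made for the example):

```python
from scipy import stats

# Assumed prior Beta(alpha, beta) and 0-1 data for illustration
alpha, beta = 1.0, 1.0              # uniform prior U(0,1) = Beta(1,1)
x = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]  # n = 10 trials, 6 successes

s, n = sum(x), len(x)
posterior = stats.beta(alpha + s, beta + n - s)  # Beta(a + Σx, b + n - Σx)

print(posterior.mean())          # (a + Σx)/(a + b + n) = 7/12
print(posterior.interval(0.95))  # a 95% credible interval
```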

(16)

For a Beta (1,1) prior and data: n=10 and 1, 5, 9 successes

(17)

For a Beta (1,1) prior and data: n=100 and 10, 50, 90 successes

(18)

For a Beta (10,10) prior and data: n=10 and 1, 5, 9 successes

(19)

For a Beta (10,10) prior and data: n=100 and 10, 50, 90 successes

(20)

For a Beta (1,5) prior and data: n=10 and 1, 5, 9 successes

(21)

For a Beta (1,5) prior and data: n=100 and 10, 50, 90 successes
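The six preceding slides show plots of such prior/posterior pairs; a sketch that reproduces one of them (the Beta(1,1) prior with n = 10) using matplotlib could look as follows (the grid and styling are assumptions):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

alpha, beta, n = 1, 1, 10            # prior Beta(1,1), sample size 10
theta = np.linspace(0, 1, 500)

plt.plot(theta, stats.beta.pdf(theta, alpha, beta), "k--", label="prior")
for s in (1, 5, 9):                  # observed numbers of successes
    post = stats.beta.pdf(theta, alpha + s, beta + n - s)
    plt.plot(theta, post, label=f"posterior, {s}/10 successes")
plt.xlabel("θ"); plt.legend(); plt.show()
```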

(22)

A priori and a posteriori distributions: examples (2)

2. Let X₁, ..., Xₙ be IID r.v. from N(θ, σ²), with σ² known; θ ~ N(m, τ²) for m, τ known.

Then the posterior distribution for θ (the conjugate prior for a normal distribution) is:

$$N\left( \frac{\frac{n}{\sigma^2}\bar{X} + \frac{1}{\tau^2}m}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}},\; \frac{1}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}} \right)$$

(23)

Bayesian Statistics

Based on the Bayes approach, we can:
find estimates
find an equivalent of confidence intervals
verify hypotheses
make predictions

(24)

Bayesian Most Probable (BMP) / Maximum a posteriori Probability (MAP) estimate

Similar to ML estimation: the argument which maximizes the posterior distribution:

$$\pi(\hat{\theta}_{BMP} \mid x_1, \ldots, x_n) = \max_{\theta} \pi(\theta \mid x_1, \ldots, x_n),$$

i.e.

$$BMP(\theta) = \hat{\theta}_{BMP} = \arg\max_{\theta} \pi(\theta \mid x_1, \ldots, x_n).$$

(25)

BMP: examples

1. Let X₁, ..., Xₙ be IID r.v. from a Bernoulli distribution with probability of success θ, θ ∈ (0,1), with prior

$$\pi(\theta) = \frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha, \beta)}.$$

We know the posterior distribution:

$$\text{Beta}\left(\alpha + \sum_{i=1}^{n} x_i,\; \beta + n - \sum_{i=1}^{n} x_i\right);$$

the mode of the Beta(α, β) distribution is (α−1)/(α+β−2) for α > 1, β > 1, so we have the maximum for

$$BMP(\theta) = \frac{\sum_{i=1}^{n} x_i + \alpha - 1}{n + \alpha + \beta - 2},$$

i.e. for 5 successes in 10 trials for an a priori U(0,1) (i.e. Beta(1,1)) distribution, we have BMP(θ) = 5/10 = ½, and for 9 successes in 10 trials for the same a priori distribution, we have BMP(θ) = 9/10.
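A small sketch of this closed-form MAP (Python; the 5-of-10 and 9-of-10 cases are those from the slide):

```python
def bmp_bernoulli(successes, n, alpha=1.0, beta=1.0):
    """Posterior mode (MAP/BMP) for a Beta(alpha, beta) prior and
    Bernoulli data; valid when both posterior parameters exceed 1."""
    return (successes + alpha - 1) / (n + alpha + beta - 2)

print(bmp_bernoulli(5, 10))  # 0.5  (Beta(1,1) prior)
print(bmp_bernoulli(9, 10))  # 0.9
```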

(26)

BMP: examples (2)

2. Let X₁, ..., Xₙ be IID r.v. from N(θ, σ²), with σ² known; θ ~ N(m, τ²) for m, τ known. Then the posterior distribution for θ is

$$N\left( \frac{\frac{n}{\sigma^2}\bar{X} + \frac{1}{\tau^2}m}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}},\; \frac{1}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}} \right),$$

so

$$BMP(\theta) = \frac{\frac{n}{\sigma^2}\bar{X} + \frac{1}{\tau^2}m}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}},$$

i.e. if we have a sample of 5 observations 1.2, 1.7, 1.9, 2.1, 3.1 from the distribution N(θ, 4) and the a priori distribution is θ ~ N(1, 1), then BMP(θ) = (5/4 · 2 + 1·1)/(5/4 + 1) = 14/9 ≈ 1.56, and if the a priori distribution were θ ~ N(3, 1), then BMP(θ) = (5/4 · 2 + 1·3)/(5/4 + 1) = 22/9 ≈ 2.44.
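A sketch checking these numbers (Python; the data are those from the slide):

```python
import numpy as np

def bmp_normal(x, sigma2, m, tau2):
    """Posterior mean (= mode) of theta for N(theta, sigma2) data
    with a N(m, tau2) prior on theta."""
    n = len(x)
    precision = n / sigma2 + 1 / tau2
    return (n / sigma2 * np.mean(x) + m / tau2) / precision

x = [1.2, 1.7, 1.9, 2.1, 3.1]                # sample mean = 2.0
print(bmp_normal(x, sigma2=4, m=1, tau2=1))  # 14/9 ≈ 1.556
print(bmp_normal(x, sigma2=4, m=3, tau2=1))  # 22/9 ≈ 2.444
```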

(27)

Bayes Estimator

An estimation rule which minimizes the posterior expected value of a loss function.

L(θ, a) – loss function; depends on the true value of θ and the decision a.

e.g. if we want to estimate g(θ):
L(θ, a) = (g(θ) − a)² – quadratic loss function
L(θ, a) = |g(θ) − a| – module (absolute) loss function

(28)

Bayes Estimator – cont.

We can also define the accuracy of an estimate for a given loss function:

$$\text{acc}(\theta, \hat{g}(x)) = E_{\Pi_x} L(\theta, \hat{g}(x)) = E\big( L(\theta, \hat{g}(x)) \mid X = x \big) = \int_{\Theta} L(\theta, \hat{g}(x))\, \pi(\theta \mid x)\, d\theta$$

(the average loss of the estimator for a given a priori distribution and data, i.e. for a specific posterior distribution).

(29)

Bayes Estimator – cont. (2)

The Bayes Estimator ĝ_B for a given loss function L(θ, a) is such that

$$\text{acc}(\theta, \hat{g}_B(x)) = \min_{a} \text{acc}_{\Pi_x}(\theta, a).$$

For a quadratic loss function (θ − a)²:

$$\hat{\theta}_B(x) = E(\theta \mid X = x) = E_{\Pi_x} \theta$$

(more generally: E(g(θ) | x)).

For a module loss function |θ − a|:

$$\hat{\theta}_B(x) = \text{Med}\, \Pi_x$$

(the median of the posterior distribution).
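A small numeric sanity check of these two facts (Python; the Beta(10, 2) posterior used here is an assumption, taken from the 9-successes-in-10 example):

```python
import numpy as np
from scipy import stats, optimize

post = stats.beta(10, 2)  # posterior from the 9-successes-in-10 example
theta = post.rvs(size=100_000, random_state=0)  # Monte Carlo sample

# Posterior expected loss as a function of the decision a
quad = lambda a: np.mean((theta - a) ** 2)
absl = lambda a: np.mean(np.abs(theta - a))

a_quad = optimize.minimize_scalar(quad, bounds=(0, 1), method="bounded").x
a_abs = optimize.minimize_scalar(absl, bounds=(0, 1), method="bounded").x

print(a_quad, post.mean())    # both ~0.833: mean minimizes quadratic loss
print(a_abs, post.median())   # both ~0.85: median minimizes module loss
```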

(30)

Bayes Estimator: Example (1)

1. Let X₁, ..., Xₙ be IID r.v. from a Bernoulli distribution with probability of success θ, θ ∈ (0,1), with prior

$$\pi(\theta) = \frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha, \beta)}.$$

We know the posterior distribution:

$$\text{Beta}\left(\alpha + \sum_{i=1}^{n} x_i,\; \beta + n - \sum_{i=1}^{n} x_i\right);$$

the Beta(α, β) distribution has mean α/(α+β), so the Bayes Estimator is

$$\hat{\theta}_B = \frac{\sum_{i=1}^{n} x_i + \alpha}{n + \alpha + \beta},$$

i.e. for 5 successes in 10 trials for an a priori U(0,1) (i.e. Beta(1,1)) distribution, we have θ̂_B = 6/12 = ½, and for 9 successes in 10 trials for the same a priori distribution, we have θ̂_B = 10/12 = 5/6.
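For comparison with the MAP values above, a one-line sketch of this posterior-mean estimator (Python; same 5-of-10 and 9-of-10 data):

```python
def bayes_bernoulli(successes, n, alpha=1.0, beta=1.0):
    """Posterior mean (Bayes estimator under quadratic loss) for a
    Beta(alpha, beta) prior and Bernoulli data."""
    return (successes + alpha) / (n + alpha + beta)

print(bayes_bernoulli(5, 10))  # 6/12 = 0.5
print(bayes_bernoulli(9, 10))  # 10/12 ≈ 0.833 (vs BMP = 0.9)
```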


(32)

Bayes Estimator: examples (2)

2. Let X₁, ..., Xₙ be IID r.v. from N(θ, σ²), with σ² known; θ ~ N(m, τ²) for m, τ known. Then the a posteriori distribution for θ is

$$N\left( \frac{\frac{n}{\sigma^2}\bar{X} + \frac{1}{\tau^2}m}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}},\; \frac{1}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}} \right),$$

so

$$\hat{\theta}_B = \frac{\frac{n}{\sigma^2}\bar{X} + \frac{1}{\tau^2}m}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}}$$

(the posterior is normal, so its mean coincides with its mode and θ̂_B = BMP(θ)),

i.e. if we have a sample of 5 observations 1.2, 1.7, 1.9, 2.1, 3.1 from the distribution N(θ, 4) and the a priori distribution is θ ~ N(1, 1), then θ̂_B = (5/4 · 2 + 1·1)/(5/4 + 1) = 14/9 ≈ 1.56, and if the a priori distribution were θ ~ N(3, 1), then θ̂_B = (5/4 · 2 + 1·3)/(5/4 + 1) = 22/9 ≈ 2.44.

