(1)

Mathematical Statistics

Anna Janicka

Lecture VI, 25.03.2019

PROPERTIES OF ESTIMATORS, PART II

(2)

Plan for Today

1. Fisher information

2. Information inequality

3. Estimator efficiency

4. Asymptotic estimator properties

consistency

asymptotic normality

asymptotic efficiency

(3)

Fisher information

If a statistical model with observations X_1, X_2, ..., X_n and probability f_θ fulfills the regularity conditions, i.e.:

1. Θ is an open 1-dimensional set.

2. The support of the distribution {x: f_θ(x) > 0} does not depend on θ.

3. The derivative (d/dθ) f_θ exists.

we can define the Fisher information for the sample X_1, X_2, ..., X_n (we do not assume independence of X_1, X_2, ..., X_n):

$$I_n(\theta) = E_\theta\left[\left(\frac{d}{d\theta}\,\ln f_\theta(X_1, X_2, \ldots, X_n)\right)^{2}\right]$$

(4)

Fisher information – what does it mean?

It is a measure of how much a sample of size n can tell us about the value of the unknown parameter θ (on average).

If the density around θ is flat, then information from a single observation or a small sample will not allow us to differentiate among possible values of θ. If the density around θ is steep, the sample contributes a lot of information, making θ easy to identify.
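The definition can be checked numerically: the Fisher information is the expected squared score (the derivative of the log-density). A minimal Monte Carlo sketch for a single observation, assuming a Poisson(θ) model with an illustrative value of θ:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 4.0
x = rng.poisson(theta, size=200_000)

# Score of a single Poisson observation: d/dtheta ln f_theta(x) = x/theta - 1
score = x / theta - 1.0

# Fisher information I_1(theta) = E[score^2]; for Poisson this equals 1/theta
I_hat = np.mean(score ** 2)
print(I_hat)  # close to 1/theta = 0.25
```

The simulated expected squared score matches the theoretical value 1/θ.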

(5)

Fisher Information – cont.

Some formulae:

if the distribution is continuous:

$$I_n(\theta) = \int_{\mathcal{X}} \frac{\left(\frac{d}{d\theta} f_\theta(x)\right)^2}{f_\theta(x)}\,dx$$

if the distribution is discrete:

$$I_n(\theta) = \sum_{x \in \mathcal{X}} \frac{\left(\frac{d}{d\theta} P_\theta(x)\right)^2}{P_\theta(x)}$$

if f_θ is twice differentiable:

$$I_n(\theta) = -E_\theta\left[\frac{d^2}{d\theta^2}\,\ln f_\theta(X_1, X_2, \ldots, X_n)\right]$$

(6)

Fisher information – cont. (2)

If the sample consists of independent random variables from the same distribution, then

$$I_n(\theta) = n\, I_1(\theta)$$

where I_1(θ) is the Fisher information for a single observation.
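The additivity of information can be illustrated by simulation: the joint score of n independent observations is the sum of the individual scores, so its variance is n times the single-observation information. A sketch in the exponential model (the values of λ and n are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
lam, n = 2.0, 5
x = rng.exponential(1.0 / lam, size=(100_000, n))

# Joint score of n iid exp(lambda) observations: sum_i (1/lambda - x_i)
joint_score = np.sum(1.0 / lam - x, axis=1)

I_n_hat = np.mean(joint_score ** 2)
print(I_n_hat)  # close to n * I_1(lambda) = n / lambda^2 = 1.25
```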

(7)

Fisher Information – examples

Exponential distribution exp(λ):

$$I_1(\lambda) = \frac{1}{\lambda^2}$$

Poisson distribution Poiss(θ):

$$I_1(\theta) = \frac{1}{\theta}$$

(8)

Information Inequality (Cramér-Rao)

Let X = (X_1, X_2, ..., X_n) be observations from a joint distribution with density f_θ(x), where θ ∈ Θ ⊆ R. If:

T(X) is a statistic with a finite expected value, and E_θ T(X) = g(θ),

the Fisher information is well defined, I_n(θ) ∈ (0, ∞),

all densities f_θ have the same support,

the order of differentiating (d/dθ) and integrating ∫ ... dx may be reversed,

then, for any θ:

$$\mathrm{Var}_\theta\, T(X) \ge \frac{(g'(\theta))^2}{I_n(\theta)}$$

(9)

Information inequality – implications

The MSE of an unbiased estimator (= the variance) cannot be lower than a given function of n and θ .

If the MSE of an estimator is equal to the lower bound of the information inequality, then the estimator is MVUE.

If $\hat\theta(X)$ is an unbiased estimator of θ, then

$$\mathrm{Var}_\theta\, \hat\theta(X) \ge \frac{1}{I_n(\theta)}$$

(10)

Information inequality – examples

In the Poisson model, $\hat\theta = \bar X$ is MVUE(θ); its variance $\mathrm{Var}_\theta(\bar X) = \theta/n$ attains the Cramér-Rao bound.

In the exponential model, $\bar X$ is MVUE(1/λ), with $\mathrm{Var}_\lambda(\bar X) = \frac{1}{n\lambda^2}$.

The Cramér-Rao inequality is not always optimal.

In the exponential model, $\hat\lambda = 1/\bar X$ is a biased estimator of λ.

$$\tilde\lambda = \frac{n-1}{n\bar X}$$

is an unbiased estimator, which is also MVUE(λ), although its variance is higher than the bound in the Cramér-Rao inequality.
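The last claim can be checked by simulation: the unbiased estimator $\tilde\lambda = (n-1)/(n\bar X)$ has variance $\lambda^2/(n-2)$, strictly above the Cramér-Rao bound $\lambda^2/n$. A sketch with illustrative parameter values:

```python
import numpy as np

rng = np.random.default_rng(3)
lam, n, reps = 2.0, 20, 200_000

x = rng.exponential(1.0 / lam, size=(reps, n))
lam_tilde = (n - 1) / x.sum(axis=1)  # (n-1) / (n * X-bar)

cr_bound = lam ** 2 / n  # 0.2
print(lam_tilde.mean())  # close to lam = 2: unbiased
print(lam_tilde.var())   # close to lam^2/(n-2) = 0.222..., above the bound
```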

(11)

Efficiency

The efficiency of an unbiased estimator $\hat g(X)$ of g(θ) is:

$$\mathrm{ef}(\hat g) = \frac{(g'(\theta))^2}{I_n(\theta)\cdot \mathrm{Var}_\theta\, \hat g(X)}$$

Relative efficiency of unbiased estimators $\hat g_1(X)$ and $\hat g_2(X)$:

$$\mathrm{ef}(\hat g_1, \hat g_2) = \frac{\mathrm{Var}_\theta\, \hat g_2(X)}{\mathrm{Var}_\theta\, \hat g_1(X)} = \frac{\mathrm{ef}(\hat g_1)}{\mathrm{ef}(\hat g_2)}$$

(12)

Efficiency and the information inequality

If the information inequality holds, then for any unbiased estimator: $\mathrm{ef}(\hat g) \le 1$.

If $\hat g$ = MVUE(g), then it is possible that $\mathrm{ef}(\hat g) = 1$, but it is also possible that $\mathrm{ef}(\hat g) < 1$.

If $\mathrm{ef}(\hat g) = 1$, then the estimator is efficient (Cramér-Rao efficiency).

(13)

Efficiency – examples

In the Poisson model, $\hat\theta = \bar X$ is efficient.

In the exponential model, $\bar X$ is an efficient estimator of 1/λ.

In the exponential model, $\tilde\lambda = \frac{n-1}{n\bar X}$ is not an efficient estimator of λ, although it is MVUE(λ).

In a uniform model U(0, θ), for the MLE(θ) we get ef > 1 (that is because the assumptions of the information inequality are not fulfilled).
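The uniform example can be illustrated with the unbiased correction of the MLE, $(n+1)\max_i X_i / n$, whose variance $\theta^2/(n(n+2))$ falls far below the formal bound $\theta^2/n$ obtained by computing the information as if the inequality applied. A simulation sketch with assumed parameter values:

```python
import numpy as np

rng = np.random.default_rng(4)
theta, n, reps = 1.0, 20, 200_000

x = rng.uniform(0, theta, size=(reps, n))
t = (n + 1) / n * x.max(axis=1)  # unbiased correction of the MLE max(X_i)

formal_bound = theta ** 2 / n  # 0.05, from the formally computed information
print(t.mean())  # close to theta = 1: unbiased
print(t.var())   # theta^2/(n(n+2)) = 1/440, well below 0.05, so ef > 1
```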

(14)

Asymptotic properties of estimators

Limit theorems describing estimator properties when n→∞

In practice: information on how the

estimators behave for large samples, approximately

Problem: usually there is no answer to the question of what sample size is large enough (for the approximation to be valid).

(15)

Consistency

Let X_1, X_2, ..., X_n, ... be an IID sample (of independent random variables from the same distribution). Let $\hat g(X_1, X_2, \ldots, X_n)$ be a sequence of estimators of the value g(θ).

$\hat g$ is a consistent estimator if, for all θ ∈ Θ and any ε > 0:

$$\lim_{n\to\infty} P_\theta\big(|\hat g(X_1, X_2, \ldots, X_n) - g(\theta)| \le \varepsilon\big) = 1$$

(i.e. $\hat g$ converges to g(θ) in probability).
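The probability in the definition can be estimated by simulation for growing n; here the sample mean of an exponential distribution estimates its expected value (the distribution and the values of ε and n are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
mu, eps, reps = 1.0, 0.1, 20_000

ps = []
for n in (10, 100, 1000):
    means = rng.exponential(mu, size=(reps, n)).mean(axis=1)
    # Estimated P(|X-bar_n - mu| > eps); shrinks to 0 as n grows
    ps.append(np.mean(np.abs(means - mu) > eps))
print(ps)  # decreasing towards 0
```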

(16)

Strong consistency

Let X_1, X_2, ..., X_n, ... be an IID sample (of independent random variables from the same distribution). Let $\hat g(X_1, X_2, \ldots, X_n)$ be a sequence of estimators of the value g(θ).

$\hat g$ is strongly consistent if, for any θ ∈ Θ:

$$P_\theta\big(\lim_{n\to\infty} \hat g(X_1, X_2, \ldots, X_n) = g(\theta)\big) = 1$$

(i.e. $\hat g$ converges to g(θ) almost surely).

(17)

Consistency – note

From the Glivenko-Cantelli theorem it follows that empirical CDFs converge almost surely to the theoretical CDF.

Therefore, we should expect (strong)

consistency from all sensible estimators.

Consistency = minimal requirement for a

sensible estimator.

(18)

Consistency – how to verify?

From the definition: for example with the use of a version of the Chebyshev inequality:

$$P_\theta\big(|\hat g(X) - g(\theta)| > \varepsilon\big) \le \frac{E_\theta\big(\hat g(X) - g(\theta)\big)^2}{\varepsilon^2}$$

Given that the MSE of an estimator is

$$\mathrm{MSE}(\theta, \hat g) = E_\theta\big(\hat g(X) - g(\theta)\big)^2,$$

we get a sufficient condition for consistency:

$$\lim_{n\to\infty} \mathrm{MSE}(\theta, \hat g) = 0.$$

From the LLN.

(19)

Consistency – examples

For any family of distributions with an expected value: the sample mean $\bar X_n = \frac{1}{n}\sum_{i=1}^{n} X_i$ is a consistent estimator of the expected value μ(θ) = E_θ(X_1). Convergence follows from the SLLN.

For distributions having a variance:

$$S^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar X)^2 \quad\text{and}\quad \hat S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar X)^2$$

are consistent estimators of the variance σ²(θ) = Var_θ(X_1). Convergence follows from the SLLN.
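A sketch showing both variance estimators settling on σ² as n grows (normal data and the values below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(6)
sigma2 = 4.0  # true variance of N(0, 2^2)

for n in (10, 100, 10_000):
    x = rng.normal(0.0, 2.0, size=n)
    s2_biased = np.mean((x - x.mean()) ** 2)  # divides by n
    s2_unbiased = s2_biased * n / (n - 1)     # divides by n - 1
    print(n, s2_biased, s2_unbiased)
```

For large n the two estimators are nearly identical, since their ratio n/(n-1) tends to 1.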

(20)

Consistency – examples/properties

An estimator may be unbiased but inconsistent; e.g. T_n(X_1, X_2, ..., X_n) = X_1 as an estimator of μ(θ) = E_θ(X_1).

An estimator may be biased but consistent; e.g. the biased estimator of the variance, or any unbiased consistent estimator + 1/n.
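The first example can be simulated directly: T_n = X_1 has the right mean, but its error distribution never concentrates, no matter how large n is. A sketch under assumed N(μ, 1) data:

```python
import numpy as np

rng = np.random.default_rng(7)
mu, reps = 5.0, 100_000

# T_n(X_1,...,X_n) = X_1 ignores all but the first observation,
# so its distribution does not depend on n at all
t = rng.normal(mu, 1.0, size=reps)  # X_1 in each replication

print(t.mean())                        # close to mu: unbiased
print(np.mean(np.abs(t - mu) > 0.5))  # stays near 0.62 for every n
```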
