Mathematical Statistics
Anna Janicka
Lecture VI, 25.03.2019
PROPERTIES OF ESTIMATORS, PART II
Plan for Today
1. Fisher information
2. Information inequality
3. Estimator efficiency
4. Asymptotic estimator properties
   - consistency
   - asymptotic normality
   - asymptotic efficiency
Fisher information
If a statistical model with observations X_1, X_2, ..., X_n and probability distribution f_θ fulfills the regularity conditions, i.e.:
1. Θ is an open 1-dimensional set.
2. The support of the distribution, {x : f_θ(x) > 0}, does not depend on θ.
3. The derivative (d/dθ) f_θ exists.
then we can define the Fisher information for the sample X_1, X_2, ..., X_n:
(we do not assume independence of X_1, X_2, ..., X_n)
$$I_n(\theta) = \mathrm{E}_\theta \left( \frac{d}{d\theta} \ln f_\theta(X_1, X_2, \ldots, X_n) \right)^2$$

Fisher information – what does it mean?
It is a measure of how much a sample of size n can tell us (on average) about the value of the unknown parameter θ.
If the density around θ is flat, then information from a single observation or a small sample will not allow us to differentiate among possible values of θ. If the density around θ is steep, the sample contributes a lot of information, leading to identification of θ.
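To make this concrete, here is a minimal numerical sketch (added here, not part of the original slides; it assumes NumPy): for an exponential sample, the log-likelihood profile around the true λ becomes much more sharply peaked as n grows, which is exactly the "flat vs. steep" distinction above.

```python
import numpy as np

rng = np.random.default_rng(0)
lam_true = 2.0
grid = np.linspace(1.0, 3.0, 5)               # candidate values of lambda

for n in (5, 500):
    x = rng.exponential(scale=1 / lam_true, size=n)
    # total log-likelihood of the exponential sample at each candidate value
    loglik = np.array([n * np.log(l) - l * x.sum() for l in grid])
    # a flat profile carries little information; for n = 500 the drop away
    # from the maximum is far steeper, so the sample pins lambda down tightly
    print(n, np.round(loglik - loglik.max(), 1))
```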
Fisher Information – cont.
Some formulae:

if the distribution is continuous:
$$I_n(\theta) = \int_{\mathcal{X}} \frac{\left(\frac{d}{d\theta} f_\theta(x)\right)^2}{f_\theta(x)} \, dx$$

if the distribution is discrete:
$$I_n(\theta) = \sum_{x \in \mathcal{X}} \frac{\left(\frac{d}{d\theta} P_\theta(x)\right)^2}{P_\theta(x)}$$

if f_θ is twice differentiable:
$$I_n(\theta) = -\mathrm{E}_\theta \left( \frac{d^2}{d\theta^2} \ln f_\theta(X_1, X_2, \ldots, X_n) \right)$$
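Since several equivalent expressions are given, a quick numerical cross-check may help (a sketch added here, assuming NumPy and SciPy; not part of the original slides): for a single Poisson(θ) observation, the discrete-sum form and the second-derivative form should both give 1/θ, the value quoted in the examples below.

```python
import numpy as np
from scipy.stats import poisson

theta, h = 4.0, 1e-4
xs = np.arange(0, 60)                 # truncated support; the tail mass is negligible
p = poisson.pmf(xs, theta)

# discrete form: sum over x of (d/dtheta P_theta(x))^2 / P_theta(x),
# with the derivative taken by a central finite difference
dp = (poisson.pmf(xs, theta + h) - poisson.pmf(xs, theta - h)) / (2 * h)
info_sum = np.sum(dp ** 2 / p)

# second-derivative form: -E[ d^2/dtheta^2 ln P_theta(X) ]
d2 = (poisson.logpmf(xs, theta + h) - 2 * poisson.logpmf(xs, theta)
      + poisson.logpmf(xs, theta - h)) / h ** 2
info_curv = -np.sum(p * d2)

print(info_sum, info_curv, 1 / theta)   # both match 1/theta = 0.25
```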
Fisher information – cont. (2)
If the sample consists of independent random variables from the same
distribution, then
$$I_n(\theta) = n \, I_1(\theta),$$
where I_1(θ) is the Fisher information for a single observation.
Fisher Information – examples
Exponential distribution exp(λ):
$$I_1(\lambda) = \ldots = \frac{1}{\lambda^2}$$

Poisson distribution Poiss(θ):
$$I_1(\theta) = \ldots = \frac{1}{\theta}$$
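The elided steps for the exponential case are short; a reconstruction (added here for completeness, following the definition above):

```latex
\[
\ln f_\lambda(x) = \ln \lambda - \lambda x
\quad\Rightarrow\quad
\frac{d}{d\lambda} \ln f_\lambda(x) = \frac{1}{\lambda} - x ,
\]
\[
I_1(\lambda)
= \mathrm{E}_\lambda \left( \frac{1}{\lambda} - X \right)^2
= \mathrm{Var}_\lambda (X)
= \frac{1}{\lambda^2}
\qquad \text{since } \mathrm{E}_\lambda X = \tfrac{1}{\lambda} .
\]
```

The Poisson computation is analogous: the score is X/θ − 1, whose second moment is Var_θ(X)/θ² = θ/θ² = 1/θ.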
Information Inequality (Cramér-Rao)
Let X = (X_1, X_2, ..., X_n) be observations from a joint distribution with density f_θ(x), where θ ∈ Θ ⊆ R. If:
1. T(X) is a statistic with a finite expected value, and E_θ T(X) = g(θ),
2. the Fisher information is well defined, I_n(θ) ∈ (0, ∞),
3. all densities f_θ have the same support,
4. the order of differentiating (d/dθ) and integrating ∫ ... dx may be reversed,
then, for any θ:
$$\mathrm{Var}_\theta \, T(X) \ge \frac{(g'(\theta))^2}{I_n(\theta)}$$
Information inequality – implications
The MSE of an unbiased estimator (= its variance) cannot be lower than a given function of n and θ.
If the MSE of an estimator is equal to the lower bound of the information inequality, then the estimator is MVUE.
If θ̂(X) is an unbiased estimator of θ, then
$$\mathrm{Var}_\theta \, \hat{\theta}(X) \ge \frac{1}{I_n(\theta)}$$
Information inequality – examples
In the Poisson model, θ̂ = X̄ is MVUE(θ): Var_θ(X̄) = θ/n, which attains the Cramér-Rao bound.
In the exponential model, X̄ is MVUE(1/λ): Var_λ(X̄) = 1/(nλ²), which attains the bound.
The Cramér-Rao bound is not always attainable:
In the exponential model, λ̂ = 1/X̄ is a biased estimator of λ.
λ̃ = (n-1)/(nX̄) is an unbiased estimator, which is also MVUE(λ), although its variance is higher than the bound from the Cramér-Rao inequality.
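These claims are easy to confirm by simulation (a sketch added here, assuming NumPy; not from the original slides):

```python
import numpy as np

rng = np.random.default_rng(1)
lam, n, reps = 2.0, 20, 200_000

x = rng.exponential(scale=1 / lam, size=(reps, n))
xbar = x.mean(axis=1)

lam_mle = 1 / xbar                      # biased: E[1/Xbar] = n*lambda/(n-1)
lam_unb = (n - 1) / (n * xbar)          # unbiased, MVUE(lambda)

print("mean of 1/Xbar:        ", lam_mle.mean())   # ~ 2*20/19 = 2.105, not 2
print("mean of (n-1)/(n Xbar):", lam_unb.mean())   # ~ 2.0
print("Var of unbiased est.:  ", lam_unb.var())    # ~ lambda^2/(n-2) = 0.222
print("Cramer-Rao bound:      ", lam**2 / n)       # 0.2, strictly below it
```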
Efficiency
The efficiency of an unbiased estimator ĝ(X) of g(θ) is:
$$\mathrm{ef}(\hat{g}) = \frac{(g'(\theta))^2}{I_n(\theta) \cdot \mathrm{Var}_\theta \, \hat{g}(X)}$$

The relative efficiency of unbiased estimators ĝ_1(X) and ĝ_2(X):
$$\mathrm{ef}(\hat{g}_1, \hat{g}_2) = \frac{\mathrm{Var}_\theta \, \hat{g}_2(X)}{\mathrm{Var}_\theta \, \hat{g}_1(X)} = \frac{\mathrm{ef}(\hat{g}_1)}{\mathrm{ef}(\hat{g}_2)}$$
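A small helper evaluating this definition from a simulated variance (a sketch added here with a hypothetical function name, assuming NumPy), applied to the unbiased exponential-model estimator λ̃ from the previous example:

```python
import numpy as np

def efficiency(var_hat, g_prime, info_n):
    # ef(g_hat) = (g'(theta))^2 / (I_n(theta) * Var_theta(g_hat(X)))
    return g_prime ** 2 / (info_n * var_hat)

rng = np.random.default_rng(2)
lam, n, reps = 2.0, 20, 200_000
xbar = rng.exponential(1 / lam, size=(reps, n)).mean(axis=1)

lam_unb = (n - 1) / (n * xbar)          # the unbiased estimator from before
print(efficiency(lam_unb.var(), g_prime=1.0, info_n=n / lam ** 2))
# ~ (n-2)/n = 0.9 < 1: MVUE(lambda), yet not Cramer-Rao efficient
```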
Efficiency and the information inequality
If the information inequality holds, then for any unbiased estimator ĝ:
$$\mathrm{ef}(\hat{g}) \le 1$$
If ĝ = MVUE(g), then it is possible that ef(ĝ) = 1, but it is also possible that ef(ĝ) < 1.
If ef(ĝ) = 1, then the estimator is efficient (Cramér-Rao efficiency).
Efficiency – examples
In the Poisson model, θ̂ = X̄ is efficient.
In the exponential model, X̄ is an efficient estimator of 1/λ.
In the exponential model, λ̃ = (n-1)/(nX̄) is not an efficient estimator of λ, although it is MVUE(λ).
In the uniform model U(0, θ), for the MLE(θ) we get ef > 1 (that is because the assumptions of the information inequality are not fulfilled).
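The uniform case can be checked numerically (a sketch added here, assuming NumPy; I use the bias-corrected estimator ((n+1)/n)·max X_i rather than the raw MLE, since the efficiency formula above presumes unbiasedness, and the "bound" θ²/n comes from formally plugging I_1(θ) = 1/θ² into the Cramér-Rao formula, which is not legitimate here because the support depends on θ):

```python
import numpy as np

rng = np.random.default_rng(3)
theta, n, reps = 1.0, 50, 200_000

x = rng.uniform(0, theta, size=(reps, n))
est = (n + 1) / n * x.max(axis=1)    # unbiased version of the MLE max(X_i)

cr_bound = theta**2 / n              # formal "bound" from I_1 = 1/theta^2
print("Var:", est.var())             # ~ theta^2/(n*(n+2)), far below the bound
print("ef :", cr_bound / est.var())  # ~ n + 2 >> 1
```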
Asymptotic properties of estimators
Limit theorems describing estimator properties as n → ∞.
In practice: information on how the estimators behave, approximately, for large samples.
Problem: usually, there is no answer to the question of what sample size is large enough (for the approximation to be valid).
Consistency
Let X_1, X_2, ..., X_n, ... be an IID sample (of independent random variables from the same distribution). Let ĝ(X_1, X_2, ..., X_n) be a sequence of estimators of the value g(θ).
ĝ is a consistent estimator if for all θ ∈ Θ and any ε > 0:
$$\lim_{n\to\infty} P_\theta\left( |\hat{g}(X_1, X_2, \ldots, X_n) - g(\theta)| \le \varepsilon \right) = 1$$
(i.e. ĝ converges to g(θ) in probability).
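The definition can be simulated directly (a sketch added here, assuming NumPy): in the Poisson model, estimate P_θ(|X̄ − θ| ≤ ε) for growing n and watch it approach 1.

```python
import numpy as np

rng = np.random.default_rng(4)
theta, eps, reps = 3.0, 0.1, 1000

for n in (10, 100, 1000, 10000):
    xbar = rng.poisson(theta, size=(reps, n)).mean(axis=1)
    # empirical estimate of P_theta(|Xbar - theta| <= eps); tends to 1
    print(n, np.mean(np.abs(xbar - theta) <= eps))
```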
Strong consistency
Let X_1, X_2, ..., X_n, ... be an IID sample (of independent random variables from the same distribution). Let ĝ(X_1, X_2, ..., X_n) be a sequence of estimators of the value g(θ).
ĝ is strongly consistent if for any θ ∈ Θ:
$$P_\theta\left( \lim_{n\to\infty} \hat{g}(X_1, X_2, \ldots, X_n) = g(\theta) \right) = 1$$
(i.e. ĝ converges to g(θ) almost surely).
Consistency – note
From the Glivenko-Cantelli theorem it follows that empirical CDFs converge almost surely to the theoretical CDF.
Therefore, we should expect (strong)
consistency from all sensible estimators.
Consistency = minimal requirement for a
sensible estimator.
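A numerical illustration of the Glivenko-Cantelli statement (a sketch added here, assuming NumPy and SciPy): the distance sup_x |F̂_n(x) − F(x)| shrinks as n grows.

```python
import numpy as np
from scipy.stats import expon

rng = np.random.default_rng(5)
for n in (10, 100, 1000, 10000):
    x = np.sort(rng.exponential(1.0, size=n))
    ecdf_hi = np.arange(1, n + 1) / n        # empirical CDF at each point (right limit)
    ecdf_lo = np.arange(0, n) / n            # left limit (the ECDF jumps at data points)
    f = expon.cdf(x)                         # theoretical CDF
    d = np.maximum(np.abs(ecdf_hi - f), np.abs(ecdf_lo - f)).max()
    print(n, round(d, 4))                    # the sup-distance decreases with n
```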
Consistency – how to verify?
From the definition: for example, with the use of a version of the Chebyshev inequality:
$$P_\theta\left( |\hat{g}(X) - g(\theta)| \ge \varepsilon \right) \le \frac{\mathrm{E}_\theta (\hat{g}(X) - g(\theta))^2}{\varepsilon^2}$$
Given that the MSE of an estimator is
$$\mathrm{MSE}(\theta, \hat{g}) = \mathrm{E}_\theta (\hat{g}(X) - g(\theta))^2,$$
we get a sufficient condition for consistency:
$$\lim_{n\to\infty} \mathrm{MSE}(\theta, \hat{g}) = 0$$
From the LLN: for estimators that are sample means, consistency follows directly from the law of large numbers.
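For example (a sketch added here, assuming NumPy): treating X̄ as an estimator of 1/λ in the exponential model, the simulated MSE matches Var_λ(X̄) = 1/(nλ²), which tends to 0, so the sufficient condition holds.

```python
import numpy as np

rng = np.random.default_rng(6)
lam, reps = 2.0, 10_000

for n in (10, 100, 1000):
    xbar = rng.exponential(1 / lam, size=(reps, n)).mean(axis=1)
    mse = np.mean((xbar - 1 / lam) ** 2)      # MSE of Xbar as estimator of 1/lambda
    print(n, round(mse, 5), 1 / (n * lam**2)) # matches 1/(n lambda^2) -> 0
```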
Consistency – examples
For any family of distributions with an expected value: the sample mean X̄ is a consistent estimator of the expected value µ(θ) = E_θ(X_1). Convergence follows from the SLLN.
For distributions having a variance:
$$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 \qquad\text{and}\qquad \hat{S}^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2$$
are consistent estimators of the variance σ²(θ) = Var_θ(X_1). Convergence follows from the SLLN.
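As a final illustration (a sketch added here, assuming NumPy, whose ddof argument selects the divisor n − ddof): both variance estimators, computed on a single growing sample, approach σ² = 4.

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(loc=0.0, scale=2.0, size=100_000)   # sigma^2 = 4

for n in (10, 100, 1000, 100_000):
    s2 = x[:n].var(ddof=1)       # S^2, divides by n-1 (unbiased)
    s2_hat = x[:n].var(ddof=0)   # S^2-hat, divides by n
    print(n, round(s2, 4), round(s2_hat, 4))       # both approach 4.0
```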