Mathematical Statistics
Anna Janicka
Lecture VII, 1.04.2019
ESTIMATOR PROPERTIES, PART III
CONFIDENCE INTERVALS – INTRO
Plan for Today
1. Asymptotic properties of estimators – cont.
asymptotic normality
asymptotic efficiency
2. Consistency, asymptotic normality and asymptotic efficiency of MLE estimators
3. Interval estimation – confidence intervals
Asymptotic normality
$\hat{g}(X_1, X_2, \ldots, X_n)$ is an asymptotically normal estimator of $g(\theta)$ if for any $\theta \in \Theta$ there exists $\sigma^2(\theta)$ such that, when $n \to \infty$,
$$\sqrt{n}\,\left(\hat{g}(X_1, X_2, \ldots, X_n) - g(\theta)\right) \xrightarrow{D} N(0, \sigma^2(\theta)).$$
Convergence in distribution, i.e. for any $a$:
$$\lim_{n\to\infty} P_\theta\!\left(\frac{\sqrt{n}\,\left(\hat{g}(X_1, X_2, \ldots, X_n) - g(\theta)\right)}{\sigma(\theta)} \le a\right) = \Phi(a).$$
In other words, for large $n$ the distribution of $\hat{g}(X_1, X_2, \ldots, X_n)$ is approximately $N\!\left(g(\theta), \frac{\sigma^2(\theta)}{n}\right)$.
Asymptotic normality – properties
An asymptotically normal estimator is consistent (not necessarily strongly).
A similar condition to unbiasedness – the expected value of the asymptotic
distribution equals g( θ ) (but the estimator does not need to be unbiased).
Asymptotic variance defined as $\frac{\sigma^2(\theta)}{n}$ or $\sigma^2(\theta)$ – the variance of the asymptotic distribution.
Asymptotic normality – what it is not
For an asymptotically normal estimator we usually have
$$E_\theta\,\hat{g}(X_1, X_2, \ldots, X_n) \xrightarrow[n\to\infty]{} g(\theta) \quad\text{and}\quad \mathrm{Var}_\theta\,\hat{g}(X_1, X_2, \ldots, X_n) \approx \frac{\sigma^2(\theta)}{n} \ \text{for large } n,$$
but these properties needn't hold, because convergence in distribution does not imply convergence of moments.
Asymptotic normality – example
Let $X_1, X_2, \ldots, X_n, \ldots$ be an IID sample from a distribution with mean $\mu$ and variance $\sigma^2$. On the basis of the CLT, for the sample mean we have
$$\sqrt{n}\,(\bar{X} - \mu) \xrightarrow{D} N(0, \sigma^2).$$
In this case the asymptotic variance, $\frac{\sigma^2}{n}$, is equal to the estimator variance.
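For illustration, a minimal simulation sketch (NumPy assumed; the Exp(1) distribution, sample size and replication count are arbitrary choices, not part of the lecture):

```python
# Minimal sketch: check that sqrt(n)*(X_bar - mu) behaves like N(0, sigma^2)
# for a large sample size n, using an Exp(1) sample (mu = sigma^2 = 1).
import numpy as np

rng = np.random.default_rng(0)
n, reps = 1000, 20000
mu, sigma = 1.0, 1.0                      # illustrative choice: Exp(1)

samples = rng.exponential(scale=1.0, size=(reps, n))
z = np.sqrt(n) * (samples.mean(axis=1) - mu)

print("empirical var of sqrt(n)*(X_bar - mu):", z.var())        # close to sigma^2 = 1
print("empirical P(Z <= 1.96*sigma):", (z <= 1.96 * sigma).mean())  # close to 0.975
```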
Asymptotic normality – how to prove it
In many cases, the following is useful:
Delta Method. Let $T_n$ be a sequence of random variables such that for $n \to \infty$ we have
$$\sqrt{n}\,(T_n - \mu) \xrightarrow{D} N(0, \sigma^2),$$
and let $h: \mathbb{R} \to \mathbb{R}$ be a function differentiable at point $\mu$ such that $h'(\mu) \neq 0$. Then
$$\sqrt{n}\,\left(h(T_n) - h(\mu)\right) \xrightarrow{D} N\!\left(0, \sigma^2 \left(h'(\mu)\right)^2\right).$$
Here $\mu$ and $\sigma^2$ are functions of $\theta$. The method is usually used when estimators are functions of statistics $T_n$, which can easily be shown to converge on the basis of the CLT.
Asymptotic normality – examples cont.
In an exponential model: $\mathrm{MLE}(\lambda) = \frac{1}{\bar{X}}$.
From the CLT, we get
$$\sqrt{n}\,\left(\bar{X} - \frac{1}{\lambda}\right) \xrightarrow{D} N\!\left(0, \frac{1}{\lambda^2}\right),$$
so from the Delta Method for $h(t) = 1/t$:
$$\sqrt{n}\,\left(\frac{1}{\bar{X}} - \lambda\right) \xrightarrow{D} N\!\left(0, \frac{1}{\lambda^2}\cdot\left(-\frac{1}{(1/\lambda)^2}\right)^2\right) = N(0, \lambda^2),$$
so $\frac{1}{\bar{X}}$ is an asymptotically normal (and consistent) estimator of $\lambda$.
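A minimal simulation sketch of this result (NumPy assumed; the values of $\lambda$, $n$ and the replication count are arbitrary illustrative choices):

```python
# Minimal sketch: for an Exp(lambda) sample, check that sqrt(n)*(1/X_bar - lambda)
# is approximately N(0, lambda^2), as the Delta Method predicts.
import numpy as np

rng = np.random.default_rng(1)
lam, n, reps = 2.0, 2000, 20000

samples = rng.exponential(scale=1.0 / lam, size=(reps, n))   # Exp(lambda) has mean 1/lambda
mle = 1.0 / samples.mean(axis=1)                             # MLE(lambda) = 1 / X_bar
z = np.sqrt(n) * (mle - lam)

print("empirical mean:", z.mean())       # close to 0
print("empirical variance:", z.var())    # close to lambda^2 = 4
```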
Asymptotic efficiency
For an asymptotically normal estimator $\hat{g}(X_1, X_2, \ldots, X_n)$ of $g(\theta)$ we define asymptotic efficiency as
$$\mathrm{as.ef}(\hat{g}) = \frac{(g'(\theta))^2}{n\,I_1(\theta)\cdot\frac{\sigma^2(\theta)}{n}} = \frac{(g'(\theta))^2}{I_1(\theta)\,\sigma^2(\theta)},$$
where $\frac{\sigma^2(\theta)}{n}$ is the asymptotic variance, i.e.
$$\sqrt{n}\,\left(\hat{g}(X_1, X_2, \ldots, X_n) - g(\theta)\right) \xrightarrow{D} N(0, \sigma^2(\theta)) \quad \text{for } n \to \infty.$$
This is a modification of the definition of efficiency to the limit case, with the asymptotic variance in place of the ordinary variance.
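As an illustration, a small symbolic sketch (SymPy assumed) applying this definition to the exponential-model estimator $\frac{1}{\bar{X}}$ from the previous slides; the symbol names are mine, not from the lecture:

```python
# Minimal sketch: asymptotic efficiency of MLE(lambda) = 1/X_bar in the exponential model.
import sympy as sp

x, lam = sp.symbols('x lambda', positive=True)
f = lam * sp.exp(-lam * x)                        # Exp(lambda) density

score = sp.diff(sp.log(f), lam)                   # d/dlambda ln f(x; lambda)
I1 = sp.simplify(sp.integrate(score**2 * f, (x, 0, sp.oo)))   # Fisher information = 1/lambda^2

g_prime = sp.diff(lam, lam)                       # g(lambda) = lambda, so g'(lambda) = 1
sigma2 = lam**2                                   # asymptotic variance of 1/X_bar (Delta Method)

as_ef = sp.simplify(g_prime**2 / (I1 * sigma2))
print(I1, as_ef)                                  # 1/lambda**2, 1 -> asymptotically efficient
```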
Relative asymptotic efficiency
Relative asymptotic efficiency for asymptotically normal estimators $\hat{g}_1(X)$ and $\hat{g}_2(X)$:
$$\mathrm{as.ef}(\hat{g}_1, \hat{g}_2) = \frac{\sigma_2^2(\theta)}{\sigma_1^2(\theta)} = \frac{\mathrm{as.ef}(\hat{g}_1)}{\mathrm{as.ef}(\hat{g}_2)}$$
Note. A less (asymptotically) efficient estimator may have other properties that make it preferable to a more efficient one.
Relative asymptotic efficiency – examples.
Is the mean better than the median?
Depends on the distribution!
a) normal model $N(\mu, \sigma^2)$:
$$\sqrt{n}\,(\bar{X} - \mu) \xrightarrow{D} N(0, \sigma^2), \qquad \sqrt{n}\,(\widehat{\mathrm{med}} - \mu) \xrightarrow{D} N\!\left(0, \frac{\pi\sigma^2}{2}\right), \qquad \mathrm{as.ef}(\widehat{\mathrm{med}}, \bar{X}) = \frac{2}{\pi} < 1$$
b) Laplace model $\mathrm{Lapl}(\mu, \lambda)$:
$$\sqrt{n}\,(\bar{X} - \mu) \xrightarrow{D} N\!\left(0, \frac{2}{\lambda^2}\right), \qquad \sqrt{n}\,(\widehat{\mathrm{med}} - \mu) \xrightarrow{D} N\!\left(0, \frac{1}{\lambda^2}\right), \qquad \mathrm{as.ef}(\widehat{\mathrm{med}}, \bar{X}) = 2 > 1$$
c) some distributions do not have a mean...
Theorem: For a sample from a continuous distribution with density $f(x)$, the sample median is an asymptotically normal estimator of the median $m$ (provided the density is continuous and $\neq 0$ at point $m$):
$$\sqrt{n}\,(\widehat{\mathrm{med}} - m) \xrightarrow{D} N\!\left(0, \frac{1}{4(f(m))^2}\right)$$
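A minimal simulation sketch of this comparison (NumPy assumed; sample sizes, seed and the $\lambda = 1$ Laplace scale are arbitrary choices; note NumPy's Laplace scale parameter is $b = 1/\lambda$):

```python
# Minimal sketch: compare the sample mean and the sample median as estimators of mu
# in the normal and Laplace models via their (scaled) variances.
import numpy as np

rng = np.random.default_rng(2)
n, reps, mu = 500, 20000, 0.0

for name, sampler in [
    ("normal N(0,1)", lambda: rng.normal(mu, 1.0, size=(reps, n))),
    ("Laplace, lambda=1", lambda: rng.laplace(mu, 1.0, size=(reps, n))),
]:
    x = sampler()
    var_mean = n * x.mean(axis=1).var()          # approximates the asymptotic variance of the mean
    var_med = n * np.median(x, axis=1).var()     # approximates the asymptotic variance of the median
    print(name,
          "n*Var(mean):", round(var_mean, 3),
          "n*Var(median):", round(var_med, 3),
          "rel. eff. of median:", round(var_mean / var_med, 3))
# Expected: about 2/pi (< 1) for the normal model and about 2 (> 1) for the Laplace model.
```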
Consistency of ML estimators
Let $X_1, X_2, \ldots, X_n, \ldots$ be a sample from a distribution with density $f_\theta(x)$. If $\Theta \subseteq \mathbb{R}$ is an open set, and:
all densities $f_\theta$ have the same support;
the equation $\frac{d}{d\theta} \ln L(\theta) = 0$ has exactly one solution, $\hat{\theta}$;
then $\hat{\theta}$ is the MLE($\theta$) and it is consistent.
Note. MLE estimators do not have to be unbiased!
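For illustration, a minimal sketch (NumPy/SciPy assumed) in a model where the score equation has no closed-form solution: the shape parameter of a Gamma distribution with known scale. The model and parameter values are my choices, not from the lecture:

```python
# Minimal sketch: solve d/dtheta ln L(theta) = 0 numerically for the shape parameter of
# a Gamma(theta, scale=1) sample and watch the MLE approach theta as n grows (consistency).
import numpy as np
from scipy.optimize import brentq
from scipy.special import digamma

rng = np.random.default_rng(3)
theta_true = 2.5

for n in [50, 500, 5000, 50000]:
    x = rng.gamma(shape=theta_true, scale=1.0, size=n)
    # score(theta) = sum(ln x) - n * digamma(theta); it is strictly decreasing,
    # so the equation score(theta) = 0 has exactly one solution.
    score = lambda t: np.log(x).sum() - n * digamma(t)
    theta_hat = brentq(score, 1e-6, 100.0)
    print(n, round(theta_hat, 4))    # estimates approach theta_true = 2.5
```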
Asymptotic normality of ML estimators
Let $X_1, X_2, \ldots, X_n, \ldots$ be a sample with density $f_\theta(x)$, such that $\Theta \subseteq \mathbb{R}$ is open, and $\hat{\theta}$ is a consistent MLE (for example, it fulfills the assumptions of the previous theorem), and:
$\frac{d^2}{d\theta^2} \ln L(\theta)$ exists;
the Fisher information may be calculated, $0 < I_1(\theta) < \infty$;
the order of integration with respect to $x$ and differentiation with respect to $\theta$ may be changed;
then $\hat{\theta}$ is asymptotically normal and
$$\sqrt{n}\,(\hat{\theta} - \theta) \xrightarrow{D} N\!\left(0, \frac{1}{I_1(\theta)}\right).$$
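A minimal simulation sketch of this theorem in the Bernoulli($p$) model, where the MLE is $\bar{X}$ and $I_1(p) = \frac{1}{p(1-p)}$ (NumPy assumed; parameter values are arbitrary illustrative choices):

```python
# Minimal sketch: in the Bernoulli(p) model sqrt(n)*(p_hat - p) should be
# approximately N(0, 1/I_1(p)) = N(0, p*(1-p)).
import numpy as np

rng = np.random.default_rng(4)
p, n, reps = 0.3, 2000, 20000

x = rng.binomial(1, p, size=(reps, n))
z = np.sqrt(n) * (x.mean(axis=1) - p)

print("empirical variance:", z.var())   # close to p*(1-p) = 0.21
print("1/I_1(p):", p * (1 - p))
```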
Asymptotic normality of ML estimators
Additionally, if $g: \mathbb{R} \to \mathbb{R}$ is a function differentiable at point $\theta$, such that $g'(\theta) \neq 0$, and $\hat{g}(X_1, X_2, \ldots, X_n)$ is the MLE($g(\theta)$), then
$$\sqrt{n}\,\left(\hat{g}(X_1, X_2, \ldots, X_n) - g(\theta)\right) \xrightarrow{D} N\!\left(0, \frac{(g'(\theta))^2}{I_1(\theta)}\right).$$
Asymptotic efficiency of ML estimators
If the assumptions of the previous theorems
are fulfilled, then the ML estimator (of θ or
g( θ )) is asymptotically efficient.
Asymptotic normality and efficiency of ML estimators – examples
In the normal model: the mean is an asymptotically efficient estimator of µ
In the Laplace model: the median is an asymptotically efficient estimator of µ
Examples
Summary: basic (point) estimator properties
bias
variance
MSE
efficiency
consistency
asymptotic normality
asymptotic efficiency
Interval estimation – confidence intervals
We do not provide a single point estimate, but rather a lower and an upper bound for the estimated quantity (the true value will fall within these bounds with a given probability).
We estimate with a given precision.
Confidence interval
Let $g(\theta)$ be a function of the unknown parameter $\theta$, and let $\underline{g} = \underline{g}(X_1, X_2, \ldots, X_n)$ and $\bar{g} = \bar{g}(X_1, X_2, \ldots, X_n)$ be statistics.
Then $[\underline{g}, \bar{g}]$ is a confidence interval for $g(\theta)$ with confidence level $1-\alpha$, if for any $\theta$
$$P_\theta\!\left(\underline{g}(X_1, X_2, \ldots, X_n) \le g(\theta) \le \bar{g}(X_1, X_2, \ldots, X_n)\right) \ge 1 - \alpha.$$
Confidence intervals – use and interpretation
Typically, $\alpha$ is a small number, for example $1-\alpha = 0.95$ or $1-\alpha = 0.99$.
The condition from the definition means: the random interval $[\underline{g}, \bar{g}]$ includes the unknown value $g(\theta)$ with a given (high) probability.
If we calculate the realization of the confidence interval (e.g. $[\underline{g}, \bar{g}] = [1, 3]$), then we CAN'T say that the unknown parameter is included in the range with probability $1-\alpha$ anymore!
The parameter is either in the interval or not – the event is not random, it is just something we don't know.
Confidence intervals – construction
The confidence interval depends on the underlying probability distribution
Usually, normal samples are considered (the distribution most frequently
observed in nature)
Confidence intervals – construction cont.
Convenient method: we look for random variables which depend on the sample data and the parameter values, but whose distributions do not depend on unknown parameters (the pivotal method).
If $U = U(X_1, X_2, \ldots, X_n, \theta)$ is such a function, then we look for confidence intervals $[a, b]$ such that
$$P_\theta\,(a \le U \le b) \ge 1 - \alpha.$$
Usually we look for "symmetric" CIs, i.e. such that
$$P_\theta\,(U < a) \le \frac{\alpha}{2}, \qquad P_\theta\,(U > b) \le \frac{\alpha}{2}.$$
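For illustration, a minimal sketch (NumPy/SciPy assumed) of the pivotal construction for $\mu$ in the $N(\mu, \sigma^2)$ model with known $\sigma$, using the pivot $U = \frac{\sqrt{n}(\bar{X}-\mu)}{\sigma} \sim N(0,1)$, together with a simulation check of the coverage; all numeric values are arbitrary choices:

```python
# Minimal sketch: symmetric confidence interval for mu with sigma known,
# built from the pivot U = sqrt(n)*(X_bar - mu)/sigma ~ N(0,1),
# plus a simulation check that the coverage is close to 1 - alpha.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
mu, sigma, n, alpha, reps = 10.0, 2.0, 25, 0.05, 20000

z = norm.ppf(1 - alpha / 2)              # alpha/2 probability in each tail
x = rng.normal(mu, sigma, size=(reps, n))
x_bar = x.mean(axis=1)

lower = x_bar - z * sigma / np.sqrt(n)
upper = x_bar + z * sigma / np.sqrt(n)
coverage = ((lower <= mu) & (mu <= upper)).mean()
print("empirical coverage:", coverage)   # close to 1 - alpha = 0.95
```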