Mathematical Statistics Anna Janicka
Lecture IV, 11.03.2019
POINT ESTIMATION
Plan for today
1. Estimation
2. Sample characteristics as estimators
3. Estimation techniques
   - method of moments
   - method of quantiles
   - maximum likelihood method
Point Estimation
The choice, on the basis of the data, of the best parameter θ from the set of parameters which may describe P_θ.
An estimator of parameter θ is any statistic T = T(X_1, X_2, ..., X_n) with values in Θ (we interpret it as an approximation of θ). Usually denoted by θ̂.
Sometimes we estimate g(θ) rather than θ.
Estimation: an example. Empirical frequency
Quality control example:
0 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1
Model: X ∈ {0, 1, 2, ..., n} (here n = 50), with
P_θ(X = x) = C(n,x) θ^x (1−θ)^(n−x)  for θ ∈ [0,1]
parameter θ: probability of a faulty element
n – sample size
X – number of faulty elements in the sample
an obvious estimator:
θ̂ = X/n = 6/50
For a different model (one recording all individual outcomes) the estimator is the sample average.
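The empirical frequency estimator θ̂ = X/n from the quality control example can be computed directly; a minimal sketch in Python, using the 50 observations from the slide:

```python
# Quality-control data from the slide: 1 = faulty element, 0 = good one
sample = [0,1,0,0,0,0,0,0,0,1, 0,0,0,0,1,0,0,0,0,0,
          0,0,0,0,0,0,0,0,0,0, 0,1,0,0,0,0,0,0,0,0,
          0,1,0,0,0,0,0,0,0,1]

n = len(sample)     # sample size, n = 50
x = sum(sample)     # number of faulty elements, X = 6

theta_hat = x / n   # empirical frequency estimator: theta_hat = X / n
print(theta_hat)    # 6/50 = 0.12
```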
Problems with (frequency) estimators...
Example: three genotypes in a population, with frequencies
θ² : 2θ(1−θ) : (1−θ)²
In a population of size n, N_1, N_2 and N_3 individuals of the particular genotypes were observed.
Should we take θ̂ = √(N_1/n)? Or rather θ̂ = 1 − √(N_3/n)? How about θ̂ = (2N_1 + N_2)/(2n)? Maybe something else?
→ How do we choose the best one?
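One way to compare candidate estimators for the genotype example is by simulation. A small sketch (the value θ = 0.3, the sample size, and the repetition count are illustrative assumptions, not from the slide) estimating each candidate's mean squared error:

```python
import random

# Genotype frequencies theta^2 : 2*theta*(1-theta) : (1-theta)^2.
# Compare three candidate estimators of theta by simulated MSE.
def simulate_mse(theta=0.3, n=1000, reps=1000, seed=0):
    rng = random.Random(seed)
    p1, p2 = theta**2, 2 * theta * (1 - theta)   # genotype probabilities
    errs = [0.0, 0.0, 0.0]
    for _ in range(reps):
        n1 = n2 = n3 = 0
        for _ in range(n):
            u = rng.random()
            if u < p1: n1 += 1
            elif u < p1 + p2: n2 += 1
            else: n3 += 1
        est = [(n1 / n) ** 0.5,        # theta_hat from N1 alone
               1 - (n3 / n) ** 0.5,    # theta_hat from N3 alone
               (2 * n1 + n2) / (2 * n)]  # theta_hat counting alleles
        for k in range(3):
            errs[k] += (est[k] - theta) ** 2
    return [e / reps for e in errs]

mse = simulate_mse()
```

Asymptotically the allele-counting estimator (2N_1 + N_2)/(2n) has the smallest variance of the three, which previews the question of how to choose the best estimator.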
Estimation – sample characteristics
Sample characteristics: estimators based on the empirical distribution (empirical CDF).

Empirical CDF
Let X_1, X_2, ..., X_n be a sample from a distribution given by F (modeled by {P_F}).
The (n-th) empirical CDF is
F̂_n(t) = (1/n) Σ_{i=1}^n 1_{(−∞,t]}(X_i) = (number of observations X_i ≤ t) / n.
For a given realization {x_i} it is a function of t, the CDF of the empirical distribution (uniform over x_1, x_2, ..., x_n). For a given t it is a statistic with distribution
P(F̂_n(t) = k/n) = C(n,k) F(t)^k (1 − F(t))^(n−k),  k = 0, 1, ..., n.
Empirical CDF: properties
1. E_F F̂_n(t) = F(t)
2. Var_F F̂_n(t) = (1/n) F(t)(1 − F(t))
3. from the CLT: √n (F̂_n(t) − F(t)) / √(F(t)(1 − F(t))) → N(0,1) as n → ∞,
   i.e., for any z: P_F( √n (F̂_n(t) − F(t)) / √(F(t)(1 − F(t))) ≤ z ) → Φ(z)
4. Glivenko–Cantelli Theorem: sup_{t∈R} |F̂_n(t) − F(t)| → 0 a.s. as n → ∞
If the sample size increases, we approximate the unknown distribution to any given level of precision.
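The Glivenko-Cantelli behaviour is easy to observe numerically. A sketch, assuming a uniform(0,1) sample so that F(t) = t and the sup-distance can be computed exactly at the jump points:

```python
import random

# Empirical CDF and the Glivenko-Cantelli sup-distance sup_t |F_n(t) - F(t)|.
def ecdf(data, t):
    """F_n(t) = (number of observations <= t) / n."""
    return sum(1 for x in data if x <= t) / len(data)

def sup_distance_uniform(data):
    # For U(0,1), F(t) = t; the supremum is attained at the jump points:
    # just before x_(i) the ECDF equals i/n, at x_(i) it jumps to (i+1)/n.
    xs = sorted(data)
    n = len(xs)
    return max(max(abs(i / n - x), abs((i + 1) / n - x))
               for i, x in enumerate(xs))

rng = random.Random(1)
d_small = sup_distance_uniform([rng.random() for _ in range(100)])
d_large = sup_distance_uniform([rng.random() for _ in range(100_000)])
# d_large is much smaller than d_small: the sup-distance shrinks with n
```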
Order statistics
Let X_1, X_2, ..., X_n be a sample from a distribution with CDF F. If we organize the observations in ascending order:
X_{1:n} ≤ X_{2:n} ≤ ... ≤ X_{n:n} ← order statistics (X_{1:n} = min, X_{n:n} = max)
The empirical CDF is a step function, constant over the intervals [X_{i:n}, X_{i+1:n}).
Distribution of order statistics
Let X_1, X_2, ..., X_n be independent random variables from a distribution with CDF F. Then X_{k:n} has the CDF
F_{k:n}(x) = P(X_{k:n} ≤ x) = Σ_{i=k}^{n} C(n,i) F(x)^i (1 − F(x))^(n−i).
If additionally the distribution is continuous with density f, then X_{k:n} has the density
f_{k:n}(x) = n C(n−1, k−1) f(x) F(x)^(k−1) (1 − F(x))^(n−k).
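The CDF formula for order statistics can be checked against simulation. A sketch for the uniform(0,1) distribution, where F(x) = x (the choice k = 2, n = 5, x = 0.5 is illustrative):

```python
import math, random

# Order-statistic CDF: F_{k:n}(x) = sum_{i=k}^{n} C(n,i) F(x)^i (1-F(x))^(n-i)
def order_stat_cdf(k, n, Fx):
    return sum(math.comb(n, i) * Fx**i * (1 - Fx)**(n - i)
               for i in range(k, n + 1))

k, n, x = 2, 5, 0.5
exact = order_stat_cdf(k, n, x)   # P(X_{2:5} <= 0.5) for U(0,1): 26/32

# Monte Carlo check: fraction of samples whose 2nd smallest value is <= 0.5
rng = random.Random(0)
reps = 20_000
hits = sum(sorted(rng.random() for _ in range(n))[k - 1] <= x
           for _ in range(reps))
approx = hits / reps
# approx should be close to exact
```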
Sample moments and quantiles as estimators
Sample moments and quantiles are
moments and quantiles of the empirical distribution, so they are estimators of the corresponding theoretical values.
sample mean = estimator of the expected value
sample variance = estimator of variance
sample median = estimator of median
sample quantiles = estimators of quantiles
Method of Moments Estimation (MM)
We compare the theoretical moments
(depending on unknown parameter(s)) to their empirical counterparts.
Justification: limit theorems
We need to solve a (system of)
equation(s).
MME – cont.
If θ is one-dimensional, we use one equation, usually:
E_θ X = X̄
If θ is two-dimensional, we use two equations, usually:
E_θ X = X̄,  Var_θ X = Ŝ²
If θ is k-dimensional, we use k equations, usually:
E_θ X = (1/n) Σ_{i=1}^n X_i,  E_θ X² = (1/n) Σ_{i=1}^n X_i²,  ...,  E_θ X^k = (1/n) Σ_{i=1}^n X_i^k
MME – Example 1.
Exponential model: X_1, X_2, ..., X_n are a sample from the exponential distribution Exp(λ).
we know: E_λ X = 1/λ
equation: 1/λ = X̄
solution: λ̂_MM = MME(λ) = 1/X̄
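A numeric sketch of this estimator (the true value λ = 2 and the sample size are illustrative assumptions):

```python
import random

# Method of moments for the exponential model: lambda_hat = 1 / Xbar
rng = random.Random(42)
true_lambda = 2.0
sample = [rng.expovariate(true_lambda) for _ in range(100_000)]

xbar = sum(sample) / len(sample)
lambda_mm = 1 / xbar   # solves the moment equation E X = 1/lambda = Xbar
# lambda_mm should be close to true_lambda = 2.0
```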
MME – Example 2.
Gamma model: X_1, X_2, ..., X_n are a sample from the distribution Gamma(α, λ).
We know: E X = α/λ,  Var X = α/λ²
System of equations: α/λ = X̄,  α/λ² = Ŝ²
Solution: α̂_MM = X̄²/Ŝ²,  λ̂_MM = X̄/Ŝ²
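A simulation sketch of the Gamma moment estimators (the true values α = 3, λ = 2 and the sample size are illustrative assumptions; note that Python's `gammavariate` takes the scale 1/λ as its second argument):

```python
import random

# Method of moments for Gamma(alpha, lambda):
# alpha_hat = Xbar^2 / S^2, lambda_hat = Xbar / S^2
rng = random.Random(7)
alpha, lam = 3.0, 2.0
n = 100_000
sample = [rng.gammavariate(alpha, 1 / lam) for _ in range(n)]  # mean alpha/lam

xbar = sum(sample) / n
s2 = sum((x - xbar) ** 2 for x in sample) / n   # sample variance S^2

alpha_mm = xbar ** 2 / s2
lambda_mm = xbar / s2
# should be close to (3.0, 2.0)
```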
Method of Quantiles Estimation (MQ)
If moments are hard to calculate or the formulae are complicated, we can use quantiles instead of moments. We choose as many levels p as we have parameters, and we put
q_p(θ) = q̂_p
or equivalently
F_θ(q̂_p) = p
MQE – Example 1.
Exponential model: X_1, X_2, ..., X_n are a sample from the exponential distribution Exp(λ).
CDF: F_λ(x) = 1 − exp(−λx) for λ > 0
one parameter → one equation, usually for the median:
1 − exp(−λ q̂_{1/2}) = 1/2
solution: λ̂_MQ = MQE(λ) = ln 2 / q̂_{1/2} = ln 2 / Med̂
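A sketch of the quantile estimator for the exponential model (the true value λ = 1.5 and the odd sample size, chosen so the sample median is a single observation, are illustrative assumptions):

```python
import math, random

# Quantile method for Exp(lambda): lambda_hat = ln 2 / (sample median)
rng = random.Random(3)
true_lambda = 1.5
sample = sorted(rng.expovariate(true_lambda) for _ in range(99_999))

median = sample[len(sample) // 2]   # sample median (n is odd)
lambda_mq = math.log(2) / median    # solves F_lambda(median) = 1/2
# lambda_mq should be close to 1.5
```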
MQE – Example 2.
Weibull model: X_1, X_2, ..., X_n are a sample from a distribution with CDF
F_{b,c}(x) = 1 − exp(−c x^b),
where b, c > 0 are unknown parameters (for b = 1 this is the exponential distribution with parameter c).
two parameters → two equations, usually for the quartiles:
1 − exp(−c q̂_{1/4}^b) = 1/4
1 − exp(−c q̂_{3/4}^b) = 3/4
solution:
b̂_MQ = MQE(b) = ln( ln 4 / (ln 4 − ln 3) ) / ln( q̂_{3/4} / q̂_{1/4} )
ĉ_MQ = MQE(c) = ln 4 / (q̂_{3/4})^(b̂_MQ)
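The quartile formulas for the Weibull model can be tried out on simulated data; samples with CDF 1 − exp(−c x^b) are generated by inverting the CDF. The true values b = 2, c = 0.5, the sample size, and the rough quartile indices are illustrative assumptions:

```python
import math, random

# Quantile method for the Weibull CDF F(x) = 1 - exp(-c x^b),
# using the two quartile equations.
def weibull_mq(sample):
    xs = sorted(sample)
    q1 = xs[len(xs) // 4]        # sample lower quartile (rough sketch)
    q3 = xs[3 * len(xs) // 4]    # sample upper quartile
    b_hat = (math.log(math.log(4) / (math.log(4) - math.log(3)))
             / math.log(q3 / q1))
    c_hat = math.log(4) / q3 ** b_hat
    return b_hat, c_hat

# Inverse-CDF sampling: if U ~ uniform(0,1), then ((-ln(1-U))/c)^(1/b)
# has CDF 1 - exp(-c x^b).
rng = random.Random(11)
b, c = 2.0, 0.5
data = [((-math.log(1 - rng.random())) / c) ** (1 / b)
        for _ in range(100_000)]
b_hat, c_hat = weibull_mq(data)
# should be close to (2.0, 0.5)
```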
Properties of MME and MQE estimators
Conceptually simple
Not too complicated calculations
BUT: sometimes not optimal (large errors, bad properties for small samples)
A better method (usually): maximum likelihood
Maximum Likelihood Estimation (MLE)
We choose the value of θ for which the obtained results have the highest probability.
Likelihood – the (joint) probability f (density or discrete probability) treated as a function of θ, for a given set of observations:
L: Θ → R,  L(θ) = f(θ; x_1, x_2, ..., x_n)
Maximum Likelihood Estimator
θ̂ = θ̂(X_1, X_2, ..., X_n) is the MLE of θ, if
f(θ̂(x_1, ..., x_n); x_1, ..., x_n) = sup_{θ∈Θ} f(θ; x_1, ..., x_n)
for any x_1, x_2, ..., x_n. Denoted: θ̂_ML = MLE(θ)
MLE(g(θ)) = g(MLE(θ))
Independence of observations is not required in the definition, but it greatly simplifies the calculations.
MLE – practical problems
Usually: a sample of independent observations. Then:
L(θ) = f_θ(x_1) f_θ(x_2) ... f_θ(x_n)
If L(θ) is differentiable and θ is k-dimensional, then the maximum may be found by solving:
∂L(θ)/∂θ_j = 0,  j = 1, 2, ..., k
very frequently: instead of maximizing L(θ), we maximize l(θ) = ln L(θ)
MLE – Example 1.
Quality control, cont. We maximize
L(θ) = P_θ(X = x) = C(n,x) θ^x (1−θ)^(n−x)
or equivalently maximize
l(θ) = ln L(θ) = ln C(n,x) + x ln θ + (n−x) ln(1−θ)
i.e. solve
l'(θ) = x/θ − (n−x)/(1−θ) = 0
solution: MLE(θ) = θ̂_ML = x/n
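A quick numeric confirmation that the log-likelihood for the quality control data (n = 50, x = 6) peaks at x/n, maximizing over a grid of θ values:

```python
import math

# Log-likelihood of the binomial model, up to the constant ln C(n,x):
# l(theta) = x*ln(theta) + (n-x)*ln(1-theta)
n, x = 50, 6

def loglik(theta):
    return x * math.log(theta) + (n - x) * math.log(1 - theta)

grid = [k / 1000 for k in range(1, 1000)]   # theta in (0, 1)
theta_best = max(grid, key=loglik)
# theta_best equals x/n = 0.12, matching the closed-form MLE
```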
MLE – Example 2.
Exponential model: X_1, X_2, ..., X_n are a sample from Exp(λ), λ unknown.
We have: L(λ) = f_λ(x_1, x_2, ..., x_n) = λ^n e^(−λ Σ x_i)
we maximize: l(λ) = ln L(λ) = n ln λ − λ Σ x_i
we solve: l'(λ) = n/λ − Σ x_i = 0
we get: λ̂_ML = 1/X̄
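The closed form λ̂_ML = 1/X̄ can be cross-checked against a brute-force maximization of l(λ); a sketch (the true λ = 0.8, the sample size, and the search range are illustrative assumptions):

```python
import math, random

# MLE for Exp(lambda): closed form 1/Xbar vs. grid maximization of
# l(lambda) = n*ln(lambda) - lambda*sum(x_i)
rng = random.Random(5)
sample = [rng.expovariate(0.8) for _ in range(10_000)]
n, s = len(sample), sum(sample)

lambda_closed = n / s   # 1 / Xbar

grid = [k / 10_000 for k in range(1, 30_000)]   # lambda in (0, 3)
lambda_grid = max(grid, key=lambda lam: n * math.log(lam) - lam * s)
# lambda_grid matches lambda_closed up to the grid resolution
```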
MLE – Example 3.
Normal model: X_1, X_2, ..., X_n are a sample from N(µ, σ²); µ, σ unknown.
we maximize
l(µ, σ²) = ln( (2πσ²)^(−n/2) exp(−Σ_{i=1}^n (x_i − µ)²/(2σ²)) )
         = −(n/2) ln(2π) − (n/2) ln σ² − Σ_{i=1}^n (x_i − µ)²/(2σ²)
we solve
∂l/∂µ = Σ_{i=1}^n (x_i − µ)/σ² = 0
∂l/∂σ = −n/σ + Σ_{i=1}^n (x_i − µ)²/σ³ = 0
we get:
µ̂_ML = X̄,  σ̂²_ML = (1/n) Σ_{i=1}^n (X_i − X̄)²
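A simulation sketch of the normal-model MLEs (the true values µ = 1, σ = 2 and the sample size are illustrative assumptions); note that σ̂²_ML divides by n, not n − 1:

```python
import random

# MLE for N(mu, sigma^2): mu_hat = Xbar,
# sigma2_hat = (1/n) * sum (x_i - Xbar)^2  (biased variance, divisor n)
rng = random.Random(9)
mu, sigma = 1.0, 2.0
sample = [rng.gauss(mu, sigma) for _ in range(100_000)]
n = len(sample)

mu_hat = sum(sample) / n
sigma2_hat = sum((x - mu_hat) ** 2 for x in sample) / n
# should be close to mu = 1.0 and sigma^2 = 4.0
```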