Mathematical Statistics Anna Janicka
Lecture IV, 11.03.2019
POINT ESTIMATION
Plan for today
1. Estimation
2. Sample characteristics as estimators
3. Estimation techniques
   - method of moments
   - method of quantiles
   - maximum likelihood method
Point Estimation
The choice, on the basis of the data, of the best parameter θ from the set of parameters which may describe P_θ.
An estimator of parameter θ is any statistic T = T(X_1, X_2, ..., X_n) with values in Θ (we interpret it as an approximation of θ). Usually denoted by θ̂.
Sometimes we estimate g(θ) rather than θ.
Estimation: an example. Empirical frequency
Quality control example:
0 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1
Model: X ∈ {0, 1, 2, ..., n} (here n = 50), with
P_θ(X = x) = C(n,x) θ^x (1−θ)^(n−x)  for θ ∈ [0,1]
parameter θ: probability of a faulty element
n – sample size
X – number of faulty elements in the sample
an obvious estimator:
θ̂ = X/n = 6/50
For a different model (one recording all individual outcomes) the estimator is the sample average.
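The empirical frequency estimator θ̂ = X/n from the quality control example can be computed directly; a minimal sketch in Python, using the 50 observations from the slide:

```python
# Quality-control data from the slide: 1 = faulty element, 0 = good one
sample = [0,1,0,0,0,0,0,0,0,1, 0,0,0,0,1,0,0,0,0,0,
          0,0,0,0,0,0,0,0,0,0, 0,1,0,0,0,0,0,0,0,0,
          0,1,0,0,0,0,0,0,0,1]

n = len(sample)     # sample size, n = 50
x = sum(sample)     # number of faulty elements, X = 6

theta_hat = x / n   # empirical frequency estimator: theta_hat = X / n
print(theta_hat)    # 6/50 = 0.12
```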
Problems with (frequency) estimators...
Example: three genotypes in a population, with frequencies
θ² : 2θ(1−θ) : (1−θ)²
In a population of size n, N_1, N_2 and N_3 individuals of the particular genotypes were observed.
Should we take θ̂ = √(N_1/n)? Or rather θ̂ = 1 − √(N_3/n)? How about θ̂ = (2N_1 + N_2)/(2n)? Maybe something else?
→ How do we choose the best one?
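One way to compare candidate estimators for the genotype example is by simulation. A small sketch (the value θ = 0.3, the sample size, and the repetition count are illustrative assumptions, not from the slide) estimating each candidate's mean squared error:

```python
import random

# Genotype frequencies theta^2 : 2*theta*(1-theta) : (1-theta)^2.
# Compare three candidate estimators of theta by simulated MSE.
def simulate_mse(theta=0.3, n=1000, reps=1000, seed=0):
    rng = random.Random(seed)
    p1, p2 = theta**2, 2 * theta * (1 - theta)   # genotype probabilities
    errs = [0.0, 0.0, 0.0]
    for _ in range(reps):
        n1 = n2 = n3 = 0
        for _ in range(n):
            u = rng.random()
            if u < p1: n1 += 1
            elif u < p1 + p2: n2 += 1
            else: n3 += 1
        est = [(n1 / n) ** 0.5,        # theta_hat from N1 alone
               1 - (n3 / n) ** 0.5,    # theta_hat from N3 alone
               (2 * n1 + n2) / (2 * n)]  # theta_hat counting alleles
        for k in range(3):
            errs[k] += (est[k] - theta) ** 2
    return [e / reps for e in errs]

mse = simulate_mse()
```

Asymptotically the allele-counting estimator (2N_1 + N_2)/(2n) has the smallest variance of the three, which previews the question of how to choose the best estimator.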
Estimation – sample characteristics
Sample characteristics: estimators based on the empirical distribution (empirical CDF).

Empirical CDF
Let X_1, X_2, ..., X_n be a sample from a distribution given by F (modeled by {P_F}).
The (n-th) empirical CDF is
F̂_n(t) = (1/n) Σ_{i=1}^n 1_{(−∞,t]}(X_i) = (number of observations X_i ≤ t) / n.
For a given realization {x_i} it is a function of t, the CDF of the empirical distribution (uniform over x_1, x_2, ..., x_n). For a given t it is a statistic with distribution
P(F̂_n(t) = k/n) = C(n,k) F(t)^k (1 − F(t))^(n−k),  k = 0, 1, ..., n.
Empirical CDF: properties
1. E_F F̂_n(t) = F(t)
2. Var_F F̂_n(t) = (1/n) F(t)(1 − F(t))
3. from the CLT: √n (F̂_n(t) − F(t)) / √(F(t)(1 − F(t))) → N(0,1) as n → ∞,
   i.e., for any z: P_F( √n (F̂_n(t) − F(t)) / √(F(t)(1 − F(t))) ≤ z ) → Φ(z)
4. Glivenko–Cantelli Theorem: sup_{t∈R} |F̂_n(t) − F(t)| → 0 a.s. as n → ∞
If the sample size increases, we approximate the unknown distribution to any given level of precision.
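The Glivenko-Cantelli behaviour is easy to observe numerically. A sketch, assuming a uniform(0,1) sample so that F(t) = t and the sup-distance can be computed exactly at the jump points:

```python
import random

# Empirical CDF and the Glivenko-Cantelli sup-distance sup_t |F_n(t) - F(t)|.
def ecdf(data, t):
    """F_n(t) = (number of observations <= t) / n."""
    return sum(1 for x in data if x <= t) / len(data)

def sup_distance_uniform(data):
    # For U(0,1), F(t) = t; the supremum is attained at the jump points:
    # just before x_(i) the ECDF equals i/n, at x_(i) it jumps to (i+1)/n.
    xs = sorted(data)
    n = len(xs)
    return max(max(abs(i / n - x), abs((i + 1) / n - x))
               for i, x in enumerate(xs))

rng = random.Random(1)
d_small = sup_distance_uniform([rng.random() for _ in range(100)])
d_large = sup_distance_uniform([rng.random() for _ in range(100_000)])
# d_large is much smaller than d_small: the sup-distance shrinks with n
```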
Order statistics
Let X_1, X_2, ..., X_n be a sample from a distribution with CDF F. If we organize the observations in ascending order:
X_{1:n} ≤ X_{2:n} ≤ ... ≤ X_{n:n} ← order statistics (X_{1:n} = min, X_{n:n} = max)
The empirical CDF is a step function, constant over the intervals [X_{i:n}, X_{i+1:n}).
Distribution of order statistics
Let X_1, X_2, ..., X_n be independent random variables from a distribution with CDF F. Then X_{k:n} has the CDF
F_{k:n}(x) = P(X_{k:n} ≤ x) = Σ_{i=k}^{n} C(n,i) F(x)^i (1 − F(x))^(n−i).
If additionally the distribution is continuous with density f, then X_{k:n} has the density
f_{k:n}(x) = n C(n−1, k−1) f(x) F(x)^(k−1) (1 − F(x))^(n−k).
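The CDF formula for order statistics can be checked against simulation. A sketch for the uniform(0,1) distribution, where F(x) = x (the choice k = 2, n = 5, x = 0.5 is illustrative):

```python
import math, random

# Order-statistic CDF: F_{k:n}(x) = sum_{i=k}^{n} C(n,i) F(x)^i (1-F(x))^(n-i)
def order_stat_cdf(k, n, Fx):
    return sum(math.comb(n, i) * Fx**i * (1 - Fx)**(n - i)
               for i in range(k, n + 1))

k, n, x = 2, 5, 0.5
exact = order_stat_cdf(k, n, x)   # P(X_{2:5} <= 0.5) for U(0,1): 26/32

# Monte Carlo check: fraction of samples whose 2nd smallest value is <= 0.5
rng = random.Random(0)
reps = 20_000
hits = sum(sorted(rng.random() for _ in range(n))[k - 1] <= x
           for _ in range(reps))
approx = hits / reps
# approx should be close to exact
```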
Sample moments and quantiles as estimators
Sample moments and quantiles are
moments and quantiles of the empirical distribution, so they are estimators of the corresponding theoretical values.
sample mean = estimator of the expected value
sample variance = estimator of variance
sample median = estimator of median
sample quantiles = estimators of quantiles
Method of Moments Estimation (MM)
We compare the theoretical moments
(depending on unknown parameter(s)) to their empirical counterparts.
Justification: limit theorems
We need to solve a (system of)
equation(s).
MME – cont.
If θ is one-dimensional, we use one equation, usually:
E_θ X = X̄
If θ is two-dimensional, we use two equations, usually:
E_θ X = X̄,  Var_θ X = Ŝ²
If θ is k-dimensional, we use k equations, usually:
E_θ X = (1/n) Σ_{i=1}^n X_i,  E_θ X² = (1/n) Σ_{i=1}^n X_i²,  ...,  E_θ X^k = (1/n) Σ_{i=1}^n X_i^k
MME – Example 1.
Exponential model: X_1, X_2, ..., X_n are a sample from the exponential distribution Exp(λ).
we know: E_λ X = 1/λ
equation: 1/λ = X̄
solution: λ̂_MM = MME(λ) = 1/X̄
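A numeric sketch of this estimator (the true value λ = 2 and the sample size are illustrative assumptions):

```python
import random

# Method of moments for the exponential model: lambda_hat = 1 / Xbar
rng = random.Random(42)
true_lambda = 2.0
sample = [rng.expovariate(true_lambda) for _ in range(100_000)]

xbar = sum(sample) / len(sample)
lambda_mm = 1 / xbar   # solves the moment equation E X = 1/lambda = Xbar
# lambda_mm should be close to true_lambda = 2.0
```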
MME – Example 2.
Gamma model: X_1, X_2, ..., X_n are a sample from the distribution Gamma(α, λ).
We know: E X = α/λ,  Var X = α/λ²
System of equations: α/λ = X̄,  α/λ² = Ŝ²
Solution: α̂_MM = X̄²/Ŝ²,  λ̂_MM = X̄/Ŝ²
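A simulation sketch of the Gamma moment estimators (the true values α = 3, λ = 2 and the sample size are illustrative assumptions; note that Python's `gammavariate` takes the scale 1/λ as its second argument):

```python
import random

# Method of moments for Gamma(alpha, lambda):
# alpha_hat = Xbar^2 / S^2, lambda_hat = Xbar / S^2
rng = random.Random(7)
alpha, lam = 3.0, 2.0
n = 100_000
sample = [rng.gammavariate(alpha, 1 / lam) for _ in range(n)]  # mean alpha/lam

xbar = sum(sample) / n
s2 = sum((x - xbar) ** 2 for x in sample) / n   # sample variance S^2

alpha_mm = xbar ** 2 / s2
lambda_mm = xbar / s2
# should be close to (3.0, 2.0)
```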
Method of Quantiles Estimation (MQ)
If moments are hard to calculate or the formulae are complicated, we can use quantiles instead of moments. We choose as many levels p as we have parameters, and we put
q_p(θ) = q̂_p
or equivalently
F_θ(q̂_p) = p
MQE – Example 1.
Exponential model: X_1, X_2, ..., X_n are a sample from the exponential distribution Exp(λ).
CDF: F_λ(x) = 1 − exp(−λx) for λ > 0
one parameter → one equation, usually for the median:
1 − exp(−λ q̂_{1/2}) = 1/2
solution: λ̂_MQ = MQE(λ) = ln 2 / q̂_{1/2} = ln 2 / Med̂
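A sketch of the quantile estimator for the exponential model (the true value λ = 1.5 and the odd sample size, chosen so the sample median is a single observation, are illustrative assumptions):

```python
import math, random

# Quantile method for Exp(lambda): lambda_hat = ln 2 / (sample median)
rng = random.Random(3)
true_lambda = 1.5
sample = sorted(rng.expovariate(true_lambda) for _ in range(99_999))

median = sample[len(sample) // 2]   # sample median (n is odd)
lambda_mq = math.log(2) / median    # solves F_lambda(median) = 1/2
# lambda_mq should be close to 1.5
```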
MQE – Example 2.
Weibull model: X_1, X_2, ..., X_n are a sample from a distribution with CDF
F_{b,c}(x) = 1 − exp(−c x^b),
where b, c > 0 are unknown parameters (for b = 1 this is the exponential distribution with parameter c).
two parameters → two equations, usually for the quartiles:
1 − exp(−c q̂_{1/4}^b) = 1/4
1 − exp(−c q̂_{3/4}^b) = 3/4
solution:
b̂_MQ = MQE(b) = ln( ln 4 / (ln 4 − ln 3) ) / ln( q̂_{3/4} / q̂_{1/4} )
ĉ_MQ = MQE(c) = ln 4 / (q̂_{3/4})^(b̂_MQ)
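The quartile formulas for the Weibull model can be tried out on simulated data; samples with CDF 1 − exp(−c x^b) are generated by inverting the CDF. The true values b = 2, c = 0.5, the sample size, and the rough quartile indices are illustrative assumptions:

```python
import math, random

# Quantile method for the Weibull CDF F(x) = 1 - exp(-c x^b),
# using the two quartile equations.
def weibull_mq(sample):
    xs = sorted(sample)
    q1 = xs[len(xs) // 4]        # sample lower quartile (rough sketch)
    q3 = xs[3 * len(xs) // 4]    # sample upper quartile
    b_hat = (math.log(math.log(4) / (math.log(4) - math.log(3)))
             / math.log(q3 / q1))
    c_hat = math.log(4) / q3 ** b_hat
    return b_hat, c_hat

# Inverse-CDF sampling: if U ~ uniform(0,1), then ((-ln(1-U))/c)^(1/b)
# has CDF 1 - exp(-c x^b).
rng = random.Random(11)
b, c = 2.0, 0.5
data = [((-math.log(1 - rng.random())) / c) ** (1 / b)
        for _ in range(100_000)]
b_hat, c_hat = weibull_mq(data)
# should be close to (2.0, 0.5)
```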
Properties of MME and MQE estimators
Conceptually simple
Not too complicated calculations
BUT: sometimes not optimal (large errors, bad properties for small samples)
A better method (usually): maximum likelihood
Maximum Likelihood Estimation (MLE)
We choose the value of θ for which the obtained results have the highest probability.
Likelihood – the (joint) probability f (density or discrete probability) treated as a function of θ, for a given set of observations:
L: Θ → R,  L(θ) = f(θ; x_1, x_2, ..., x_n)
Maximum Likelihood Estimator
θ̂ = θ̂(X_1, X_2, ..., X_n) is the MLE of θ, if
f(θ̂(x_1, ..., x_n); x_1, ..., x_n) = sup_{θ∈Θ} f(θ; x_1, ..., x_n)
for any x_1, x_2, ..., x_n. Denoted: θ̂_ML = MLE(θ)
MLE(g(θ)) = g(MLE(θ))
Independence of observations is not required in the definition, but it greatly simplifies the calculations.
MLE – practical problems
Usually: a sample of independent observations. Then:
L(θ) = f_θ(x_1) f_θ(x_2) ... f_θ(x_n)
If L(θ) is differentiable and θ is k-dimensional, then the maximum may be found by solving:
∂L(θ)/∂θ_j = 0,  j = 1, 2, ..., k
very frequently: instead of maximizing L(θ), we maximize l(θ) = ln L(θ)
MLE – Example 1.
Quality control, cont. We maximize
L(θ) = P_θ(X = x) = C(n,x) θ^x (1−θ)^(n−x)
or equivalently maximize
l(θ) = ln L(θ) = ln C(n,x) + x ln θ + (n−x) ln(1−θ)
i.e. solve
l'(θ) = x/θ − (n−x)/(1−θ) = 0
solution: MLE(θ) = θ̂_ML = x/n
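A quick numeric confirmation that the log-likelihood for the quality control data (n = 50, x = 6) peaks at x/n, maximizing over a grid of θ values:

```python
import math

# Log-likelihood of the binomial model, up to the constant ln C(n,x):
# l(theta) = x*ln(theta) + (n-x)*ln(1-theta)
n, x = 50, 6

def loglik(theta):
    return x * math.log(theta) + (n - x) * math.log(1 - theta)

grid = [k / 1000 for k in range(1, 1000)]   # theta in (0, 1)
theta_best = max(grid, key=loglik)
# theta_best equals x/n = 0.12, matching the closed-form MLE
```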
MLE – Example 2.
Exponential model: X_1, X_2, ..., X_n are a sample from Exp(λ), λ unknown.
We have: L(λ) = f_λ(x_1, x_2, ..., x_n) = λ^n e^(−λ Σ x_i)
we maximize: l(λ) = ln L(λ) = n ln λ − λ Σ x_i
we solve: l'(λ) = n/λ − Σ x_i = 0
we get: λ̂_ML = 1/X̄
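The closed form λ̂_ML = 1/X̄ can be cross-checked against a brute-force maximization of l(λ); a sketch (the true λ = 0.8, the sample size, and the search range are illustrative assumptions):

```python
import math, random

# MLE for Exp(lambda): closed form 1/Xbar vs. grid maximization of
# l(lambda) = n*ln(lambda) - lambda*sum(x_i)
rng = random.Random(5)
sample = [rng.expovariate(0.8) for _ in range(10_000)]
n, s = len(sample), sum(sample)

lambda_closed = n / s   # 1 / Xbar

grid = [k / 10_000 for k in range(1, 30_000)]   # lambda in (0, 3)
lambda_grid = max(grid, key=lambda lam: n * math.log(lam) - lam * s)
# lambda_grid matches lambda_closed up to the grid resolution
```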
MLE – Example 3.
Normal model: X_1, X_2, ..., X_n are a sample from N(µ, σ²); µ, σ unknown.
we maximize
l(µ, σ²) = ln( (2πσ²)^(−n/2) exp(−Σ_{i=1}^n (x_i − µ)²/(2σ²)) )
         = −(n/2) ln(2π) − (n/2) ln σ² − Σ_{i=1}^n (x_i − µ)²/(2σ²)
we solve
∂l/∂µ = Σ_{i=1}^n (x_i − µ)/σ² = 0
∂l/∂σ = −n/σ + Σ_{i=1}^n (x_i − µ)²/σ³ = 0
we get:
µ̂_ML = X̄,  σ̂²_ML = (1/n) Σ_{i=1}^n (X_i − X̄)²
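A simulation sketch of the normal-model MLEs (the true values µ = 1, σ = 2 and the sample size are illustrative assumptions); note that σ̂²_ML divides by n, not n − 1:

```python
import random

# MLE for N(mu, sigma^2): mu_hat = Xbar,
# sigma2_hat = (1/n) * sum (x_i - Xbar)^2  (biased variance, divisor n)
rng = random.Random(9)
mu, sigma = 1.0, 2.0
sample = [rng.gauss(mu, sigma) for _ in range(100_000)]
n = len(sample)

mu_hat = sum(sample) / n
sigma2_hat = sum((x - mu_hat) ** 2 for x in sample) / n
# should be close to mu = 1.0 and sigma^2 = 4.0
```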