Mathematical Statistics Anna Janicka
Lecture III, 04.03.2019
INTRODUCTION TO MATHEMATICAL STATISTICS
Plan for today
1. Introduction to Mathematical Statistics
the statistical model
2. Statistics and their distributions
the normal model
3. Estimation – introduction
MATHEMATICAL STATISTICS
Assumptions
Empirical data reflect the functioning of a random mechanism
Therefore: we are dealing with random variables defined over some probabilistic space; the realizations of these random variables are the collected data.
Problem: we do not know the distribution
of these random variables...
Difference between Probability Calculus and Mathematical Statistics
1. PC, example:
Phrasing: in a production process each produced unit may be defective. This happens with probability 10%.
The defects of different units are independent.
Problems: What is the chance that in a batch of 50 items, exactly 6 will be defective? What is the
average number of defective elements? What is the most probable number of defective elements?
Solution: we build a probabilistic model. Here: a Bernoulli scheme with n=50, p=0.1.
Alternatively, if we are interested in questions dealing with order (e.g. what is the chance that the first 5
items are defective?): a different model
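The three questions above can be answered directly from the Binomial(50, 0.1) model; a minimal sketch in Python:

```python
from math import comb

n, p = 50, 0.1

def binom_pmf(k: int, n: int, p: float) -> float:
    # P(exactly k defective items in a batch of n), Binomial(n, p)
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p6 = binom_pmf(6, n, p)       # chance of exactly 6 defective, about 0.154
mean = n * p                  # average number of defective items: 5
mode = max(range(n + 1), key=lambda k: binom_pmf(k, n, p))  # most probable number: 5
```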
Difference between Probability Calculus and Mathematical Statistics – cont.
2. MS, example:
Formulation: An inspector verified a batch of 50 items, with the following results (1 – item defective, 0 – OK):
0 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1
Problems: What is the probability that an item is defective (assessment)? Is the producer's declaration that defectiveness is equal to 10% credible?
Solution: we build a statistical model, i.e. a probabilistic model with unknown distribution parameter(s).
Statistical Model
Statistical Model:
(X, F_X, P)    (in PC: (Ω, F, P))
where:
X – the space of values of the observed random variable X (often n-dimensional, if we have an n-dimensional sample X_1, ..., X_n)
F_X – σ-algebra on X
P = {P_θ : θ ∈ Θ} – a family of probability distributions P_θ, indexed by a parameter θ ∈ Θ
In a less formal setting we usually provide: X, P, Θ
Statistical model – example
X = {0,1}^n – sample space
Joint probability distribution:
P_θ(X_1 = x_1, X_2 = x_2, ..., X_n = x_n) = ∏_{i=1}^n θ^{x_i} (1 − θ)^{1−x_i} = θ^{Σ_i x_i} (1 − θ)^{n − Σ_i x_i}
for θ ∈ [0,1]
(we have n=50, X_2 = X_10 = X_15 = X_32 = X_42 = X_50 = 1, other X_i = 0)
Statistical model – example cont.
Alternative formulation (if we only record the number of defective items in a sample):
X = {0, 1, 2, ..., n} – sample space
Joint probability distribution:
P_θ(X = x) = C(n, x) θ^x (1 − θ)^{n−x}, where C(n, x) is the binomial coefficient,
for θ ∈ [0,1]
(we have n=50 and X=6)
Statistical model – example cont. (2) Possible questions
Based on the sample:
What is the value of θ ?
we are interested in a precise value
we are interested in an interval (confidence)
→ estimation
Verification of the hypothesis that θ = 0.1
→ hypothesis testing
Prediction of future observations
→ predictions
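As a preview of the first two questions, one can compare the likelihood of the data at the declared θ = 0.1 with the likelihood at the empirical frequency; a minimal sketch (the likelihood comparison here is only an intuition, not the formal test developed in later lectures):

```python
from math import comb

n, x = 50, 6  # 50 inspected items, 6 defective

def likelihood(theta: float) -> float:
    # P_theta(X = x) in the binomial model
    return comb(n, x) * theta**x * (1 - theta) ** (n - x)

theta_hat = x / n                                # empirical frequency, 0.12
ratio = likelihood(0.1) / likelihood(theta_hat)  # < 1: theta = 0.1 fits slightly worse
```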
Statistical Model: example 2 Growths on the market
An analyst studies the length of periods of growth on the stock market. He is interested in times of growth (until the first fall), in days. Assume the times of growth X_1, X_2, ..., X_n are a sample from an exponential distribution Exp(λ), where:
λ – unknown parameter
X = (0, ∞)^n – sample space
Joint probability distribution:
P_λ(X_1 ≤ x_1, X_2 ≤ x_2, ..., X_n ≤ x_n) = ∏_{i=1}^n (1 − e^{−λ x_i})
for λ > 0
with joint density
f_λ(x_1, x_2, ..., x_n) = λ^n e^{−λ Σ_i x_i}
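Both the joint distribution function and the joint density of the exponential model can be evaluated directly; a small sketch (the sample values below are made up for illustration):

```python
from math import exp, prod

def joint_cdf(xs, lam):
    # P_lambda(X_1 <= x_1, ..., X_n <= x_n) = prod_i (1 - e^{-lam * x_i})
    return prod(1 - exp(-lam * x) for x in xs)

def joint_density(xs, lam):
    # f_lambda(x_1, ..., x_n) = lam^n * e^{-lam * sum_i x_i}
    return lam ** len(xs) * exp(-lam * sum(xs))

growth_times = [1.0, 2.0, 0.5]  # hypothetical growth periods, in days
d = joint_density(growth_times, lam=1.0)
```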
Statistical Model: example 3 Measurements with error
We repeatedly measure µ; the results of the measurements are independent random variables X_1, X_2, ..., X_n (our machine is not perfect). Each measurement is normally distributed N(µ, σ²).
µ, σ² – unknown parameters (so θ = (µ, σ))
X = R^n – sample space
Joint probability distribution:
P_{µ,σ}(X_1 ≤ x_1, X_2 ≤ x_2, ..., X_n ≤ x_n) = ∏_{i=1}^n Φ((x_i − µ)/σ)
or, in terms of the joint density,
f_{µ,σ}(x_1, x_2, ..., x_n) = (2πσ²)^{−n/2} exp(−(1/(2σ²)) Σ_i (x_i − µ)²)
for µ ∈ R, σ > 0
STATISTICS
(objects)
Statistics
Parameter estimation (both point and
interval) as well as hypothesis testing are conducted based on statistics
Statistic = a function of the observations, i.e. any random variable of the form
T = T(X_1, X_2, ..., X_n)
The distribution of a statistic T depends on the distribution of X, but the statistic itself cannot depend on the parameter θ; e.g. X_1 + X_2 − θ is NOT a statistic.
Statistics – examples
T_1 = (1/n) Σ_{i=1}^n X_i,   T_2 = (1/n) Σ_{i=1}^n X_i³,   T_3 = (1/n) Σ_{i=1}^n X_i − 0.1
are statistics for a sample of size n;
T_1 = X_1,   T_2 = X_1³,   T_3 = X_1 − 0.1
are statistics for a single observation.
The choice of a statistic depends on the question we want to answer.
Distribution of statistics
In many cases statistical models rely on a common set of assumptions → similar models are applied.
Similar questions are posed → similar statistics are calculated.
The most commonly used is the normal model.
The normal model
X_1, X_2, ..., X_n are a sample from N(µ, σ²).
The most important statistics (in general, not only for this model):
mean:  X̄ = (1/n) Σ_{i=1}^n X_i
sample variance:  S² = (1/(n−1)) Σ_{i=1}^n (X_i − X̄)²
standard deviation:  S = √(S²)
What are their distributions?
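These three statistics can be computed in a few lines of Python (the sample below is made up for illustration; note the 1/(n−1) convention in the sample variance):

```python
from math import sqrt

def sample_stats(xs):
    # returns (mean, sample variance with the 1/(n-1) convention, standard deviation)
    n = len(xs)
    mean = sum(xs) / n
    s2 = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return mean, s2, sqrt(s2)

m, s2, s = sample_stats([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
# m = 5.0, s2 = 32/7 ≈ 4.571
```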
Chi-squared Distribution χ²(n)
A special case of the gamma distribution.
The sum of squares of n independent, identically N(0,1)-distributed random variables has a χ²(n) distribution.
The normal model – cont. (1)
Theorem: In the normal model, the statistics X̄ and S² are independent random variables such that
X̄ ~ N(µ, σ²/n),   (n−1)S²/σ² ~ χ²(n−1)
in particular:
√n (X̄ − µ)/σ ~ N(0,1),   E S² = σ²,   Var S² = 2σ⁴/(n−1)
The normal model – cont. (2)
In the normal model, the variable
T = √n (X̄ − µ)/S
has a Student's t distribution with n−1 degrees of freedom, T ~ t(n−1).
t -Student Distribution t(n), n=1,2,7
defined as the distribution of the random variable
T = X / √(Y/n)
for independent X and Y, X ~ N(0,1), Y ~ χ²(n)
ESTIMATION
Point Estimation
The choice, on the basis of the data, of the best parameter θ from the set Θ of parameters which may describe P_θ.
An estimator of the parameter θ is any statistic
T = T(X_1, X_2, ..., X_n)
with values in Θ (we interpret it as an approximation of θ). Usually denoted by θ̂.
Sometimes we estimate g(θ) rather than θ.
Estimation: an example Empirical frequency
Quality control example:
0 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1
Model: X = {0, 1, 2, ..., n} (here n=50),
P_θ(X = x) = C(n, x) θ^x (1 − θ)^{n−x} for θ ∈ [0,1]
parameter θ: probability of a faulty element
an obvious estimator:
θ̂ = X/n = 6/50
where n – sample size, X – number of faulty elements in the sample
For a different model (all outcomes) the estimator is an average
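The empirical frequency can be read straight off the inspector's 0/1 record from the example:

```python
# the inspector's record: 1 = item defective, 0 = OK
data = [0,1,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,
        0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,
        0,1,0,0,0,0,0,0,0,1]

n = len(data)        # sample size: 50
x = sum(data)        # number of faulty elements: 6
theta_hat = x / n    # empirical frequency: 0.12
```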
Problems with (frequency) estimators...
Example: three genotypes in a population, with frequencies
θ² : 2θ(1 − θ) : (1 − θ)²
In a population of size n, N_1, N_2 and N_3 individuals of the particular genotypes were observed.
Should we take θ̂ = √(N_1/n)? Or rather θ̂ = 1 − √(N_3/n)? How about θ̂ = N_1/n + N_2/(2n)? Maybe something else?
→ How do we choose the best one?
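The three candidate estimators are easy to compare on concrete counts; a small sketch (the counts N1, N2, N3 below are invented for illustration):

```python
from math import sqrt

# hypothetical observed genotype counts (for illustration only)
N1, N2, N3 = 30, 50, 20
n = N1 + N2 + N3

theta_a = sqrt(N1 / n)            # based on P(genotype 1) = theta^2
theta_b = 1 - sqrt(N3 / n)        # based on P(genotype 3) = (1 - theta)^2
theta_c = N1 / n + N2 / (2 * n)   # counts each allele: (2*N1 + N2) / (2n)
```

For these counts the three estimates are close but not equal, which is exactly why a criterion for choosing among estimators is needed.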
Estimation – sample characteristics
Sample characteristics:
estimators based on the empirical
distribution (empirical CDF)
Empirical CDF
Let X_1, X_2, ..., X_n be a sample from a distribution given by F (modeled by {P_F}).
The (n-th) empirical CDF:
F̂_n(t) = (1/n) Σ_{i=1}^n 1_{(−∞, t]}(X_i) = (number of observations X_i with X_i ≤ t) / n
For a given realization {X_i} it is a function of t – the CDF of the empirical distribution (uniform over x_1, x_2, ..., x_n). For a given t it is a statistic with distribution
P(F̂_n(t) = k/n) = C(n, k) F(t)^k (1 − F(t))^{n−k},   k = 0, 1, ..., n
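A direct implementation of the empirical CDF, following the counting formula above:

```python
def ecdf(xs):
    # returns the empirical CDF F_n-hat of the sample xs as a function of t
    n = len(xs)
    def F_hat(t):
        # fraction of observations X_i with X_i <= t
        return sum(1 for x in xs if x <= t) / n
    return F_hat

F = ecdf([3.0, 1.0, 2.0, 2.0])
# F(0.5) = 0.0, F(2.0) = 0.75, F(10.0) = 1.0
```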
Empirical CDF: properties
1. E_F F̂_n(t) = F(t)
2. Var_F F̂_n(t) = (1/n) F(t)(1 − F(t))
3. From the CLT:
√n (F̂_n(t) − F(t)) / √(F(t)(1 − F(t))) → N(0,1) as n → ∞
i.e., for any z:
P_F( √n (F̂_n(t) − F(t)) / √(F(t)(1 − F(t))) ≤ z ) → Φ(z)
4. Glivenko–Cantelli Theorem:
sup_{t∈R} |F̂_n(t) − F(t)| → 0 a.s. as n → ∞
i.e., as the sample size increases, we approximate the unknown distribution to any given level of precision.
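The Glivenko–Cantelli convergence can be illustrated by simulation with a Uniform(0,1) sample, for which F(t) = t and the supremum distance is attained at the order statistics:

```python
import random

def sup_distance_uniform(n, rng):
    # sup_t |F_n-hat(t) - F(t)| for a Uniform(0,1) sample (true CDF F(t) = t);
    # the supremum is attained at (or just before) an order statistic
    xs = sorted(rng.random() for _ in range(n))
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))

rng = random.Random(0)
d_small = sup_distance_uniform(100, rng)
d_large = sup_distance_uniform(10_000, rng)  # typically much closer to 0
```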
Order statistics
Let X_1, X_2, ..., X_n be a sample from a distribution with CDF F. If we arrange the observations in ascending order:
X_{1:n} ≤ X_{2:n} ≤ ... ≤ X_{n:n} ← order statistics (X_{1:n} = min, X_{n:n} = max)
The empirical CDF is a step function, constant over the intervals [X_{i:n}, X_{i+1:n}).
Distribution of order statistics
Let X_1, X_2, ..., X_n be independent random variables from a distribution with CDF F.
Then X_{k:n} has CDF
F_{k:n}(x) = P(X_{k:n} ≤ x) = Σ_{i=k}^n C(n, i) F(x)^i (1 − F(x))^{n−i}
If additionally the distribution is continuous with density f, then X_{k:n} has density
f_{k:n}(x) = n C(n−1, k−1) F(x)^{k−1} (1 − F(x))^{n−k} f(x)