Random variable and distribution of probability

Academic year: 2021

(1)

Introduction to theory of probability and statistics

Lecture 4.

Random variable and distribution of probability

dr hab. inż. Katarzyna Zakrzewska, prof. AGH, Department of Electronics (Katedra Elektroniki), AGH

e-mail: zak@agh.edu.pl

http://home.agh.edu.pl/~zak

(2)

Outline:

• Concept of random variable

• Quantitative description of random variables

• Examples of probability distributions

(3)

The concept of random variable

A random variable is a function X that assigns a real value x to each outcome of a random experiment:

$$\Omega = \{e_1, e_2, \ldots\}, \qquad X:\ \Omega \to \mathbb{R}, \qquad X(e_i) = x_i \in \mathbb{R}$$

Examples:

1) Coin toss: the event ‘heads’ takes the value 1; the event ‘tails’ takes the value 0.

2) Products: the event ‘failure’ takes the value 0; ‘well-performing’ takes the value 1.

3) Dice: ‘1’ – 1, ‘2’ – 2, etc.

4) Interval [a, b]: a chosen point with coordinate x is assigned a value, e.g. sin²(3x+17).

(4)

Random variable

Discrete

When the values of the random variable X are isolated points on the number line:

• Toss of a coin

• Transmission errors

• Faulty elements on a production line

• Number of connections arriving within 5 minutes

Continuous

When the values of the random variable cover all points of an interval:

• Electrical current, I

• Temperature, T

• Pressure, p

(5)

Quantitative description of random variables

• Probability distributions and probability mass functions (for discrete random variables)

• Probability density functions (for continuous variables)

• Cumulative distribution function (distribution function for discrete and continuous variables)

• Characteristic quantities (expected value, variance, quantiles, etc.)

(6)

Distribution of random variable

The distribution of a random variable (the probability distribution, for discrete variables) is a set of pairs $(x_i, p_i)$, where $x_i$ is a value of the random variable X and $p_i$ is the probability that X takes the value $x_i$.

Example 4.1

Probability mass function for a single coin toss.

The event corresponding to heads is assigned $x_1 = 1$; tails means $x_2 = 0$.

$$x_1 = 1: \quad p(X = 1) = p(x_1) = \frac{1}{2}$$

$$x_2 = 0: \quad p(X = 0) = p(x_2) = \frac{1}{2}$$

(7)

Example 4.1 cont.

The probability mass function for a single coin toss is given by the following set of pairs:

$$\left\{ \left(1, \tfrac{1}{2}\right), \left(0, \tfrac{1}{2}\right) \right\}$$

A discrete random variable has a probability distribution that is also discrete.

[Figure: probability mass function p(X) versus X for the coin toss; vertical axis: probability of the event.]

(8)

Probability density function

The probability density function is introduced for continuous variables; it is related to probability in the following way:

$$f(x)\,dx \equiv P(x \le X < x + dx)$$

Properties of the probability density function:

1. $f(x) \ge 0$

2. f(x) is normalized: $\int_{-\infty}^{+\infty} f(x)\,dx = 1$

3. f(x) has the dimension of 1/x

(9)

Directly from the definition of the probability density function f(x) we get a formula for the probability that the random variable takes a value within the interval [a, b]:

$$P(a < X < b) = \int_a^b f(x)\,dx$$

Asking "what is the probability that X = a?" is the wrong question for a continuous variable: the probability of any single point is zero, and only intervals carry nonzero probability.

(10)

Example 4.2

Let the continuous random variable X denote the current measured in a thin copper wire, in mA. Assume that the range of X is [0, 20 mA] and that the probability density function of X is f(x) = 0.05 for 0 ≤ x ≤ 20. What is the probability that a measured current is less than 10 mA?

$$P(0 \le X < 10) = \int_0^{10} f(x)\,dx = \int_0^{10} 0.05\,dx = 0.5$$

[Figure: probability density function f(x) versus x for Example 4.2; vertical axis: probability density.]
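The calculation in Example 4.2 can be checked numerically. The following is a minimal sketch, assuming a simple midpoint-rule integrator; the helper names `uniform_pdf` and `integrate` are illustrative and do not come from the lecture.

```python
def uniform_pdf(x, lo=0.0, hi=20.0):
    """Density f(x) = 1/(hi - lo) on [lo, hi] and zero elsewhere (0.05 for 0..20 mA)."""
    return 1.0 / (hi - lo) if lo <= x <= hi else 0.0


def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h


# P(0 <= X < 10): area under f(x) between 0 and 10 mA.
p_below_10 = integrate(uniform_pdf, 0.0, 10.0)
# Normalization check: the total area under f(x) should be 1.
total = integrate(uniform_pdf, 0.0, 20.0)

print(round(p_below_10, 6), round(total, 6))
```

The same integrator can be reused for any density, which is why the density is passed in as a function rather than hard-coded.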

(11)

Quantitative description of random variables

• The cumulative distribution function (CDF) F(x) is the probability that the random variable X takes a value smaller than or equal to x (at most x):

$$F(x) = P(X \le x)$$

Example 4.1 cont.

CDF of the coin toss:

$$F(x = 0) = P(X \le 0) = \frac{1}{2}$$

$$F(x = 1) = P(X \le 1) = 1$$

(12)

Properties of CDF

1. $0 \le F(x) \le 1$

2. $F(-\infty) = 0$

3. $F(+\infty) = 1$

4. $x \le y \Rightarrow F(x) \le F(y)$ (non-decreasing function)

5. F(x) has no unit

6. $f(x) = \dfrac{dF(x)}{dx}$ (relationship between the cumulative distribution function and the probability density)

(13)

CDF of discrete variable

$$F(x) = P(X \le x) = \sum_{x_i \le x} f(x_i)$$

$f(x_i)$ – probability mass function

Example 4.3

Determine the probability mass function of X from the following cumulative distribution function F(x):

$$F(x) = \begin{cases} 0 & \text{for } x < -2 \\ 0.2 & \text{for } -2 \le x < 0 \\ 0.7 & \text{for } 0 \le x < 2 \\ 1 & \text{for } 2 \le x \end{cases}$$

From the plot, the only points with f(x) ≠ 0 are -2, 0 and 2:

$$f(-2) = 0.2 - 0 = 0.2$$

$$f(0) = 0.7 - 0.2 = 0.5$$

$$f(2) = 1.0 - 0.7 = 0.3$$
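The procedure of Example 4.3 (reading the probability mass function off the jumps of the CDF) can be sketched in a few lines. The representation of F(x) as a list of (jump point, value after the jump) pairs and the name `pmf_from_cdf` are assumptions made for this illustration.

```python
# Jump points of F(x) and the value F takes from each point onward (Example 4.3).
cdf_steps = [(-2, 0.2), (0, 0.7), (2, 1.0)]


def pmf_from_cdf(steps):
    """Return {x: f(x)}, where f(x) is the size of the jump of F at x."""
    pmf = {}
    previous = 0.0  # F(x) = 0 to the left of the first jump
    for x, value in steps:
        # Rounding only removes floating-point noise such as 0.30000000000000004.
        pmf[x] = round(value - previous, 10)
        previous = value
    return pmf


pmf = pmf_from_cdf(cdf_steps)
print(pmf)
```

Summing the resulting masses gives 1, which is a quick consistency check on any step-shaped CDF.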

(14)

CDF for continuous variable

$$F(t) = P(X \le t) = \int_{-\infty}^{t} f(x)\,dx$$

The cumulative distribution function F(t) of a continuous variable is a non-decreasing continuous function and can be calculated as the area under the probability density function f(x) over the interval from -∞ to t.

(15)

Numerical descriptors

Parameters of position:

• Quantile (e.g. median, quartile)

• Mode

• Expected value (average)

Parameters of dispersion:

• Variance (standard deviation)

• Range

(16)

Numerical descriptors

The quantile $x_q$ is the value of the random variable for which the cumulative distribution function takes the value q:

$$F(x_q) = P(X \le x_q) = \int_{-\infty}^{x_q} f(u)\,du = q$$

The median, i.e. $x_{0.5}$, is the most frequently used quantile.

In Example 4.2 the current I = 10 mA is the median of the distribution.

Example 4.4

For the discrete data set 19, 21, 21, 21, 22, 22, 23, 25, 26, 27 the median is 22 (the middle value, or the arithmetic average of the two middle values).

(17)

The mode is the most frequently occurring value of the random variable (the x at which the probability distribution attains its maximum).

A unimodal distribution has one mode; multimodal distributions have more than one.

In Example 4.4, $x_k$ = 19, 21, 21, 21, 22, 22, 23, 25, 26, 27, the mode equals 21 (it appears 3 times, i.e. most frequently).

(18)

Average value

Arithmetic average:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

where $x_i$ belongs to a set of n elements.

In Example 4.4, $x_i$ = 19, 21, 21, 21, 22, 22, 23, 25, 26, 27, the arithmetic average is 22.7.
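The three position parameters quoted for Example 4.4 (median, mode, arithmetic average) can be reproduced with Python's standard `statistics` module. This is only a quick check of the slide values, not part of the original lecture.

```python
import statistics

# Data set from Example 4.4.
data = [19, 21, 21, 21, 22, 22, 23, 25, 26, 27]

median = statistics.median(data)  # average of the two middle values for even n
mode = statistics.mode(data)      # most frequently occurring value
mean = statistics.mean(data)      # arithmetic average

print(median, mode, mean)
```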

(19)

Arithmetic average

When many elements have the same value, we divide the set into classes, each containing $n_k$ identical elements:

$$\bar{x} = \frac{1}{n}\sum_{k=1}^{p} n_k x_k = \sum_{k=1}^{p} f_k x_k$$

where $f_k = \dfrac{n_k}{n}$ and p is the number of classes (p ≤ n).

Normalization condition:

$$\sum_{k=1}^{p} f_k = 1$$

Example 4.5

x_k     n_k   f_k
10.2    1     0.0357
12.3    4     0.1429
12.4    2     0.0714
13.4    8     0.2857
16.4    4     0.1429
17.5    3     0.1071
19.3    1     0.0357
21.4    2     0.0714
22.4    2     0.0714
25.2    1     0.0357
Sum     28

$$\bar{x} = f_1 x_1 + f_2 x_2 + \ldots + f_p x_p = 0.04 \cdot 10.2 + 0.14 \cdot 12.3 + \ldots + 0.04 \cdot 25.2$$

$$\bar{x} = 15.77$$
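The class-based average of Example 4.5 can be recomputed directly from the table. The sketch below assumes the table is stored as (x_k, n_k) pairs; the variable names are illustrative.

```python
# (x_k, n_k) pairs from the table in Example 4.5.
classes = [
    (10.2, 1), (12.3, 4), (12.4, 2), (13.4, 8), (16.4, 4),
    (17.5, 3), (19.3, 1), (21.4, 2), (22.4, 2), (25.2, 1),
]

n = sum(n_k for _, n_k in classes)          # total number of elements
freqs = [n_k / n for _, n_k in classes]     # relative frequencies f_k = n_k / n

# Weighted average: sum over classes of f_k * x_k.
mean = sum(x_k * f_k for (x_k, _), f_k in zip(classes, freqs))

print(n, round(sum(freqs), 6), round(mean, 2))
```

The printed sum of the frequencies is the normalization condition, and the last value is the class-weighted average.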

(20)

Moments of distribution functions

Moment of order k with respect to $x_0$:

$$m_k(x_0) \equiv \sum_i (x_i - x_0)^k\, p(x_i)$$

for discrete variables

$$m_k(x_0) \equiv \int (x - x_0)^k f(x)\,dx$$

for continuous variables

The most important moments are those calculated with respect to $x_0 = 0$ (denoted $m_k$) and with respect to $x_0 = m_1$, the first moment ($m_1$ is called the expected value); the latter are the central moments $\mu_k$.

(21)

Expected value

Symbols: $m_1$, E(X), µ, $\bar{x}$, $\hat{x}$

$$E(X) = \sum_i x_i\, p_i$$

for discrete variables

$$E(X) = \int x f(x)\,dx$$

for continuous variables

(22)

Properties of E(X)

1. E(X) is a linear operator, i.e.:

$$E\left(\sum_i C_i X_i\right) = \sum_i C_i\, E(X_i)$$

As a consequence:

E(C) = C

E(CX) = C E(X)

$E(X_1 + X_2) = E(X_1) + E(X_2)$

2. For independent variables $X_1, X_2, \ldots, X_n$:

$$E\left(\prod_i X_i\right) = \prod_i E(X_i)$$

Variables are independent when:

$$f(X_1, X_2, \ldots, X_n) = f_1(X_1) \cdot f_2(X_2) \cdot \ldots \cdot f_n(X_n)$$

(23)

Properties of E(X)

3. For a function of X, Y = Y(X), the expected value E(Y) can be found from the distribution of the variable X, without having to determine the distribution f(y):

$$E(Y) = \sum_i y(x_i)\, p_i$$

for discrete variables

$$E(Y) = \int y(x) f(x)\,dx$$

for continuous variables

Any moment $m_k(x_0)$ can be treated as the expected value of the function $Y(X) = (X - x_0)^k$:

$$m_k(x_0) \equiv \int (x - x_0)^k f(x)\,dx = E\left((X - x_0)^k\right)$$

(24)

Variance

VARIANCE (dispersion), symbols: σ²(X), var(X), V(X), D(X).

Standard deviation: σ(X)

Variance (or the standard deviation) is a measure of the scatter of a random variable around its expected value.

$$\sigma^2(X) \equiv \sum_i p_i\, (x_i - E(X))^2$$

for discrete variables

$$\sigma^2(X) \equiv \int f(x)\, (x - E(X))^2\,dx$$

for continuous variables

$$\sigma^2(X) = E(X^2) - E^2(X)$$

(25)

Properties of σ²(X)

1. Variance can be calculated using expected values only:

$$\sigma^2(X) = E(X^2) - E^2(X)$$

As a consequence we get:

σ²(C) = 0

σ²(CX) = C² σ²(X)

$\sigma^2(C_1 X + C_2) = C_1^2\, \sigma^2(X)$

2. For independent variables $X_1, X_2, \ldots, X_n$:

$$\sigma^2\left(\sum_i C_i X_i\right) = \sum_i C_i^2\, \sigma^2(X_i)$$
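The variance properties above can be checked on a small discrete distribution. The sketch below reuses the probability mass function obtained in Example 4.3; the helper names `expected` and `variance` and the constants C1 = 3, C2 = 5 are assumptions chosen for illustration.

```python
def expected(pmf, g=lambda x: x):
    """E(g(X)) = sum of g(x_i) * p_i over a discrete distribution {x_i: p_i}."""
    return sum(g(x) * p for x, p in pmf.items())


def variance(pmf):
    """Variance from the definition: E((X - E(X))^2)."""
    mu = expected(pmf)
    return expected(pmf, lambda x: (x - mu) ** 2)


pmf = {-2: 0.2, 0: 0.5, 2: 0.3}  # distribution from Example 4.3

var_def = variance(pmf)
# Shortcut formula: E(X^2) - E(X)^2.
var_short = expected(pmf, lambda x: x ** 2) - expected(pmf) ** 2

# Distribution of the transformed variable C1*X + C2.
c1, c2 = 3.0, 5.0
pmf_scaled = {c1 * x + c2: p for x, p in pmf.items()}
var_scaled = variance(pmf_scaled)

print(round(var_def, 6), round(var_short, 6), round(var_scaled, 6))
```

The shift C2 drops out and the scale factor enters squared, so the last value is 9 times the first.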

(26)

UNIFORM DISTRIBUTION

a  ≤   x  ≤  b

(27)

Chebyshev inequality

The interpretation of variance follows from Chebyshev's theorem:

$$P\left(|X - E(X)| \ge a \cdot \sigma(X)\right) \le \frac{1}{a^2}$$

Theorem:

The probability that the random variable X deviates from its expected value E(X) by at least a times the standard deviation is smaller than or equal to 1/a².

This theorem is valid for all distributions that have a variance and an expected value. The number a is any positive real value.
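As an illustration of the inequality, the sketch below compares the exact tail probability with the bound 1/a² for the uniform current distribution of Example 4.2. It relies on the standard result σ² = (b − a)²/12 for a uniform distribution on [a, b], which the slides do not state explicitly; the function name `tail_probability` is illustrative.

```python
import math

lo, hi = 0.0, 20.0                   # range of the current in Example 4.2 (mA)
mu = (lo + hi) / 2                   # expected value of the uniform distribution
sigma = (hi - lo) / math.sqrt(12)    # standard deviation (assumed textbook formula)


def tail_probability(a):
    """Exact P(|X - mu| >= a*sigma) for the uniform distribution on [lo, hi]."""
    half_width = (hi - lo) / 2
    return max(0.0, 1.0 - a * sigma / half_width)


# (a, exact tail probability, Chebyshev bound 1/a^2)
results = [(a, tail_probability(a), 1 / a ** 2) for a in (1.0, 1.5, 2.0, 3.0)]

for a, exact, bound in results:
    print(a, round(exact, 4), round(bound, 4))
```

The bound holds for every a but is far from tight here, which is typical: Chebyshev's theorem uses no information about the shape of the distribution.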

(28)

Variance as a measure of data scatter

[Figure: two data sets, one with a big scatter of data and one with a smaller scatter of data.]

(29)

Range as a measure of scatter

$$\text{RANGE} = x_{max} - x_{min}$$

(30)

Practical ways of calculating variance

Variance of an n-element sample:

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

where $\bar{x}$ is the average.

Variance of an N-element population:

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$$

where μ is the expected value.

(31)

Practical ways of calculating standard deviation

Standard deviation of a sample (or: standard uncertainty):

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

Standard deviation (population):

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}$$
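The difference between the sample formulas (divisor n − 1) and the population formulas (divisor N) can be seen on the data of Example 4.4. The sketch uses Python's standard `statistics` module; treating the ten values once as a sample and once as a whole population is an assumption made only for comparison.

```python
import statistics

data = [19, 21, 21, 21, 22, 22, 23, 25, 26, 27]  # values from Example 4.4

s2 = statistics.variance(data)       # sample variance, divisor n - 1
sigma2 = statistics.pvariance(data)  # population variance, divisor N
s = statistics.stdev(data)           # sample standard deviation
sigma = statistics.pstdev(data)      # population standard deviation

print(round(s2, 4), round(sigma2, 4), round(s, 4), round(sigma, 4))
```

The sample variance is always the larger of the two, by the factor n / (n − 1).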

(32)

Examples of probability distributions – discrete variables

Two-point distribution (zero-one), e.g. a coin toss with head = failure (x = 0) and tail = success (x = 1), where p is the probability of success. Its distribution:

x_i    0      1
p_i    1-p    p

Binomial (Bernoulli) distribution:

$$p_k = \binom{n}{k} p^k (1 - p)^{n-k}, \qquad k = 0, 1, \ldots, n$$

where 0 < p < 1 and X = {0, 1, 2, …, n}; k is the number of successes in n draws with replacement.

For n = 1 this reduces to the two-point distribution.

(33)

Binomial distribution - assumptions

The random experiment consists of n Bernoulli trials:

1. Each trial is independent of the others.

2. Each trial can have only two results: "success" and "failure" (binary!).

3. The probability of success p is constant.

The probability $p_k$ that the random variable X equals k, the number of successes in n trials, is:

$$p_k = \binom{n}{k} p^k (1 - p)^{n-k}, \qquad k = 0, 1, \ldots, n$$

(34)

Pascal’s triangle

$$\binom{0}{0} = 1$$

$$\binom{1}{0} = 1 \qquad \binom{1}{1} = 1$$

$$\binom{2}{0} = 1 \qquad \binom{2}{1} = 2 \qquad \binom{2}{2} = 1$$

Symbol:

$$\binom{n}{k} = \frac{n!}{(n-k)!\,k!}$$

Newton’s binomial:

$$(a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^{n-k} b^{k}$$

(35)

Pascal’s triangle

n = 0:  1
n = 1:  1 1
n = 2:  1 2 1
n = 3:  1 3 3 1
n = 4:  1 4 6 4 1
n = 5:  1 5 10 10 5 1
n = 6:  1 6 15 20 15 6 1

Each entry is the sum of the two entries directly above it.
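The rows of Pascal's triangle can be generated either from the factorial formula for the binomial coefficient or by adding neighbouring entries of the previous row. The sketch below shows both; the function names are illustrative, and `math.comb` requires Python 3.8 or newer.

```python
import math


def pascal_row(n):
    """Row n of Pascal's triangle from the binomial coefficients C(n, k)."""
    return [math.comb(n, k) for k in range(n + 1)]


def next_row(row):
    """Build the next row by summing adjacent entries of the current one."""
    return [1] + [a + b for a, b in zip(row, row[1:])] + [1]


for n in range(7):
    print(f"n = {n}:", *pascal_row(n))
```

Both constructions agree, e.g. `next_row(pascal_row(5))` equals `pascal_row(6)`.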

(36)

Bernoulli distribution

Example 4.6

The probability that the daily water use in a company will not exceed a certain level is p = 3/4. We monitor the water use for 6 days.

Calculate the probability that the daily water use will not exceed the set limit on 0, 1, 2, …, 6 of the monitored days, respectively.

Data:

$$p = \frac{3}{4}, \qquad q = \frac{1}{4}, \qquad N = 6, \qquad k = 0, 1, \ldots, 6$$

(37)

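Bernoulli distribution

$$\begin{aligned}
k = 0: \quad P(k = 0) &= \binom{6}{0} \left(\frac{3}{4}\right)^0 \left(\frac{1}{4}\right)^6 \\
k = 1: \quad P(k = 1) &= \binom{6}{1} \left(\frac{3}{4}\right)^1 \left(\frac{1}{4}\right)^5 \\
k = 2: \quad P(k = 2) &= \binom{6}{2} \left(\frac{3}{4}\right)^2 \left(\frac{1}{4}\right)^4 \\
k = 3: \quad P(k = 3) &= \binom{6}{3} \left(\frac{3}{4}\right)^3 \left(\frac{1}{4}\right)^3 \\
k = 4: \quad P(k = 4) &= \binom{6}{4} \left(\frac{3}{4}\right)^4 \left(\frac{1}{4}\right)^2 \\
k = 5: \quad P(k = 5) &= \binom{6}{5} \left(\frac{3}{4}\right)^5 \left(\frac{1}{4}\right)^1 \\
k = 6: \quad P(k = 6) &= \binom{6}{6} \left(\frac{3}{4}\right)^6 \left(\frac{1}{4}\right)^0
\end{aligned}$$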

(38)

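Bernoulli distribution

$$\begin{aligned}
k = 0: \quad P(0) &= 1 \cdot 1 \cdot \frac{1}{4^6} = 0.00024 \\
k = 1: \quad P(1) &= 6 \cdot \frac{3}{4} \cdot \frac{1}{4^5} = 6 \cdot 3 \cdot P(0) = 18 \cdot P(0) = 0.004 \\
k = 2: \quad P(2) &= 15 \cdot \frac{3^2}{4^2} \cdot \frac{1}{4^4} = 15 \cdot 9 \cdot P(0) = 135 \cdot P(0) = 0.033 \\
k = 3: \quad P(3) &= 20 \cdot \frac{3^3}{4^3} \cdot \frac{1}{4^3} = 20 \cdot 9 \cdot 3 \cdot P(0) = 540 \cdot P(0) = 0.132 \\
k = 4: \quad P(4) &= 15 \cdot \frac{3^4}{4^4} \cdot \frac{1}{4^2} = 15 \cdot 9 \cdot 9 \cdot P(0) = 1215 \cdot P(0) = 0.297 \\
k = 5: \quad P(5) &= 6 \cdot \frac{3^5}{4^5} \cdot \frac{1}{4} = 6 \cdot 9 \cdot 9 \cdot 3 \cdot P(0) = 1458 \cdot P(0) = 0.356 \\
k = 6: \quad P(6) &= 1 \cdot \frac{3^6}{4^6} \cdot 1 = 9 \cdot 9 \cdot 9 \cdot P(0) = 729 \cdot P(0) = 0.178
\end{aligned}$$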

(39)

Bernoulli distribution

[Figure: bar chart of P(k) versus k for Example 4.6, with values 0.00024, 0.004, 0.033, 0.132, 0.297, 0.356, 0.178 for k = 0, …, 6.]

Maximum for k = 5
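The probabilities of Example 4.6 can be recomputed exactly with rational arithmetic. The sketch below is one possible way to do it; the function name `binomial_pmf` is illustrative, and `fractions.Fraction` is used only so that the results appear as exact multiples of 1/4096.

```python
import math
from fractions import Fraction


def binomial_pmf(k, n, p):
    """P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


n, p = 6, Fraction(3, 4)  # data of Example 4.6
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Expected value and variance computed from the distribution itself.
mean = sum(k * pk for k, pk in enumerate(probs))
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(probs))

for k, pk in enumerate(probs):
    print(k, pk, round(float(pk), 5))
print("mean =", mean, "variance =", var)
```

The mean and variance obtained this way agree with the closed forms np and np(1 − p) for the binomial distribution.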

(40)

Bernoulli distribution

(41)

Bernoulli distribution

Expected value

$$E(X) = \mu = np$$

Variance

$$V(X) = \sigma^2 = np(1 - p)$$

(42)

Errors in transmission

Example 4.7

A digital information-transfer channel is prone to errors in single bits. Assume that the probability of a single-bit error is p = 0.1.

Consecutive transmission errors are independent. Let X denote the random variable whose value is the number of bits in error in a sequence of 4 bits.

E - bit error, O - no error

OEOE corresponds to X = 2; EEOO also gives X = 2 (order does not matter).

(43)

Errors in transmission

Example 4.7 cont.

For X = 2 we get the following outcomes:

{EEOO, EOEO, EOOE, OEEO, OEOE, OOEE}

What is the probability P(X = 2), i.e. that two bits will be sent with an error?

The events are independent, thus

P(EEOO) = P(E)P(E)P(O)P(O) = (0.1)² (0.9)² = 0.0081

The outcomes are mutually exclusive and have the same probability, hence

P(X = 2) = 6 P(EEOO) = 6 (0.1)² (0.9)² = 6 (0.0081) = 0.0486

(44)

Errors in transmission

Example 4.7 continued

$$\binom{4}{2} = \frac{4!}{2!\,2!} = 6$$

Therefore, P(X = 2) = 6 (0.1)² (0.9)² is given by the Bernoulli distribution:

$$P(X = x) = \binom{4}{x} p^x (1 - p)^{4-x}, \qquad x = 0, 1, 2, 3, 4, \qquad p = 0.1$$

P(X = 0) = 0.6561
P(X = 1) = 0.2916
P(X = 2) = 0.0486
P(X = 3) = 0.0036
P(X = 4) = 0.0001
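The counting argument of Example 4.7 can be reproduced by brute force: list all 16 four-bit sequences, compute the probability of each from independence, and add up the sequences with the same number of errors. This is a sketch for illustration; the names `sequence_probability` and `distribution` are not from the lecture.

```python
from itertools import product

p_error = 0.1  # probability of a single-bit error (Example 4.7)


def sequence_probability(seq):
    """Probability of one specific sequence of E (error) and O (no error) bits."""
    prob = 1.0
    for bit in seq:
        prob *= p_error if bit == "E" else 1 - p_error
    return prob


# distribution[x] accumulates P(X = x), where X is the number of errors in 4 bits.
distribution = [0.0] * 5
for seq in product("EO", repeat=4):
    distribution[seq.count("E")] += sequence_probability(seq)

for x, prob in enumerate(distribution):
    print(f"P(X = {x}) = {prob:.4f}")
```

Six of the sixteen sequences contain exactly two errors, which is where the factor 6 in P(X = 2) comes from.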

(45)

Poisson’s distribution

We introduce a parameter λ = pn (E(X) = λ):

$$P(X = x) = \binom{n}{x} p^x (1 - p)^{n-x} = \binom{n}{x} \left(\frac{\lambda}{n}\right)^x \left(1 - \frac{\lambda}{n}\right)^{n-x}$$

Let us assume that n increases while p decreases, but λ = pn remains constant. The Bernoulli distribution then turns into Poisson’s distribution:

$$\lim_{n \to \infty} P(X = x) = \lim_{n \to \infty} \binom{n}{x} \left(\frac{\lambda}{n}\right)^x \left(1 - \frac{\lambda}{n}\right)^{n-x} = \frac{e^{-\lambda} \lambda^x}{x!}$$

(46)

Poisson’s distribution

It is one of the rare cases where the expected value equals the variance:

$$E(X) = np = \lambda$$

Why?

$$V(X) = \sigma^2 = \lim_{n \to \infty,\ p \to 0} (np - np^2) = np = \lambda$$

(47)

Poisson’s distribution

[Figure: Poisson probability p(X) versus x for lambda = 1, lambda = 5 and lambda = 10.]

(x is an integer, unbounded; x ≥ 0)

For big n the Bernoulli distribution resembles Poisson’s distribution:

x                            0      1      2      3      4      5      6
Bernoulli, n = 50, p = 0.02  0.364  0.372  0.186  0.061  0.014  0.003  0.000
Poisson, λ = 1               0.368  0.368  0.184  0.061  0.015  0.003  0.001
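The comparison between the Bernoulli (binomial) distribution with n = 50, p = 0.02 and Poisson's distribution with λ = np = 1 can be recomputed as follows. This is a minimal sketch; the function names are illustrative.

```python
import math


def binomial_pmf(x, n, p):
    """P(X = x) for the binomial distribution with n trials and success probability p."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)


def poisson_pmf(x, lam):
    """P(X = x) = exp(-lambda) * lambda^x / x! for Poisson's distribution."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


n, p = 50, 0.02
lam = n * p  # lambda = np = 1

# (x, binomial probability, Poisson probability)
rows = [(x, binomial_pmf(x, n, p), poisson_pmf(x, lam)) for x in range(7)]

for x, b, q in rows:
    print(x, round(b, 3), round(q, 3))
```

Increasing n while keeping np fixed (for example n = 500, p = 0.002) brings the two columns even closer together.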

(48)

Normal distribution (Gaussian)

Limiting case (normal distribution)

The most widely used model for the distribution of a random variable is the normal distribution.

The central limit theorem was formulated in 1733 by De Moivre:

Whenever a random experiment is replicated, the random variable that equals the average (or total) result over the replicas tends to have a normal distribution as the number of replicas becomes large.

(49)

Normal distribution (Gaussian)

A random variable X with probability density function f(x):

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right], \qquad \text{where } -\infty < x < +\infty$$

is a normal random variable with two parameters:

$$-\infty < \mu < +\infty, \qquad \sigma > 0$$

We can show that E(X) = μ and V(X) = σ².

The notation N(μ, σ) is used to denote this distribution.

(50)

Normal distribution (Gaussian)

The expected value, the maximum of the probability density (mode) and the median coincide (x = μ). The curve is symmetric (the Gaussian curve is bell-shaped).

Variance is a measure of the width of the distribution. The inflection points of N(0, σ) are at x = +σ and x = -σ.

(51)

Normal distribution (Gaussian)

The normal distribution is used in experimental physics, where it describes the distribution of random errors. The standard deviation σ is a measure of random uncertainty. Measurements with larger σ show a bigger scatter of data around the average value and are therefore less precise.

(52)

Standard normal distribution

A normal random variable Z with probability density N(z):

$$N(z) = \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{z^2}{2}\right], \qquad \text{where } -\infty < z < +\infty$$

is called a standard normal random variable, N(0,1):

$$E(Z) = 0, \qquad V(Z) = 1$$

Definition of the standard normal variable:

$$Z = \frac{X - \mu}{\sigma}$$

(53)

Standard normal distribution

Advantages of standardization:

• Tables of values of the probability density and the CDF can be constructed for N(0,1). A variable with the N(µ,σ) distribution can then be obtained by the simple transformation X = σ·Z + µ.

• By standardization we shift all original random variables to the region close to zero and rescale the x-axis. The unit becomes the standard deviation. Therefore, we can compare different distributions.

[Figure: standard normal density with the confidence level and the significance level marked.]

(54)

Calculations of probability (Gaussian distribution)

P(μ-σ < X < μ+σ) = 0.6827 (about 2/3 of results)

P(μ-2σ < X < μ+2σ) = 0.9545

P(μ-3σ < X < μ+3σ) = 0.9973 (almost all)

[Figure: Φ(x) versus x with the areas over the intervals (-σ, +σ), (-2σ, +2σ) and (-3σ, +3σ) marked.]
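The three probabilities quoted for the Gaussian distribution can be reproduced from the error function in Python's standard `math` module, using the identity P(μ − kσ < X < μ + kσ) = erf(k/√2), which holds for any normal distribution. The function name `prob_within` is illustrative.

```python
import math


def prob_within(k):
    """P(mu - k*sigma < X < mu + k*sigma) for a normally distributed X."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"P(|X - mu| < {k} sigma) = {prob_within(k):.4f}")
```

Because the result depends only on k, the same values apply to every N(μ, σ), which is the practical benefit of standardization.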
