dr hab. inż. Katarzyna Zakrzewska, prof. AGH, Department of Electronics, AGH
e-mail: zak@agh.edu.pl
http://home.agh.edu.pl/~zak
Lecture 1.
Introduction to probability and statistics
References:
● D.C. Montgomery, G.C. Runger, Applied Statistics and Probability for Engineers, Third Edition, J. Wiley & Sons, 2003
● A. Plucińska, E. Pluciński, Probabilistyka, rachunek prawdopodobieństwa, statystyka matematyczna, procesy stochastyczne, WNT, 2000
● J. Jakubowski, R. Sztencel, Wstęp do teorii prawdopodobieństwa, SCRIPT, 2000
● M. Sobczyk, Statystyka, Wydawnictwo C.H. Beck, Warszawa 2010
● A. Zięba, Analiza danych w naukach ścisłych i technice, PWN, Warszawa 2013, 2014
Introduction to probability and statistics. lecture 1 2
Outline
● Probability and statistics – scope
● Historical background
● Paradox of Chevalier de Méré
● Statistics – types of data and the concept of random variable
● Graphical representation of data
● The role of probability and statistics in science and engineering
Probabilistic and statistical approach
Theory of probability (also calculus of probability or probabilistics) – the branch of mathematics that deals with random events and stochastic processes. A random event is a result of a random (non-deterministic) experiment.
A random experiment can be repeated many times under identical or nearly identical conditions, while its result cannot be predicted.
If a given result occurs l times in n repetitions, its frequency is l/n. As n increases, the frequency tends to some constant value:
l – number of times the given result occurred
n – number of repetitions
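The frequency definition above can be illustrated with a short simulation (a sketch; 0.5 is the assumed probability of a fair-coin "tail"):

```python
import random

random.seed(0)  # reproducible runs

# Count how often 'tail' occurs in n tosses of a fair coin
# and watch the frequency l/n approach the constant value 0.5.
for n in (100, 10_000, 100_000):
    l = sum(random.random() < 0.5 for _ in range(n))  # l - number of 'tails'
    print(n, l / n)
```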
Statistics deals with methods of acquiring data and information (numerical in nature), and with their analysis and interpretation.
Probabilistics studies abstract mathematical concepts devised to describe non-deterministic phenomena:
1. random variables in the case of single events
2. stochastic processes when events are repeated in time
Large data sets are the domain of statistics.
One of the most important achievements of modern physics was the discovery of the probabilistic nature of phenomena at the microscopic scale, which is fundamental to quantum mechanics.
Statistics
DESCRIPTIVE STATISTICS
● Arrangement of data
● Presentation of data (graphical or numerical)
STATISTICAL INFERENCE
Gives methods of formulating conclusions about the object of study (the general population) based on a smaller sample.
Historical background
• The theory of probability goes back to the 17th century, when Pierre de Fermat and Blaise Pascal analyzed games of chance. That is why it initially concentrated on discrete variables only, using methods of combinatorics.
• Continuous variables were introduced into the theory of probability much later.
• The beginning of the modern theory of probability is generally taken to be its axiomatization performed in 1933 by Andrey Kolmogorov.
Gambling is based on the probability of random events... and may be analyzed by the theory of probability:
● the probability of a "tail" – simple, as in a coin toss,
● a roulette outcome – fully random,
● a certain combination of cards held in one hand – complicated, as in a poker game.
Blaise Pascal (1623-1662), Paris, France
Immortalized Chevalier de Méré and his gambling paradox.
Pascal’s triangle for binomial coefficients
Newton's binomial:
(a + b)^n = Σ_{k=0}^{n} C(n,k) · a^(n−k) · b^k
Pascal’s Triangle
Binomial coefficients (read "n choose k"):
C(n,k) = n! / (k!·(n−k)!)
The entries of Pascal's triangle are binomial coefficients: the k-th entry of row n is C(n,k).
Pascal's Triangle (row n lists C(n,0) … C(n,n); each inner entry is the sum of the two entries above it):
n = 0:              1
n = 1:             1 1
n = 2:            1 2 1
n = 3:           1 3 3 1
n = 4:          1 4 6 4 1
n = 5:         1 5 10 10 5 1
n = 6:        1 6 15 20 15 6 1
Pierre de Fermat (1601-1665), Toulouse, France
Studied the properties of prime numbers and number theory; in parallel he developed the concept of coordinates in geometry.
In collaboration with Pascal he laid the basis for the modern theory of probability.
Siméon Denis Poisson (1781-1840), Paris, France
Friend of Lagrange and student of Laplace at the famous École Polytechnique.
Besides physics, he took an interest in the theory of probability: stochastic processes (like the Markov process) and the Poisson distribution with its distribution function.
Carl Friedrich Gauss (1777-1855)
Göttingen, Germany, university professor
An ingenious mathematician who even in his childhood was far ahead of his contemporaries.
While a pupil at primary school he solved the problem of summing the numbers from 1 to 40 by proposing (40+1)·20.
Normal distribution function, Gauss distribution
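Gauss's pairing trick (pair 1 with 40, 2 with 39, …: 20 pairs, each summing to 41) can be checked directly:

```python
# Sum 1 + 2 + ... + 40 by brute force and by Gauss's pairing (40+1)*20
brute = sum(range(1, 41))
pairing = (40 + 1) * 20
print(brute, pairing)  # both give 820
```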
Paradox of Chevalier de Méré
Two gamblers, S1 and S2, agree to play a sequence of sets. The winner is the one who is first to gain 5 sets.
How should the stake be shared when the game is interrupted abruptly?
Assume that S1 has won 4 sets and S2 only 3. How to share the stake?
Proposal no. 1: the money should be split in the ratio 4:3.
Proposal no. 2: (5−3):(5−4) = 2:1.
(after W.R. Fuchs, Matematyka popularna)
Paradox of Chevalier de Méré
Blaise Pascal is believed to have found the solution quite simply, by assuming that the game would be resolved if they played at most two more sets.
If the first set is won by S1, the whole game is finished.
If the first set is won by S2, a second victory of S1 settles the deal.
Only if both sets are won by S2 does S2 win the game. Of the four equally likely continuations, three favour S1, so it is justified to share the money 3:1.
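Pascal's argument can be checked by enumerating the (at most) two remaining sets, each equally likely to go to either player:

```python
from itertools import product

# S1 needs 1 more set, S2 needs 2 more; play at most two further sets.
s1_wins = 0
for outcome in product(("S1", "S2"), repeat=2):
    if "S1" in outcome:   # S1 takes at least one of the two sets
        s1_wins += 1
print(s1_wins, "of 4 continuations favour S1")  # 3 of 4 -> share the stake 3:1
```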
Statistics – types of data
QUANTITATIVE (NUMERICAL)
Examples (for a set of people):
● Age
● Height
● Salary
Calculations of parameters such as averages, medians and extrema make sense.
QUALITATIVE (CATEGORICAL)
Examples:
● Sex
● Marital status
One can ascribe arbitrary numerical values to the different categories, but calculations of parameters do not make sense; only percentage contributions can be given.
The concept of random variable
Ω = {e₁, e₂, …},   X: Ω → R,   X(eᵢ) = x ∈ R
A random variable is a function X that attributes a real value x to each result of a random experiment.
Examples:
1) Coin toss: the event 'heads' takes the value 1; the event 'tails', 0.
2) Products: the event 'failure' takes 0; 'well-performing', 1.
3) Dice: '1' → 1, '2' → 2, etc.
4) Interval [a, b]: the choice of a point with coordinate x is attributed a value, e.g. sin²(3x+17), etc.
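The mapping X: Ω → R in examples 1)–4) can be written out directly (a minimal sketch; the dictionary names are illustrative):

```python
import math

# 1) Coin toss: map each elementary event to a real number
coin = {"heads": 1, "tails": 0}

# 2) Product quality
quality = {"failure": 0, "well-performing": 1}

# 3) Dice: the identity mapping
dice = {face: face for face in range(1, 7)}

# 4) A point x chosen from [a, b] mapped to sin^2(3x + 17)
X = lambda x: math.sin(3 * x + 17) ** 2

print(coin["heads"], dice[5], X(0.0))
```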
Random variable
Discrete
• Toss of a coin
• Transmission errors
• Faulty elements on a production line
• A number of connections coming in 5 minutes
Continuous
• Electrical current, I
• Temperature, T
• Pressure, p
Graphical presentation of data
x    Number of outcomes    Frequency
1    3                     3/23 = 0.1304
2    5                     5/23 = 0.2174
3    10                    10/23 = 0.4348
4    4                     4/23 = 0.1739
5    1                     1/23 = 0.0435
Sum: 23                    1.0000
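The frequency column can be reproduced in a few lines (the counts are taken from the table above):

```python
# Number of outcomes for each value x = 1..5
counts = {1: 3, 2: 5, 3: 10, 4: 4, 5: 1}
n = sum(counts.values())                      # 23
freq = {x: c / n for x, c in counts.items()}  # relative frequencies
for x, f in freq.items():
    print(x, round(f, 4))
print("sum:", sum(freq.values()))
```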
(Pie chart of the relative frequencies for x = 1, …, 5: 13%, 22%, 44%, 17%, 4%.)
(Column plot of the relative frequencies for x = 1, …, 5; vertical axis from 0 to 0.45.)
Numerical data
Results of 34 measurements (e.g. grain size in [nm], temperature on consecutive days at 11:00 in [°C], duration of telephone calls in [min], etc.):
3.6 13.2 12 12.8 13.5 15.2 4.8 12.3 9.1 16.6 15.3 11.7 6.2 9.4
6.2 6.2 15.3 8 8.2 6.2 6.3
12.1 8.4 14.5 16.6 19.3 15.3 19.2
6.5 10.4 11.2 7.2 6.2 2.3
These data are difficult to deal with!
Histogram
How to prepare a histogram:
1. Order your data (by increasing or decreasing values – e.g. Excel has such an option).
2. The results of an experiment (a set of n numbers) can contain repeated values. We divide them into classes.
3. The width of a class is not necessarily constant, but usually it is chosen to be the same.
4. The number of classes should be neither too small nor too large. The optimum number of classes k is given by Sturges' formula.
(Histograms of the same data binned into 3 classes, into 12 classes, and into 35 classes; horizontal axis: x, vertical axis: absolute frequency.)
Sturges' formula: k = 1 + 3.3·log₁₀(n)
In our case: n = 34 gives k = 1 + 3.3·log₁₀34 ≈ 6.05 ≈ 6.
Sample count, n    Number of classes, k
< 50               5 – 7
50 – 200           7 – 9
200 – 500          9 – 10
500 – 1000         10 – 11
1000 – 5000        11 – 13
5000 – 50000       13 – 17
> 50000            17 – 20
Optimum histogram
(Histogram of the data with 6 classes (optimal) over the range 2–20; horizontal axis: x, vertical axis: relative frequency, 0–0.3.)
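The whole procedure – Sturges' formula plus the binning – can be sketched as follows (the data are the 34 measurements listed earlier; equal-width classes from the minimum to the maximum are one common choice):

```python
import math

data = [3.6, 13.2, 12, 12.8, 13.5, 15.2, 4.8, 12.3, 9.1, 16.6, 15.3, 11.7, 6.2, 9.4,
        6.2, 6.2, 15.3, 8, 8.2, 6.2, 6.3,
        12.1, 8.4, 14.5, 16.6, 19.3, 15.3, 19.2,
        6.5, 10.4, 11.2, 7.2, 6.2, 2.3]

n = len(data)                       # 34
k = round(1 + 3.3 * math.log10(n))  # Sturges' formula -> 6 classes
lo, hi = min(data), max(data)
width = (hi - lo) / k               # equal class width

counts = [0] * k
for x in data:
    i = min(int((x - lo) / width), k - 1)  # last class includes the maximum
    counts[i] += 1

rel = [c / n for c in counts]       # relative frequencies for the histogram
print(k, counts)
```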
Statistics allows us to analyze and model the development of diseases with the aim of preventing epidemics.
• Medical statistics , e.g. the average number of cases (incidence of influenza) in a certain region
• Social statistics, e.g.
density of population
• Industrial statistics, e.g.
GDP (gross domestic product), expenses for medical care
Incidence of swine flu in 2009,USA
(Source: http://commons.wikimedia.org)
The role of probability and statistics in
science and engineering
Metrology
Weather forecast models enable the prediction of potential disasters such as storms, tornados, tsunamis, etc.
(Source:stormdebris.net/Math_Forecasting.html)
How to solve an engineering problem?
Example: Suppose that an engineer is designing a nylon connector to be used in an automotive engine application. The engineer is
considering establishing the design specification on wall thickness at 3/32 inch but is somewhat uncertain about the effect of this decision on the connector pull-off force. If the pull-off force is too low, the connector may fail when it is installed in an engine.
Problem description
Identification of the most important factors
Eight prototype units are produced and their pull-off forces
measured, resulting in the following data (in pounds): 12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6, 13.1. As we anticipated, not all of the prototypes have the same pull-off force. We say that there is
variability in the pull-off force measurements. Because the pull-off force measurements exhibit variability, we consider the pull-off
force to be a random variable.
A convenient way to think of a random variable, say X, that represents a measurement is by using the model
X = constant + disturbance
The constant remains the same with every measurement, but small changes in the environment, test equipment, differences in the individual parts themselves, and so forth change the value of the disturbance. If there were no disturbances, X would always be equal to the constant. However, this never happens in the real world, so the actual measurements of X exhibit variability. We often need to describe, quantify and ultimately reduce variability.
Proposed model
Figure 1-2 presents a dot diagram of these data. The dot diagram is a very useful plot for displaying a small body of data—say, up to about 20 observations. This plot allows us to see easily two features of the data; the location, or the middle, and the scatter or variability. When the number of observations is small, it is usually
difficult to identify any specific patterns in the variability, although the dot diagram is a convenient way to see any unusual data features.
The average pull-off force is 13.0 pounds.
Experiments
The need for statistical thinking arises often in the solution of engineering problems.
Consider the engineer designing the connector. From testing the prototypes, he knows that the average pull-off force is 13.0 pounds. However, he thinks that this may be too low for the intended application, so he decides to consider an alternative design with a greater wall thickness, 1/8 inch. Eight prototypes of this design are built, and the observed pull-off force measurements are 12.9, 13.7, 12.8, 13.9, 14.2, 13.2, 13.5, and 13.1.
Results for both samples are plotted as dot diagrams in Fig. 1-3.
The average pull-off force is 13.4 pounds.
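The two sample averages quoted above follow directly from the data:

```python
thin = [12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6, 13.1]   # 3/32-inch wall
thick = [12.9, 13.7, 12.8, 13.9, 14.2, 13.2, 13.5, 13.1]  # 1/8-inch wall

def mean(sample):
    return sum(sample) / len(sample)

print(round(mean(thin), 1))   # 13.0 pounds
print(round(mean(thick), 1))  # 13.4 pounds
```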
Model modification
This display gives the impression that increasing the wall thickness has led to an increase in pull-off force.
Confirmation of the solution
Is it really the case?
Statistics can help us to answer the following questions:
• How do we know that another sample of prototypes will not give different results?
• Is a sample of eight prototypes adequate to give reliable results?
• If we use the test results obtained so far to conclude that increasing the wall thickness increases the strength, what risks are associated with this decision?
• Is it possible that the apparent increase in pull-off force observed in the thicker prototypes is only due to the inherent variability in the system and that
increasing the thickness of the part (and its cost) really has no effect on the pull-off force?
Conclusions and recommendations
http://physics.nist.gov/Uncertainty
Wyrażanie Niepewności Pomiaru. Przewodnik (Expression of Measurement Uncertainty: A Guide), Główny Urząd Miar, Warszawa 1999
In October 1992, a new policy on expressing measurement uncertainty was instituted at NIST, National Institute of Standards and Technology.
Elaboration of the Guide to the Expression of Uncertainty in Measurement by the International Organization for Standardization, ISO, 1993
Uncertainty in Measurements
Applicable to results associated with:
• international comparisons of measurement standards,
• basic research,
• applied research and engineering,
• calibrating client measurement standards,
• certifying standard reference materials, and
• generating standard reference data.
MEASUREMENT
The result of a measurement is only an approximation or estimate of the value of the specific quantity subject to measurement, the measurand, which can be classified as simple or complex.
Example: Mathematical pendulum; the length l and the period T are simple measurands, measured directly.
Determination of the gravitational acceleration g: a complex measurand, obtained from
T = 2π·√(l/g)
In the course of measurements, values different from those predicted by theory are obtained. The sources of discrepancies between theory and experiment can be traced back to imperfections due to:
- the experimentalist,
- the measuring equipment,
- the object measured.
The more refined the experiment, the smaller the discrepancies: error and uncertainty can be reduced.
The result of a measurement should be given in one of the following forms:
g = 29.866(28) m/s  or  F = (98 ± 3)·10⁻³ C
Example: In an experiment, the electrochemical equivalent k was found to be:
k = 0.0010963 g/C
Δk = 0.0000347 g/C
How can one express this result? (Only the significant digits should be kept; the remaining digits are non-significant.)
Answer: k = (0.00110 ± 0.00004) g/C or k = 0.00110(4) g/C.
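The rounding used in the answer – round the uncertainty up to one significant digit and quote the value to the same decimal place – can be sketched as follows (the round-up convention for uncertainties is assumed here):

```python
import math

k, dk = 0.0010963, 0.0000347   # value and its uncertainty, in g/C

exp10 = math.floor(math.log10(dk))   # position of the leading digit: -5
step = 10.0 ** exp10
u = math.ceil(dk / step) * step      # uncertainty rounded up: 0.00004
value = round(k, -exp10)             # value cut at the same place: 0.0011

print(f"k = ({value} ± {u:.5f}) g/C")
```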
Absolute error:
Δxᵢ = xᵢ − x₀   (1)
(xᵢ – experimental result, x₀ – real value)
Relative error:
δ = Δxᵢ / x₀   (2)
Uncertainty / error
Note: the real values of a measurand are unknown in most cases.
Uncertainty
The quantities given by formulas (1) and (2) are single realizations of a random variable, which is why they cannot be treated by the theory of uncertainty. In practice, we do not know the real values; we estimate uncertainties from the dispersion of results, using the laws of statistics.
Uncertainty is
• a parameter related to the result of measurements,
• characterized by dispersion
• assigned to the measurand in a justified way.
Absolute uncertainty u is expressed in the same units as the measurand.
Symbols: u, u(x), or u(concentration of NaCl).
Relative uncertainty u_r(x) is the ratio of the absolute uncertainty to the measured value:
u_r(x) = u(x) / x
Relative uncertainty has no units and can be expressed in %.
Absolute and relative uncertainty
Measures of uncertainty
There exist two measures:
● standard uncertainty u(x): interval from x₀ − u(x) to x₀ + u(x)
● maximum uncertainty Δx: interval from x₀ − Δx to x₀ + Δx
Standard uncertainty
Generally accepted and suggested.
1. The distribution of a random variable xᵢ with dispersion around the average x̄ is characterized by the standard deviation, defined as:
σ = lim_{n→∞} √[ Σᵢ (xᵢ − x̄)² / n ]
2. Exact values of the standard deviation are unknown. The standard uncertainty represents an estimate of the standard deviation.
Maximum uncertainty
A deterministic measure. All the results xᵢ fall within the interval:
x₀ − Δx < xᵢ < x₀ + Δx
It is recommended to replace the maximum uncertainty by a standard uncertainty:
u(x) = Δx / √3
Classification of errors
Results of measurements follow some regular patterns, i.e., they are distributed in a way typical for random variables. According to the distribution functions and sources of errors one can distinguish:
● Gross errors (mistakes), which have to be eliminated.
● Systematic errors, which can be reduced by improving the measurement.
● Random errors, which result from numerous random contributions and cannot be eliminated; they should be treated within the formalism of statistics and probabilistics.
Distribution functions
(Two sketches of Φ(x): with a systematic error the distribution is centered at x̄ ≠ x₀; with random errors it is a Gauss distribution centered at x₀ = x̄.)
Φ(x) – probability density function
Analysis of uncertainties
Type A – all methods that use a statistical approach:
● a large number of repetitions is required,
● applies to random sources of errors.
Type B – based on a scientific estimate performed by the experimentalist, who has to use all available information on the measurement and the source of its uncertainty:
● applies when the laws of statistics cannot be used,
● for a systematic error or for a single result of measurement.
TYPE A
Example: We have performed a series of measurements getting the results x₁, x₂, …, xₙ. In such a sample, which can be considered big, some of the results are the same; nₖ is the number of random experiments in which the same result xₖ occurred, so nₖ/n is the frequency of that result.
xk     nk    nk/n
5.2    1     0.011
5.3    1     0.011
5.4    2     0.021
5.5    4     0.043
5.6    7     0.075
5.7    10    0.106
5.8    14    0.149
5.9    16    0.170
6.0    13    0.138
6.1    12    0.128
6.2    6     0.064
6.3    4     0.043
6.4    3     0.032
6.5    1     0.011
Sum    94
Analysis of data
(Histogram of the counts nₖ vs. xₖ for the table above.)
Arithmetic average:
x̄ = (1/n) Σᵢ xᵢ ;  here x̄ = 5.9
Standard uncertainty:
u(x) = σ = √[ Σᵢ (xᵢ − x̄)² / (n − 1) ] ;  here σ = 0.2
Standard uncertainty of the average:
u(x̄) = √[ Σᵢ (xᵢ − x̄)² / (n(n − 1)) ]
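A quick cross-check of the Type A analysis, computed straight from the frequency table (nₖ counts of each result xₖ):

```python
# (xk, nk) pairs from the Type A frequency table
table = [(5.2, 1), (5.3, 1), (5.4, 2), (5.5, 4), (5.6, 7), (5.7, 10),
         (5.8, 14), (5.9, 16), (6.0, 13), (6.1, 12), (6.2, 6), (6.3, 4),
         (6.4, 3), (6.5, 1)]

n = sum(c for _, c in table)                # 94 measurements
mean = sum(x * c for x, c in table) / n     # arithmetic average
sigma = (sum(c * (x - mean) ** 2 for x, c in table) / (n - 1)) ** 0.5
u_mean = sigma / n ** 0.5                   # standard uncertainty of the average

print(round(mean, 1))  # 5.9
print(round(sigma, 2), round(u_mean, 3))
```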
Gauss distribution function
The probability density function for the result x (or its error Δx) according to Gauss:
Φ(x) = [1/(σ·√(2π))] · exp[ −(x − x₀)² / (2σ²) ]
x₀ is the most probable result and can be represented by the arithmetic average; σ is the standard deviation and σ² the variance.
Normal distribution
Within the interval x₀ − σ < x < x₀ + σ we find 68.2% (about 2/3) of all results;
for x₀ − 2σ < x < x₀ + 2σ: 95.4%;
for x₀ − 3σ < x < x₀ + 3σ: 99.7%.
(Plot of two Gauss curves Φ(x) with x₀ = 15: σ = 2 and σ = 5.) A bigger σ means a higher scatter of the results around the average, i.e. a smaller precision.
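The 68 / 95.4 / 99.7% coverages follow from integrating the Gauss density; with the error function this is a one-liner:

```python
import math

def coverage(m):
    """Probability that a result falls within x0 - m*sigma < x < x0 + m*sigma
    for a normal distribution."""
    return math.erf(m / math.sqrt(2))

for m in (1, 2, 3):
    print(m, f"{100 * coverage(m):.1f}%")
```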
TYPE B
A Type B evaluation of standard uncertainty is usually based on scientific judgement using all the relevant information available, which may include:
● previous measurement data,
● experience with, or general knowledge of, the behavior and properties of relevant materials and instruments,
● manufacturer's specifications,
● data provided in calibration and other reports,
● uncertainties assigned to reference data taken from handbooks.
Type A evaluations of uncertainty based on limited data are not necessarily more reliable than soundly based Type B evaluations.
Example: Type B uncertainty of a pendulum length measurement. Using a ruler, the following results were obtained:
L = 140 mm, u(L) = 1 mm (the elemental scale interval),
u_r(L) = u(L)/L = 1/140, i.e. a percentage uncertainty of 0.7%.
Most often Type B deals with the evaluation of uncertainty resulting from the finite accuracy of an instrument.
Uncertainty of a complex measurand – propagation of errors
For y = f(x), the uncertainty of x propagates to y through the slope of the function (the tangent dy/dx at the measured point):
u(y) = |dy/dx| · u(x)
(Sketch: the interval u(x) on the x-axis is mapped by the curve y = f(x) onto the interval u(y) on the y-axis.)
Total differential
For a complex measurand y = f(x₁, x₂, …, xₙ), under the assumption that Δx₁, Δx₂, …, Δxₙ are small compared with the measured x₁, x₂, …, xₙ, the maximum uncertainty of y can be calculated from differential calculus:
Δy = |∂y/∂x₁|·Δx₁ + |∂y/∂x₂|·Δx₂ + … + |∂y/∂xₙ|·Δxₙ   (3)
Law of propagation of uncertainties
The standard uncertainty of a complex measurand y = f(x₁, x₂, …, xₙ) can be calculated from the law of propagation of uncertainties as a geometric sum of partial differentials:
u_c(y) = √( [∂y/∂x₁ · u(x₁)]² + [∂y/∂x₂ · u(x₂)]² + … + [∂y/∂xₙ · u(xₙ)]² )
Relative combined uncertainty:
u_cr(y) = u_c(y) / y
Example
In a certain experiment one determines the gravitational acceleration g on Earth by measuring the period T and length L of a mathematical pendulum. The directly measured length is reported as 1.1325 ± 0.0014 m. The independently estimated relative uncertainty of the period measurement is 0.06%, i.e.,
u_r(T) = u(T)/T = 6·10⁻⁴
Calculate the relative uncertainty of g, assuming that the uncertainties of L and T are independent and result from random sources of errors.
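Applying the law of propagation to g = 4π²L/T² (so u_r(g)² = u_r(L)² + (2·u_r(T))², since g depends on the square of T) gives:

```python
import math

L, uL = 1.1325, 0.0014    # measured length and its uncertainty [m]
ur_T = 6e-4               # relative uncertainty of the period

ur_L = uL / L                                  # ≈ 1.24e-3
ur_g = math.sqrt(ur_L ** 2 + (2 * ur_T) ** 2)  # relative uncertainty of g

print(f"u_r(g) = {100 * ur_g:.2f}%")  # about 0.17%
```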
Rules applied to data plotting
Is this graph made according to the rules?
1. Mark the experimental points!!!
2. Measurement uncertainty is missing
3. Adjust the axis to the range of experimental data!!!
4. Properly describe the axes and
choose the scale in order to read the data easily.
What quantity is represented by this axis???
5. Do not connect the experimental points by polygonal chains!!! If the theoretical model is known, it is advised to make a fit to the experimental data.
(Plot: ρ [μΩ·cm] vs. T [K].)
(Plot: ρ [μΩ·cm] vs. T [K]; legend: experimental data, fit.)
6. Take care of the aesthetic aspect of your plot (legend, frame, etc.).
(Final plot: ρ [μΩ·cm] vs. T [K]; legend: experimental data, theoretical fit.)
Fig. 1. Resistivity ρ of a Bi sample as a function of temperature T.
Least Square Method – Linear Regression
A straight line f(x) = ax + b is fitted to the experimental points (xᵢ, yᵢ) by minimizing the sum of squared residuals:
S = Σᵢ₌₁ⁿ [yᵢ − (a·xᵢ + b)]² = min
(For the plotted example: a = 3.23, b = −2.08.)
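The minimizing a and b have a closed form (set ∂S/∂a = ∂S/∂b = 0); a minimal sketch:

```python
def linear_fit(xs, ys):
    """Least-squares slope a and intercept b for y = a*x + b."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

# Points lying exactly on y = 3x - 2 are recovered exactly:
a, b = linear_fit([1, 2, 3, 4], [1, 4, 7, 10])
print(a, b)  # 3.0 -2.0
```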
USEFUL HINTS
1. Results of laboratory measurements suffer from uncertainties, which the researcher is obliged to estimate according to certain rules.
2. First, one has to identify all possible sources of errors, keeping in mind that results with gross errors should not be taken into account. In a student laboratory, systematic errors usually mask random errors.
3. Multiple repetitions of a measurement do not make sense when the systematic error predominates. In this case one should perform up to 3-5 measurements under the same conditions in order to make sure that the results are reproducible.
4. When random events are the main source of errors, it is necessary to make sure that the distribution of results can be described by a Gauss function. If not, should one expect some other distribution function? To answer this, one has to repeat the measurements (e.g. 100 times) under the same conditions, calculate the average and variance, draw a histogram, etc.
5. As a measure of uncertainty, prefer the standard uncertainty; use the maximum uncertainty only rarely.
6. In the case of a complex measurand, one should apply the laws of error propagation. An effort should be made to estimate the contributions to the total error coming from the measurements of the simple measurands. To achieve this goal, one has to calculate relative uncertainties.
7. A graph is an important part of a lab report (not only in the student's laboratory). Graphs should be prepared according to certain rules; an unambiguous description is required.
8. If a theoretical model of the phenomenon under study is known, one should place a theoretical curve (continuous line) upon clearly distinguished experimental points (symbols of the right size should be chosen; experimental error bars should be included). Well-known fitting methods should be applied.
9. Whenever possible, we can linearize the data, plotting e.g. y vs. ln(x), log y vs. log x, or y vs. 1/x, etc. To data prepared in this way one can apply the method of linear regression.