INTRODUCTION TO DATA SCIENCE
WFAiS UJ, Applied Computer Science, first-cycle (undergraduate) studies
13/10/2020
Lecture based on:
M. Çetinkaya-Rundel, "Data Analysis and Statistical Inference", Duke University
Exploratory data analysis
How to collect, visualise and interpret the data.
Data: basics
Observations, variables, data matrices
Types of variables
Relationships between variables
Example: data matrix
Requests sent to Google to remove links from the search engine database.
Types of variables
Numerical variables
Categorical variables
Data matrix
Relationships between variables
Observational studies & experiments
Correlation & Causation
Case study
Possible explanations
Confounding variables
Correlation & Causation
Correlation does not imply causation
Sampling & sources of bias
Census vs sample
Sources of bias
Sampling methods
Census
A few sources of sampling bias
Sampling methods
Sampling methods: random sampling
Sampling methods: stratified sample
Sampling methods: cluster sample
Visualizing numerical data
Scatter plots for paired data
Other visualizations for describing distributions of numerical variables
Data matrix
Scatterplots
Evaluating their relationship
Histogram
Box plot
Intensity map
Measures of center
Data matrix
Measures of center
Skewness vs. measures of center
Measures of spread
Robust statistics
Transforming data
Exploring categorical variables
Relative frequencies