INTRODUCTION TO DATA SCIENCE

Academic year: 2021

WFAiS UJ, Applied Computer Science, first-cycle (Bachelor's) studies

13/10/2020

Lecture based on:

M. Cetinkaya-Rundel, "Data Analysis and Statistical Inference", Duke University

Exploratory data analysis

How to collect, visualise and interpret the data.

Data: basics

Observations, variables, data matrices

Types of variables

Relationships between variables

Example: data matrix

Requests sent to Google to remove links from the search engine database.
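To make the row/column convention concrete, here is a minimal Python sketch of a data matrix: each row is one observation (case) and each column is one variable. The column names and values are hypothetical toy data, not the Google removal-request data.

```python
# Hypothetical toy data matrix: each dict is one observation (row),
# each key is one variable (column). All values are invented.
rows = [
    {"id": 1, "group": "A", "age": 23, "score": 71.5},
    {"id": 2, "group": "B", "age": 31, "score": 64.0},
    {"id": 3, "group": "A", "age": 27, "score": 88.0},
]

variables = list(rows[0])                   # column names, in insertion order
n_obs, n_vars = len(rows), len(variables)   # dimensions of the matrix

# One variable (a whole column) across all observations.
scores = [row["score"] for row in rows]

# One observation (a whole row) across all variables.
second = [rows[1][v] for v in variables]

print(n_obs, n_vars)  # 3 4
print(scores)         # [71.5, 64.0, 88.0]
print(second)         # [2, 'B', 31, 64.0]
```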

Types of variables

Numerical variables

Categorical variables

Data matrix

Relationships between variables

Observational studies & experiments

Correlation & Causation

Case study

Possible explanations

Confounding variables

Correlation & Causation

Correlation does not imply causation

Sampling & sources of bias

Census vs sample

Sources of bias

Sampling methods

Census

A few sources of sampling bias

Sampling methods

Sampling methods: random sampling
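A minimal sketch of a simple random sample using Python's standard `random` module; the population of 100 numbered units and the seed are assumptions for illustration.

```python
import random

# Hypothetical population of 100 numbered units.
population = list(range(1, 101))

rng = random.Random(42)  # fixed seed so the draw is reproducible

# Simple random sample: every subset of size 10 is equally likely,
# and units are drawn without replacement.
srs = rng.sample(population, k=10)
print(sorted(srs))
```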

Sampling methods: stratified sample
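A sketch of stratified sampling, assuming a hypothetical population already divided into three strata: a simple random sample is drawn within each stratum, here with proportional (10%) allocation.

```python
import random

# Hypothetical population split into strata (e.g. by year of study).
strata = {
    "year1": [f"y1-{i}" for i in range(40)],
    "year2": [f"y2-{i}" for i in range(30)],
    "year3": [f"y3-{i}" for i in range(30)],
}

rng = random.Random(0)

# Stratified sample: a simple random sample *within each* stratum,
# sized at 10% of that stratum (proportional allocation).
stratified = {name: rng.sample(units, k=len(units) // 10)
              for name, units in strata.items()}

for name, chosen in stratified.items():
    print(name, chosen)
```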

Sampling methods: cluster
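A sketch of cluster sampling on a hypothetical population of 10 clusters of 5 units each: whole clusters are chosen at random and every unit inside a chosen cluster is kept. This contrasts with stratified sampling, which draws from within every group.

```python
import random

# Hypothetical population grouped into 10 naturally occurring clusters
# (e.g. classrooms), each holding 5 units.
clusters = {c: [f"c{c}-u{u}" for u in range(5)] for c in range(10)}

rng = random.Random(1)

# Cluster sample: randomly pick whole clusters, then keep *every* unit in them.
chosen = rng.sample(sorted(clusters), k=3)
cluster_sample = [unit for c in chosen for unit in clusters[c]]

print(chosen)
print(cluster_sample)
```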

Visualizing numerical data

Scatter plots for paired data

Other visualizations for describing distributions of numerical variables

Data matrix

Scatterplots

Evaluating their relationship
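A scatterplot is usually evaluated by the direction, form, and strength of the relationship. One common numerical summary is the Pearson correlation coefficient r, which measures the direction and strength of a *linear* relationship. Below is a from-scratch sketch on made-up inputs; the last case shows that r can be 0 even when y is an exact (quadratic) function of x, so the form should always be checked on the plot.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: direction and strength of a linear relationship."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # 1.0  perfect positive linear
print(pearson_r(x, [10, 8, 6, 4, 2]))  # -1.0 perfect negative linear
print(pearson_r(x, [4, 1, 0, 1, 4]))   # 0.0  y = (x - 3)**2, non-linear
```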

Histogram
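A histogram counts how many observations fall into each of a set of bins. This sketch uses made-up data and an assumed bin width of 10; it computes counts for equal-width bins and prints a text histogram. Values outside [start, start + n_bins * width) are ignored. Changing the width can noticeably change the apparent shape of the distribution.

```python
from collections import Counter

def histogram_counts(data, start, width, n_bins):
    """Count observations in equal-width bins [start + k*width, start + (k+1)*width)."""
    counts = Counter()
    for v in data:
        k = int((v - start) // width)
        if 0 <= k < n_bins:          # drop values outside the binned range
            counts[k] += 1
    return [counts[k] for k in range(n_bins)]

START, WIDTH, N_BINS = 0, 10, 5
data = [3, 7, 12, 15, 18, 22, 25, 27, 29, 41]   # made-up sample
counts = histogram_counts(data, START, WIDTH, N_BINS)

for k, c in enumerate(counts):
    lo = START + WIDTH * k
    print(f"[{lo:>2}, {lo + WIDTH:>2}) {'#' * c}")
```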

Box plot
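A box plot draws the quartiles, a whisker on each side, and individual outliers. A sketch of the underlying numbers, assuming the common 1.5 * IQR rule for outliers and the `inclusive` quartile method of `statistics.quantiles` (software packages differ slightly in how they compute quartiles); the data are made up.

```python
import statistics

def box_plot_stats(data):
    """Quartiles, IQR, whisker ends and outliers under the 1.5 * IQR rule."""
    xs = sorted(data)
    q1, med, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in xs if lo_fence <= v <= hi_fence]
    return {
        "q1": q1, "median": med, "q3": q3, "iqr": iqr,
        "whiskers": (inside[0], inside[-1]),   # most extreme non-outliers
        "outliers": [v for v in xs if v < lo_fence or v > hi_fence],
    }

bp = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(bp)
```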

Intensity map

Measures of center
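The three usual measures of center, computed with Python's standard `statistics` module on a made-up sample:

```python
import statistics

data = [1, 2, 2, 3, 4, 7, 9]   # made-up sample

print(statistics.mean(data))    # 4  (sum / n = 28 / 7)
print(statistics.median(data))  # 3  (middle value of the sorted data)
print(statistics.mode(data))    # 2  (most frequent value)
```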

Data matrix

Measures of center

Skewness vs. measures of center
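A quick check, on made-up samples, of the usual rule of thumb: in a right-skewed distribution the mean is pulled above the median, in a left-skewed one below it, and in a symmetric one the two coincide.

```python
import statistics

right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]    # long right tail (made-up)
symmetric = [1, 2, 3, 4, 5]
left_skewed = [-20, 4, 5, 5, 5, 6, 6, 7]    # long left tail (made-up)

for name, xs in [("right", right_skewed), ("symmetric", symmetric),
                 ("left", left_skewed)]:
    print(name, "mean =", statistics.mean(xs), "median =", statistics.median(xs))
```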

Measures of spread
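Common measures of spread on a made-up sample, using the standard `statistics` module. `statistics.variance` and `statistics.stdev` are the sample versions (divisor n - 1); `pvariance` and `pstdev` divide by n instead. The IQR uses the `inclusive` quartile method, one of several conventions.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up sample

data_range = max(data) - min(data)     # largest minus smallest value
s2 = statistics.variance(data)         # sample variance, divides by n - 1
s = statistics.stdev(data)             # sample standard deviation
q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                          # spread of the middle 50%

print(data_range, round(s2, 3), round(s, 3), iqr)
```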

Robust statistics
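A sketch of why the median and IQR are called robust while the mean and standard deviation are not: replacing one value with an extreme outlier (made-up data) moves the mean and SD a lot but leaves the median and IQR unchanged.

```python
import statistics

def summary(xs):
    """Mean/SD (sensitive to outliers) next to median/IQR (robust)."""
    q1, med, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return {"mean": statistics.mean(xs), "sd": statistics.stdev(xs),
            "median": med, "iqr": q3 - q1}

clean = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 500]   # the 5 replaced by one extreme value

print(summary(clean))         # mean 3, median 3.0, IQR 2.0
print(summary(with_outlier))  # mean jumps to 102; median and IQR unchanged
```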

Transforming data
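A sketch of a log transformation, which is often applied to strongly right-skewed data; the values are made up. On the log10 scale the points become evenly spaced and the mean and median coincide.

```python
import math
import statistics

# Made-up, strongly right-skewed values spanning several orders of magnitude.
values = [1, 10, 100, 1000, 10000]
logged = [math.log10(v) for v in values]

print(logged)  # evenly spaced: 0, 1, 2, 3, 4
print(statistics.mean(values), statistics.median(values))  # mean far above median
print(statistics.mean(logged), statistics.median(logged))  # mean and median coincide
```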

Exploring categorical variables

Relative frequencies
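A frequency table and relative frequencies (each count divided by the total number of observations) for a made-up categorical variable, using `collections.Counter`:

```python
from collections import Counter

# Made-up categorical observations.
answers = ["yes", "no", "yes", "yes", "unsure", "no", "yes", "yes"]

counts = Counter(answers)                          # frequency table
n = len(answers)
rel_freq = {k: c / n for k, c in counts.items()}   # relative frequencies sum to 1

for category, c in counts.most_common():
    print(f"{category:<7} {c:>2}  {rel_freq[category]:.3f}")
```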
