INTRODUCTION TO DATA SCIENCE

(1)

WFAiS UJ, Applied Computer Science, first-cycle (bachelor's) studies

10/11, 17/11, 24/11/2020

This lecture is based on a course by E. Fox and C. Guestrin, University of Washington.

(2)

What is a classification?

(3)

Overview of the content

(4)

Linear classifier

(5)

An intelligent restaurant review system

(6)

Reviews

(7)

Classifying sentiment of review

(8)

Classifier

(9)

A (linear) classifier

(10)

Scoring a sentence

Score(x_i) = 1.2 + 1.7 - 2.1 = 0.8 > 0  =>  y = +1 (positive review)
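
The scoring rule on this slide can be sketched in a few lines of Python. The values 1.2, 1.7 and -2.1 come from the slide; the words they are attached to (`great`, `awesome`, `terrible`) and the example review are assumptions made purely for illustration.

```python
# Hypothetical word coefficients: the numbers are from the slide,
# the word-to-number assignment is an assumption of this sketch.
coefficients = {"great": 1.2, "awesome": 1.7, "terrible": -2.1}

def score(sentence, coefficients):
    """Sum the coefficients of all known words (unknown words count as 0)."""
    words = sentence.lower().replace(",", " ").replace(".", " ").split()
    return sum(coefficients.get(word, 0.0) for word in words)

def predict(sentence, coefficients):
    """Return +1 (positive review) if the score is above 0, otherwise -1."""
    return +1 if score(sentence, coefficients) > 0 else -1

review = "The sushi was great, the food was awesome, but the service was terrible."
print(round(score(review, coefficients), 1))  # 1.2 + 1.7 - 2.1
print(predict(review, coefficients))
```

The sign of the score is the whole classifier here; its magnitude is only later turned into a probability.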

(11)

Simple linear classifier

(12)

Training a classifier = Learning the coefficients

We will discuss later how to learn a classifier from data.

(13)

Decision boundary example

(14)

Decision boundary

(15)

Flow chart:

(16)

Coefficients of classifier

(17)

General notation

(18)

Simple hyperplane

(19)

D-dimensional hyperplane

(20)

Flow chart:

(21)

Linear classifier

→ Class probability

(22)

How confident is your prediction?

(23)

Basics of probabilities

(24)

Interpreting probabilities as degrees of belief

(25)

Conditional probability

(26)

Interpreting conditional probabilities

(27)

How confident is your prediction?

(28)

Learn conditional probabilities from data

(29)

Predicting class probabilities

(30)

Flow chart:

(31)

Thus far we focused on decision boundaries

How to relate

(32)

Interpreting Score(x_i)

(33)

Why not just use regression to build classifier?

(34)

Link function

(35)

Flow chart:

(36)

Logistic regression classifier:

→ linear score with logistic link function

(37)

Simplest link function: sign(z)

(38)

Logistic function (sigmoid, inverse logit)

Score:            -∞     -2     0      +2     +∞
sigmoid(Score):   0.0    0.12   0.5    0.88   1.0

(39)

Logistic regression model

(40)

Understanding the logistic regression model

Score(x_i):            0      -2     2      4
sigmoid(Score(x_i)):   0.5    0.12   0.88   0.98
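
The values on this slide can be reproduced directly from the logistic function. The sketch below is a minimal illustration; the function name `sigmoid` is our own choice.

```python
import math

def sigmoid(score):
    """Logistic function: maps any real score to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-score))

# Scores taken from the slide.
for s in (0, -2, 2, 4):
    print(s, round(sigmoid(s), 2))
```

A score of 0 gives probability 0.5, and the output saturates towards 0 or 1 as the score moves away from the decision boundary.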

(41)

Logistic regression

Score(x_i) < 0

Score(x_i) > 0

(42)

Effect of coefficients

(43)

Flow chart:

(44)

Learning logistic regression model

(45)

Categorical inputs

(46)

Encoding categories as numeric features
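
One common way to encode a categorical input numerically is one-hot encoding: one 0/1 feature per category. The sketch below assumes that encoding; the category list is hypothetical.

```python
def one_hot(value, categories):
    """Return a list with 1 at the position of `value` and 0 everywhere else."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == category else 0 for category in categories]

# Hypothetical categorical feature with three possible values.
countries = ["Argentina", "Brazil", "Poland"]
print(one_hot("Brazil", countries))
```

Each resulting 0/1 feature gets its own coefficient in the linear score.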

(47)

Multiclass classification

(48)

Multiclass classification

(49)

1 versus all

(50)

1 versus all
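
A minimal sketch of 1-versus-all prediction, assuming one binary logistic classifier per class has already been trained. The class names and the hand-picked coefficients are hypothetical.

```python
import math

def sigmoid(score):
    return 1.0 / (1.0 + math.exp(-score))

def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def predict_one_vs_all(x, models):
    """`models` maps each class label to the coefficients of a binary
    classifier trained to separate that class from all the others."""
    probabilities = {label: sigmoid(dot(w, x)) for label, w in models.items()}
    best = max(probabilities, key=probabilities.get)
    return best, probabilities

# Hypothetical coefficients; inputs are [1 (constant), x1, x2].
models = {
    "triangle": [0.0, 2.0, -1.0],
    "circle": [0.0, -1.0, 2.0],
    "square": [-1.0, -1.0, -1.0],
}
label, probs = predict_one_vs_all([1.0, 0.5, 3.0], models)
print(label, {k: round(v, 3) for k, v in probs.items()})
```

The predicted class is simply the one whose own classifier reports the highest probability.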

(51)

(52)

Summary: Logistic regression classifier

(53)

What you can do now…

(54)

Linear classifier

→ Parameter learning

(55)

Learn a probabilistic classification model

(56)

A (linear) classifier

(57)

Logistic regression

(58)

Flow chart:

(59)

Learning problem

(60)

Finding best coefficients

(61)

Quality metric: probability of data

(62)

Maximizing likelihood (probability of data)

(63)

Maximum likelihood estimation (MLE)

Learn logistic regression model with MLE
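
As a rough illustration of the quality metric, the sketch below evaluates the log of the data likelihood for two candidate coefficient vectors. It assumes labels in {+1, -1} and P(y = +1 | x, w) = sigmoid(w · x); the toy data and the coefficients are hypothetical.

```python
import math

def sigmoid(score):
    return 1.0 / (1.0 + math.exp(-score))

def log_likelihood(w, X, y):
    """Sum over all data points of log P(y_i | x_i, w)."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        p_pos = sigmoid(sum(wj * xj for wj, xj in zip(w, x_i)))
        total += math.log(p_pos if y_i == +1 else 1.0 - p_pos)
    return total

# Hypothetical data: each row is [1 (constant), #positive words, #negative words].
X = [[1, 2, 0], [1, 0, 3], [1, 1, 1], [1, 3, 1]]
y = [+1, -1, -1, +1]

ll_zero = log_likelihood([0.0, 0.0, 0.0], X, y)  # every probability is 0.5
ll_fit = log_likelihood([0.0, 1.0, -1.5], X, y)  # agrees better with the labels
print(ll_zero, ll_fit)
```

Maximum likelihood estimation picks the coefficients for which this quantity is largest.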

(64)

Flow chart:

(65)

Find "best" classifier

(66)

Find best classifier

(67)

Maximizing likelihood

(68)

Gradient ascent

Finding the max via hill climbing

(69)

Gradient ascent

Convergence criteria

(70)

Gradient ascent

(71)

Gradient ascent

(72)

Gradient ascent

(73)

The log trick, often used in ML…
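
The trick referred to here is presumably the usual one of maximizing the logarithm of the likelihood rather than the likelihood itself. Written out, with the symbols N, x_i, y_i and w chosen here for illustration:

```latex
\ell(\mathbf{w}) = \prod_{i=1}^{N} P(y_i \mid \mathbf{x}_i, \mathbf{w})
\quad\Longrightarrow\quad
\ln \ell(\mathbf{w}) = \sum_{i=1}^{N} \ln P(y_i \mid \mathbf{x}_i, \mathbf{w}),
\qquad
\arg\max_{\mathbf{w}} \ell(\mathbf{w}) = \arg\max_{\mathbf{w}} \ln \ell(\mathbf{w})
```

Because the logarithm is strictly increasing, the maximizing coefficients are unchanged, while the sum is easier to differentiate and less prone to numerical underflow than a product of many small probabilities.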

(74)

Derivative for logistic regression

See the slides at the end of this lecture if you are interested in how it is derived.

(75)

Derivative for logistic regression

(76)

Derivative for logistic regression

(77)

Gradient ascent for logistic regression
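
A minimal sketch of gradient ascent for logistic regression, using the standard derivative of the log-likelihood, sum_i x_ij * (1[y_i = +1] - P(y = +1 | x_i, w)). The step size, tolerance, iteration cap and toy data are assumptions of this example, not values from the lecture.

```python
import math

def sigmoid(score):
    return 1.0 / (1.0 + math.exp(-score))

def gradient(w, X, y):
    """Partial derivatives of the log-likelihood with respect to each w_j."""
    grad = [0.0] * len(w)
    for x_i, y_i in zip(X, y):
        p_pos = sigmoid(sum(wj * xj for wj, xj in zip(w, x_i)))
        error = (1.0 if y_i == +1 else 0.0) - p_pos
        for j, x_ij in enumerate(x_i):
            grad[j] += x_ij * error
    return grad

def gradient_ascent(X, y, step_size=0.1, tolerance=1e-3, max_iter=10000):
    w = [0.0] * len(X[0])
    for _ in range(max_iter):
        grad = gradient(w, X, y)
        # Move uphill: add (not subtract) the scaled gradient.
        w = [wj + step_size * gj for wj, gj in zip(w, grad)]
        # Convergence criterion: stop once the gradient norm is small.
        if math.sqrt(sum(g * g for g in grad)) < tolerance:
            break
    return w

# Hypothetical, non-separable data: [1 (constant), #positive words, #negative words].
X = [[1, 2, 0], [1, 0, 3], [1, 1, 1], [1, 3, 1], [1, 1, 2], [1, 2, 1]]
y = [+1, -1, -1, +1, +1, -1]
w_hat = gradient_ascent(X, y)
print([round(wj, 3) for wj in w_hat])
```

The toy data is deliberately not linearly separable, so the maximum-likelihood coefficients stay finite and the loop can stop on the gradient-norm criterion.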

(78)

Choosing the step size

(79)

Choosing the step size

(80)

Choosing the step size

(81)

Choosing the step size

(82)

Choosing the step size

(83)

Flow chart: final look at it

(84)

What you can do now

(85)

Linear classifier

→ Overfitting & regularization

(86)

Training a classifier = Learning the coefficients

(87)

Classification error & accuracy
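
Both quantities are simple counts over a labelled data set. The sketch below uses hypothetical labels and predictions.

```python
def classification_error(y_true, y_pred):
    """Fraction of predictions that disagree with the true labels."""
    mistakes = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return mistakes / len(y_true)

def accuracy(y_true, y_pred):
    """Fraction of correct predictions, i.e. 1 - error."""
    return 1.0 - classification_error(y_true, y_pred)

y_true = [+1, -1, -1, +1, +1]
y_pred = [+1, -1, +1, +1, -1]  # two mistakes out of five
print(classification_error(y_true, y_pred), accuracy(y_true, y_pred))
```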

(88)

Overfitting in classification

Decision boundary example

(89)

Overfitting in classification

Learned decision boundary

(90)

Overfitting in classification

Quadratic features (in 2d)

(91)

Overfitting in classification

Degree 6 features (in 2d)

(92)

Overfitting in classification

Degree 20 features (in 2d)

(93)

Overfitting in classification

(94)

Overfitting in logistic regression

Remember this probability interpretation

(95)

Effect of coefficients on logistic regression model

As the coefficients grow, the model becomes overconfident in its predictions.
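
The effect can be seen by multiplying all coefficients by a common factor: the decision boundary stays where it is, but the predicted probability moves towards 0 or 1. The coefficients and the input below are hypothetical.

```python
import math

def sigmoid(score):
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical coefficients and input: [constant, feature 1, feature 2].
w = [0.0, 1.0, -1.0]
x = [1.0, 2.0, 1.0]

def confidence(scale):
    """P(y = +1 | x) when every coefficient is multiplied by `scale`."""
    return sigmoid(sum(scale * wj * xj for wj, xj in zip(w, x)))

probs = [confidence(scale) for scale in (1, 3, 10)]
print([round(p, 3) for p in probs])
```

The sign of the score, and therefore the predicted class, is identical in all three cases; only the stated confidence changes.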

(96)

Learned probabilities

(97)

Quadratic features: learned probabilities

(98)

Overfitting → overconfident predictions

(99)

Quality metric → penalizing large coefficients

(100)

Desired total cost format

(101)

Maximum likelihood estimation (MLE)

→ Measure of fit = Data likelihood

!!!

(102)

Measure of magnitude of logistic regression coefficients

(103)

Consider specific total cost

(104)

Consider resulting objectives

(105)

Consider resulting objectives

(106)

Bias-variance tradeoff

(107)

Visualizing effect of regularization

(108)

Visualizing effect of regularization

(109)

Effect of regularization

(110)

Visualizing effect of regularization

(111)

Flow chart:

Let's now discuss finding the best L2-regularized linear classifier with gradient ascent.

(112)

Gradient ascent

(113)

Gradient of L2 regularized log-likelihood

(114)

Gradient of L2 regularized log-likelihood

(115)

Gradient of L2 regularized log-likelihood

(116)

Gradient ascent with L2 regularization
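
A minimal sketch of the regularized update, assuming the objective is the log-likelihood minus `l2_penalty` times the sum of squared coefficients, so each penalized coefficient contributes an extra term -2 * l2_penalty * w_j to the gradient. Leaving the constant term unpenalized, and the step size, iteration count and toy data, are assumptions of this example.

```python
import math

def sigmoid(score):
    return 1.0 / (1.0 + math.exp(-score))

def l2_gradient(w, X, y, l2_penalty):
    grad = [0.0] * len(w)
    for x_i, y_i in zip(X, y):
        p_pos = sigmoid(sum(wj * xj for wj, xj in zip(w, x_i)))
        error = (1.0 if y_i == +1 else 0.0) - p_pos
        for j, x_ij in enumerate(x_i):
            grad[j] += x_ij * error
    # Derivative of the penalty term; w[0] (the constant) is left
    # unpenalized, which is an assumption of this sketch.
    for j in range(1, len(w)):
        grad[j] -= 2.0 * l2_penalty * w[j]
    return grad

def fit_l2(X, y, l2_penalty, step_size=0.05, n_iter=5000):
    w = [0.0] * len(X[0])
    for _ in range(n_iter):
        grad = l2_gradient(w, X, y, l2_penalty)
        w = [wj + step_size * gj for wj, gj in zip(w, grad)]
    return w

def penalized_norm(w):
    return math.sqrt(sum(wj * wj for wj in w[1:]))

# Hypothetical data: [1 (constant), #positive words, #negative words].
X = [[1, 2, 0], [1, 0, 3], [1, 1, 1], [1, 3, 1], [1, 1, 2], [1, 2, 1]]
y = [+1, -1, -1, +1, +1, -1]
norms = [penalized_norm(fit_l2(X, y, penalty)) for penalty in (0.0, 1.0, 10.0)]
print([round(n, 3) for n in norms])
```

Larger penalties pull the learned coefficients towards zero, which is what keeps the predicted probabilities from becoming overconfident.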

(117)

Logistic regression with L1 regularization

(118)

Sparse logistic regression

(119)

L1 regularized logistic regression

(120)

L1 regularized logistic regression

(121)

What you can do now…

(122)

Decision trees

(123)

What makes a loan risky?

(124)

Credit history explained

(125)

Income

(126)

Loan terms

(127)

Personal information

(128)

Intelligent application

(129)

Classifier: review type

(130)

Classifier: decision trees

(131)

Scoring a loan application

(132)

Scoring a loan application

(133)

Scoring a loan application

(134)

Decision tree model

(135)

Flow chart:

(136)

Learn decision tree from data

(137)

Learn decision tree from data

(138)

Quality metric: Classification error

(139)

Find the tree with lowest classification error

(140)

How do we find the best tree?

(141)

Simple (greedy) algorithm finds good tree

(142)

Greedy algorithm

(143)

Greedy algorithm

(144)

Greedy algorithm

(145)

Greedy algorithm

(146)

Greedy algorithm

(147)

Greedy decision tree learning

(148)

Feature split learning

(149)

Feature split learning

Compact notation

(150)

Decision stump: single level tree

(151)

Making predictions with a decision stump
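
A decision stump can be sketched as a lookup table from feature value to majority class. The loan records, the feature name `credit` and the helper names below are hypothetical.

```python
from collections import Counter

def learn_stump(rows, labels, feature):
    """Split on one categorical feature; each branch predicts its majority class."""
    branches = {}
    for row, label in zip(rows, labels):
        branches.setdefault(row[feature], []).append(label)
    return {value: Counter(ls).most_common(1)[0][0] for value, ls in branches.items()}

def predict_stump(stump, feature, row, default):
    """Follow the branch matching the row; use `default` for unseen values."""
    return stump.get(row[feature], default)

# Hypothetical loan applications.
rows = [{"credit": c} for c in
        ["excellent", "excellent", "fair", "fair", "fair", "poor", "poor"]]
labels = ["safe", "safe", "safe", "risky", "safe", "risky", "risky"]

stump = learn_stump(rows, labels, "credit")
print(stump)
print(predict_stump(stump, "credit", {"credit": "poor"}, default="safe"))
```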

(152)

How do we select the best feature to split on?

(153)

How do we measure effectiveness of a split?

(154)

Calculating classification error

(155)

Classification error

(156)

Classification error

(157)

Choice 1 vs Choice 2

(158)

Feature split selection algorithm
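
A minimal sketch of split selection by classification error: for each candidate feature, split the data, predict the majority class in every branch, count the mistakes, and keep the feature with the lowest error. The data and the feature names `credit` and `term` are hypothetical.

```python
from collections import Counter

def split_error(rows, labels, feature):
    """Classification error of the stump that splits on `feature`."""
    branches = {}
    for row, label in zip(rows, labels):
        branches.setdefault(row[feature], []).append(label)
    # In each branch, every point outside the majority class is a mistake.
    mistakes = sum(len(ls) - max(Counter(ls).values()) for ls in branches.values())
    return mistakes / len(labels)

def best_feature(rows, labels, features):
    return min(features, key=lambda f: split_error(rows, labels, f))

# Hypothetical loan applications.
rows = [
    {"credit": "excellent", "term": "3yr"}, {"credit": "excellent", "term": "5yr"},
    {"credit": "fair", "term": "3yr"}, {"credit": "fair", "term": "5yr"},
    {"credit": "fair", "term": "3yr"}, {"credit": "poor", "term": "5yr"},
    {"credit": "poor", "term": "3yr"}, {"credit": "excellent", "term": "5yr"},
]
labels = ["safe", "safe", "safe", "risky", "safe", "risky", "risky", "safe"]

for f in ("credit", "term"):
    print(f, split_error(rows, labels, f))
print(best_feature(rows, labels, ["credit", "term"]))
```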

(159)

Greedy decision tree learning algorithm

(160)

Recursive stump learning

(161)

Recursive stump learning

(162)

Simple greedy decision tree learning

Recursive algorithm

(163)

Stopping condition 1

(164)

Stopping condition 2

(165)

Greedy decision tree algorithm
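
A compact sketch of recursive greedy tree learning on categorical features. The slide text does not spell out the two stopping conditions, so the ones used here (all labels in a node agree; no features left to split on) are assumptions, as are the data and all names.

```python
from collections import Counter

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def split_error(rows, labels, feature):
    branches = {}
    for row, label in zip(rows, labels):
        branches.setdefault(row[feature], []).append(label)
    return sum(len(ls) - max(Counter(ls).values()) for ls in branches.values()) / len(labels)

def build_tree(rows, labels, features):
    # Assumed stopping condition 1: every label in this node is the same.
    # Assumed stopping condition 2: there are no features left to split on.
    if len(set(labels)) == 1 or not features:
        return {"leaf": True, "prediction": majority(labels)}
    feature = min(features, key=lambda f: split_error(rows, labels, f))
    remaining = [f for f in features if f != feature]
    groups = {}
    for row, label in zip(rows, labels):
        group_rows, group_labels = groups.setdefault(row[feature], ([], []))
        group_rows.append(row)
        group_labels.append(label)
    return {
        "leaf": False,
        "feature": feature,
        "default": majority(labels),  # used for feature values unseen in training
        "children": {v: build_tree(r, l, remaining) for v, (r, l) in groups.items()},
    }

def predict(tree, row):
    while not tree["leaf"]:
        child = tree["children"].get(row[tree["feature"]])
        if child is None:
            return tree["default"]
        tree = child
    return tree["prediction"]

# Hypothetical loan applications.
rows = [
    {"credit": "excellent", "term": "3yr"}, {"credit": "excellent", "term": "5yr"},
    {"credit": "fair", "term": "3yr"}, {"credit": "fair", "term": "5yr"},
    {"credit": "fair", "term": "3yr"}, {"credit": "poor", "term": "5yr"},
    {"credit": "poor", "term": "3yr"}, {"credit": "excellent", "term": "5yr"},
]
labels = ["safe", "safe", "safe", "risky", "safe", "risky", "risky", "safe"]

tree = build_tree(rows, labels, ["credit", "term"])
print(tree["feature"])
print(predict(tree, {"credit": "fair", "term": "5yr"}))
```

Each recursive call is just stump learning applied to the subset of data that reached that node.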

(166)

Predictions with decision trees

(167)

Predictions with decision trees

(168)

Predictions with decision tree

(169)

Multiclass prediction

(170)

Multiclass decision stump

(171)

Predicting probabilities with decision trees

(172)

How to use real-valued inputs

(173)

How to use real-valued inputs

(174)

Visualizing the threshold split

(175)

Visualizing the threshold split

(176)

Visualizing the threshold split

(177)

Visualizing the threshold split

(178)

Finding the best threshold split

(179)

Finding the best threshold split
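
One common way to find a threshold for a real-valued feature is to sort the observed values, try the midpoint between each pair of consecutive distinct values, and keep the one with the lowest classification error. The sketch below assumes that procedure; the incomes and labels are hypothetical.

```python
from collections import Counter

def mistakes(labels):
    """Errors made when predicting the majority class for `labels`."""
    return len(labels) - max(Counter(labels).values()) if labels else 0

def best_threshold(values, labels):
    """Return (threshold, classification_error) of the best split `value < threshold`."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for (v1, _), (v2, _) in zip(pairs, pairs[1:]):
        if v1 == v2:
            continue  # no threshold fits between identical values
        threshold = (v1 + v2) / 2.0
        left = [label for value, label in pairs if value < threshold]
        right = [label for value, label in pairs if value >= threshold]
        error = (mistakes(left) + mistakes(right)) / len(pairs)
        if error < best[1]:  # ties keep the smaller threshold
            best = (threshold, error)
    return best

# Hypothetical incomes (in thousands) and loan labels.
incomes = [30, 35, 40, 55, 60, 80, 105]
labels = ["risky", "risky", "safe", "risky", "safe", "safe", "safe"]
threshold, error = best_threshold(incomes, labels)
print(threshold, error)
```

Only midpoints between observed values need to be checked, because any threshold between the same two neighbours splits the data identically.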

(180)

Decision trees vs logistic regression

(181)

Decision trees vs logistic regression

(182)

Decision trees vs logistic regression

(183)

Decision tree vs logistic regression

(184)

Decision tree vs logistic regression

(185)

Decision tree vs logistic regression

(186)

What you can do now

(187)

Overfitting in decision trees

(188)

Overfitting in decision tree

(189)

Overfitting in decision tree

(190)

Overfitting in decision tree

(191)

Overfitting in decision tree

(192)

Overfitting in decision tree

(193)

Overfitting in decision tree

(194)

Simplest tree is better

(195)

Simplest tree is better

(196)

Simplest tree is better

(197)

Simplest tree is better

(198)

Early stopping for learning decision trees

(199)

Early stopping condition 1

(200)

Early stopping condition 2
