DATA SCIENCE WITH MACHINE LEARNING:
CLASSIFICATION
WFAiS UJ, Informatyka Stosowana I stopień studiów
1
12/01/2021
This lecture is based on the course by E. Fox and C. Guestrin, University of Washington
What is classification?
Overview of the content
Linear classifier
An intelligent restaurant review system
Classifying sentiment of review
A (linear) classifier: scoring a sentence
Score(xi) = 1.2 + 1.7 − 2.1 = 0.8 > 0  ⇒  ŷ = +1 (positive review)
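The scoring step above can be sketched in code. Only the coefficient values (1.2 + 1.7 − 2.1 = 0.8) come from the slide; the words they attach to are hypothetical stand-ins.

```python
# Hypothetical per-word coefficients; the numeric values are from the slide,
# the words themselves are illustrative.
coefficients = {"great": 1.2, "awesome": 1.7, "terrible": -2.1}

def score(sentence, coefficients):
    # Score(x) = sum of the coefficients of the words appearing in the sentence
    return sum(coefficients.get(word, 0.0) for word in sentence.lower().split())

s = score("Great food, awesome service, terrible parking", coefficients)
y_hat = +1 if s > 0 else -1   # y_hat = sign(Score(x)); here 0.8 > 0 -> positive review
```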
Training a classifier = Learning the coefficients
We will discuss later how to learn a classifier from data
Decision boundary example
Decision boundary
Flow chart:
Coefficients of classifier
General notation
Simple hyperplane
D-dimensional hyperplane
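In the general notation, Score(x) = w0·h0(x) + … + wD·hD(x) with a constant feature h0(x) = 1 for the intercept, and the classifier predicts ŷ = sign(Score(x)). A minimal sketch; the weights and feature values below are illustrative, not from the lecture:

```python
import numpy as np

# Illustrative weight vector w = (w0, w1, w2); w0 multiplies the constant
# feature h0(x) = 1, so it acts as the intercept of the hyperplane.
w = np.array([0.5, 1.0, -2.0])

def predict(h, w):
    # y_hat = sign(Score(x)) = sign(w . h(x))
    return 1 if w @ h > 0 else -1

h = np.array([1.0, 3.0, 1.2])   # h0(x) = 1, h1(x) = 3.0, h2(x) = 1.2
predict(h, w)                   # 0.5 + 3.0 - 2.4 = 1.1 > 0 -> +1
```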
Flow chart:
Linear classifier
Class probability
How confident is your prediction?
Conditional probability
Interpreting conditional probabilities
How confident is your prediction?
Learn conditional probabilities from data
Predicting class probabilities
Flow chart:
Why not just use regression to build classifier?
Link function
Flow chart:
Logistic regression classifier:
linear score with logistic link
function
Simplest link function: sign(z)
Logistic function (sigmoid)
(Plot: logistic function rising from 0 to 1, crossing 0.5 at score 0.)
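The logistic (sigmoid) link can be sketched directly; it squeezes any score into (0, 1), so the class probability is modelled as P(y = +1 | x, w) = sigmoid(Score(x)).

```python
import numpy as np

def sigmoid(score):
    # maps a score in (-inf, +inf) to a probability in (0, 1);
    # sigmoid(0) = 0.5, large positive scores -> ~1, large negative -> ~0
    return 1.0 / (1.0 + np.exp(-score))

sigmoid(0.0)    # 0.5
sigmoid(2.0)    # ~0.88
sigmoid(-2.0)   # ~0.12
```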
Logistic regression model
Effect of coefficients
Flow chart:
Learning logistic regression model
Categorical inputs
Encoding categories as numeric features
Multiclass classification
1 versus all
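1-versus-all reduces multiclass classification to one binary problem per class: train a classifier that scores "this class vs all the rest", then predict the class whose classifier is most confident. A sketch; the class names and probabilities below are stand-ins for trained models:

```python
# Each entry maps a class label to an (assumed pre-trained) estimator of
# P(y = class | x); the lambdas below are stand-ins for real models.
def one_vs_all_predict(x, estimators):
    # predict the class whose binary classifier is most confident
    return max(estimators, key=lambda c: estimators[c](x))

estimators = {
    "triangle": lambda x: 0.2,
    "heart":    lambda x: 0.7,
    "donut":    lambda x: 0.1,
}
one_vs_all_predict(None, estimators)   # "heart"
```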
Summary: Logistic regression classifier
Linear classifier
Parameters learning
Maximizing likelihood (probability of data)
Maximum likelihood estimation (MLE)
Learn logistic regression model with MLE
Flow chart:
Find the "best" classifier
Maximizing likelihood
Gradient ascent
Convergence criteria
Gradient ascent
The log trick, often used in ML…
Derivative for logistic regression
See the slides at the end of this lecture if you are interested in how it is derived.
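Putting the pieces together, a minimal sketch of gradient ascent on the log-likelihood, using the standard derivative ∂ℓ/∂wj = Σi hj(xi)(1[yi = +1] − P(y = +1 | xi, w)). The toy data, step size, and iteration count are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_ascent(H, y, eta=0.1, n_iter=2000):
    """H: N x D matrix of features h(x_i); y: labels in {+1, -1}."""
    w = np.zeros(H.shape[1])
    indicator = (y == 1).astype(float)   # 1[y_i = +1]
    for _ in range(n_iter):
        # gradient of the log-likelihood: H^T (1[y=+1] - P(y=+1|x,w))
        w += eta * (H.T @ (indicator - sigmoid(H @ w)))
    return w

# toy 1-d data with an intercept feature h0(x) = 1
H = np.array([[1., 0.], [1., 1.], [1., 2.], [1., 3.]])
y = np.array([-1, -1, +1, +1])
w = gradient_ascent(H, y)
# predictions sign(H @ w) separate the two classes
```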
Derivative for logistic regression
Choosing the step size
Flow chart: final look at it
Linear classifier
Overfitting & regularization
Training a classifier = Learning the coefficients
Classification error & accuracy
Overfitting in classification
Decision boundary example
Overfitting in classification
Learned decision boundary
Overfitting in classification
Quadratic features (in 2d)
Overfitting in classification
Degree 6 features (in 2d)
Overfitting in classification
Degree 20 features (in 2d)
Overfitting in classification
Overfitting in logistic regression
Remember this probability interpretation
Effect of coefficients on logistic regression model
With increasing coefficients, the model becomes overconfident in its predictions
Learned probabilities
Quadratic features: learned probabilities
Overfitting → overconfident predictions
Quality metric → penalizing large coefficients
Desired total cost format
Measure of magnitude of logistic regression coefficients
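The penalized quality metric has the form total quality = log-likelihood(w) − λ·(measure of magnitude of w), with ‖w‖₂² for L2 and ‖w‖₁ for L1. A sketch; the coefficient values and log-likelihood below are illustrative:

```python
import numpy as np

def total_quality(log_likelihood, w, lam, penalty="l2"):
    # total quality = data fit (log-likelihood) minus lambda * coefficient magnitude
    if penalty == "l2":
        return log_likelihood - lam * np.sum(w ** 2)   # ||w||_2^2
    return log_likelihood - lam * np.sum(np.abs(w))    # ||w||_1

w = np.array([1.0, -2.0, 0.5])
total_quality(-10.0, w, lam=0.1)                 # -10 - 0.1 * 5.25 = -10.525
total_quality(-10.0, w, lam=0.1, penalty="l1")   # -10 - 0.1 * 3.5  = -10.35
```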
Visualizing effect of regularisation
Effect of regularisation
Visualizing effect of regularisation
Sparse logistic regression
L1 regularised logistic regression
Decision trees
What makes a loan risky?
Classifier: decision trees
Quality metric: Classification error
Find the tree with lowest classification error
How do we find the best tree?
Simple (greedy) algorithm finds good tree
Greedy decision tree learning
How do we select the best feature to split on?
Classification error
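To select the feature to split on, compare candidate splits by classification error, with each branch predicting its majority class. A sketch on illustrative loan data with labels +1 (safe) / −1 (risky):

```python
def classification_error(groups):
    # groups: one list of labels per branch; each branch predicts its majority
    # class, so its mistakes are the minority-class points in that branch
    mistakes = sum(len(g) - max(g.count(+1), g.count(-1)) for g in groups)
    total = sum(len(g) for g in groups)
    return mistakes / total

# illustrative split on a "credit" feature: excellent vs poor
split_credit = [[+1, +1, +1, -1], [-1, -1, +1]]
classification_error(split_credit)   # (1 + 1) / 7 ~= 0.29
```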
Choice 1 vs Choice 2
Greedy decision tree learning algorithm
Greedy decision tree algorithm
Decision trees vs logistic regression
Overfitting
in decision trees
Overfitting in decision tree
Early stopping
Greedy decision tree learning
Strategies for
handling missing data
Handling missing data
Idea 3: adapt the algorithm
Feature split selection with missing data
Idea 3: adapt the algorithm
Ensemble classifiers
and boosting
Simple classifiers
Can they be combined?
Ensemble methods
Ensemble classifier
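An ensemble classifier combines weak classifiers by a weighted vote, ŷ = sign(Σt ŵt ft(x)). A sketch; the weights and votes below are illustrative:

```python
def ensemble_predict(votes):
    # votes: list of (w_t, f_t(x)) pairs, with each vote f_t(x) in {+1, -1}
    total = sum(w * f for w, f in votes)
    return 1 if total > 0 else -1

ensemble_predict([(1.5, +1), (1.0, -1), (0.3, -1)])   # 1.5 - 1.0 - 0.3 = 0.2 > 0 -> +1
```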
Boosting
Weighted data
Boosting = greedily learning an ensemble from data
Boosting convergence & overfitting
Example
Boosting: summary
Classification: summary
Details
Derivative of likelihood
for logistic regression
The log trick, often used in ML…
Log-likelihood function
Rewriting the log-likelihood
Indicator function
Logistic regression
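For reference, the appendix derivation in its usual compact form (reconstructed from the standard logistic regression log-likelihood; notation is assumed to match the slides):

```latex
\ell(\mathbf{w}) = \sum_{i=1}^{N} \ln P(y_i \mid \mathbf{x}_i, \mathbf{w})
  = \sum_{i=1}^{N} \Big( \mathbb{1}[y_i = +1] \ln P(y{=}{+}1 \mid \mathbf{x}_i, \mathbf{w})
    + \mathbb{1}[y_i = -1] \ln \big(1 - P(y{=}{+}1 \mid \mathbf{x}_i, \mathbf{w})\big) \Big)

% with P(y=+1 | x, w) = 1 / (1 + e^{-\mathbf{w}^{\top} \mathbf{h}(\mathbf{x})}),
% differentiating term by term gives
\frac{\partial \ell(\mathbf{w})}{\partial w_j}
  = \sum_{i=1}^{N} h_j(\mathbf{x}_i) \Big( \mathbb{1}[y_i = +1] - P(y{=}{+}1 \mid \mathbf{x}_i, \mathbf{w}) \Big)
```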
Details
AdaBoost
AdaBoost: learning ensemble
10/11, 17/11, 24/11/2020
136
AdaBoost: Computing coefficients wt
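The AdaBoost coefficient formula, ŵt = ½ ln((1 − weighted_error) / weighted_error), as a sketch:

```python
import math

def adaboost_coefficient(weighted_error):
    # w_t = 1/2 * ln((1 - weighted_error) / weighted_error)
    return 0.5 * math.log((1 - weighted_error) / weighted_error)

adaboost_coefficient(0.2)   # ~0.69: a good classifier gets a large positive weight
adaboost_coefficient(0.5)   # 0.0: no better than random, so it is ignored
```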
Weighted classification error
AdaBoost formula
AdaBoost: learning ensemble
AdaBoost: updating weights αi
AdaBoost: learning ensemble
AdaBoost: normalizing weights αi
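The update and normalization steps together, as a sketch (the labels, predictions, and ŵt = 0.69 below are illustrative):

```python
import math

def update_weights(alphas, predictions, labels, w_t):
    # decrease alpha_i where f_t was right, increase it where f_t was wrong...
    new = [a * math.exp(-w_t if p == y else w_t)
           for a, p, y in zip(alphas, predictions, labels)]
    # ...then normalize so the data weights sum to 1
    total = sum(new)
    return [a / total for a in new]

alphas = [0.25, 0.25, 0.25, 0.25]
updated = update_weights(alphas, [+1, +1, -1, -1], [+1, -1, -1, -1], w_t=0.69)
# the one misclassified point (index 1) now carries the largest weight
```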
AdaBoost: learning ensemble
AdaBoost: example
AdaBoost: learning ensemble
Boosted decision stumps
αi ← αi e^(−0.69) , if ft(xi) = yi
αi ← αi e^(0.69) , if ft(xi) ≠ yi