DATA SCIENCE WITH MACHINE LEARNING:
CLASSIFICATION
WFAiS UJ, Informatyka Stosowana I stopień studiów
1
12/01/2021
This lecture is based on the course by E. Fox and C. Guestrin, University of Washington
What is classification?
Overview of the content
Linear classifier
An intelligent restaurant review system
Classifying sentiment of review
A (linear) classifier: scoring a sentence
Score(xi) = 1.2 + 1.7 − 2.1 = 0.8 > 0  ⇒  ŷ = +1 (positive review)
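The scoring step above can be sketched in code. Only the coefficient values (1.2 + 1.7 − 2.1 = 0.8) come from the slide; the words they attach to are hypothetical stand-ins.

```python
# Hypothetical per-word coefficients; the numeric values are from the slide,
# the words themselves are illustrative.
coefficients = {"great": 1.2, "awesome": 1.7, "terrible": -2.1}

def score(sentence, coefficients):
    # Score(x) = sum of the coefficients of the words appearing in the sentence
    return sum(coefficients.get(word, 0.0) for word in sentence.lower().split())

s = score("Great food, awesome service, terrible parking", coefficients)
y_hat = +1 if s > 0 else -1   # y_hat = sign(Score(x)); here 0.8 > 0 -> positive review
```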
Training a classifier = Learning the coefficients
We will discuss later how to learn a classifier from data
Decision boundary example
Decision boundary
Flow chart:
Coefficients of classifier
General notation
Simple hyperplane
D-dimensional hyperplane
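In the general notation, Score(x) = w0·h0(x) + … + wD·hD(x) with a constant feature h0(x) = 1 for the intercept, and the classifier predicts ŷ = sign(Score(x)). A minimal sketch; the weights and feature values below are illustrative, not from the lecture:

```python
import numpy as np

# Illustrative weight vector w = (w0, w1, w2); w0 multiplies the constant
# feature h0(x) = 1, so it acts as the intercept of the hyperplane.
w = np.array([0.5, 1.0, -2.0])

def predict(h, w):
    # y_hat = sign(Score(x)) = sign(w . h(x))
    return 1 if w @ h > 0 else -1

h = np.array([1.0, 3.0, 1.2])   # h0(x) = 1, h1(x) = 3.0, h2(x) = 1.2
predict(h, w)                   # 0.5 + 3.0 - 2.4 = 1.1 > 0 -> +1
```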
Flow chart:
Linear classifier
Class probability
How confident is your prediction?
Conditional probability
Interpreting conditional probabilities
How confident is your prediction?
Learn conditional probabilities from data
Predicting class probabilities
Flow chart:
Why not just use regression to build classifier?
Link function
Flow chart:
Logistic regression classifier:
linear score with logistic link
function
Simplest link function: sign(z)
Logistic function (sigmoid)
(Plot: logistic function rising from 0 to 1, crossing 0.5 at score 0.)
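The logistic (sigmoid) link can be sketched directly; it squeezes any score into (0, 1), so the class probability is modelled as P(y = +1 | x, w) = sigmoid(Score(x)).

```python
import numpy as np

def sigmoid(score):
    # maps a score in (-inf, +inf) to a probability in (0, 1);
    # sigmoid(0) = 0.5, large positive scores -> ~1, large negative -> ~0
    return 1.0 / (1.0 + np.exp(-score))

sigmoid(0.0)    # 0.5
sigmoid(2.0)    # ~0.88
sigmoid(-2.0)   # ~0.12
```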
Logistic regression model
Effect of coefficients
Flow chart:
Learning logistic regression model
Categorical inputs
Encoding categories as numeric features
Multiclass classification
1 versus all
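1-versus-all reduces multiclass classification to one binary problem per class: train a classifier that scores "this class vs all the rest", then predict the class whose classifier is most confident. A sketch; the class names and probabilities below are stand-ins for trained models:

```python
# Each entry maps a class label to an (assumed pre-trained) estimator of
# P(y = class | x); the lambdas below are stand-ins for real models.
def one_vs_all_predict(x, estimators):
    # predict the class whose binary classifier is most confident
    return max(estimators, key=lambda c: estimators[c](x))

estimators = {
    "triangle": lambda x: 0.2,
    "heart":    lambda x: 0.7,
    "donut":    lambda x: 0.1,
}
one_vs_all_predict(None, estimators)   # "heart"
```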
Summary: Logistic regression classifier
Linear classifier
Parameters learning
Maximizing likelihood (probability of data)
Maximum likelihood estimation (MLE)
Learn logistic regression model with MLE
Flow chart:
Find the "best" classifier
Maximizing likelihood
Gradient ascent
Convergence criteria
Gradient ascent
The log trick, often used in ML…
Derivative for logistic regression
See the slides at the end of this lecture if you are interested in how it is derived.
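Putting the pieces together, a minimal sketch of gradient ascent on the log-likelihood, using the standard derivative ∂ℓ/∂wj = Σi hj(xi)(1[yi = +1] − P(y = +1 | xi, w)). The toy data, step size, and iteration count are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_ascent(H, y, eta=0.1, n_iter=2000):
    """H: N x D matrix of features h(x_i); y: labels in {+1, -1}."""
    w = np.zeros(H.shape[1])
    indicator = (y == 1).astype(float)   # 1[y_i = +1]
    for _ in range(n_iter):
        # gradient of the log-likelihood: H^T (1[y=+1] - P(y=+1|x,w))
        w += eta * (H.T @ (indicator - sigmoid(H @ w)))
    return w

# toy 1-d data with an intercept feature h0(x) = 1
H = np.array([[1., 0.], [1., 1.], [1., 2.], [1., 3.]])
y = np.array([-1, -1, +1, +1])
w = gradient_ascent(H, y)
# predictions sign(H @ w) separate the two classes
```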
Derivative for logistic regression
Choosing the step size
Flow chart: final look at it
Linear classifier
Overfitting & regularization
Training a classifier = Learning the coefficients
Classification error & accuracy
Overfitting in classification
Decision boundary example
Overfitting in classification
Learned decision boundary
Overfitting in classification
Quadratic features (in 2d)
Overfitting in classification
Degree 6 features (in 2d)
Overfitting in classification
Degree 20 features (in 2d)
Overfitting in classification
Overfitting in logistic regression
Remember this probability interpretation
Effect of coefficients on logistic regression model
With increasing coefficients, the model becomes overconfident in its predictions
Learned probabilities
Quadratic features: learned probabilities
Overfitting → overconfident predictions
Quality metric → penalizing large coefficients
Desired total cost format
Measure of magnitude of logistic regression coefficients
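The penalized quality metric has the form total quality = log-likelihood(w) − λ·(measure of magnitude of w), with ‖w‖₂² for L2 and ‖w‖₁ for L1. A sketch; the coefficient values and log-likelihood below are illustrative:

```python
import numpy as np

def total_quality(log_likelihood, w, lam, penalty="l2"):
    # total quality = data fit (log-likelihood) minus lambda * coefficient magnitude
    if penalty == "l2":
        return log_likelihood - lam * np.sum(w ** 2)   # ||w||_2^2
    return log_likelihood - lam * np.sum(np.abs(w))    # ||w||_1

w = np.array([1.0, -2.0, 0.5])
total_quality(-10.0, w, lam=0.1)                 # -10 - 0.1 * 5.25 = -10.525
total_quality(-10.0, w, lam=0.1, penalty="l1")   # -10 - 0.1 * 3.5  = -10.35
```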
Visualizing effect of regularisation
Effect of regularisation
Visualizing effect of regularisation
Sparse logistic regression
L1 regularised logistic regression
Decision trees
What makes a loan risky?
Classifier: decision trees
Quality metric: Classification error
Find the tree with lowest classification error
How do we find the best tree?
Simple (greedy) algorithm finds good tree
Greedy decision tree learning
How do we select the best feature to split on?
Classification error
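To select the feature to split on, compare candidate splits by classification error, with each branch predicting its majority class. A sketch on illustrative loan data with labels +1 (safe) / −1 (risky):

```python
def classification_error(groups):
    # groups: one list of labels per branch; each branch predicts its majority
    # class, so its mistakes are the minority-class points in that branch
    mistakes = sum(len(g) - max(g.count(+1), g.count(-1)) for g in groups)
    total = sum(len(g) for g in groups)
    return mistakes / total

# illustrative split on a "credit" feature: excellent vs poor
split_credit = [[+1, +1, +1, -1], [-1, -1, +1]]
classification_error(split_credit)   # (1 + 1) / 7 ~= 0.29
```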
Choice 1 vs Choice 2
Greedy decision tree learning algorithm
Greedy decision tree algorithm
Decision trees vs logistic regression
Overfitting
in decision trees
Overfitting in decision tree
Early stopping
Greedy decision tree learning
Strategies for
handling missing data
Handling missing data
Idea 3: adapt the algorithm
Feature split selection with missing data
Idea 3: adapt the algorithm
Ensemble classifiers
and boosting
Simple classifiers
Can they be combined?
Ensemble methods
Ensemble classifier
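An ensemble classifier combines weak classifiers by a weighted vote, ŷ = sign(Σt ŵt ft(x)). A sketch; the weights and votes below are illustrative:

```python
def ensemble_predict(votes):
    # votes: list of (w_t, f_t(x)) pairs, with each vote f_t(x) in {+1, -1}
    total = sum(w * f for w, f in votes)
    return 1 if total > 0 else -1

ensemble_predict([(1.5, +1), (1.0, -1), (0.3, -1)])   # 1.5 - 1.0 - 0.3 = 0.2 > 0 -> +1
```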
Boosting
Weighted data
Boosting = greedily learning an ensemble from data
Boosting convergence & overfitting
Example
Boosting: summary
Classification: summary
Details
Derivative of likelihood
for logistic regression
The log trick, often used in ML…
Log-likelihood function
Rewriting the log-likelihood
Indicator function
Logistic regression
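For reference, the appendix derivation in its usual compact form (reconstructed from the standard logistic regression log-likelihood; notation is assumed to match the slides):

```latex
\ell(\mathbf{w}) = \sum_{i=1}^{N} \ln P(y_i \mid \mathbf{x}_i, \mathbf{w})
  = \sum_{i=1}^{N} \Big( \mathbb{1}[y_i = +1] \ln P(y{=}{+}1 \mid \mathbf{x}_i, \mathbf{w})
    + \mathbb{1}[y_i = -1] \ln \big(1 - P(y{=}{+}1 \mid \mathbf{x}_i, \mathbf{w})\big) \Big)

% with P(y=+1 | x, w) = 1 / (1 + e^{-\mathbf{w}^{\top} \mathbf{h}(\mathbf{x})}),
% differentiating term by term gives
\frac{\partial \ell(\mathbf{w})}{\partial w_j}
  = \sum_{i=1}^{N} h_j(\mathbf{x}_i) \Big( \mathbb{1}[y_i = +1] - P(y{=}{+}1 \mid \mathbf{x}_i, \mathbf{w}) \Big)
```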
Details
AdaBoost
AdaBoost: learning ensemble
10/11, 17/11, 24/11/2020
136
AdaBoost: Computing coefficients wt
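The AdaBoost coefficient formula, ŵt = ½ ln((1 − weighted_error) / weighted_error), as a sketch:

```python
import math

def adaboost_coefficient(weighted_error):
    # w_t = 1/2 * ln((1 - weighted_error) / weighted_error)
    return 0.5 * math.log((1 - weighted_error) / weighted_error)

adaboost_coefficient(0.2)   # ~0.69: a good classifier gets a large positive weight
adaboost_coefficient(0.5)   # 0.0: no better than random, so it is ignored
```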
Weighted classification error
AdaBoost formula
AdaBoost: learning ensemble
AdaBoost: updating weights αi
AdaBoost: learning ensemble
AdaBoost: normalizing weights αi
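The update and normalization steps together, as a sketch (the labels, predictions, and ŵt = 0.69 below are illustrative):

```python
import math

def update_weights(alphas, predictions, labels, w_t):
    # decrease alpha_i where f_t was right, increase it where f_t was wrong...
    new = [a * math.exp(-w_t if p == y else w_t)
           for a, p, y in zip(alphas, predictions, labels)]
    # ...then normalize so the data weights sum to 1
    total = sum(new)
    return [a / total for a in new]

alphas = [0.25, 0.25, 0.25, 0.25]
updated = update_weights(alphas, [+1, +1, -1, -1], [+1, -1, -1, -1], w_t=0.69)
# the one misclassified point (index 1) now carries the largest weight
```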
AdaBoost: learning ensemble
AdaBoost: example
AdaBoost: learning ensemble
Boosted decision stumps
αi ← αi e^(−0.69) , if ft(xi) = yi
αi ← αi e^(0.69) , if ft(xi) ≠ yi