INTRODUCTION TO DATA SCIENCE
WFAiS UJ, Applied Computer Science, first-cycle (bachelor's) studies
10/11/2020
This lecture is based on a course by E. Fox and C. Guestrin, University of Washington
Classification
An intelligent restaurant review system
What is the sentiment of the review?
Topic sentiments
Intelligent restaurant review system
Core building block
Intelligent restaurant review system
Classifier
Multiclass classifier
Spam filtering
Image classification
Personalized medical diagnosis
Reading your mind
Representing classifiers
Simple threshold classifier
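A simple threshold classifier can be sketched as counting words: predict positive when positive words outnumber negative ones. The sketch below is only an illustration; the word lists and the function name `threshold_classify` are assumptions, not taken from the slides.

```python
# Hypothetical word lists, chosen only for illustration.
POSITIVE_WORDS = {"great", "awesome", "good", "amazing"}
NEGATIVE_WORDS = {"bad", "terrible", "awful", "disappointing"}


def threshold_classify(sentence):
    """Return +1 if positive words outnumber negative words, else -1."""
    # Lowercase, split on whitespace, and drop surrounding punctuation.
    words = [w.strip(".,!?") for w in sentence.lower().split()]
    n_pos = sum(w in POSITIVE_WORDS for w in words)
    n_neg = sum(w in NEGATIVE_WORDS for w in words)
    return 1 if n_pos > n_neg else -1
```

With these lists, `threshold_classify("The sushi was not good")` returns +1, because word counting ignores the negation.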
Problems with threshold classifier
A (linear) classifier
Scoring a sentence
Simple linear classifier
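A linear classifier gives each word a weight, scores a sentence as the sum of the weights of its words, and predicts positive when the score is above zero. The following is a minimal sketch under that reading; the weight values and the names `WEIGHTS`, `score` and `linear_classify` are hypothetical.

```python
# Hypothetical weights; in practice they would be learned from data.
WEIGHTS = {
    "good": 1.0, "great": 1.5, "awesome": 2.7,
    "bad": -1.0, "terrible": -2.1, "awful": -3.3,
}


def score(sentence, weights=WEIGHTS):
    """Sum the weights of all words in the sentence (unknown words count 0)."""
    words = [w.strip(".,!?") for w in sentence.lower().split()]
    return sum(weights.get(w, 0.0) for w in words)


def linear_classify(sentence, weights=WEIGHTS):
    """Return +1 if the score is positive, else -1."""
    return 1 if score(sentence, weights) > 0 else -1
```

A word that appears twice contributes its weight twice, so the score equals the weighted sum of word counts.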
Suppose only two words had non-zero weight
Decision boundary example
Decision boundary
Separates positive & negative predictions
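With only two non-zero weights, the score depends on two word counts, and the decision boundary is the line where the score equals zero. The sketch below assumes two hypothetical words, "awesome" and "awful", with illustrative weights 1.0 and -1.5; neither the words nor the values are confirmed by the slide text.

```python
# Hypothetical weights for the only two words with non-zero weight.
W_AWESOME = 1.0
W_AWFUL = -1.5


def score_two_words(n_awesome, n_awful):
    """Score = 1.0 * #awesome - 1.5 * #awful."""
    return W_AWESOME * n_awesome + W_AWFUL * n_awful


def side_of_boundary(n_awesome, n_awful):
    """Say which side of the line score = 0 a pair of counts falls on."""
    s = score_two_words(n_awesome, n_awful)
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "on boundary"
```

Under these assumed weights, a sentence with 3 occurrences of "awesome" and 2 of "awful" lies exactly on the boundary.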
Training a classifier = Learning the weights
Classification error & accuracy
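Accuracy is the fraction of examples classified correctly, and classification error is the fraction classified incorrectly, so the two sum to 1. A minimal sketch, with hypothetical function names:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one example")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def classification_error(y_true, y_pred):
    """Fraction of predictions that are wrong: 1 - accuracy."""
    return 1.0 - accuracy(y_true, y_pred)
```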
What if you ignore the sentence and just guess?
Is a classifier with 90% accuracy good?
Depends…
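One way to judge an accuracy figure is to compare it with baselines that ignore the input entirely. The sketch below computes two such baselines: uniform random guessing and always predicting the most frequent class. The function names and the example class split are assumptions made for illustration.

```python
from collections import Counter


def random_guess_accuracy(num_classes):
    """Expected accuracy when guessing uniformly among num_classes labels."""
    return 1.0 / num_classes


def majority_class_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(labels)
```

For a hypothetical data set where 9 out of 10 labels are the same class, `majority_class_accuracy` already gives 0.9, so a 90% accurate classifier would be no better than ignoring the input.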
What is a good accuracy?
Types of mistakes
True positive, true negative, false negative, false positive
Cost of mistakes
Confusion matrix: binary classification
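For binary classification, the confusion matrix holds four counts: true positives, false positives, false negatives and true negatives. A minimal sketch of how they could be tallied, assuming labels +1 and -1 and a hypothetical function name:

```python
def binary_confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN and TN for a binary problem."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1   # predicted positive, truly positive
        elif p == positive:
            fp += 1   # predicted positive, truly negative
        elif t == positive:
            fn += 1   # predicted negative, truly positive
        else:
            tn += 1   # predicted negative, truly negative
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}
```

Accuracy can be recovered from these counts as (TP + TN) divided by the total number of examples.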
Confusion matrix: multiclass classification
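With more than two classes, the confusion matrix has one row per true label and one column per predicted label; correct predictions sit on the diagonal. The sketch below is one possible layout (rows = true, columns = predicted); the function name and the class names in the usage note are illustrative assumptions.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Return a list of rows: entry [i][j] counts true labels[i] predicted as labels[j]."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]
```

For example, with the hypothetical labels `["healthy", "cold", "flu"]`, the off-diagonal entries show which classes get confused with each other.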
How much data does a model need to learn?
Learning curves
More complex models tend to have less bias…
Classification based on bigrams
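Bigram features use pairs of adjacent words instead of single words, so a pair such as "not good" can receive its own weight. A minimal sketch of extracting bigrams from a sentence, with a hypothetical function name and a simple whitespace tokenizer assumed:

```python
def bigrams(sentence):
    """Return the list of adjacent word pairs in the sentence."""
    words = [w.strip(".,!?") for w in sentence.lower().split()]
    words = [w for w in words if w]  # drop tokens that were only punctuation
    return list(zip(words, words[1:]))
```

Each pair can then be treated as a feature in the same way a single word is in a unigram model.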
How confident is your prediction?
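The slide text does not say how confidence is computed. One common choice, assumed here and not confirmed by the source, is to pass a classifier's score through the logistic (sigmoid) function to obtain a value between 0 and 1 that can be read as the probability of the positive class.

```python
import math


def sigmoid(score):
    """Map a real-valued score to (0, 1); written to avoid overflow."""
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    e = math.exp(score)
    return e / (1.0 + e)
```

Under this assumption, a score of 0 maps to 0.5 (maximally uncertain), and scores far from zero map to values close to 0 or 1.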
We have discussed how to