INTRODUCTION TO DATA SCIENCE
WFAiS UJ, Informatyka Stosowana I stopień studiów
1
12.11, 19.11 2019
This lecture is based on the course by E. Fox and C. Guestrin, University of Washington
What is retrieval?
Retrieval applications
What is clustering?
Clustering applications
Impact of retrieval & clustering
Overview of the extended content
Retrieval as k-nearest neighbor search
1-NN search for retrieval
1-NN algorithm
k-NN algorithm
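A minimal sketch of brute-force k-NN search for retrieval, assuming documents are already represented as equal-length numeric vectors (for example word counts) and that Euclidean distance is the chosen metric. The function names `euclidean` and `knn_search` and the toy vectors are illustrative assumptions, not taken from the lecture.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_search(query, corpus, k=1, distance=euclidean):
    """Return the indices of the k corpus items closest to the query.

    Brute force: one distance computation per corpus item, so O(N) per query.
    """
    scored = ((distance(query, item), i) for i, item in enumerate(corpus))
    return [i for _, i in heapq.nsmallest(k, scored)]


# Toy word-count vectors (made-up data for illustration only).
docs = [[1, 0, 3], [0, 2, 0], [1, 1, 3], [5, 5, 5]]
query = [1, 0, 2]

print(knn_search(query, docs, k=1))  # [0]
print(knn_search(query, docs, k=2))  # [0, 2]
```

Setting k=1 gives the 1-NN case; passing a different `distance` function swaps the metric without changing the search loop.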
Critical elements of NN search
Document representation
Distance metrics:
Combining distance metrics
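A small sketch of two commonly used distance metrics, scaled Euclidean distance and cosine distance, plus one possible way to combine them. The particular combination (a weighted sum controlled by `alpha`) and all function names are assumptions made for illustration; the lecture may combine metrics differently.

```python
import math


def scaled_euclidean(a, b, weights):
    """Euclidean distance with a per-feature weight on each squared difference."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity: compares direction only, ignoring vector length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def combined_distance(a, b, weights, alpha=0.5):
    """Hypothetical combination: a weighted sum of the two distances above."""
    return alpha * scaled_euclidean(a, b, weights) + (1 - alpha) * cosine_distance(a, b)


doc = [1, 0, 2]
doubled = [2, 0, 4]  # the same document repeated twice (made-up example)

print(cosine_distance(doc, doubled))              # approximately 0: same direction
print(scaled_euclidean(doc, doubled, [1, 1, 1]))  # sqrt(5): the lengths differ
```

The example shows why the choice matters: a document concatenated with itself is far away in Euclidean distance but has (numerically) zero cosine distance.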
Scaling up k-NN search by storing data in a KD-tree
Complexity of brute-force search
KD-trees
Nearest neighbor with KD-trees
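A compact sketch of nearest-neighbor search with a KD-tree, assuming median splits that cycle through the dimensions and Euclidean distance. The dictionary-based node layout and the names `build_kdtree` and `nearest` are illustrative choices; other split heuristics are possible.

```python
import math


def build_kdtree(points, depth=0):
    """Recursively split the points at the median, cycling through dimensions."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }


def nearest(node, query, best=None):
    """Return (distance, point) of the nearest neighbor, pruning far subtrees."""
    if node is None:
        return best
    d = math.dist(query, node["point"])
    if best is None or d < best[0]:
        best = (d, node["point"])
    diff = query[node["axis"]] - node["point"][node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the best so far.
    if abs(diff) < best[0]:
        best = nearest(far, query, best)
    return best


# Made-up 2-D points for illustration.
points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(points)
print(nearest(tree, (9, 2))[1])  # (8, 1)
```

The pruning test is what saves work: a subtree is skipped whenever its side of the splitting plane is farther away than the closest point found so far.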
Complexity for N queries
k-NN with KD-trees
Approximate k-NN with KD-trees
Closing remarks on KD-trees
KD-tree in high dimensions
Moving away from exact NN search
Locality sensitive hashing (LSH) as an alternative to KD-trees
Locality sensitive hashing
LSH: improving efficiency
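A rough sketch of locality sensitive hashing with random hyperplanes, assuming each hyperplane passes through the origin and contributes one bit to the bin index. Searching the query's bin plus all bins that differ in one bit is one simple way to trade accuracy for speed; the names `make_hyperplanes`, `bin_index`, `build_table` and `candidates` are hypothetical.

```python
import random
from collections import defaultdict


def make_hyperplanes(num_planes, dim, seed=0):
    """Draw random hyperplanes through the origin (Gaussian normal vectors)."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_planes)]


def bin_index(vector, planes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return tuple(int(sum(w * x for w, x in zip(plane, vector)) >= 0) for plane in planes)


def build_table(data, planes):
    """Hash table mapping each bin index to the data indices that fall in it."""
    table = defaultdict(list)
    for i, vector in enumerate(data):
        table[bin_index(vector, planes)].append(i)
    return table


def candidates(query, table, planes, include_adjacent=True):
    """Indices in the query's bin, optionally plus bins one bit flip away."""
    key = bin_index(query, planes)
    keys = [key]
    if include_adjacent:
        keys += [key[:j] + (1 - key[j],) + key[j + 1:] for j in range(len(key))]
    return [i for k in keys for i in table.get(k, [])]


# Made-up 2-D data for illustration.
data = [[1.0, 0.2], [0.9, 0.3], [-1.0, 0.5], [-0.8, -0.7]]
planes = make_hyperplanes(num_planes=3, dim=2)
table = build_table(data, planes)
print(candidates([1.0, 0.25], table, planes))
```

Only the returned candidates need an exact distance computation afterwards, so the result is an approximate nearest neighbor rather than a guaranteed one.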
LSH recap
LSH: moving to higher dimensions d
What you can do now …
Clustering: an unsupervised learning task
Motivation
I don’t just like sport!
Clustering: a supervised learning
Example of supervised learning
Clustering: an unsupervised learning
An unsupervised learning task
What defines a cluster?
Hope for unsupervised learning
Other (challenging!) clusters to discover
Analysed by your eyes
Analysed by clustering algorithms
k-means clustering algorithm
k-means as a coordinate descent algorithm
Convergence of k-means
Because k-means can be cast as a coordinate descent algorithm, we know that it converges to a local optimum.
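The coordinate descent view can be sketched as follows: each iteration first fixes the centers and updates the assignments, then fixes the assignments and updates the centers, and neither step can increase the sum of squared distances. This is a minimal sketch assuming squared Euclidean distance; the function names, the `history` list and the toy points are illustrative assumptions.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def heterogeneity(points, centers, labels):
    """k-means objective: sum of squared distances to the assigned centers."""
    return sum(squared_distance(p, centers[z]) for p, z in zip(points, labels))


def kmeans(points, centers, max_iter=100):
    """Alternate the two coordinate descent steps until assignments stop changing."""
    centers = [list(c) for c in centers]
    labels, history = None, []
    for _ in range(max_iter):
        # Step 1: fix the centers, assign each point to its closest center.
        new_labels = [
            min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Step 2: fix the assignments, move each center to the mean of its points.
        for j in range(len(centers)):
            members = [p for p, z in zip(points, labels) if z == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
        history.append(heterogeneity(points, centers, labels))
    return centers, labels, history


# Made-up 2-D points forming two obvious groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels, history = kmeans(points, [(0, 0), (1, 0)])
print(labels)   # [0, 0, 0, 1, 1, 1]
print(history)  # objective after each iteration, never increasing
```

A non-increasing objective guarantees convergence, but only to a local optimum: a different starting point can end in a different, possibly worse, solution.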
Convergence of k-means to a local mode
Crosses: initialised centers
Assignment to groups has changed
k-means is very sensitive to the initialised centers
Smart initialisation: k-means++ overview
k-means++ visualised
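A minimal sketch of k-means++ initialisation, assuming the usual scheme: the first center is chosen uniformly at random, and each further center is sampled with probability proportional to the squared distance to the nearest center chosen so far. The name `kmeans_pp_init` and the `seed` parameter are illustrative, and the sketch assumes the data contains at least k distinct points.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, seed=0):
    """Pick k initial centers, favouring points far from those already chosen."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]  # first center: uniformly at random
    while len(centers) < k:
        # Squared distance from each point to its nearest already-chosen center.
        d2 = [min(squared_distance(p, c) for c in centers) for p in points]
        # Sample the next center with probability proportional to that distance.
        # (Assumes at least one point has non-zero distance, i.e. k distinct points exist.)
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


# Made-up 2-D points forming two groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
init_centers = kmeans_pp_init(points, k=2)
print(init_centers)
```

An already chosen point has zero weight and cannot be drawn again, and distant points are much more likely to be picked, which tends to spread the initial centers across the groups.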
Assessing quality of the clustering
k-means objective
Cluster heterogeneity
What happens to heterogeneity as k increases?
How to choose k?
What you can do now …
Probabilistic approach: mixture model
Why probabilistic approach?
Mixture models
Application: clustering images
Single RGB vector per image
We see that the images group together, but the groups are not easy to distinguish.
In this dimension the groups are separable!
Model for a given image type
Mixture of Gaussians
Application: clustering documents
Inferring soft assignments with expectation maximization (EM)
Inferring cluster labels
Part 1: Summary
Then split into separate tables and consider them independently.
Part 2a: Summary
Part 2b: Summary
Expectation maximization (EM)
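A minimal sketch of EM for a mixture of Gaussians, restricted to one dimension to keep it short. It assumes the standard alternation: the E-step computes soft assignments (responsibilities) from the current parameters, and the M-step re-estimates means, variances and mixture weights from those soft assignments. The function names, the variance floor and the toy data are assumptions made for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(data, means, variances, weights, n_iter=50):
    """EM for a 1-D mixture of Gaussians, starting from the given parameters."""
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    resp = []
    for _ in range(n_iter):
        # E-step: responsibility resp[i][j] of cluster j for observation i.
        resp = []
        for x in data:
            scores = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(scores)
            resp.append([s / total for s in scores])
        # M-step: re-estimate the parameters from the soft assignments.
        for j in range(k):
            n_j = sum(r[j] for r in resp)  # effective number of points in cluster j
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / n_j
            variances[j] = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / n_j
            variances[j] = max(variances[j], 1e-6)  # added safeguard against collapse
            weights[j] = n_j / len(data)
    return means, variances, weights, resp


# Made-up 1-D observations drawn around 1 and around 5.
data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
means, variances, weights, resp = em_gmm_1d(data, [0.0, 6.0], [1.0, 1.0], [0.5, 0.5])
print(means, weights)
```

The returned responsibilities are the soft assignments from the last E-step; taking the largest entry in each row would turn them into hard, k-means-style labels.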
What you can do now …
Hierarchical clustering
Why hierarchical clustering