DATA SCIENCE WITH MACHINE LEARNING:
CLUSTERING
What is clustering?
19/01/2021
Clustering applications
Overview of content
Clustering: An unsupervised learning task
Motivation
I don't just like sport!
Clustering: a supervised learning task
Clustering: an unsupervised learning task
An unsupervised learning task
What defines a cluster?
Hope for unsupervised learning
Other (challenging!) clusters to discover
Analysed by your eyes
Analysed by clustering algorithms
k-means clustering algorithm
k-means as a coordinate descent algorithm
Convergence of k-means
Convergence of k-means to a local mode
Because k-means can be cast as a coordinate descent algorithm, neither of its two alternating steps can increase the objective, so the algorithm converges, but only to a local optimum.
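The coordinate descent view can be sketched in a few lines of plain Python. This is a minimal illustration, not the lecture's own code: the function names, the toy data, the starting centres and the rule that an empty cluster keeps its old centre are all assumptions made for the example. It alternates the two steps and records the objective after every pass, so the non-increasing sequence can be inspected directly.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_clusters(points, centres):
    """Step 1: hold the centres fixed, give each point its nearest centre."""
    return [min(range(len(centres)), key=lambda j: squared_distance(p, centres[j]))
            for p in points]


def update_centres(points, labels, centres):
    """Step 2: hold the assignments fixed, move each centre to its cluster mean."""
    new_centres = []
    for j, old in enumerate(centres):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            new_centres.append(old)  # assumption: an empty cluster keeps its centre
            continue
        new_centres.append(tuple(sum(p[d] for p in members) / len(members)
                                 for d in range(len(old))))
    return new_centres


def objective(points, labels, centres):
    """Sum of squared distances from each point to its assigned centre."""
    return sum(squared_distance(p, centres[lab]) for p, lab in zip(points, labels))


def kmeans(points, centres, max_iter=100):
    """Alternate the two steps until the assignments stop changing."""
    history, labels = [], None
    for _ in range(max_iter):
        new_labels = assign_clusters(points, centres)
        if new_labels == labels:
            break
        labels = new_labels
        centres = update_centres(points, labels, centres)
        history.append(objective(points, labels, centres))
    return centres, labels, history


# Toy data (hypothetical): two well-separated groups and a deliberately poor start.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centres, labels, history = kmeans(points, centres=[(0.0, 0.0), (1.0, 1.0)])
print(labels)
print(history)
```

A different choice of starting centres can end in a different, worse partition, which is the sense in which the optimum reached is only local.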
Smart initialisation: k-means++ overview
k-means++ visualised
Smart initialisation: k-means++ overview
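As a rough sketch of the k-means++ idea: the first centre is drawn uniformly from the data, and every further centre is drawn with probability proportional to its squared distance from the nearest centre chosen so far. The function name `kmeans_pp_init`, the `seed` parameter and the toy data are assumptions made for this example, and the sketch assumes the data contain at least k distinct points.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, seed=0):
    """Pick k initial centres from the data with distance-weighted sampling."""
    rng = random.Random(seed)
    centres = [rng.choice(points)]  # first centre: uniform over the data
    while len(centres) < k:
        # Squared distance from every point to its nearest already-chosen centre.
        d2 = [min(squared_distance(p, c) for c in centres) for p in points]
        # Far-away points are more likely to be drawn; chosen points have weight 0.
        centres.append(rng.choices(points, weights=d2, k=1)[0])
    return centres


# Toy data (hypothetical): three loose groups in the plane.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5), (1.0, 9.0), (2.0, 8.0)]
centres = kmeans_pp_init(points, k=3)
print(centres)
```

The returned centres would then be passed to an ordinary k-means run in place of a purely random start.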
Assessing quality of the clustering
k-means objective
Cluster heterogeneity
What happens to heterogeneity as k increases?
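One way to explore this question is to run k-means for several values of k and record the lowest heterogeneity (sum of squared distances to the assigned centre) found. The sketch below is an illustration under stated assumptions: one-dimensional toy data with distinct values, random data points as starting centres, and a hypothetical `restarts` parameter that keeps the best of several runs to soften the effect of local optima.

```python
import random


def kmeans_1d(xs, k, rng, max_iter=100):
    """Plain k-means on a list of numbers; returns the final heterogeneity."""
    centres = rng.sample(xs, k)  # assumption: k distinct data points as the start
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: (x - centres[j]) ** 2) for x in xs]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [x for x, lab in zip(xs, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centres[j] = sum(members) / len(members)
    return sum((x - centres[lab]) ** 2 for x, lab in zip(xs, labels))


def heterogeneity_by_k(xs, ks, restarts=10, seed=0):
    """Lowest heterogeneity found over several random restarts, for each k."""
    rng = random.Random(seed)
    return {k: min(kmeans_1d(xs, k, rng) for _ in range(restarts)) for k in ks}


# Toy data (hypothetical): three groups of three points on a line.
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 20.0, 21.0, 22.0]
het = heterogeneity_by_k(xs, ks=range(1, len(xs) + 1))
for k, h in het.items():
    print(k, round(h, 2))
```

When k equals the number of points, every point is its own centre and the heterogeneity is zero, so the lowest heterogeneity alone cannot be used to pick k; a common heuristic is to look for the value of k after which the curve flattens.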
How to choose k?
Probabilistic approach: mixture model
Why a probabilistic approach?
Mixture models
Application: clustering images
Single RGB vector per image
We see that they are grouping!
But not easy to distinguish between groups
In this dimension
Model for a given image type
Mixture of Gaussians
Application: clustering documents
Inferring soft assignments with expectation maximization (EM)
Inferring cluster labels
Part 1: Summary
Part 2a: Summary
Part 2b: Summary
Expectation maximization (EM)
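A minimal sketch of EM for a one-dimensional mixture of Gaussians may help make the two steps concrete. It is an illustration under stated assumptions rather than the lecture's implementation: the function names, the toy data, the starting parameters (means placed at the data extremes) and the small variance floor `min_var` are all choices made for this example. The E-step computes soft assignments (responsibilities), and the M-step re-estimates the weights, means and variances from them.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def e_step(xs, weights, means, variances):
    """Soft assignments: resp[i][k] is the responsibility of cluster k for point i."""
    resp = []
    for x in xs:
        joint = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        total = sum(joint)
        resp.append([j / total for j in joint])
    return resp


def m_step(xs, resp, min_var=1e-6):
    """Re-estimate weights, means and variances from the soft assignments."""
    n, k = len(xs), len(resp[0])
    weights, means, variances = [], [], []
    for j in range(k):
        nk = sum(r[j] for r in resp)  # effective number of points in cluster j
        mean = sum(r[j] * x for r, x in zip(resp, xs)) / nk
        var = sum(r[j] * (x - mean) ** 2 for r, x in zip(resp, xs)) / nk
        weights.append(nk / n)
        means.append(mean)
        variances.append(max(var, min_var))  # assumption: floor to avoid collapse
    return weights, means, variances


def log_likelihood(xs, weights, means, variances):
    """Log-likelihood of the data under the current mixture."""
    return sum(math.log(sum(w * gaussian_pdf(x, m, v)
                            for w, m, v in zip(weights, means, variances)))
               for x in xs)


def fit_em(xs, weights, means, variances, n_iter=20):
    """Run a fixed number of EM iterations and track the log-likelihood."""
    history = []
    for _ in range(n_iter):
        resp = e_step(xs, weights, means, variances)
        weights, means, variances = m_step(xs, resp)
        history.append(log_likelihood(xs, weights, means, variances))
    return weights, means, variances, resp, history


# Toy data (hypothetical): two groups on a line; start the means at the extremes.
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
weights, means, variances, resp, history = fit_em(
    xs, weights=[0.5, 0.5], means=[1.0, 12.0], variances=[1.0, 1.0])
print(means)
print(history[0], history[-1])
```

Like k-means, EM only reaches a local optimum of the likelihood, so a different starting point can lead to a different fit.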
Mixed membership models for documents
Clustering model
Soft assignments
Mixed membership models
Building an alternative model
Model for “bag-of-words”
Hierarchical clustering
Why hierarchical clustering?
Two main types of algorithms
Divisive clustering
Divisive: Recursive k-means
Divisive: choices to be made
Agglomerative: Single linkage
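The single linkage rule can be sketched as follows: every point starts as its own cluster, and at each step the two clusters whose closest pair of points is nearest are merged. The function names and the one-dimensional toy data are assumptions made for this illustration, and the brute-force search over all cluster pairs is chosen for clarity, not efficiency.

```python
def single_linkage_distance(cluster_a, cluster_b):
    """Single linkage: distance between the closest pair of points across two clusters."""
    return min(abs(a - b) for a in cluster_a for b in cluster_b)


def agglomerate(points):
    """Repeatedly merge the two closest clusters; return the merge history."""
    clusters = [[p] for p in points]  # every point starts as its own cluster
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_linkage_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((d, clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return merges


# Toy data (hypothetical): five points on a line.
merges = agglomerate([1.0, 2.0, 4.0, 10.0, 11.0])
for distance, left, right in merges:
    print(distance, left, right)
# merge distances, in order: 1.0, 1.0, 2.0, 6.0
```

The recorded merge distances are the heights at which branches join in the dendrogram; cutting at a threshold between 2.0 and 6.0 would leave the two clusters {1, 2, 4} and {10, 11} for this toy data.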
Cluster of clusters
The dendrogram
Extracting a partition
Agglomerative: choices to be made
More on cutting the dendrogram
Computational considerations