Contents
1 Introduction 1
1.1 Problem Statement . . . 1
1.2 Objectives of the Research . . . 1
1.3 Research Contribution . . . 1
1.4 Motivation . . . 2
1.4.1 Relevance of the Research . . . 2
1.4.2 Applications . . . 3
1.4.3 Author’s Research in the Topic . . . 3
1.5 Outline of the Thesis . . . 4
2 Spoken Language Recognition 5
2.1 Spoken Language Recognition . . . 5
2.2 What is Language? . . . 6
2.2.1 Humans' Ability to Recognize Languages . . . 6
2.2.2 Differentiating Between Languages . . . 7
2.3 Approaches to Language Recognition . . . 7
2.3.1 Syntax . . . 7
2.3.2 Words . . . 8
2.3.3 Prosody . . . 8
2.3.4 Phonotactics . . . 8
2.3.5 Acoustic Phonetics . . . 8
2.4 Review of Language Recognition Research . . . 8
2.5 I-vector Based Language Recognition System . . . 9
2.5.1 Features Extraction . . . 9
2.5.1.1 MFCC . . . 10
2.5.1.2 Shifted Delta Cepstra . . . 11
2.5.1.3 Voice Activity Detection . . . 12
2.5.1.4 Feature Normalisation Methods . . . 13
2.5.1.5 Summary of Feature Extraction . . . 13
2.5.2 I-vector Subspace Modeling . . . 13
2.5.2.1 Gaussian Mixture Model . . . 14
2.5.2.2 Universal Background Model . . . 15
2.5.2.3 Joint Factor Analysis . . . 15
2.5.2.4 I-vectors . . . 16
2.6 Language Recognition in the I-vector Space . . . 19
2.6.1 Cosine Distance Scoring . . . 19
2.6.2 Generative Gaussian Classifier . . . 19
2.6.3 Mixtures of von Mises-Fisher Distributions . . . 20
2.6.4 Support Vector Machines . . . 20
2.6.5 Logistic Regression . . . 22
2.6.6 Probabilistic Linear Discriminant Analysis . . . 22
2.6.7 Intersession Compensation Techniques . . . 22
2.6.7.2 Whitening . . . 22
2.6.7.3 Within-Class Covariance Normalization . . . 23
2.6.7.4 Linear Discriminant Analysis . . . 24
2.6.8 Score Level and System Fusions . . . 24
2.7 Summary . . . 25
3 Spoken Language Clustering 27
3.1 Problem Definition . . . 27
3.1.1 Relation to Language Diarization . . . 27
3.2 Overview and Applications . . . 28
3.3 Language Clustering as an Unsupervised Learning Task . . . 28
3.4 Clustering Algorithms . . . 29
3.4.1 Spherical K-means . . . 30
3.4.2 Von Mises-Fisher Mixtures . . . 30
3.4.3 Mean Shift . . . 30
3.4.4 Agglomerative Hierarchical Clustering . . . 31
3.4.5 HDBSCAN . . . 32
3.5 Evaluation of Clustering . . . 33
3.5.1 External Quality Measures . . . 33
3.5.1.1 Impurity Measures . . . 33
3.5.1.2 Adjusted Rand Index . . . 34
3.5.1.3 BBN Metric and Clustering Efficiency . . . 35
3.5.2 Internal Quality Measures . . . 35
3.5.2.1 Nearest Neighbor Purity Estimator and Estimated Clustering Efficiency . . . 35
3.5.2.2 Silhouette . . . 36
3.6 Summary . . . 36
4 Language Database Description and Analysis 37
4.1 NIST 2015 Language Recognition I-vector Machine Learning Challenge Database . . . 37
4.1.1 Utterance Duration . . . 39
4.1.2 Data Analysis . . . 39
4.1.2.1 Silhouette Plots . . . 39
4.1.2.2 2D Visualization of the Data . . . 39
4.2 Summary . . . 40
5 Language Clustering Experiments 45
5.1 I-vector Preprocessing . . . 46
5.2 Spherical K-means Clustering Experiments . . . 46
5.3 Von Mises-Fisher Clustering Experiments . . . 49
5.4 Agglomerative Hierarchical Clustering Experiments . . . 52
5.5 Mean Shift Clustering Experiments . . . 53
5.5.1 Pruning with Noise Redistribution . . . 54
5.5.2 Pruning with Noise Removal . . . 56
5.6 HDBSCAN Clustering Experiments . . . 59
5.7 Results Analysis . . . 61
6 Language Recognition with Clustering-based Modeling Experiments 67
6.1 Cluster-based Modeling . . . 69
6.2 Language Recognition Experiments and Results . . . 71
6.2.1 Performance Evaluation . . . 71
6.2.1.1 NIST Cost Function . . . 72
6.2.1.2 Average Decision Cost Function Cavg . . . 72
6.2.1.3 I-vector Preprocessing . . . 73
6.2.2 Experiments and Results from Baseline Systems . . . 74
6.2.2.1 Cosine Distance Scoring . . . 74
6.2.2.2 Gaussian Classifier . . . 74
6.2.2.3 Von Mises-Fisher Classifier . . . 75
6.2.2.4 Support Vector Machine . . . 76
6.2.2.5 Logistic Regression . . . 77
6.2.3 Experiments and Results from Systems with Cluster-based Modeling . . . 78
6.2.3.1 Cosine Distance Scoring with Cluster-based Modeling . . . 79
6.2.3.2 Logistic Regression with Cluster-based Modeling . . . 80
6.2.4 Effect of Randomness . . . 80
6.2.5 Impact of Training Data Size and the Number of Languages . . . 81
6.2.5.1 Impact of the Training Size . . . 81
6.2.5.2 Impact of the Number of Languages . . . 82
6.2.6 Score and System Level Fusion . . . 83
6.3 Results Analysis and Discussion . . . 84
6.3.0.1 Statistical Significance of the Results . . . 87
6.3.0.2 Impact of Cluster-based Modeling on Computational Complexity . . . 88
6.4 Conclusion . . . 88
7 Conclusions 89
7.1 Future Work . . . 90