Index of /rozprawy2/11638

Academic year: 2021

Contents

1 Introduction

1.1 Problem Statement

1.2 Objectives of the Research

1.3 Research Contribution

1.4 Motivation

1.4.1 Relevance of the Research

1.4.2 Applications

1.4.3 Author’s Research in the Topic

1.5 Outline of the Thesis

2 Spoken Language Recognition

2.1 Spoken Language Recognition

2.2 What is Language?

2.2.1 Human Ability to Recognize Languages

2.2.2 Differentiating Between Languages

2.3 Approaches to Language Recognition

2.3.1 Syntax

2.3.2 Words

2.3.3 Prosody

2.3.4 Phonotactics

2.3.5 Acoustic Phonetics

2.4 Review of Language Recognition Research

2.5 I-vector Based Language Recognition System

2.5.1 Feature Extraction

2.5.1.1 MFCC

2.5.1.2 Shifted Delta Cepstra

2.5.1.3 Voice Activity Detection

2.5.1.4 Feature Normalisation Methods

2.5.1.5 Summary of Feature Extraction

2.5.2 I-vector Subspace Modeling

2.5.2.1 Gaussian Mixture Model

2.5.2.2 Universal Background Model

2.5.2.3 Joint Factor Analysis

2.5.2.4 I-vectors

2.6 Language Recognition in the I-vector Space

2.6.1 Cosine Distance Scoring

2.6.2 Generative Gaussian Classifier

2.6.3 Mixtures of von Mises-Fisher Distributions

2.6.4 Support Vector Machines

2.6.5 Logistic Regression

2.6.6 Probabilistic Linear Discriminant Analysis

2.6.7 Intersession Compensation Techniques

2.6.7.2 Whitening

2.6.7.3 Within-Class Covariance Normalization

2.6.7.4 Linear Discriminant Analysis

2.6.8 Score Level and System Fusions

2.7 Summary

3 Spoken Language Clustering

3.1 Problem Definition

3.1.1 Relation to Language Diarization

3.2 Overview and Applications

3.3 Language Clustering as an Unsupervised Learning Task

3.4 Clustering Algorithms

3.4.1 Spherical K-means

3.4.2 Von Mises-Fisher Mixtures

3.4.3 Mean Shift

3.4.4 Agglomerative Hierarchical Clustering

3.4.5 HDBSCAN

3.5 Evaluation of Clustering

3.5.1 External Quality Measures

3.5.1.1 Impurity Measures

3.5.1.2 Adjusted Rand Index

3.5.1.3 BBN Metric and Clustering Efficiency

3.5.2 Internal Quality Measures

3.5.2.1 Nearest Neighbor Purity Estimator and Estimated Clustering Efficiency

3.5.2.2 Silhouette

3.6 Summary

4 Language Database Description and Analysis

4.1 NIST 2015 Language Recognition I-vector Machine Learning Challenge Database

4.1.1 Utterance Duration

4.1.2 Data Analysis

4.1.2.1 Silhouette Plots

4.1.2.2 2D Visualization of the Data

4.2 Summary

5 Language Clustering Experiments

5.1 I-vector Preprocessing

5.2 Spherical K-means Clustering Experiments

5.3 Von Mises-Fisher Clustering Experiments

5.4 Agglomerative Hierarchical Clustering Experiments

5.5 Mean Shift Clustering Experiments

5.5.1 Pruning with Noise Redistribution

5.5.2 Pruning with Noise Removal

5.6 HDBSCAN Clustering Experiments

5.7 Results Analysis

6 Language Recognition with Clustering-based Modeling Experiments

6.1 Cluster-based Modeling

6.2 Language Recognition Experiments and Results

6.2.1 Performance Evaluation

6.2.1.1 NIST Cost Function

6.2.1.2 Average Decision Cost Function Cavg

6.2.1.3 I-vector Preprocessing

6.2.2 Experiments and Results from Baseline Systems

6.2.2.1 Cosine Distance Scoring

6.2.2.2 Gaussian Classifier

6.2.2.3 Von Mises-Fisher Classifier

6.2.2.4 Support Vector Machine

6.2.2.5 Logistic Regression

6.2.3 Experiments and Results from Systems with Cluster-based Modeling

6.2.3.1 Cosine Distance Scoring with Cluster-based Modeling

6.2.3.2 Logistic Regression with Cluster-based Modeling

6.2.4 Effect of Randomness

6.2.5 Impact of Training Data Size and the Number of Languages

6.2.5.1 Impact of the Training Size

6.2.5.2 Impact of the Number of Languages

6.2.6 Score and System Level Fusion

6.3 Results Analysis and Discussion

6.3.0.1 Statistical Significance of the Results

6.3.0.2 Impact of Cluster-based Modeling on Computational Complexity

6.4 Conclusion

7 Conclusions

7.1 Future Work
