SOCCEsRs: Open Challenge for Correcting Errors of Speech Recognition Systems
2019-04-09
Outline
1 Task description
2 Dataset
3 Evaluation
Task description
"Develop a method that improves the result of speech recognition process on the basis of the (erroneous) output of the ASR system and the correct human-made transcription"
Task description
[Diagram: Speech corpus → Recording → ASR → ASR response → Error correction system → Corrected ASR response; Transcription → Reference]
Rationale
Why is such a transformation needed? It can be used in an ASR system as a post-processing stage for:
- re-scoring of the hypotheses produced by the ASR using information not present at earlier stages of processing (e.g. in a uni-directional LM)
- adaptation of a black-box ASR system to a specific domain
Dataset
- 9000 sentences from Polish Wikinews
- Recorded in a studio
- Transcribed by human annotators
- Recognized by a state-of-the-art ASR system
- 8000:1000 train/test split
Dataset normalization
- all words are UPPERCASED
- punctuation marks are removed
- numbers and special characters are replaced by their spoken forms
Example lines
Example line from the TSV file containing the training dataset:
Evaluation
- average relative WER change
- relative SER change
- CharMatch
Average relative WER change
I relative difference between:
I Word Error Rate of ASR hypothesis averaged over all test sentences
I Word Error Rate of hypothesis corrected by the proposed system averaged over all test sentences
Word Error Rate
WER = (S + D + I) / N,    N = H + S + D

where S = number of substitutions, D = number of deletions, I = number of insertions, H = number of hits, and N = length of the reference sentence. See e.g. [?] for an in-depth explanation.
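The WER formula above can be sketched as a standard dynamic-programming edit distance over words. This is a minimal illustration assuming whitespace tokenization; `wer` is a hypothetical helper, not part of the challenge toolkit.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (S + D + I) / N via Levenshtein distance over words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = minimal edit cost between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + sub,  # hit or substitution
                          d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1)        # insertion
    return d[len(ref)][len(hyp)] / len(ref)

print(round(wer("ALA MA KOTA", "ALA MA TOTA"), 3))  # 0.333 (1 substitution / 3 words)
```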
Relative WER change
WER_r∆ = 1 − (1/|C|) · Σ_{i=1..|C|} (WER_i / WER⁰_i)

where |C| = number of sentences in the corpus, WER_i = WER of the i-th sentence in the corpus after processing by the system, and WER⁰_i = WER of the i-th sentence in the corpus as returned by the ASR.

For example: if the average WER⁰ of the raw ASR system is 8 and, after processing the sentences through the error correction system, the average WER is 6, then WER_r∆ = 25%.
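The averaging of per-sentence WER ratios can be sketched as below. This is a minimal illustration assuming every WER⁰_i is non-zero; `relative_wer_change` is a hypothetical helper, not the official scorer.

```python
def relative_wer_change(wer_corrected, wer_asr):
    """WER_r-delta = 1 - mean over sentences of WER_i / WER0_i.

    Assumes all raw-ASR WER0_i values are > 0 (division by zero otherwise).
    """
    ratios = [c / a for c, a in zip(wer_corrected, wer_asr)]
    return 1.0 - sum(ratios) / len(ratios)

# Per-sentence WERs of the corrected output vs. the raw ASR output
print(relative_wer_change([0.06, 0.06], [0.08, 0.08]))  # 0.25, i.e. 25%
```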
Relative SER change
- relative difference between the Sentence Error Rate (SER) of the ASR hypotheses and that of the hypotheses corrected by the proposed system
- SER: the ratio of the number of sentences with WER > 0 (incorrectly recognized sentences) to the number of all sentences in the corpus
- relative SER reduction is defined similarly to WER_r∆:

SER_r∆ = 1 − SER / SER⁰
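The SER-based metric can be sketched from per-sentence WER values as below, taking SER as the fraction of sentences containing at least one word error. These helpers (`ser`, `relative_ser_change`) are hypothetical illustrations, not the official scorer.

```python
def ser(sentence_wers) -> float:
    """Sentence Error Rate: fraction of sentences with WER > 0."""
    return sum(1 for w in sentence_wers if w > 0) / len(sentence_wers)

def relative_ser_change(wers_corrected, wers_asr) -> float:
    """SER_r-delta = 1 - SER / SER0 (assumes SER0 > 0)."""
    return 1.0 - ser(wers_corrected) / ser(wers_asr)

# Correction system fixed one of three erroneous sentences
print(relative_ser_change([0.0, 0.1, 0.0, 0.2], [0.1, 0.1, 0.0, 0.2]))
```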
CharMatch
Introduced in [?].

F_0.5-measure, defined as follows:

F_0.5 = (1 + 0.5²) · P · R / (0.5² · P + R)

where P is precision and R is recall:

P = Σ_i T_i / Σ_i d_L(h_i, s_i),    R = Σ_i T_i / Σ_i