Swiss Parliaments Corpus, an Automatically Aligned Swiss German Speech to Standard German Text Corpus

10/06/2020
by Michel Plüss et al.

We present a forced sentence alignment procedure for Swiss German speech and Standard German text. Given an audio recording and the corresponding unaligned transcript, it creates a speech-to-text corpus fully automatically. Compared to a manual alignment, it achieves a mean intersection over union (IoU) of 0.8401 with a sentence recall of 0.9491. Applying our IoU estimate filter raises the mean IoU to 0.9271, at the cost of a lower sentence recall of 0.4881. Using this procedure, we created the Swiss Parliaments Corpus, an automatically aligned Swiss German speech to Standard German text corpus. 65 audio-text-pairs, resulting in 293 hours of training data. We have made the corpus freely available for download.
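
To make the two reported metrics concrete, here is a minimal sketch of how a temporal IoU and a sentence recall could be computed against a manual alignment. It is an illustration, not the authors' evaluation code: the function names `interval_iou` and `evaluate`, the representation of a sentence as a `(start, end)` pair in seconds, and the choice to average IoU only over sentences that received an alignment are all assumptions, since the abstract does not give the exact definitions.

```python
from typing import Dict, Optional, Tuple

# Hypothetical representation: a sentence's position in the audio, in seconds.
Interval = Tuple[float, float]


def interval_iou(a: Interval, b: Interval) -> float:
    """Intersection over union of two time intervals (0.0 if they do not overlap)."""
    intersection = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    return intersection / union if union > 0 else 0.0


def evaluate(
    predicted: Dict[str, Optional[Interval]],
    reference: Dict[str, Interval],
) -> Tuple[float, float]:
    """Return (mean IoU, sentence recall) for an automatic alignment.

    Assumptions: recall is the fraction of reference sentences for which the
    aligner produced any interval, and the mean IoU is taken over those
    aligned sentences only.
    """
    ious = [
        interval_iou(predicted[sentence_id], reference[sentence_id])
        for sentence_id in reference
        if predicted.get(sentence_id) is not None
    ]
    recall = len(ious) / len(reference) if reference else 0.0
    mean_iou = sum(ious) / len(ious) if ious else 0.0
    return mean_iou, recall


if __name__ == "__main__":
    reference = {"s1": (0.0, 2.0), "s2": (2.0, 5.0)}
    predicted = {"s1": (0.5, 2.0), "s2": None}  # s2 was dropped by the aligner
    print(evaluate(predicted, reference))
```

Under these assumptions, a filter that discards sentences with a low estimated IoU removes entries from `predicted`, which tends to raise the mean IoU and lower the recall. That is the direction of the trade-off the abstract reports.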
