Classifying topics in speech when all you have is crummy translations

08/29/2019
by Sameer Bansal, et al.

Given a large amount of unannotated speech in a low-resource language, can we classify the speech utterances by topic? We show that this is possible if text translations are available for just a small amount of speech (less than 20 hours), using a recent model for direct speech-to-text translation. While the translations are poor, they are still good enough to correctly classify 1-minute speech segments over 70% of the time, an improvement over a majority-class baseline. Such a system might be useful for humanitarian applications like crisis response, where incoming speech must be quickly assessed for further action.
