FUN! Fast, Universal, Non-Semantic Speech Embeddings

11/09/2020

Learned speech representations can drastically improve performance on tasks with limited labeled data. However, due to their size and complexity, learned representations have limited utility in mobile settings, where run-time performance is a significant bottleneck. We propose a class of lightweight, universal speech embedding models based on MobileNet that are designed to run efficiently on mobile devices. These embeddings, which capture the non-semantic aspects of speech and can therefore be reused across several tasks, are trained via knowledge distillation. We show that these embedding models are fast enough to run in real time on a variety of mobile devices and exhibit negligible performance degradation on most tasks in a recently published benchmark of non-semantic speech tasks. Furthermore, we demonstrate that these representations are useful for mobile health tasks such as detecting whether a speaker is wearing a mask and detecting non-speech human sounds.
