Plug-and-Play Multilingual Few-shot Spoken Words Recognition

05/03/2023
by   Aaqib Saeed, et al.

As technology advances and digital devices become prevalent, seamless human-machine communication is becoming increasingly important. The growing adoption of mobile, wearable, and other Internet of Things (IoT) devices has changed how we interact with these smart devices, making accurate spoken word recognition a crucial component of effective interaction. However, building a robust spoken word detection system that can handle novel keywords remains challenging, especially for low-resource languages with limited training data. Here, we propose PLiX, a multilingual, plug-and-play keyword spotting system that leverages few-shot learning to harness massive real-world data and enable the recognition of unseen spoken words at test time. Our few-shot deep models are trained on millions of one-second audio clips across 20 languages, achieving state-of-the-art performance while remaining highly efficient. Extensive evaluations show that PLiX can generalize to novel spoken words given as few as one support example and performs well on unseen languages out of the box. We release models and inference code to serve as a foundation for future research and voice-enabled user interface development for emerging devices.
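To make the idea of recognizing an unseen keyword from a single support example more concrete, the following minimal sketch shows nearest-prototype classification over fixed-length embeddings, a common few-shot approach. The abstract does not state which few-shot method PLiX uses, so this is an illustration under assumptions rather than the authors' implementation: the embeddings are hand-written toy vectors standing in for the output of an audio encoder, and the names `build_prototypes` and `classify` and the keyword labels are hypothetical.

```python
import math
from collections import defaultdict


def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def build_prototypes(support):
    """Average the normalized support embeddings of each keyword.

    `support` is a list of (label, embedding) pairs. With one example
    per keyword, the prototype is just that example's embedding.
    """
    grouped = defaultdict(list)
    for label, embedding in support:
        grouped[label].append(l2_normalize(embedding))
    prototypes = {}
    for label, vectors in grouped.items():
        dim = len(vectors[0])
        mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
        prototypes[label] = l2_normalize(mean)
    return prototypes


def classify(query, prototypes):
    """Return the keyword whose prototype has the highest cosine similarity."""
    q = l2_normalize(query)
    scores = {
        label: sum(a * b for a, b in zip(q, proto))
        for label, proto in prototypes.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy embeddings (assumed): in practice these would come from a trained
# audio encoder applied to one-second clips.
support = [
    ("lights_on", [0.9, 0.1, 0.0]),  # one support example per keyword
    ("stop", [0.0, 0.2, 0.9]),
]
prototypes = build_prototypes(support)
label, scores = classify([0.8, 0.3, 0.1], prototypes)
print(label, scores)
```

Because the prototypes are computed from the support set at test time, adding a new keyword in this scheme only requires embedding one or more example clips; no retraining step is involved.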


