DeepSpectrumLite: A Power-Efficient Transfer Learning Framework for Embedded Speech and Audio Processing from Decentralised Data

04/23/2021
by Shahin Amiriparian et al.

Deep neural speech and audio processing systems have a large number of trainable parameters, a relatively complex architecture, and require a vast amount of training data and computational power. These constraints make it challenging to integrate such systems into embedded devices and utilise them for real-time, real-world applications. We tackle these limitations by introducing DeepSpectrumLite, an open-source, lightweight transfer learning framework for on-device speech and audio recognition using pre-trained image convolutional neural networks (CNNs). The framework creates and augments Mel-spectrogram plots on-the-fly from raw audio signals, which are then used to fine-tune specific pre-trained CNNs for the target classification task. Subsequently, the whole pipeline can be run in real time with a mean inference lag of 242.0 ms when a DenseNet121 model is used on a consumer-grade Motorola moto e7 plus smartphone. DeepSpectrumLite operates in a decentralised manner, eliminating the need to upload data for further processing. By obtaining state-of-the-art results on a set of paralinguistics tasks, we demonstrate the suitability of the proposed transfer learning approach for embedded audio signal processing, even when data is scarce. We provide an extensive command-line interface for users and developers, which is comprehensively documented and publicly available at https://github.com/DeepSpectrum/DeepSpectrumLite.
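The first stage of the pipeline described above, turning raw audio into a log-Mel spectrogram suitable as image-like CNN input, can be sketched in plain NumPy. This is an illustrative sketch only, not DeepSpectrumLite's actual implementation; the function names and parameter values (16 kHz sample rate, 512-point FFT, 64 Mel bands) are assumptions chosen as common defaults for such pipelines.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters spaced evenly on the Mel scale between 0 Hz and Nyquist.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        if centre > left:   # rising slope of the triangle
            fb[i - 1, left:centre] = (np.arange(left, centre) - left) / (centre - left)
        if right > centre:  # falling slope of the triangle
            fb[i - 1, centre:right] = (right - np.arange(centre, right)) / (right - centre)
    return fb

def log_mel_spectrogram(audio, sr=16000, n_fft=512, hop=160, n_mels=64):
    # Frame the signal, apply a Hann window, and compute the power spectrum.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(audio) - n_fft) // hop
    frames = np.stack([audio[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2
    # Project onto the Mel filterbank and compress with a log.
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(mel + 1e-10).T  # shape: (n_mels, n_frames)
```

In a transfer learning setup of the kind the paper describes, the resulting 2-D array would be rendered as a plot image (optionally augmented) and fed to a pre-trained image CNN such as DenseNet121, whose classification head is then fine-tuned on the target paralinguistics task.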


