SkinAugment: Auto-Encoding Speaker Conversions for Automatic Speech Translation

02/27/2020
by Arya D. McCarthy, et al.

We propose autoencoding speaker conversion for training data augmentation in automatic speech translation. This technique directly transforms an audio sequence, resulting in audio synthesized to resemble another speaker's voice. Our method compares favorably to SpecAugment on English→French and English→Romanian automatic speech translation (AST) tasks as well as on a low-resource English automatic speech recognition (ASR) task. Further, in ablations, we show the benefits of both quantity and diversity in augmented data. Finally, we show that we can combine our approach with augmentation by machine-translated transcripts to obtain a competitive end-to-end AST model that outperforms a very strong cascade model on an English→French AST task. Our method is sufficiently general that it can be applied to other speech generation and analysis tasks.
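As a rough illustration of the augmentation idea in the abstract, the sketch below pairs each training utterance with copies rendered in other voices while reusing the original translation as the target. It is a minimal sketch under stated assumptions, not the authors' implementation: `Example`, `augment`, and `toy_convert` are hypothetical names, and `toy_convert` is a placeholder gain change standing in for a trained autoencoding speaker-conversion model.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Audio = List[float]  # assumption: a waveform as a plain list of samples


@dataclass(frozen=True)
class Example:
    audio: Audio
    translation: str  # target-language text, unchanged by augmentation
    speaker: str      # "original" or a hypothetical target-voice id


def augment(
    examples: Sequence[Example],
    convert: Callable[[Audio, str], Audio],
    target_speakers: Sequence[str],
) -> List[Example]:
    """Keep every original example and add one converted copy per target voice."""
    augmented = list(examples)
    for ex in examples:
        for spk in target_speakers:
            # Only the audio changes; the translation label is reused as-is.
            augmented.append(Example(convert(ex.audio, spk), ex.translation, spk))
    return augmented


def toy_convert(audio: Audio, speaker: str) -> Audio:
    """Placeholder only: a speaker-dependent gain, NOT real voice conversion."""
    gain = 1.0 + 0.1 * (sum(map(ord, speaker)) % 5)
    return [sample * gain for sample in audio]


# Usage: two utterances and two hypothetical target voices.
data = [
    Example([0.1, -0.2, 0.3], "bonjour", "original"),
    Example([0.0, 0.5], "merci", "original"),
]
augmented = augment(data, toy_convert, ["voice_a", "voice_b"])
```

In this framing, the number of target voices controls both how much augmented data is produced and how varied it is, which is the quantity/diversity trade-off the ablations in the abstract refer to.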


research
03/29/2022

A single speaker is almost all you need for automatic speech recognition

We explore the use of speech synthesis and voice conversion applied to a...
research
10/27/2022

Make More of Your Data: Minimal Effort Data Augmentation for Automatic Speech Recognition and Translation

Data augmentation is a technique to generate new training data based on ...
research
07/10/2023

The NPU-MSXF Speech-to-Speech Translation System for IWSLT 2023 Speech-to-Speech Translation Task

This paper describes the NPU-MSXF system for the IWSLT 2023 speech-to-sp...
research
09/30/2021

SpliceOut: A Simple and Efficient Audio Augmentation Method

Time masking has become a de facto augmentation technique for speech and...
research
07/09/2018

Foreign English Accent Adjustment by Learning Phonetic Patterns

State-of-the-art automatic speech recognition (ASR) systems struggle wit...
research
06/09/2022

Face-Dubbing++: Lip-Synchronous, Voice Preserving Translation of Videos

In this paper, we propose a neural end-to-end system for voice preservin...
research
04/03/2021

On-the-Fly Aligned Data Augmentation for Sequence-to-Sequence ASR

We propose an on-the-fly data augmentation method for automatic speech r...
