Self-supervised speech unit discovery from articulatory and acoustic features using VQ-VAE

06/17/2022
by   Marc-Antoine Georges, et al.

The human perception system is often assumed to recruit motor knowledge when processing auditory speech inputs. Using articulatory modeling and deep learning, this study examines how this articulatory information can be exploited for discovering speech units in a self-supervised setting. We used vector-quantized variational autoencoders (VQ-VAE) to learn discrete representations from articulatory and acoustic speech data. In line with the zero-resource paradigm, an ABX test was then used to investigate how well the extracted representations encode phonetically relevant properties. Experiments were conducted on three different corpora in English and French. We found that articulatory information organises the latent representations mainly in terms of place of articulation, whereas speech acoustics structure the latent space mainly in terms of manner of articulation. We show that an optimal fusion of the two modalities can lead to a joint representation of these phonetic dimensions that is more accurate than either modality considered individually. Since articulatory information is usually unavailable in practical situations, we finally investigate the benefit it provides when inferred from the speech acoustics in a self-supervised manner.
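The two building blocks mentioned in the abstract, VQ-VAE quantization and the ABX test, can be illustrated with a minimal sketch. This is not the authors' implementation; it only shows, under simplified assumptions (Euclidean distances, a fixed codebook, single-vector representations instead of frame sequences), how a continuous latent is snapped to a discrete "speech unit" and how an ABX trial is scored:

```python
import numpy as np

def vq_quantize(z, codebook):
    """Map each continuous latent vector to its nearest codebook entry.

    z        : (n, d) array of encoder outputs
    codebook : (k, d) array of learned code vectors
    Returns the quantized vectors and their discrete unit indices.
    """
    # Squared Euclidean distance from every latent to every code vector
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d2.argmin(axis=1)          # discrete "speech unit" per input
    return codebook[idx], idx

def abx_correct(a, b, x):
    """Score one ABX trial: X belongs to A's category, so the trial is
    correct when X's representation is closer to A's than to B's."""
    return np.linalg.norm(x - a) < np.linalg.norm(x - b)
```

In the zero-resource setting, ABX accuracy is averaged over many such triplets of phone representations; a representation that encodes phonetic contrasts well scores close to 100%, while a phonetically uninformative one scores near chance (50%).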

