Time-Domain Multi-modal Bone/air Conducted Speech Enhancement

by Cheng Yu, et al.

Integrating additional modalities, such as video signals, with speech has been shown to improve quality and intelligibility in speech enhancement (SE). However, video clips usually contain large amounts of data and incur a high computational cost, which may complicate the SE system. By contrast, a bone-conducted speech signal has a moderate data size while still manifesting speech-phoneme structures, and thus complements its air-conducted counterpart, benefiting the enhancement. In this study, we propose a novel multi-modal SE structure that leverages bone- and air-conducted signals. In addition, we examine two strategies, early fusion (EF) and late fusion (LF), to process the two types of speech signals, and adopt a deep learning-based fully convolutional network to conduct the enhancement. The experimental results indicate that the proposed multi-modal structure significantly outperforms its single-source SE counterparts (with a bone- or air-conducted signal only) on various speech evaluation metrics. In addition, adopting the LF strategy rather than the EF strategy in this multi-modal SE structure achieves better results.
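The difference between the two fusion strategies named in the abstract can be illustrated with a small sketch. It is not the authors' network: the abstract gives no layer counts, kernel sizes, or fusion details, so the toy convolutional stack, the helper names (`conv1d_same`, `toy_fcn`, `init_weights`), and the untrained random weights below are all assumptions. The sketch only follows the common reading of the terms: early fusion stacks the air- and bone-conducted waveforms as input channels of one network, whereas late fusion processes each waveform in its own branch and then combines the branch outputs.

```python
import numpy as np

def conv1d_same(x, kernels):
    """1-D convolution layer with zero padding so the length is preserved.
    x: (in_ch, T); kernels: (out_ch, in_ch, K) with odd K. Returns (out_ch, T)."""
    out_ch, in_ch, k = kernels.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    out = np.zeros((out_ch, x.shape[1]))
    for o in range(out_ch):
        for i in range(in_ch):
            # 'valid' correlation over the padded signal yields exactly T samples
            out[o] += np.correlate(xp[i], kernels[o, i], mode="valid")
    return out

def toy_fcn(x, weights):
    """Hypothetical fully convolutional stack: tanh between layers, linear output."""
    for w in weights[:-1]:
        x = np.tanh(conv1d_same(x, w))
    return conv1d_same(x, weights[-1])

def init_weights(rng, channels, k=5):
    """Random (untrained) kernels for a stack with the given channel sizes."""
    return [0.1 * rng.standard_normal((co, ci, k))
            for ci, co in zip(channels[:-1], channels[1:])]

def early_fusion(air, bone, weights):
    """EF (assumed form): stack both waveforms as channels of a single network."""
    x = np.stack([air, bone])                # shape (2, T)
    return toy_fcn(x, weights)[0]

def late_fusion(air, bone, w_air, w_bone, w_fuse):
    """LF (assumed form): one branch per signal, then a small fusion network."""
    y_air = toy_fcn(air[None, :], w_air)     # shape (1, T)
    y_bone = toy_fcn(bone[None, :], w_bone)  # shape (1, T)
    z = np.concatenate([y_air, y_bone])      # shape (2, T)
    return toy_fcn(z, w_fuse)[0]

rng = np.random.default_rng(0)
T = 160                                      # arbitrary toy waveform length
air = rng.standard_normal(T)                 # stand-in for a noisy air-conducted waveform
bone = rng.standard_normal(T)                # stand-in for a bone-conducted waveform

ef_out = early_fusion(air, bone, init_weights(rng, [2, 8, 1]))
lf_out = late_fusion(air, bone,
                     init_weights(rng, [1, 8, 1]),
                     init_weights(rng, [1, 8, 1]),
                     init_weights(rng, [2, 8, 1]))
```

Both functions map two waveforms of length T to one waveform of length T, which matches a time-domain, fully convolutional setup. In practice the weights would be trained against clean speech, and the comparison reported in the abstract concerns such trained systems, not this toy.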




Speech Enhancement Based on Reducing the Detail Portion of Speech Spectrograms in Modulation Domain via Discrete Wavelet Transform

In this paper, we propose a novel speech enhancement (SE) method by expl...

EMGSE: Acoustic/EMG Fusion for Multimodal Speech Enhancement

Multimodal learning has been proven to be an effective method to improve...

EPG2S: Speech Generation and Speech Enhancement based on Electropalatography and Audio Signals using Multimodal Learning

Speech generation and enhancement based on articulatory movements facili...

BASEN: Time-Domain Brain-Assisted Speech Enhancement Network with Convolutional Cross Attention in Multi-talker Conditions

Time-domain single-channel speech enhancement (SE) still remains challen...

SEANet: A Multi-modal Speech Enhancement Network

We explore the possibility of leveraging accelerometer data to perform s...

A Multi-modal Deformable Land-air Robot for Complex Environments

Single locomotion robots often struggle to adapt in highly variable or u...
