End-to-end Neural Diarization: From Transformer to Conformer

06/14/2021
by Yi-Chieh Liu, et al.

We propose a new end-to-end neural diarization (EEND) system based on Conformer, a recently proposed neural architecture that combines convolutional mappings and Transformer to model both local and global dependencies in speech. We first show that data augmentation and convolutional subsampling layers improve the original self-attentive EEND (SA-EEND), which is Transformer-based, and that Conformer then gives an additional gain over the Transformer-based EEND. However, we notice that the Conformer-based EEND does not generalize as well from simulated to real conversation data as the Transformer-based model. This leads us to quantify the mismatch between simulated data and real speaker behavior in terms of temporal statistics reflecting turn-taking between speakers, and to investigate its correlation with diarization error. By mixing simulated and real data in EEND training, we mitigate the mismatch further, with Conformer-based EEND achieving a 24% error reduction over the baseline SA-EEND system, and a 10% improvement over the best augmented Transformer-based system, on two-speaker CALLHOME data.
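The abstract does not say which temporal statistics are used to compare simulated and real conversations. As a rough illustration only, the sketch below computes a few plausible turn-taking statistics (silence, single-speaker and overlap ratios, plus a speaker-change count) from frame-level activity labels. The function name `turn_taking_stats`, the label layout, and the choice of statistics are all assumptions made for this example, not the authors' method.

```python
from typing import Dict, Sequence


def turn_taking_stats(labels: Sequence[Sequence[int]]) -> Dict[str, float]:
    """Summarize frame-level speaker activity labels.

    labels[t][s] is 1 if speaker s is active in frame t, else 0.
    The statistics returned here are illustrative assumptions.
    """
    n_frames = len(labels)
    if n_frames == 0:
        raise ValueError("labels must contain at least one frame")

    # Number of simultaneously active speakers in each frame.
    active_counts = [sum(frame) for frame in labels]
    silence = sum(1 for c in active_counts if c == 0)
    single = sum(1 for c in active_counts if c == 1)
    overlap = sum(1 for c in active_counts if c >= 2)

    # Count a speaker change whenever the sole active speaker differs from
    # the sole active speaker of the most recent single-speaker frame.
    # Silence and overlap frames in between are skipped.
    changes = 0
    last_speaker = None
    for frame, count in zip(labels, active_counts):
        if count != 1:
            continue
        speaker = list(frame).index(1)
        if last_speaker is not None and speaker != last_speaker:
            changes += 1
        last_speaker = speaker

    return {
        "silence_ratio": silence / n_frames,
        "single_speaker_ratio": single / n_frames,
        "overlap_ratio": overlap / n_frames,
        "speaker_changes": float(changes),
    }


if __name__ == "__main__":
    # Toy two-speaker example with 8 frames (hypothetical data).
    example = [
        [0, 0], [1, 0], [1, 0], [1, 1],
        [0, 1], [0, 1], [0, 0], [1, 0],
    ]
    print(turn_taking_stats(example))
```

Computing the same statistics over a simulated training set and over real conversations would give one simple way to put a number on the mismatch the abstract describes.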
