Improving Transformer-based End-to-End Speaker Diarization by Assigning Auxiliary Losses to Attention Heads

03/02/2023
by Ye-Rin Jeoung, et al.

Transformer-based end-to-end neural speaker diarization (EEND) models use the multi-head self-attention (SA) mechanism to predict speaker labels accurately in overlapped speech regions. In this study, to make the training of SA-EEND models more effective, we propose auxiliary losses for the SA heads of the transformer layers. Specifically, we assume that an attention weight matrix of an SA layer is redundant if its pattern is similar to that of the identity matrix. We then explicitly constrain such matrices to exhibit specific speaker activity patterns relevant to voice activity detection or overlapped speech detection tasks. Consequently, we expect the proposed auxiliary losses to guide the transformer layers toward more diverse attention weight patterns, thereby reducing the assumed redundancies in the SA heads. The effectiveness of the proposed method is demonstrated on the simulated and CALLHOME datasets for two-speaker diarization tasks, where it reduces the diarization error rate of the conventional SA-EEND model by 32.58% and 17.11%, respectively.
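The idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the similarity measure, the target patterns, or the loss, so everything below is an assumption. The names `identity_score`, `vad_target`, `osd_target`, and `auxiliary_loss` are hypothetical. The mean diagonal weight is assumed as the "similarity to the identity matrix", a row-normalized mask over speech (or overlapped) frames as the target pattern, and mean squared error as the auxiliary loss.

```python
import numpy as np


def identity_score(attn):
    """Mean diagonal weight per head; attn has shape (heads, T, T).

    Assumption: a head whose weights concentrate on the diagonal is
    treated as "similar to the identity matrix".
    """
    return np.trace(attn, axis1=1, axis2=2) / attn.shape[-1]


def _row_normalized_mask(active):
    """Every query frame attends uniformly to the frames flagged in `active`."""
    target = np.tile(active.astype(float), (len(active), 1))
    row_sum = target.sum(axis=1, keepdims=True)
    # Rows stay all-zero when no frame is flagged (avoids division by zero).
    return target / np.maximum(row_sum, 1.0)


def vad_target(labels):
    """Assumed VAD pattern: attend to frames where at least one speaker is active."""
    return _row_normalized_mask(labels.sum(axis=1) > 0)


def osd_target(labels):
    """Assumed OSD pattern: attend to frames where two or more speakers overlap."""
    return _row_normalized_mask(labels.sum(axis=1) > 1)


def auxiliary_loss(attn, labels):
    """MSE between the two most identity-like heads and the VAD/OSD targets.

    attn:   (heads, T, T) attention weights of one SA layer.
    labels: (T, speakers) binary speaker-activity labels.
    Returns the summed loss and the indices of the selected heads.
    """
    scores = identity_score(attn)
    selected = np.argsort(scores)[::-1][:2]  # heads with the highest scores
    targets = [vad_target(labels), osd_target(labels)]
    loss = sum(np.mean((attn[h] - t) ** 2) for h, t in zip(selected, targets))
    return float(loss), selected


if __name__ == "__main__":
    T = 4
    uniform = np.full((T, T), 1.0 / T)
    attn = np.stack([np.eye(T), uniform, 0.6 * np.eye(T) + 0.1, uniform])
    labels = np.array([[1, 0], [1, 1], [0, 0], [0, 1]])
    loss, heads = auxiliary_loss(attn, labels)
    print("selected heads:", heads, "auxiliary loss:", loss)
```

In a real training setup this term would be computed on differentiable attention weights and added to the diarization loss with some weighting; the weighting and the layers it applies to are not stated in the abstract.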

research
08/27/2022

Target Speaker Voice Activity Detection with Transformers and Its Integration with End-to-End Neural Diarization

This paper describes a speaker diarization model based on target speaker...
research
10/14/2021

Auxiliary Loss of Transformer with Residual Connection for End-to-End Speaker Diarization

End-to-end neural diarization (EEND) with self-attention directly predic...
research
04/08/2021

End-to-end speaker segmentation for overlap-aware resegmentation

Speaker segmentation consists in partitioning a conversation between one...
research
08/28/2023

Pruning Self-Attention for Zero-Shot Multi-Speaker Text-to-Speech

For personalized speech generation, a neural text-to-speech (TTS) model ...
research
06/14/2021

End-to-end Neural Diarization: From Transformer to Conformer

We propose a new end-to-end neural diarization (EEND) system that is bas...
research
05/28/2021

DIVE: End-to-end Speech Diarization via Iterative Speaker Embedding

We introduce DIVE, an end-to-end speaker diarization algorithm. Our neur...
research
10/31/2022

DiaCorrect: End-to-end error correction for speaker diarization

In recent years, speaker diarization has attracted widespread attention....
