Multi-Speaker and Wide-Band Simulated Conversations as Training Data for End-to-End Neural Diarization

11/12/2022
by Federico Landini, et al.

End-to-end diarization presents an attractive alternative to standard cascaded diarization systems because a single system can handle all aspects of the task at once. Many flavors of end-to-end models have been proposed, but all of them require large amounts of annotated training data, which so far do not exist. The compromise solution consists of generating synthetic data, and the recently proposed simulated conversations (SC) have shown remarkable improvements over the original simulated mixtures (SM). In this work, we create SC with multiple speakers per conversation and show that they allow for substantially better performance than SM, while also reducing the dependence on a fine-tuning stage. We also create SC from wide-band public audio sources and present an analysis on several evaluation sets. Together with this publication, we release the recipes for generating such data and models trained on public sets, as well as the implementation to efficiently handle multiple speakers per conversation and an auxiliary voice activity detection loss.
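As background on how such models are trained, the following is a minimal sketch, assuming the standard permutation-invariant binary cross-entropy (PIT) objective used by EEND models, combined with an auxiliary voice activity detection (VAD) term. The abstract does not say how the VAD loss is formulated, so the choice here is an assumption: the frame-level speech probability is taken as the probability that at least one speaker is active (treating speaker outputs as independent). The function names and the weight `vad_weight=0.1` are hypothetical, and this is not the authors' released implementation.

```python
import itertools

import numpy as np

EPS = 1e-7  # clipping constant to keep log() finite


def bce(p, y):
    """Mean binary cross-entropy between probabilities p and 0/1 targets y."""
    p = np.clip(p, EPS, 1.0 - EPS)
    return float(np.mean(-(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))))


def pit_bce_loss(probs, labels):
    """Permutation-invariant BCE over speaker columns.

    probs, labels: arrays of shape (frames, speakers). The model's output
    order is arbitrary, so the loss is the minimum over all assignments of
    reference speakers to output columns.
    """
    n_spk = labels.shape[1]
    return min(
        bce(probs, labels[:, list(perm)])
        for perm in itertools.permutations(range(n_spk))
    )


def vad_loss(probs, labels):
    """Hypothetical auxiliary VAD loss (formulation is an assumption).

    Frame speech probability = P(at least one speaker active), assuming
    independent speaker outputs; reference = any speaker labelled active.
    """
    speech_prob = 1.0 - np.prod(1.0 - probs, axis=1)
    speech_ref = (labels.sum(axis=1) > 0).astype(float)
    return bce(speech_prob, speech_ref)


def total_loss(probs, labels, vad_weight=0.1):
    """Diarization loss plus a weighted VAD term (weight is hypothetical)."""
    return pit_bce_loss(probs, labels) + vad_weight * vad_loss(probs, labels)


# Toy example: 4 frames, 2 speakers; output column 0 tracks reference
# speaker 1 and column 1 tracks reference speaker 0, so PIT should pick
# the swapped assignment.
labels = np.array([[1, 0], [1, 1], [0, 1], [0, 0]], dtype=float)
probs = np.array([[0.1, 0.9], [0.8, 0.7], [0.9, 0.2], [0.05, 0.1]])

print(pit_bce_loss(probs, labels), total_loss(probs, labels))
```

Enumerating all S! permutations becomes expensive as the number of speakers per conversation grows. Because the mean BCE decomposes over speaker columns, an optimal-assignment solver such as the Hungarian algorithm can find the same best permutation without enumerating them.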
