The USTC-Ximalaya system for the ICASSP 2022 multi-channel multi-party meeting transcription (M2MeT) challenge

02/10/2022
by Maokui He, et al.

We propose two improvements to target-speaker voice activity detection (TS-VAD), the core component of the speaker diarization system we submitted to the 2022 Multi-Channel Multi-Party Meeting Transcription (M2MeT) challenge. These techniques are designed to handle multi-speaker conversations in real-world meeting scenarios with high speaker-overlap ratios and heavily reverberant, noisy conditions. First, for data preparation and augmentation in training TS-VAD models, we use speech data containing both real meetings and simulated indoor conversations. Second, to refine the results obtained after TS-VAD-based decoding, we perform a series of post-processing steps that improve the VAD results and thereby reduce diarization error rates (DERs). Tested on the ALIMEETING corpus, the newly released Mandarin meeting dataset used in M2MeT, our proposed system decreases the DER by up to 66.55%/60.59% relative to classical clustering-based diarization on the Eval/Test sets.
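The improvement figures in the abstract compare the proposed system's DER against a clustering-based baseline. A minimal sketch of how such a comparison can be computed as a relative reduction follows, assuming that this is the sense intended; the function name `relative_der_reduction` and the input values are hypothetical and are not results from the paper.

```python
def relative_der_reduction(baseline_der: float, system_der: float) -> float:
    """Return the relative DER reduction of a system over a baseline, in percent.

    Both inputs are DERs expressed in the same unit (e.g. percent).
    """
    if baseline_der <= 0:
        raise ValueError("baseline_der must be positive")
    return 100.0 * (baseline_der - system_der) / baseline_der


# Hypothetical values for illustration only (not taken from the paper).
baseline = 20.0  # assumed DER of a clustering-based baseline, in percent
system = 8.0     # assumed DER of an improved system, in percent
print(f"{relative_der_reduction(baseline, system):.2f}% relative DER reduction")
```

A relative reduction is normalized by the baseline, so the same absolute gain counts for more when the baseline DER is already low.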

research
02/09/2022

The Volcspeech system for the ICASSP 2022 multi-channel multi-party meeting transcription challenge

This paper describes our submission to ICASSP 2022 Multi-channel Multi-p...
research
02/04/2022

The CUHK-TENCENT speaker diarization system for the ICASSP 2022 multi-channel multi-party meeting transcription challenge

This paper describes our speaker diarization system submitted to the Mul...
research
05/14/2020

Target-Speaker Voice Activity Detection: a Novel Approach for Multi-Speaker Diarization in a Dinner Party Scenario

Speaker diarization for real-life scenarios is an extremely challenging ...
research
09/29/2017

PLDA-Based Diarization of Telephone Conversations

This paper investigates the application of the probabilistic linear disc...
research
04/07/2019

MCE 2018: The 1st Multi-target Speaker Detection and Identification Challenge Evaluation

The Multi-target Challenge aims to assess how well current speech techno...
research
02/11/2022

The xmuspeech system for multi-channel multi-party meeting transcription challenge

This paper describes the system developed by the XMUSPEECH team for the ...
research
11/18/2022

Speaker Overlap-aware Neural Diarization for Multi-party Meeting Analysis

Recently, hybrid systems of clustering and neural diarization models hav...
