X-Vectors with Multi-Scale Aggregation for Speaker Diarization

05/16/2021
by   Myungjong Kim, et al.
0

Speaker diarization is the process of labeling different speakers in a speech signal. Deep speaker embeddings are generally extracted from short speech segments and clustered to determine the segments belong to same speaker identity. The x-vector, which embeds segment-level speaker characteristics by statistically pooling frame-level representations, is one of the most widely used deep speaker embeddings in speaker diarization. Multi-scale aggregation, which employs multi-scale representations from different layers, has recently successfully been used in short duration speaker verification. In this paper, we investigate a multi-scale aggregation approach in an x-vector embedding framework for speaker diarization by exploiting multiple statistics pooling layers from different frame-level layers. Thus, it is expected that x-vectors with multi-scale aggregation have the potential to capture meaningful speaker characteristics from short segments, effectively taking advantage of different information at multiple layers. Experimental evaluation on the CALLHOME dataset showed that our approach provides substantial improvement over the baseline x-vectors.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
04/07/2020

Multi-Scale Aggregation Using Feature Pyramid Module for Text-Independent Speaker Verification

Currently, the most widely used approach for speaker verification is the...
research
10/07/2021

Multi-scale speaker embedding-based graph attention networks for speaker diarisation

The objective of this work is effective speaker diarisation using multi-...
research
04/07/2020

Improving Multi-Scale Aggregation Using Feature Pyramid Module for Robust Speaker Verification of Variable-Duration Utterances

Currently, the most widely used approach for speaker verification is the...
research
02/22/2018

Neural Predictive Coding using Convolutional Neural Networks towards Unsupervised Learning of Speaker Characteristics

Learning speaker-specific features is vital in many applications like sp...
research
04/06/2020

Probabilistic embeddings for speaker diarization

Speaker embeddings (x-vectors) extracted from very short segments of spe...
research
11/08/2022

High-resolution embedding extractor for speaker diarisation

Speaker embedding extractors significantly influence the performance of ...
research
03/30/2022

Multi-scale Speaker Diarization with Dynamic Scale Weighting

Speaker diarization systems are challenged by a trade-off between the te...

Please sign up or login with your details

Forgot password? Click here to reset