Deep Speaker Embeddings for Far-Field Speaker Recognition on Short Utterances

by Aleksei Gusev, et al.

Speaker recognition systems based on deep speaker embeddings have achieved strong performance in controlled conditions, according to results obtained on early NIST SRE (Speaker Recognition Evaluation) datasets. From a practical point of view, given the growing interest in virtual assistants (such as Amazon Alexa, Google Home, and Apple Siri), speaker verification on short utterances in uncontrolled, noisy environments is one of the most challenging and most in-demand tasks. This paper presents approaches aimed at two goals: a) improving the quality of far-field speaker verification systems in the presence of environmental noise and reverberation, and b) reducing the degradation in system quality for short utterances. For these purposes, we considered deep neural network architectures based on TDNN (Time Delay Neural Network) and ResNet (Residual Neural Network) blocks. We experimented with state-of-the-art embedding extractors and their training procedures. The results confirm that ResNet architectures outperform the standard x-vector approach in speaker verification quality for both long-duration and short-duration utterances. We also investigated the impact of the speech activity detector, different scoring models, adaptation, and score normalization techniques. Experimental results are presented for publicly available data and verification protocols for the VoxCeleb1, VoxCeleb2, and VOiCES datasets.

