MSV Challenge 2022: NPU-HC Speaker Verification System for Low-resource Indian Languages

by   Yue Li, et al.

This report describes the NPU-HC speaker verification system submitted to the O-COCOSDA Multi-lingual Speaker Verification (MSV) Challenge 2022, which focuses on developing speaker verification systems for low-resource Asian languages. We participate in the I-MSV track, which aims to develop speaker verification systems for various Indian languages. In this challenge, we first explore different neural network frameworks for low-resource speaker verification. Then we leverage vanilla fine-tuning and weight transfer fine-tuning to transfer the out-domain pre-trained models to the in-domain Indian dataset. Specifically, the weight transfer fine-tuning aims to constrain the distance of the weights between the pre-trained model and the fine-tuned model, which takes advantage of the previously acquired discriminative ability from the large-scale out-domain datasets and avoids catastrophic forgetting and overfitting at the same time. Finally, score fusion is adopted to further improve performance. Together with the above contributions, we obtain 0.223 EER on the public evaluation set, ranking 2nd place on the leaderboard. On the private evaluation set, the EER of our submitted system is 2.123 for the constrained and unconstrained sub-tasks of the I-MSV track, leading to the 1st and 3rd place in the ranking, respectively.


page 1

page 2

page 3

page 4


Parameter-efficient transfer learning of pre-trained Transformer models for speaker verification using adapters

Recently, the pre-trained Transformer models have received a rising inte...

The SJTU System for Short-duration Speaker Verification Challenge 2021

This paper presents the SJTU system for both text-dependent and text-ind...

One Adapter for All Programming Languages? Adapter Tuning for Code Search and Summarization

As pre-trained models automate many code intelligence tasks, a widely us...

Avoiding catastrophic forgetting in mitigating model biases in sentence-pair classification with elastic weight consolidation

The biases present in training datasets have been shown to be affecting ...

Using IPA-Based Tacotron for Data Efficient Cross-Lingual Speaker Adaptation and Pronunciation Enhancement

Recent neural Text-to-Speech (TTS) models have been shown to perform ver...

EACELEB: An East Asian Language Speaking Celebrity Dataset for Speaker Recognition

Large datasets are very useful for training speaker recognition systems,...

Open-set Short Utterance Forensic Speaker Verification using Teacher-Student Network with Explicit Inductive Bias

In forensic applications, it is very common that only small naturalistic...

Please sign up or login with your details

Forgot password? Click here to reset