HLT-NUS Submission for NIST 2019 Multimedia Speaker Recognition Evaluation

10/08/2020
by Rohan Kumar Das, et al.

This work describes the speaker verification system developed by the Human Language Technology Laboratory, National University of Singapore (HLT-NUS) for the 2019 NIST Multimedia Speaker Recognition Evaluation (SRE). Multimedia research has gained attention across a wide range of applications, and speaker recognition is no exception. In contrast to previous NIST SREs, the latest edition focuses on a multimedia track for recognizing speakers using both audio and visual information. We developed separate systems for audio and visual inputs, followed by a score-level fusion of the systems from the two modalities to use their information jointly. The audio systems are based on x-vector speaker embeddings, whereas the face recognition systems are based on ResNet and InsightFace face embeddings. With post-evaluation studies and refinements, we obtain an equal error rate (EER) of 0.88% and an actual detection cost function (actDCF) of 0.026 on the evaluation set of the 2019 NIST multimedia SRE corpus.
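
To make the score-level fusion and the EER metric concrete, here is a minimal sketch in Python. It is not the HLT-NUS implementation: the abstract does not say how scores are normalized or weighted, so the z-score normalization, the weighted sum, the `audio_weight` parameter, and all function names below are illustrative assumptions. The EER is approximated with a simple threshold sweep over the observed scores.

```python
from statistics import mean, pstdev


def z_normalize(scores):
    """Shift and scale scores to zero mean and unit variance (assumed normalization)."""
    mu = mean(scores)
    sigma = pstdev(scores) or 1.0  # guard against constant scores
    return [(s - mu) / sigma for s in scores]


def fuse_scores(audio_scores, visual_scores, audio_weight=0.5):
    """Weighted sum of normalized per-trial scores (hypothetical fusion rule)."""
    if len(audio_scores) != len(visual_scores):
        raise ValueError("both modalities must score the same trials")
    audio = z_normalize(audio_scores)
    visual = z_normalize(visual_scores)
    return [audio_weight * a + (1.0 - audio_weight) * v
            for a, v in zip(audio, visual)]


def equal_error_rate(scores, labels):
    """Approximate the EER by sweeping the threshold over observed scores.

    labels: 1 for target trials, 0 for non-target trials.
    A trial is accepted when its score is >= the threshold.
    """
    targets = [s for s, l in zip(scores, labels) if l == 1]
    nontargets = [s for s, l in zip(scores, labels) if l == 0]
    best_gap, eer = float("inf"), 1.0
    for threshold in sorted(set(scores)):
        miss_rate = sum(s < threshold for s in targets) / len(targets)
        false_alarm_rate = sum(s >= threshold for s in nontargets) / len(nontargets)
        gap = abs(miss_rate - false_alarm_rate)
        if gap < best_gap:
            best_gap, eer = gap, (miss_rate + false_alarm_rate) / 2.0
    return eer


# Toy trials (made-up numbers, not evaluation data): two targets, two non-targets.
labels = [1, 1, 0, 0]
audio = [2.0, 0.1, 0.3, -1.5]
visual = [0.2, 1.8, -0.9, 0.4]
fused = fuse_scores(audio, visual)
print("audio EER:", equal_error_rate(audio, labels))
print("visual EER:", equal_error_rate(visual, labels))
print("fused EER:", equal_error_rate(fused, labels))
```

In this toy example each modality misranks one trial on its own, while the fused scores separate targets from non-targets, which is the intuition behind combining the two modalities. In practice the fusion weight would be tuned on a development set rather than fixed at 0.5.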

research
03/10/2022

EACELEB: An East Asian Language Speaking Celebrity Dataset for Speaker Recognition

Large datasets are very useful for training speaker recognition systems,...
research
04/21/2022

The 2021 NIST Speaker Recognition Evaluation

The 2021 Speaker Recognition Evaluation (SRE21) was the latest cycle of ...
research
12/25/2019

THUEE system description for NIST 2019 SRE CTS Challenge

This paper describes the systems submitted by the department of electron...
research
04/03/2021

Multimedia Technology Applications and Algorithms: A Survey

Multimedia related research and development has evolved rapidly in the l...
research
08/13/2020

Automatic Quality Assessment for Audio-Visual Verification Systems. The LOVe submission to NIST SRE Challenge 2019

Fusion of scores is a cornerstone of multimodal biometric systems compos...
research
07/02/2018

Harnessing AI for Speech Reconstruction using Multi-view Silent Video Feed

Speechreading or lipreading is the technique of understanding and gettin...
research
07/10/2022

Information-Theoretic Bounds for Steganography in Multimedia

Steganography in multimedia aims to embed secret data into an innocent l...
