Triplet loss based embeddings for forensic speaker identification in Spanish

02/24/2021
by Emmanuel Maqueda, et al.

With the advent of digital technology, crimes and legal disputes increasingly involve some form of speech recording in which the identity of a speaker is questioned [1]. In the face of this situation, the field of forensic speaker identification seeks to quantify how strongly a speech recording can be attributed to a particular person relative to a reference population. In this work, we explore the use of speech embeddings obtained by training a CNN with the triplet loss. In particular, we focus on the Spanish language, which has not been extensively studied. We propose extracting the embeddings from speech spectrogram samples, explore several configurations of such spectrograms, and quantify the quality of the resulting embeddings. We also show some limitations of our data setting, which is predominantly composed of male speakers. Finally, we propose two approaches to calculate the Likelihood Ratio given our speech embeddings, and we show that the triplet loss is a good alternative for creating speech embeddings for forensic speaker identification.
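For readers unfamiliar with the triplet loss mentioned above, the following is a minimal sketch of its standard formulation, not the authors' implementation. It assumes squared Euclidean distance between embeddings and a margin of 0.2; the function name `triplet_loss`, the margin value, and the toy vectors are all illustrative assumptions that do not come from the paper.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss over a batch of embeddings, shape (batch, dim).

    The distance metric (squared Euclidean) and the margin value are
    assumptions made for illustration only.
    """
    # Distance between each anchor and a sample from the same speaker.
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    # Distance between each anchor and a sample from a different speaker.
    d_neg = np.sum((anchor - negative) ** 2, axis=1)
    # Penalize triplets where the negative is not at least `margin`
    # farther from the anchor than the positive.
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()


# Toy 2-D embeddings (hypothetical values).
anchor = np.array([[0.0, 1.0]])
same_speaker = np.array([[0.0, 0.9]])
other_speaker = np.array([[1.0, 0.0]])

# Well-separated triplet: the loss is zero.
loss_easy = triplet_loss(anchor, same_speaker, other_speaker)
# Swapped roles: the "positive" is far away, so the loss is large.
loss_hard = triplet_loss(anchor, other_speaker, same_speaker)
```

Minimizing this quantity pulls embeddings of the same speaker together and pushes embeddings of different speakers apart, which is the property that makes distances between embeddings usable as evidence in a later scoring step.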


