Statistical Models in Forensic Voice Comparison

This chapter describes signal-processing and statistical-modeling techniques that are commonly used to calculate likelihood ratios in human-supervised automatic approaches to forensic voice comparison. Techniques described include mel-frequency cepstral coefficient (MFCC) feature extraction, Gaussian mixture model - universal background model (GMM-UBM) systems, i-vector - probabilistic linear discriminant analysis (i-vector PLDA) systems, deep neural network (DNN) based systems (including senone posterior i-vectors, bottleneck features, and embeddings / x-vectors), mismatch compensation, and score-to-likelihood-ratio conversion (also known as calibration). Empirical validation of forensic-voice-comparison systems is also covered. The aim of the chapter is to bridge the gap between general introductions to forensic voice comparison and the highly technical automatic-speaker-recognition literature from which most of these techniques are drawn. Knowledge of the likelihood-ratio framework for the evaluation of forensic evidence is assumed. It is hoped that the material presented here will be of value to students of forensic voice comparison and to researchers interested in statistical-modeling techniques that could also be applied to data from other branches of forensic science.
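As a rough illustration of what score-to-likelihood-ratio conversion can look like, the sketch below fits a class-balanced logistic regression that maps a raw comparison score to a log likelihood ratio. Logistic regression is one commonly used calibration method, but the abstract does not say which method the chapter adopts, so treat the choice as an assumption. The function name `fit_calibration`, the Newton solver, and the synthetic Gaussian scores are all hypothetical and exist only for this example.

```python
import numpy as np

def fit_calibration(same_scores, diff_scores, n_iter=50):
    """Fit log LR = a * score + b with class-balanced logistic regression.

    Hypothetical helper for illustration only. Each class is given a total
    weight of 0.5, so the implied prior odds are 1 and the fitted log
    posterior odds can be read as a log likelihood ratio.
    """
    same_scores = np.asarray(same_scores, dtype=float)
    diff_scores = np.asarray(diff_scores, dtype=float)
    s = np.concatenate([same_scores, diff_scores])
    y = np.concatenate([np.ones(len(same_scores)), np.zeros(len(diff_scores))])
    w = np.where(y == 1, 0.5 / len(same_scores), 0.5 / len(diff_scores))

    X = np.column_stack([s, np.ones_like(s)])  # columns: score, intercept
    theta = np.zeros(2)                        # [a, b]
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ theta)))
        grad = X.T @ (w * (p - y))
        # Weighted Hessian with a tiny damping term for numerical stability.
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X + 1e-9 * np.eye(2)
        theta = theta - np.linalg.solve(hess, grad)
    return theta[0], theta[1]

# Synthetic training scores (assumed, not real casework data).
rng = np.random.default_rng(0)
same = rng.normal(1.0, 1.0, 2000)    # same-speaker comparison scores
diff = rng.normal(-1.0, 1.0, 2000)   # different-speaker comparison scores

a, b = fit_calibration(same, diff)
questioned_score = 0.8               # hypothetical score from a case comparison
log_lr = a * questioned_score + b    # natural-log likelihood ratio
print(f"a={a:.2f}, b={b:.2f}, LR={np.exp(log_lr):.2f}")
```

For these synthetic scores the two classes are unit-variance Gaussians with means +1 and -1, so the theoretical log likelihood ratio is 2 times the score, and the fitted slope and intercept should land near 2 and 0. In practice the training scores would come from data reflecting the conditions of the case, and the resulting system would still need empirical validation.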




Voice Conversion for Whispered Speech Synthesis

We present an approach to synthesize whisper by applying a handcrafted s...

Bayesian Strategies for Likelihood Ratio Computation in Forensic Voice Comparison with Automatic Systems

This paper explores several strategies for Forensic Voice Comparison (FV...

Effects of language mismatch in automatic forensic voice comparison using deep learning embeddings

In forensic voice comparison the speaker embedding has become widely pop...

Spoof detection using x-vector and feature switching

Detecting spoofed utterances is a fundamental problem in voice-based bio...

V-Cloak: Intelligibility-, Naturalness- & Timbre-Preserving Real-Time Voice Anonymization

Voice data generated on instant messaging or social media applications c...

Noninvasive Fetal Electrocardiography: Models, Technologies and Algorithms

The fetal electrocardiogram (fECG) was first recorded from the maternal ...

Voice Activity Detection Scheme by Combining DNN Model with GMM Model

Due to the superior modeling ability of deep neural network (DNN), it is...
