A Spoofing Benchmark for the 2018 Voice Conversion Challenge: Leveraging from Spoofing Countermeasures for Speech Artifact Assessment

by   Tomi Kinnunen, et al.

Voice conversion (VC) aims at conversion of speaker characteristic without altering content. Due to training data limitations and modeling imperfections, it is difficult to achieve believable speaker mimicry without introducing processing artifacts; performance assessment of VC, therefore, usually involves both speaker similarity and quality evaluation by a human panel. As a time-consuming, expensive, and non-reproducible process, it hinders rapid prototyping of new VC technology. We address artifact assessment using an alternative, objective approach leveraging from prior work on spoofing countermeasures (CMs) for automatic speaker verification. Therein, CMs are used for rejecting `fake' inputs such as replayed, synthetic or converted speech but their potential for automatic speech artifact assessment remains unknown. This study serves to fill that gap. As a supplement to subjective results for the 2018 Voice Conversion Challenge (VCC'18) data, we configure a standard constant-Q cepstral coefficient CM to quantify the extent of processing artifacts. Equal error rate (EER) of the CM, a confusability index of VC samples with real human speech, serves as our artifact measure. Two clusters of VCC'18 entries are identified: low-quality ones with detectable artifacts (low EERs), and higher quality ones with less artifacts. None of the VCC'18 systems, however, is perfect: all EERs are < 30 preliminary findings suggest potential of CMs outside of their original application, as a supplemental optimization and benchmarking tool to enhance VC technology.


page 1

page 2

page 3

page 4


Predictions of Subjective Ratings and Spoofing Assessments of Voice Conversion Challenge 2020 Submissions

The Voice Conversion Challenge 2020 is the third edition under its flags...

An Overview of Voice Conversion and its Challenges: From Statistical Modeling to Deep Learning

Speaker identity is one of the important characteristics of human speech...

The HCCL-DKU system for fake audio generation task of the 2022 ICASSP ADD Challenge

The voice conversion task is to modify the speaker identity of continuou...

Attentive activation function for improving end-to-end spoofing countermeasure systems

The main objective of the spoofing countermeasure system is to detect th...

Automatic Speech Verification Spoofing Detection

Automatic speech verification (ASV) is the technology to determine the i...

Evaluation of the Speech Resynthesis Capabilities of the VoicePrivacy Challenge Baseline B1

Speaker anonymization systems continue to improve their ability to obfus...

A Study on the Reliability of Automatic Dysarthric Speech Assessments

Automating dysarthria assessments offers the opportunity to develop effe...

Please sign up or login with your details

Forgot password? Click here to reset