Partially Fake Audio Detection by Self-attention-based Fake Span Discovery

by   Haibin Wu, et al.

The past few years have witnessed the significant advances of speech synthesis and voice conversion technologies. However, such technologies can undermine the robustness of broadly implemented biometric identification models and can be harnessed by in-the-wild attackers for illegal uses. The ASVspoof challenge mainly focuses on synthesized audios by advanced speech synthesis and voice conversion models, and replay attacks. Recently, the first Audio Deep Synthesis Detection challenge (ADD 2022) extends the attack scenarios into more aspects. Also ADD 2022 is the first challenge to propose the partially fake audio detection task. Such brand new attacks are dangerous and how to tackle such attacks remains an open question. Thus, we propose a novel framework by introducing the question-answering (fake span discovery) strategy with the self-attention mechanism to detect partially fake audios. The proposed fake span detection module tasks the anti-spoofing model to predict the start and end positions of the fake clip within the partially fake audio, address the model's attention into discovering the fake spans rather than other shortcuts with less generalization, and finally equips the model with the discrimination capacity between real and partially fake audios. Our submission ranked second in the partially fake audio detection track of ADD 2022.


page 1

page 2

page 3

page 4


ADD 2022: the First Audio Deep Synthesis Detection Challenge

Audio deepfake detection is an emerging topic, which was included in the...

FSD: An Initial Chinese Dataset for Fake Song Detection

Singing voice synthesis and singing voice conversion have significantly ...

The HCCL-DKU system for fake audio generation task of the 2022 ICASSP ADD Challenge

The voice conversion task is to modify the speaker identity of continuou...

CAU_KU team's submission to ADD 2022 Challenge task 1: Low-quality fake audio detection through frequency feature masking

This technical report describes Chung-Ang University and Korea Universit...

DeepSonar: Towards Effective and Robust Detection of AI-Synthesized Fake Voices

With the recent advances in voice synthesis, AI-synthesized fake voices ...

Betray Oneself: A Novel Audio DeepFake Detection Model via Mono-to-Stereo Conversion

Audio Deepfake Detection (ADD) aims to detect the fake audio generated b...

TranssionADD: A multi-frame reinforcement based sequence tagging model for audio deepfake detection

Thanks to recent advancements in end-to-end speech modeling technology, ...

Please sign up or login with your details

Forgot password? Click here to reset