Synth-AC: Enhancing Audio Captioning with Synthetic Supervision

09/18/2023
by Feiyang Xiao, et al.

Data-driven approaches hold promise for audio captioning. However, the development of audio captioning methods can be biased by the limited availability and quality of text-audio data. This paper proposes the SynthAC framework, which leverages recent advances in audio generative models and widely available text corpora to create synthetic text-audio pairs, thereby enhancing text-audio representation. Specifically, a text-to-audio generation model, AudioLDM, is used to generate synthetic audio signals from the captions of an image captioning dataset. SynthAC thus extends the availability of well-annotated captions from the text-vision domain to audio captioning, and improves text-audio representation by learning the relations within the synthetic text-audio pairs. Experiments demonstrate that SynthAC benefits audio captioning models by incorporating well-annotated text corpora from the text-vision domain, offering a promising solution to the challenge of data scarcity. Furthermore, SynthAC can be easily adapted to various state-of-the-art methods, leading to substantial performance improvements.
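
The pairing step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `placeholder_text_to_audio`, `build_synthetic_pairs`, `TextAudioPair`, and the 16 kHz sample rate are all hypothetical names and values chosen for illustration, and the placeholder generator simply emits a sine tone so the example runs without any model. In practice, a text-to-audio model such as AudioLDM would be passed in as `generate`.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, Iterable, List

SAMPLE_RATE = 16000  # assumed sample rate, not taken from the paper


@dataclass
class TextAudioPair:
    """One synthetic training example: a caption and its generated waveform."""
    caption: str
    audio: List[float]


def placeholder_text_to_audio(caption: str, seconds: float = 1.0) -> List[float]:
    """Hypothetical stand-in for a text-to-audio model.

    Emits a sine tone whose frequency is derived from the caption, so the
    output is deterministic per caption. It carries no semantic content.
    """
    rng = random.Random(caption)  # seed from the caption text
    freq = rng.uniform(200.0, 2000.0)
    n_samples = int(SAMPLE_RATE * seconds)
    return [math.sin(2.0 * math.pi * freq * t / SAMPLE_RATE) for t in range(n_samples)]


def build_synthetic_pairs(
    captions: Iterable[str],
    generate: Callable[[str], List[float]] = placeholder_text_to_audio,
) -> List[TextAudioPair]:
    """Turn captions (e.g. from an image captioning dataset) into text-audio pairs."""
    pairs = []
    for caption in captions:
        text = caption.strip()
        if not text:
            continue  # skip empty captions
        pairs.append(TextAudioPair(caption=text, audio=generate(text)))
    return pairs


# Illustrative captions, written for this example rather than drawn from a dataset.
image_captions = ["A dog barks at a passing car.", "  ", "Rain falls on a tin roof."]
pairs = build_synthetic_pairs(image_captions)
```

The resulting pairs could then be mixed with real text-audio data when training a captioning model; how the two sources are combined is left open here, since the abstract does not specify it.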

research
05/15/2023

A Whisper transformer for audio captioning trained with synthetic captions and transfer learning

The field of audio captioning has seen significant advancements in recen...
research
09/14/2023

Training Audio Captioning Models without Audio

Automated Audio Captioning (AAC) is the task of generating natural langu...
research
09/18/2023

RECAP: Retrieval-Augmented Audio Captioning

We present RECAP (REtrieval-Augmented Audio CAPtioning), a novel and eff...
research
09/19/2019

Large-scale representation learning from visually grounded untranscribed speech

Systems that can associate images with their spoken audio captions are a...
research
10/13/2021

Diverse Audio Captioning via Adversarial Training

Audio captioning aims at generating natural language descriptions for au...
research
09/06/2023

Parameter Efficient Audio Captioning With Faithful Guidance Using Audio-text Shared Latent Representation

There has been significant research on developing pretrained transformer...
research
06/05/2020

Audio Captioning using Gated Recurrent Units

Audio captioning is a recently proposed task for automatically generatin...
