Dual Transformer Decoder based Features Fusion Network for Automated Audio Captioning

05/30/2023
by Jianyuan Sun, et al.

Automated audio captioning (AAC) generates textual descriptions of audio content. Existing AAC models achieve good results but use only the high-dimensional representation produced by the encoder. Because high-dimensional representations carry a large amount of information, methods that rely on them alone often learn that information insufficiently. In this paper, a new encoder-decoder model called Low- and High-Dimensional Feature Fusion (LHDFF) is proposed. LHDFF uses a new pretrained audio neural networks (PANNs) encoder, Residual PANNs (RPANNs), to fuse low- and high-dimensional features. Low-dimensional features contain limited information about specific audio scenes, so fusing them with high-dimensional features can improve model performance by repeatedly emphasizing that scene-specific information. To fully exploit the fused features, LHDFF uses a dual transformer decoder structure to generate captions in parallel. Experimental results show that LHDFF outperforms existing audio captioning models.
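
The two ideas in the abstract, fusing low- and high-dimensional encoder features and running two decoders in parallel, can be illustrated with a toy sketch in plain Python. It rests on assumptions that the abstract does not specify: the low-dimensional feature is mean-pooled over time, linearly projected to the high-dimensional size and added residually, and the two decoders are combined by averaging their next-word probabilities. All function names and weights are hypothetical; this is not the authors' RPANNs or LHDFF implementation.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # one row per time frame (or per output unit)


def linear(x: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Dense layer y = W x + b; `weight` holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def mean_pool(frames: Matrix) -> Vector:
    """Average a (time x dim) feature map over the time axis."""
    return [sum(column) / len(frames) for column in zip(*frames)]


def fuse_features(low_frames: Matrix, high_frames: Matrix,
                  proj_w: Matrix, proj_b: Vector) -> Vector:
    """Toy residual fusion (assumed scheme): pool both feature maps over time,
    project the low-dimensional vector to the high-dimensional size, then add."""
    low = linear(mean_pool(low_frames), proj_w, proj_b)
    high = mean_pool(high_frames)
    return [h + l for h, l in zip(high, low)]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combine_decoders(logits_a: Vector, logits_b: Vector) -> Tuple[int, Vector]:
    """Average the next-word distributions of two parallel decoders
    (an assumed combination rule) and return the argmax token id."""
    probs = [(a + b) / 2 for a, b in zip(softmax(logits_a), softmax(logits_b))]
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


if __name__ == "__main__":
    # Two time frames of 2-d "low-dimensional" and 3-d "high-dimensional" features.
    low = [[1.0, 0.0], [0.0, 1.0]]
    high = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
    proj_w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical 3x2 projection
    proj_b = [0.0, 0.0, 0.0]
    print(fuse_features(low, high, proj_w, proj_b))  # [2.5, 2.5, 3.0]

    # Next-word logits from two decoders over a 3-word vocabulary.
    token, _ = combine_decoders([2.0, 1.0, 0.0], [0.0, 3.0, 0.0])
    print(token)  # 1
```

Mean pooling discards the time axis only to keep the sketch short; a transformer decoder would typically cross-attend to a sequence of encoder frames rather than a single pooled vector.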



10/10/2022

Automated Audio Captioning via Fusion of Low- and High- Dimensional Features

Automated audio captioning (AAC) aims to describe the content of an audi...
01/10/2022

Local Information Assisted Attention-free Decoder for Audio Captioning

Automated audio captioning (AAC) aims to describe audio data with captio...
06/27/2020

Listen carefully and tell: an audio captioning system based on residual learning and gammatone audio representation

Automated audio captioning is machine listening task whose goal is to de...
06/25/2021

Voice Activity Detection for Transient Noisy Environment Based on Diffusion Nets

We address voice activity detection in acoustic environments of transien...
04/07/2023

Graph Attention for Automated Audio Captioning

State-of-the-art audio captioning methods typically use the encoder-deco...
11/02/2022

MAST: Multiscale Audio Spectrogram Transformers

We present Multiscale Audio Spectrogram Transformer (MAST) for audio cla...
