Multi-dimensional Edge-based Audio Event Relational Graph Representation Learning for Acoustic Scene Classification

by   Yuanbo Hou, et al.

Most existing deep learning-based acoustic scene classification (ASC) approaches directly utilize representations extracted from spectrograms to identify target scenes. However, these approaches pay little attention to the audio events occurring in the scene despite they provide crucial semantic information. This paper conducts the first study that investigates whether real-life acoustic scenes can be reliably recognized based only on the features that describe a limited number of audio events. To model the task-specific relationships between coarse-grained acoustic scenes and fine-grained audio events, we propose an event relational graph representation learning (ERGL) framework for ASC. Specifically, ERGL learns a graph representation of an acoustic scene from the input audio, where the embedding of each event is treated as a node, while the relationship cues derived from each pair of event embeddings are described by a learned multidimensional edge feature. Experiments on a polyphonic acoustic scene dataset show that the proposed ERGL achieves competitive performance on ASC by using only a limited number of embeddings of audio events without any data augmentations. The validity of the proposed ERGL framework proves the feasibility of recognizing diverse acoustic scenes based on the event relational graph. Our code is available on our homepage (


page 1

page 2

page 3

page 4


Relation-guided acoustic scene classification aided with event embeddings

In real life, acoustic scenes and audio events are naturally correlated....

Audio-visual scene classification via contrastive event-object alignment and semantic-based fusion

Previous works on scene classification are mainly based on audio or visu...

Event-related data conditioning for acoustic event classification

Models based on diverse attention mechanisms have recently shined in tas...

Joint Prediction of Audio Event and Annoyance Rating in an Urban Soundscape by Hierarchical Graph Representation Learning

Sound events in daily life carry rich information about the objective wo...

Transferring Voice Knowledge for Acoustic Event Detection: An Empirical Study

Detection of common events and scenes from audio is useful for extractin...

Deep Learning Based Open Set Acoustic Scene Classification

In this work, we compare the performance of three selected techniques in...

Learning to Detect Novel and Fine-Grained Acoustic Sequences Using Pretrained Audio Representations

This work investigates pretrained audio representations for few shot Sou...

Please sign up or login with your details

Forgot password? Click here to reset