A Study on Robustness to Perturbations for Representations of Environmental Sound

03/20/2022
by Sangeeta Srivastava, et al.

Audio applications involving environmental sound analysis increasingly use general-purpose audio representations, also known as embeddings, for transfer learning. Recently, the Holistic Evaluation of Audio Representations (HEAR) benchmark evaluated twenty-nine embedding models on nineteen diverse tasks. However, the effectiveness of such an evaluation depends on the variation already captured within a given dataset. For a given data domain, it is therefore unclear how the representations would be affected by variations caused by the wide range of microphones and acoustic conditions, commonly known as channel effects. In this work, we aim to extend HEAR to evaluate invariance to channel effects. To accomplish this, we imitate channel effects by injecting perturbations into the audio signal and measure the shift in the resulting (perturbed) embeddings with three distance measures, making the evaluation domain-dependent but not task-dependent. Combined with downstream performance, this helps us make a more informed prediction of how robust the embeddings are to channel effects. We evaluate two embeddings, YAMNet and OpenL^3, on monophonic (UrbanSound8K) and polyphonic (SONYC UST) datasets. We show that a single distance measure does not suffice in such a task-independent evaluation. Although Fréchet Audio Distance (FAD) correlates most accurately with the trend of the performance drop in the downstream task, it must be studied in conjunction with the other distances to get a clear understanding of the overall effect of the perturbation. In terms of embedding performance, we find OpenL^3 to be more robust than YAMNet, which aligns with the HEAR evaluation.
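To make the evaluation idea concrete, the following is a minimal sketch of the perturb-then-measure loop, not the authors' implementation. It makes several assumptions that the abstract does not state: additive white noise at a target SNR stands in for a channel effect, `toy_embedding` is a hypothetical placeholder for a real model such as YAMNet or OpenL^3, and only the Fréchet distance is shown because the abstract does not name the other two measures. The distance uses the standard closed form for two Gaussians, ||mu_1 - mu_2||^2 + tr(Sigma_1 + Sigma_2 - 2 (Sigma_1 Sigma_2)^{1/2}).

```python
import numpy as np


def add_noise_at_snr(signal, snr_db, rng):
    """Assumed perturbation: add white Gaussian noise at a target SNR (dB)."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


def toy_embedding(signal, n_bands=8):
    """Hypothetical stand-in for an embedding model: log energy per band."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    bands = np.array_split(spectrum, n_bands)
    return np.log(np.array([band.mean() for band in bands]) + 1e-10)


def frechet_distance(clean, perturbed):
    """Frechet distance between Gaussians fitted to two (n, dim) embedding sets."""
    mu_c, mu_p = clean.mean(axis=0), perturbed.mean(axis=0)
    cov_c = np.atleast_2d(np.cov(clean, rowvar=False))
    cov_p = np.atleast_2d(np.cov(perturbed, rowvar=False))
    # tr((cov_c cov_p)^{1/2}) is the sum of the square roots of the eigenvalues
    # of cov_c @ cov_p, which are real and non-negative for PSD covariances.
    eigvals = np.linalg.eigvals(cov_c @ cov_p)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_c - mu_p
    return float(diff @ diff + np.trace(cov_c) + np.trace(cov_p) - 2.0 * tr_sqrt)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(2048) / 8000.0  # illustrative clip length and sample rate
    # Synthetic "clean" clips: sine tones with random frequency and amplitude.
    clips = [
        rng.uniform(0.1, 1.0) * np.sin(2 * np.pi * rng.uniform(200, 3000) * t)
        for _ in range(64)
    ]
    emb_clean = np.stack([toy_embedding(c) for c in clips])
    for snr_db in (30, 10, 0):
        noisy = [add_noise_at_snr(c, snr_db, rng) for c in clips]
        emb_noisy = np.stack([toy_embedding(c) for c in noisy])
        print(f"SNR {snr_db:>2} dB: distance = {frechet_distance(emb_clean, emb_noisy):.3f}")
```

In a real study the synthetic clips would be replaced by dataset audio and `toy_embedding` by the model under test; the distance computed this way depends only on the data domain and the perturbation, not on any downstream task.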

