A Multimodal Prototypical Approach for Unsupervised Sound Classification

06/21/2023
by Saksham Singh Kushwaha, et al.

In the context of environmental sound classification, the adaptability of systems is key: which sound classes are interesting depends on the context and the user's needs. Recent advances in text-to-audio retrieval allow for zero-shot audio classification, but its performance remains limited compared to supervised models. This work proposes a multimodal prototypical approach that exploits local audio-text embeddings to provide more relevant answers to audio queries, improving the adaptability of sound detection in the wild. We do this by first using text to query a nearby community of audio embeddings that best characterizes each query sound, and selecting each group's centroid as a prototype. Second, we classify unseen audio by comparing it to these prototypes. We perform multiple ablation studies to understand the impact of the embedding models and prompts. Our unsupervised approach improves upon the zero-shot state-of-the-art in three sound recognition benchmarks by an average of 12
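
The two-step procedure above (text-queried neighbourhoods become prototypes, then unseen audio is matched to the nearest prototype) can be illustrated with a minimal sketch. It assumes that text and audio embeddings already live in a shared space produced by some audio-text model, that the "nearby community" is simply the k audio embeddings with the highest cosine similarity to the text query, and that the prototype is their normalized mean. These choices, the function names build_prototypes and classify, and the toy 2-D vectors are all hypothetical; the paper's actual neighbourhood selection and scoring may differ.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a (non-zero) vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def build_prototypes(text_emb: Dict[str, Vector],
                     audio_pool: List[Vector],
                     k: int = 3) -> Dict[str, List[float]]:
    """Hypothetical step 1: for each class prompt embedding, take the k most
    similar unlabeled audio embeddings (assumed neighbourhood definition) and
    use their re-normalized mean as the class prototype."""
    prototypes = {}
    for label, t in text_emb.items():
        # Rank the unlabeled audio pool by similarity to the text query.
        ranked = sorted(audio_pool, key=lambda a: cosine(t, a), reverse=True)
        neighbours = [normalize(a) for a in ranked[:k]]
        dim = len(neighbours[0])
        centroid = [sum(n[i] for n in neighbours) / len(neighbours)
                    for i in range(dim)]
        prototypes[label] = normalize(centroid)
    return prototypes


def classify(audio: Vector, prototypes: Dict[str, List[float]]) -> str:
    """Hypothetical step 2: assign unseen audio to the most similar prototype."""
    return max(prototypes, key=lambda label: cosine(audio, prototypes[label]))


if __name__ == "__main__":
    # Toy embeddings standing in for real text and audio encoder outputs.
    text_emb = {"dog": [1.0, 0.0], "siren": [0.0, 1.0]}
    audio_pool = [[0.9, 0.1], [0.8, 0.3], [0.95, -0.05],
                  [0.1, 0.9], [0.2, 0.85], [-0.1, 1.0]]
    protos = build_prototypes(text_emb, audio_pool, k=3)
    print(classify([0.7, 0.2], protos))
```

Compared with plain zero-shot retrieval, which would compare the query audio directly with the text embeddings, this sketch compares audio with audio-derived centroids, so the decision is made inside the audio region of the shared space.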

Related research
05/06/2019

Zero-Shot Audio Classification Based on Class Label Embeddings

This paper proposes a zero-shot learning approach for audio classificati...
11/06/2017

Unsupervised Learning of Semantic Audio Representations

Even in the absence of any explicit semantic annotation, vast collection...
06/24/2021

AudioCLIP: Extending CLIP to Image, Text and Audio

In the past, the rapidly evolving field of sound classification greatly ...
09/17/2023

Zero- and Few-shot Sound Event Localization and Detection

Sound event localization and detection (SELD) systems estimate direction...
05/03/2023

Unsupervised Improvement of Audio-Text Cross-Modal Representations

Recent advances in using language models to obtain cross-modal audio-tex...
12/17/2021

Soundify: Matching Sound Effects to Video

In the art of video editing, sound is really half the story. A skilled v...
02/12/2020

Improving automated segmentation of radio shows with audio embeddings

Audio features have been proven useful for increasing the performance of...