Heinrich Dinkel

CED: Consistent ensemble distillation for audio tagging

Heinrich Dinkel, et al. (08/23/2023)

Augmentation and knowledge distillation (KD) are well-established techni...

Focus on the Sound around You: Monaural Target Speaker Extraction via Distance and Speaker Information

Jiuxin Lin, et al. (06/28/2023)

Previously, Target Speaker Extraction (TSE) has yielded outstanding perf...

AV-SepFormer: Cross-Attention SepFormer for Audio-Visual Target Speaker Extraction

Jiuxin Lin, et al. (06/25/2023)

Visual information can serve as an effective cue for target speaker extr...

Understanding temporally weakly supervised training: A case study for keyword spotting

Heinrich Dinkel, et al. (05/30/2023)

The currently most prominent algorithm to train keyword spotting (KWS) m...

Streaming Audio Transformers for Online Audio Tagging

Heinrich Dinkel, et al. (05/29/2023)

Transformers have emerged as a prominent model framework for audio taggi...

Unified Keyword Spotting and Audio Tagging on Mobile Devices with Transformers

Heinrich Dinkel, et al. (03/03/2023)

Keyword spotting (KWS) is a core human-machine-interaction front-end tas...

An empirical study of weakly supervised audio tagging embeddings for general audio representations

Heinrich Dinkel, et al. (09/30/2022)

We study the usability of pre-trained weakly supervised audio tagging (A...

UniKW-AT: Unified Keyword Spotting and Audio Tagging

Heinrich Dinkel, et al. (09/23/2022)

Within the audio research community and the industry, keyword spotting (...

Pseudo strong labels for large scale weakly supervised audio tagging

Heinrich Dinkel, et al. (04/28/2022)

Large-scale audio tagging datasets inevitably contain imperfect labels, ...

Voice activity detection in the wild: A data-driven approach using teacher-student training

Heinrich Dinkel, et al. (05/10/2021)

Voice activity detection is an essential pre-processing component for sp...

Text-to-Audio Grounding: Building Correspondence Between Captions and Sound Events

Xuenan Xu, et al. (02/23/2021)

Automated Audio Captioning is a cross-modal task, generating natural lan...

Investigating Local and Global Information for Automated Audio Captioning with Transfer Learning

Xuenan Xu, et al. (02/23/2021)

Automated audio captioning (AAC) aims at generating summarizing descript...

Towards duration robust weakly supervised sound event detection

Heinrich Dinkel, et al. (01/19/2021)

Sound event detection (SED) is the task of tagging the absence or presen...

End-to-end spoofing detection with raw waveform CLDNNs

Heinrich Dinkel, et al. (07/26/2020)

Although recent progress in speaker verification generates powerful models...

Multiple Sound Sources Localization from Coarse to Fine

Rui Qian, et al. (07/13/2020)

How to visually localize multiple sound sources in unconstrained videos ...

Voice activity detection in the wild via weakly supervised sound event detection

Heinrich Dinkel, et al. (03/27/2020)

Traditional supervised voice activity detection (VAD) methods work well ...

GPVAD: Towards noise robust voice activity detection via weakly supervised sound event detection

Heinrich Dinkel, et al. (03/27/2020)

Traditional voice activity detection (VAD) methods work well in clean an...

Depa: Self-supervised audio embedding for depression detection

Heinrich Dinkel, et al. (10/29/2019)

Depression detection research has increased over the last few decades as...

What does a Car-ssette tape tell?

Xuenan Xu, et al. (05/31/2019)

Captioning has attracted much attention in image and video understanding...

Duration robust sound event detection

Heinrich Dinkel, et al. (04/08/2019)

Task 4 of the DCASE2018 challenge demonstrated that substantially more r...

Text-based Depression Detection: What Triggers An Alert

Heinrich Dinkel, et al. (04/08/2019)

Recent advances in automatic depression detection mostly derive from mod...

Audio Caption: Listen and Tell

Mengyue Wu, et al. (02/25/2019)

An increasing amount of research has shed light on machine perception of au...
