Exploiting spatial information with the informed complex-valued spatial autoencoder for target speaker extraction

10/27/2022
by Annika Briegleb, et al.

In conventional multichannel audio signal enhancement, spatial and spectral filtering are often performed sequentially. In contrast, it has been shown that, for neural spatial filtering, a joint spectro-spatial approach is more beneficial. In this contribution, we investigate the influence of the training target on the spatial selectivity of such a time-varying spectro-spatial filter. We extend the recently proposed complex-valued spatial autoencoder (COSPA) for target speaker extraction by leveraging its interpretable structure and purposefully informing the network of the target speaker's position. Consequently, this approach uses a multichannel complex-valued neural network architecture that is capable of processing both spatial and spectral information, rendering informed COSPA (iCOSPA) an effective neural spatial filtering method. We train iCOSPA with several training targets that enforce different amounts of spatial processing and analyze the network's spatial filtering capacity. We find that the proposed architecture is indeed capable of learning different spatial selectivity patterns to attain the different training targets.
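The abstract refers to a time-varying spectro-spatial filter and to analyzing its spatial selectivity. As a rough illustration of those two notions only, the sketch below applies a separate complex weight vector in every time-frequency bin (filter-and-sum, y = w^H x) and evaluates the beampattern |w^H d(theta)| of one weight vector over candidate directions. It assumes a hypothetical far-field uniform linear array (4 microphones, 5 cm spacing, speed of sound 343 m/s) and uses classical delay-and-sum weights steered to a known target direction as a stand-in for a learned filter. It does not reproduce the iCOSPA network, its training targets, or the paper's analysis procedure, and all function names are hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def steering_vector(theta_deg, freq_hz, num_mics, spacing_m):
    """Far-field steering vector of a hypothetical uniform linear array (ULA).

    theta_deg is measured from broadside; microphone m sits at m * spacing_m.
    """
    tau = spacing_m * math.sin(math.radians(theta_deg)) / SPEED_OF_SOUND
    return [cmath.exp(-2j * math.pi * freq_hz * m * tau) for m in range(num_mics)]


def filter_and_sum(weights, x):
    """Apply complex weights to one multichannel STFT bin: y = w^H x."""
    return sum(w.conjugate() * xm for w, xm in zip(weights, x))


def filter_stft(weights, stft):
    """Time-varying spectro-spatial filtering.

    weights and stft are nested lists indexed [frame][freq][mic]; a separate
    weight vector is applied in every time-frequency bin.
    """
    return [[filter_and_sum(w_tf, x_tf) for w_tf, x_tf in zip(w_t, x_t)]
            for w_t, x_t in zip(weights, stft)]


def beampattern(weights, freq_hz, spacing_m, angles_deg):
    """|w^H d(theta)| per candidate direction: the filter's spatial selectivity."""
    n = len(weights)
    return [abs(filter_and_sum(weights, steering_vector(a, freq_hz, n, spacing_m)))
            for a in angles_deg]


if __name__ == "__main__":
    # Hypothetical setup: 4 mics, 5 cm spacing, 1 kHz bin, target at 30 degrees.
    num_mics, spacing, freq, target = 4, 0.05, 1000.0, 30
    # Delay-and-sum weights "informed" by the target direction: a classical
    # stand-in for a learned filter, not the iCOSPA network itself.
    w = [d / num_mics for d in steering_vector(target, freq, num_mics, spacing)]
    angles = list(range(-90, 91, 10))
    for a, g in zip(angles, beampattern(w, freq, spacing, angles)):
        print(f"{a:+4d} deg: {20 * math.log10(max(g, 1e-12)):6.1f} dB")
```

With these assumed parameters the response peaks at the informed direction and decays away from it. A learned, time-varying filter would supply different weights per bin, and its selectivity could be inspected the same way, bin by bin.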



Related research

Insights into Deep Non-linear Filters for Improved Multi-channel Speech Enhancement (06/27/2022)
The key advantage of using multiple microphones for speech enhancement i...

On the Role of Spatial, Spectral, and Temporal Processing for DNN-based Non-linear Multi-channel Speech Enhancement (06/22/2022)
Employing deep neural networks (DNNs) to directly learn filters for mult...

Temporal-Spatial Neural Filter: Direction Informed End-to-End Multi-channel Target Speech Separation (01/02/2020)
Target speech separation refers to extracting the target speaker's speec...

Localizing Spatial Information in Neural Spatiospectral Filters (03/14/2023)
Beamforming for multichannel speech enhancement relies on the estimation...

Model-matching Principle Applied to the Design of an Array-based All-neural Binaural Rendering System for Audio Telepresence (10/20/2022)
Telepresence aims to create an immersive but virtual experience of the f...

Spatially Selective Deep Non-linear Filters for Speaker Extraction (11/04/2022)
In a scenario with multiple persons talking simultaneously, the spatial ...

Intent Inference and Syntactic Tracking with GMTI Measurements (04/22/2011)
In conventional target tracking systems, human operators use the estimat...
