Memory Time Span in LSTMs for Multi-Speaker Source Separation

08/24/2018
by Jeroen Zegers, et al.

With deep learning approaches becoming state of the art in many speech (as well as non-speech) related machine learning tasks, efforts are being made to delve into the neural networks that are often regarded as a black box. In this paper we analyze how recurrent neural networks (RNNs) cope with temporal dependencies by determining the relevant memory time span in a long short-term memory (LSTM) cell. This is done by leaking the state variable with a controlled lifetime and evaluating the task performance. The technique can be applied to any task to estimate the time span the LSTM exploits in that specific scenario. The focus of this paper is on the task of separating speakers from overlapping speech. We discern two effects: a long-term effect, probably due to speaker characterization, and a short-term effect, probably exploiting phone-sized formant tracks.
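
The abstract only sketches the state-leaking idea, so the following is a minimal illustrative sketch of it in PyTorch. The helper name run_leaky_lstm, the layer sizes, and the set of leak factors are hypothetical and not taken from the paper; the exact leaking mechanism used by the authors may also differ from the simple per-frame multiplicative leak shown here.

```python
# Sketch (assumption, not the authors' code) of "leaking" an LSTM's cell
# state with a controlled lifetime. After every frame the cell state c is
# multiplied by a leak factor lam in (0, 1], so information older than
# roughly 1 / (1 - lam) frames decays away regardless of what the trained
# gates do. Sweeping lam and measuring task performance indicates how far
# back the LSTM actually needs to remember.
import torch
import torch.nn as nn


def run_leaky_lstm(cell: nn.LSTMCell, inputs: torch.Tensor, lam: float):
    """Run an LSTMCell over `inputs` (time, batch, feat), leaking the cell
    state by `lam` after every step. lam = 1.0 is the ordinary LSTM;
    smaller lam shortens the effective memory span."""
    batch = inputs.size(1)
    h = inputs.new_zeros(batch, cell.hidden_size)
    c = inputs.new_zeros(batch, cell.hidden_size)
    outputs = []
    for x_t in inputs:                  # iterate over time frames
        h, c = cell(x_t, (h, c))
        c = lam * c                     # controlled leak of the memory state
        outputs.append(h)
    return torch.stack(outputs)         # (time, batch, hidden)


if __name__ == "__main__":
    torch.manual_seed(0)
    cell = nn.LSTMCell(input_size=40, hidden_size=64)  # hypothetical sizes
    feats = torch.randn(100, 8, 40)                    # 100 frames, batch 8
    for lam in (1.0, 0.99, 0.9, 0.5):
        out = run_leaky_lstm(cell, feats, lam)
        # In the paper's setting one would instead evaluate the separation
        # performance of the full trained model at each leak setting.
        print(f"lam={lam:4.2f}  mean |h| = {out.abs().mean().item():.4f}")
```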
