Time-Frequency Transformer: A Novel Time Frequency Joint Learning Method for Speech Emotion Recognition

08/28/2023
by Yong Wang, et al.

In this paper, we propose a novel time-frequency joint learning method for speech emotion recognition, called the Time-Frequency Transformer. Its advantage is that it can capture global emotion patterns in the time-frequency domain of the speech signal while also modeling local emotional correlations in the time domain and the frequency domain separately. To this end, we first design a Time Transformer and a Frequency Transformer to capture the local emotion patterns between frames and within frequency bands, respectively, so as to preserve the integrity of the emotion information modeled in both the time and frequency domains. Then, a Time-Frequency Transformer is proposed to mine time-frequency emotional correlations from the local time-domain and frequency-domain emotion features, in order to learn a more discriminative global speech emotion representation. The whole process is a time-frequency joint learning process implemented by a series of Transformer models. Experiments on the IEMOCAP and CASIA databases indicate that the proposed method outperforms state-of-the-art methods.
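
The three-stage idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' architecture: it assumes a spectrogram stored as a T x F list of lists, uses parameter-free self-attention (identity query/key/value projections) in place of full Transformer blocks, and fuses the two branches by element-wise addition followed by mean pooling. All function names are hypothetical and chosen only for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    Assumption: no learned weights, so each output token is simply a
    convex combination of the input tokens.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def transpose(matrix):
    return [list(row) for row in zip(*matrix)]


def time_attention(spec):
    """Attend across frames: each of the T frames is one token of size F."""
    return self_attention(spec)


def frequency_attention(spec):
    """Attend across frequency bins: each of the F bins is one token of size T."""
    return transpose(self_attention(transpose(spec)))


def joint_time_frequency(spec):
    """Fuse both branches, attend again, and pool to one utterance vector.

    Assumption: addition and mean pooling are placeholders for whatever
    fusion and pooling the actual model uses.
    """
    t_feat = time_attention(spec)
    f_feat = frequency_attention(spec)
    fused = [[a + b for a, b in zip(rt, rf)] for rt, rf in zip(t_feat, f_feat)]
    attended = self_attention(fused)
    n_frames = len(attended)
    return [sum(row[j] for row in attended) / n_frames for j in range(len(attended[0]))]


# Toy spectrogram: 4 frames (time) x 3 bins (frequency).
spec = [
    [1.0, 0.0, 2.0],
    [0.5, 1.5, 0.0],
    [0.0, 1.0, 1.0],
    [2.0, 0.0, 0.5],
]
embedding = joint_time_frequency(spec)
```

Here `embedding` has one value per frequency bin and would feed an emotion classifier in a real system. A trained model would add learned projections, multiple heads, positional information, and feed-forward layers, none of which are specified in the abstract.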

research
10/22/2022

Speech Emotion Recognition via an Attentive Time-Frequency Neural Network

Spectrogram is commonly used as the input feature of deep neural network...
research
06/02/2023

Learning Local to Global Feature Aggregation for Speech Emotion Recognition

Transformer has emerged in speech emotion recognition (SER) at present. ...
research
03/05/2023

Time-frequency Network for Robust Speaker Recognition

The wide deployment of speech-based biometric systems usually demands hi...
research
04/19/2021

A novel Time-frequency Transformer and its Application in Fault Diagnosis of Rolling Bearings

The scope of data-driven fault diagnosis models is greatly improved thro...
research
05/24/2023

A Joint Time-frequency Domain Transformer for Multivariate Time Series Forecasting

To enhance predicting performance while minimizing computational demands...
research
04/16/2019

Improved Speech Separation with Time-and-Frequency Cross-domain Joint Embedding and Clustering

Speech separation has been very successful with deep learning techniques...
research
08/07/2019

Pitch-Synchronous Single Frequency Filtering Spectrogram for Speech Emotion Recognition

Convolutional neural networks (CNN) are widely used for speech emotion r...
