PoFormer: A Simple Pooling Transformer for Speaker Verification

10/10/2021
by Yufeng Ma, et al.

Most recent speaker verification systems extract speaker embeddings with a deep neural network, in which a pooling layer aggregates the frame-level features produced by the backbone. In this paper, we propose a new transformer-based pooling structure, PoFormer, to enhance the pooling layer's ability to capture information along the whole time axis. Unlike previous works that apply the attention mechanism in a simplified form or implement the multi-head mechanism serially rather than in parallel, PoFormer follows the original transformer structure with a few minor modifications, such as a positional encoding generator, drop path, and LayerScale, which stabilize training and prevent overfitting. Evaluated on various datasets, PoFormer outperforms the existing pooling systems with at least a 13.00
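To illustrate the general idea of attention-based pooling that PoFormer builds on, here is a minimal single-head sketch in NumPy: frame-level features are scored against a learned query, the scores are softmax-normalized over time, and the utterance-level embedding is the resulting weighted average. This is an illustrative simplification with hypothetical parameter names (`W`, `v`), not the PoFormer architecture itself, which uses a full transformer block with positional encoding, drop path, and LayerScale.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attentive_pooling(frames, W, v):
    """Aggregate frame-level features (T, D) into one utterance embedding (D,).

    score_t = v . tanh(W @ frame_t); the scores are softmax-normalized
    over the time axis and used as weights for an average over frames.
    """
    scores = np.tanh(frames @ W.T) @ v       # (T,) one score per frame
    weights = softmax(scores)                # attention weights over time
    return weights @ frames                  # weighted mean, shape (D,)

# Toy usage: 50 frames of 8-dim features, hidden size 16.
rng = np.random.default_rng(0)
T, D, H = 50, 8, 16
frames = rng.normal(size=(T, D))
W = rng.normal(size=(H, D))
v = rng.normal(size=(H,))
emb = attentive_pooling(frames, W, v)
print(emb.shape)  # (8,)
```

A transformer pooling layer generalizes this by letting frames attend to each other (self-attention across the whole time axis) before aggregation, rather than scoring each frame independently.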


Related research

- Serialized Multi-Layer Multi-Head Attention for Neural Speaker Embedding (07/14/2021)
  This paper proposes a serialized multi-layer multi-head attention for ne...

- Exploring a Unified Attention-Based Pooling Framework for Speaker Verification (08/21/2018)
  The pooling layer is an essential component in the neural network based ...

- P-vectors: A Parallel-Coupled TDNN/Transformer Network for Speaker Verification (05/24/2023)
  Typically, the Time-Delay Neural Network (TDNN) and Transformer can serv...

- MACCIF-TDNN: Multi aspect aggregation of channel and context interdependence features in TDNN-based speaker verification (07/07/2021)
  Most of the recent state-of-the-art results for speaker verification are...

- DT-SV: A Transformer-based Time-domain Approach for Speaker Verification (05/26/2022)
  Speaker verification (SV) aims to determine whether the speaker's identi...

- Transport-Oriented Feature Aggregation for Speaker Embedding Learning (06/26/2022)
  Pooling is needed to aggregate frame-level features into utterance-level...

- Sequence-to-Sequence Model with Transformer-based Attention Mechanism and Temporal Pooling for Non-Intrusive Load Monitoring (06/08/2023)
  This paper presents a novel Sequence-to-Sequence (Seq2Seq) model based o...
