Acoustic Word Embedding System for Code-Switching Query-by-example Spoken Term Detection

05/24/2020
by Murong Ma, et al.

In this paper, we propose an acoustic word embedding system based on a deep convolutional neural network for code-switching query-by-example spoken term detection. Unlike previous configurations, which train on a single language, we combine audio data from two languages for training. The system transforms the acoustic features of the keyword templates and of the search content into fixed-dimensional vectors, then computes distances between the embeddings of keyword segments and of search-content segments obtained with a sliding window. An auxiliary variability-invariant loss is also applied to training samples of the same word spoken by different speakers. This loss is meant to prevent the extractor from encoding undesired speaker- or accent-related information into the acoustic word embeddings. Experimental results show that the proposed system produces promising search results in the code-switching test scenario, and that search performance improves further as the number of templates increases and when the variability-invariant loss is employed.
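
The search pipeline and the auxiliary loss described above can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: the abstract does not specify the extractor architecture, the distance measure, how several templates are combined, or the exact form of the variability-invariant loss. Here `embed` (mean pooling plus a fixed linear projection `proj`) stands in for the CNN, cosine distance is assumed, a segment is scored by its minimum distance over templates, and the loss is assumed to be the mean squared L2 distance between embeddings of the same word from different speakers. The names `embed`, `search`, `variability_invariant_loss`, `proj`, `win`, and `hop` are all illustrative.

```python
import math
from typing import Hashable, List, Optional, Sequence, Tuple

Frames = List[List[float]]  # one acoustic feature vector per frame
Vector = List[float]


def embed(frames: Frames, proj: List[List[float]]) -> Vector:
    """HYPOTHETICAL stand-in for the CNN extractor: mean-pool the frames over
    time, apply a fixed linear projection, and L2-normalise, so a segment of
    any length becomes a fixed-dimensional vector."""
    feat_dim, emb_dim = len(proj), len(proj[0])
    mean = [sum(f[k] for f in frames) / len(frames) for k in range(feat_dim)]
    out = [sum(mean[k] * proj[k][j] for k in range(feat_dim)) for j in range(emb_dim)]
    norm = math.sqrt(sum(x * x for x in out)) or 1.0  # avoid dividing by zero
    return [x / norm for x in out]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity: 0 for identical directions, up to 2 for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0  # treat an all-zero embedding as uninformative
    return 1.0 - dot / (na * nb)


def search(templates: Sequence[Frames], content: Frames, proj: List[List[float]],
           win: int, hop: int) -> Tuple[Optional[int], float]:
    """Slide a window of `win` frames with shift `hop` over `content`, embed
    each segment, and return (start frame, distance) of the best match."""
    template_embs = [embed(t, proj) for t in templates]
    best_start: Optional[int] = None
    best_dist = math.inf
    for start in range(0, len(content) - win + 1, hop):
        seg_emb = embed(content[start:start + win], proj)
        # ASSUMPTION: with several templates, a segment's score is its
        # smallest distance to any template of the keyword.
        dist = min(cosine_distance(seg_emb, e) for e in template_embs)
        if dist < best_dist:
            best_start, best_dist = start, dist
    return best_start, best_dist


def variability_invariant_loss(embs: Sequence[Vector], word_ids: Sequence[Hashable],
                               speaker_ids: Sequence[Hashable]) -> float:
    """HYPOTHETICAL form of the auxiliary loss: mean squared L2 distance over
    all pairs that share a word label but come from different speakers."""
    total, count = 0.0, 0
    for i in range(len(embs)):
        for j in range(i + 1, len(embs)):
            if word_ids[i] == word_ids[j] and speaker_ids[i] != speaker_ids[j]:
                total += sum((x - y) ** 2 for x, y in zip(embs[i], embs[j]))
                count += 1
    return total / count if count else 0.0
```

In a real system the trained CNN would replace `embed`, and the auxiliary loss would presumably be computed on batched embeddings inside an autodiff framework and added to the main training objective with some weighting; the pure-Python version above only shows which segments are compared and which training pairs contribute to the loss.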
