Streaming ResLSTM with Causal Mean Aggregation for Device-Directed Utterance Detection

07/17/2020
by Xiaosu Tong, et al.

In this paper, we propose a streaming model to distinguish voice queries intended for a smart-home device from background speech. The proposed model consists of multiple CNN layers with residual connections, followed by a stacked LSTM architecture. Streaming capability is achieved by using unidirectional LSTM layers and a causal mean aggregation layer, which forms the utterance-level prediction from the frames seen up to the current frame. To avoid redundant computation during online streaming inference, we use a caching mechanism for every convolution operation. Experimental results on a device-directed vs. non-device-directed task show that the proposed model yields an equal error rate reduction of 41% on this task. Furthermore, we show that the proposed model makes accurate predictions earlier in time than attention-based models.
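
The causal mean aggregation mentioned in the abstract can be illustrated with a minimal sketch. It assumes one scalar value per frame for simplicity; the abstract does not say whether the model averages scores or hidden vectors, and the names `CausalMeanAggregator` and `causal_mean` are hypothetical, not taken from the paper.

```python
from typing import Iterable, List


class CausalMeanAggregator:
    """Running mean over the frame-level values seen so far.

    Hypothetical helper for illustration; not the authors' implementation.
    """

    def __init__(self) -> None:
        self.total = 0.0  # sum of frame-level values up to the current frame
        self.count = 0    # number of frames consumed so far

    def update(self, frame_value: float) -> float:
        # Only past and current frames contribute, so the output at frame t
        # never depends on frames after t (the "causal" property).
        self.total += frame_value
        self.count += 1
        return self.total / self.count


def causal_mean(frame_values: Iterable[float]) -> List[float]:
    """Return the causal mean at every frame of a sequence."""
    aggregator = CausalMeanAggregator()
    return [aggregator.update(value) for value in frame_values]
```

Because the aggregator keeps only a running sum and a frame count, each new frame costs constant time and memory, and an utterance-level estimate is available at every frame rather than only at the end of the utterance.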

research
08/29/2022

Streaming Intended Query Detection using E2E Modeling for Continued Conversation

In voice-enabled applications, a predetermined hotword is usually used to...
research
10/09/2021

Streaming on-device detection of device directed speech from voice and touch-based invocation

When interacting with smart devices such as mobile phones or wearables, ...
research
08/07/2018

Device-directed Utterance Detection

In this work, we propose a classifier for distinguishing device-directed...
research
10/20/2020

Knowledge Transfer for Efficient On-device False Trigger Mitigation

In this paper, we address the task of determining whether a given uttera...
research
05/14/2021

Streaming Transformer for Hardware Efficient Voice Trigger Detection and False Trigger Mitigation

We present a unified and hardware efficient architecture for two stage v...
research
02/01/2019

Exploring attention mechanism for acoustic-based classification of speech utterances into system-directed and non-system-directed

Voice controlled virtual assistants (VAs) are now available in smartphon...
research
06/10/2017

Articulation rate in Swedish child-directed speech increases as a function of the age of the child even when surprisal is controlled for

In earlier work, we have shown that articulation rate in Swedish child-d...
