Unsupervised Spoken Utterance Classification

by Shahab Jalalvand, et al.

An intelligent virtual assistant (IVA) enables effortless conversations in call routing through spoken utterance classification (SUC), which is a special form of spoken language understanding (SLU). Building a SUC system requires a large amount of supervised in-domain data that is not always available. In this paper, we introduce an unsupervised spoken utterance classification approach (USUC) that does not require any in-domain data except for the intent labels and a few paraphrases per intent. USUC consists of a KNN classifier (K=1) and a complex embedding model trained on a large unsupervised customer service corpus. Among all the embedding models we compare, we demonstrate that ELMo works best for USUC. However, an ELMo model is too slow to be used at run-time for call routing. To resolve this issue, we first compute the uni- and bi-gram embedding vectors offline and build a lookup table of n-grams and their corresponding embedding vectors. We then use this table to compute sentence embedding vectors at run-time, along with back-off techniques for unseen n-grams. Experiments show that USUC outperforms the traditional utterance classification methods by reducing the classification error rate from 32.9% to 27.0%, and that the lookup technique increases the processing speed from 16 utterances per second to 118 utterances per second.
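
The lookup-and-back-off idea described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the 2-dimensional vectors are invented (in the paper they would be precomputed with ELMo), the names `NGRAM_TABLE`, `PARAPHRASES`, `embed`, and `classify` are hypothetical, and averaging n-gram vectors with a bigram-to-unigram back-off is only one plausible reading of the method, not necessarily the authors' exact procedure.

```python
import math

# Hypothetical offline lookup table: n-gram (as a tuple) -> embedding vector.
# The values are made up; a real table would hold precomputed ELMo vectors.
NGRAM_TABLE = {
    ("pay", "bill"): [0.9, 0.1],
    ("pay",): [0.8, 0.2],
    ("bill",): [0.7, 0.3],
    ("cancel", "service"): [0.1, 0.9],
    ("cancel",): [0.2, 0.8],
    ("service",): [0.3, 0.7],
}
DIM = 2

# Hypothetical intent labels, each with a few paraphrases (the only
# in-domain information the approach is said to need).
PARAPHRASES = {
    "billing": ["pay bill"],
    "cancellation": ["cancel service"],
}


def embed(sentence):
    """Average n-gram vectors; back off bigram -> unigrams -> skip."""
    tokens = sentence.lower().split()
    bigrams = list(zip(tokens, tokens[1:]))
    vectors = []
    for bigram in bigrams:
        if bigram in NGRAM_TABLE:
            vectors.append(NGRAM_TABLE[bigram])
        else:
            # Unseen bigram: fall back to whichever of its unigrams are known.
            vectors.extend(NGRAM_TABLE[(w,)] for w in bigram if (w,) in NGRAM_TABLE)
    if not bigrams:
        # Single-token utterance: use unigram vectors directly.
        vectors.extend(NGRAM_TABLE[(w,)] for w in tokens if (w,) in NGRAM_TABLE)
    if not vectors:
        return [0.0] * DIM
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def classify(utterance):
    """1-nearest-neighbour search over the paraphrase embeddings."""
    query = embed(utterance)
    scored = (
        (cosine(query, embed(paraphrase)), intent)
        for intent, paraphrases in PARAPHRASES.items()
        for paraphrase in paraphrases
    )
    return max(scored)[1]
```

In this toy setup, `classify("pay my bill")` never finds a matching bigram, so it backs off to the unigram vectors for "pay" and "bill" and still lands nearest to the "billing" paraphrase. Because the table is built offline, run-time cost is reduced to dictionary lookups and a small number of vector operations, which is the motivation the abstract gives for avoiding a full ELMo forward pass per utterance.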

Automatic Data Expansion for Customer-care Spoken Language Understanding

Spoken language understanding (SLU) systems are widely used in handling ...

Text is no more Enough! A Benchmark for Profile-based Spoken Language Understanding

Current researches on spoken language understanding (SLU) heavily are li...

Incremental Online Spoken Language Understanding

Spoken Language Understanding (SLU) typically comprises of an automatic ...

Segmental Audio Word2Vec: Representing Utterances as Sequences of Vectors with Applications in Spoken Term Detection

While Word2Vec represents words (in text) as vectors carrying semantic i...

Strong and Simple Baselines for Multimodal Utterance Embeddings

Human language is a rich multimodal signal consisting of spoken words, f...

Utterance-level Intent Recognition from Keywords

This paper focuses on wake on intent (WOI) techniques for platforms with...

Joint Learning of Domain Classification and Out-of-Domain Detection with Dynamic Class Weighting for Satisficing False Acceptance Rates

In domain classification for spoken dialog systems, correct detection of...