Challenging Neural Dialogue Models with Natural Data: Memory Networks Fail on Incremental Phenomena

09/22/2017
by Igor Shalyminov, et al.

Natural, spontaneous dialogue proceeds incrementally, word by word, and contains many kinds of disfluency, such as mid-utterance hesitations, interruptions, and self-corrections. But training data for machine learning approaches to dialogue processing is often either cleaned up or wholly synthetic in order to avoid such phenomena. The question then arises of how well systems trained on such clean data generalise to real spontaneous dialogue, or indeed whether they are trainable at all on naturally occurring dialogue data. To answer this question, we created a new corpus called bAbI+ by systematically adding natural spontaneous incremental dialogue phenomena, such as restarts and self-corrections, to Facebook AI Research's bAbI dialogues dataset. We then explore the performance of a state-of-the-art retrieval model, MemN2N, on this more natural dataset. Results show that the semantic accuracy of the MemN2N model drops drastically, and that although it is in principle able to learn to process the constructions in bAbI+, it needs an impractical amount of training data to do so. Finally, we show that an incremental semantic parser, DyLan, achieves 100% semantic accuracy on both bAbI and bAbI+, highlighting the generalisation properties of linguistically informed dialogue models.


research
09/22/2017

Bootstrapping incremental dialogue systems from minimal data: the generalisation power of dialogue grammars

We investigate an end-to-end method for automatically inducing task-base...
research
10/08/2018

Multi-Task Learning for Domain-General Spoken Disfluency Detection in Dialogue Systems

Spontaneous spoken dialogue is often disfluent, containing pauses, hesit...
research
12/01/2016

Bootstrapping incremental dialogue systems: using linguistic knowledge to learn from minimal data

We present a method for inducing new dialogue systems from very small am...
research
09/29/2017

The BURCHAK corpus: a Challenge Data Set for Interactive Learning of Visually Grounded Word Meanings

We motivate and describe a new freely available human-human dialogue dat...
research
08/28/2018

Analysing the potential of seq-to-seq models for incremental interpretation in task-oriented dialogue

We investigate how encoder-decoder models trained on a synthetic dataset...
research
06/09/2023

I run as fast as a rabbit, can you? A Multilingual Simile Dialogue Dataset

A simile is a figure of speech that compares two different things (calle...
research
01/09/2018

Denotation Extraction for Interactive Learning in Dialogue Systems

This paper presents a novel task using real user data obtained in human-...
