Dual Semantic Knowledge Composed Multimodal Dialog Systems

by   Xiaolin Chen, et al.

Textual response generation is an essential task for multimodal task-oriented dialog systems.Although existing studies have achieved fruitful progress, they still suffer from two critical limitations: 1) focusing on the attribute knowledge but ignoring the relation knowledge that can reveal the correlations between different entities and hence promote the response generation, and 2) only conducting the cross-entropy loss based output-level supervision but lacking the representation-level regularization. To address these limitations, we devise a novel multimodal task-oriented dialog system (named MDS-S2). Specifically, MDS-S2 first simultaneously acquires the context related attribute and relation knowledge from the knowledge base, whereby the non-intuitive relation knowledge is extracted by the n-hop graph walk. Thereafter, considering that the attribute knowledge and relation knowledge can benefit the responding to different levels of questions, we design a multi-level knowledge composition module in MDS-S2 to obtain the latent composed response representation. Moreover, we devise a set of latent query variables to distill the semantic information from the composed response representation and the ground truth response representation, respectively, and thus conduct the representation-level semantic regularization. Extensive experiments on a public dataset have verified the superiority of our proposed MDS-S2. We have released the codes and parameters to facilitate the research community.


page 3

page 8


Multimodal Dialog Systems with Dual Knowledge-enhanced Generative Pretrained Language Model

Text response generation for multimodal task-oriented dialog systems, wh...

Multi-Grained Knowledge Retrieval for End-to-End Task-Oriented Dialog

Retrieving proper domain knowledge from an external database lies at the...

Multi-level Memory for Task Oriented Dialogs

Recent end-to-end task oriented dialog systems use memory architectures ...

CORAL: Contextual Response Retrievability Loss Function for Training Dialog Generation Models

Natural Language Generation (NLG) represents a large collection of tasks...

Multi-source Semantic Graph-based Multimodal Sarcasm Explanation Generation

Multimodal Sarcasm Explanation (MuSE) is a new yet challenging task, whi...

Autoregressive Entity Generation for End-to-End Task-Oriented Dialog

Task-oriented dialog (TOD) systems often require interaction with an ext...

Please sign up or login with your details

Forgot password? Click here to reset