Fantastic Rewards and How to Tame Them: A Case Study on Reward Learning for Task-oriented Dialogue Systems

02/20/2023
by Yihao Feng, et al.

When learning task-oriented dialogue (ToD) agents, reinforcement learning (RL) techniques can naturally be used to train dialogue strategies that achieve user-specific goals. Prior work mainly focuses on adopting advanced RL techniques to train ToD agents, while the design of the reward function remains understudied. This paper asks how to efficiently learn and leverage a reward function for training end-to-end (E2E) ToD agents. Specifically, we introduce two generalized objectives for reward-function learning, inspired by the classical learning-to-rank literature, and then use the learned reward function to guide the training of the E2E ToD agent. With the proposed techniques, we achieve competitive results on the E2E response-generation task on the MultiWOZ 2.0 dataset. Source code and checkpoints are publicly released at https://github.com/Shentao-YANG/Fantastic_Reward_ICLR2023.
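
The abstract does not spell out the two objectives, so the following is only an illustrative sketch of what a learning-to-rank style reward-learning loss can look like, not the authors' formulation. It assumes a ListNet-style objective from the classical learning-to-rank literature: a softmax turns per-dialogue scores into a distribution, and the loss is the cross-entropy between the distribution induced by some reference scores and the one induced by the reward model's predicted returns. The function names, the example scores, and the idea of taking reference scores from an automatic evaluation metric are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def listnet_style_loss(predicted_returns, reference_scores):
    """Cross-entropy between two softmax distributions (ListNet-style).

    predicted_returns: hypothetical per-dialogue returns from a reward model.
    reference_scores:  hypothetical per-dialogue quality scores, e.g. from an
                       automatic evaluation metric (an assumption here).
    """
    p_ref = softmax(reference_scores)
    p_pred = softmax(predicted_returns)
    return -sum(t * math.log(p) for t, p in zip(p_ref, p_pred))


# Three hypothetical dialogue trajectories, best to worst.
reference_scores = [2.0, 1.0, 0.0]

# Predicted returns that preserve the reference ordering and gaps.
aligned_loss = listnet_style_loss([3.0, 2.0, 1.0], reference_scores)

# Predicted returns that reverse the reference ordering.
reversed_loss = listnet_style_loss([1.0, 2.0, 3.0], reference_scores)

print(aligned_loss < reversed_loss)
```

In this sketch, a reward model whose returns rank the dialogues in the same order as the reference scores incurs a lower loss than one that reverses the order, which is the property a ranking-based reward objective relies on.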


