C-PMI: Conditional Pointwise Mutual Information for Turn-level Dialogue Evaluation

06/27/2023
by Liliang Ren, et al.

Existing reference-free turn-level evaluation metrics for chatbots inadequately capture the interaction between the user and the system. Consequently, they often correlate poorly with human evaluations. To address this issue, we propose a novel model-agnostic approach that leverages Conditional Pointwise Mutual Information (C-PMI) to measure the turn-level interaction between the system and the user based on a given evaluation dimension. Experimental results on the widely used FED dialogue evaluation dataset demonstrate that our approach significantly improves the correlation with human judgment compared with existing evaluation systems. By replacing the negative log-likelihood-based scorer with our proposed C-PMI scorer, we achieve a relative 60.5% higher correlation with human judgment for the FED evaluation metric. Our code is publicly available at https://github.com/renll/C-PMI.
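For readers unfamiliar with the quantity named in the abstract, conditional pointwise mutual information between x and y given a context c is log p(y | x, c) - log p(y | c): how much more likely y becomes once x is known, with c held fixed. The sketch below only illustrates that textbook definition on made-up numbers. It is not the authors' scorer; the function names and the token probabilities are hypothetical, and in practice the log-probabilities would come from a language model.

```python
import math


def sequence_logprob(token_probs):
    """Log-probability of a sequence from its per-token probabilities."""
    return sum(math.log(p) for p in token_probs)


def conditional_pmi(logp_y_given_x_c, logp_y_given_c):
    """PMI(x; y | c) = log p(y | x, c) - log p(y | c)."""
    return logp_y_given_x_c - logp_y_given_c


# Hypothetical per-token probabilities of a user follow-up y (illustrative only):
# conditioned on dialogue context c plus system response x ...
probs_with_response = [0.5, 0.4, 0.5]
# ... and conditioned on the context c alone.
probs_without_response = [0.25, 0.2, 0.25]

score = conditional_pmi(
    sequence_logprob(probs_with_response),
    sequence_logprob(probs_without_response),
)
print(f"C-PMI score: {score:.3f}")
```

A positive score means the follow-up is more probable when the system response is taken into account, and a score near zero means the response carries little information about it. A plain negative log-likelihood scorer, by contrast, looks at only one of the two terms.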

research
06/02/2021

DynaEval: Unifying Turn and Dialogue Level Evaluation

A dialogue is essentially a multi-turn interaction among interlocutors. ...
research
10/25/2022

FineD-Eval: Fine-grained Automatic Dialogue-Level Evaluation

Recent model-based reference-free metrics for open-domain dialogue evalu...
research
04/09/2020

MuTual: A Dataset for Multi-Turn Dialogue Reasoning

Non-task oriented dialogue systems have achieved great success in recent...
research
04/10/2020

Designing Precise and Robust Dialogue Response Evaluators

Automatic dialogue response evaluator has been proposed as an alternativ...
research
04/06/2020

PONE: A Novel Automatic Evaluation Metric for Open-Domain Generative Dialogue Systems

Open-domain generative dialogue systems have attracted considerable atte...
research
04/07/2022

Towards Fair Evaluation of Dialogue State Tracking by Flexible Incorporation of Turn-level Performances

Dialogue State Tracking (DST) is primarily evaluated using Joint Goal Ac...
research
05/26/2023

Evaluating Open-Domain Dialogues in Latent Space with Next Sentence Prediction and Mutual Information

The long-standing one-to-many issue of the open-domain dialogues poses s...
