Leakage-Adjusted Simulatability: Can Models Generate Non-Trivial Explanations of Their Behavior in Natural Language?

10/08/2020
by Peter Hase, et al.

Data collection for natural language (NL) understanding tasks has increasingly included human explanations alongside data points, allowing past works to introduce models that both perform a task and generate NL explanations for their outputs. Yet to date, model-generated explanations have been evaluated on the basis of surface-level similarities to human explanations, both through automatic metrics like BLEU and human evaluations. We argue that these evaluations are insufficient, since they fail to indicate whether explanations support actual model behavior (faithfulness), rather than simply match what a human would say (plausibility). In this work, we address the problem of evaluating explanations from the model simulatability perspective. Our contributions are as follows: (1) We introduce a leakage-adjusted simulatability (LAS) metric for evaluating NL explanations, which measures how well explanations help an observer predict a model's output, while controlling for how explanations can directly leak the output. We use a model as a proxy for a human observer, and validate this choice with two human subject experiments. (2) Using the CoS-E and e-SNLI datasets, we evaluate two existing generative graphical models and two new approaches; one rationalizing method we introduce achieves roughly human-level LAS scores. (3) Lastly, we frame explanation generation as a multi-agent game and optimize explanations for simulatability while penalizing label leakage, which can improve LAS scores. We provide code for the experiments in this paper at https://github.com/peterbhase/LAS-NL-Explanations
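
The LAS metric described in contribution (1) can be sketched in a few lines. The snippet below is a rough illustration only, not the authors' implementation (see the linked repository for that). It assumes the simulator is queried three ways — with the input plus explanation, with the input alone, and with the explanation alone — and that the explanation-only prediction is used to split examples into leaking and non-leaking groups before averaging the simulator's accuracy gain over the two groups. All function and argument names here are hypothetical.

```python
import numpy as np

def leakage_adjusted_simulatability(sim_pred_xe, sim_pred_x, sim_pred_e, model_pred):
    """Sketch of a leakage-adjusted simulatability score (hypothetical helper).

    sim_pred_xe : simulator predictions given input + explanation
    sim_pred_x  : simulator predictions given input alone (baseline)
    sim_pred_e  : simulator predictions given explanation alone (leakage probe)
    model_pred  : the task model's own outputs (the labels being simulated)
    """
    sim_pred_xe, sim_pred_x, sim_pred_e, model_pred = map(
        np.asarray, (sim_pred_xe, sim_pred_x, sim_pred_e, model_pred)
    )
    # An explanation "leaks" when the simulator can recover the model's output
    # from the explanation alone.
    leaked = sim_pred_e == model_pred

    # Per-example effect of the explanation: did it help the simulator match
    # the model's output, relative to seeing the input alone?
    gain = (sim_pred_xe == model_pred).astype(float) - (sim_pred_x == model_pred).astype(float)

    # Macro-average over the leaking and non-leaking subsets so that trivially
    # label-leaking explanations cannot dominate the score.
    groups = [gain[leaked], gain[~leaked]]
    return float(np.mean([g.mean() for g in groups if len(g) > 0]))

# Toy usage with made-up predictions:
if __name__ == "__main__":
    print(leakage_adjusted_simulatability(
        sim_pred_xe=[1, 0, 1, 1],
        sim_pred_x=[1, 0, 0, 0],
        sim_pred_e=[1, 1, 0, 0],
        model_pred=[1, 0, 1, 0],
    ))
```

Averaging the leaking and non-leaking groups separately is what makes the score "leakage-adjusted": an explanation that merely restates the predicted label cannot inflate the metric.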

