Can Language Models Teach Weaker Agents? Teacher Explanations Improve Students via Theory of Mind

06/15/2023
by   Swarnadeep Saha, et al.

Large Language Models (LLMs) perform complex reasoning by generating explanations for their predictions. However, a complementary goal of explanations is to communicate useful knowledge that improves weaker agents. Hence, we investigate whether LLMs also make good teachers for weaker agents. In particular, we consider a student-teacher framework between two LLM agents and study if, when, and how the teacher should intervene with natural language explanations to improve the student's performance. Since communication is expensive, we define a budget such that the teacher only communicates explanations for a fraction of the data, after which the student should perform well on its own. We decompose the teaching problem along four axes: (1) whether the teacher's test-time intervention improves student predictions, (2) when it is worth explaining a data point, (3) how the teacher should personalize explanations to better teach the student, and (4) whether teacher explanations also improve student performance on future unexplained data. We first show that teacher LLMs can indeed intervene on student reasoning to improve their performance. Next, we propose a Theory of Mind approach, in which the teacher builds two few-shot mental models of the student. The first model defines an Intervention Function that simulates the utility of an intervention, allowing the teacher to intervene when this utility is highest and improving student performance at lower budgets. The second model enables the teacher to personalize explanations for a particular student and outperform unpersonalized teachers. We also demonstrate that in multi-turn interactions, teacher explanations generalize and learning from explained data improves student performance on future unexplained data. Finally, we verify that misaligned teachers can lower student performance to random chance by intentionally misleading them.
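The budget-constrained Intervention Function described above can be sketched as a simple utility-ranking procedure. This is a minimal illustration, not the paper's implementation: the function and variable names are hypothetical, and it assumes the teacher's mental model already supplies estimates of the student's accuracy with and without an explanation for each data point.

```python
# Hypothetical sketch of a budget-constrained Intervention Function:
# the teacher scores each data point by the simulated utility of
# intervening, then explains only the top fraction within the budget.
# All names are illustrative and not taken from the paper's code.

def select_interventions(points, p_correct_with, p_correct_without, budget):
    """Pick the points whose simulated intervention utility is highest.

    points: list of data-point ids
    p_correct_with / p_correct_without: dicts mapping each id to the
        teacher mental model's estimate of the student answering
        correctly with / without a teacher explanation
    budget: fraction of points the teacher may explain (0..1)
    """
    # Utility of intervening on x: simulated gain in student accuracy.
    utility = {x: p_correct_with[x] - p_correct_without[x] for x in points}
    k = int(budget * len(points))
    # Intervene where the mental model predicts the largest improvement.
    return sorted(points, key=lambda x: utility[x], reverse=True)[:k]
```

Ranking by simulated utility rather than explaining a random subset is what lets the teacher concentrate its limited budget on the points where an explanation is most likely to flip the student's prediction.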


