ReCEval: Evaluating Reasoning Chains via Correctness and Informativeness

04/21/2023
by   Archiki Prasad, et al.

Multi-step reasoning ability is fundamental to many natural language tasks, yet it is unclear what constitutes a good reasoning chain and how to evaluate one. Most existing methods focus solely on whether the reasoning chain leads to the correct conclusion, but this answer-oriented view may confound the quality of reasoning with spurious shortcuts for predicting the answer. To bridge this gap, we evaluate reasoning chains by viewing them as informal proofs that derive the final answer. Specifically, we propose ReCEval (Reasoning Chain Evaluation), a framework that evaluates reasoning chains through two key properties: (1) correctness, i.e., each step makes a valid inference based on the information contained within the step, preceding steps, and input context, and (2) informativeness, i.e., each step provides new information that is helpful towards deriving the generated answer. We implement ReCEval using natural language inference models and information-theoretic measures. On multiple datasets, ReCEval is highly effective in identifying different types of errors, yielding notable improvements over prior methods. We demonstrate that our informativeness metric captures the expected flow of information in high-quality reasoning chains, and we also analyze the impact of previous steps on evaluating correctness and informativeness. Finally, we show that scoring reasoning chains based on ReCEval can improve downstream performance on reasoning tasks. Our code is publicly available at: https://github.com/archiki/ReCEval
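
The two properties described above can be illustrated with a minimal Python sketch. It is not the authors' implementation (see the linked repository for that). Every name in it (toy_entailment, step_correctness, step_informativeness, score_chain) is hypothetical. The token-overlap scorer is only a stand-in for a real natural language inference model, the "gain in support for the answer" is a crude proxy for an information-theoretic measure, and aggregating a chain by its weakest step is an assumption made for illustration.

```python
from typing import Callable, Dict, List, Sequence

# (premise, hypothesis) -> support score in [0, 1]
Scorer = Callable[[str, str], float]


def toy_entailment(premise: str, hypothesis: str) -> float:
    """Hypothetical stand-in for an NLI model: the fraction of
    hypothesis tokens that also appear in the premise."""
    premise_tokens = set(premise.lower().split())
    hypothesis_tokens = hypothesis.lower().split()
    if not hypothesis_tokens:
        return 0.0
    return sum(t in premise_tokens for t in hypothesis_tokens) / len(hypothesis_tokens)


def step_correctness(context: str, steps: Sequence[str], i: int, entail: Scorer) -> float:
    """Score how well step i is supported by the context and preceding steps."""
    premise = " ".join([context, *steps[:i]])
    return entail(premise, steps[i])


def step_informativeness(context: str, steps: Sequence[str], i: int,
                         answer: str, entail: Scorer) -> float:
    """Assumed proxy: how much adding step i increases support for the answer."""
    before = " ".join([context, *steps[:i]])
    after = " ".join([context, *steps[:i + 1]])
    return entail(after, answer) - entail(before, answer)


def score_chain(context: str, steps: Sequence[str], answer: str,
                entail: Scorer = toy_entailment) -> Dict[str, object]:
    """Score every step, then summarize the chain by its weakest step
    (an assumption of this sketch)."""
    if not steps:
        raise ValueError("a reasoning chain needs at least one step")
    correctness: List[float] = [
        step_correctness(context, steps, i, entail) for i in range(len(steps))
    ]
    informativeness: List[float] = [
        step_informativeness(context, steps, i, answer, entail) for i in range(len(steps))
    ]
    return {
        "step_correctness": correctness,
        "step_informativeness": informativeness,
        "correctness": min(correctness),
        "informativeness": min(informativeness),
    }


if __name__ == "__main__":
    result = score_chain(
        context="a b c",        # toy input context
        steps=["a b", "c d"],   # second step introduces an unsupported token "d"
        answer="d",
    )
    print(result)
```

In this toy run the first step is fully supported by the context but adds nothing toward the answer, while the second step is only half supported yet is the one that makes the answer derivable. A real scorer would be passed in through the entail parameter in place of toy_entailment.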


