Generating Mathematical Derivations with Large Language Models

07/19/2023
by Jordan Meadows, et al.

The derivation of mathematical results in specialised fields using Large Language Models (LLMs) is an emerging research direction that can help identify models' limitations and potentially support mathematical discovery. In this paper, we leverage a symbolic engine to generate derivations of equations at scale, and investigate the capabilities of LLMs when deriving goal equations from premises. Specifically, we employ in-context learning for GPT and fine-tune a range of T5 models to compare the robustness and generalisation of general pre-training strategies against specialised fine-tuned models. Empirical results show that fine-tuned FLAN-T5-large (MathT5) outperforms GPT models on all static and out-of-distribution test sets in terms of absolute performance. However, an in-depth analysis reveals that the fine-tuned models are more sensitive to perturbations involving unseen symbols and, to a lesser extent, to changes in equation structure. In addition, we analyse 1.7K equations and over 200 derivations to highlight common reasoning errors, such as the inclusion of incorrect, irrelevant, or redundant equations, along with the tendency to skip derivation steps. Finally, we explore the suitability of existing metrics for evaluating mathematical derivations, finding evidence that, while they capture general properties such as sensitivity to perturbations, they fail to highlight fine-grained reasoning errors and essential differences between models. Overall, this work demonstrates that training models on synthetic data can improve their mathematical capabilities beyond larger architectures.
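The abstract describes using a symbolic engine to generate equation derivations at scale by applying operations to premise equations. The paper's actual engine is not reproduced here; the following is a minimal sketch with a toy rewrite system, where `apply_op`, the operation names, and the example premise are all illustrative assumptions:

```python
# Toy sketch of scalable derivation generation: apply a sequence of
# symbolic operations to both sides of a premise equation, recording
# each intermediate equation as one derivation step.
# (Hypothetical rewrite engine, not the paper's actual system.)

def apply_op(lhs, rhs, op, operand):
    """Apply one operation to both sides of the equation lhs = rhs."""
    if op == "add":
        return f"({lhs} + {operand})", f"({rhs} + {operand})"
    if op == "mul":
        return f"({lhs} * {operand})", f"({rhs} * {operand})"
    raise ValueError(f"unknown operation: {op}")

def generate_derivation(premise, ops):
    """Return the list of equations from the premise to the goal,
    one equation per applied operation."""
    lhs, rhs = premise
    steps = [f"{lhs} = {rhs}"]
    for op, operand in ops:
        lhs, rhs = apply_op(lhs, rhs, op, operand)
        steps.append(f"{lhs} = {rhs}")
    return steps

# Example: derive a goal equation from the premise E = m*c**2.
for line in generate_derivation(("E", "m*c**2"), [("add", "U"), ("mul", "2")]):
    print(line)
```

Sampling random premises and random operation sequences in this way yields premise-goal pairs with full intermediate derivations, which is the kind of synthetic data the abstract describes for fine-tuning and for building perturbed (e.g. symbol-renamed) test sets.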
