A Symbolic Framework for Systematic Evaluation of Mathematical Reasoning with Transformers

05/21/2023
by Jordan Meadows, et al.

Whether Transformers can learn to apply symbolic rules and generalise to out-of-distribution examples is an open research question. In this paper, we devise a data generation method for producing intricate mathematical derivations, and systematically perturb them with respect to syntax, structure, and semantics. Our task-agnostic approach generates equations, annotations, and inter-equation dependencies, employing symbolic algebra for scalable data production and augmentation. We then instantiate a general experimental framework on next-equation prediction, assessing systematic mathematical reasoning and generalisation of Transformer encoders on a total of 200K examples. The experiments reveal that perturbations heavily affect performance and can reduce F1 scores from 97% to below 17%, suggesting that inference is dominated by surface-level patterns unrelated to a deeper understanding of mathematical operators. These findings underscore the importance of rigorous, large-scale evaluation frameworks for revealing fundamental limitations of existing models.
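
To make the setup concrete, the following is a minimal sketch of how a derivation step and a semantic perturbation of it could be produced with a computer algebra system. It assumes SymPy as the symbolic algebra engine, and the premise equation, the operator, and the perturbation are invented for illustration; the authors' actual generation pipeline, operator set, and annotation format are not reproduced here.

    # Hypothetical sketch (not the authors' released code): generate one
    # derivation step and a semantically perturbed distractor with SymPy.
    import sympy as sp

    x, y = sp.symbols("x y")

    # Premise equation: y = sin(x) * exp(x)
    premise = sp.Eq(y, sp.sin(x) * sp.exp(x))

    # Ground-truth next equation: differentiate both sides with respect
    # to x; the annotation records which operator links the equations.
    target = sp.Eq(sp.Derivative(y, x), sp.diff(premise.rhs, x))
    annotation = ("differentiate", premise, x)

    # Semantic perturbation: apply a different operator (integration), so
    # the surface form stays similar but the equation no longer follows.
    negative = sp.Eq(sp.Integral(y, x), sp.integrate(premise.rhs, x))

    print(sp.pretty(target))    # d/dx y = exp(x)*sin(x) + exp(x)*cos(x)
    print(sp.pretty(negative))  # distractor for next-equation prediction

In a next-equation prediction setting of this kind, a model would be given the premise (and annotation) and asked to identify or score the correct continuation against perturbed alternatives such as the one above.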
