Probing Quantifier Comprehension in Large Language Models

06/12/2023
by Akshat Gupta, et al.

With increasing size, large language models (LLMs) are becoming increasingly good at language understanding tasks. But even with high performance on specific downstream tasks, LLMs fail simple linguistic tests for negation or quantifier understanding. Previous work on testing LLMs' understanding of quantifiers suggests that as model size increases, they get better at understanding most-type quantifiers but increasingly worse at understanding few-type quantifiers, presenting a case of inverse scaling. In this paper, we question the claim of inverse scaling for few-type quantifier understanding in LLMs and show that it is the result of an inappropriate testing methodology. We also present alternative methods to measure quantifier comprehension in LLMs and show that, as model size increases, the observed behaviour differs from what previous research reports. LLMs consistently distinguish the meaning of few-type and most-type quantifiers, but when a quantifier is added to a phrase, they do not always take its meaning into account. In fact, we observe inverse scaling for most-type quantifiers, contrary to human psycholinguistic experiments and previous work: the model's understanding of most-type quantifiers gets worse as model size increases. We evaluate models ranging from 125M to 175B parameters. Our results suggest that LLMs do not handle quantifiers as well as expected, and that statistical co-occurrence of words still takes precedence over word meaning.
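
The probing approach described here amounts to comparing the probability a language model assigns to the same continuation when a few-type versus a most-type quantifier is used. Below is a minimal sketch of such a comparison, assuming the Hugging Face transformers library, the small gpt2 checkpoint, and a hand-picked example sentence; none of these choices come from the paper itself, which evaluates models from 125M to 175B parameters with its own test items and metrics.

```python
# Illustrative sketch: probing quantifier sensitivity by comparing how a causal LM
# scores the same continuation under "most"- vs "few"-type quantifiers.
# Assumptions: Hugging Face `transformers`, the small "gpt2" checkpoint, and a
# hand-picked example item -- none of these are taken from the paper itself.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumed checkpoint; the paper covers models from 125M to 175B parameters
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def continuation_logprob(prefix: str, continuation: str) -> float:
    """Sum of log-probabilities the model assigns to `continuation` given `prefix`."""
    prefix_ids = tokenizer(prefix, return_tensors="pt").input_ids
    full_ids = tokenizer(prefix + continuation, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits, dim=-1)
    total = 0.0
    # Score only the continuation tokens; logits at position i predict token i + 1.
    for pos in range(prefix_ids.shape[1], full_ids.shape[1]):
        token_id = full_ids[0, pos]
        total += log_probs[0, pos - 1, token_id].item()
    return total

# A hypothetical test item: the continuation is plausible under "Most"
# and should become less likely when the quantifier is "Few".
continuation = " manes."
for quantifier in ("Most", "Few"):
    prefix = f"{quantifier} lions have"
    score = continuation_logprob(prefix, continuation)
    print(f"{quantifier:>4}: log P(continuation) = {score:.2f}")
# A quantifier-sensitive model should assign a noticeably lower score in the "Few" case.
```

Running this across many items and model sizes, and tracking how the gap between the two scores changes with scale, is the general flavour of probing used to test claims of inverse scaling for few-type quantifiers.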

Related research

Can Large Language Models Truly Understand Prompts? A Case Study with Negated Prompts (09/26/2022)
Previous work has shown that there exists a scaling law between the size...

'Rarely' a problem? Language models exhibit inverse scaling in their predictions following 'few'-type quantifiers (12/16/2022)
Language Models appear to perform poorly on quantification. We ask how b...

The Larger They Are, the Harder They Fail: Language Models do not Recognize Identifier Swaps in Python (05/24/2023)
Large Language Models (LLMs) have successfully been applied to code gene...

Understanding How Model Size Affects Few-shot Instruction Prompting (12/04/2022)
Large Language Models are affected by the phenomena of memorizing and fo...

Testing AI performance on less frequent aspects of language reveals insensitivity to underlying meaning (02/23/2023)
Advances in computational methods and big data availability have recentl...

Inverse scaling can become U-shaped (11/03/2022)
Although scaling language models improves performance on a range of task...

A pragmatic theory of generic language (08/09/2016)
Generalizations about categories are central to human understanding, and...
