Probing Quantifier Comprehension in Large Language Models

by Akshat Gupta, et al.
J.P. Morgan

As they grow in size, large language models (LLMs) are becoming increasingly good at language understanding tasks. But even with high performance on specific downstream tasks, LLMs fail at simple linguistic tests of negation or quantifier understanding. Previous work testing LLMs' understanding of quantifiers suggests that as models grow larger, they get better at understanding most-type quantifiers but increasingly worse at understanding few-type quantifiers, thus presenting a case of an inverse scaling law. In this paper, we question the claim of inverse scaling in few-type quantifier understanding in LLMs and show that it is a result of inappropriate testing methodology. We also present alternate methods to measure quantifier comprehension in LLMs and show that, as model size increases, the resulting behaviours differ from those reported in previous research. LLMs are consistently able to understand the difference between the meanings of few-type and most-type quantifiers, but when a quantifier is added to a phrase, LLMs do not always take its meaning into account. In fact, we see an inverse scaling law for most-type quantifiers: the model's understanding of most-type quantifiers gets worse as model size increases, which is contrary to human psycholinguistic experiments and previous work. We evaluate models ranging from 125M to 175B parameters; the results suggest that LLMs do not handle quantifiers as well as expected and that statistical co-occurrence of words still takes precedence over word meaning.
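To make the idea of "taking the quantifier's meaning into account" concrete, here is a minimal sketch of one way a continuation-scoring probe could be set up. It is an illustration under stated assumptions, not the paper's protocol: the `Scorer` type stands in for a language model returning log P(continuation | context), and the `Item` fields, example sentences, and `cooccurrence_scorer` are all hypothetical. After "Most", a quantifier-aware model should prefer the typical continuation; after "Few", the atypical one. A scorer that relies only on word co-occurrence ignores the quantifier and therefore fails on "Few".

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

# Hypothetical scorer type: returns log P(continuation | context).
# In a real probe this would wrap an LLM; here it is an assumption.
Scorer = Callable[[str, str], float]


@dataclass(frozen=True)
class Item:
    subject: str   # e.g. "postmen" (illustrative)
    verb: str      # e.g. "carry"
    typical: str   # continuation that fits "Most", e.g. "mail"
    atypical: str  # continuation that fits "Few", e.g. "rocks"


def expected_preference(scorer: Scorer, item: Item, quantifier: str) -> bool:
    """True if the scorer ranks the two continuations as the quantifier demands."""
    context = f"{quantifier} {item.subject} {item.verb}"
    typical = scorer(context, item.typical)
    atypical = scorer(context, item.atypical)
    if quantifier == "Most":
        return typical > atypical
    if quantifier == "Few":
        return atypical > typical
    raise ValueError(f"unsupported quantifier: {quantifier!r}")


def quantifier_accuracy(scorer: Scorer, items: Sequence[Item]) -> Dict[str, float]:
    """Fraction of items on which the expected continuation wins, per quantifier."""
    return {
        q: sum(expected_preference(scorer, it, q) for it in items) / len(items)
        for q in ("Most", "Few")
    }


# Hypothetical toy scorer: looks only at subject/continuation co-occurrence
# and ignores the quantifier entirely.
TYPICAL_PAIRS = {("postmen", "mail"), ("farmers", "crops")}


def cooccurrence_scorer(context: str, continuation: str) -> float:
    subject = context.split()[1]  # word after the quantifier
    return 0.0 if (subject, continuation) in TYPICAL_PAIRS else -5.0


items = [
    Item("postmen", "carry", "mail", "rocks"),
    Item("farmers", "grow", "crops", "worms"),
]
print(quantifier_accuracy(cooccurrence_scorer, items))
# -> {'Most': 1.0, 'Few': 0.0}
```

Reporting accuracy separately per quantifier, rather than pooled, keeps a co-occurrence-driven model from looking competent: it scores perfectly on "Most" only because the typical continuation is also the statistically frequent one.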
