Testing AI performance on less frequent aspects of language reveals insensitivity to underlying meaning

02/23/2023
by Vittoria Dentella, et al.

Advances in computational methods and the availability of big data have recently translated into breakthroughs in AI applications. With successes in bottom-up challenges partially overshadowing shortcomings, the 'human-like' performance of Large Language Models has raised the question of how linguistic performance is achieved by algorithms. Given systematic shortcomings in generalization across many AI systems, in this work we ask whether linguistic performance in Large Language Models is indeed guided by knowledge of language. To this end, we prompt GPT-3 with a grammaticality judgement task and comprehension questions on less frequent constructions that are therefore unlikely to form part of Large Language Models' training data. These include grammatical 'illusions', semantic anomalies, complex nested hierarchies and self-embeddings. GPT-3 failed on every prompt but one, often offering answers that show a critical lack of understanding even of the high-frequency words used in these less frequent grammatical constructions. The present work sheds light on the boundaries of LLMs' allegedly human-like linguistic competence and argues that, far from human-like, the next-word prediction abilities of LLMs may face issues of robustness when pushed beyond their training data.
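The probing setup the abstract describes can be sketched roughly as follows. This is a minimal illustration only: the prompt template and the example sentences are assumptions for exposition, not the authors' actual stimuli, and the model call itself is omitted so the sketch stays self-contained.

```python
# Sketch of a grammaticality-judgement probe for an LLM.
# The template and stimuli below are illustrative assumptions,
# not the materials used in the paper.

def build_prompt(sentence: str) -> str:
    """Wrap a test sentence in a yes/no grammaticality-judgement prompt."""
    return (
        "Is the following sentence grammatical in English? "
        "Answer 'yes' or 'no'.\n\n"
        f"Sentence: {sentence}\nAnswer:"
    )

# Hypothetical examples of two construction types named in the abstract.
stimuli = {
    # A well-known 'comparative illusion': it sounds acceptable but is
    # semantically incoherent.
    "grammatical illusion": "More people have been to Russia than I have.",
    # A deeply centre-embedded (self-embedded) relative-clause structure.
    "self-embedding": "The rat the cat the dog chased bit died.",
}

prompts = [build_prompt(s) for s in stimuli.values()]

# A real probe would send each prompt to the model (e.g. via an API call);
# that step is left out here. We just show the prompts that would be sent.
for p in prompts:
    print(p, end="\n\n")
```

The point of pairing such prompts with comprehension questions, as the abstract notes, is to test whether the model's answers track the sentence's actual meaning rather than its surface plausibility.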


