
Testing AI performance on less frequent aspects of language reveals insensitivity to underlying meaning

by Vittoria Dentella, et al.

Advances in computational methods and the availability of big data have recently translated into breakthroughs in AI applications. With successes in bottom-up challenges partially overshadowing shortcomings, the 'human-like' performance of Large Language Models (LLMs) has raised the question of how linguistic competence is achieved by algorithms. Given systematic shortcomings in generalization across many AI systems, in this work we ask whether linguistic performance in LLMs is indeed guided by knowledge of language. To this end, we prompt GPT-3 with a grammaticality-judgement task and with comprehension questions on low-frequency constructions that are therefore unlikely to be well represented in LLMs' training data. These include grammatical 'illusions', semantic anomalies, complex nested hierarchies, and self-embeddings. GPT-3 failed on every prompt but one, often giving answers that reveal a critical lack of understanding even of the high-frequency words used in these low-frequency constructions. This work delineates the boundaries of the alleged human-like linguistic competence of AI and argues that, far from human-like, the next-word-prediction abilities of LLMs may face robustness issues when pushed beyond their training data.
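As an illustration of the kind of grammaticality-judgement prompting the abstract describes, the following is a minimal Python sketch. It is not the paper's actual code or stimuli: the prompt wording and the two example sentences (a comparative 'illusion' and a multiply self-embedded relative clause) are hypothetical stand-ins for the low-frequency constructions under study, and the sketch only builds the prompts rather than querying a model.

```python
# Illustrative sketch only -- not the paper's stimuli or evaluation code.
# Shows how a yes/no grammaticality-judgement prompt for an LLM
# might be constructed for low-frequency constructions.

def build_prompt(sentence: str) -> str:
    """Wrap a sentence in a yes/no grammaticality-judgement prompt."""
    return (
        "Is the following sentence grammatical? Answer 'yes' or 'no'.\n"
        f"Sentence: {sentence}\n"
        "Answer:"
    )

# Hypothetical examples of the construction types named in the abstract:
# a comparative 'illusion' (sounds acceptable but has no coherent meaning)
illusion = "More people have been to Russia than I have."
# a deep self-embedding (grammatical but very hard to process)
nested = "The rat the cat the dog chased bit died."

for sentence in (illusion, nested):
    print(build_prompt(sentence))
    print("---")
```

Each prompt string would then be sent to the model under test; comparing the model's yes/no answer (and its follow-up explanations) against human judgements is what exposes the insensitivity to meaning the paper reports.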



