Large Language Models are Fixated by Red Herrings: Exploring Creative Problem Solving and Einstellung Effect using the Only Connect Wall Dataset

by Saeid Naeini, et al.

The quest for human-imitative AI has been an enduring topic in AI research since its inception. The technical evolution and emerging capabilities of the latest cohort of large language models (LLMs) have reinvigorated the subject, carrying it beyond academia into the cultural zeitgeist. While recent NLP evaluation benchmark tasks test some aspects of human-imitative behaviour (e.g., BIG-bench's 'human-like behavior' tasks), few, if any, examine creative problem-solving abilities. Creative problem solving in humans is a well-studied topic in cognitive neuroscience, with standardized tests that predominantly use the ability to associate (heterogeneous) connections among clue words as a metric for creativity. Exposure to misleading stimuli (distractors dubbed red herrings) impedes human performance in such tasks via the fixation effect and the Einstellung paradigm. In cognitive neuroscience studies, such fixations are experimentally induced by pre-exposing participants to incorrect words that are orthographically similar to the subsequent word fragments or clues. The Connecting Wall segment of the popular British quiz show Only Connect essentially mimics Mednick's Remote Associates Test (RAT) formulation with built-in, deliberate red herrings, which makes it an ideal proxy dataset for studying the fixation effect and the Einstellung paradigm from cognitive neuroscience in LLMs. In addition to presenting the novel Only Connect Wall (OCW) dataset, we report results from our evaluation of selected pre-trained language models and LLMs (including OpenAI's GPT series) on creative problem-solving tasks such as grouping clue words by heterogeneous connections and identifying the correct open-knowledge-domain connection for each group. The code and link to the dataset are available at


Explore, Establish, Exploit: Red Teaming Language Models from Scratch

Deploying Large language models (LLMs) can pose hazards from harmful out...

In-Context Analogical Reasoning with Pre-Trained Language Models

Analogical reasoning is a fundamental capacity of human cognition that a...

Unleashing Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration

Human intelligence thrives on the concept of cognitive synergy, where co...

Efficiently Measuring the Cognitive Ability of LLMs: An Adaptive Testing Perspective

Large language models (LLMs), like ChatGPT, have shown some human-like c...

An Empirical Study of Metrics to Measure Representational Harms in Pre-Trained Language Models

Large-scale Pre-Trained Language Models (PTLMs) capture knowledge from m...

Decoding the Enigma: Benchmarking Humans and AIs on the Many Facets of Working Memory

Working memory (WM), a fundamental cognitive process facilitating the te...

The Two Word Test: A Semantic Benchmark for Large Language Models

Large Language Models (LLMs) have shown remarkable abilities recently, i...
