Are Large Language Models Really Good Logical Reasoners? A Comprehensive Evaluation From Deductive, Inductive and Abductive Views

by Fangzhi Xu, et al.

Large Language Models (LLMs) have achieved great success in various natural language tasks. This success has aroused much interest in evaluating the specific reasoning capabilities of LLMs, such as multilingual reasoning and mathematical reasoning. However, logical reasoning, one of the key reasoning perspectives, has not yet been thoroughly evaluated. In this work, we aim to bridge this gap and provide comprehensive evaluations. Firstly, to offer systematic evaluations, this paper selects fifteen typical logical reasoning datasets and organizes them into deductive, inductive, abductive and mixed-form reasoning settings. For comprehensiveness, we include three representative LLMs (i.e., text-davinci-003, ChatGPT and BARD) and evaluate them on all selected datasets under zero-shot, one-shot and three-shot settings. Secondly, unlike previous evaluations that rely only on simple metrics (e.g., accuracy), we propose fine-grained evaluations in both objective and subjective manners, covering both answers and explanations. Also, to uncover the logical flaws of LLMs, bad cases are attributed to five error types from two dimensions. Thirdly, to avoid the influence of knowledge bias and focus purely on benchmarking the logical reasoning capability of LLMs, we propose a new dataset with neutral content. It contains 3K samples and covers deductive, inductive and abductive reasoning settings. Based on these in-depth evaluations, this paper finally presents ability maps of logical reasoning capability along six dimensions (i.e., correct, rigorous, self-aware, active, oriented and no hallucination). These maps reflect the pros and cons of LLMs and give guiding directions for future work.




Mind Reasoning Manners: Enhancing Type Perception for Generalized Zero-shot Logical Reasoning over Text

Logical reasoning task involves diverse types of complex reasoning over ...

Logical Tasks for Measuring Extrapolation and Rule Comprehension

Logical reasoning is essential in a variety of human activities. A repre...

Zero-Shot Classification by Logical Reasoning on Natural Language Explanations

Humans can classify an unseen category by reasoning on its language expl...

Logical Fallacy Detection

Reasoning is central to human intelligence. However, fallacious argument...

MathAttack: Attacking Large Language Models Towards Math Solving Ability

With the boom of Large Language Models (LLMs), the research of solving M...

OPT-R: Exploring the Role of Explanations in Finetuning and Prompting for Reasoning Skills of Large Language Models

In this paper, we conduct a thorough investigation into the reasoning ca...

Case-Based Reasoning with Language Models for Classification of Logical Fallacies

The ease and the speed of spreading misinformation and propaganda on the...
