Towards Benchmarking and Improving the Temporal Reasoning Capability of Large Language Models

by Qingyu Tan et al.
National University of Singapore
Alibaba Group

Reasoning about time is of fundamental importance, as many facts are time-dependent: for example, athletes change teams from time to time, and different government officials are elected periodically. Previous time-dependent question answering (QA) datasets tend to be biased in either their coverage of time spans or their question types. In this paper, we introduce a comprehensive probing dataset to evaluate the temporal reasoning capability of large language models. Our dataset includes questions at three levels of temporal reasoning. In addition, we propose a novel learning framework to improve the temporal reasoning capability of large language models, based on temporal span extraction and time-sensitive reinforcement learning. We conducted experiments in closed-book QA, open-book QA, and reasoning QA settings and demonstrated the effectiveness of our approach. Our code and data are publicly released.
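The core idea of a time-dependent fact, where the same question yields different answers depending on the queried time, can be illustrated with a minimal sketch. The field names and toy facts below are hypothetical for illustration, not the dataset's actual schema:

```python
from dataclasses import dataclass

@dataclass
class TimeQAExample:
    """A time-sensitive QA item: the answer depends on the queried time."""
    question: str
    time_scope: tuple  # (start_year, end_year) during which the fact holds
    answer: str

# A time-dependent fact: the same subject and relation yield
# different answers for different query years.
facts = [
    TimeQAExample("Which team did the athlete play for in 2009?",
                  (2007, 2010), "Team A"),
    TimeQAExample("Which team did the athlete play for in 2012?",
                  (2011, 2014), "Team B"),
]

def answer_at(year, examples):
    """Return the answer whose time scope covers the queried year."""
    for ex in examples:
        start, end = ex.time_scope
        if start <= year <= end:
            return ex.answer
    return None  # queried year falls outside the covered spans

print(answer_at(2009, facts))  # Team A
print(answer_at(2012, facts))  # Team B
```

A model with weak temporal grounding tends to conflate these two answers; probing questions of this form test whether it can associate each answer with its correct time scope.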



