Sub-Task Decomposition Enables Learning in Sequence to Sequence Tasks

by   Noam Wies, et al.

The field of Natural Language Processing (NLP) has experienced a dramatic leap in capabilities with the recent introduction of huge Language Models (LMs). Despite this success, natural language problems that involve several compounded steps are still practically unlearnable, even by the largest LMs. This complies with experimental failures for end-to-end learning of composite problems that were demonstrated in a variety of domains. A known mitigation is to introduce intermediate supervision for solving sub-tasks of the compounded problem. Recently, several works have demonstrated high gains by taking a straightforward approach for incorporating intermediate supervision in compounded natural language problems: the sequence-to-sequence LM is fed with an augmented input, in which the decomposed tasks' labels are simply concatenated to the original input. In this paper, we prove a positive learning result that motivates these recent efforts. We show that when concatenating intermediate supervision to the input and training a sequence-to-sequence model on this modified input, an unlearnable composite problem becomes learnable. We prove this for the notoriously unlearnable composite task of bit-subset parity, with the intermediate supervision being parity results of increasingly large bit-subsets. Beyond motivating contemporary empirical efforts for incorporating intermediate supervision in sequence-to-sequence language models, our positive theoretical result is the first of its kind in the landscape of results on the benefits of intermediate supervision: Until now, all theoretical results on the subject are negative, i.e., show cases where learning is impossible without intermediate supervision, while our result is positive, showing a case where learning is facilitated in the presence of intermediate supervision.


page 1

page 2

page 3

page 4


Generating Intermediate Steps for NLI with Next-Step Supervision

The Natural Language Inference (NLI) task often requires reasoning over ...

Text-to-OverpassQL: A Natural Language Interface for Complex Geodata Querying of OpenStreetMap

We present Text-to-OverpassQL, a task designed to facilitate a natural l...

String Transduction with Target Language Models and Insertion Handling

Many character-level tasks can be framed as sequence-to-sequence transdu...

Semantic Parsing Natural Language into Relational Algebra

Natural interface to database (NLIDB) has been researched a lot during t...

A Theory of Emergent In-Context Learning as Implicit Structure Induction

Scaling large language models (LLMs) leads to an emergent capacity to le...

PDPP:Projected Diffusion for Procedure Planning in Instructional Videos

In this paper, we study the problem of procedure planning in instruction...

SpanDrop: Simple and Effective Counterfactual Learning for Long Sequences

Distilling supervision signal from a long sequence to make predictions i...

Please sign up or login with your details

Forgot password? Click here to reset