Text-to-Table: A New Way of Information Extraction

by Xueqing Wu, et al.

We study a new problem setting of information extraction (IE), referred to as text-to-table, which can be viewed as the inverse of the well-studied table-to-text problem. In text-to-table, given a text, one creates a table or several tables expressing the main content of the text, and the model is learned from text-table pair data. The problem setting differs from those of existing IE methods in two ways. First, the extraction can be carried out from long texts to large tables with complex structures. Second, the extraction is entirely data-driven, and there is no need to explicitly define schemas. As far as we know, no previous work has studied this problem. In this work, we formalize text-to-table as a sequence-to-sequence (seq2seq) problem. We first employ a seq2seq model fine-tuned from a pre-trained language model to perform the task. We also develop a new method within the seq2seq approach that exploits two additional techniques in table generation: table constraint and table relation embeddings. We use four existing table-to-text datasets in our experiments on text-to-table. Experimental results show that the vanilla seq2seq model can outperform baseline methods that use relation extraction and named entity extraction. The results also show that our method can further improve the performance of the vanilla seq2seq model. We further discuss the main challenges of the proposed task. The code and data will be made publicly available.
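To make the seq2seq framing concrete, here is a minimal sketch of how a table could be flattened into a target sequence and recovered from a generated one. The separator strings, the function names, and the example table are all assumptions made for illustration; the abstract does not specify the serialization. The `enforce_width` helper is only a simple post-hoc stand-in for the idea of keeping every row as wide as the header row, and is not the authors' table constraint, which the abstract describes as a technique used in table generation.

```python
from typing import List

# Hypothetical separators; the abstract does not say which tokens are used.
CELL_SEP = "|"
ROW_SEP = "<NEWLINE>"


def linearize(table: List[List[str]]) -> str:
    """Flatten a table (list of rows) into one target string for a seq2seq model."""
    return f" {ROW_SEP} ".join(f" {CELL_SEP} ".join(row) for row in table)


def delinearize(sequence: str) -> List[List[str]]:
    """Split a generated string back into rows and cells.

    Assumes cell text never contains the separator strings themselves.
    """
    return [
        [cell.strip() for cell in row.split(CELL_SEP)]
        for row in sequence.split(ROW_SEP)
    ]


def enforce_width(rows: List[List[str]]) -> List[List[str]]:
    """Pad or truncate each row to the width of the first (header) row.

    Post-hoc approximation only, labeled as an assumption of this sketch.
    """
    width = len(rows[0])
    return [(row + [""] * width)[:width] for row in rows]


if __name__ == "__main__":
    table = [
        ["Team", "Points", "Rebounds"],
        ["Heat", "102", "41"],
        ["Bulls", "97", ""],
    ]
    target = linearize(table)
    print(target)
    print(enforce_width(delinearize(target)))
```

In this framing the source text is the model input and the linearized string is the training target; at inference time the generated string is parsed back into rows and cells.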




