Unified Text Structuralization with Instruction-tuned Language Models

03/27/2023
by   Xuanfan Ni, et al.

Text structuralization is an important field of natural language processing (NLP) that consists of information extraction (IE) and structure formalization. However, current studies of text structuralization suffer from a shortage of manually annotated, high-quality datasets across domains and languages, because annotating such data requires specialized professional knowledge. In addition, most IE methods are designed for a specific type of structured data, e.g., entities, relations, or events, which makes them hard to generalize to other types. In this work, we propose a simple and efficient approach that instructs a large language model (LLM) to extract a variety of structures from texts. More concretely, we add a prefix instruction and a suffix instruction to indicate the desired IE task and structure type, respectively, before feeding the text into an LLM. Experiments on two LLMs show that this approach enables language models to perform comparably to other state-of-the-art methods on datasets spanning a variety of languages and knowledge domains, and that it generalizes to other IE sub-tasks by changing the content of the instructions. Another benefit of our approach is that it can help researchers build datasets at low cost in low-resource and domain-specific scenarios, e.g., in finance and law.
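
The prefix-and-suffix scheme described in the abstract can be illustrated with a minimal sketch. The instruction wording, the task keys ("ner", "re", "ee"), and the function name build_prompt are all hypothetical and chosen only for illustration; the abstract does not give the paper's actual templates, and the call to the LLM itself is left out.

```python
# Hypothetical instruction templates (assumptions, not the paper's wording).
# The prefix names the IE task; the suffix names the desired structure type.
PREFIXES = {
    "ner": "Extract all named entities from the following text.",
    "re": "Extract all relations between entities from the following text.",
    "ee": "Extract all events from the following text.",
}
SUFFIXES = {
    "ner": "Answer as a list of (entity, type) pairs.",
    "re": "Answer as a list of (head, relation, tail) triples.",
    "ee": "Answer as a list of (trigger, event type, arguments) records.",
}


def build_prompt(text: str, task: str) -> str:
    """Wrap `text` between a task prefix and a structure-type suffix."""
    if task not in PREFIXES:
        raise ValueError(f"unknown task: {task!r}")
    # Prefix instruction, then the input text, then the suffix instruction.
    return f"{PREFIXES[task]}\n\n{text.strip()}\n\n{SUFFIXES[task]}"


if __name__ == "__main__":
    # The resulting string is what would be fed to an LLM of your choice.
    print(build_prompt("Apple was founded by Steve Jobs in Cupertino.", "re"))
```

Under these assumptions, switching to another IE sub-task only means selecting a different pair of instructions; the input text and the model stay the same.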

