Learnings from Data Integration for Augmented Language Models

by Alon Halevy, et al.

One of the limitations of large language models is that they do not have access to up-to-date, proprietary, or personal data. As a result, there are multiple efforts underway to extend language models with techniques for accessing external data. In that sense, LLMs share the vision of data integration systems, whose goal is to provide seamless access to a large collection of heterogeneous data sources. While the details and the techniques of LLMs differ greatly from those of data integration, this paper shows that some of the lessons learned from research on data integration can elucidate the research path we are pursuing today on language models.

