Generative Multimodal Entity Linking

06/22/2023
by Senbao Shi, et al.

Multimodal Entity Linking (MEL) is the task of mapping mentions with multimodal contexts to the referent entities from a knowledge base (e.g., Wikipedia). Prior MEL methods mainly focus on designing complex multimodal interaction mechanisms and require fine-tuning all model parameters, which can be prohibitively costly and difficult to scale in the era of Large Language Models (LLMs). In this work, we propose GEMEL, a simple yet effective Generative Multimodal Entity Linking method, which leverages the capabilities of LLMs from large-scale pre-training to directly generate target entity names. We keep the vision and language models frozen and only train a linear layer to enable cross-modality interactions. To adapt LLMs to the MEL task, we take advantage of the emerging in-context learning (ICL) capability of LLMs by retrieving multimodal instances as demonstrations. Extensive experiments show that with only 0.3% of the model parameters fine-tuned, GEMEL achieves state-of-the-art results on two well-established MEL datasets (4.1% accuracy gains on WikiDiverse and 15.4% accuracy gains on WikiMEL). Our approach is compatible with any off-the-shelf language model, paving the way towards an efficient and general solution for utilizing LLMs in the MEL task.
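The abstract names two mechanisms: a single trainable linear layer that maps features from a frozen vision model into the frozen LLM's input space, and retrieval of similar multimodal instances to serve as in-context demonstrations. The following is a minimal sketch of both ideas under stated assumptions, not GEMEL's actual code. It assumes features are precomputed by frozen encoders and pooled into one vector per instance, and it uses plain cosine similarity for retrieval (the abstract does not say how retrieval is scored). The names `Instance`, `linear_project`, `linear_param_count`, `retrieve_demos`, and `build_prompt`, as well as the prompt template, are hypothetical. In the real method the projected visual features are fed to the LLM as embeddings alongside the text; here the prompt is rendered as text only, for readability.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence

Vector = List[float]


@dataclass
class Instance:
    """A mention with its multimodal context (hypothetical structure)."""
    mention: str
    text: str
    feature: Vector                  # assumed: pooled feature from frozen encoders
    entity: Optional[str] = None     # gold entity name; set only for demonstrations


def linear_project(x: Vector, weight: List[Vector], bias: Vector) -> Vector:
    """y = W x + b: maps a visual feature into the LLM embedding space.
    In this sketch, W and b are the only trainable parameters."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(weight, bias)]


def linear_param_count(d_in: int, d_out: int) -> int:
    """Trainable parameters in a d_in -> d_out linear layer with bias."""
    return d_in * d_out + d_out


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_demos(query: Instance, pool: Sequence[Instance], k: int) -> List[Instance]:
    """Return the k labelled instances most similar to the query (assumed scoring)."""
    ranked = sorted(pool, key=lambda inst: cosine(query.feature, inst.feature),
                    reverse=True)
    return ranked[:k]


def build_prompt(demos: Sequence[Instance], query: Instance) -> str:
    """Demonstrations first, then the query; the LLM continues after 'Entity:'
    by generating the entity name. The template is hypothetical."""
    blocks = [f"Context: {d.text}\nMention: {d.mention}\nEntity: {d.entity}"
              for d in demos]
    blocks.append(f"Context: {query.text}\nMention: {query.mention}\nEntity:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    pool = [
        Instance("Jordan", "Jordan dunks during the 1991 Finals.",
                 [0.9, 0.1, 0.0], "Michael Jordan"),
        Instance("Jordan", "Petra is a famous site in Jordan.",
                 [0.0, 0.2, 0.9], "Jordan (country)"),
    ]
    query = Instance("Jordan", "Jordan celebrates a buzzer-beater.", [0.8, 0.2, 0.1])
    demos = retrieve_demos(query, pool, k=1)
    print(demos[0].entity)  # Michael Jordan
    print(build_prompt(demos, query))
```

Because only `weight` and `bias` would receive gradients, the trainable budget stays small: an illustrative 1024-to-4096 projection has `linear_param_count(1024, 4096)` = 4,198,400 parameters, a small fraction of a multi-billion-parameter LLM. These dimensions are examples, not the paper's configuration.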
