Team RUC_AIM3 Technical Report at ActivityNet 2021: Entities Object Localization

06/11/2021
by Ludan Ruan, et al.

Entities Object Localization (EOL) evaluates how grounded, or faithful, a generated description is, and consists of two sub-tasks: caption generation and object grounding. Previous works tackle the problem by jointly training the two modules in a single framework, which limits the complexity of each module. In this work, we therefore divide the system into two stages and improve each stage separately to boost overall performance. For caption generation, we propose a Unified Multi-modal Pre-training Model (UMPM) to generate event descriptions rich in objects, which benefits the subsequent localization. For object grounding, we fine-tune the state-of-the-art detection model MDETR and design a post-processing method to make the grounding results more faithful. Our overall system achieves state-of-the-art performance on both sub-tasks of the Entities Object Localization challenge at ActivityNet 2021, with 72.57 localization accuracy on the testing set of sub-task I and 0.2477 F1_all_per_sent on the hidden testing set of sub-task II.
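The two-stage design described above can be sketched in a few lines. This is a minimal illustrative sketch, not the authors' implementation: `generate_caption` and `ground_objects` are hypothetical stand-ins for the UMPM captioner and the fine-tuned MDETR grounding model, and the score threshold in the post-processing step is an assumed example value. Only the structure (decoupled stages plus a filtering step for faithfulness) mirrors the report.

```python
def generate_caption(video_features):
    """Stage 1: stand-in for the UMPM caption generator (hypothetical)."""
    return "a man throws a ball to a dog"

def ground_objects(caption, frame):
    """Stage 2: stand-in for the fine-tuned MDETR grounding model.

    Returns (noun, bounding_box, score) triples; boxes are (x1, y1, x2, y2).
    Values here are dummy data for illustration only.
    """
    return [
        ("man",  (10, 20, 50, 120), 0.92),
        ("ball", (60, 30, 70, 40),  0.35),
        ("dog",  (80, 60, 140, 110), 0.88),
    ]

def postprocess(detections, score_thresh=0.5):
    """Drop low-confidence groundings so the kept results stay faithful.

    The threshold value is an assumption; the report does not specify
    the exact post-processing rule.
    """
    return [d for d in detections if d[2] >= score_thresh]

def localize_entities(video_features, frame):
    """Run the decoupled pipeline: caption first, then ground its objects."""
    caption = generate_caption(video_features)
    detections = ground_objects(caption, frame)
    return caption, postprocess(detections)
```

Because the stages are decoupled, each module can be replaced or scaled independently, which is the report's central argument against joint training.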

