UKnow: A Unified Knowledge Protocol for Common-Sense Reasoning and Vision-Language Pre-training

02/14/2023
by Biao Gong, et al.

This work presents UKnow, a unified knowledge protocol that facilitates knowledge-based studies from the perspective of data. Focusing on the visual and linguistic modalities, we categorize data knowledge into five unit types: in-image, in-text, cross-image, cross-text, and image-text. Following this protocol, we collect, from public international news, a large-scale multimodal knowledge graph dataset consisting of 1,388,568 nodes (571,791 of them vision-related) and 3,673,817 triplets. The dataset is also annotated with rich event tags, comprising 96 coarse labels and 9,185 fine labels, which broadens its potential uses. To further verify that UKnow can serve as a standard protocol, we set up an efficient pipeline that helps reorganize existing datasets into the UKnow format. Finally, we benchmark several widely used baselines on common-sense reasoning and vision-language pre-training. Results on both our new dataset and the reformatted public datasets demonstrate the effectiveness of UKnow for knowledge organization and method evaluation. Code, dataset, conversion tool, and baseline models will be made public.
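
As a rough illustration of the data model the abstract describes, the sketch below represents knowledge-graph triplets tagged with one of the five unit types. The class names (`KnowledgeType`, `Node`, `Triplet`), the modality labels, the relation names, and the sample nodes are all hypothetical; the actual UKnow schema is not specified in this abstract.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class KnowledgeType(Enum):
    """The five unit types named in the abstract."""
    IN_IMAGE = "in-image"
    IN_TEXT = "in-text"
    CROSS_IMAGE = "cross-image"
    CROSS_TEXT = "cross-text"
    IMAGE_TEXT = "image-text"


@dataclass(frozen=True)
class Node:
    node_id: str
    modality: str  # assumed labels: "vision" or "language"


@dataclass(frozen=True)
class Triplet:
    head: Node
    relation: str  # hypothetical relation name
    tail: Node
    kind: KnowledgeType


def count_by_type(triplets):
    """Count how many triplets fall under each unit type."""
    return Counter(t.kind for t in triplets)


def vision_nodes(triplets):
    """Collect the distinct vision-related nodes that appear in any triplet."""
    nodes = {n for t in triplets for n in (t.head, t.tail)}
    return {n for n in nodes if n.modality == "vision"}


# Toy data, invented purely for illustration.
img1 = Node("img:1", "vision")
img2 = Node("img:2", "vision")
cap1 = Node("txt:1", "language")

triplets = [
    Triplet(img1, "described_by", cap1, KnowledgeType.IMAGE_TEXT),
    Triplet(img1, "same_event_as", img2, KnowledgeType.CROSS_IMAGE),
]

print(count_by_type(triplets))
print(len(vision_nodes(triplets)))
```

Tagging each triplet with its unit type, rather than inferring the type from node modalities, keeps the sketch neutral about how the protocol actually assigns types.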

research
10/17/2022

Contrastive Language-Image Pre-Training with Knowledge Graphs

Recent years have witnessed the fast development of large-scale pre-trai...
research
09/04/2023

Unified Pre-training with Pseudo Texts for Text-To-Image Person Re-identification

The pre-training task is indispensable for the text-to-image person re-i...
research
06/10/2023

Multi-modal Pre-training for Medical Vision-language Understanding and Generation: An Empirical Study with A New Benchmark

With the availability of large-scale, comprehensive, and general-purpose...
research
08/31/2023

ViLTA: Enhancing Vision-Language Pre-training through Textual Augmentation

Vision-language pre-training (VLP) methods are blossoming recently, and ...
research
06/15/2022

Prefix Language Models are Unified Modal Learners

With the success of vision-language pre-training, we have witnessed the ...
research
06/05/2023

Towards Unified Text-based Person Retrieval: A Large-scale Multi-Attribute and Language Search Benchmark

In this paper, we introduce a large Multi-Attribute and Language Search ...
research
07/15/2022

Reasoning about Actions over Visual and Linguistic Modalities: A Survey

'Actions' play a vital role in how humans interact with the world and en...
