Multi-modal pre-training and knowledge discovery are two important resea...
In this paper, we propose a Multi-stage Vision-language Pre-TRaining (MV...
The matching model is essential to an image-text retrieval framework. Existing...
Existing research on image-text retrieval mainly relies on sentence-lev...
Existing research on image captioning usually represents an image using...
In this paper, we focus on the problem of unsupervised image-sentence ma...
Recognizing text in the wild is a challenging task because of com...