Image-Text Retrieval with Binary and Continuous Label Supervision

10/20/2022
by   Zheng Li, et al.

Most image-text retrieval work adopts binary labels that indicate whether an image-text pair matches. Such a binary indicator covers only a limited subset of image-text semantic relations and cannot represent the degrees of relevance between images and texts that continuous labels, such as image captions, can describe. As a result, the visual-semantic embedding space learned from binary labels is incoherent and cannot fully characterize relevance degrees. In addition to binary labels, this paper incorporates continuous pseudo labels (generally approximated by the text similarity between captions) to indicate relevance degrees. To learn a coherent embedding space, we propose an image-text retrieval framework with Binary and Continuous Label Supervision (BCLS), in which binary labels guide the retrieval model to learn limited binary correlations and continuous labels complement the learning of image-text semantic relations. For binary labels, we extend the common triplet ranking loss with soft negative mining (Triplet-SN) to improve convergence. For continuous labels, we design a Kendall ranking loss, inspired by the Kendall rank correlation coefficient, which improves the correlation between the similarity scores predicted by the retrieval model and the continuous labels. Because the continuous pseudo labels introduce noise, we further design a Sliding Window sampling and Hard Sample mining strategy (SW-HS) that alleviates the impact of this noise and reduces the complexity of our framework to the same order of magnitude as the triplet ranking loss. Extensive experiments on two image-text retrieval benchmarks demonstrate that our method can improve the performance of state-of-the-art image-text retrieval models.
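As a rough illustration of the idea behind a Kendall-style ranking objective, the sketch below computes the Kendall rank correlation coefficient between a model's similarity scores and continuous relevance labels, along with one possible smooth surrogate. This is not the paper's loss: the function names, the tanh relaxation, and the `temperature` parameter are all assumptions made for illustration, and the abstract does not specify the actual formulation.

```python
import numpy as np


def kendall_tau(scores, labels):
    """Kendall tau-a between predicted scores and continuous labels.

    Tied pairs contribute 0 (neither concordant nor discordant).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    # Entry (i, j) holds x[i] - x[j] for every pair of candidates.
    score_diff = scores[:, None] - scores[None, :]
    label_diff = labels[:, None] - labels[None, :]
    # +1 for a concordant pair, -1 for a discordant pair, 0 for a tie.
    agreement = np.sign(score_diff) * np.sign(label_diff)
    # Count each unordered pair once (strict upper triangle).
    upper = np.triu_indices(len(scores), k=1)
    return float(agreement[upper].mean())


def kendall_surrogate_loss(scores, labels, temperature=0.1):
    """Hypothetical smooth surrogate: tanh replaces sign on the score side.

    The value approaches 0 when the score ordering matches the label
    ordering and approaches 2 when it is fully reversed.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    score_diff = scores[:, None] - scores[None, :]
    label_diff = labels[:, None] - labels[None, :]
    upper = np.triu_indices(len(scores), k=1)
    soft_agreement = np.tanh(score_diff[upper] / temperature) * np.sign(label_diff[upper])
    return float(1.0 - soft_agreement.mean())


if __name__ == "__main__":
    # Continuous pseudo labels for three candidate captions of one image query.
    labels = [0.8, 0.6, 0.2]
    print(kendall_tau([0.9, 0.5, 0.1], labels))  # same ordering as the labels
    print(kendall_tau([0.1, 0.5, 0.9], labels))  # reversed ordering
    print(kendall_surrogate_loss([0.9, 0.5, 0.1], labels))
```

Comparing every pair of candidates costs quadratic time in the number of candidates per query. The abstract says that SW-HS reduces the framework's complexity to the same order of magnitude as the triplet ranking loss.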


