Prompt tuning and adapter tuning have shown great potential in transferr...
Recent vision transformers, large-kernel CNNs and MLPs have attained rem...
U-Net, known for its simple yet efficient architecture, is widely utiliz...
The explosive growth of rumors with text and images on social media plat...
Video understanding tasks have traditionally been modeled by two separat...
Weakly supervised object localization (WSOL) is a challenging task aimin...
Grounding 3D object affordance seeks to locate objects' “action possibil...
This work presents two astonishing findings on neural networks learned f...
Weakly supervised Referring Expression Grounding (REG) aims to ground a ...
Low-light image enhancement is an inherently subjective process whose ta...
The rank of neural networks measures information flowing across layers. ...
Fake news spreads at an unprecedented speed, reaches global audiences an...
Graph neural architecture search has sparked much attention as Graph Neu...
Humans can extrapolate well, generalize daily knowledge into unseen scena...
In this paper, we study the problem of stereo matching from a pair of im...
In recent years, creative content generation, such as style transfer and ne...
MLP-like models built entirely upon multi-layer perceptrons have recentl...
Camouflage is a common visual phenomenon, which refers to hiding the for...
Non-exemplar class-incremental learning aims to recognize both the old and...
Training a generative adversarial network (GAN) with limited data has be...
RGB-infrared person re-identification is an emerging cross-modality re-i...
Generalizable person re-identification aims to learn a model with only s...
Unsupervised domain adaptive person re-identification (ReID) has been ex...
Existing disentangled-based methods for generalizable person re-identifi...
Graph neural networks (GNNs) have been successfully applied to learning ...
Convolutional neural networks (CNNs) are the dominant deep neural network...
We study the problem of localizing audio-visual events that are both aud...
Advanced self-supervised visual representation learning methods rely on ...
An important scenario for image quality assessment (IQA) is to evaluate ...
Occluded person re-identification (ReID) aims to match person images wit...
Detection transformers have recently shown promising object detection re...
Few-shot class-incremental learning aims to recognize the new classes give...
Although existing person re-identification (Re-ID) methods have shown im...
Video-based person re-identification aims to match pedestrians from vide...
Cross-modal video-text retrieval, a challenging task in the field of vis...
Graph neural networks (GNNs) emerged recently as a standard toolkit for ...
Unsupervised Domain Adaptive (UDA) person re-identification (ReID) aims ...
The capability of image semantic segmentation may deteriorate due to...
This paper presents a self-supervised learning framework, named MGF, for...
Predicting future frames of video sequences is challenging due to the co...
Many unsupervised domain adaptive (UDA) person re-identification (ReID) ...
Video-based person re-identification aims to match a specific pedestrian...
In this paper, we explore the task of generating photo-realistic face im...
Increasing the visibility of nighttime hazy images is challenging becaus...
Generating natural language descriptions for videos, i.e., video caption...
Few-shot segmentation aims at assigning a category label to each image p...
Despite the success in still image recognition, deep neural networks for...
Person re-identification aims at identifying a certain pedestrian across...
Existing dominant approaches for cross-modal video-text retrieval task a...
Active learning aims to design label-efficient algorithms by sampling the ...