Recent studies have shown that dense retrieval models, lacking dedicated...
The rising demand for creating lifelike avatars in the digital realm has...
Multimodal Large Language Models (MLLMs) have recently sparked significa...
In this work, we investigate the problem of out-of-distribution (OOD)
ge...
Visual relation extraction (VRE) aims to extract relations between entit...
Recent text-to-image generation models have shown promising results in
g...
We present an end-to-end diffusion-based method for editing videos with ...
This paper discusses the feasibility of continuously training the CLIP m...
We present : a question generation framework
with controllable comprehen...
Conventional multi-label classification (MLC) methods assume that all sa...
Scene Graph Generation (SGG) aims to extract <subject, predicate, object...
Prompt tuning is a parameter-efficient method, which freezes all PLM
par...
Dynamic early exiting has been proven to improve the inference speed of ...
Prompt tuning, a recently emerging paradigm, enables the powerful
vision...
Despite significant recent progress, large-scale graph
repr...
Generative transformers have shown their superiority in synthesizing
hig...
Recently, much effort has been devoted to designing graph self-supervised me...
Temporal grounding is the task of locating a specific segment from an
un...
Novel category discovery aims at adapting models trained on known catego...
Logical rules, both transferable and explainable, are widely used as wea...
Predicting the impact of publications in science and technology has beco...
Large-scale vision-language pre-training has shown impressive advances i...
Understanding human emotions is a crucial ability for intelligent robots...
Content-Based Image Retrieval (CIR) aims to search for a target image by...
While annotating decent amounts of data to satisfy sophisticated learnin...
Recent years have seen a surge of interest in meta-learning techniques f...
Existing metrics for assessing question generation not only require cost...
Graph contrastive learning has gained significant progress recently. How...
Temporal grounding in videos aims to localize one target video segment t...
Machine Reading Comprehension (MRC) reveals the ability to understand a ...
Training deep models for RGB-D salient object detection (SOD) often requ...
Contemporary visual captioning models frequently hallucinate objects...
Grounded video description (GVD) encourages captioning models to attend ...
Existing Class Incremental Learning (CIL) methods are based on a supervi...
This paper investigates the feasibility of learning good representation ...
It is a consensus that small models perform quite poorly under the parad...
The challenge of Class Incremental Learning (CIL) lies in difficulty...
Video-and-Language Inference is a recently proposed task for joint
video...
The journey of reducing noise from distant supervision (DS) generated
tr...
With recent advances in distantly supervised (DS) relation extraction (R...
In many real-world games, such as traders repeatedly bargaining with
cus...
The recently emerged weakly supervised object localization (WSOL) methods ...
In this paper, we propose a novel graph learning framework for phrase
gr...
Recently, a newly proposed self-supervised framework Bootstrap Your Own
...
In this paper, we investigate the problem of text-to-pedestrian synthesi...
When patients need to take medicine, particularly taking more than one k...
Visual Storytelling (VIST) is a task to tell a narrative story about a
c...
Infectious keratitis is the most common entity among corneal diseases, in...
This paper reviews the NTIRE 2020 challenge on real image denoising with...
Visualization-oriented natural language interfaces (V-NLIs) have been
ex...