Recent studies have shown that dense retrieval models, lacking dedicated...
Multimodal Large Language Models (MLLMs) have recently sparked significa...
Whole slide image (WSI) classification is an essential task in computati...
Deep learning (DL) has proven highly effective for ultrasound-based comp...
This paper presents Sim-Suction, a robust object-aware suction grasp pol...
The multi-scale information among the whole slide images (WSIs) is essen...
Visual relation extraction (VRE) aims to extract relations between entit...
Recent text-to-image generation models have shown promising results in g...
We present an end-to-end diffusion-based method for editing videos with ...
In this paper, we present Sim-MEES: a large-scale synthetic dataset that...
Fast and accurate MRI reconstruction is a key concern in modern clinical...
Transformer-based image denoising methods have achieved encouraging resu...
Stereo image super-resolution aims to boost the performance of image sup...
Scene Graph Generation (SGG) aims to extract <subject, predicate, object...
Prompt tuning is a parameter-efficient method, which freezes all PLM par...
Prompt tuning, a recently emerging paradigm, enables the powerful vision...
In the past decade, convolutional neural networks (CNNs) have shown prom...
Temporal grounding is the task of locating a specific segment from an un...
Recently, great progress has been made in single-image super-resolution ...
Most of the work in auction design literature assumes that bidders behav...
Large-scale vision-language pre-training has shown impressive advances i...
Understanding human emotions is a crucial ability for intelligent robots...
This paper studies a simple extension of image-based Masked Autoencoders...
Image restoration under severe weather is a challenging task. Most of th...
Content-Based Image Retrieval (CIR) aims to search for a target image by...
With the development of deep learning, single image super-resolution (SI...
Single image denoising (SID) has achieved significant breakthroughs with...
Single-image super-resolution (SISR) has achieved significant breakthrou...
In this paper, we summarize the 1st NTIRE challenge on stereo image supe...
Recently, deep convolutional neural networks (CNNs) steered face super-res...
Temporal grounding in videos aims to localize one target video segment t...
Convolutional neural network-based single-image super-resolution (SISR)...
Text-based image captioning (TextCap) requires simultaneous comprehensio...
Single-image super-resolution (SISR) is an important task in image proce...
Real-time semantic segmentation, which can be visually understood as the...
The single image super-resolution task has witnessed great strides with the ...
Single image deraining is important for many high-level computer vision ...
Autonomous ground vehicles (AGVs) are receiving increasing attention, an...
Video-and-Language Inference is a recently proposed task for joint video...
Under stereo settings, the problem of image super-resolution (SR) and di...
Recently, the single image super-resolution (SISR) approaches with deep ...
Single image dehazing is a challenging ill-posed problem that has drawn ...
Crowd counting is an important task that has shown great application value i...
In this paper, we present a novel approach to efficiently generate colli...
In the problem of learning disentangled representations, one of the prom...
Convolutional neural networks have been proven to be of great benefit fo...
Visual Storytelling (VIST) is a task to tell a narrative story about a c...
Pursuing realistic results according to human visual perception is the c...
Planning a motion for inserting pegs remains an open problem. The diffic...
Multilingual models can improve language processing, particularly for lo...