This work breaks through the Base-New Tradeoff (BNT) dilemma in prompt tu...
Large language models (LLMs) have recently demonstrated remarkable
capab...
Large Vision-Language Models (LVLMs) have recently achieved remarkable
s...
Out-of-distribution (OOD) detection aims to detect "unknown" data whose
...
Vision-Language Pre-training (VLP) methods based on object detection enj...
With the rapid evolution of large language models (LLMs), there is a gro...
Document understanding refers to automatically extracting, analyzing and
comp...
To promote the development of Vision-Language Pre-training (VLP) and
mul...
Existing knowledge-enhanced methods have achieved remarkable results in
...
Knowledge distillation is of key importance to launching multilingual
pr...
Large language models (LLMs) have demonstrated impressive zero-shot abil...
In this paper, we propose a novel self-supervised motion estimator for
L...
In this paper, we present ChatPLUG, a Chinese open-domain dialogue syste...
Test-time task adaptation in few-shot learning aims to adapt a pre-train...
To navigate in an environment safely and autonomously, robots must accur...
Nowadays, the Hierarchical Storage System (HSS) is considered an idea...
Recent years have witnessed a big convergence of language, vision, and
m...
Few-shot classification consists of a training phase where a model is le...
Video-language pre-training has advanced the performance of various
down...
Computing is a critical driving force in the development of human
civili...
CLIP (Contrastive Language-Image Pre-Training) has shown remarkable zero...
Multi-agent exploration of a bounded 3D environment with unknown initial...
Customer reviews usually contain much information about one's online sho...
Although the Conditional Variational AutoEncoder (CVAE) model can genera...
The visual camera is an attractive device in beyond visual line of sight...
Features, logits, and labels are the three primary data when a sample pa...
Although pre-trained language models (PLMs) have achieved state-of-the-a...
Under shared autonomy, wheelchair users expect vehicles to provide safe ...
Scene classification has established itself as a challenging research
pr...
We present the ALTO dataset, a vision-focused dataset for the developmen...
Video-text retrieval has been a crucial and fundamental task in multi-mo...
We present AutoMerge, a LiDAR data processing framework for assembling a...
Sketch-based 3D shape retrieval (SBSR) is an important yet challenging t...
Large-scale pretrained foundation models have been an emerging paradigm ...
For long-term autonomy, most place recognition methods are mainly evalua...
Text classification struggles to generalize to unseen classes with very ...
Contrastive learning (CL) has become a ubiquitous approach for several
n...
Visual grounding focuses on establishing fine-grained alignment between
...
Cache plays an important role in maintaining high and stable performance (i...
Real-time semantic segmentation, which aims to achieve high segmentation...
The Visual Question Answering (VQA) task utilizes both visual image and
...
Autonomous Exploration Development Environment is an open-source reposit...
We present our work on a fast route planner based on visibility graph. T...
Knowledge enhanced pre-trained language models (K-PLMs) are shown to be
...
Live streaming is becoming an increasingly popular trend of sales in
E-c...
Existing data-driven methods can handle short text generation well. Howe...
Many generation tasks follow a one-to-many mapping relationship: each in...
Vision-and-language pretraining (VLP) aims to learn generic multimodal
r...
We present a method for localizing a single camera with respect to a poi...
A number of studies point out that current Visual Question Answering (VQ...