Referring video object segmentation (RVOS), as a supervised learning tas...
Pre-trained vision-language models, e.g., CLIP, working with manually de...
By integrating complementary information from RGB image and depth map, t...
Image-to-text generation aims to describe images using natural language....
Vision-language pre-training (VLP) models have shown vulnerability to ad...
3D anomaly detection is an emerging and vital computer vision task in in...
The problem of how to assess cross-modality medical image synthesis has ...
Launchpad is a musical instrument that allows users to create and perfor...
Our winning entry for the CVPR 2023 Generic Event Boundary Captioning (G...
Extremely large-scale array (ELAA) promises to deliver ultra-high data r...
Recently, Transformers have emerged as the go-to architecture for both v...
Decentralized minimax optimization has been actively studied in the past...
Data augmentation is a promising technique for unsupervised anomaly dete...
The state of the art in vision-language pretraining (VLP) achieves exem...
Existing audio-visual event localization (AVE) handles manually trimmed ...
Joint video-language learning has received increasing attention in recen...
Triplet learning, i.e., learning from triplet data, has attracted much at...
Image anomaly detection (IAD) is an emerging and vital computer vision t...
In the area of few-shot anomaly detection (FSAD), efficient visual featur...
The recent rapid development of deep learning has laid a milestone in in...
In recent years, RGB-T salient object detection (SOD) has attracted cont...
Hashing that projects data into binary codes has shown extraordinary tal...
Advertisement video editing aims to automatically edit advertising video...
Existing methods for video-based person re-identification (ReID) mainly ...
Despite the recent efforts in accurate 3D annotations in hand and object...
Voice conversion aims to generate new speech with the source content and...
Generic Event Boundary Captioning (GEBC) aims to generate three sentence...
Existing vision-language pre-training (VLP) methods primarily rely on pa...
Invariance to diverse types of image corruption, such as noise, blurring...
This report describes the details of our approach for the event dense-ca...
Modeling latent variables with priors and hyperpriors is an essential pr...
Recently, the scheme of model-X knockoffs was proposed as a promising so...
Figure skating scoring is a challenging task because it requires judging...
Pseudo-label-based semi-supervised learning (SSL) has achieved great suc...
Visual sensory anomaly detection (AD) is an essential problem in compute...
The existence of completely aligned and paired multi-modal neuroimaging ...
Utilizing the paired multi-modal neuroimaging data has been proved to be...
Semi-supervised learning is a challenging problem which aims to construc...
When deploying a person re-identification (ReID) model in safety-critical ...
RGBD (RGB plus depth) object tracking is gaining momentum as RGBD sensor...
Annotation burden has become one of the biggest barriers to semantic seg...
Dense video captioning aims to generate multiple associated captions wit...
A combinatorial cost function for hierarchical clustering was introduced...
Most existing trackers based on deep learning perform tracking in a holi...
Generalized zero-shot learning (GZSL) has achieved significant progress,...
Compared with tedious per-pixel mask annotation, it is much easier to an...
Recent advances in neuroscience have highlighted the effectiveness of mu...
Due to limited computational cost and energy consumption, most neural ne...