Pre-trained vision-language models, e.g., CLIP, working with manually de...
Image-to-text generation aims to describe images using natural language....
Vision-language pre-training (VLP) models have shown vulnerability to ad...
Our winning entry for the CVPR 2023 Generic Event Boundary Captioning (G...
Foundation models have achieved great advances in multi-task learning wi...
The state of the art in vision-language pretraining (VLP) achieves exem...
Existing audio-visual event localization (AVE) handles manually trimmed ...
Joint video-language learning has received increasing attention in recen...
Scene graph generation (SGG) is a sophisticated task that suffers from b...
Advertisement video editing aims to automatically edit advertising video...
Generic Event Boundary Captioning (GEBC) aims to generate three sentence...
Existing vision-language pre-training (VLP) methods primarily rely on pa...
Ground-to-aerial geolocalization refers to localizing a ground-level que...
This report describes the details of our approach for the event dense-ca...
Scene graph generation is a sophisticated task because there is no speci...
Dense video captioning aims to generate multiple associated captions wit...
Semantic segmentation from very fine resolution (VFR) urban scene images...
As cross-chain technologies make the interactions among different blockc...
Visual place recognition is one of the essential and challenging problem...
Misconfigurations have become the dominant causes of software failures i...
Collecting and analyzing massive data generated from smart devices have ...
This technical report presents a brief description of our submission to ...
Internet of Vehicles (IoV) is a promising branch of the Internet of Thin...
Differential privacy provides a rigorous framework to quantify data priv...
Local differential privacy (LDP) has been deemed the de facto measure...
Local differential privacy (LDP) can provide each user with strong priva...
With the rapid development of artificial intelligence (AI), ethical issu...
With the rapid development of in-depth learning, neural network and deep...
Modern parallel filesystems such as Lustre are designed to provide high,...