With ever increasing parameters and computation, vision-language pre-tra...
In 3D Referring Expression Segmentation (3D-RES), the earlier approach a...
Face forgery techniques have advanced rapidly and pose serious security
...
In recent years, 3D representation learning has turned to 2D vision-lang...
Deepfakes are realistic face manipulations that can pose serious threats...
Deep neural networks often suffer from poor generalization due to comple...
Pre-trained language models (PLMs) have played an increasing role in
mul...
Recently, growing interest has been aroused in extending the multimodal
...
Text-driven 3D stylization is a complex and crucial task in the fields o...
In this paper, we study teacher-student learning from the perspective of...
Semi-supervised object detection (SSOD) is a research hot spot in comput...
Parameter-efficient transfer learning (PETL) is an emerging research spo...
In this paper, we study the local visual modeling with grid features for...
Panoptic Narrative Grounding (PNG) is an emerging cross-modal grounding ...
Deep neural networks often suffer from poor generalization caused by com...
Building a universal video-language model for solving various video
unde...
Video-text retrieval has been a crucial and fundamental task in multi-mo...
Most of the existing work in one-stage referring expression comprehensio...
Despite the exciting performance, Transformer is criticized for its exce...
Pixel synthesis is a promising research paradigm for image generation, w...
In this paper, we propose a simple yet universal network termed SeqTR fo...
Recently, automatic video captioning has attracted increasing attention,...
Referring expression comprehension (REC) aims to locate a certain object...
In this paper, we are committed to establishing an unified and end-to-en...
Descriptive region features extracted by object detection networks have
...
Transformer-based architectures have shown great success in image captio...
Online image hashing has received increasing research attention recently...
Referring expression comprehension (REC) and segmentation (RES) are two
...
Referring Expression Comprehension (REC) is an emerging research spot in...
Deep hashing methods have been proved to be effective and efficient for
...
As an approximate nearest neighbor search technique, hashing has been wi...
Inferring the 3D shape of an object from an RGB image has shown impressi...
Sketch recognition remains a significant challenge due to the limited
tr...
In this paper, we propose a novel deep framework for part-level semantic...
Image deblurring has achieved exciting progress in recent years. However...
Image captioning has attracted ever-increasing research attention in the...
In this paper, we address the problem of monocular depth estimation when...
Learning representations with diversified information remains an open
pr...
Online hashing has attracted extensive research attention when facing
st...
Online image hashing has received increasing research attention recently...
Recovering the 3D representation of an object from single-view or multi-...
When facing large-scale image datasets, online hashing serves as a promi...
In this paper, we proposed an integrated model of semantic-aware and
con...
We propose a novel deep neural network architecture for the challenging
...