Conventional feature extraction techniques in the face anti-spoofing dom...
Due to the growing availability of face anti-spoofing databases, researc...
Vision-language (VL) Pre-training (VLP) has shown to well generalize VL ...
Face presentation attacks (PA), also known as spoofing attacks, pose a s...
The latest breakthroughs in large vision-language models, such as Bard a...
In this work, we propose a few-shot colorectal tissue image generation m...
Existing video instance segmentation (VIS) approaches generally follow a...
Accurate 3D mitochondria instance segmentation in electron microscopy (E...
The pose-guided person image generation task requires synthesizing photo...
Weakly-supervised vision-language (V-L) pre-training (W-VLP) aims at lea...
Image Difference Captioning (IDC) aims at generating sentences to descri...
Creative sketching or doodling is an expressive activity, where imaginat...
Predicting a scene graph that captures visual entities and their interac...
Human-object interaction detection is an important and relatively new cl...
Sequential vision-to-language or visual storytelling has recently been o...
This paper describes the MeMAD project entry to the WMT Multimodal Machi...
Designing discriminative powerful texture features robust to realistic i...
This paper revisits visual saliency prediction by evaluating the recent ...
To bridge the gap between humans and machines in image understanding and...
This manuscript introduces the problem of prominent object detection and...
This paper revisits recognition of natural image pleasantness by employi...
Most approaches to human attribute and action recognition in still image...
This paper presents a novel fixation prediction and saliency modeling fr...
We present our submission to the Microsoft Video to Language Challenge o...
In this paper, we describe the system for generating textual description...
This paper describes PinView, a content-based image retrieval system tha...