Recently, large-scale pre-trained language-image models like CLIP have s...
Point-supervised Temporal Action Localization (PSTAL) is an emerging res...
Cross-domain few-shot classification (CD-FSC) aims to identify novel tar...
Modeling and synthesizing low-light raw noise is a fundamental problem f...
Pre-training has emerged as an effective technique for learning powerful...
Current state-of-the-art approaches for few-shot action recognition achi...
Learning from large-scale contrastive language-image pre-training like C...
Since the fully convolutional network has achieved great success in sema...
Human-Object Interaction (HOI) detection aims to learn how human interac...
Controllable person image synthesis task enables a wide range of applica...
Standard approaches for video recognition usually operate on the full in...
This technical report presents our first place winning solution for temp...
Existing GAN inversion methods fail to provide latent codes for reliable...
Recently, many approaches tackle the Unsupervised Domain Adaptive person...
This paper tackles the Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) ...
Deep learning-based methods for low-light image enhancement typically re...
The fully convolutional network (FCN) has achieved tremendous success in...
Supervised learning is dominant in person search, but it requires elabor...
The central idea of contrastive learning is to discriminate between diff...
Temporal action localization aims to localize starting and ending time w...
Most recent approaches for online action detection tend to apply Recurre...
This technical report presents our solution for temporal action detectio...
This technical report analyzes an egocentric video action detection meth...
We present an efficient high-resolution network, Lite-HRNet, for human p...
Self-supervised learning presents a remarkable performance to utilize un...
Temporal action proposal generation aims to estimate temporal intervals ...
The goal of person search is to localize and match query persons from sc...
In the conventional person Re-ID setting, it is widely assumed that crop...
Non-local operation is widely explored to model the long-range dependenc...
Currently, one-stage frameworks have been widely applied for temporal ac...
Human pose estimation is the task of localizing body keypoints from stil...
In this report, we present our solution for the task of temporal action ...
This technical report analyzes a temporal action localization method we ...
Crowd counting is a concerned and challenging task in computer vision. E...
Image dehazing using learning-based methods has achieved state-of-the-ar...
The low-level details and high-level semantics are both essential to the...
Recent works have widely explored the contextual dependencies to achieve...
Few-shot instance segmentation (FSIS) conjoins the few-shot learning par...
We propose a Generative Transfer Network (GTNet) for zero shot object de...
Person search aims at localizing and identifying a query person from a g...
Semantic segmentation requires both rich spatial information and sizeabl...
Most existing methods of semantic segmentation still suffer from two asp...
We present an effective blind image deblurring method based on a data-dr...