The task of lip synchronization (lip-sync) seeks to match the lips of hu...
In photo editing, it is common practice to remove visual distractions to...
We tackle the challenging task of unsupervised object localization in th...
Implicit neural representations store videos as neural networks and have...
We present a method for joint alignment of sparse in-the-wild image
coll...
We present FlexNeRF, a method for photorealistic freeviewpoint rendering...
Implicit neural representations (INR) have gained increasing attention i...
The goal of multimodal summarization is to extract the most important
in...
We introduce a new benchmark, COVID-VTS, for fact-checking multi-modal
i...
Vision Transformers (ViTs) have gained significant popularity in recent ...
Compression and reconstruction of visual data have been widely studied i...
Modern retrieval system often requires recomputing the representation of...
By leveraging contrastive learning, clustering, and other pretext tasks,...
We study the problem of compositional zero-shot learning for object-attr...
We present Neural Space-filling Curves (SFCs), a data-driven approach to...
We introduce LilNetX, an end-to-end trainable technique for neural netwo...
Weakly-supervised temporal action localization aims to recognize and loc...
Recent advances in image editing techniques have posed serious challenge...
In the typical path planning pipeline for a ground robot, we build a map...
Video compression is a central feature of the modern internet powering
t...
Research shows a noticeable drop in performance of object detectors when...
The success of deep learning has enabled advances in multimodal tasks th...
In the context of online privacy, many methods propose complex privacy a...
We study a referential game (a type of signaling game) where two agents
...
Adversarial examples pose a unique challenge for deep learning systems.
...
We propose a novel neural representation for videos (NeRV) which encodes...
Incorporating relational reasoning in neural networks for object recogni...
Generating future frames given a few context (or past) frames is a
chall...
In this paper, we address the task of interacting with dynamic environme...
Visual attributes constitute a large portion of information contained in...
Recent advances in semi-supervised object detection (SSOD) are largely d...
With the recent progress in Generative Adversarial Networks (GANs), it i...
We tackle object category discovery, which is the problem of discovering...
We propose a novel approach for few-shot talking-head synthesis. While r...
We propose a novel approach for multi-modal Image-to-image (I2I) transla...
Deep learning relies on the availability of a large corpus of data (labe...
A neural network regularizer (e.g., weight decay) boosts performance by
...
This paper studies video inpainting detection, which localizes an inpain...
3D convolutional networks are prevalent for video recognition. While
ach...
Self-attention learns pairwise interactions via dot products to model
lo...
Recognition tasks, such as object recognition and keypoint estimation, h...
With the proliferation of deep learning methods, many computer vision
pr...
Recent literature has shown that features obtained from supervised train...
Most human action recognition systems typically consider static appearan...
We present a simple yet effective general-purpose framework for modeling...
Current action recognition systems require large amounts of training dat...
Semi-supervised domain adaptation (SSDA) aims to adapt models from a lab...
Pre-trained convolutional neural networks (CNNs) are powerful off-the-sh...
Retrieval networks are essential for searching and indexing. Compared to...
The performance of Multi-Source Unsupervised Domain Adaptation depends
s...