In recent years, many data augmentation techniques have been proposed to...
Frozen pretrained models have become a viable alternative to the pretrai...
We present a novel architecture for dense correspondence. The current st...
This paper presents a novel cost aggregation network, called Volumetric ...
We study the challenging problem of recovering detailed motion from a si...
The paper presents a scalable approach for learning distributed represen...
We propose an unsupervised method for 3D geometry-aware representation l...
Rolling shutter (RS) distortion can be interpreted as the result of pick...
This paper proposes a simple transfer learning baseline for sign languag...
Semi-supervised action recognition is a challenging but important task d...
For human action understanding, a popular research direction is to analy...
Humans can easily segment moving objects without knowing what they are. ...
We introduce MixTraining, a new training paradigm for object detection t...
A common problem in the task of human-object interaction (HOI) detection...
The vision community is witnessing a modeling shift from CNNs to Transfo...
Image-level contrastive representation learning has proven to be highly ...
We present Neural Articulated Radiance Field (NARF), a novel deformable ...
This paper presents a new vision Transformer, called Swin Transformer, t...
Prior research on self-supervised learning has led to considerable progr...
We present an end-to-end joint training framework that explicitly models...
The Non-Local Network (NLNet) presents a pioneering approach for capturi...
Contrastive learning methods for unsupervised visual representation lear...
Human poses that are rare or unseen in a training set are challenging fo...
A common problem in the human-object interaction (HOI) detection task is tha...
Verification and regression are two general methodologies for prediction...
A recent approach for object detection and human pose estimation is to r...
The non-local block is a popular module for strengthening the context mo...
Unsupervised visual pretraining based on the instance discrimination pre...
For high-level visual recognition, self-supervised learning defines and ...
In the feature maps of CNNs, there commonly exists considerable spatial ...
A well-known issue of Batch Normalization is its significantly reduced e...
We present an object representation, called Dense RepPoints, for flexibl...
We present an end-to-end joint training framework that explicitly models...
We present an unsupervised approach for factorizing object appearance in...
We address the problem of removing undesirable reflections from a single...
We present a method for decomposing the 3D scene flow observed from a mo...
Dashboard cameras capture a tremendous amount of driving scene video eac...
Multiview stereo aims to reconstruct scene depth from images acquired by...
The Non-Local Network (NLNet) presents a pioneering approach for capturi...
The convolution layer has been the dominant feature extractor in compute...
Modern object detectors rely heavily on rectangular bounding boxes, such...
Attention mechanisms have become a popular component in deep neural netw...
We present a method for compositing virtual objects into a photograph su...
Irreversible visual impairment is often caused by primary angle-closure ...
We study object recognition under the constraint that each object class ...
The superior performance of Deformable Convolutional Networks arises fro...
Accurate detection and tracking of objects is vital for effective video ...
We present a method for human pose tracking that learns explicitly about...
We present recurrent transformer networks (RTNs) for obtaining dense cor...
For the ECCV 2018 PoseTrack Challenge, we present a 3D human pose estima...