3D perceptual representations are well suited for robot manipulation as ...
We present Analogical Networks, a model that encodes domain knowledge
ex...
Effective planning of long-horizon deformable object manipulation requir...
We introduce TIDEE, an embodied agent that tidies up a disordered scene ...
Building 3D perception systems for autonomous vehicles that do not rely ...
Tracking pixels in videos is typically studied as an optical flow estima...
We consider the problem of segmenting scenes into constituent entities, ...
This paper explores self-supervised learning of amodal 3D feature
repres...
We propose an unsupervised method for detecting and tracking moving obje...
We propose HyperDynamics, a dynamics meta-learning framework that condit...
We propose an action-conditioned dynamics model that predicts scene chan...
We present neural architectures that disentangle RGB-D images into objec...
We propose a system that learns to detect objects and infer their 3D pos...
We hypothesize that an agent that can look around in static scenes can l...
A common approach to localize 3D human joints in a synchronized and
cali...
Consider the utterance "the tomato is to the left of the pot." Humans ca...
We cast visual imitation as a visual correspondence problem. Our robotic...
Humans can effortlessly imagine the occluded side of objects in a photog...
Cross-domain image-to-image translation should satisfy two requirements:...
We integrate two powerful ideas, geometry and deep visual representation...
We propose an exploration method that incorporates look-ahead search ove...
We consider artificial agents that learn to jointly control their grippe...
We present recurrent geometry-aware neural networks that integrate visua...
Humans effortlessly "program" one another by communicating goals and des...
Current convolutional neural networks algorithms for video object tracki...
Current state-of-the-art solutions for motion capture from a single came...
Researchers have developed excellent feed-forward models that learn to m...
Given a visual history, multiple future outcomes for a video scene are
e...
We propose the Encoder-Recurrent-Decoder (ERD) model for recognition and...
Hierarchical feature extractors such as Convolutional Networks (ConvNets...
We segment moving objects in videos by ranking spatio-temporal segment
p...