As humans, we can modify our assumptions about a scene by imagining alte...
Natural language guided embodied task completion is a challenging proble...
Demand for image editing has been increasing as users' desire for expres...
Interest in physical therapy and individual exercises such as yoga/dance...
For embodied agents, navigation is an important ability but not an isola...
Videos convey rich information. Dynamic spatio-temporal relationships be...
The Visual Dialog task requires a model to exploit both image and conver...
Paragraph-style image captions describe diverse aspects of an image as o...