Large Language Models (LLMs) have the capacity of performing complex
sch...
Starting from the resurgence of deep learning, vision-language models (V...
3D vision-language grounding (3D-VL) is an emerging field that aims to
c...
In this paper, we study the problem of planning in Minecraft, a popular,...
We study the problem of learning goal-conditioned policies in Minecraft,...
Current computer vision models, unlike the human visual system, cannot y...
We propose a new task to benchmark scene understanding of embodied agent...
Traffic demand forecasting by deep neural networks has attracted widespr...
A significant gap remains between today's visual pattern recognition mod...
Reasoning about visual relationships is central to how humans interpret ...
It has been a challenge to learning skills for an agent from long-horizo...
In this paper, we present a multimodal mobile teleoperation system that
...
Robust and accurate estimation of liquid height lies as an essential par...
Learning transferable knowledge across similar but different settings is...
In this paper, we study Reinforcement Learning from Demonstrations (RLfD...
This paper studies Learning from Observations (LfO) for imitation learni...
In this paper, we focus on the challenging perception problem in robotic...
In this paper, we present TeachNet, a novel neural network architecture ...
In this paper, we propose an end-to-end grasp evaluation model to addres...
Learning and inference movement is a very challenging problem due to its...
Task transfer is extremely important for reinforcement learning, since i...