Towards Multimodal Multitask Scene Understanding Models for Indoor Mobile Agents

09/27/2022
by Yao-Hung Hubert Tsai, et al.

The perception system in personalized mobile agents requires indoor scene understanding models that can understand 3D geometries, capture objectiveness, analyze human behaviors, etc. Nonetheless, this direction has not been as well explored as models for outdoor environments (e.g., autonomous driving systems that include pedestrian prediction, car detection, traffic sign recognition, etc.). In this paper, we first discuss the main challenge: insufficient, or even no, labeled data for real-world indoor environments. We also discuss other challenges, such as fusing heterogeneous sources of information (e.g., RGB images and Lidar point clouds), modeling relationships between a diverse set of outputs (e.g., 3D object locations, depth estimation, and human poses), and computational efficiency. Then, we describe MMISM (Multi-modality input Multi-task output Indoor Scene understanding Model) to tackle the above challenges. MMISM takes RGB images and sparse Lidar points as inputs and produces 3D object detection, depth completion, human pose estimation, and semantic segmentation as output tasks. We show that MMISM performs on par with or even better than single-task models; e.g., we improve the baseline 3D object detection results by 11.7% on the benchmark ARKitScenes dataset.

