Segmenting Unknown 3D Objects from Real Depth Images using Mask R-CNN Trained on Synthetic Point Clouds

by Michael Danielczuk, et al.

The ability to segment unknown objects in depth images has the potential to enhance robot skills in grasping and object tracking. Recent computer vision research has demonstrated that Mask R-CNN can be trained to segment specific categories of objects in RGB images when massive hand-labeled datasets are available. As generating these datasets is time-consuming, we instead train with synthetic depth images. Many robots now use depth sensors, and recent results suggest training on synthetic depth data can generalize well to the real world. We present a method for automated dataset generation and rapidly generate a training dataset of 50k depth images and 320k object masks synthetically using simulated scenes of 3D CAD models. We train a variant of Mask R-CNN on the generated dataset to perform category-agnostic instance segmentation without hand-labeled data. We evaluate the trained network, which we refer to as Synthetic Depth (SD) Mask R-CNN, on a set of real, high-resolution images of challenging, densely cluttered bins containing objects with highly varied geometry. SD Mask R-CNN outperforms point cloud clustering baselines by an absolute 15 Recall, and achieves performance levels similar to a Mask R-CNN trained on a massive, hand-labeled RGB dataset and fine-tuned on real images from the experimental setup. The network also generalizes well to a lower-resolution depth sensor. We deploy the model in an instance-specific grasping pipeline to demonstrate its usefulness in a robotics application. Code, the synthetic training dataset, and supplementary material are available at .
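The abstract states that the object masks come from simulation rather than hand labeling. The sketch below illustrates one way such labels can fall out of rendering for free. It assumes (hypothetically; this is not the authors' released code) that each object in a simulated scene is rendered alone into its own depth image, with `np.inf` wherever that object is absent. The scene depth image is then the per-pixel minimum over objects, and each object's visible (modal) mask is the set of pixels where it is the nearest surface. The function name `depth_and_masks` is illustrative only.

```python
import numpy as np


def depth_and_masks(object_depths: np.ndarray):
    """Derive a scene depth image and per-object masks from per-object renders.

    Hypothetical setup: object_depths has shape (N, H, W), where
    object_depths[i] is object i rendered alone, with np.inf where it is absent.

    Returns:
        scene_depth:   (H, W) nearest depth at each pixel (np.inf if no object).
        visible_masks: (N, H, W) bool, True where object i is the nearest surface.
        amodal_masks:  (N, H, W) bool, True wherever object i projects at all,
                       including pixels occluded by other objects.
    """
    # The depth sensor sees the closest surface at each pixel.
    scene_depth = object_depths.min(axis=0)

    # Index of the closest object per pixel; ties go to the lower index.
    closest = object_depths.argmin(axis=0)

    # Pixels where at least one object is present.
    covered = np.isfinite(scene_depth)

    n = object_depths.shape[0]
    # Each covered pixel is assigned to exactly one object.
    visible_masks = (closest[None, :, :] == np.arange(n)[:, None, None]) & covered[None]

    # Full silhouette of each object, ignoring occlusion.
    amodal_masks = np.isfinite(object_depths)
    return scene_depth, visible_masks, amodal_masks


if __name__ == "__main__":
    # Two objects in a 2x3 image; object 1 occludes object 0 at pixel (0, 1).
    d0 = np.array([[1.0, 1.0, np.inf], [np.inf, np.inf, np.inf]])
    d1 = np.array([[2.0, 0.5, 0.5], [np.inf, np.inf, np.inf]])
    depth, visible, amodal = depth_and_masks(np.stack([d0, d1]))
    print(depth)
    print(visible.astype(int))
```

In this toy example object 0 keeps only pixel (0, 0) in its visible mask, while its amodal mask still covers two pixels. A real pipeline would also render a background (for example, a bin or table) and model sensor noise; both are omitted here for brevity.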


Self-supervised Transfer Learning for Instance Segmentation through Physical Interaction

Instance segmentation of unknown objects from images is regarded as rele...

Where is my hand? Deep hand segmentation for visual self-recognition in humanoid robots

The ability to distinguish between the self and the background is of par...

Improving Fine-Grain Segmentation via Interpretable Modifications: A Case Study in Fossil Segmentation

Most interpretability research focuses on datasets containing thousands ...

Scrape, Cut, Paste and Learn: Automated Dataset Generation Applied to Parcel Logistics

State-of-the-art approaches in computer vision heavily rely on sufficien...

Unseen Object Instance Segmentation for Robotic Environments

In order to function in unstructured environments, robots need the abili...

Depth image hand tracking from an overhead perspective using partially labeled, unbalanced data: Development and real-world testing

We present the development and evaluation of a hand tracking algorithm b...

Object segmentation in depth maps with one user click and a synthetically trained fully convolutional network

With more and more household objects built on planned obsolescence and c...
