GAN Mask R-CNN: Instance semantic segmentation benefits from generative adversarial networks
When designing instance segmentation ConvNets that reconstruct masks, segmentation is often taken by its literal definition (assigning a label to every pixel) when defining the loss functions. That is, the losses compute the difference between pixels in the predicted (reconstructed) mask and the ground truth mask, which is a template-matching mechanism. However, any such instance segmentation ConvNet is a generator, so we can cast the problem of predicting masks as a GAN game: the ground truth mask can be thought of as drawn from the true distribution, and a ConvNet like Mask R-CNN is an implicit model that infers the true distribution. Attaching a discriminator to the output of this generator then closes the loop of the GAN concept and, more importantly, yields a loss that is learned rather than hand-designed. We show that this design outperforms the baseline when applied, without extra settings, to several different domains: cellphone recycling, autonomous driving, large-scale object detection, and medical glands. Further, we observe that, in general, GANs yield masks that better account for boundaries, clutter, and small details.
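To make the contrast between a hand-designed pixel loss and a learned adversarial loss concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the abstract does not state the discriminator architecture or the exact loss form, so the sketch assumes the standard GAN objective with a non-saturating generator loss. The linear `toy_discriminator`, its weights `w` and `b`, and the 2x2 masks are hypothetical stand-ins chosen only for illustration.

```python
import math

def pixel_bce(pred, target, eps=1e-7):
    """Hand-designed loss: mean per-pixel binary cross-entropy."""
    total, n = 0.0, 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
            n += 1
    return total / n

def toy_discriminator(mask, w, b):
    """Hypothetical discriminator: logistic score of the flattened mask.
    A real one would be a trained ConvNet; w and b are assumed values."""
    flat = [v for row in mask for v in row]
    z = b + sum(wi * m for wi, m in zip(w, flat))
    return 1.0 / (1.0 + math.exp(-z))  # probability that the mask is "real"

def discriminator_loss(d_real, d_fake, eps=1e-7):
    """Standard GAN discriminator loss: real masks -> 1, predicted -> 0."""
    return -(math.log(d_real + eps) + math.log(1.0 - d_fake + eps))

def generator_adv_loss(d_fake, eps=1e-7):
    """Non-saturating generator loss (an assumption of this sketch):
    the mask predictor is rewarded when the discriminator is fooled."""
    return -math.log(d_fake + eps)

# Toy 2x2 example: a ground truth mask and a soft predicted mask.
gt_mask = [[0.0, 1.0], [1.0, 1.0]]
pred_mask = [[0.1, 0.8], [0.6, 0.9]]
w, b = [-1.0, 1.0, 1.0, 1.0], -1.5  # assumed discriminator parameters

d_real = toy_discriminator(gt_mask, w, b)
d_fake = toy_discriminator(pred_mask, w, b)

print("pixel BCE:", pixel_bce(pred_mask, gt_mask))
print("discriminator loss:", discriminator_loss(d_real, d_fake))
print("generator adversarial loss:", generator_adv_loss(d_fake))
```

In a training loop the two adversarial losses would be minimized alternately, with the discriminator's parameters updated by gradient descent, so the signal the mask predictor receives is itself learned. Whether the adversarial term replaces or supplements the pixel-wise term is a design choice this sketch leaves open.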