ReConvNet: Video Object Segmentation with Spatio-Temporal Features Modulation

06/14/2018
by Francesco Lattari, et al.

We introduce ReConvNet, a recurrent convolutional architecture for semi-supervised video object segmentation that can quickly adapt its features to focus on the object of interest at inference time. Generalizing to new objects not observed during training is known to be a hard task for supervised approaches, which need to be retrained on the new instances. To tackle this problem, we propose a more efficient solution that learns spatio-temporal features that the model itself can adapt through affine transformations conditioned on the object in the first frame of the sequence. This approach is simple, can be trained end-to-end, and does not require extra training steps at inference time. Our method shows comparable results on DAVIS2016 with respect to state-of-the-art approaches that use online finetuning, and outperforms them on DAVIS2017. ReConvNet also shows promising results on the DAVIS-Challenge 2018, placing 10th.
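To make the idea of feature modulation concrete, the following is a minimal NumPy sketch of an affine transformation conditioned on the first-frame object. It is an illustration under stated assumptions, not the ReConvNet implementation: the abstract does not say how the transformation parameters are computed, so the per-channel scale and shift, the mask-pooling step, and the names `modulate`, `conditioning_params`, `w_gamma` and `w_beta` are all hypothetical, and the recurrent part of the architecture is omitted.

```python
import numpy as np


def modulate(features, gamma, beta):
    """Apply a per-channel affine transformation to a (C, H, W) feature map."""
    return gamma[:, None, None] * features + beta[:, None, None]


def conditioning_params(first_frame_features, first_frame_mask, w_gamma, w_beta):
    """Hypothetical conditioning step (an assumption, not taken from the paper):
    average the first-frame features under the object mask, then map the pooled
    vector linearly to a scale (gamma) and a shift (beta) per channel."""
    mask = first_frame_mask.astype(first_frame_features.dtype)
    area = max(mask.sum(), 1.0)  # guard against an empty mask
    pooled = (first_frame_features * mask[None]).sum(axis=(1, 2)) / area  # shape (C,)
    gamma = 1.0 + w_gamma @ pooled  # scale expressed as an offset from identity
    beta = w_beta @ pooled
    return gamma, beta


# Toy data standing in for learned features and learned conditioning weights.
rng = np.random.default_rng(0)
C, H, W = 4, 8, 8
first_feats = rng.standard_normal((C, H, W))
mask = np.zeros((H, W), dtype=bool)
mask[2:6, 2:6] = True  # object annotation given in the first frame
w_gamma = 0.1 * rng.standard_normal((C, C))
w_beta = 0.1 * rng.standard_normal((C, C))

# Parameters are computed once from the first frame and reused on later frames,
# so no gradient step is needed at inference time.
gamma, beta = conditioning_params(first_feats, mask, w_gamma, w_beta)
later_feats = rng.standard_normal((C, H, W))
modulated = modulate(later_feats, gamma, beta)
```

In this sketch, adapting to a new object only requires one forward pass over the annotated first frame, which is the contrast the abstract draws with methods that finetune online.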
