CNN in MRF: Video Object Segmentation via Inference in A CNN-Based Higher-Order Spatio-Temporal MRF

03/26/2018
by Linchao Bao, et al.

This paper addresses the problem of video object segmentation, where the initial object mask is given in the first frame of an input video. We propose a novel spatio-temporal Markov Random Field (MRF) model defined over pixels to handle this problem. Unlike conventional MRF models, the spatial dependencies among pixels in our model are encoded by a Convolutional Neural Network (CNN). Specifically, for a given object, the probability of a labeling of a set of spatially neighboring pixels can be predicted by a CNN trained for this specific object. As a result, higher-order, richer dependencies among the pixels in the set can be implicitly modeled by the CNN. With temporal dependencies established by optical flow, the resulting MRF model combines both spatial and temporal cues for tackling video object segmentation. However, performing inference in the MRF model is difficult because of the very high-order dependencies. To address this, we propose a novel CNN-embedded algorithm to perform approximate inference in the MRF. This algorithm proceeds by alternating between a temporal fusion step and a feed-forward CNN step. When initialized with an appearance-based one-shot segmentation CNN, our model outperforms the winning entries of the DAVIS 2017 Challenge, without resorting to model ensembling or any dedicated detectors.
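The alternation described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not give the energy function, the fusion rule, or the network, so everything here is an assumption made for illustration. In particular, `spatial_step` is a hypothetical stand-in (a 3x3 box filter) for the object-specific feed-forward CNN, `temporal_fusion` simply averages flow-warped neighbor predictions, and the flow fields are assumed to be given as per-pixel `(dy, dx)` displacements.

```python
import numpy as np

def warp_by_flow(mask, flow):
    """Pull-warp: output pixel (y, x) reads mask at (y + dy, x + dx), clipped to the image."""
    h, w = mask.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(ys + np.rint(flow[..., 0]).astype(int), 0, h - 1)
    src_x = np.clip(xs + np.rint(flow[..., 1]).astype(int), 0, w - 1)
    return mask[src_y, src_x]

def temporal_fusion(probs, flows_to_prev, flows_to_next):
    """Assumed fusion rule: average each frame with its flow-warped neighbors."""
    fused = []
    n = len(probs)
    for t in range(n):
        stack = [probs[t]]
        if t > 0:
            stack.append(warp_by_flow(probs[t - 1], flows_to_prev[t]))
        if t < n - 1:
            stack.append(warp_by_flow(probs[t + 1], flows_to_next[t]))
        fused.append(np.mean(stack, axis=0))
    return fused

def spatial_step(prob):
    """Hypothetical stand-in for the feed-forward CNN: a 3x3 box filter."""
    h, w = prob.shape
    padded = np.pad(prob, 1, mode="edge")
    return sum(padded[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

def alternate_inference(init_probs, flows_to_prev, flows_to_next, n_iters=3):
    """Alternate temporal fusion and the spatial step, then threshold at 0.5."""
    probs = [np.asarray(p, dtype=float) for p in init_probs]
    for _ in range(n_iters):
        probs = temporal_fusion(probs, flows_to_prev, flows_to_next)
        probs = [spatial_step(p) for p in probs]
    return [p > 0.5 for p in probs]

# Toy usage: three 16x16 frames, a static square object, zero optical flow.
square = np.zeros((16, 16))
square[4:12, 4:12] = 1.0
zero_flow = [np.zeros((16, 16, 2)) for _ in range(3)]
masks = alternate_inference([square, square, square], zero_flow, zero_flow)
```

In this sketch, `flows_to_prev[t]` and `flows_to_next[t]` are the displacement fields from frame `t` to its previous and next frame; the entries at the sequence ends are never read. Replacing `spatial_step` with a trained segmentation network is where a real system would differ most.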

research
03/28/2019

Fast video object segmentation with Spatio-Temporal GANs

Learning descriptive spatio-temporal object models from data is paramoun...
research
06/25/2020

Unsupervised Video Decomposition using Spatio-temporal Iterative Inference

Unsupervised multi-object scene decomposition is a fast-emerging problem...
research
12/06/2018

Tube-CNN: Modeling temporal evolution of appearance for object detection in video

Object detection in video is crucial for many applications. Compared to ...
research
10/05/2019

Object Segmentation Tracking from Generic Video Cues

We propose a light-weight variational framework for online tracking of o...
research
10/14/2017

Video Classification With CNNs: Using The Codec As A Spatio-Temporal Activity Sensor

We investigate video classification via a two-stream convolutional neura...
research
09/04/2019

Deep learning networks for selection of persistent scatterer pixels in multi-temporal SAR interferometric processing

In multi-temporal SAR interferometry (MT-InSAR), persistent scatterer (P...
research
06/05/2013

Distributed Bayesian inference for consistent labeling of tracked objects in non-overlapping camera networks

One of the fundamental requirements for visual surveillance using non-ov...