Beyond Semantic Image Segmentation : Exploring Efficient Inference in Video
We explore the efficiency of the CRF inference module beyond image level semantic segmentation. The key idea is to combine the best of two worlds of semantic co-labeling and exploiting more expressive models. Similar to [Alvarez14] our formulation enables us perform inference over ten thousand images within seconds. On the other hand, it can handle higher-order clique potentials similar to [vineet2014] in terms of region-level label consistency and context in terms of co-occurrences. We follow the mean-field updates for higher order potentials similar to [vineet2014] and extend the spatial smoothness and appearance kernels [DenseCRF13] to address video data inspired by [Alvarez14]; thus making the system amenable to perform video semantic segmentation most effectively.
READ FULL TEXT