STFCN: Spatio-Temporal FCN for Semantic Video Segmentation

by   Mohsen Fayyaz, et al.

This paper presents a novel method to involve both spatial and temporal features for semantic video segmentation. Current work on convolutional neural networks(CNNs) has shown that CNNs provide advanced spatial features supporting a very good performance of solutions for both image and video analysis, especially for the semantic segmentation task. We investigate how involving temporal features also has a good effect on segmenting video data. We propose a module based on a long short-term memory (LSTM) architecture of a recurrent neural network for interpreting the temporal characteristics of video frames over time. Our system takes as input frames of a video and produces a correspondingly-sized output; for segmenting the video our method combines the use of three components: First, the regional spatial features of frames are extracted using a CNN; then, using LSTM the temporal features are added; finally, by deconvolving the spatio-temporal features we produce pixel-wise predictions. Our key insight is to build spatio-temporal convolutional networks (spatio-temporal CNNs) that have an end-to-end architecture for semantic video segmentation. We adapted fully some known convolutional network architectures (such as FCN-AlexNet and FCN-VGG16), and dilated convolution into our spatio-temporal CNNs. Our spatio-temporal CNNs achieve state-of-the-art semantic segmentation, as demonstrated for the Camvid and NYUDv2 datasets.


Enhanced Spatio-Temporal Interaction Learning for Video Deraining: A Faster and Better Framework

Video deraining is an important task in computer vision as the unwanted ...

Parallel Multi-Dimensional LSTM, With Application to Fast Biomedical Volumetric Image Segmentation

Convolutional Neural Networks (CNNs) can be shifted across 2D images or ...

Noisy-LSTM: Improving Temporal Awareness for Video Semantic Segmentation

Semantic video segmentation is a key challenge for various applications....

FCN-rLSTM: Deep Spatio-Temporal Neural Networks for Vehicle Counting in City Cameras

In this paper, we develop deep spatio-temporal neural networks to sequen...

Deep learning networks for selection of persistent scatterer pixels in multi-temporal SAR interferometric processing

In multi-temporal SAR interferometry (MT-InSAR), persistent scatterer (P...

ST-MTL: Spatio-Temporal Multitask Learning Model to Predict Scanpath While Tracking Instruments in Robotic Surgery

Representation learning of the task-oriented attention while tracking in...

Multi-stream CNN based Video Semantic Segmentation for Automated Driving

Majority of semantic segmentation algorithms operate on a single frame e...

Please sign up or login with your details

Forgot password? Click here to reset