Video Shadow Detection via Spatio-Temporal Interpolation Consistency Training

by Xiao Lu, et al.

Annotating large-scale datasets for supervised video shadow detection is challenging. Directly applying a model trained on labeled images to video frames may lead to high generalization error and temporally inconsistent results. In this paper, we address these challenges by proposing a Spatio-Temporal Interpolation Consistency Training (STICT) framework that feeds unlabeled video frames together with labeled images into the training of an image shadow detection network. Specifically, we propose the Spatial and Temporal ICT, in which we define two new interpolation schemes: spatial interpolation and temporal interpolation. We then derive the corresponding spatial and temporal interpolation consistency constraints, the former to enhance generalization in the pixel-wise classification task and the latter to encourage temporally consistent predictions. In addition, we design a Scale-Aware Network for multi-scale shadow knowledge learning in images, and propose a scale-consistency constraint to minimize the discrepancy among the predictions at different scales. Our approach is extensively validated on the ViSha dataset and a self-annotated dataset. Experimental results show that, even without video labels, our approach outperforms most state-of-the-art supervised, semi-supervised, and unsupervised image/video shadow detection methods, as well as methods from related tasks. Code and dataset are available at <>.
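To make the idea of an interpolation consistency constraint concrete, the following is a minimal sketch of the generic ICT principle: the prediction on an interpolation of two unlabeled inputs should match the interpolation of the predictions on those inputs. It does not reproduce the spatial or temporal interpolation schemes of STICT, which the abstract does not specify. The names `mix`, `toy_model`, `ict_consistency_loss`, and `scale_consistency_loss` are hypothetical, and the scale-consistency term (deviation of each scale's prediction from the mean over scales) is only one assumed way to measure "discrepancy among the predictions at different scales".

```python
import math

def mix(a, b, lam):
    """Element-wise interpolation lam * a + (1 - lam) * b of two flat lists."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]

def toy_model(pixels, weight=2.0, bias=-1.0):
    """Hypothetical stand-in for a shadow detector: a per-pixel sigmoid."""
    return [1.0 / (1.0 + math.exp(-(weight * p + bias))) for p in pixels]

def ict_consistency_loss(student, teacher, u1, u2, lam):
    """Mean squared gap between student(mix(u1, u2)) and
    mix(teacher(u1), teacher(u2)) for two unlabeled inputs."""
    pred_on_mix = student(mix(u1, u2, lam))
    mix_of_preds = mix(teacher(u1), teacher(u2), lam)
    gaps = [(p - q) ** 2 for p, q in zip(pred_on_mix, mix_of_preds)]
    return sum(gaps) / len(gaps)

def scale_consistency_loss(preds):
    """Assumed form: mean squared deviation of each scale's prediction
    from the per-pixel average over all scales."""
    n_scales = len(preds)
    n_pix = len(preds[0])
    mean = [sum(p[i] for p in preds) / n_scales for i in range(n_pix)]
    total = sum((p[i] - mean[i]) ** 2 for p in preds for i in range(n_pix))
    return total / (n_scales * n_pix)

# Two tiny "frames" flattened to pixel lists (illustrative values only).
frame_a = [0.0, 0.2]
frame_b = [1.0, 0.4]
loss = ict_consistency_loss(toy_model, toy_model, frame_a, frame_b, lam=0.5)
```

In a real training loop the teacher would typically be a slowly updated copy of the student and `lam` would be sampled per batch; both choices are assumptions borrowed from the generic ICT setup, not details confirmed by the abstract.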




ST-RAP: A Spatio-Temporal Framework for Real Estate Appraisal

In this paper, we introduce ST-RAP, a novel Spatio-Temporal framework fo...

Explore Spatio-temporal Aggregation for Insubstantial Object Detection: Benchmark Dataset and Baseline

We endeavor on a rarely explored task named Insubstantial Object Detecti...

Learning Shadow Correspondence for Video Shadow Detection

Video shadow detection aims to generate consistent shadow predictions am...

Spatio-Temporal Structure Consistency for Semi-supervised Medical Image Classification

Intelligent medical diagnosis has shown remarkable progress based on the...

MINTIME: Multi-Identity Size-Invariant Video Deepfake Detection

In this paper, we introduce MINTIME, a video deepfake detection approach...

Continuous conditional video synthesis by neural processes

We propose a unified model for multiple conditional video synthesis task...

STCNet: Spatio-Temporal Cross Network for Industrial Smoke Detection

Industrial smoke emissions present a serious threat to natural ecosystem...
