Training Deep Networks to be Spatially Sensitive

by Nicholas Kolkin, et al.

In many computer vision tasks, such as saliency prediction or semantic segmentation, the desired output is a foreground map that predicts the pixels where some criterion is satisfied. Despite the inherently spatial nature of this task, commonly used learning objectives do not incorporate the spatial relationships between misclassified pixels and the underlying ground truth. The Weighted F-measure, a recently proposed evaluation metric, does reweight errors spatially; it has been shown to correlate closely with human evaluation of quality and to rank predictions stably with respect to noisy ground truths (such as a sloppy human annotator might generate). However, its computational complexity makes it intractable as an optimization objective for gradient descent, where the objective must be evaluated thousands or millions of times while learning a model's parameters. We propose a differentiable and efficient approximation of this metric. By incorporating spatial information into the objective, we can use a simpler model than competing methods without sacrificing accuracy, resulting in faster inference and alleviating the need for pre/post-processing. We match or improve on the prior state of the art on several tasks as measured by traditional metrics, and in many cases significantly improve performance as measured by the Weighted F-measure.
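
To make the idea of a differentiable, spatially reweighted F-measure concrete, here is a minimal NumPy sketch. It is not the authors' approximation, which the abstract does not spell out. It assumes the hard true/false positive counts are replaced by sums over soft predictions in [0, 1], and that a per-pixel weight map scales each error. The function name `soft_weighted_f_measure` and the parameters `weights` and `beta2` are hypothetical and used only for illustration.

```python
import numpy as np


def soft_weighted_f_measure(pred, target, weights=None, beta2=1.0, eps=1e-8):
    """Soft F-beta score with optional per-pixel weights (illustrative only).

    pred    : array of predicted foreground probabilities in [0, 1]
    target  : binary ground-truth array of the same shape
    weights : hypothetical per-pixel importance map (defaults to all ones)
    beta2   : beta squared, trading off recall against precision
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    if weights is None:
        weights = np.ones_like(pred)

    # Soft counts: sums of products are smooth in `pred`, so the score
    # could be used as a training objective in an autodiff framework.
    tp = np.sum(weights * pred * target)
    fp = np.sum(weights * pred * (1.0 - target))
    fn = np.sum(weights * (1.0 - pred) * target)

    return (1.0 + beta2) * tp / ((1.0 + beta2) * tp + beta2 * fn + fp + eps)


# Assumed toy example: the same false positive costs less when its weight
# is lower, e.g. because it lies next to the true foreground.
target = np.array([[1.0, 0.0, 0.0]])
pred = np.array([[1.0, 1.0, 0.0]])
uniform = soft_weighted_f_measure(pred, target)
near_fg = soft_weighted_f_measure(pred, target, weights=np.array([[1.0, 0.5, 1.0]]))
```

In a training setting one would minimize one minus this score; how the weight map is derived from the ground truth is the part that the Weighted F-measure and the proposed approximation define, and it is left as an input here.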



Optimizing Rank-based Metrics with Blackbox Differentiation

Rank-based metrics are some of the most widely used criteria for perform...

A Locally Weighted Fixation Density-Based Metric for Assessing the Quality of Visual Saliency Predictions

With the increased focus on visual attention (VA) in the last decade, a ...

Neither Quick Nor Proper -- Evaluation of QuickProp for Learning Deep Neural Networks

Neural networks and especially convolutional neural networks are of grea...

Can Ground Truth Label Propagation from Video help Semantic Segmentation?

For state-of-the-art semantic segmentation task, training convolutional ...

Adaptive Wing Loss for Robust Face Alignment via Heatmap Regression

Heatmap regression has became one of the mainstream approaches to locali...

ThreshNet: Segmentation Refinement Inspired by Region-Specific Thresholding

We present ThreshNet, a post-processing method to refine the output of n...

Floors are Flat: Leveraging Semantics for Real-Time Surface Normal Prediction

We propose 4 insights that help to significantly improve the performance...
