Seeing through the Human Reporting Bias: Visual Classifiers from Noisy Human-Centric Labels

by Ishan Misra, et al.

When human annotators are given a choice about what to label in an image, they apply their own subjective judgments on what to ignore and what to mention. We refer to these noisy "human-centric" annotations as exhibiting human reporting bias. Examples of such annotations include image tags and keywords found on photo sharing sites, or in datasets containing image captions. In this paper, we use these noisy annotations for learning visually correct image classifiers. Such annotations do not use consistent vocabulary, and miss a significant amount of the information present in an image; however, we demonstrate that the noise in these annotations exhibits structure and can be modeled. We propose an algorithm to decouple the human reporting bias from the correct visually grounded labels. Our results are highly interpretable for reporting "what's in the image" versus "what's worth saying." We demonstrate the algorithm's efficacy along a variety of metrics and datasets, including MS COCO and Yahoo Flickr 100M. We show significant improvements over traditional algorithms for both image classification and image captioning, doubling the performance of existing methods in some cases.
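The decoupling the abstract describes can be illustrated with a small numerical sketch. One way to realize it is to factor the probability that a label is *reported* into two parts: whether the concept is visually present, and whether a human would bother to mention it given its presence. The function name and the example probabilities below are hypothetical, chosen only to show how the factorization separates "what's in the image" from "what's worth saying":

```python
# Toy sketch of decoupling visual presence (v) from human reporting (y).
# For a given image and label, suppose we have three probabilities:
#   p_v          = P(v = 1 | image)      -- the concept is visually present
#   p_rep_if_v   = P(y = 1 | v = 1, img) -- a human mentions it when present
#   p_rep_if_not = P(y = 1 | v = 0, img) -- a human mentions it when absent
# The observed noisy-label probability is the marginal over v:
#   P(y = 1 | img) = p_rep_if_v * p_v + p_rep_if_not * (1 - p_v)

def reported_prob(p_v, p_rep_if_v, p_rep_if_not):
    """Marginal probability that a human reports the label."""
    return p_rep_if_v * p_v + p_rep_if_not * (1.0 - p_v)

# Hypothetical numbers: a chair is clearly visible (p_v = 0.9), but
# annotators rarely mention background furniture (p_rep_if_v = 0.3),
# and almost never report a chair that isn't there (p_rep_if_not = 0.05).
p = reported_prob(0.9, 0.3, 0.05)
print(p)  # 0.275
```

Under this view, a classifier trained directly on the noisy labels conflates the two factors, while a model that keeps them as separate heads can recover a visually grounded presence estimate (0.9 here) even though the raw annotation rate is much lower (0.275).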



