Importance Reweighting for Biquality Learning

10/19/2020
by Pierre Nodet, et al.

The field of Weakly Supervised Learning (WSL) has recently seen a surge of popularity, with numerous papers addressing different types of “supervision deficiencies”, namely poor quality, non-adaptability, and insufficient quantity of labels. Regarding quality, label noise can be of different kinds: completely at random, at random, or not at random. These kinds of label noise are addressed separately in the literature, leading to highly specialized approaches. This paper proposes an original view of Weakly Supervised Learning, aimed at designing generic approaches capable of dealing with any kind of label noise. For this purpose, an alternative setting called “Biquality data” is used. This setting assumes that a small trusted dataset of correctly labeled examples is available in addition to an untrusted dataset of noisy examples. In this paper, we propose a new reweighting scheme capable of identifying non-corrupted examples in the untrusted dataset. This allows one to learn classifiers using both datasets. Extensive experiments, which simulate several kinds of label noise and vary the quality and quantity of untrusted examples, demonstrate that the proposed approach outperforms baselines and state-of-the-art approaches.
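The abstract does not spell out the reweighting scheme, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes that two probabilistic classifiers have already been fit, one on the trusted data and one on the untrusted data, and that each untrusted example is weighted by the ratio of the two models' probabilities for its observed label. The function names `flip_labels_completely_at_random` and `importance_weights`, and the choice of a probability ratio as the weight, are hypothetical and introduced here for illustration. The first function shows one simple way to simulate completely-at-random label noise of the kind the experiments mention.

```python
import numpy as np


def flip_labels_completely_at_random(y, noise_rate, n_classes, rng):
    """Simulate completely-at-random label noise (illustrative assumption).

    Each label is replaced, with probability `noise_rate`, by a class drawn
    uniformly at random, independently of the features and of the true label.
    """
    y = np.asarray(y)
    corrupt = rng.random(y.shape[0]) < noise_rate
    random_labels = rng.integers(0, n_classes, size=y.shape[0])
    return np.where(corrupt, random_labels, y)


def importance_weights(p_trusted, p_untrusted, y_untrusted, eps=1e-12):
    """Weight untrusted examples by a ratio of predicted probabilities.

    p_trusted[i, k]   : probability of class k for example i, from a model
                        fit on the trusted data (assumed given).
    p_untrusted[i, k] : same, from a model fit on the untrusted data.
    y_untrusted[i]    : observed (possibly noisy) label of example i.

    A weight well below 1 suggests the observed label is implausible under
    the trusted model, i.e. the example may be corrupted.
    """
    p_trusted = np.asarray(p_trusted, dtype=float)
    p_untrusted = np.asarray(p_untrusted, dtype=float)
    y_untrusted = np.asarray(y_untrusted)
    rows = np.arange(y_untrusted.shape[0])
    numerator = p_trusted[rows, y_untrusted]
    denominator = np.maximum(p_untrusted[rows, y_untrusted], eps)
    return numerator / denominator
```

In such a sketch, the resulting weights could be passed as per-example sample weights when fitting a final classifier on the union of the trusted examples (with weight 1) and the reweighted untrusted examples.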
