Detecting Human-Object Interaction with Mixed Supervision

Human-object interaction (HOI) detection is an important task in image understanding and reasoning. Its output takes the form of an HOI triplet <human; verb; object>, which requires bounding boxes for the human and the object as well as the action between them. In other words, training for this task requires strong supervision, which is hard to procure. A natural way to overcome this is weakly-supervised learning, where only the presence of certain HOI triplets in an image is known and their exact locations are not. Most weakly-supervised learning methods make no provision for leveraging strongly supervised data when it is available; indeed, a naïve combination of the two paradigms in HOI detection fails to let either one benefit the other. We therefore propose a mixed-supervised HOI detection pipeline built on a specific design of momentum-independent learning that learns seamlessly across both types of supervision. Moreover, because annotations are insufficient under mixed supervision, we introduce an HOI element swapping technique that synthesizes diverse, hard negatives across images and improves the robustness of the model. Our method is evaluated on the challenging HICO-DET dataset. Using a mixture of strong and weak annotations, it performs close to, or even better than, many fully-supervised methods; furthermore, it outperforms representative state-of-the-art weakly- and fully-supervised methods under the same supervision.
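To make the triplet format and the idea of cross-image element swapping concrete, here is a minimal sketch under stated assumptions. It is not the authors' implementation. The `HOITriplet` class, the `swap_negatives` function, and the swapping rule are hypothetical illustrations. A triplet carries boxes under strong supervision and no boxes under weak (image-level) supervision. Negatives for image A are formed by swapping in the verb or the object category of a triplet from image B, and any combination already annotated as positive in A is discarded.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Optional, Set, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass(frozen=True)
class HOITriplet:
    """Hypothetical <human; verb; object> record; the human is implicitly 'person'."""
    verb: str
    obj: str
    human_box: Optional[Box] = None   # None under weak (image-level) supervision
    object_box: Optional[Box] = None  # None under weak (image-level) supervision

    @property
    def is_strong(self) -> bool:
        # Strong supervision means both boxes are annotated.
        return self.human_box is not None and self.object_box is not None

    def label(self) -> Tuple[str, str]:
        return (self.verb, self.obj)


def swap_negatives(a: List[HOITriplet], b: List[HOITriplet]) -> List[HOITriplet]:
    """Hypothetical element swap: build negative triplets for image A by taking
    the verb or the object category from triplets of image B."""
    positives: Set[Tuple[str, str]] = {t.label() for t in a}
    seen: Set[Tuple[str, str]] = set()
    negatives: List[HOITriplet] = []
    for ta, tb in product(a, b):
        candidates = [
            # Verb swap: same human-object pair in A (boxes kept), wrong verb.
            HOITriplet(tb.verb, ta.obj, ta.human_box, ta.object_box),
            # Object swap: A's verb with B's object category; the object has no box in A.
            HOITriplet(ta.verb, tb.obj, ta.human_box, None),
        ]
        for cand in candidates:
            key = cand.label()
            if key in positives or key in seen:
                continue  # skip annotated positives and duplicates
            seen.add(key)
            negatives.append(cand)
    return negatives


if __name__ == "__main__":
    image_a = [HOITriplet("ride", "bicycle", (0, 0, 10, 10), (5, 5, 20, 20))]  # strong
    image_b = [HOITriplet("hold", "cup")]                                       # weak
    for neg in swap_negatives(image_a, image_b):
        print(neg.label())
```

Running the example prints `('hold', 'bicycle')` and `('ride', 'cup')`. Under weak or mixed labels, a swapped triplet may in fact occur in image A without being annotated. A sketch like this therefore treats swapped triplets only as likely negatives, not guaranteed ones.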




Weakly Supervised Human-Object Interaction Detection in Video via Contrastive Spatiotemporal Regions

We introduce the task of weakly supervised learning for detecting human ...

Mixed supervision for surface-defect detection: from weakly to fully supervised learning

Deep-learning methods have recently started being employed for addressin...

Many-shot from Low-shot: Learning to Annotate using Mixed Supervision for Object Detection

Object detection has witnessed significant progress by relying on large,...

Learning by Fixing: Solving Math Word Problems with Weak Supervision

Previous neural solvers of math word problems (MWPs) are learned with fu...

Explanation-based Weakly-supervised Learning of Visual Relations with Graph Networks

Visual relationship detection is fundamental for holistic image understa...

Weakly-Supervised Arbitrary-Shaped Text Detection with Expectation-Maximization Algorithm

Arbitrary-shaped text detection is an important and challenging task in ...

Bayesian Joint Topic Modelling for Weakly Supervised Object Localisation

We address the problem of localisation of objects as bounding boxes in i...
