We Have So Much In Common: Modeling Semantic Relational Set Abstractions in Videos

08/12/2020
by Alex Andonian, et al.

Identifying common patterns among events is a key ability in human and machine perception, as it underlies intelligent decision-making. We propose an approach for learning semantic relational set abstractions on videos, inspired by human learning. We combine visual features with natural language supervision to generate high-level representations of similarities across a set of videos. This allows our model to perform cognitive tasks such as set abstraction (which general concept is common to a set of videos?), set completion (which new video goes well with the set?), and odd-one-out detection (which video does not belong to the set?). Experiments on two video benchmarks, Kinetics and Multi-Moments in Time, show that robust and versatile representations emerge when learning to recognize commonalities among sets. We compare our model to several baseline algorithms and show that significant improvements result from explicitly learning relational abstractions with semantic supervision.
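To make the three tasks concrete, the following is a minimal sketch that treats each video as a fixed embedding vector. It is not the model described above: the abstract does not specify an architecture, so the mean pooling, the cosine similarity scoring, the hand-written toy vectors, and the label names ("exercising", "cooking") are all assumptions chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_vector(vectors):
    """Element-wise mean, used here as a stand-in for a set representation."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def abstract_set(embeddings, label_embeddings):
    """Set abstraction: the label whose embedding is closest to the set mean."""
    centre = mean_vector(embeddings)
    return max(label_embeddings, key=lambda name: cosine(label_embeddings[name], centre))


def complete_set(embeddings, candidates):
    """Set completion: index of the candidate closest to the set mean."""
    centre = mean_vector(embeddings)
    return max(range(len(candidates)), key=lambda j: cosine(candidates[j], centre))


def odd_one_out(embeddings):
    """Odd one out: index of the item least similar to the mean of the others."""
    scores = []
    for i, item in enumerate(embeddings):
        rest = embeddings[:i] + embeddings[i + 1:]
        scores.append(cosine(item, mean_vector(rest)))
    return min(range(len(scores)), key=scores.__getitem__)


# Hypothetical 3-dimensional video embeddings; index 2 is the outlier.
videos = [
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.0],
    [0.0, 0.1, 1.0],
    [1.0, 0.0, 0.1],
]
# Hypothetical label embeddings standing in for language supervision.
labels = {"exercising": [1.0, 0.0, 0.0], "cooking": [0.0, 0.0, 1.0]}

outlier = odd_one_out(videos)
coherent = [v for i, v in enumerate(videos) if i != outlier]
best_candidate = complete_set(coherent, [[0.0, 1.0, 0.0], [0.95, 0.1, 0.05]])
concept = abstract_set(coherent, labels)
```

On this toy data the outlier is the third video, the second candidate completes the remaining set, and the closest label is "exercising". A learned relational model would replace the simple mean with a trained set representation.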


