Video Event Extraction via Tracking Visual States of Arguments

11/03/2022
by Guang Yang, et al.

Video event extraction aims to detect salient events from a video and identify the arguments for each event as well as their semantic roles. Existing methods focus on capturing the overall visual scene of each frame, ignoring fine-grained argument-level information. Inspired by the definition of events as changes of states, we propose a novel framework to detect video events by tracking the changes in the visual states of all involved arguments, which are expected to provide the most informative evidence for the extraction of video events. In order to capture the visual state changes of arguments, we decompose them into changes in pixels within objects, displacements of objects, and interactions among multiple arguments. We further propose Object State Embedding, Object Motion-aware Embedding and Argument Interaction Embedding to encode and track these changes respectively. Experiments on various video event extraction tasks demonstrate significant improvements compared to state-of-the-art models. In particular, on verb classification, we achieve 3.49 Recognition.
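
The three kinds of state change named in the abstract (pixel changes within an object, object displacement, and interactions among arguments) can be illustrated with a small sketch. This is not the paper's Object State Embedding, Object Motion-aware Embedding, or Argument Interaction Embedding, which are learned representations; it only computes hand-crafted stand-ins for the same three signals. All names here (`Box`, `pixel_change`, `displacements`, `interaction`) are hypothetical, and the choices of box-centre shift for displacement and box overlap (IoU) for interaction are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]


def center(box: Box) -> Tuple[float, float]:
    """Return the centre point of a bounding box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def displacements(track: List[Box]) -> List[Tuple[float, float]]:
    """Object displacement: centre shift between consecutive frames."""
    centers = [center(b) for b in track]
    return [
        (cx2 - cx1, cy2 - cy1)
        for (cx1, cy1), (cx2, cy2) in zip(centers, centers[1:])
    ]


def pixel_change(crops: List[List[float]]) -> List[float]:
    """Pixel change within an object: mean absolute difference between
    consecutive crops, each given as a flattened list of equal length."""
    return [
        sum(abs(a - b) for a, b in zip(prev, curr)) / len(prev)
        for prev, curr in zip(crops, crops[1:])
    ]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes (0.0 when they do not overlap)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def interaction(track_a: List[Box], track_b: List[Box]) -> List[float]:
    """Argument interaction (assumed proxy): per-frame overlap of two tracks."""
    return [iou(a, b) for a, b in zip(track_a, track_b)]
```

Given per-frame boxes for two arguments, `displacements(track_a)` and `interaction(track_a, track_b)` each yield one value per frame or frame pair, which is the kind of argument-level signal the abstract contrasts with whole-frame scene features.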


