Few-Shot Temporal Action Localization with Query Adaptive Transformer

10/20/2021
by Sauradip Nag, et al.

Existing temporal action localization (TAL) works rely on a large number of training videos with exhaustive segment-level annotation, preventing them from scaling to new classes. As a solution to this problem, few-shot TAL (FS-TAL) aims to adapt a model to a new class represented by as few as a single video. Existing FS-TAL methods assume trimmed training videos for new classes. However, this setting is not only unnatural, since actions are typically captured in untrimmed videos, but also ignores background video segments containing vital contextual cues for foreground action segmentation. In this work, we first propose a new FS-TAL setting that uses untrimmed training videos. Further, we propose a novel FS-TAL model that maximizes the knowledge transfer from training classes whilst enabling the model to be dynamically adapted to both the new class and each video of that class simultaneously. This is achieved by introducing a query adaptive Transformer in the model. Extensive experiments on two action localization benchmarks demonstrate that our method significantly outperforms all the state-of-the-art alternatives in both single-domain and cross-domain scenarios. The source code can be found at https://github.com/sauradip/fewshotQAT

research
08/31/2020

Learning to Localize Actions from Moments

With the knowledge of action moments (i.e., trimmed video clips that eac...
research
11/27/2022

Multi-Modal Few-Shot Temporal Action Detection

Few-shot (FS) and zero-shot (ZS) learning are two different approaches f...
research
08/06/2021

Simpler is Better: Few-shot Semantic Segmentation with Classifier Weight Transformer

A few-shot semantic segmentation model is typically composed of a CNN en...
research
04/06/2021

Few-Shot Transformation of Common Actions into Time and Space

This paper introduces the task of few-shot common action localization in...
research
08/13/2020

Localizing the Common Action Among a Few Videos

This paper strives to localize the temporal extent of an action in a lon...
research
11/19/2019

Cross-Class Relevance Learning for Temporal Concept Localization

We present a novel Cross-Class Relevance Learning approach for the task ...
research
05/11/2023

Undercover Deepfakes: Detecting Fake Segments in Videos

The recent renaissance in generative models, driven primarily by the adv...
