Soft-Landing Strategy for Alleviating the Task Discrepancy Problem in Temporal Action Localization Tasks

11/11/2022
by Hyolim Kang, et al.

Temporal Action Localization (TAL) methods typically operate on feature sequences from a frozen snippet encoder pretrained on Trimmed Action Classification (TAC) tasks, which results in a task discrepancy problem. Existing TAL methods mitigate this issue either by retraining the encoder with a pretext task or by end-to-end fine-tuning, but both approaches commonly demand substantial memory and computation. In this work, we introduce the Soft-Landing (SoLa) strategy, an efficient yet effective framework that bridges the transferability gap between the pretrained encoder and the downstream tasks by placing a light-weight neural network, i.e., a SoLa module, on top of the frozen encoder. We also propose an unsupervised training scheme for the SoLa module: it learns with inter-frame Similarity Matching, which uses the frame interval as its supervisory signal and thereby eliminates the need for temporal annotations. Experimental evaluation on various benchmarks for downstream TAL tasks shows that our method effectively alleviates the task discrepancy problem with remarkable computational efficiency.
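
The idea of using the frame interval as a supervisory signal can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the linear decay in `target_similarity`, the squared-error objective, the `max_interval` parameter, and all function names are hypothetical and not taken from the paper. The `features` argument stands in for the per-frame outputs that a light-weight module would produce on top of frozen encoder features.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def target_similarity(i, j, max_interval):
    """Assumed target: decays linearly from 1 (same frame) to 0
    (frames that are max_interval or more apart)."""
    return max(0.0, 1.0 - abs(i - j) / max_interval)


def similarity_matching_loss(features, max_interval):
    """Mean squared error between the pairwise feature similarity and the
    interval-based target, over all distinct frame pairs.
    No temporal annotations are used, only frame indices."""
    total, count = 0.0, 0
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            predicted = cosine_similarity(features[i], features[j])
            target = target_similarity(i, j, max_interval)
            total += (predicted - target) ** 2
            count += 1
    return total / count
```

Under these assumptions, nearby frames are pushed toward similar features and distant frames toward dissimilar ones; in practice such a loss would be minimized with respect to the parameters of the light-weight module while the encoder stays frozen.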


research
03/28/2021

Low-Fidelity End-to-End Video Encoder Pre-training for Temporal Action Localization

Temporal action localization (TAL) is a fundamental yet challenging task...
research
09/15/2023

Fine-tune the pretrained ATST model for sound event detection

Sound event detection (SED) often suffers from the data deficiency probl...
research
11/26/2021

Self-supervised Pretraining with Classification Labels for Temporal Activity Detection

Temporal Activity Detection aims to predict activity classes per frame, ...
research
07/18/2022

STT: Soft Template Tuning for Few-Shot Adaptation

Prompt tuning has been an extremely effective tool to adapt a pre-traine...
research
11/25/2022

Re^2TAL: Rewiring Pretrained Video Backbones for Reversible Temporal Action Localization

Temporal action localization (TAL) requires long-form reasoning to predi...
research
12/06/2022

Fine-tuned CLIP Models are Efficient Video Learners

Large-scale multi-modal training with image-text pairs imparts strong ge...
research
02/11/2023

How to prepare your task head for finetuning

In deep learning, transferring information from a pretrained network to ...
