Multi-frame Collaboration for Effective Endoscopic Video Polyp Detection via Spatial-Temporal Feature Transformation

07/08/2021
by Lingyun Wu, et al.

Precise localization of polyps is crucial for early cancer screening in gastrointestinal endoscopy. Videos given by endoscopy bring both richer contextual information and more challenges than still images. The camera-moving situation, instead of the common camera-fixed-object-moving one, leads to significant background variation between frames. Severe internal artifacts (e.g. water flow in the human body, specular reflection by tissues) can make the quality of adjacent frames vary considerably. These factors hinder a video-based model from effectively aggregating features from neighboring frames and giving better predictions. In this paper, we present Spatial-Temporal Feature Transformation (STFT), a multi-frame collaborative framework to address these issues. Spatially, STFT mitigates inter-frame variations in the camera-moving situation with feature alignment by proposal-guided deformable convolutions. Temporally, STFT proposes a channel-aware attention module to simultaneously estimate the quality and correlation of adjacent frames for adaptive feature aggregation. Empirical studies and superior results demonstrate the effectiveness and stability of our method. For example, STFT improves the still image baseline FCOS by 10.6 F1-score of the polyp localization task in CVC-Clinic and ASUMayo datasets, respectively, and outperforms the state-of-the-art video-based method by 3.6 and 8.0, respectively. Code is available at <https://github.com/lingyunwu14/STFT>.
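
To make the idea of channel-aware adaptive aggregation concrete, here is a minimal pure-Python sketch. It is not the STFT implementation: the abstract does not specify the attention module, so the cosine-similarity scoring, the softmax over frames, and the names `cosine`, `softmax` and `aggregate` are all assumptions made for illustration. It also assumes the support-frame features have already been spatially aligned to the target frame.

```python
import math

def cosine(u, v):
    """Cosine similarity of two flat vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate(target, supports):
    """Fuse a target frame with support frames, one weight per (frame, channel).

    target:   list of C channels, each a flat list of spatial values.
    supports: list of frames with the same layout (assumed already aligned).
    Hypothetical scoring: a support channel that resembles the target channel
    gets a larger weight; a degraded (e.g. all-zero) channel gets a smaller one.
    """
    frames = [target] + supports
    fused = []
    for c, target_channel in enumerate(target):
        # One score per frame for this channel, normalised across frames.
        weights = softmax([cosine(target_channel, f[c]) for f in frames])
        fused.append([
            sum(w * f[c][i] for w, f in zip(weights, frames))
            for i in range(len(target_channel))
        ])
    return fused

target = [[1.0, 0.0], [0.0, 2.0]]
degraded = [[0.0, 0.0], [0.0, 2.0]]  # channel 0 lost, channel 1 intact
fused = aggregate(target, [degraded])
```

Because the weights are computed per channel, the intact channel of the degraded frame still contributes fully while its lost channel is down-weighted, which is the behaviour a single per-frame weight could not express.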

research
06/06/2023

YONA: You Only Need One Adjacent Reference-frame for Accurate and Fast Video Polyp Detection

Accurate polyp detection is essential for assisting clinical rectal canc...
research
07/20/2020

Learning Joint Spatial-Temporal Transformations for Video Inpainting

High-quality video inpainting that completes missing regions in video fr...
research
09/09/2023

A Spatial-Temporal Deformable Attention based Framework for Breast Lesion Detection in Videos

Detecting breast lesion in videos is crucial for computer-aided diagnosi...
research
03/14/2022

STDAN: Deformable Attention Network for Space-Time Video Super-Resolution

The target of space-time video super-resolution (STVSR) is to increase t...
research
04/18/2021

Let's See Clearly: Contaminant Artifact Removal for Moving Cameras

Contaminants such as dust, dirt and moisture adhering to the camera lens...
research
06/14/2022

Stand-Alone Inter-Frame Attention in Video Models

Motion, as the uniqueness of a video, has been critical to the developme...
research
08/29/2019

Great Ape Detection in Challenging Jungle Camera Trap Footage via Attention-Based Spatial and Temporal Feature Blending

We propose the first multi-frame video object detection framework traine...
