FAST-VQA: Efficient End-to-end Video Quality Assessment with Fragment Sampling

by Haoning Wu, et al.

Current deep video quality assessment (VQA) methods usually incur high computational costs when evaluating high-resolution videos. This cost hinders them from learning better video-quality-related representations via end-to-end training. Existing approaches typically reduce the computational cost with naive sampling, such as resizing and cropping. However, these operations corrupt quality-related information in videos and are thus not optimal for learning good representations for VQA. Therefore, there is a pressing need for a new quality-retained sampling scheme for VQA. In this paper, we propose Grid Mini-patch Sampling (GMS), which accounts for local quality by sampling patches at their raw resolution and covers global quality with contextual relations via mini-patches sampled in uniform grids. These mini-patches are spliced and aligned temporally to form what we call fragments. We further build the Fragment Attention Network (FANet), specially designed to accommodate fragments as inputs. Consisting of fragments and FANet, the proposed FrAgment Sample Transformer for VQA (FAST-VQA) enables efficient end-to-end deep VQA and learns effective video-quality-related representations. It improves state-of-the-art accuracy by around 10 high-resolution videos. The newly learned video-quality-related representations can also be transferred to smaller VQA datasets, boosting performance in these scenarios. Extensive experiments show that FAST-VQA performs well on inputs of various resolutions while retaining high efficiency. We publish our code at
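The sampling idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `sample_fragments`, the default grid size, the default mini-patch size, and the choice of one random offset per grid cell shared across frames are all assumptions made for illustration. The sketch cuts each frame into a uniform grid, takes one mini-patch per cell at raw resolution (no resizing), and splices the mini-patches into a small fragment frame whose layout is the same for every frame.

```python
import numpy as np


def sample_fragments(video, grid=7, patch=32, rng=None):
    """Hypothetical sketch of grid mini-patch sampling.

    video: array of shape (T, H, W, C).
    Returns an array of shape (T, grid * patch, grid * patch, C).
    The defaults for `grid` and `patch` are illustrative assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    t, h, w, c = video.shape
    cell_h, cell_w = h // grid, w // grid
    if cell_h < patch or cell_w < patch:
        raise ValueError("each grid cell must be at least patch x patch pixels")

    out = np.empty((t, grid * patch, grid * patch, c), dtype=video.dtype)
    for i in range(grid):
        for j in range(grid):
            # Assumption: one random offset per cell, reused for all frames,
            # so the mini-patches stay aligned along the time axis.
            dy = int(rng.integers(0, cell_h - patch + 1))
            dx = int(rng.integers(0, cell_w - patch + 1))
            y0, x0 = i * cell_h + dy, j * cell_w + dx
            # Copy raw pixels (no resizing) into the spliced fragment.
            out[:, i * patch:(i + 1) * patch, j * patch:(j + 1) * patch] = (
                video[:, y0:y0 + patch, x0:x0 + patch]
            )
    return out


if __name__ == "__main__":
    clip = np.zeros((8, 1080, 1920, 3), dtype=np.uint8)
    print(sample_fragments(clip).shape)
```

Under these assumed defaults, a 1080 × 1920 clip is reduced to 224 × 224 fragments per frame, while every retained pixel keeps its original value; this is the property that resizing would not preserve.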



Neighbourhood Representative Sampling for Efficient End-to-end Video Quality Assessment

The increased resolution of real-world videos presents a dilemma between...

MRET: Multi-resolution Transformer for Video Quality Assessment

No-reference video quality assessment (NR-VQA) for user generated conten...

FAVER: Blind Quality Prediction of Variable Frame Rate Videos

Video quality assessment (VQA) remains an important and challenging prob...

GMS-3DQA: Projection-based Grid Mini-patch Sampling for 3D Model Quality Assessment

Nowadays, most 3D model quality assessment (3DQA) methods have been aime...

Zoom-VQA: Patches, Frames and Clips Integration for Video Quality Assessment

Video quality assessment (VQA) aims to simulate the human perception of ...

Analysis of Video Quality Datasets via Design of Minimalistic Video Quality Models

Blind video quality assessment (BVQA) plays an indispensable role in mon...

Panoramic Vision Transformer for Saliency Detection in 360° Videos

360° video saliency detection is one of the challenging benchmarks for ...
