Towards Robust Text-Prompted Semantic Criterion for In-the-Wild Video Quality Assessment

by Haoning Wu, et al.

The proliferation of videos captured in the wild has pushed the development of effective Video Quality Assessment (VQA) methodologies. Contemporary supervised, opinion-driven VQA strategies predominantly hinge on training with expensive human quality annotations, which limits the scale and distribution of VQA datasets and consequently leads to unsatisfactory generalization in methods trained on them. On the other hand, although several handcrafted zero-shot quality indices require no training from human opinions, they are unable to account for the semantics of videos, rendering them ineffective at comprehending complex authentic distortions (e.g., white balance, exposure) and at assessing the quality of semantic content within videos. To address these challenges, we introduce the text-prompted Semantic Affinity Quality Index (SAQI) and its localized version (SAQI-Local), which use Contrastive Language-Image Pre-training (CLIP) to measure the affinity between textual prompts and visual features, facilitating a comprehensive examination of semantic quality concerns without reliance on human quality annotations. By combining SAQI with existing low-level metrics, we propose the unified Blind Video Quality Index (BVQI) and its improved version, BVQI-Local, which demonstrates unprecedented performance, surpassing existing zero-shot indices by at least 24% on all datasets. Moreover, we devise an efficient fine-tuning scheme for BVQI-Local that jointly optimizes text prompts and final fusion weights, resulting in state-of-the-art performance and superior generalization compared with prevalent opinion-driven VQA methods. We conduct comprehensive analyses of the distinct quality concerns captured by each index, demonstrating the effectiveness and rationality of our design.
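The core idea behind a text-prompted affinity index can be illustrated in a few lines: embed an antonym prompt pair (e.g., "a high quality photo" vs. "a low quality photo") and the video frames into a shared space, then take the softmax-normalized affinity to the positive prompt as the quality score. The sketch below is a hedged illustration only, not the paper's implementation: it uses plain numpy vectors in place of actual CLIP encoders, and the function name, the temperature value, and the prompt texts are assumptions for demonstration.

```python
import numpy as np

def semantic_affinity_score(frame_feats, pos_text, neg_text, tau=0.01):
    """Sketch of an antonym-prompt affinity score (SAQI-style idea).

    frame_feats: (N, D) visual features, one row per frame.
    pos_text / neg_text: (D,) text embeddings for an antonym prompt pair,
    e.g. "a high quality photo" vs. "a low quality photo". With a real
    CLIP model these would come from the text encoder; here any vectors
    of matching dimension work.
    tau: softmax temperature (illustrative value, not from the paper).
    """
    # L2-normalize so dot products become cosine similarities, as in CLIP.
    f = frame_feats / np.linalg.norm(frame_feats, axis=1, keepdims=True)
    tp = pos_text / np.linalg.norm(pos_text)
    tn = neg_text / np.linalg.norm(neg_text)

    # Per-frame affinity to each prompt: shape (N, 2).
    sim = np.stack([f @ tp, f @ tn], axis=1)

    # Softmax over the antonym pair; the probability assigned to the
    # positive prompt serves as the per-frame quality estimate.
    logits = sim / tau
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)

    # Average over frames to obtain a video-level score in [0, 1].
    return float(probs[:, 0].mean())
```

Frames whose features align with the positive prompt score near 1, and frames aligned with the negative prompt score near 0; a fusion scheme such as BVQI would then combine this semantic score with low-level technical indices.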


Exploring Opinion-unaware Video Quality Assessment with Semantic Affinity Criterion

Recent learning-based video quality assessment (VQA) algorithms are expe...

Unified Quality Assessment of In-the-Wild Videos with Mixed Datasets Training

Video quality assessment (VQA) is an important problem in computer visio...

Towards Explainable In-the-Wild Video Quality Assessment: A Database and a Language-Prompted Approach

The proliferation of in-the-wild videos has greatly expanded the Video Q...

Advancing Zero-Shot Digital Human Quality Assessment through Text-Prompted Evaluation

Digital humans have witnessed extensive applications in various domains,...

No-Reference Video Quality Assessment using Multi-Level Spatially Pooled Features

Video Quality Assessment (VQA) methods have been designed with a focus o...

Making Video Quality Assessment Models Robust to Bit Depth

We introduce a novel feature set, which we call HDRMAX features, that wh...

DCVQE: A Hierarchical Transformer for Video Quality Assessment

The explosion of user-generated videos stimulates a great demand for no-...