Breaking Annotation Barriers: Generalized Video Quality Assessment via Ranking-based Self-Supervision

Cao, Linhan; Sun, Wei; Zhang, Kaiwei; Peng, Yicong; Zhai, Guangtao; Min, Xiongkuo

Abstract:Video quality assessment (VQA) is essential for quantifying perceptual quality in various video processing workflows, spanning from camera capture systems to over-the-top streaming platforms. While recent supervised VQA models have made substantial progress, the reliance on manually annotated datasets -- a process that is labor-intensive, costly, and difficult to scale up -- has hindered further optimization of their generalization to unseen video content and distortions. To bridge this gap, we introduce a self-supervised learning framework for VQA to learn quality assessment capabilities from large-scale, unlabeled web videos. Our approach leverages a \textbf{learning-to-rank} paradigm to train a large multimodal model (LMM) on video pairs automatically labeled via two manners, including quality pseudo-labeling by existing VQA models and relative quality ranking based on synthetic distortion simulations. Furthermore, we introduce a novel \textbf{iterative self-improvement training strategy}, where the trained model acts an improved annotator to iteratively refine the annotation quality of training data. By training on a dataset $10\times$ larger than the existing VQA benchmarks, our model: (1) achieves zero-shot performance on in-domain VQA benchmarks that matches or surpasses supervised models; (2) demonstrates superior out-of-distribution (OOD) generalization across diverse video content and distortions; and (3) sets a new state-of-the-art when fine-tuned on human-labeled datasets. Extensive experimental results validate the effectiveness of our self-supervised approach in training generalized VQA models. The datasets and code will be publicly released to facilitate future research.

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2505.03631 [cs.CV]
	(or arXiv:2505.03631v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2505.03631

Computer Science > Computer Vision and Pattern Recognition

Title:Breaking Annotation Barriers: Generalized Video Quality Assessment via Ranking-based Self-Supervision

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators