YouTube UGC Dataset for Video Compression Research
Non-professional video, commonly known as User Generated Content (UGC), has become very popular in today's video sharing applications. However, few public UGC datasets are available for video compression and quality assessment research. This paper introduces a large-scale UGC dataset (1,500 video clips of 20 seconds each) sampled from millions of YouTube videos. The dataset covers popular categories such as Gaming and Sports, as well as newer features such as High Dynamic Range (HDR). In addition to presenting a novel sampling method based on features extracted from encoding, the paper discusses challenges for UGC compression and quality evaluation. The dataset also includes three no-reference objective quality metrics (Noise, Banding, and SLEEQ) for these clips. These metrics overcome certain shortcomings of traditional reference-based metrics on UGC.