Txt2Vid: Ultra-Low Bitrate Compression of Talking-Head Videos via Text

by Pulkit Tandon, et al.

Video accounts for the majority of internet traffic today, driving a continuous technological arms race among generating higher-quality content, transmitting larger files, and building supporting network infrastructure. The recent surge in video conferencing fueled by the COVID-19 pandemic adds to this pressure. Since videos take up substantial bandwidth (roughly 100 Kbps to a few Mbps), improved video compression can have a substantial impact on network performance for live and pre-recorded content, providing broader access to multimedia content worldwide. In this work, we present a novel video compression pipeline, called Txt2Vid, which substantially reduces data transmission rates by compressing webcam videos ("talking-head videos") to a text transcript. The text is transmitted and decoded into a realistic reconstruction of the original video using recent advances in deep-learning-based voice cloning and lip syncing models. Our generative pipeline achieves a two to three orders of magnitude reduction in bitrate compared to standard audio-video codecs (encoders-decoders), while maintaining equivalent Quality-of-Experience based on a subjective evaluation by users (n=242) in an online study. The Txt2Vid framework opens up the potential for novel applications, such as enabling audio-video communication over poor internet connections or in remote terrains with limited bandwidth. The code for this work is available at https://github.com/tpulkit/txt2vid.git.
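To give a feel for why a text transcript can be so much smaller than encoded video, the following back-of-the-envelope sketch estimates the bitrate of plain transcribed speech and compares it with the 100 Kbps lower bound quoted above. The speaking rate (150 words per minute), average word length (6 characters including the space), and 8-bit uncompressed character encoding are illustrative assumptions, not figures from the paper, and the function names are hypothetical.

```python
import math


def text_bitrate_bps(words_per_minute: float,
                     chars_per_word: float,
                     bits_per_char: float) -> float:
    """Approximate bitrate (bits per second) of a spoken-text transcript."""
    return words_per_minute * chars_per_word * bits_per_char / 60.0


def orders_of_magnitude(reference_bps: float, candidate_bps: float) -> float:
    """Base-10 log of the ratio between a reference and a candidate bitrate."""
    return math.log10(reference_bps / candidate_bps)


# Assumed values for illustration only (not measurements from the paper).
ASSUMED_WORDS_PER_MINUTE = 150.0
ASSUMED_CHARS_PER_WORD = 6.0      # average word length plus one space
ASSUMED_BITS_PER_CHAR = 8.0       # uncompressed single-byte characters

# Lower end of the video bandwidth range mentioned in the abstract.
VIDEO_BPS = 100_000.0

text_bps = text_bitrate_bps(ASSUMED_WORDS_PER_MINUTE,
                            ASSUMED_CHARS_PER_WORD,
                            ASSUMED_BITS_PER_CHAR)
reduction = orders_of_magnitude(VIDEO_BPS, text_bps)

print(f"transcript bitrate: {text_bps:.0f} bps")
print(f"reduction vs. 100 Kbps video: about {reduction:.1f} orders of magnitude")
```

Under these assumptions the transcript needs on the order of a hundred bits per second, which is in line with the scale of reduction the abstract reports. The actual savings depend on the text compression used and on the codec settings being compared against.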




