Harmonizer: Learning to Perform White-Box Image and Video Harmonization

by Zhanghan Ke, et al.

Recent works on image harmonization treat the problem as a pixel-wise image translation task and solve it with large autoencoders. These methods perform unsatisfactorily and run slowly on high-resolution images. In this work, we observe that adjusting the input arguments of basic image filters, e.g., brightness and contrast, is sufficient for humans to produce realistic images from composite ones. Hence, we frame image harmonization as an image-level regression problem: learning the arguments of the filters that humans use for the task. We present Harmonizer, a framework for image harmonization. Unlike prior methods that are based on black-box autoencoders, Harmonizer contains a neural network that predicts filter arguments and several white-box filters that harmonize the image using the predicted arguments. We also introduce a cascade regressor and a dynamic loss strategy that let Harmonizer learn filter arguments more stably and precisely. Since our network outputs only image-level arguments and the filters we use are efficient, Harmonizer is much lighter and faster than existing methods. Comprehensive experiments demonstrate that Harmonizer notably surpasses existing methods, especially on high-resolution inputs. Finally, we apply Harmonizer to video harmonization, where it achieves consistent results across frames and runs at 56 fps at 1080p resolution. Code and models are available at: https://github.com/ZHKKKe/Harmonizer.
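The white-box idea in the abstract can be illustrated with a minimal sketch: a handful of image-level scalar arguments drive simple filters, and the filtered result is pasted back only inside the foreground mask. Everything below is an assumption made for illustration. The filter formulas, the argument range of [-1, 1], and the names `apply_brightness`, `apply_contrast`, and `harmonize` are hypothetical and are not taken from the Harmonizer code base. In the actual framework the arguments would come from the regression network; here they are hard-coded.

```python
import numpy as np

def apply_brightness(img, arg):
    # Illustrative definition (an assumption, not the paper's filter):
    # arg in [-1, 1]; positive values blend toward white, negative toward black.
    target = np.ones_like(img) if arg >= 0 else np.zeros_like(img)
    return img + abs(arg) * (target - img)

def apply_contrast(img, arg):
    # Illustrative definition: scale deviations from the global mean by (1 + arg).
    mean = img.mean()
    return np.clip(mean + (1.0 + arg) * (img - mean), 0.0, 1.0)

# Filters are applied in a fixed order; each takes one scalar argument.
FILTERS = [("brightness", apply_brightness), ("contrast", apply_contrast)]

def harmonize(composite, mask, args):
    """Filter the composite image, then keep the result only in the foreground.

    composite: float array of shape (H, W, 3) with values in [0, 1]
    mask:      float array of shape (H, W, 1), 1 for foreground, 0 for background
    args:      dict mapping filter name to its scalar argument
    """
    filtered = composite
    for name, fn in FILTERS:
        filtered = fn(filtered, args[name])
    return mask * filtered + (1.0 - mask) * composite

# Toy example: a uniform gray image whose top-left quadrant is the foreground.
composite = np.full((4, 4, 3), 0.5)
mask = np.zeros((4, 4, 1))
mask[:2, :2] = 1.0
result = harmonize(composite, mask, {"brightness": 0.2, "contrast": 0.0})
```

Because the only learned quantities are a few scalars per image, the cost of the filtering step scales with simple per-pixel arithmetic rather than with a large decoder, which is consistent with the efficiency argument made in the abstract.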




RSFNet: A White-Box Image Retouching Approach using Region-Specific Color Filters

Retouching images is an essential aspect of enhancing the visual appeal ...

Dense Pixel-to-Pixel Harmonization via Continuous Image Representation

High-resolution (HR) image harmonization is of great significance in rea...

Spatial-Separated Curve Rendering Network for Efficient and High-Resolution Image Harmonization

Image harmonization aims to modify the color of the composited region wi...

High-Resolution Daytime Translation Without Domain Labels

Modeling daytime changes in high resolution photographs, e.g., re-render...

JSI-GAN: GAN-Based Joint Super-Resolution and Inverse Tone-Mapping with Pixel-Wise Task-Specific Filters for UHD HDR Video

Joint learning of super-resolution (SR) and inverse tone-mapping (ITM) h...

Look at the Variance! Efficient Black-box Explanations with Sobol-based Sensitivity Analysis

We describe a novel attribution method which is grounded in Sensitivity ...

Visual Debates

The natural way of obtaining different perspectives on any given topic i...
