OCR-VQGAN: Taming Text-within-Image Generation

by Juan A. Rodriguez, et al.

Synthetic image generation has recently improved significantly in domains such as natural image and art generation. However, the problem of figure and diagram generation remains unexplored. A challenging aspect of generating figures and diagrams is rendering readable text within the images. To alleviate this problem, we present OCR-VQGAN, an image encoder and decoder that leverages OCR pre-trained features to optimize a text perceptual loss, encouraging the architecture to preserve high-fidelity text and diagram structure. To explore our approach, we introduce the Paper2Fig100k dataset, with over 100k images of figures and texts from research papers. The figures show architecture diagrams and methodologies from articles available on arXiv.org in fields such as artificial intelligence and computer vision. Figures usually include text and discrete objects, e.g., boxes in a diagram, with lines and arrows that connect them. We demonstrate the effectiveness of OCR-VQGAN through several experiments on the task of figure reconstruction. Additionally, we explore the qualitative and quantitative impact of weighting different perceptual metrics in the overall loss function. We release code, models, and dataset at https://github.com/joanrod/ocr-vqgan.
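To make the idea of weighting perceptual terms concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the released implementation: the function names, the weights `w_ocr` and `w_visual`, the pixel-level L1 term, and the toy feature extractors are all hypothetical, and images are reduced to flat lists of floats. In the setting described above, the OCR features would come from a pre-trained OCR model rather than from the stand-in used here.

```python
from typing import Callable, List, Sequence

# Hypothetical representation: one flat activation vector per layer.
Features = List[List[float]]
Extractor = Callable[[Sequence[float]], Features]


def feature_distance(feats_a: Features, feats_b: Features) -> float:
    """Sum over layers of the mean squared difference between activations."""
    total = 0.0
    for layer_a, layer_b in zip(feats_a, feats_b):
        if len(layer_a) != len(layer_b):
            raise ValueError("feature maps must have matching sizes")
        squared = sum((a - b) ** 2 for a, b in zip(layer_a, layer_b))
        total += squared / len(layer_a)
    return total


def combined_loss(
    original: Sequence[float],
    reconstruction: Sequence[float],
    ocr_features: Extractor,
    visual_features: Extractor,
    w_ocr: float = 1.0,      # assumed weight of the text perceptual term
    w_visual: float = 1.0,   # assumed weight of the generic perceptual term
) -> float:
    """Pixel L1 term plus two weighted feature-space (perceptual) terms."""
    pixel = sum(abs(a - b) for a, b in zip(original, reconstruction)) / len(original)
    ocr = feature_distance(ocr_features(original), ocr_features(reconstruction))
    visual = feature_distance(visual_features(original), visual_features(reconstruction))
    return pixel + w_ocr * ocr + w_visual * visual


def toy_edge_features(image: Sequence[float]) -> Features:
    """Stand-in for a frozen OCR network: raw values and adjacent differences."""
    values = list(image)
    edges = [values[i + 1] - values[i] for i in range(len(values) - 1)]
    return [values, edges]


def toy_identity_features(image: Sequence[float]) -> Features:
    """Stand-in for a generic perceptual network: raw values only."""
    return [list(image)]


if __name__ == "__main__":
    target = [0.0, 1.0, 0.0, 1.0]
    blurry = [0.0, 1.0, 1.0, 1.0]
    for weight in (0.0, 1.0, 2.0):
        loss = combined_loss(target, blurry, toy_edge_features,
                             toy_identity_features, w_ocr=weight)
        print(f"w_ocr={weight}: loss={loss:.4f}")
```

Raising `w_ocr` in this sketch penalizes differences in the OCR-style features more heavily, which is the kind of trade-off the weighting experiments above examine.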


DreamDiffusion: Generating High-Quality Images from Brain EEG Signals

This paper introduces DreamDiffusion, a novel method for generating high...

Semantically Invariant Text-to-Image Generation

Image captioning has demonstrated models that are capable of generating ...

Enhancing Detail Preservation for Customized Text-to-Image Generation: A Regularization-Free Approach

Recent text-to-image generation models have demonstrated impressive capa...

RxnScribe: A Sequence Generation Model for Reaction Diagram Parsing

Reaction diagram parsing is the task of extracting reaction schemes from...

GPTR: Gestalt-Perception Transformer for Diagram Object Detection

Diagram object detection is the key basis of practical applications such...

FigGen: Text to Scientific Figure Generation

The generative modeling landscape has experienced tremendous growth in r...

Extracting Formal Models from Normative Texts

We are concerned with the analysis of normative texts - documents based ...
