MolMiner: You only look once for chemical structure recognition

by Youjun Xu, et al.

Molecular structures are typically depicted in 2D printed form in scientific documents such as journal papers and patents. However, these 2D depictions are not machine-readable. Because of a backlog spanning decades and a growing volume of printed literature, there is high demand for translating printed depictions into machine-readable formats, a task known as Optical Chemical Structure Recognition (OCSR). Most OCSR systems developed over the last three decades follow a rule-based approach in which the key step, vectorization of the depiction, relies on interpreting vectors and nodes as bonds and atoms. Here, we present MolMiner, a practical software tool built primarily on deep neural networks originally developed for semantic segmentation and object detection, which recognizes atom and bond elements in documents. These recognized elements can then be connected into a molecular graph using a distance-based construction algorithm. We carefully evaluate our software on four benchmark datasets, where it achieves state-of-the-art performance. Various real application scenarios are also tested, yielding satisfactory outcomes. The Mac and Windows versions are available as free downloads.
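The abstract does not spell out the distance-based construction step, so the following is only a minimal sketch of one plausible reading: each detected bond is attached to the two detected atoms closest to its centre. The `Atom` and `Bond` records, the `build_graph` function, and the toy coordinates are all hypothetical illustrations, not MolMiner's actual data structures or algorithm.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Atom:
    """Hypothetical atom detection: element symbol plus box centre."""
    symbol: str
    x: float
    y: float


@dataclass(frozen=True)
class Bond:
    """Hypothetical bond detection: bond order plus box centre."""
    order: int
    x: float
    y: float


def build_graph(atoms, bonds):
    """Attach each detected bond to its two nearest detected atoms.

    Returns a sorted list of (atom_index_a, atom_index_b, order) edges.
    This is an assumed nearest-neighbour rule, not the published method.
    """
    if len(atoms) < 2:
        return []  # a bond needs two endpoints
    edges = []
    for bond in bonds:
        # Rank atom indices by Euclidean distance to the bond centre.
        ranked = sorted(
            range(len(atoms)),
            key=lambda i: hypot(atoms[i].x - bond.x, atoms[i].y - bond.y),
        )
        a, b = sorted(ranked[:2])
        edges.append((a, b, bond.order))
    return sorted(edges)


# Toy detections for a bent C-C=O fragment (coordinates are made up).
atoms = [Atom("C", 0.0, 0.0), Atom("C", 1.0, 0.5), Atom("O", 2.0, 0.0)]
bonds = [Bond(1, 0.5, 0.25), Bond(2, 1.5, 0.25)]
edges = build_graph(atoms, bonds)
print(edges)
```

On the toy input, the single bond joins atoms 0 and 1 and the double bond joins atoms 1 and 2. A real system would also need to handle missed or spurious detections, for example by rejecting bonds whose nearest atoms lie beyond a distance threshold.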


