SELFIES and the future of molecular string representations

03/31/2022
by Mario Krenn, et al.

Artificial intelligence (AI) and machine learning (ML) are increasingly applied to challenging tasks in chemistry and materials science, such as predicting properties, discovering new reaction pathways, and designing new molecules. Each of these tasks requires the machine to read and write fluently in a chemical language. Strings are a common way to represent molecular graphs, and the most popular molecular string representation, SMILES, has powered cheminformatics since the late 1980s. In the context of AI and ML in chemistry, however, SMILES has several shortcomings. Most importantly, the majority of symbol combinations yield invalid strings with no chemical interpretation. To overcome this issue, a new language for molecules was introduced in 2020: SELFIES (SELF-referencIng Embedded Strings). It guarantees 100% robustness, meaning that every SELFIES string corresponds to a valid molecule. SELFIES has since simplified and enabled numerous new applications in chemistry. In this manuscript, we look to the future and discuss molecular string representations, along with their respective opportunities and challenges. We propose 16 concrete Future Projects for robust molecular representations. These involve the extension toward new chemical domains, exciting questions at the interface of AI and robust languages, and interpretability for both humans and machines. We hope that these proposals will inspire several follow-up works that exploit the full potential of molecular string representations for the future of AI in chemistry and materials science.
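The robustness property described above can be illustrated with a deliberately tiny toy model. The sketch below is not the SELFIES grammar and not the API of the `selfies` package. The alphabet, the valence table and the functions `decode`, `decode_naive` and `is_valid` are hypothetical. The toy also handles only linear chains, with no branches or rings.

It contrasts two ways of reading the same tokens:

- Literal decoding takes bond orders at face value. This is loosely analogous to a plain string, which can violate valence.
- Stateful decoding carries a state (the free valence of the last atom) and clamps each new bond to it. This is the core idea behind the validity guarantee of SELFIES.

```python
import random

# Hypothetical toy alphabet: element -> maximum valence (assumed values).
VALENCE = {"C": 4, "N": 3, "O": 2, "F": 1}

# Toy tokens: (requested bond order, element), e.g. (2, "O") means "attach O with a double bond".
TOKENS = [(b, e) for e in VALENCE for b in (1, 2, 3)]


def decode(tokens):
    """Stateful toy decoder: every token sequence becomes a valence-respecting linear chain.

    Each new bond is clamped to the free valence of both atoms it joins; decoding stops
    once the last atom has no free valence left (a simplified version of the idea).
    """
    atoms, bonds = [], []  # bonds[i] joins atoms[i] and atoms[i + 1]
    free = 0               # free valence remaining on the last atom
    for order, elem in tokens:
        if not atoms:
            atoms.append(elem)
            free = VALENCE[elem]
            continue
        if free == 0:
            break  # the chain cannot grow any further
        b = min(order, free, VALENCE[elem])
        atoms.append(elem)
        bonds.append(b)
        free = VALENCE[elem] - b
    return atoms, bonds


def decode_naive(tokens):
    """Stateless toy decoder: bond orders are taken literally, so valence can be violated."""
    atoms = [elem for _, elem in tokens]
    bonds = [order for order, _ in tokens[1:]]
    return atoms, bonds


def is_valid(atoms, bonds):
    """Check that no atom's total bond order exceeds its valence."""
    for i, elem in enumerate(atoms):
        used = (bonds[i - 1] if i > 0 else 0) + (bonds[i] if i < len(bonds) else 0)
        if used > VALENCE[elem]:
            return False
    return True


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [[rng.choice(TOKENS) for _ in range(6)] for _ in range(1000)]
    naive = sum(is_valid(*decode_naive(s)) for s in samples) / len(samples)
    robust = sum(is_valid(*decode(s)) for s in samples) / len(samples)
    print(f"valid fraction, literal decoding:  {naive:.2f}")
    print(f"valid fraction, stateful decoding: {robust:.2f}")
```

Under these assumptions, the stateful decoder yields a valid chain for every random token sequence, whereas literal decoding frequently does not. For real molecules, the `selfies` Python package provides `encoder` and `decoder` functions that convert between SMILES and SELFIES.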
