Recent advances in the Self-Referencing Embedding Strings (SELFIES) library
String-based molecular representations play a crucial role in cheminformatics applications and, with the growing success of deep learning in chemistry, have been readily adopted into machine learning pipelines. However, traditional string-based representations such as SMILES are often prone to syntactic and semantic errors when produced by generative models. To address these problems, a novel representation, SELF-referencIng Embedded Strings (SELFIES), was proposed that is inherently 100% robust, alongside an accompanying open-source implementation. Since then, we have generalized SELFIES to support a wider range of molecules and semantic constraints, and streamlined its underlying grammar. We have implemented this updated representation in subsequent versions of the SELFIES library, where we have also made major advances with respect to design, efficiency, and supported features. Hence, we present the current status of the library (version 2.1.1) in this manuscript.