Empowering Molecule Discovery for Molecule-Caption Translation with Large Language Models: A ChatGPT Perspective

by Jiatong Li, et al.

Molecule discovery plays a crucial role in various scientific fields, advancing the design of tailored materials and drugs. Traditional methods for molecule discovery follow a trial-and-error process that is both time-consuming and costly, while computational approaches such as artificial intelligence (AI) have emerged as revolutionary tools to expedite various tasks, such as molecule-caption translation. Despite the importance of molecule-caption translation for molecule discovery, most existing methods rely heavily on domain experts, require excessive computational cost, and suffer from poor performance. On the other hand, Large Language Models (LLMs) such as ChatGPT have shown remarkable performance in various cross-modal tasks thanks to their powerful capabilities in natural language understanding, generalization, and reasoning, which provide unprecedented opportunities to advance molecule discovery. To address these limitations, in this work we propose a novel LLM-based framework (MolReGPT) for molecule-caption translation, in which a retrieval-based prompt paradigm is introduced to empower molecule discovery with LLMs like ChatGPT without fine-tuning. More specifically, MolReGPT leverages the principle of molecular similarity to retrieve similar molecules and their text descriptions from a local database, grounding the LLM's generation through in-context few-shot molecule learning. We evaluate the effectiveness of MolReGPT on molecule-caption translation, which includes molecule understanding and text-based molecule generation. Experimental results show that MolReGPT outperforms fine-tuned models like MolT5-base without any additional training. To the best of our knowledge, MolReGPT is the first work to leverage LLMs in molecule-caption translation for advancing molecule discovery.
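The retrieval-then-prompt idea described in the abstract can be illustrated with a small sketch of the molecule-to-caption direction. This is not MolReGPT's actual implementation. It assumes that each molecule's fingerprint is a set of "on" bit indices, that similarity is measured with the Tanimoto coefficient (a common choice; the exact metric MolReGPT uses is not stated in this abstract), and that the prompt template is made up for illustration. The `Entry`, `tanimoto`, `retrieve`, and `build_prompt` names and the toy fingerprints are all hypothetical. A real pipeline would compute fingerprints with a cheminformatics toolkit and send the prompt to an LLM.

```python
"""Hypothetical sketch: retrieval-based few-shot prompting for molecule captioning.

Assumptions (not taken from the paper): fingerprints are sets of bit indices,
similarity is Tanimoto, and the prompt wording is invented for illustration.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    smiles: str
    caption: str
    fingerprint: frozenset  # indices of "on" bits (hand-made toy values)


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity |A & B| / |A | B| between two bit sets."""
    if not a and not b:
        return 0.0
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def retrieve(query_fp: frozenset, database: list, k: int = 2) -> list:
    """Return the k database entries most similar to the query fingerprint."""
    ranked = sorted(
        database, key=lambda e: tanimoto(query_fp, e.fingerprint), reverse=True
    )
    return ranked[:k]


def build_prompt(query_smiles: str, examples: list) -> str:
    """Assemble an in-context few-shot prompt from the retrieved examples."""
    lines = ["Describe the molecule given its SMILES string.", ""]
    for ex in examples:
        lines.append(f"Molecule: {ex.smiles}")
        lines.append(f"Caption: {ex.caption}")
        lines.append("")
    lines.append(f"Molecule: {query_smiles}")
    lines.append("Caption:")  # the LLM would complete this line
    return "\n".join(lines)


# Toy "local database" with hand-made fingerprints (hypothetical data).
DATABASE = [
    Entry("CCO", "Ethanol is a primary alcohol.", frozenset({1, 2, 3})),
    Entry("CC(=O)O", "Acetic acid is a carboxylic acid.", frozenset({1, 4, 5})),
    Entry("c1ccccc1", "Benzene is an aromatic hydrocarbon.", frozenset({7, 8, 9})),
]

if __name__ == "__main__":
    examples = retrieve(frozenset({1, 2, 3, 4}), DATABASE, k=2)
    print(build_prompt("CCCO", examples))
```

With the toy query fingerprint `{1, 2, 3, 4}`, ethanol scores 0.75 and acetic acid 0.4, so those two entries become the in-context examples and benzene (score 0) is left out. The design point is that the LLM itself is never fine-tuned: only the examples placed in its prompt change from query to query.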




