Towards Medical Artificial General Intelligence via Knowledge-Enhanced Multimodal Pretraining

by Bingqian Lin, et al.

Medical artificial general intelligence (MAGI) enables one foundation model to solve different medical tasks, which is very practical in the medical domain. By sharing medical knowledge among tasks, it can significantly reduce the need for large amounts of task-specific data. However, because it is difficult to design strongly generalizable models with limited and complex medical data, most existing approaches develop task-specific models instead. To take a step towards MAGI, we propose a new paradigm called Medical-knOwledge-enhanced mulTimOdal pretRaining (MOTOR). In MOTOR, we combine two kinds of basic medical knowledge, i.e., general and specific knowledge, in a complementary manner to boost the general pretraining process. As a result, the foundation model, equipped with comprehensive basic knowledge, can learn compact representations from radiographic pretraining data for better cross-modal alignment. MOTOR unifies understanding and generation, two core kinds of intelligence in an AI system, in a single medical foundation model, so that it can flexibly handle more diverse medical tasks. To enable a comprehensive evaluation and facilitate further research, we construct a medical multimodal benchmark that includes a wide range of downstream tasks, such as chest x-ray report generation and medical visual question answering. Extensive experiments on our benchmark show that MOTOR obtains promising results through simple task-oriented adaptation. Visualization shows that the injected knowledge successfully highlights key information in the medical data, demonstrating the interpretability of MOTOR. MOTOR mimics the human practice of first training as a "medical student" to accelerate the process of becoming a "specialist". We believe that our work makes a significant stride towards realizing MAGI.




On the Challenges and Perspectives of Foundation Models for Medical Image Analysis

This article discusses the opportunities, applications and future direct...

OmniVL: One Foundation Model for Image-Language and Video-Language Tasks

This paper presents OmniVL, a new foundation model to support both image...

SimVLM: Simple Visual Language Model Pretraining with Weak Supervision

With recent progress in joint modeling of visual and textual representat...

Stone Needle: A General Multimodal Large-scale Model Framework towards Healthcare

In healthcare, multimodal data is prevalent and requires to be comprehen...

Path to Medical AGI: Unify Domain-specific Medical LLMs with the Lowest Cost

Medical artificial general intelligence (AGI) is an emerging field that ...

That's the Wrong Lung! Evaluating and Improving the Interpretability of Unsupervised Multimodal Encoders for Medical Data

Pretraining multimodal models on Electronic Health Records (EHRs) provid...

Risk of Bias in Chest X-ray Foundation Models

Foundation models are considered a breakthrough in all applications of A...
