Generative Text-Guided 3D Vision-Language Pretraining for Unified Medical Image Segmentation

by   Yinda Chen, et al.

Vision-Language Pretraining (VLP) has demonstrated remarkable capabilities in learning visual representations from textual descriptions of images without annotations. Yet, effective VLP demands large-scale image-text pairs, a resource that suffers scarcity in the medical domain. Moreover, conventional VLP is limited to 2D images while medical images encompass diverse modalities, often in 3D, making the learning process more challenging. To address these challenges, we present Generative Text-Guided 3D Vision-Language Pretraining for Unified Medical Image Segmentation (GTGM), a framework that extends of VLP to 3D medical images without relying on paired textual descriptions. Specifically, GTGM utilizes large language models (LLM) to generate medical-style text from 3D medical images. This synthetic text is then used to supervise 3D visual representation learning. Furthermore, a negative-free contrastive learning objective strategy is introduced to cultivate consistent visual representations between augmented 3D medical image patches, which effectively mitigates the biases associated with strict positive-negative sample pairings. We evaluate GTGM on three imaging modalities - Computed Tomography (CT), Magnetic Resonance Imaging (MRI), and electron microscopy (EM) over 13 datasets. GTGM's superior performance across various medical image segmentation tasks underscores its effectiveness and versatility, by enabling VLP extension into 3D medical imagery while bypassing the need for paired text.


page 4

page 9

page 15

page 16

page 17


Exploring Transfer Learning in Medical Image Segmentation using Vision-Language Models

Medical Image Segmentation is crucial in various clinical applications w...

Contrastive Learning of Medical Visual Representations from Paired Images and Text

Learning visual representations of medical images is core to medical ima...

MEDIMP: Medical Images and Prompts for renal transplant representation learning

Renal transplantation emerges as the most effective solution for end-sta...

Bi-VLGM : Bi-Level Class-Severity-Aware Vision-Language Graph Matching for Text Guided Medical Image Segmentation

Medical reports with substantial information can be naturally complement...

Increasing Textual Context Size Boosts Medical Image-Text Matching

This short technical report demonstrates a simple technique that yields ...

M3D-NCA: Robust 3D Segmentation with Built-in Quality Control

Medical image segmentation relies heavily on large-scale deep learning m...

Modern Convex Optimization to Medical Image Analysis

Recently, diagnosis, therapy and monitoring of human diseases involve a ...

Please sign up or login with your details

Forgot password? Click here to reset