Self-supervised Image-text Pre-training With Mixed Data In Chest X-rays

03/30/2021
by Xiaosong Wang et al.

Pre-trained models, e.g., from ImageNet, have proven effective in boosting the performance of many downstream applications. For medical imaging, however, acquiring annotations at the scale required to build such models is prohibitively demanding. Meanwhile, large amounts of clinical data (in the form of images and text reports) are stored in hospital information systems. Paired image-text data from the same patient study can be used for pre-training in a weakly supervised manner. However, the integrity, accessibility, and amount of such raw data vary across institutes, e.g., paired vs. unpaired (image-only or text-only). In this work, we introduce an image-text pre-training framework that can learn from raw data with mixed inputs, i.e., paired image-text data or a mixture of paired and unpaired data. The unpaired data can come from one or multiple institutes (e.g., images from one institute coupled with texts from another). Specifically, we propose a transformer-based training framework that jointly learns representations of the image and text data. In addition to the established masked language modeling task, multi-scale masked vision modeling is introduced as a self-supervised task for image patch regeneration. We not only demonstrate the feasibility of pre-training on mixed data inputs but also illustrate the benefits of adopting such pre-trained models in three chest X-ray applications, i.e., classification, retrieval, and image regeneration. Superior results are reported in comparison to prior art on the MIMIC-CXR, NIH14-CXR, and OpenI-CXR datasets.
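The abstract describes multi-scale masked vision modeling: image patches are masked at different scales and the model learns to regenerate them. The following is a minimal NumPy sketch of that masking step only, not the authors' implementation; the function name, the zero-fill "mask token", and the 50% mask ratio are assumptions for illustration.

```python
import numpy as np

def mask_patches(image, patch_size, mask_ratio, rng):
    """Split a square image into non-overlapping patches and mask a
    random subset. The masked patches become regression targets for a
    masked vision modeling (patch regeneration) objective."""
    h, w = image.shape
    ph, pw = h // patch_size, w // patch_size
    # (ph, pw, patch_size, patch_size) grid of patches, then flatten each
    patches = image.reshape(ph, patch_size, pw, patch_size).swapaxes(1, 2)
    patches = patches.reshape(ph * pw, patch_size * patch_size)
    n_mask = int(mask_ratio * len(patches))
    idx = rng.permutation(len(patches))[:n_mask]
    targets = patches[idx].copy()   # ground truth the model must regenerate
    corrupted = patches.copy()
    corrupted[idx] = 0.0            # stand-in for a learned mask token
    return corrupted, targets, idx

rng = np.random.default_rng(0)
img = rng.standard_normal((64, 64)).astype(np.float32)
# "multi-scale": apply the same masking at two patch sizes (assumed scales)
for p in (16, 8):
    vis, tgt, idx = mask_patches(img, p, 0.5, rng)
    print(p, vis.shape, tgt.shape)
```

In a full pipeline the corrupted patch sequence would be fed to the transformer alongside the (masked) report tokens, with a regression loss against `targets`.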

Related research

- 10/24/2020 · Weakly-supervised VisualBERT: Pre-training without Parallel Images and Captions
- 08/15/2023 · Enhancing Network Initialization for Medical AI Models Using Large-Scale, Unlabeled Natural Images
- 03/30/2023 · Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime
- 09/04/2021 · Improving Joint Learning of Chest X-Ray and Radiology Report by Word Region Alignment
- 01/05/2023 · MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training
- 01/05/2023 · Event Camera Data Pre-training
- 04/11/2023 · ELVIS: Empowering Locality of Vision Language Pre-training with Intra-modal Similarity
