X-TRA: Improving Chest X-ray Tasks with Cross-Modal Retrieval Augmentation

02/22/2023
by Tom van Sonsbeek, et al.

An important component of human analysis of medical images and their context is the ability to relate newly seen things to related instances in our memory. In this paper, we mimic this ability by using multi-modal retrieval augmentation and apply it to several tasks in chest X-ray analysis. By retrieving similar images and/or radiology reports, we expand and regularize the case at hand with additional knowledge, while maintaining factual knowledge consistency. The method consists of two components. First, vision and language modalities are aligned using a pre-trained CLIP model. To ensure that retrieval focuses on detailed disease-related content rather than global visual appearance, the model is fine-tuned using disease class information. Subsequently, we construct a non-parametric retrieval index, which reaches state-of-the-art retrieval performance. We use this index in our downstream tasks to augment image representations through multi-head attention for disease classification and report retrieval. We show that retrieval augmentation gives considerable improvements on these tasks. Our downstream report retrieval is even competitive with dedicated report generation methods, paving the way for this method in medical imaging.
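The retrieve-then-fuse idea described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the embeddings are random stand-ins for CLIP features, the index is a brute-force cosine-similarity search, and the function names `retrieve_top_k` and `multi_head_fuse` are hypothetical. The attention step also omits the learned projections a trained model would use; each head simply attends over a slice of the embedding.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def retrieve_top_k(query, index, k):
    """Return the indices of the k index rows most cosine-similar to the query.

    Brute-force search over all rows (hypothetical stand-in for a retrieval index).
    """
    sims = l2_normalize(index) @ l2_normalize(query)
    return np.argsort(-sims)[:k]

def multi_head_fuse(query, retrieved, num_heads):
    """Augment a query embedding with retrieved embeddings via multi-head attention.

    Assumption: no learned projections; each head uses its own slice of the
    embedding dimension. The attended context is added back as a residual.
    """
    d = query.shape[0]
    assert d % num_heads == 0, "embedding size must be divisible by num_heads"
    head_dim = d // num_heads
    q = query.reshape(num_heads, head_dim)             # (H, hd)
    kv = retrieved.reshape(-1, num_heads, head_dim)    # (K, H, hd)
    scores = np.einsum("hd,khd->hk", q, kv) / np.sqrt(head_dim)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)  # softmax over K
    context = np.einsum("hk,khd->hd", weights, kv).reshape(d)
    return query + context

# Toy usage with random stand-in embeddings (not real CLIP features).
rng = np.random.default_rng(0)
index = rng.normal(size=(100, 16))                  # 100 stored embeddings
query = index[7] + 0.01 * rng.normal(size=16)       # near-duplicate of entry 7
top = retrieve_top_k(query, index, k=4)
augmented = multi_head_fuse(query, index[top], num_heads=4)
```

In this toy setup the augmented vector would be passed to a downstream classifier or used as a query for report retrieval; the residual connection keeps the original image representation intact when the retrieved neighbours add little.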

research
05/24/2021

Multi-modal Understanding and Generation for Medical Images and Text via Vision-Language Pre-Training

Recently a number of studies demonstrated impressive performance on dive...
research
05/13/2023

Multi-task Paired Masking with Alignment Modeling for Medical Vision-Language Pre-training

In recent years, the growing demand for medical imaging diagnosis has pl...
research
03/21/2023

LIMITR: Leveraging Local Information for Medical Image-Text Representation

Medical imaging analysis plays a critical role in the diagnosis and trea...
research
04/16/2021

Cross-Modal Retrieval Augmentation for Multi-Modal Classification

Recent advances in using retrieval components over external knowledge so...
research
12/30/2021

Radiology Report Generation with a Learned Knowledge Base and Multi-modal Alignment

In clinics, a radiology report is crucial for guiding a patient's treatm...
research
11/21/2018

Unsupervised Multimodal Representation Learning across Medical Images and Reports

Joint embeddings between medical imaging modalities and associated radio...
research
08/10/2021

BERTHop: An Effective Vision-and-Language Model for Chest X-ray Disease Diagnosis

Vision-and-language (V&L) models take image and text as input and learn...
