PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering

05/17/2023
by Xiaoman Zhang, et al.

In this paper, we focus on the problem of Medical Visual Question Answering (MedVQA), which is crucial for efficiently interpreting medical images that carry vital, clinically relevant information. First, we reframe MedVQA as a generation task that naturally follows human-machine interaction, and we propose a generative model for medical visual understanding that aligns visual information from a pre-trained vision encoder with a large language model. Second, we establish a scalable pipeline to construct a large-scale medical visual question-answering dataset, named PMC-VQA, which contains 227k VQA pairs over 149k images that cover various modalities or diseases. Third, we pre-train our proposed model on PMC-VQA and then fine-tune it on multiple public benchmarks, e.g., VQA-RAD and SLAKE, outperforming existing work by a large margin. Additionally, we propose a manually verified test set that is significantly more challenging; even the best models struggle to solve it.


