MF2-MVQA: A Multi-stage Feature Fusion method for Medical Visual Question Answering

11/11/2022
by   Shanshan Song, et al.

A key problem in medical visual question answering is how to effectively fuse language and medical-image features when datasets are limited. To exploit the multi-scale information in medical images, previous methods directly embed the visual feature maps from each stage as same-sized tokens and fuse them with the text representation all at once. However, this confuses visual features from different stages. To address this, we propose a simple but powerful multi-stage feature fusion method, MF2-MVQA, which fuses multi-level visual features with textual semantics stage by stage. MF2-MVQA achieves state-of-the-art performance on the VQA-Med 2019 and VQA-RAD datasets, and visualization results further verify that our model outperforms previous work.
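The abstract does not spell out the exact fusion operator, so the snippet below is only a minimal sketch of the stage-wise idea it describes: instead of flattening every visual stage into one shared token sequence, each stage's feature map is projected and fused with the question representation in turn. The class names (`StageWiseFusionBlock`, `MultiStageFusionVQA`), the use of cross-attention as the fusion step, and the stage channel sizes are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class StageWiseFusionBlock(nn.Module):
    """Fuses one visual stage into the text representation via cross-attention.
    (Illustrative choice; the paper's actual fusion operator may differ.)"""
    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, text_tokens, visual_tokens):
        # Text queries attend only to the current stage's visual tokens,
        # so features from different stages are never mixed in one sequence.
        attn_out, _ = self.cross_attn(text_tokens, visual_tokens, visual_tokens)
        x = self.norm1(text_tokens + attn_out)
        return self.norm2(x + self.ffn(x))

class MultiStageFusionVQA(nn.Module):
    """Hypothetical stage-wise multi-level fusion: each backbone stage
    (e.g., CNN stages with 256/512/1024/2048 channels) is projected to a
    shared width and fused with the question one stage at a time."""
    def __init__(self, stage_channels=(256, 512, 1024, 2048), dim=768, num_answers=100):
        super().__init__()
        self.proj = nn.ModuleList([nn.Linear(c, dim) for c in stage_channels])
        self.fusion = nn.ModuleList([StageWiseFusionBlock(dim) for _ in stage_channels])
        self.classifier = nn.Linear(dim, num_answers)

    def forward(self, text_tokens, stage_features):
        # text_tokens: (B, L, dim) question embeddings from a text encoder
        # stage_features: list of (B, C_i, H_i, W_i) feature maps, shallow to deep
        x = text_tokens
        for proj, fuse, feat in zip(self.proj, self.fusion, stage_features):
            tokens = feat.flatten(2).transpose(1, 2)  # (B, H_i*W_i, C_i)
            x = fuse(x, proj(tokens))                 # fuse this stage only
        # Predict the answer from the first fused text token.
        return self.classifier(x[:, 0])

# Toy usage with random tensors standing in for encoder outputs.
model = MultiStageFusionVQA()
text = torch.randn(2, 20, 768)
feats = [torch.randn(2, 256, 56, 56), torch.randn(2, 512, 28, 28),
         torch.randn(2, 1024, 14, 14), torch.randn(2, 2048, 7, 7)]
logits = model(text, feats)
print(logits.shape)  # torch.Size([2, 100])
```

The design point of the sketch is simply that fusion happens once per stage rather than over a concatenation of all stages, which is the distinction the abstract draws against prior work.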

