Visio-Linguistic Brain Encoding

04/18/2022
by Subba Reddy Oota, et al.

Enabling effective brain-computer interfaces requires understanding how the human brain encodes stimuli across modalities such as vision and language (text). Brain encoding aims to predict fMRI brain activity given a stimulus. A plethora of neural encoding models study brain encoding for single-mode stimuli: visual (pretrained CNNs) or text (pretrained language models). A few recent papers have also obtained separate visual and text representation models and performed late fusion using simple heuristics. However, previous work has failed to explore (a) the effectiveness of image Transformer models for encoding visual stimuli, and (b) co-attentive multi-modal modeling for visual and text reasoning. In this paper, we systematically explore the efficacy of image Transformers (ViT, DEiT, and BEiT) and multi-modal Transformers (VisualBERT, LXMERT, and CLIP) for brain encoding. Extensive experiments on two popular datasets, BOLD5000 and Pereira, provide the following insights. (1) To the best of our knowledge, we are the first to investigate the effectiveness of image and multi-modal Transformers for brain encoding. (2) We find that VisualBERT, a multi-modal Transformer, significantly outperforms previously proposed single-mode CNNs and image Transformers, as well as previously proposed multi-modal models, thereby establishing a new state of the art. The superiority of visio-linguistic models raises the question of whether the responses elicited in visual brain regions are implicitly affected by linguistic processing even when subjects passively view images. Future fMRI experiments can verify this computational insight in an appropriate experimental setting.
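The encoding setup described above is, in this literature, typically a linear map from pretrained-model features to voxel-wise fMRI responses. The sketch below illustrates that standard pipeline with cross-validated ridge regression; the CLIP checkpoint name, array shapes, and train/test split are illustrative assumptions rather than the paper's exact configuration.

```python
# Minimal sketch of a linear brain-encoding pipeline: features from a
# pretrained multi-modal model (CLIP here) are regressed onto fMRI voxel
# responses. The checkpoint, shapes, and split are assumptions for
# illustration, not the authors' exact setup.
import numpy as np
import torch
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split
from transformers import CLIPModel, CLIPProcessor

def extract_image_features(images):
    """Embed a list of PIL images; returns an (n_stimuli, feature_dim) array."""
    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    return feats.numpy()

def fit_encoding_model(features, voxels):
    """Ridge-regress stimulus features onto voxels; return voxel-wise test r."""
    X_tr, X_te, Y_tr, Y_te = train_test_split(
        features, voxels, test_size=0.2, random_state=0
    )
    model = RidgeCV(alphas=np.logspace(-2, 4, 7))  # cross-validated alpha
    model.fit(X_tr, Y_tr)
    Y_hat = model.predict(X_te)
    # Pearson correlation per voxel between predicted and observed responses
    r = np.array([
        np.corrcoef(Y_hat[:, v], Y_te[:, v])[0, 1] for v in range(Y_te.shape[1])
    ])
    return model, r
```

Voxel-wise Pearson correlation on held-out stimuli (often reported alongside 2V2 accuracy) is the standard yardstick for comparing encoders such as the CNN, image-Transformer, and multi-modal features studied here.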

Related research

05/03/2022  Neural Language Taskonomy: Which NLP Tasks are the most Predictive of fMRI Brain Activity?
    Several popular Transformer based language models have been found to be ...

02/25/2023  BrainCLIP: Bridging Brain and Visual-Linguistic Representation via CLIP for Generic Natural Visual Stimulus Decoding from fMRI
    Reconstructing perceived natural images or decoding their categories fro...

05/20/2023  Brain encoding models based on multimodal transformers can transfer across language and vision
    Encoding models have been used to assess how the human brain represents ...

04/04/2019  Robust Evaluation of Language-Brain Encoding Experiments
    Language-brain encoding experiments evaluate the ability of language mod...

12/11/2021  Multimodal neural networks better explain multivoxel patterns in the hippocampus
    The human hippocampus possesses "concept cells", neurons that fire when ...

08/02/2023  Memory Encoding Model
    We explore a new class of brain encoding model by adding memory-related ...

05/24/2022  Highly Accurate FMRI ADHD Classification using time distributed multi modal 3D CNNs
    This work proposes an algorithm for fMRI data analysis for the classific...
