UNetFormer: A Unified Vision Transformer Model and Pre-Training Framework for 3D Medical Image Segmentation

04/01/2022
by Ali Hatamizadeh, et al.

Vision Transformers (ViTs) have recently become popular due to their outstanding modeling capabilities, in particular for capturing long-range information, and their scalability to dataset and model sizes, which has led to state-of-the-art performance in various computer vision and medical image analysis tasks. In this work, we introduce a unified framework, dubbed UNetFormer, consisting of two architectures with a 3D Swin Transformer-based encoder and either a Convolutional Neural Network (CNN)-based or transformer-based decoder. In the proposed model, the encoder is linked to the decoder via skip connections at five different resolutions with deep supervision. The design of the proposed architecture allows for meeting a wide range of trade-off requirements between accuracy and computational cost. In addition, we present a methodology for self-supervised pre-training of the encoder backbone by learning to predict randomly masked volumetric tokens from the contextual information of visible tokens. We pre-train our framework on a cohort of 5,050 CT images gathered from publicly available CT datasets, and present a systematic investigation of components, such as masking ratio and patch size, that affect representation learning capability and downstream task performance. We validate the effectiveness of our pre-training approach by fine-tuning and testing our model on the liver and liver tumor segmentation task of the Medical Segmentation Decathlon (MSD) dataset, achieving state-of-the-art performance in terms of various segmentation metrics. To demonstrate its generalizability, we train and test the model on the BraTS 21 dataset for brain tumor segmentation using MRI images and outperform other methods in terms of Dice score. Code: https://github.com/Project-MONAI/research-contributions
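To make the pre-training objective concrete, the sketch below illustrates the masking step: a 3D volume is split into non-overlapping volumetric patches ("tokens"), a random fraction of them is hidden, and the hidden tokens become the reconstruction targets while the visible ones provide context. This is a minimal, hypothetical illustration; the function name, patch size, and masking ratio are our assumptions, not the authors' implementation (see the linked repository for the actual code).

```python
# Minimal sketch of masked volumetric token prediction (illustrative only).
import torch

def mask_volumetric_tokens(volume, patch_size=16, mask_ratio=0.6):
    """Split a 3D volume into non-overlapping volumetric tokens and
    randomly mask a fraction of them.

    volume: tensor of shape (B, C, D, H, W); D, H, W must be divisible
            by patch_size.
    Returns visible tokens, masked-token targets, and their indices.
    """
    b, c, d, h, w = volume.shape
    p = patch_size
    # (B, C, D/p, p, H/p, p, W/p, p) -> (B, num_tokens, C * p^3)
    tokens = (
        volume.reshape(b, c, d // p, p, h // p, p, w // p, p)
        .permute(0, 2, 4, 6, 1, 3, 5, 7)
        .reshape(b, (d // p) * (h // p) * (w // p), c * p ** 3)
    )
    num_tokens = tokens.shape[1]
    num_masked = int(mask_ratio * num_tokens)

    # Per-sample random permutation of token indices.
    noise = torch.rand(b, num_tokens, device=volume.device)
    shuffle = noise.argsort(dim=1)
    masked_idx, visible_idx = shuffle[:, :num_masked], shuffle[:, num_masked:]

    def gather(idx):
        return torch.gather(
            tokens, 1, idx.unsqueeze(-1).expand(-1, -1, tokens.shape[-1])
        )

    return gather(visible_idx), gather(masked_idx), masked_idx, visible_idx

# Usage: a single-channel 96^3 CT crop with a 60% masking ratio.
ct = torch.randn(2, 1, 96, 96, 96)
visible, targets, masked_idx, visible_idx = mask_volumetric_tokens(ct)
print(visible.shape, targets.shape)  # torch.Size([2, 87, 4096]) torch.Size([2, 129, 4096])
```

In a pre-training loop of this kind, the encoder would process only the visible tokens (or visible tokens plus learnable mask tokens, depending on the design) and a lightweight head would regress the masked targets, e.g. with an L1 or L2 reconstruction loss.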

