HVTSurv: Hierarchical Vision Transformer for Patient-Level Survival Prediction from Whole Slide Image

by Zhuchen Shao, et al.

Survival prediction based on whole slide images (WSIs) is a challenging task for patient-level multiple instance learning (MIL). Due to the vast amount of data per patient (one or more gigapixel WSIs) and the irregular shape of WSIs, it is difficult to fully explore spatial, contextual, and hierarchical interactions in the patient-level bag. Many studies adopt a random-sampling pre-processing strategy and WSI-level aggregation models, which inevitably lose critical prognostic information in the patient-level bag. In this work, we propose a hierarchical vision Transformer framework named HVTSurv, which encodes local-level relative spatial information, strengthens WSI-level context-aware communication, and establishes patient-level hierarchical interaction. First, we design a feature pre-processing strategy consisting of feature rearrangement and random window masking. Then, we devise three layers to progressively obtain the patient-level representation: a local-level interaction layer based on Manhattan distance, a WSI-level interaction layer employing spatial shuffle, and a patient-level interaction layer using attention pooling. Moreover, the hierarchical network design makes the model more computationally efficient. Finally, we validate HVTSurv on 3,104 patients and 3,752 WSIs across 6 cancer types from The Cancer Genome Atlas (TCGA). The average C-Index is 2.50-11.30% higher than that of prior weakly supervised methods over the 6 TCGA datasets. An ablation study and attention visualization further verify the superiority of the proposed HVTSurv. The implementation is available at: https://github.com/szc19990412/HVTSurv.
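The final patient-level interaction layer aggregates WSI-level features with attention pooling. A minimal NumPy sketch of attention-based MIL pooling, in the style popularized for MIL aggregation, is shown below; the function name, matrix shapes, and parameters (`V`, `w`, hidden size) are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def attention_pooling(features, V, w):
    """Attention-based MIL pooling (illustrative sketch, not HVTSurv's exact code).

    features: (N, D) array of instance embeddings (e.g., WSI-level features).
    V: (D, H) projection matrix; w: (H,) scoring vector. Both hypothetical.
    Returns the (D,) pooled bag representation and the (N,) attention weights.
    """
    scores = np.tanh(features @ V) @ w           # (N,) unnormalized attention scores
    scores = scores - scores.max()               # subtract max for numerical stability
    attn = np.exp(scores) / np.exp(scores).sum() # softmax over instances
    return attn @ features, attn                 # weighted sum over instances

# Toy usage: 8 instances with 16-dim features, hidden size 32.
rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 16))
V = rng.normal(size=(16, 32))
w = rng.normal(size=32)
pooled, attn = attention_pooling(feats, V, w)
```

Because the attention weights are a softmax over instances, they sum to one and can be read directly as per-instance importance, which is what enables the attention visualizations mentioned in the abstract.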




