Language models are good pathologists: using attention-based sequence reduction and text-pretrained transformers for efficient WSI classification

11/14/2022
by Juan I. Pisula, et al.

In digital pathology, Whole Slide Image (WSI) analysis is usually formulated as a Multiple Instance Learning (MIL) problem. Although transformer-based architectures have been used for WSI classification, these methods require modifications to adapt them to the specific challenges of this type of image data. Despite their power across domains, reference transformer models from classical Computer Vision (CV) and Natural Language Processing (NLP) tasks are not used for pathology slide analysis. In this work, we demonstrate the use of standard, frozen, text-pretrained transformer language models for WSI classification. We propose SeqShort, a multi-head attention-based sequence reduction input layer that summarizes each WSI as a short, fixed-size sequence of instances. This allows us to reduce the computational cost of self-attention over long sequences and to include positional information that is unavailable in other MIL approaches. We demonstrate the effectiveness of our method on the task of cancer subtype classification, without the need to design a WSI-specific transformer or perform in-domain self-supervised pretraining, while keeping a reduced compute budget and number of trainable parameters.
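To make the idea of attention-based sequence reduction concrete, below is a minimal, hypothetical PyTorch sketch. The class name, dimensions, and the use of learned query tokens performing cross-attention over the patch embeddings are assumptions for illustration; this is not the authors' exact SeqShort implementation.

```python
# Minimal sketch (assumption, not the paper's code): reduce a variable-length bag of
# WSI patch embeddings to a short, fixed-size sequence via multi-head cross-attention.
import torch
import torch.nn as nn

class AttentionSequenceReduction(nn.Module):
    """Summarize a long sequence of instance embeddings into `out_len` tokens."""

    def __init__(self, embed_dim: int = 768, num_heads: int = 8, out_len: int = 128):
        super().__init__()
        # Learned query tokens define the length of the summarized sequence.
        self.queries = nn.Parameter(torch.randn(out_len, embed_dim) * 0.02)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

    def forward(self, patch_embeddings: torch.Tensor) -> torch.Tensor:
        # patch_embeddings: (batch, n_patches, embed_dim); n_patches may be tens of thousands.
        batch_size = patch_embeddings.size(0)
        q = self.queries.unsqueeze(0).expand(batch_size, -1, -1)
        # Cross-attention: each learned query attends over all patch embeddings of the slide.
        summary, _ = self.attn(q, patch_embeddings, patch_embeddings)
        return summary  # (batch, out_len, embed_dim): short sequence for a frozen language model


if __name__ == "__main__":
    reducer = AttentionSequenceReduction(embed_dim=768, num_heads=8, out_len=128)
    wsi_patches = torch.randn(1, 20000, 768)  # e.g. 20k patch features from one slide
    short_seq = reducer(wsi_patches)
    print(short_seq.shape)  # torch.Size([1, 128, 768])
```

In this sketch, the reduced sequence (with positional embeddings added) could then be passed to a frozen, text-pretrained transformer followed by a small classification head, keeping the number of trainable parameters limited to the reduction layer and the head.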


