Automatic Personalized Impression Generation for PET Reports Using Large Language Models

by Xin Tie et al.

Purpose: To determine whether fine-tuned large language models (LLMs) can generate accurate, personalized impressions for whole-body PET reports.

Materials and Methods: Twelve language models were trained on a corpus of PET reports using the teacher-forcing algorithm, with the report findings as input and the clinical impressions as reference. An extra input token encoded the reading physician's identity, allowing the models to learn physician-specific reporting styles. The corpus comprised 37,370 retrospective PET reports collected at our institution between 2010 and 2022. To identify the best LLM, 30 evaluation metrics were benchmarked against quality scores from two nuclear medicine (NM) physicians, and the most aligned metrics were used to select the model for expert evaluation. In a subset of the data, model-generated impressions and original clinical impressions were assessed by three NM physicians across 6 quality dimensions and an overall utility score (5-point scale). Each physician reviewed 12 of their own reports and 12 reports from other physicians. Bootstrap resampling was used for statistical analysis.

Results: Of all evaluation metrics, domain-adapted BARTScore and PEGASUSScore showed the highest Spearman's rho correlations with physician preferences (0.568 and 0.563, respectively). Based on these metrics, the fine-tuned PEGASUS model was selected as the top LLM. When physicians reviewed PEGASUS-generated impressions in their own style, 89% were rated clinically acceptable, with a mean overall utility score of 4.08/5. Physicians rated these personalized impressions as comparable in overall utility to impressions dictated by other physicians (4.03, P=0.41).

Conclusion: Personalized impressions generated by PEGASUS were clinically useful, highlighting its potential to expedite PET reporting.
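The personalization step described above can be illustrated with a minimal sketch. Assuming a sequence-to-sequence setup, the reading physician's identity is encoded as an extra special token prepended to the findings text, while the original clinical impression serves as the teacher-forcing target. The token format and helper name below are hypothetical, not taken from the paper.

```python
def build_model_input(findings: str, physician_id: int) -> str:
    """Prepend a physician-identity token (hypothetical naming scheme)
    so a seq2seq model can learn physician-specific reporting styles."""
    return f"[PHYSICIAN_{physician_id}] {findings.strip()}"

# During fine-tuning with teacher forcing, the findings (plus identity
# token) form the encoder input and the dictated clinical impression is
# the decoder target; illustrative example with placeholder text:
example = {
    "input": build_model_input("Focal FDG uptake in the right hilum.", physician_id=3),
    "target": "1. Findings compatible with nodal disease.",
}
print(example["input"])
# -> [PHYSICIAN_3] Focal FDG uptake in the right hilum.
```

In practice the `[PHYSICIAN_k]` tokens would be registered as special tokens in the tokenizer so they are not split into subwords, which lets the model associate a single embedding with each reporting style.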


Related papers:

- Domain-adapted large language models for classifying nuclear medicine reports
- This is not correct! Negation-aware Evaluation of Language Generation Systems
- Artificial Interrogation for Attributing Language Models
- Assessing the efficacy of large language models in generating accurate teacher responses
- A Survey of Spanish Clinical Language Models
- Local Large Language Models for Complex Structured Medical Tasks
- Finding Stakeholder-Material Information from 10-K Reports using Fine-Tuned BERT and LSTM Models
