Cautionary lessons from real-world testing of GPT-4.1 AI for pediatric foreign body aspiration

  • Sholem Hack
  • Rebecca Attal
  • Dana Elazar
  • Yaniv Alon
  • Raphael Meyuchas
  • Adva Livne
  • Ory Madgar
  • Mor Saban

Research output: Contribution to journal › Article › peer-review

Abstract

Purpose: To evaluate the feasibility and diagnostic performance of a multimodal large language model (GPT-4.1) in detecting pediatric airway foreign body aspiration (FBA) using real-world clinical and radiographic data.

Methods: This retrospective cohort study included 58 pediatric patients evaluated for suspected airway FBA at a tertiary academic hospital between 2015 and 2024. Each case combined structured clinical data and chest radiographs obtained at the time of emergency-department presentation, with bronchoscopy serving as the diagnostic reference standard. GPT-4.1, a vision-enabled large language model, classified cases as right-bronchus aspiration, left-bronchus aspiration, or no aspiration. Model performance was assessed using accuracy, precision, recall, and F1-score.

Results: The model achieved an overall accuracy of 62.3%, with precision of 23.3%, recall of 19.0%, and an F1-score of 0.21. While it correctly identified 34 of 46 cases without aspiration, it detected only 4 of 12 confirmed bronchial-aspiration cases and missed all left-bronchus aspirations.

Conclusions: This proof-of-concept feasibility study highlights substantial limitations of a general-purpose multimodal AI model in pediatric airway triage. The low recall and high misclassification rates suggest that vision-enabled language models require task-specific training and rigorous validation before clinical implementation. Nevertheless, when used as an adjunct rather than a replacement for bronchoscopy, such models may eventually support triage decisions in resource-limited settings if further optimized and prospectively validated.
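
As a quick consistency check on the reported metrics, the F1-score is the harmonic mean of precision and recall. The sketch below is a minimal illustration, not the authors' evaluation code: the helper name `f1_score` is hypothetical, and the abstract does not state how precision and recall were aggregated across the three classes, so the calculation simply takes the two reported summary values as given.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Summary values reported in the abstract, expressed as fractions.
# How they were averaged across classes is not stated (assumption: used as-is).
reported_precision = 0.233
reported_recall = 0.190

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 2))  # 0.21
```

The result agrees with the F1-score of 0.21 given in the Results.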

Original language: English
Journal: European Archives of Oto-Rhino-Laryngology
State: Accepted/In press - 2025
Externally published: Yes

Bibliographical note

Publisher Copyright:
© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2025.

Keywords

  • Bronchoscopy
  • Chest radiograph interpretation
  • Diagnostic accuracy
  • Foreign body aspiration
  • Large language model
  • Pediatric airway

ASJC Scopus subject areas

  • Otorhinolaryngology
