Abstract
Purpose: To evaluate the feasibility and diagnostic performance of a multimodal large language model (GPT-4.1) in detecting pediatric airway foreign body aspiration (FBA) using real-world clinical and radiographic data.
Methods: This retrospective cohort study included 58 pediatric patients evaluated for suspected airway FBA at a tertiary academic hospital between 2015 and 2024. Each case combined structured clinical data with chest radiographs obtained at emergency-department presentation. Bronchoscopy served as the diagnostic reference standard. GPT-4.1, a vision-enabled large language model, classified each case as right-bronchus aspiration, left-bronchus aspiration, or no aspiration. Model performance was assessed using accuracy, precision, recall, and F1-score.
Results: The model achieved an overall accuracy of 62.3%, with a precision of 23.3%, a recall of 19.0%, and an F1-score of 0.21. It correctly identified 34 of 46 cases without aspiration. However, it detected only 4 of 12 confirmed bronchial-aspiration cases and missed all left-bronchus aspirations.
Conclusions: This proof-of-concept feasibility study highlights substantial limitations of a general-purpose multimodal AI model in pediatric airway triage. The low recall and high misclassification rates suggest that vision-enabled language models require task-specific training and rigorous validation before clinical implementation. Nevertheless, if further optimized and prospectively validated, such models may eventually support triage decisions in resource-limited settings when used as an adjunct to bronchoscopy rather than a replacement for it.
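The F1-score is the harmonic mean of precision and recall, so the reported F1 of 0.21 can be checked directly from the reported precision (23.3%) and recall (19.0%). The sketch below does that check and also computes simple detection proportions from the counts given in the Results. The function name `f1_score` is a hypothetical helper written for illustration. The abstract does not state how precision and recall were averaged across the three classes, so the raw proportions below are not expected to equal the reported precision or recall.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, expressed as fractions.
reported_precision = 0.233
reported_recall = 0.190

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.2f}")  # F1 = 0.21

# Simple proportions from the reported counts (not the paper's averaged metrics).
no_aspiration_correct = 34 / 46  # cases without aspiration classified correctly
aspiration_detected = 4 / 12     # bronchoscopy-confirmed aspirations detected
print(f"Correct 'no aspiration' calls: {no_aspiration_correct:.1%}")  # 73.9%
print(f"Confirmed aspirations detected: {aspiration_detected:.1%}")   # 33.3%
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a low recall alone is enough to keep the F1-score low even if precision were higher.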
| Field | Value |
|---|---|
| Original language | English |
| Journal | European Archives of Oto-Rhino-Laryngology |
| DOIs | |
| State | Accepted/In press - 2025 |
| Externally published | Yes |
Bibliographical note
Publisher Copyright: © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2025.
Keywords
- Bronchoscopy
- Chest radiograph interpretation
- Diagnostic accuracy
- Foreign body aspiration
- Large language model
- Pediatric airway
ASJC Scopus subject areas
- Otorhinolaryngology
Fingerprint
Dive into the research topics of 'Cautionary lessons from real-world testing of GPT-4.1 AI for pediatric foreign body aspiration'. Together they form a unique fingerprint.