An evaluation of the capabilities of language models and nurses in providing neonatal clinical decision support

Chedva Levin, Tehilla Kagan, Shani Rosen, Mor Saban

Research output: Contribution to journal › Article › peer-review

Abstract

Aim: To assess the clinical reasoning capabilities of two large language models, ChatGPT-4 and Claude-2.0, compared to those of neonatal nurses during neonatal care scenarios.

Design: A cross-sectional study with a comparative evaluation using a survey instrument that included six neonatal intensive care unit clinical scenarios.

Participants: 32 neonatal intensive care nurses with 5–10 years of experience working in the neonatal intensive care units of three medical centers.

Methods: Participants responded to 6 written clinical scenarios. Simultaneously, we asked ChatGPT-4 and Claude-2.0 to provide initial assessments and treatment recommendations for the same scenarios. The responses from ChatGPT-4 and Claude-2.0 were then scored by certified neonatal nurse practitioners for accuracy, completeness, and response time.

Results: Both models demonstrated capabilities in clinical reasoning for neonatal care, with Claude-2.0 significantly outperforming ChatGPT-4 in clinical accuracy and speed. However, limitations were identified across the cases in diagnostic precision, treatment specificity, and response lag.

Conclusions: While showing promise, current limitations reinforce the need for deep refinement before ChatGPT-4 and Claude-2.0 can be considered for integration into clinical practice. Additional validation of these tools is important to safely leverage this Artificial Intelligence technology for enhancing clinical decision-making.

Impact: The study provides an understanding of the reasoning accuracy of new Artificial Intelligence models in neonatal clinical care. The current accuracy gaps of ChatGPT-4 and Claude-2.0 need to be addressed prior to clinical usage.

Original language: English
Article number: 104771
Journal: International Journal of Nursing Studies
Volume: 155
State: Published - Jul 2024
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2024 Elsevier Ltd

Keywords

  • Artificial Intelligence
  • ChatGPT
  • Claude
  • Clinical reasoning
  • Neonatal care

ASJC Scopus subject areas

  • General Nursing
