Comparing the Perspectives of Generative AI, Mental Health Experts, and the General Public on Schizophrenia Recovery: Case Vignette Study

Zohar Elyoseph, Inbar Levkovich

Research output: Contribution to journal › Article › peer-review

Abstract

Background: The current paradigm in mental health care focuses on clinical recovery and symptom remission. This model's efficacy is influenced by the therapist's trust in the patient's recovery potential and by the depth of the therapeutic relationship. Schizophrenia is a chronic illness with severe symptoms, and the possibility of recovery is a matter of debate. As artificial intelligence (AI) becomes integrated into health care, it is important to examine its ability to assess recovery potential in major psychiatric disorders such as schizophrenia. Objective: This study aimed to evaluate the ability of large language models (LLMs), in comparison with mental health professionals, to assess the prognosis of schizophrenia with and without professional treatment, as well as its long-term positive and negative outcomes. Methods: Vignettes were input into LLM interfaces and assessed 10 times by each of 4 AI platforms: ChatGPT-3.5, ChatGPT-4, Google Bard, and Claude. The resulting 80 evaluations were benchmarked against published norms describing how mental health professionals (general practitioners, psychiatrists, clinical psychologists, and mental health nurses) and the general public assess schizophrenia prognosis with and without professional treatment, as well as the positive and negative long-term outcomes of schizophrenia interventions. Results: For the prognosis of schizophrenia with professional treatment, ChatGPT-3.5 was notably pessimistic, whereas ChatGPT-4, Claude, and Bard aligned with professional views but differed from the general public. All LLMs predicted that schizophrenia would remain static or worsen without professional treatment. For negative long-term outcomes, ChatGPT-4 and Claude predicted more negative outcomes than Bard and ChatGPT-3.5. For positive outcomes, ChatGPT-3.5 and Claude were more pessimistic than Bard and ChatGPT-4.
Conclusions: The finding that 3 of the 4 LLMs aligned closely with the predictions of mental health professionals in the "with treatment" condition demonstrates the potential of this technology to provide professional clinical prognosis. The pessimistic assessment of ChatGPT-3.5 is a concerning finding, as it may reduce the motivation of patients to start or persist with treatment for schizophrenia. Overall, although LLMs hold promise in augmenting health care, their application necessitates rigorous validation and a harmonious blend with human expertise.

Original language: English
Article number: e53043
Journal: JMIR Mental Health
Volume: 11
Issue number: 1
DOIs
State: Published - 18 Mar 2024
Externally published: Yes

Bibliographical note

Publisher Copyright:
© Zohar Elyoseph, Inbar Levkovich.

Keywords

  • artificial intelligence
  • ChatGPT
  • Generative Pre-trained Transformers
  • GPT
  • language model
  • language models
  • large language models
  • LLM
  • LLMs
  • mental
  • natural language processing
  • NLP
  • outcome
  • outcomes
  • prognosis
  • prognostic
  • prognostics
  • recovery
  • schizophrenia
  • vignette
  • vignettes

ASJC Scopus subject areas

  • Psychiatry and Mental health
