Abstract
Background: Over the last decade, natural language processing (NLP) has provided various solutions for information extraction (IE) from textual clinical data. In recent years, the use of NLP in cancer research has gained considerable attention, with numerous studies exploring the effectiveness of NLP techniques for identifying and extracting cancer-related entities from clinical text.
Objective: We aimed to summarize the performance differences between NLP models for IE within the context of cancer to provide an overview of the relative performance of existing models.
Methods: This systematic literature review searched 3 databases (PubMed, Scopus, and Web of Science) for articles extracting cancer-related entities from clinical texts. In total, 33 articles were eligible for inclusion. We extracted the NLP models and their performance as F1-scores. Each model was assigned to one of the following categories: rule-based, traditional machine learning, conditional random field-based, neural network, and bidirectional transformer (BT). The average performance difference for each pair of categories was calculated across all articles.
Results: The articles covered various scenarios, with the best performance in each article ranging from 0.355 to 0.985 in F1-score. In overall relative performance, the BT category outperformed every other category (average F1-score difference between 0.0439 and 0.2335). The percentage of articles implementing BTs has increased over the years.
Conclusions: NLP has demonstrated the ability to identify and extract cancer-related entities from unstructured textual data. Generally, more advanced models outperform less advanced ones. The BT category performed the best.
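The pairwise comparison described in the Methods can be illustrated with a minimal Python sketch. The abstract does not specify how several models of the same category within one article were handled, so the sketch assumes the best F1-score per category per article is used; the function name and the records are hypothetical and are not data from the reviewed articles.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean

# Hypothetical records: (article_id, category, F1-score). Values are made up
# for illustration and are not taken from the reviewed articles.
records = [
    ("A", "rule-based", 0.70),
    ("A", "BT", 0.90),
    ("B", "CRF", 0.80),
    ("B", "BT", 0.85),
    ("B", "rule-based", 0.60),
]


def average_pairwise_differences(records):
    # Assumption: keep the best F1-score per category within each article.
    best = defaultdict(dict)
    for article, category, f1 in records:
        best[article][category] = max(f1, best[article].get(category, 0.0))

    # Collect within-article differences for every pair of categories.
    diffs = defaultdict(list)
    for scores in best.values():
        for cat_a, cat_b in combinations(sorted(scores), 2):
            diffs[(cat_a, cat_b)].append(scores[cat_a] - scores[cat_b])

    # Average each pair's differences across the articles that report both.
    return {pair: mean(values) for pair, values in diffs.items()}


result = average_pairwise_differences(records)
for (cat_a, cat_b), diff in result.items():
    print(f"{cat_a} vs {cat_b}: {diff:+.4f}")
```

A positive value means the first category in the pair scored higher on average. Only articles that report both categories contribute to a given pair, so the averages for different pairs can rest on different subsets of articles.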
| Original language | English |
|---|---|
| Article number | e68707 |
| Journal | JMIR Medical Informatics |
| Volume | 13 |
| State | Published - 2025 |
| Externally published | Yes |
Bibliographical note
Publisher Copyright: © Simon Dahl, Martin Bøgsted, Tomer Sagi, Charles Vesteghem.
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 3 Good Health and Well-being
Keywords
- F-score
- bidirectional transformer
- clinical textual data
- information extraction
- natural language processing
- neural network
- performance
- review
- rule-based solutions
- traditional machine learning
ASJC Scopus subject areas
- Health Informatics
- Health Information Management
Fingerprint
Dive into the research topics of 'Performance of Natural Language Processing for Information Extraction From Electronic Health Records Within Cancer: Systematic Review'. Together they form a unique fingerprint.