Perspective - (2023) Volume 15, Issue 3
Received: 01-Mar-2023, Manuscript No. ipaom-23-13644; Editor assigned: 03-Mar-2023, Pre QC No. P-13644; Reviewed: 15-Mar-2023, QC No. Q-13644; Revised: 20-Mar-2023, Manuscript No. R-13644; Published: 27-Mar-2023
The degree of neurologic disability is an important outcome in many research studies based on electronic health records (EHR). Although structured EHR data can be extracted automatically, the semi-structured or unstructured clinical notes kept by physicians, physical therapists, occupational therapists, and other healthcare professionals are usually reviewed manually to determine neurologic outcomes. Most EHR-based studies of neurologic outcomes are therefore limited in their breadth by the laborious nature of chart review [1].
Clinical natural language processing (NLP) research aims to develop automated approaches to EHR data extraction, and the use of NLP in medical research is expanding rapidly. EHR data have been used to detect adverse medical events and adverse drug reactions, to support drug safety surveillance, and to identify colorectal cancer. Information extracted from cancer pathology reports can also be used to identify medical issues for disease management. Other applications include ICD-9 code assignment, early prediction of diagnostic-related groups and estimation of hospital costs, early prediction of acute kidney injury in the critical care setting, and early prediction of postoperative hospital stay based on operative reports in neurosurgery [2].
Clinical entity recognition and clinical entity relation extraction, temporal relation extraction and temporal matching, semantic representation, de-identification of medical questions and answers, and handling of text ambiguity, such as abbreviation disambiguation and prediction of ambiguous terms, are among the tasks that can be accomplished using NLP and machine learning. In this section, we describe the development of an NLP method that automatically extracts neurologic outcomes from physical therapy notes, occupational therapy notes, and hospital discharge summaries. Multiclass logistic regression models are developed with a one-vs-rest classification scheme, and LASSO regularization is used to reduce dimensionality. Our models assign neurologic outcomes on two widely used scales: the modified Rankin Scale (mRS) and the Glasgow Outcome Scale (GOS) [3].
The models are designed to classify GOS into four categories: good recovery, moderate disability, severe disability, and death; and mRS into seven categories: no symptoms, no significant disability, slight disability, moderate disability, moderately severe disability, severe disability, and death. Because the classes on both scales are imbalanced, the models are developed with balanced class weights. By showing that the models perform with acceptable accuracy, we demonstrate that our NLP algorithm is a useful tool for large-scale EHR-based research on neurologic outcomes.
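The following is a minimal sketch, not the authors' published code, of the modeling setup described above: one-vs-rest multiclass logistic regression with LASSO (L1) regularization and balanced class weights, using scikit-learn. The variable names (X_train, y_train_gos, y_train_mrs) and the solver choice are illustrative assumptions.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Base learner: logistic regression with LASSO (L1) regularization, which
# drives many coefficients to zero, and balanced class weights to compensate
# for the class imbalance on the GOS and mRS scales.
base_model = LogisticRegression(
    penalty="l1",
    solver="liblinear",       # a solver that supports the L1 penalty
    class_weight="balanced",
    max_iter=1000,
)

# One-vs-rest multiclass scheme: one binary classifier per outcome category.
gos_model = OneVsRestClassifier(base_model)
mrs_model = OneVsRestClassifier(base_model)

# Assumed feature matrix and labels from the bag-of-words step described later:
# gos_model.fit(X_train, y_train_gos)
# mrs_model.fit(X_train, y_train_mrs)
```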
To account for situations in which notes are only captured in the system after discharge, all notes were extracted for the period between each patient's admission and three days after hospital discharge. For each patient, we selected the physical therapy and occupational therapy records whose dates were closest to the discharge date, together with the discharge summary, and these notes were then merged into a single document for analysis. In this part, we also discuss the various kinds of reports that can be found in physical therapy and occupational therapy notes [4].
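A hedged sketch of this note-selection step is shown below. It assumes the notes are available in a pandas DataFrame with hypothetical columns note_type, note_date, and text for one admission; the actual extraction pipeline and schema are not described in the source.

```python
import pandas as pd

def select_notes(notes: pd.DataFrame, admit_date, discharge_date) -> str:
    """Keep notes from admission through 3 days after discharge, then merge the
    discharge summary with the PT and OT notes dated closest to discharge."""
    window = notes[
        (notes["note_date"] >= admit_date)
        & (notes["note_date"] <= discharge_date + pd.Timedelta(days=3))
    ]
    parts = []
    summary = window[window["note_type"] == "discharge_summary"]
    if not summary.empty:
        parts.append(summary.iloc[0]["text"])
    for kind in ("physical_therapy", "occupational_therapy"):
        subset = window[window["note_type"] == kind]
        if not subset.empty:
            # note whose date is closest to the discharge date
            closest_idx = (subset["note_date"] - discharge_date).abs().idxmin()
            parts.append(subset.loc[closest_idx, "text"])
    return "\n".join(parts)  # one merged document per admission
```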
The notes were lowercased, and punctuation, visit dates, birth dates, special characters, empty spaces, and numerical digits were removed. Similar to a previous study, we produced reduced versions of the discharge summaries by applying additional preprocessing to extract meaningful information from these lengthy reports; the occupational and physical therapy notes for each patient were then combined with the reduced discharge summaries.
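Below is a minimal sketch of this cleaning step, assuming simple regular-expression rules; the exact patterns and date formats used in the study are not reported, so the regexes here are illustrative.

```python
import re

def clean_note(text: str) -> str:
    """Lowercase a note and strip dates, digits, punctuation, and extra spaces."""
    text = text.lower()
    # Remove dates such as 01/03/2023 or 2023-03-01 (visit and birth dates).
    text = re.sub(r"\b\d{1,4}[/-]\d{1,2}[/-]\d{1,4}\b", " ", text)
    text = re.sub(r"\d+", " ", text)          # remaining numerical digits
    text = re.sub(r"[^a-z\s]", " ", text)     # punctuation and special characters
    return re.sub(r"\s+", " ", text).strip()  # collapse empty spaces
```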
The merged notes were then tokenized, and patient names, addresses, names of healthcare facilities and hospital units, and single letters were removed so that only words remained. The notes were then lemmatized using the WordNetLemmatizer from the NLTK library in Python, with the part-of-speech tag specified as verb; in other words, the various forms of the same word were reduced to a common root, or "lemma." Finally, spelling correction and abbreviation expansion were applied to a short list of frequently used clinical terms, as shown in the Supplementary Material [5].
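The sketch below illustrates the tokenization and lemmatization step with NLTK, as described above. The abbreviation map is a made-up placeholder for the list in the Supplementary Material, and the function name normalize_tokens is ours.

```python
import nltk
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

# nltk.download("punkt"); nltk.download("wordnet")  # one-time corpus downloads

lemmatizer = WordNetLemmatizer()
ABBREVIATIONS = {"amb": "ambulate", "cont": "continue"}  # illustrative entries only

def normalize_tokens(text: str) -> list[str]:
    tokens = word_tokenize(text)
    tokens = [ABBREVIATIONS.get(t, t) for t in tokens]  # expand known abbreviations
    tokens = [t for t in tokens if len(t) > 1]           # drop single letters
    # Lemmatize with the part-of-speech tag specified as verb.
    return [lemmatizer.lemmatize(t, pos="v") for t in tokens]
```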
The pre-processed merged notes were divided into train and test sets, and the training vocabulary was built from notes in the training set. Using the CountVectorizer class from Python's scikit-learn library, a bag-of-words (BoW) model represented each patient's notes as a binary vector indicating the presence of particular n-grams (single words or sequences of two or three words), disregarding grammar and word order. Dimensionality was reduced by considering only n-grams present in at least 10% of the notes in the training set, and multi-class logistic regression with the least absolute shrinkage and selection operator (LASSO) (Tibshirani, 1996) further sparsified the model. Notes from the test set were transformed into feature vectors using the same method. Note that the feature extraction procedure was based solely on the training data set.
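A hedged sketch of this feature-extraction step, using scikit-learn's CountVectorizer with a binary bag-of-words over 1- to 3-grams and a 10% document-frequency floor; the placeholder notes and labels, the split ratio, and the random seed are illustrative assumptions.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# Placeholder pre-processed merged notes and outcome labels, for illustration only.
merged_notes = [
    "patient ambulate with walker",
    "patient be alert and orient",
    "patient require maximal assist",
    "patient tolerate therapy well",
]
labels = [1, 0, 3, 0]

notes_train, notes_test, y_train, y_test = train_test_split(
    merged_notes, labels, test_size=0.25, random_state=0
)

vectorizer = CountVectorizer(
    binary=True,         # presence/absence of an n-gram, not its count
    ngram_range=(1, 3),  # single words plus two- and three-word sequences
    min_df=0.10,         # keep n-grams seen in at least 10% of training notes
)
X_train = vectorizer.fit_transform(notes_train)  # vocabulary from training set only
X_test = vectorizer.transform(notes_test)        # test notes mapped to the same features
```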