Background The clinical narrative in electronic health records (EHRs) carries valuable information for predictive analytics; however, its free-text form is difficult to mine and analyze for clinical decision support (CDS). Large-scale clinical natural language processing (NLP) pipelines have focused on data warehouse applications for retrospective research efforts. There remains a paucity of evidence for implementing NLP pipelines at the bedside for health care delivery. Objective We aimed to detail a hospital-wide, operational pipeline to implement a real-time NLP-driven CDS tool and describe a protocol for an implementation framework with a user-centered design of the CDS tool. Methods The pipeline integrated a previously trained open-source convolutional neural network model for screening opioid misuse that leveraged EHR notes mapped to standardized medical vocabularies in the Unified Medical Language System. A sample of 100 adult encounters was reviewed by a physician informaticist for silent testing of the deep learning algorithm before deployment. An end user interview survey was developed to examine the user acceptability of a best practice alert (BPA) to provide the screening results with recommendations. The planned implementation also included a human-centered design with user feedback on the BPA, an implementation framework with cost-effectiveness, and a noninferiority patient outcome analysis plan. Results The pipeline was a reproducible workflow with a shared pseudocode for a cloud service to ingest, process, and store clinical notes as Health Level 7 messages from a major EHR vendor in an elastic cloud computing environment. Feature engineering of the notes used an open-source NLP engine, and the features were fed into the deep learning algorithm, with the results returned as a BPA in the EHR.
On-site silent testing of the deep learning algorithm demonstrated a sensitivity of 93% (95% CI 66%-99%) and specificity of 92% (95% CI 84%-96%), similar to published validation studies. Before deployment, approvals were received across hospital committees for inpatient operations. Five interviews were conducted; they informed the development of an educational flyer and further modified the BPA to exclude certain patients and allow the refusal of recommendations. The longest delay in pipeline development was due to cybersecurity approvals, particularly the exchange of protected health information between the Microsoft (Microsoft Corp) and Epic (Epic Systems Corp) cloud vendors. In silent testing, the resultant pipeline delivered a BPA to the bedside within minutes of a provider entering a note in the EHR. Conclusions The components of the real-time NLP pipeline were detailed with open-source tools and pseudocode for other health systems to benchmark. The deployment of medical artificial intelligence systems in routine clinical care presents an important yet unfulfilled opportunity, and our protocol aimed to close the gap in the implementation of artificial intelligence–driven CDS. Trial Registration ClinicalTrials.gov NCT05745480; https://www.clinicaltrials.gov/ct2/show/NCT05745480
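The pipeline above ingests clinical notes as Health Level 7 messages. A minimal sketch of that ingestion step is below, assuming pipe-delimited HL7 v2 segments with the note body carried in OBX-5; the sample message and field layout are illustrative, not taken from the deployed system.

```python
# Minimal sketch: pull free-text note content out of a pipe-delimited
# HL7 v2 message, assuming the note body arrives in OBX-5 segments.
# The sample message and field positions are illustrative only.

def extract_note_text(hl7_message: str) -> str:
    """Concatenate the observation-value field (OBX-5) of every OBX segment."""
    lines = []
    for segment in hl7_message.strip().split("\r"):
        fields = segment.split("|")
        if fields[0] == "OBX" and len(fields) > 5:
            lines.append(fields[5])
    return "\n".join(lines)

sample = "\r".join([
    "MSH|^~\\&|EHR|HOSP|NLP|CDS|202301010830||MDM^T02|123|P|2.5",
    "OBX|1|TX|NOTE^Progress Note||Patient reports heroin use.",
    "OBX|2|TX|NOTE^Progress Note||Plan: addiction medicine consult.",
])
print(extract_note_text(sample))
```

In a real deployment the extracted text would be handed to the NLP engine for feature engineering rather than printed.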
Protecting the integrity of medical records is critical to patients, medical professionals, governments, insurance companies, and hospital systems. Integrity of medical records means protecting medical data against accidental or fraudulent changes. With the adoption of electronic medical records in physician offices and health-care facilities, increasingly mandated or incentivized by federal Acts in the USA, ensuring the integrity of records in an electronic environment has become far more important than in the days of paper-based records. In this research, we propose an innovative Merkle tree-based approach to protecting the integrity of medical records and describe its implementation. The software architecture closely mimics blockchain technology and is designed to be deployed in a private network setting. The salient features of our approach include a simplification of blockchain technology that avoids mining, and the replacement of traditional audit trails with a cryptographically secure counterpart. This paper discusses in detail the design and implementation of the application and prototype testing on a subset of the MIMIC-III database, which comprises de-identified medical records of 40,000 critical-care patients. Experimental results show that the Merkle tree-based approach to storing medical records is robust, protects against various kinds of changes (intentional or accidental), and incurs little overhead compared with other approaches to ensuring integrity.
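The core mechanism described above can be sketched in a few lines: hash each record into a leaf, hash pairs upward to a single root, and detect tampering by comparing roots. This is a generic illustration of the technique, not the paper's implementation; the record contents are made up.

```python
import hashlib

# Sketch of a Merkle-tree integrity check: leaf hashes over individual
# records, pairwise hashing up to a root, and tamper detection by root
# comparison. Record contents below are illustrative only.

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(records: list) -> bytes:
    """Compute the Merkle root; an odd node at any level is paired with itself."""
    level = [sha256(r) for r in records]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

records = [b"patient:40001 hr:72", b"patient:40002 bp:120/80", b"patient:40003 temp:37.1"]
root = merkle_root(records)

# Any change to a single record changes the root, flagging tampering.
tampered = list(records)
tampered[1] = b"patient:40002 bp:150/95"
print(merkle_root(tampered) != root)  # roots differ when a record is altered
```

The overhead is one hash per record plus roughly one hash per internal node, which matches the paper's claim of low cost relative to full record comparison.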
Generative artificial intelligence (AI) is a promising direction for augmenting clinical diagnostic decision support and reducing diagnostic errors, a leading contributor to medical errors. To further the development of clinical AI systems, the Diagnostic Reasoning Benchmark (DR.BENCH) was introduced as a comprehensive generative AI framework comprising six tasks that represent key components of clinical reasoning. We present a comparative analysis of in-domain versus out-of-domain language models, as well as multi-task versus single-task training, with a focus on the problem summarization task in DR.BENCH (Gao et al., 2023). We demonstrate that a multi-task, clinically trained language model outperforms its general-domain counterpart by a large margin, establishing a new state-of-the-art performance with a ROUGE-L score of 28.55. This research underscores the value of domain-specific training for optimizing clinical diagnostic reasoning tasks.
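The ROUGE-L score reported above is an F-measure over the longest common subsequence (LCS) of reference and candidate token sequences. A minimal sketch follows, using naive whitespace tokenization; production scorers add stemming and other preprocessing not shown here.

```python
# Minimal ROUGE-L sketch: F-measure over the longest common subsequence
# (LCS) of reference and candidate tokens. Whitespace tokenization only.

def lcs_length(a, b):
    """Classic dynamic-programming LCS length."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[len(a)][len(b)]

def rouge_l(reference: str, candidate: str, beta: float = 1.0) -> float:
    ref, cand = reference.split(), candidate.split()
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    recall, precision = lcs / len(ref), lcs / len(cand)
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)

# Example problem-summarization pair (made up): LCS covers 3 of 5
# reference tokens with a fully matching candidate.
print(rouge_l("acute on chronic kidney injury", "chronic kidney injury"))
```

With beta = 1 this reduces to the harmonic mean of LCS precision and recall.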
Background Opioid misuse screening in hospitals is resource-intensive and rarely done. Many hospitalized patients are never offered opioid treatment. An automated approach leveraging routinely captured electronic health record (EHR) data may be easier for hospitals to institute. We previously derived and internally validated an opioid misuse classifier in a separate hospital setting. The aim is to externally validate our previously published, open-source machine-learning classifier at a different hospital for identifying cases of opioid misuse. Methods An observational cohort of 56,227 adult hospitalizations was examined between October 2017 and December 2019 during a hospital-wide substance use screening program with manual screening. The manually completed Drug Abuse Screening Test served as the reference standard to validate a convolutional neural network (CNN) classifier with coded word embedding features from the clinical notes of the EHR. The opioid classifier utilized all notes in the EHR, and a sensitivity analysis was also performed on the first 24 h of notes. Calibration was performed to account for the lower prevalence of opioid misuse than in the original derivation cohort. Results Manual screening for substance misuse was completed in 67.8% (n = 56,227), with 1.1% (n = 628) identified with opioid misuse. The data for external validation included 2,482,900 notes with 67,969 unique clinical concept features. The opioid classifier had an AUC of 0.99 (95% CI 0.99–0.99) across the encounter and 0.98 (95% CI 0.98–0.99) using only the first 24 h of notes. In the calibrated classifier, the sensitivity and positive predictive value were 0.81 (95% CI 0.77–0.84) and 0.72 (95% CI 0.68–0.75). For the first 24 h, they were 0.75 (95% CI 0.71–0.78) and 0.61 (95% CI 0.57–0.64). Conclusions Our opioid misuse classifier had good discrimination during external validation.
Our model may provide a comprehensive and automated approach to opioid misuse identification that augments current workflows and overcomes manual screening barriers.
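The recalibration step above matters because positive predictive value (PPV) falls sharply as prevalence drops, even when sensitivity and specificity are unchanged. A minimal sketch of that relationship via Bayes' rule follows; the sensitivity and specificity figures are illustrative, not the study's calibrated values.

```python
# Sketch of how prevalence drives positive predictive value (PPV), the
# reason a classifier needs recalibration at a lower-prevalence site.
# Sensitivity/specificity values below are illustrative only.

def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Bayes' rule: P(true case | positive screen)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# The same test characteristics yield a much lower PPV at 1.1% prevalence
# (as in the external cohort) than at 10% prevalence.
for prev in (0.10, 0.011):
    print(f"prevalence {prev:.1%}: PPV {ppv(0.81, 0.995, prev):.2f}")
```

This is why discrimination (AUC) can transfer across sites while predictive values do not.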
Opioid misuse is a major public health problem worldwide. In 2016, 11.3 million people were reported to misuse opioids in the US alone. Opioid-related inpatient and emergency department visits increased by 64 percent, and the rate of opioid-related visits nearly doubled between 2009 and 2014. It is thus critical for healthcare systems to detect opioid misuse cases. Patients hospitalized for consequences of their opioid misuse present an opportunity for intervention, but better screening and surveillance methods are needed to guide providers. The current screening methods, based on self-report questionnaires, are time-consuming and difficult to perform in hospitalized patients. In this work, I explore the use of convolutional neural networks for detecting opioid misuse cases using the text of electronic health records as input. The performance of these models is compared to that of a more traditional logistic regression model. Different architectures of a convolutional neural network were trained and evaluated using the area under the ROC curve. The convolutional neural network performed better, producing a score of 93.4%, whereas logistic regression scored 91.4% on the test data. Advantages and disadvantages of using a convolutional neural network over the baseline logistic regression model are also discussed.
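The area under the ROC curve used to compare the models above has a simple rank-based interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. A stdlib-only sketch, with made-up labels and scores:

```python
# Sketch of AUROC via the Mann-Whitney statistic: the fraction of
# positive/negative pairs in which the positive case outscores the
# negative one (ties count as half). Labels and scores are made up.

def auroc(labels, scores):
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2, 0.1]
print(auroc(labels, scores))  # probability a positive outranks a negative
```

This pairwise form is equivalent to integrating the ROC curve and makes clear why AUROC is insensitive to the choice of a single decision threshold.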
Background Automated de-identification methods for removing protected health information (PHI) from the source notes of the electronic health record (EHR) rely on building systems to recognize mentions of PHI in text, but they remain inadequate at ensuring perfect PHI removal. As an alternative to relying on de-identification systems, we propose the following solutions: (1) mapping the corpus of documents to a standardized medical vocabulary (concept unique identifier [CUI] codes mapped from the Unified Medical Language System), thus eliminating PHI as inputs to a machine learning model; and (2) training character-based machine learning models that obviate the need for a dictionary containing input words/n-grams. We aim to test the performance of models with and without PHI in a use case for an opioid misuse classifier. Methods An observational cohort was sampled from adult hospital inpatient encounters at a health system between 2007 and 2017. A case-control stratified sampling (n = 1000) was performed to build an annotated dataset for a reference standard of cases and non-cases of opioid misuse. Models for training and testing included CUI code, character-based, and n-gram features. Models applied included machine learning with neural networks and logistic regression, as well as expert consensus with a rule-based model for opioid misuse. The areas under the receiver operating characteristic curve (AUROC) were compared between models for discrimination. The Hosmer-Lemeshow test and visual plots measured model fit and calibration. Results Machine learning models with CUI codes performed similarly to n-gram models with PHI. The top-performing models with AUROCs > 0.90 included CUI codes as inputs to a convolutional neural network, max pooling network, and logistic regression model. The top calibrated models with the best model fit were the CUI-based convolutional neural network and max pooling network.
The top weighted CUI codes in the logistic regression had the related terms 'Heroin' and 'Victim of abuse'. Conclusions We demonstrate good test characteristics for an opioid misuse computable phenotype that is void of any PHI and performs similarly to models that use PHI. Herein we share a PHI-free, trained opioid misuse classifier for other researchers and health systems to use and benchmark to overcome privacy and security concerns.
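The PHI-elimination idea above rests on a simple property: only tokens that map to a vocabulary concept survive, so names, dates, and other PHI vanish by construction. A toy sketch follows; the lookup table and CUI-style codes are placeholders, not real UMLS content, and real concept mapping uses an NLP engine rather than token lookup.

```python
# Toy sketch of CUI mapping as PHI removal: tokens are replaced by
# concept codes from a vocabulary lookup, and anything unmapped --
# including names and dates -- is dropped. The table and the
# CUI-style codes below are made up, not real UMLS entries.

TOKEN_TO_CUI = {
    "heroin": "C0000101",
    "overdose": "C0000102",
    "naloxone": "C0000103",
}

def to_cui_sequence(note: str):
    """Map a note to concept codes; unmapped tokens (potential PHI) vanish."""
    tokens = note.lower().split()
    return [TOKEN_TO_CUI[t] for t in tokens if t in TOKEN_TO_CUI]

note = "John Smith admitted 01/02/2017 after heroin overdose given naloxone"
print(to_cui_sequence(note))  # only concept codes survive; name and date do not
```

The resulting code sequence is what gets fed to the downstream classifier, which is why the shared model can be distributed without privacy concerns.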
Background: Substance misuse is a heterogeneous and complex set of behavioral conditions that are highly prevalent in hospital settings and frequently co-occur. Few solutions exist to comprehensively and reliably identify these conditions hospital-wide to prioritize care and guide treatment. The aim is to apply natural language processing (NLP) to admission notes in the electronic health record (EHR) to accurately screen for substance misuse. Methods: The reference dataset was derived from a hospital-wide program that used structured diagnostic interviews to manually screen admitted patients over 26 months (n=54,915). Temporal validation was provided over the subsequent 12 months (n=16,917) and external validation at a separate health system (n=1,991). The Alcohol Use Disorder Identification Test and Drug Abuse Screening Tool served as reference standards. The first 24 hours of notes in the EHR were mapped to standardized medical vocabulary and fed into neural network models. The primary outcome was discrimination for alcohol misuse, opioid misuse, or non-opioid drug misuse. Discrimination was assessed by the area under the receiver operating characteristic curve (AUROC). Findings: The model was trained on a cohort in which 3.5% (n=1,921) had any type of substance misuse. Nearly 11% of patients with substance misuse had more than one type of misuse. The multi-label convolutional neural network classifier had an average AUROC of 0.97 (95% CI: 0.96, 0.98) during temporal validation for all types of substance misuse. The model was well calibrated and demonstrated good face validity, with model features containing explicit mentions of aberrant drug-taking behavior. The false-negative and false-positive rates were similar between non-Hispanic Black and non-Hispanic White groups. In external validation, the AUROC for alcohol and opioid misuse remained above 0.85. Interpretation: We developed a novel and accurate approach to leveraging the first 24 hours of EHR notes for screening multiple types of substance misuse. Funding Information: Research reported in this publication was supported by the National Institute on Drug Abuse of the National Institutes of Health under Award Numbers R01-DA051464 (MA), K23-AA024503 (MA), UL1-TR002389 (NK), KL2-TR002387 (NK), R01-DA041071 (NK), UG1-DA049467 (NK), R01-LM010090 (DD), R01-LM012973 (DD), K12-HS-026385 (HT), and R01-GM123193 (MMC). Declaration of Interests: Dr. Churpek has a patent pending (ARCD. P0535US.P2) for risk stratification algorithms for hospitalized patients and has received research support from EarlySense (Tel Aviv, Israel). All other authors have nothing to declare. Ethics Approval Statement: This study was approved by the Institutional Review Board at RUMC and LUMC.