Bias of Inaccurate Disease Mentions in Electronic Health Record-based Phenotyping
2019
Abstract
Objectives: Electronic health record (EHR)-based phenotyping is an automated technique for identifying patients diagnosed with a particular disease using EHR data. However, EHR-based phenotyping struggles to achieve satisfactorily high performance because clinical notes include disease mentions that ultimately signify something other than the patient’s diagnosis (such as a differential diagnosis or screening). Our objective is to quantify the influence of such disease mentions on EHR-based phenotyping performance.
Methods: Physicians manually reviewed 487,300 clinical notes from 4,430 patients to determine whether each disease mention indicated the patient’s disease. Particular focus was placed on disease mentions that did not signify the patient’s diagnosis even though the same sentence contained no syntactic modifier or indicator. Patients were then classified according to whether their clinical notes included such disease mentions.
Results: Among the patients whose clinical notes included disease mentions without any modifier or indicator, the proportion whose disease mentions signified the patient’s diagnosis averaged 78.1%. This value can be interpreted as the bias that disease mentions not signifying the patient’s diagnosis introduce into the precision of EHR-based phenotyping that extracts disease mentions from clinical notes.
Conclusion: This study quantified, for four dataset types, the bias in the precision of EHR-based phenotyping caused by disease mentions that incorrectly signify a patient’s diagnosis. The results will help researchers in diverse research environments with different available data types.
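The proportion reported in the Results can be read as a precision: among patients flagged because their notes contain an unmodified disease mention, the share that physician review confirmed as diagnosed. The following is a minimal sketch of that calculation, not the authors' code. The `Patient` record, its field names, and the toy data are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Patient:
    # Hypothetical record; field names are assumptions, not from the study.
    patient_id: str
    has_unmodified_mention: bool  # notes contain a disease mention with no modifier or indicator
    diagnosed: bool               # physician review confirmed the diagnosis


def mention_precision(patients: List[Patient]) -> Optional[float]:
    """Share of flagged patients whose disease mention reflects a real diagnosis."""
    flagged = [p for p in patients if p.has_unmodified_mention]
    if not flagged:
        return None  # precision is undefined when no patient is flagged
    true_positives = sum(p.diagnosed for p in flagged)
    return true_positives / len(flagged)


# Toy data, invented for illustration only (not study data).
toy_patients = [
    Patient("a", True, True),
    Patient("b", True, True),
    Patient("c", True, True),
    Patient("d", True, False),   # mention present, but not the patient's diagnosis
    Patient("e", False, False),  # never flagged, so excluded from precision
]

precision = mention_precision(toy_patients)
print(precision)
```

Patients without an unmodified mention do not enter the denominator, which is why the value describes precision rather than overall accuracy.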