Evaluation of data quality of multisite electronic health record data for secondary analysis

2015 
Currently, a large amount of data is amassed in electronic health records (EHRs). However, EHR systems are largely information silos, that is, uses of these systems are often confined to management of patient information and analytics specific to a clinician's practice. A growing trend in healthcare is combining multiple databases to support epidemiological research. The College Health Surveillance Network is the first national data warehouse containing EHR data from 31 different student health centers. Each member university contributes to the data warehouse by uploading select EHR data including patient demographics, diagnoses, and procedures to a common server on a monthly basis. In this paper, we focus on the data quality dimensions from a subsample of the data comprised of over 5.7 million patient visits for approximately 980,000 patients with 4,465 unique diagnoses from 23 of those universities. We examine the data for measures of completeness, consistency, and availability for secondary use for epidemiological research. Additionally, clinical documentation practices and EHR vendor were evaluated as potential contributors to data quality. We found that overall about 70% of the data in the data warehouse is available for secondary use, and identified clinical documentation practices that are correlated to a reduction in data quality. This suggests that automated quality control and proactive clinical documentation support could reduce ad-hoc data cleaning needs resulting in greater data availability for secondary use.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    21
    References
    13
    Citations
    NaN
    KQI
    []