An Atomic Approach to the Design and Implementation of a Research Data Warehouse

2021 
ObjectiveAs a long-standing Clinical and Translational Science Awards (CTSA) Program hub, the University of Pittsburgh and the University of Pittsburgh Medical Center (UPMC) developed and implemented a modern research data warehouse (RDW) to efficiently provision electronic patient data for clinical and translational research. MethodsBecause UPMC is one of the largest health care systems in the US with multiple vendors electronic health record (EHR) systems, we designed and implemented an RDW named Neptune to serve the specific needs of our CTSA. Neptune uses an atomic design where data is stored at a high level of granularity as represented in source systems. Neptune contains robust patient identity management tailored for research; integrates patient data from multiple sources, including EHRs, health plans, and research studies; and includes knowledge for mapping to standard terminologies. Neptune enables efficient provisioning of data to large analytics-oriented data models and to individual investigators. ResultsNeptune contains data for more than 5 million patients longitudinally organized as HIPAA Limited Data with dates and includes structured EHR data, clinical documents, health insurance claims, and research data. Neptune is used as a source for patient data for hundreds of IRB-approved research projects by local investigators and for national projects such as the Accrual to Clinical Trials (ACT) network, the All of Us Research Program, and the National Patient-Centered Clinical Research Network. DiscussionThe design of Neptune was heavily influenced by the large size of UPMC, the varied data sources, and the rich partnership between the University and the healthcare system. It features several desiderata of an RDW, including robust protected health information management, an extensible information storage model, and binding to standard terminologies at the time of data delivery. It also includes several unique aspects, including the physical warehouse straddling the University of Pittsburgh and UPMC networks and management under a HIPAA Business Associates Agreement. ConclusionWe describe the design and implementation of an RDW at a large academic health care system that uses a distinctive atomic design where data is stored at a high level of granularity.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    35
    References
    0
    Citations
    NaN
    KQI
    []