A Delayed Instantiation Approach to Template-Driven Provenance for Electronic Health Record Phenotyping

2020 
Provenance templates are an established methodology for the capture of provenance data. Each template defines the provenance of a domain-specific action in abstract form, which may then be instantiated as required by a single call to a given service interface. This approach, whilst simplifying the process of recording provenance for the user, introduces computational and storage demands on the capture process, particularly when used by clients with write-intensive provenance requirements such as other service-based software. To address these issues, we adopt a new approach based upon delayed instantiation and present a revised, two-part paradigm for template-driven provenance, in which we separate capture and query functionality to improve the overall efficiency of the model. A dedicated capture service is first employed to record template service requests in a relational database in the form of a meta-level description detailing the construction of each document. These low-overhead records are then accessed by an independent query service to construct views of concrete provenance documents for specific time frames as and when required by the user. These views may subsequently be analysed using query templates, a new technique defined here whereby templates can also be used to search for any matching subgraphs within a document and return the respective instantiating substitutions. We evaluate the performance gains of our new system in the context of Phenoflow, an electronic health record (EHR) phenotyping platform.
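The two-part paradigm described above can be illustrated with a minimal sketch. It is not the authors' implementation: the table layout, the triple-based template format, the `var:` prefix and every function name here are assumptions made for illustration only. The capture step only logs each request, the query step instantiates templates on demand for a time frame, and a query template is matched against the resulting view to recover substitutions.

```python
import json
import sqlite3

# Hypothetical template: triples whose "var:" terms are placeholders.
TEMPLATE = [
    ("var:output", "wasGeneratedBy", "var:step"),
    ("var:step", "wasAssociatedWith", "var:author"),
]

def open_store():
    """Create an in-memory relational store for logged requests (assumed schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE requests ("
        "id INTEGER PRIMARY KEY, ts INTEGER, template TEXT, bindings TEXT)"
    )
    return db

def capture(db, ts, template, substitution):
    """Capture service: record the request only; no document is built here."""
    db.execute(
        "INSERT INTO requests (ts, template, bindings) VALUES (?, ?, ?)",
        (ts, template, json.dumps(substitution)),
    )

def view(db, templates, start, end):
    """Query service: instantiate logged requests with start <= ts <= end."""
    rows = db.execute(
        "SELECT template, bindings FROM requests "
        "WHERE ts BETWEEN ? AND ? ORDER BY id",
        (start, end),
    )
    triples = []
    for name, bindings in rows:
        sub = json.loads(bindings)
        for s, p, o in templates[name]:
            # Replace bound placeholders; leave any other term unchanged.
            triples.append((sub.get(s, s), p, sub.get(o, o)))
    return triples

def unify(term, value, binding):
    """Bind a placeholder to a value, or check a constant term for equality."""
    if term.startswith("var:"):
        return binding.setdefault(term, value) == value
    return term == value

def match(pattern, triples, binding=None):
    """Query template: yield each substitution under which all pattern triples occur."""
    binding = binding or {}
    if not pattern:
        yield dict(binding)
        return
    (s, p, o), rest = pattern[0], pattern[1:]
    for ts_, tp, to in triples:
        if tp != p:
            continue
        candidate = dict(binding)
        if unify(s, ts_, candidate) and unify(o, to, candidate):
            yield from match(rest, triples, candidate)

db = open_store()
capture(db, 1, "step", {"var:output": "data:cohort", "var:step": "act:load",
                        "var:author": "agent:alice"})
capture(db, 5, "step", {"var:output": "data:cases", "var:step": "act:filter",
                        "var:author": "agent:bob"})

early = view(db, {"step": TEMPLATE}, 0, 3)    # only the first request
full = view(db, {"step": TEMPLATE}, 0, 10)    # both requests
subs = list(match(TEMPLATE, full))            # one substitution per request
```

In this sketch the write path costs a single row insert per request, and the cost of building concrete triples is deferred to `view`, which is the trade-off the abstract attributes to delayed instantiation.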