Decision Tree-Based Anonymized Electronic Health Record Fusion for Public Health Informatics

2018 
Electronic Health Records (EHRs) are frequently used in Health Information Exchanges to fuse data belonging to the same patient, matched through demographic attributes, for public health informatics. Fusing this information across multiple health care entities presents a two-fold complexity. First, privacy constraints on sharing demographic information across organizations are stringent, which requires encrypting or hashing records for anonymity. Second, fusing anonymized data raises the problem of finding duplicate records and linking incoming information accurately to existing records. This paper presents a methodology by which a public health department can acquire health data while preserving the privacy, integrity, and usefulness of the data. Our novel duplicate detection algorithm combines cryptographic hashing and machine learning techniques to link patients’ records approximately by identifying duplicate and unique records. Experimental results on three different datasets show that the proposed methodology detects duplicates effectively from encoded demographic data in EHRs. In addition, the proposed methodology can potentially be applied to record matching in other domains with encoded data.
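The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of hashing demographic fields and comparing the encoded values. The field names, the shared key, and the threshold rule are all hypothetical; in particular, the threshold stands in for a trained classifier such as the decision tree named in the title, and is not the authors' method.

```python
import hashlib
import hmac

# Hypothetical shared secret agreed on by the participating organizations.
SECRET_KEY = b"example-shared-key"
# Hypothetical demographic fields; the paper's attribute set is not given here.
FIELDS = ("first_name", "last_name", "dob", "zip")


def encode_record(record):
    """Replace each demographic value with a keyed SHA-256 digest."""
    encoded = {}
    for field in FIELDS:
        # Normalize so trivial case/whitespace differences hash identically.
        value = record[field].strip().lower().encode("utf-8")
        encoded[field] = hmac.new(SECRET_KEY, value, hashlib.sha256).hexdigest()
    return encoded


def agreement_vector(a, b):
    """Return 1 for each field whose digests match in both records, else 0."""
    return [int(hmac.compare_digest(a[f], b[f])) for f in FIELDS]


def is_probable_duplicate(vector, min_matches=3):
    """Placeholder rule (assumption) standing in for a trained classifier."""
    return sum(vector) >= min_matches


rec_a = encode_record(
    {"first_name": "Ann", "last_name": "Lee", "dob": "1980-01-02", "zip": "30301"}
)
rec_b = encode_record(
    {"first_name": " ann", "last_name": "LEE", "dob": "1980-01-02", "zip": "30302"}
)
vector = agreement_vector(rec_a, rec_b)
print(vector, is_probable_duplicate(vector))
```

Because only digests are compared, neither side needs to see the other's plaintext demographics. The per-field agreement vector is the kind of feature a classifier could consume to decide between "duplicate" and "unique" when some fields disagree.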