An algorithm to identify medical practices common to both the General Practice Research Database and The Health Improvement Network database

2012 
Purpose: To identify practices common to both the General Practice Research Database and The Health Improvement Network database for purposes of combining the databases for analysis without duplicate records.

Methods: We developed two independent algorithms to identify practices common to the two databases. The first used the total number of patients in the therapy and clinical data sets and the total number of etoricoxib and celecoxib users each year during the study period. The second used the total number of patients stratified by gender and four different categories of birth year. Further checking of potential matched practice pairs identified by the two algorithms was performed by comparing the patient-level medical records by birth year, dates of clinical visits, and diagnosis codes.

Results: Three hundred twelve potential matched pairs of practices were found by both algorithms. Fifteen additional potential pairs were matched by only one algorithm: 13 by algorithm 1 (A1) only and 2 by algorithm 2 (A2) only. The examination of the patient-level visit dates and diagnosis codes for the matches revealed that all of the 327 potential pairs of duplicate practices were in fact the same practice in the two databases.

Conclusions: The two algorithms successfully found the practices common to the two different databases without de-identifying the practices. The identification of the common practices allows for combining the two databases without duplicate records to create a larger data set for analysis, with 168 more practices than when using the General Practice Research Database alone, or with 268 more practices than when using The Health Improvement Network alone.

Copyright © 2012 John Wiley & Sons, Ltd.
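The two-algorithm matching described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say whether counts had to agree exactly or within a tolerance, so exact equality of a per-practice "signature" tuple is an assumption here. The function name `match_by_signature`, the practice identifiers, and all counts are hypothetical toy values.

```python
from collections import Counter, defaultdict


def match_by_signature(practices_a, practices_b):
    """Pair practices whose aggregate-count signatures are identical.

    Each argument maps a practice ID to a tuple of aggregate counts.
    A pair is reported only when the signature occurs exactly once in
    each database (exact matching is an assumption of this sketch).
    """
    index_b = defaultdict(list)
    for pid, sig in practices_b.items():
        index_b[sig].append(pid)
    freq_a = Counter(practices_a.values())

    pairs = set()
    for pid, sig in practices_a.items():
        candidates = index_b.get(sig, [])
        if freq_a[sig] == 1 and len(candidates) == 1:
            pairs.add((pid, candidates[0]))
    return pairs


# Algorithm 1 style signature (toy): patients in therapy data, patients in
# clinical data, etoricoxib users, celecoxib users (a single year shown).
gprd_a1 = {"G1": (1200, 1500, 3, 10), "G2": (800, 950, 0, 4), "G3": (640, 700, 1, 2)}
thin_a1 = {"T7": (1200, 1500, 3, 10), "T8": (800, 950, 0, 4), "T9": (500, 610, 2, 2)}

# Algorithm 2 style signature (toy): patient counts by gender (2) crossed
# with four birth-year categories, giving eight numbers per practice.
gprd_a2 = {
    "G1": (150, 160, 140, 150, 155, 145, 150, 150),
    "G2": (100, 110, 90, 100, 105, 95, 100, 100),
    "G3": (80, 85, 75, 80, 82, 78, 80, 80),
}
thin_a2 = {
    "T7": (150, 160, 140, 150, 155, 145, 150, 150),
    "T8": (101, 110, 90, 100, 105, 95, 100, 100),
    "T9": (80, 85, 75, 80, 82, 78, 80, 80),
}

a1_pairs = match_by_signature(gprd_a1, thin_a1)
a2_pairs = match_by_signature(gprd_a2, thin_a2)

both = a1_pairs & a2_pairs        # found by both algorithms
a1_only = a1_pairs - a2_pairs     # found by algorithm 1 only
a2_only = a2_pairs - a1_pairs     # found by algorithm 2 only
candidates = a1_pairs | a2_pairs  # pairs to send for record-level checking

print(sorted(both), sorted(a1_only), sorted(a2_only))
```

In the study, the analogous sets had sizes 312, 13, and 2, giving 327 candidate pairs, all of which were then checked against patient-level birth years, visit dates, and diagnosis codes. That verification step is not modelled in this sketch.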