Collapsing Corporate Confusion: Leveraging Network Structures for Effective Entity Resolution in Relational Corporate Data

2017 
In this paper, we introduce a novel battery of classifiers to resolve artificial inconsistencies among entity names within large datasets. Using data on the corporate sector, we describe the logic underlying a relational approach to entity resolution and its importance for data acquisition, feature extraction, and integration. We then leverage the relational structure of BoardEx employment data to assess the efficacy of these methods against a ground-truth sample of coded name inconsistencies. We show that these methods hold significant promise for removing artificial distinctions in entity names through enrichment from integration with external data, and we further demonstrate the effect of such resolution on the accuracy of extracted network topology features. We conclude with implications for existing findings and steps for future work.