Implementation of an extended Fellegi-Sunter probabilistic record linkage method using the Jaro-Winkler string comparator
2014
Record linkage is the task of identifying which records from one or more data sources refer to the same person. Records often lack a common key and may contain typographical variations in identifier fields; in such cases, Fellegi-Sunter probabilistic record linkage is a commonly used method. In this method, a weight is assigned to each pair of records, and record pairs with weights above a given threshold are classified as matches. Winkler introduced an extension of the Fellegi-Sunter method that takes field similarity into account when calculating the weight, and demonstrated that it outperforms the original method. Implementations of the Fellegi-Sunter method are frequently presented in the literature; however, applications of the Winkler method are rarely reported. This paper presents brief background on these two record linkage methods and describes in detail how to implement the Winkler method. We formalized the parameters required by the Winkler method and estimated them using the expectation-maximization (EM) algorithm. Simulated data sets, with known true match status, were used to assess parameter estimation and to compare the Winkler and Fellegi-Sunter methods in their ability to reduce the rates of false matches and false non-matches.
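The weighting idea summarized in the abstract can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the paper's implementation. The Jaro-Winkler comparator uses the commonly cited prefix scale p = 0.1 and a maximum prefix length of 4. The per-field m/u probabilities in `PARAMS` and the value of `THRESHOLD` are hypothetical placeholders for the parameters the paper estimates with EM. The similarity-adjusted weight simply interpolates linearly between each field's disagreement and agreement weights using the Jaro-Winkler score. That is only one simple way to bring field similarity into the weight, and it may differ from Winkler's own adjustment and from the paper's formalization. The function names `fs_weight` and `winkler_style_weight` are hypothetical.

```python
import math


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means the strings are identical."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Equal characters only count as matching if they lie within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Compare matched characters in order; positions that differ are
    # half-transpositions, and t is half their count.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro score boosted for a common prefix of up to `max_prefix` characters."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def field_weights(m: float, u: float) -> tuple:
    """Fellegi-Sunter agreement and disagreement weights (log base 2) for one field.

    m = P(field agrees | true match), u = P(field agrees | non-match).
    """
    return math.log2(m / u), math.log2((1 - m) / (1 - u))


def fs_weight(rec_a: dict, rec_b: dict, params: dict) -> float:
    """Classic Fellegi-Sunter: exact agree/disagree per field, weights summed."""
    total = 0.0
    for field, (m, u) in params.items():
        agree, disagree = field_weights(m, u)
        total += agree if rec_a[field] == rec_b[field] else disagree
    return total


def winkler_style_weight(rec_a: dict, rec_b: dict, params: dict) -> float:
    """Similarity-adjusted weight. ASSUMPTION: linear interpolation between the
    disagreement and agreement weights by the Jaro-Winkler score; this is not
    necessarily Winkler's or the paper's exact adjustment."""
    total = 0.0
    for field, (m, u) in params.items():
        agree, disagree = field_weights(m, u)
        s = jaro_winkler(rec_a[field], rec_b[field])
        total += disagree + s * (agree - disagree)
    return total


# Hypothetical (m, u) per field; in the paper such parameters are estimated by EM.
PARAMS = {"first": (0.95, 0.01), "last": (0.95, 0.005)}
THRESHOLD = 10.0  # hypothetical decision threshold

if __name__ == "__main__":
    a = {"first": "MARTHA", "last": "SMITH"}
    b = {"first": "MARHTA", "last": "SMITH"}
    print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
    for name, fn in [("Fellegi-Sunter", fs_weight), ("Winkler-style", winkler_style_weight)]:
        w = fn(a, b, PARAMS)
        print(name, round(w, 2), "match" if w > THRESHOLD else "non-match")
```

With these placeholder parameters, the transposed first name pushes the exact-comparison weight below the threshold, while the similarity-adjusted weight stays above it. This is the kind of false non-match that a similarity-based extension is meant to reduce.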