Traces through Time: a Case-study of Applying Statistical Methods to Refine Algorithms for Linking Biographical Data.

2015 
The Traces through Time project, which ran at The UK National Archives in 2015, developed algorithms and tools to link people appearing in historical records and to assign robust measures of confidence to the connections that are made. The method has application across the digital humanities, including biographical research. Fuzzy matching relies on the availability of background statistics on the population, the distribution of data values, data quality, and the type and frequency of errors. This paper describes work to refine the original algorithms by implementing a learning approach in which insights arising from one analysis are fed back into the algorithm to improve the baseline statistics for subsequent analyses. We find that this iterative approach delivers significant improvements over ‘raw’ scoring mechanisms. It enables us to target carefully the type and degree of fuzzy matching to be applied, and can help balance the poor precision that results from allowing increased ‘fuzziness’ against the poor recall that arises from a more restrictive approach. Future work will extend the approach beyond names and dates of birth, and will embed these enhancements into the Traces through Time framework and tools.
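The feedback idea described above can be illustrated with a minimal sketch. This is not the project's algorithm: the function names (`confidence`, `refine`), the prior, the error rate, the smoothing scheme and the example names and counts are all assumptions made for illustration. It shows only the general principle that a confidence score depends on background name frequencies, and that updating those frequencies after one analysis changes the scores produced by the next.

```python
from collections import Counter

def confidence(name, counts, prior=0.01, error_rate=0.05):
    """Hypothetical posterior probability that two records sharing `name`
    refer to the same person. All parameters are illustrative assumptions."""
    total = sum(counts.values())
    # Chance that two different people share this name, estimated from the
    # background counts with add-one smoothing for unseen names.
    p_chance = (counts[name] + 1) / (total + len(counts) + 1)
    # Chance that the same person's name is recorded identically twice.
    p_same = 1.0 - error_rate
    numerator = p_same * prior
    return numerator / (numerator + p_chance * (1.0 - prior))

def refine(counts, observed_names):
    """Feed names seen in one analysis back into the baseline statistics."""
    updated = Counter(counts)
    updated.update(observed_names)
    return updated

# Invented baseline statistics, for illustration only.
baseline = Counter({"John Smith": 50, "Mary Jones": 30, "Ezekiel Thorne": 1})

rare = confidence("Ezekiel Thorne", baseline)
common = confidence("John Smith", baseline)

# A later analysis finds the "rare" name is more frequent than first assumed.
refined = refine(baseline, ["Ezekiel Thorne"] * 20)
rare_after = confidence("Ezekiel Thorne", refined)
```

Under these assumptions, agreement on a rare name yields a higher confidence than agreement on a common one, and the score for a name falls once the refined statistics show it to be more common than the baseline suggested.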