How linkage error affects hidden Markov models: a sensitivity analysis

2019 
Latent class models (LCMs) are increasingly used to estimate and correct for classification error in categorical data without the need for an error-free “gold standard” data source. To accomplish this, LCMs require either multiple indicators of the same phenomenon within one data collection wave (a “latent structure model”) or multiple observations over time on a single indicator (a “hidden Markov model”, HMM), and they assume that the errors in these indicators are conditionally independent. Unfortunately, this “local independence” assumption is often unrealistic, untestable, and a source of serious bias. Linking independent data sources can solve this problem by making the local independence assumption plausible across sources, while potentially allowing for local dependence within sources. However, record linkage introduces a new problem: the records may be erroneously linked. In this paper we investigate the effects of linkage error on HMM estimates of employment contract types. Our data come from linking a labor force survey to administrative employer records; this linkage yields two indicators per time point that are plausibly conditionally independent. Our results indicate that false-negative linkage error (exclusion) is problematic only if it is large and highly correlated with the dependent variable. Moreover, under many conditions, false-positive linkage error (mislinkage) acts as another source of misclassification that the HMM can absorb into its error-rate estimates, leaving the latent transition estimates unbiased. In these cases, measurement error modeling already accounts for linkage error. Our results also indicate where these conditions break down and more complex methods are needed.
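The claim that mislinkage can behave like ordinary misclassification can be illustrated with a small numerical sketch. It is not the paper's analysis: it assumes, for illustration only, that a false-positive link occurs with a fixed probability that is unrelated to the true contract type, and that a mislinked record contributes a value drawn from the population marginal of the indicator. Under those assumptions the indicator's effective error matrix is a mixture of the original error matrix and that marginal, and it is still a valid set of conditional probabilities. The function name `effective_emission` and all numbers below are hypothetical.

```python
def effective_emission(emission, marginal, mislink_rate):
    """Mix each row of an emission (misclassification) matrix with the
    population marginal of the observed indicator.

    Assumption (not from the paper): a mislinked record's value is an
    independent draw from `marginal`, whatever the true latent class.
    """
    return [
        [(1.0 - mislink_rate) * p + mislink_rate * m for p, m in zip(row, marginal)]
        for row in emission
    ]


# Hypothetical two-class example: permanent vs. temporary contract.
latent_dist = [0.7, 0.3]          # P(true contract type)
emission = [
    [0.95, 0.05],                 # P(observed | true = permanent)
    [0.10, 0.90],                 # P(observed | true = temporary)
]

# Population marginal of the observed indicator: P(observed).
marginal = [
    sum(latent_dist[i] * emission[i][j] for i in range(len(latent_dist)))
    for j in range(len(emission[0]))
]

# Effective error matrix with a hypothetical 10% mislinkage rate.
mixed = effective_emission(emission, marginal, mislink_rate=0.10)

for row in mixed:
    print([round(p, 4) for p in row])
```

Each row of `mixed` still sums to one, so the mislinked indicator looks to the HMM like an indicator with somewhat higher error rates. If mislinkage instead depends on the latent class, or the same wrong record stays linked across time points, this simple mixture no longer describes the data, which is one way the conditions mentioned in the abstract can break down.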