Adapting Standard External Clustering Metrics for Repetitive, Noisy Observations

2019 
Cluster analysis often relies on external metrics to evaluate how closely clustering assignments match a gold standard. To apply external clustering metrics, explicit noise points are usually removed or treated as a single cluster. This modification reduces the relevance of external metrics as predictors of performance on unlabeled data, where noise points cannot be identified. We propose a modification of standard external metrics that explicitly handles noise points in experimental data. We illustrate the effect of this explicit treatment of noise on clustering evaluation using several examples of common noisy clustering problems as well as a real data set from mass spectrometry. We demonstrate that external clustering metrics that explicitly treat noise are more robust than standard external clustering metrics in the presence of noise.
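As a rough illustration of the two conventional treatments of noise described in the abstract (removing noise points, or treating them as a single cluster), the sketch below scores one toy clustering both ways with a plain Rand index. It is not the modified metric proposed in the paper. The `NOISE = -1` label, the helper names, and the toy data are assumptions made only for this example.

```python
from itertools import combinations

NOISE = -1  # assumed noise label (a common convention, not taken from the paper)


def rand_index(truth, pred):
    """Fraction of point pairs on which two labelings agree.

    A pair agrees when both labelings put the two points in the same
    cluster, or both put them in different clusters.
    """
    pairs = list(combinations(range(len(truth)), 2))
    if not pairs:
        return 1.0
    agree = sum(
        (truth[i] == truth[j]) == (pred[i] == pred[j]) for i, j in pairs
    )
    return agree / len(pairs)


def drop_noise(truth, pred):
    """Convention 1: discard points the gold standard marks as noise."""
    kept = [(t, p) for t, p in zip(truth, pred) if t != NOISE]
    return [t for t, _ in kept], [p for _, p in kept]


# Toy gold standard: two real clusters plus two noise points.
truth = [0, 0, 0, 1, 1, 1, NOISE, NOISE]
# Toy prediction: the noise points are absorbed into the real clusters.
pred = [0, 0, 0, 1, 1, 1, 0, 1]

# Convention 1: noise removed before scoring.
score_dropped = rand_index(*drop_noise(truth, pred))

# Convention 2: noise kept and scored as one ordinary cluster.
score_single = rand_index(truth, pred)

print(score_dropped, score_single)
```

In this toy case, removing the noise points hides the fact that the prediction assigned them to real clusters, while scoring them as a single cluster penalizes that assignment. This is the kind of discrepancy the abstract points to when it says the usual treatment weakens external metrics as predictors of performance on unlabeled data.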