Estimating precisions for multiple binary classifiers under limited samples

2020 
Machine learning classifiers often require regular tracking of performance measures such as precision, recall, and F1-score for model improvement and diagnostics. The population over which accuracy metrics are evaluated can be too large for a full ground-truth assessment, so only small random samples are chosen for estimation. Ground-truthing often requires human review, which is expensive. Moreover, in some business applications, it may be preferable to minimize human contact with the data in order to improve privacy safeguards. Thus, sampling methods that can provide estimates with low margin of error, high confidence, and small sample size are highly desirable. With an ensemble of multiple binary classifiers, choosing a sampling method that has these desired properties while keeping the collective sample small becomes even more important. We propose a sampling method to estimate the precisions of multiple binary classifiers that exploits the overlaps between their prediction sets. We provide theoretical guarantees that our estimators are unbiased and empirically demonstrate that the precision metrics estimated from our sampling technique are as good (in terms of variance and confidence interval) as those obtained from a uniform random sample.
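The abstract does not spell out the estimator, but the core idea of exploiting overlaps between prediction sets can be sketched as stratified sampling: partition the union of the classifiers' positive-prediction sets into strata keyed by which subset of classifiers flagged each item, draw a small sample from each stratum (simulating human review), and combine the stratum precisions into per-classifier estimates. The function below is a minimal illustrative sketch under these assumptions, not the paper's actual algorithm; all names (`pred_sets`, `ground_truth`, `samples_per_stratum`) are hypothetical.

```python
import random

def stratified_precision_estimates(pred_sets, ground_truth,
                                   samples_per_stratum=30, seed=0):
    """Estimate each classifier's precision from overlap-based strata.

    pred_sets: dict mapping classifier name -> set of items predicted positive.
    ground_truth: callable item -> bool, standing in for human review.
    Illustrative sketch only; the paper's estimator may differ.
    """
    rng = random.Random(seed)

    # Partition the union of prediction sets into strata: each item's
    # stratum is the (frozen) set of classifiers that flagged it.
    strata = {}
    for item in set().union(*pred_sets.values()):
        key = frozenset(c for c, s in pred_sets.items() if item in s)
        strata.setdefault(key, []).append(item)

    # Review a small sample from each stratum and estimate its precision.
    stratum_precision = {}
    for key, items in strata.items():
        n = min(samples_per_stratum, len(items))
        sample = rng.sample(items, n)
        stratum_precision[key] = sum(ground_truth(i) for i in sample) / n

    # A classifier's precision is the size-weighted average of the
    # precisions of the strata it participates in.
    estimates = {}
    for c, s in pred_sets.items():
        estimates[c] = sum(
            len(items) * stratum_precision[key]
            for key, items in strata.items() if c in key
        ) / len(s)
    return estimates
```

Because each reviewed item informs every classifier that flagged it, the same review budget covers all classifiers at once, which is the intuition behind sampling from the overlaps rather than drawing an independent uniform sample per classifier.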