Identifying Insufficient Data Coverage for Ordinal Continuous-Valued Attributes

2021 
Appropriate training data is a requirement for building good machine-learned models. In this paper, we study the notion of coverage for ordinal and continuous-valued attributes, by formalizing the intuition that the learned model can accurately predict only at data points for which there are "enough" similar data points in the training data set. We develop an efficient algorithm to identify uncovered regions in low-dimensional attribute feature space, by making a connection to Voronoi diagrams. We also develop a randomized approximation algorithm for use in high-dimensional attribute space. We evaluate our algorithms through extensive experiments on real datasets.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    36
    References
    2
    Citations
    NaN
    KQI
    []