Identifying Insufficient Data Coverage for Ordinal Continuous-Valued Attributes

Abolfazl Asudeh, Nima Shahbazi, Zhongjun Jin, Hosagrahar V. Jagadish


2021


Appropriate training data is a requirement for building good machine-learned models. In this paper, we study the notion of coverage for ordinal and continuous-valued attributes, formalizing the intuition that a learned model can predict accurately only at points for which the training dataset contains "enough" similar data points. We develop an efficient algorithm that identifies uncovered regions in low-dimensional attribute space by making a connection to Voronoi diagrams. We also develop a randomized approximation algorithm for use in high-dimensional attribute space. We evaluate our algorithms through extensive experiments on real datasets.
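To make the coverage intuition concrete, the following is a minimal sketch of one way to test whether a single query point is covered. It assumes that "similar" means "within Euclidean distance rho" and that "enough" means "at least k points"; the names `is_covered`, `k`, and `rho`, and the toy data, are illustrative assumptions and are not taken from the abstract. It is a brute-force check of one point, not the Voronoi-based or randomized algorithm the paper describes.

```python
import math


def is_covered(query, data, k, rho):
    """Return True if at least k points in `data` lie within distance rho of `query`.

    Assumption: similarity is Euclidean distance and "enough" is the threshold k.
    """
    neighbors = sum(1 for point in data if math.dist(point, query) <= rho)
    return neighbors >= k


# Hypothetical 2-D training set: a small cluster plus one isolated point.
training = [(0.1, 0.1), (0.15, 0.2), (0.2, 0.1), (0.9, 0.9)]

print(is_covered((0.15, 0.15), training, k=3, rho=0.1))  # True: three neighbors in the cluster
print(is_covered((0.9, 0.8), training, k=3, rho=0.1))    # False: only one neighbor nearby
```

Each check scans the whole training set, so it answers "is this point covered?" but does not by itself enumerate uncovered regions, which is the harder problem the abstract refers to.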

Keywords:

Approximation algorithm
Connection (vector bundle)
Artificial intelligence
Space
Training set
Feature vector
Computer science
Set (abstract data type)
Voronoi diagram
Data point
Machine learning
