Discovering Data Source Stability Patterns in Biomedical Repositories Based on Simplicial Projections from Probability Distribution Distances
2017
The degree of homogeneity of statistical distributions among data sources is a critical issue when reusing data from Integrated Data Repositories (IDRs). Evaluating this data source stability is essential to ensure confident data reuse. This work tackles the task of discovering and classifying patterns among the statistical distributions of multiple sources in IDRs by means of a novel approach based on simplicial projections from probability distribution distances, combined with density-based spatial clustering of applications with noise (DBSCAN). Results on the 20 public repositories evaluated support the existence of four main data source stability patterns in biomedical repositories: the global stability pattern (GSP), the local stability pattern (LSP), the sparse stability pattern (SSP) and the instability pattern (IP).
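To make the idea of clustering sources by distribution distance concrete, the following is a minimal sketch and not the method evaluated in this work. It rests on several assumptions that do not come from the abstract: the Jensen-Shannon distance is used as the probability distribution distance, the simplicial projection step is omitted so that DBSCAN runs directly on the pairwise distance matrix, and the four sources, their histograms, and the `eps` and `min_samples` values are hypothetical.

```python
import math

def js_distance(p, q):
    """Jensen-Shannon distance (base 2) between two discrete distributions.

    Assumption: the abstract does not name the distance; this is one
    common choice, bounded in [0, 1].
    """
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence, skipping zero-probability terms.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return math.sqrt(max(0.0, (kl(p, m) + kl(q, m)) / 2))

def distance_matrix(dists):
    """Pairwise distances between all source distributions."""
    n = len(dists)
    return [[js_distance(dists[i], dists[j]) for j in range(n)] for i in range(n)]

def dbscan(D, eps, min_samples):
    """Minimal DBSCAN on a precomputed distance matrix; label -1 marks noise."""
    n = len(D)
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = [j for j in range(n) if D[i][j] <= eps]
        if len(neighbors) < min_samples:
            labels[i] = -1  # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = [k for k in range(n) if D[j][k] <= eps]
            if len(reachable) >= min_samples:
                queue.extend(reachable)  # core point: expand the cluster
    return labels

# Hypothetical sources: one categorical variable, three categories each.
sources = {
    "A": [0.50, 0.30, 0.20],
    "B": [0.48, 0.32, 0.20],
    "C": [0.52, 0.29, 0.19],
    "D": [0.05, 0.15, 0.80],
}
names = list(sources)
D = distance_matrix([sources[name] for name in names])
labels = dbscan(D, eps=0.1, min_samples=2)  # hypothetical parameters
print(dict(zip(names, labels)))
```

In this toy input, sources A, B and C have nearly identical distributions and fall into one cluster, while D is labeled as noise. How such cluster and noise configurations map onto the GSP, LSP, SSP and IP patterns is defined in the paper itself and is not reproduced here.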