Discovering Data Source Stability Patterns in Biomedical Repositories Based on Simplicial Projections from Probability Distribution Distances

2017 
The degree of homogeneity of statistical distributions among data sources is a critical issue when reusing data from Integrated Data Repositories (IDRs). Evaluating this data source stability is essential to ensure confident data reuse. This work tackles the task of discovering and classifying patterns among the statistical distributions of multiple sources in IDRs by means of a novel approach based on simplicial projections from probability distribution distances, combined with density-based spatial clustering of applications with noise (DBSCAN). Results on the 20 public repositories evaluated support the existence of four main data source stability patterns in biomedical repositories: the global stability pattern (GSP), the local stability pattern (LSP), the sparse stability pattern (SSP) and the instability pattern (IP).
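The abstract does not spell out the pipeline, so the following is only a minimal Python sketch of the general idea, built on several stated assumptions. Each source is summarized as a histogram over shared bins. Pairwise distances are Jensen-Shannon distances. The simplicial projection is approximated by classical multidimensional scaling, which places the n sources as vertices of an (n-1)-dimensional simplex whose edge lengths reproduce those distances. A small hand-written DBSCAN then groups the projected sources. The names `js_distance`, `classical_mds` and `dbscan`, the toy histograms, and the parameters `eps=0.1` and `min_samples=2` are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def js_distance(p, q):
    """Jensen-Shannon distance with base-2 logs, so values lie in [0, 1]."""
    p = np.asarray(p, dtype=float) / np.sum(p)
    q = np.asarray(q, dtype=float) / np.sum(q)
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # 0 * log(0) is taken as 0; b > 0 wherever a > 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    # Clamp tiny negative rounding errors before the square root.
    return np.sqrt(max(0.0, 0.5 * kl(p, m) + 0.5 * kl(q, m)))


def classical_mds(D, k):
    """Classical MDS: embed points so Euclidean distances approximate D.

    Assumption: used here as a stand-in for the simplicial projection;
    with k = n - 1 the n points form the vertices of an (n-1)-simplex.
    """
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)            # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]          # keep the k largest
    vals = np.clip(vals[idx], 0.0, None)      # drop negative numerical noise
    return vecs[:, idx] * np.sqrt(vals)


def dbscan(X, eps, min_samples):
    """Minimal DBSCAN; label -1 marks noise. Neighborhoods include the point itself."""
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    labels = np.full(n, -1)
    visited = np.zeros(n, dtype=bool)
    cluster = 0
    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        if len(neighbors[i]) < min_samples:
            continue  # not a core point; may later join a cluster as a border point
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # claim unassigned or border points
            if not visited[j]:
                visited[j] = True
                if len(neighbors[j]) >= min_samples:
                    queue.extend(neighbors[j])  # expand through core points only
        cluster += 1
    return labels


# Hypothetical toy data: five sources, one variable discretized into four bins.
sources = np.array([
    [0.25, 0.25, 0.25, 0.25],
    [0.26, 0.24, 0.25, 0.25],
    [0.24, 0.26, 0.25, 0.25],
    [0.70, 0.10, 0.10, 0.10],
    [0.68, 0.12, 0.10, 0.10],
])
n = len(sources)
D = np.array([[js_distance(sources[i], sources[j]) for j in range(n)] for i in range(n)])
coords = classical_mds(D, k=n - 1)
labels = dbscan(coords, eps=0.1, min_samples=2)
print(labels)  # expected: [0 0 0 1 1]
```

With these toy inputs, the three near-uniform sources and the two skewed sources fall into separate clusters. On a real repository, `eps` and `min_samples` would need tuning to the observed distance scale, and the mapping from such clusters to the GSP, LSP, SSP and IP patterns is not reproduced here.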