ProvCaRe: Characterizing scientific reproducibility of biomedical research studies using semantic provenance metadata
2019
Abstract

Objective: Reproducibility of research studies is key to advancing biomedical science. It allows new work to build on sound results and reduces inconsistencies between published results and study data. We propose that the data available from research studies, combined with provenance metadata, provide a framework for evaluating scientific reproducibility. We developed the ProvCaRe platform to model, extract, and query semantic provenance information from 435,248 published articles.

Methods: The ProvCaRe platform consists of three parts. (1) The S3 model and a formal ontology. (2) A provenance-focused text processing workflow that uses metadata extracted from articles to generate provenance triples, each consisting of a subject, a predicate, and an object. (3) The ProvCaRe knowledge repository, which supports “provenance-aware” hypothesis-driven search queries. A new provenance-based ranking algorithm ranks the articles in the search query results.

Results: The ProvCaRe knowledge repository contains 48.9 million provenance triples. Seven research hypotheses were used as search queries for evaluation, and the resulting provenance triples were analyzed using five categories of provenance terms. The largest share of terms (34%) described provenance related to the population cohort. Statistical data analysis methods followed at 29%, and only 5% of the terms described the measurement instruments used in a study. The analysis also showed that some articles included more provenance terms across multiple provenance categories, which suggests a higher potential for reproducibility of these research studies.

Conclusion: The ProvCaRe knowledge repository (https://provcare.case.edu/) is one of the largest provenance resources for biomedical research studies. It combines intuitive search functionality with a new provenance-based ranking feature to list articles related to a search query.
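The abstract does not specify how the provenance-based ranking algorithm works. The following is only a rough illustration of the idea that articles covering more provenance categories may have higher reproducibility potential. It assumes that each extracted subject-predicate-object triple carries a provenance category label. Under that assumption, it ranks articles first by the number of distinct categories their triples cover and then by their total triple count. It also computes the share of terms per category, as reported in the Results. The class `ProvenanceTriple`, the functions `rank_articles` and `category_shares`, the predicate names, and the toy data are all hypothetical. This is not the ProvCaRe implementation or its scoring formula.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceTriple:
    """Hypothetical record: one subject-predicate-object statement from an article."""
    article_id: str
    subject: str
    predicate: str
    obj: str
    category: str  # assumed provenance category label, e.g. "population cohort"


def rank_articles(triples):
    """Hypothetical ranking (not ProvCaRe's algorithm).

    Articles covering more distinct provenance categories rank first; ties are
    broken by total triple count, then by article id for a stable order.
    """
    categories = defaultdict(set)
    counts = defaultdict(int)
    for t in triples:
        categories[t.article_id].add(t.category)
        counts[t.article_id] += 1
    return sorted(counts, key=lambda a: (-len(categories[a]), -counts[a], a))


def category_shares(triples):
    """Fraction of all triples that fall into each provenance category."""
    total = len(triples)
    if total == 0:
        return {}
    per_category = Counter(t.category for t in triples)
    return {cat: n / total for cat, n in per_category.items()}


# Made-up toy data for illustration only; not drawn from the ProvCaRe repository.
TOY_TRIPLES = [
    ProvenanceTriple("A1", "study", "hasCohort", "adults", "population cohort"),
    ProvenanceTriple("A1", "analysis", "usesMethod", "logistic regression", "statistical data analysis"),
    ProvenanceTriple("A1", "sleep", "measuredBy", "polysomnography", "measurement instrument"),
    ProvenanceTriple("A2", "study", "hasCohort", "children", "population cohort"),
    ProvenanceTriple("A2", "study", "hasCohort", "school cohort", "population cohort"),
    ProvenanceTriple("A3", "analysis", "usesMethod", "t-test", "statistical data analysis"),
]

if __name__ == "__main__":
    print(rank_articles(TOY_TRIPLES))
    print(category_shares(TOY_TRIPLES))
```

With this toy data, `A1` ranks first because its triples span three categories. `A2` ranks ahead of `A3` because, with one category each, it has more triples. Putting category coverage ahead of raw counts is a design choice taken from the abstract's observation about multi-category articles. A real ranking would likely also weigh relevance to the query hypothesis.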