Modelling and computing the quality of scientific information on the Web of Data

2014 
The Web is being transformed into an open data commons, and is now the dominant point of access for information-seeking scientists. In parallel, the scientific community has had to manage the challenges of "Big Data", characterized by its large-scale, distributed, and diverse nature. The Web of Linked Data has emerged as a platform through which the sciences can meet this challenge, allowing them to publish and reuse data in a machine-readable manner. The openness of the Web of Data is, however, a double-edged sword. On one hand it drives rapid growth in adoption, but on the other a lack of governance and quality control has led to data of varied quality and trustworthiness. The challenge scientists face, then, is not that data on the Web is universally poor, but that its quality is unknown. Previous research has established the notion of Quality Knowledge: latent domain knowledge that expert scientists draw on to make quality-based decisions. The main idea pursued in this thesis is that we can address Information Quality (IQ) issues in the Web of Data by repurposing the existing mechanisms scientists use to evaluate data. We argue that there are three distinct aspects of Quality Knowledge (objective, predictive, and subjective), defined by the information required for their assessment, and present two studies focused on the modelling and exploitation of the objective and predictive aspects. We address the objective aspect by developing the Minimum Information Model as a repurposing of Minimum Information Checklists, an increasingly prevalent type of quality knowledge employed in the Life Sciences. A more general approach to modelling the predictive aspect explores the use of Multi-Entity Bayesian Networks to tackle the characteristic uncertainty in predictive quality knowledge and the inconsistent availability of metadata in the Web of Data.
We show that by following our classification we can develop techniques and infrastructure for evaluating IQ that are tailored to the challenges of the Web of Data and informed by the needs of the scientific community.