Integration of Multiple Graph Datasets and Their Linguistic Summaries: An Application to Linked Data

2016 
This paper presents a novel method of generating and evaluating linguistic summaries of content stored in distributed graph datasets, like LinkedData. Linguistic summarization is a well known data mining technique, aimed to discover patterns in data and present them in natural language. So far, this method has been researched only for relational databases. In our recent paper we have presented how to adapt this method for graph datasets. We have solved the problems of subject definition (further extended in this paper), retrieval of the attributes for summarization, generalization of summarizers and qualifiers. In this paper we extend that research by adapting proposed method to distributed interlinked graph datasets, which results in obtaining new summaries, and therefore new knowledge. We discuss how to follow different types of equivalence links that may exists between graph datasets. In order to measure characteristics specific for summaries of distributed graph data we propose new truth values (degree of subject appropriateness, degree of summarizer order and degree of linkage), and adapt existing ones (degree of covering). We run several experiments on Linked Data and discuss the results.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    18
    References
    2
    Citations
    NaN
    KQI
    []