XML Clustering: A Review of Structural Approaches

2015 
With its presence in data integration, chemistry, and biological and geographic systems, XML has become an important standard well beyond computer science. A common problem across these applications is the structural clustering of XML documents, an issue that has been studied thoroughly and has led to a myriad of approaches. In this paper, we present a comprehensive review of structural XML clustering. First, we give a basic introduction to the problem and highlight the main challenges in this research area. Subsequently, we divide the problem into three subtasks and discuss the most common document representations, structural similarity measures, and clustering algorithms. Additionally, we present the most popular evaluation measures for estimating clustering quality. Finally, we analyze and compare 23 state-of-the-art approaches and arrange them in an original taxonomy. By providing an up-to-date analysis of existing structural XML clustering algorithms, we hope to showcase methods suitable for current applications and outline directions for future research.
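To make the three subtasks concrete, the following is a minimal sketch and not one of the 23 surveyed approaches. It assumes a path-set document representation, Jaccard similarity as the structural measure, and a greedy single-pass clustering step. The function names (`path_set`, `jaccard`, `cluster`), the threshold value, and the sample documents are all hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET


def path_set(xml_text):
    """Subtask 1 (representation): the set of root-to-node tag paths."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths


def jaccard(a, b):
    """Subtask 2 (similarity): shared paths divided by all paths."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(docs, threshold=0.5):
    """Subtask 3 (clustering): greedy single pass.

    Each document joins the first cluster whose first member is at
    least `threshold` similar; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of document indices.
    """
    reps = [path_set(d) for d in docs]
    clusters = []
    for i, rep in enumerate(reps):
        for members in clusters:
            if jaccard(rep, reps[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Hypothetical sample documents: two book records and one molecule record.
docs = [
    "<book><title/><author/></book>",
    "<book><title/><author/><year/></book>",
    "<molecule><atom/><bond/></molecule>",
]

if __name__ == "__main__":
    print(cluster(docs))
```

With these sample documents, the two book records share three of four distinct paths and fall into one cluster, while the molecule record shares none and forms its own. Real approaches typically use richer representations (trees, graphs, vectors) and more robust clustering algorithms than this single-pass scheme.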