Using XML to query XML: from theory to practice

2004 
A cornerstone concept in classical Information Retrieval is the vector space model, in which both documents and queries are viewed as vectors in a multidimensional space. The relevance of a document to a query is determined by evaluating the similarity between their vectors, using a measure such as cosine similarity. The vector space model has been highly successful for plain text collections, in both theoretical and practical terms. In prior work, we extended this classic approach to the search of XML collections by requiring queries to be presented as XML Fragments, which allows for a very simple extension of the cosine similarity measure to the XML framework. In this paper, we formalize this approach by presenting the full syntax and semantics of XML Fragments as implemented in a practical system. Furthermore, we show how small additions to the pure model improve the expressiveness of queries and enable us to deal with a wide range of users' needs. These additions introduce certain novel constructs that are not syntactically correct XML but implement essential operators. We evaluate the expressiveness of our model both from a formal viewpoint, by comparing it to the XPath language, and from a practical viewpoint, by running experiments on the INEX (Initiative for the Evaluation of XML Retrieval) collection. Our conclusion is that coupling the classic vector space approach with a carefully chosen small set of relational operators allows us to express XML informational searches as enhanced XML Fragments in a natural and powerful way.
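As a minimal illustration of the plain-text vector space model the abstract starts from, the following Python sketch scores a document against a query with cosine similarity over raw term counts. It is not the paper's system: the function name `cosine_similarity`, whitespace tokenization, and unweighted term frequencies are all assumptions made for this example, and the XML Fragments extension itself is not reproduced here.

```python
import math
from collections import Counter


def cosine_similarity(query: str, document: str) -> float:
    """Cosine of the angle between two term-count vectors.

    Hypothetical helper for illustration only: tokens are lowercased
    and split on whitespace, and each term is weighted by its raw count.
    """
    q = Counter(query.lower().split())
    d = Counter(document.lower().split())

    # Dot product over the terms the two vectors share.
    dot = sum(q[term] * d[term] for term in q.keys() & d.keys())

    # Euclidean norms of each vector.
    norm_q = math.sqrt(sum(count * count for count in q.values()))
    norm_d = math.sqrt(sum(count * count for count in d.values()))

    # An empty query or document has no direction; treat it as dissimilar.
    if norm_q == 0 or norm_d == 0:
        return 0.0
    return dot / (norm_q * norm_d)
```

Under these assumptions, identical texts score 1.0 and texts with no shared terms score 0.0; a practical system would typically replace raw counts with a weighting scheme such as tf-idf.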