System Performance and Natural Language Expression of Information Needs

2005 
Consider information retrieval systems that respond to a query (a natural language statement of a topic, an information need) with an ordered list of 1000 documents from the document collection. From the responses to queries that all express the same topic, one can discern how the words associated with a topic result in particular system behavior. From what is discerned from different topics, one can hypothesize abstract topic factors that influence system performance. An example of such a factor is the specificity of the topic's primary key word. This paper shows that statements about the effect of abstract topic factors on system performance can be supported empirically. A combination of statistical methods is applied to system responses from NIST's Text REtrieval Conference. We analyze each topic using a measure of irrelevant-document exclusion computed for each response and a measure of dissimilarity between relevant-document return orders computed for each pair of responses. We formulate topic factors through graphical comparison of measurements for different topics. Finally, we propose for each topic a four-dimensional summarization that we use to select topic comparisons likely to depict topic factors clearly.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    20
    References
    5
    Citations
    NaN
    KQI
    []