Information retrieval and estimation with auxiliary information

1990 
In the first part of this work we construct a new method for selecting, from a document collection, the documents relevant to a user's query. We assume that the documents and the query are indexed by terms, that documents and terms are represented by vectors in a document-term space, and that the vectors of closely related documents and terms lie close to each other in that space. We model the relevance of terms and documents as a probability density over the document-term space: the density is high over the regions of the space containing the vectors of documents relevant to the query and low over the regions containing the vectors of non-relevant documents. We use Bayes's rule to compute the density. The relevance density can incorporate a priori knowledge about the user's interests as well as the user's query and feedback. We state the desired behaviors of the method, propose two candidate densities, and show that they have the desired properties. Tests of the proposed method and of an existing method, vector averaging, on two document collections show that the proposed method requires much less computation. It performs significantly better than vector averaging in some cases and as well as vector averaging in the others.
In the second part, we discuss estimating the mean of a Gaussian density when the observations in the sample are evaluated by an expert who tells us whether each observation is "typical" or "unusual", i.e., whether the probability density is high or low at each observed value. We first assume that the expert's judgements are always correct, and then extend the model to allow a positive probability that the expert makes mistakes. We present an optimal (lowest-variance) unbiased weighted-average estimator and an unbiased estimator resembling a trimmed mean. When the expert is reliable, the variances of both estimators are lower than that of the sample mean X̄. The variance of the optimal weighted average is always less than or equal to that of X̄.
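The abstract does not spell out the weighted-average estimator, so the following is only a minimal sketch under assumptions that are not stated in the source: the expert is reliable, an observation is called "typical" exactly when it lies within c standard deviations of the mean, and σ and c are known. Given its label, each observation is then a truncated normal that is symmetric about μ, so it is unbiased for μ, with conditional variance σ²(1 − 2cφ(c)/(2Φ(c) − 1)) if typical and σ²(1 + cφ(c)/(1 − Φ(c))) if unusual. Weighting each observation by the inverse of its conditional variance gives the lowest-variance unbiased weighted average under this hypothetical model. The function names are illustrative only.

```python
import math


def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def label_variances(sigma, c):
    """Conditional variance of X given its label, ASSUMING (hypothetically) that
    'typical' means |X - mu| <= c*sigma and 'unusual' means |X - mu| > c*sigma."""
    phi, Phi = std_normal_pdf(c), std_normal_cdf(c)
    var_typical = sigma**2 * (1.0 - 2.0 * c * phi / (2.0 * Phi - 1.0))
    var_unusual = sigma**2 * (1.0 + c * phi / (1.0 - Phi))
    return var_typical, var_unusual


def label_weights(labels, sigma, c):
    """Inverse-variance weight per observation; labels[i] is True for 'typical'."""
    v_t, v_u = label_variances(sigma, c)
    return [1.0 / v_t if typical else 1.0 / v_u for typical in labels]


def weighted_mean(xs, labels, sigma, c):
    """Inverse-variance weighted average of the sample."""
    w = label_weights(labels, sigma, c)
    return sum(wi * xi for wi, xi in zip(w, xs)) / sum(w)


def estimator_variances(labels, sigma, c):
    """Variance, given the labels, of the weighted average and of the sample mean."""
    v_t, v_u = label_variances(sigma, c)
    n = len(labels)
    n_t = sum(1 for typical in labels if typical)
    var_weighted = 1.0 / sum(label_weights(labels, sigma, c))
    var_sample_mean = (n_t * v_t + (n - n_t) * v_u) / n**2
    return var_weighted, var_sample_mean


# Illustrative data; here the labels are generated from the true mu = 0
# (sigma = 1, c = 1) to stand in for a reliable expert.
xs = [0.3, -0.8, 2.4, 0.1, -1.9]
labels = [abs(x) <= 1.0 for x in xs]
print(weighted_mean(xs, labels, sigma=1.0, c=1.0))
print(estimator_variances(labels, sigma=1.0, c=1.0))
```

Because typical observations have the smaller conditional variance, they receive the larger weight; when every observation carries the same label the weights are equal and the estimator reduces to the sample mean, which is consistent with the claim that its variance never exceeds that of X̄.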
We develop procedures for computing the maximum likelihood estimate (MLE) and prove consistency results. We also discuss the convergence of posterior densities and the existence of reproducing priors for Bayesian estimation.
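The abstract does not describe the MLE procedure itself. Purely as an illustration, suppose (an assumption, not the source's stated model) that σ is known, that the true label is "typical" exactly when |x − μ| ≤ cσ, and that the expert reports the wrong label independently with known probability p. The log-likelihood in μ is then a concave Gaussian term plus a step function that can change only at the points x_i ± cσ, so evaluating the sample mean clipped into each interval between those points, together with the points themselves, locates the maximum to within a small tolerance. The names `log_likelihood` and `mle_mean` are hypothetical.

```python
import math


def log_likelihood(mu, xs, labels, sigma, c, p):
    """Log-likelihood of mu under a HYPOTHETICAL model: X_i ~ N(mu, sigma^2),
    the true label is 'typical' iff |x_i - mu| <= c*sigma, and the expert
    reports the wrong label independently with probability p (0 <= p < 1)."""
    log_hit = math.log(1.0 - p)
    log_miss = math.log(p) if p > 0.0 else -math.inf  # p = 0: reliable expert
    ll = 0.0
    for x, said_typical in zip(xs, labels):
        z = (x - mu) / sigma
        ll += -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))
        truly_typical = abs(x - mu) <= c * sigma
        ll += log_hit if said_typical == truly_typical else log_miss
    return ll


def mle_mean(xs, labels, sigma, c, p, eps=1e-9):
    """Maximise log_likelihood over mu.

    The expert term changes only at the cut points x_i +/- c*sigma and is
    constant between them, so on each open piece the best mu is the sample
    mean clipped into that piece (nudged eps inside). The cut points are
    checked separately because a label can flip exactly there.
    """
    xbar = sum(xs) / len(xs)
    cuts = sorted({x + s * c * sigma for x in xs for s in (-1.0, 1.0)})
    bounds = [-math.inf] + cuts + [math.inf]
    candidates = list(cuts)
    for lo, hi in zip(bounds, bounds[1:]):
        candidates.append(min(max(xbar, lo + eps), hi - eps))
    return max(candidates, key=lambda m: log_likelihood(m, xs, labels, sigma, c, p))


# Illustrative data: the 'typical' labels on 0.0 and 1.0 and the 'unusual'
# label on 5.0 pull the estimate away from the sample mean 2.0.
print(mle_mean([0.0, 1.0, 5.0], [True, True, False], sigma=1.0, c=1.0, p=0.0))
print(mle_mean([0.0, 1.0, 5.0], [True, True, False], sigma=1.0, c=1.0, p=0.05))
```

Enumerating candidates rather than calling a general-purpose optimizer is a deliberate choice here: the likelihood is discontinuous in μ, so gradient-based methods can stall on a flat piece, while the piecewise structure makes an exhaustive check cheap (about 2n + 1 pieces for n observations).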