Investigating the Optimise k-Dimensions and Threshold Values of Latent Semantic Indexing Retrieval Performance for Small Malay Language Corpus

2016 
Presenting users with relevant results is the core aim of information retrieval (IR). Because simple exact term-matching returns results of poor relevance, latent semantic indexing (LSI) based IR was introduced to overcome this drawback and improve retrieval effectiveness. In other words, LSI-based IR aims to satisfy the user rather than merely match a given query. However, developing an LSI-based IR application requires choosing parameters that yield relevant results and optimise precision and recall in the retrieval process. This paper therefore investigates two important parameters that characterise retrieval performance: the optimal k-dimension used to represent the terms and documents in the corpus, and the optimal threshold value above which a document is accepted, judged and returned as relevant for a given term query. A small Malay corpus comprising 1395 Malay-language documents and terms was used as the test collection. The analyses suggest that effective retrieval performance, which both satisfies and balances precision and recall, is obtained with k = 4 dimensions and a threshold value of e = 0.8. The study helps software developers, particularly IR application developers, in choosing the optimal k-dimension and threshold values when designing a search engine.
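To make the roles of the two parameters concrete, the following is a minimal sketch of LSI retrieval on a toy term-by-document matrix. It is not the authors' implementation. The function name `lsi_retrieve`, the use of cosine similarity as the score compared against the threshold, the projection of both query and documents through the top-k left singular vectors, and the toy data are all assumptions made for illustration; the abstract does not specify these details.

```python
import numpy as np

def lsi_retrieve(term_doc, query, k, threshold):
    """Return (indices of documents scoring >= threshold, all scores).

    term_doc  : (n_terms, n_docs) float matrix of term weights
    query     : (n_terms,) float vector of query term weights
    k         : number of latent dimensions to keep
    threshold : minimum cosine similarity (assumed scoring rule)
    """
    # Truncated SVD: keep only the k largest singular values.
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k = U[:, :k]
    # Documents in the k-dimensional latent space (one row per document);
    # this equals (U_k^T @ term_doc)^T.
    docs_k = (s[:k, None] * Vt[:k, :]).T
    # Project the query into the same latent space.
    q_k = U_k.T @ query
    # Cosine similarity, leaving the score at 0 where a norm is zero.
    denom = np.linalg.norm(docs_k, axis=1) * np.linalg.norm(q_k)
    scores = np.divide(docs_k @ q_k, denom,
                       out=np.zeros(len(docs_k)), where=denom > 0)
    hits = [int(j) for j in np.flatnonzero(scores >= threshold)]
    return hits, scores

# Hypothetical toy corpus: 4 terms (rows) x 4 documents (columns).
# Documents 0 and 1 share term 1; documents 2 and 3 share term 3.
A = np.array([[1, 0, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 2, 0],
              [0, 0, 2, 2]], dtype=float)

# Query consisting of term 0 only, which occurs only in document 0.
q = np.array([1, 0, 0, 0], dtype=float)

hits, scores = lsi_retrieve(A, q, k=2, threshold=0.8)
print(hits, np.round(scores, 3))
```

In this toy matrix, exact term matching would return only document 0, whereas the reduced space also relates the query to document 1 through the shared term. Raising k moves the behaviour back toward exact matching and raising the threshold trades recall for precision, which is the balance the study tunes for its Malay corpus.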