Developing an Expert IR System from Multidimensional Dataset

2015 
With the growing availability of computing facilities, large amounts of data are being generated in electronic form. This data must be analyzed to support intelligent decision making. Text categorization is an important and extensively studied problem in machine learning. Its basic phases include preprocessing steps such as removing stop words from documents and applying TF-IDF, which improve efficiency and eliminate irrelevant data from a large dataset. Applying the TF-IDF algorithm to the dataset yields a weight for each word, and these weights are summarized in a weight matrix. Preprocessing reduces the size of the dataset, which ultimately improves the performance of the search engine. An index is then generated from the dataset; it records each term together with its occurrences in a file and its locations within that file. This paper discusses the implications of an efficient Information Retrieval system for text-based data using clustering approaches.

General Terms: Data Mining, Information Retrieval.
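The preprocessing, weighting, and indexing phases described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the stop-word list, the tokenizer, the TF-IDF variant (relative term frequency times the natural log of N/df), and the function names `preprocess`, `tfidf_matrix`, and `build_index` are all assumptions made for illustration, and the clustering step is not covered.

```python
import math
from collections import Counter, defaultdict

# Illustrative stop-word list (an assumption; the abstract does not give one).
STOP_WORDS = {"the", "is", "a", "an", "of", "and", "in", "to"}


def preprocess(text):
    """Lowercase, split on whitespace, keep alphabetic non-stop-word tokens."""
    return [t for t in text.lower().split()
            if t.isalpha() and t not in STOP_WORDS]


def tfidf_matrix(docs):
    """Return {file: {term: weight}} using tf = count/len, idf = ln(N/df)."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))  # count each term once per file
    weights = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty files
        weights[name] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return weights


def build_index(docs):
    """Return {term: {file: [positions]}}, a positional inverted index.

    Positions refer to the token stream after stop-word removal.
    """
    index = defaultdict(dict)
    for name, tokens in docs.items():
        for position, term in enumerate(tokens):
            index[term].setdefault(name, []).append(position)
    return dict(index)


# Hypothetical two-file dataset, used only to exercise the functions.
raw = {
    "a.txt": "the cat sat in the hat",
    "b.txt": "a cat and a dog",
}
docs = {name: preprocess(text) for name, text in raw.items()}
weights = tfidf_matrix(docs)
index = build_index(docs)
```

In this toy dataset, `cat` appears in both files and therefore receives a weight of zero, while terms unique to one file receive positive weights. The number of occurrences of a term in a file is the length of its position list in `index`.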