A Document Clustering Approach Using Shared Nearest Neighbour Affinity, TF-IDF and Angular Similarity

2021 
The volume of data is growing exponentially. Clustering is a major task in many text mining applications, including automatic organization of text documents, topic extraction, information retrieval and information filtering. The task reveals similar patterns in a collection of documents. Understanding, representing and categorizing documents each require different techniques, and text clustering draws on both natural language processing and machine learning. This paper proposes an unsupervised spatial pattern identification approach for text data: a new algorithm, based on shared nearest neighbours, for finding coherent patterns in a large collection of text. Implementation and validation confirm that the proposed algorithm can cluster text data to identify coherent patterns. The results are visualized as a graph and show that the methodology works well on different text datasets.
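The three ingredients named in the title (TF-IDF, angular similarity and shared nearest neighbour affinity) can be combined as in the minimal Python sketch below. It is an illustration only, not the authors' algorithm, whose details the abstract does not give. The whitespace tokenizer, the smoothed IDF, the angular similarity form 1 - 2θ/π, the Jarvis-Patrick-style linking rule, and the parameters `k` and `min_shared` are all assumptions made for this example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF dict per document (smoothed IDF is an assumption)."""
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def angular_similarity(u, v):
    """Map the angle between two non-negative sparse vectors to [0, 1]."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    cosine = min(1.0, max(0.0, dot / (norm_u * norm_v)))
    # TF-IDF weights are non-negative, so the angle lies in [0, pi/2].
    return 1.0 - 2.0 * math.acos(cosine) / math.pi


def snn_clusters(docs, k=2, min_shared=1):
    """Label documents by linking mutual k-nearest neighbours that share
    at least `min_shared` neighbours (hypothetical linking rule)."""
    vectors = tfidf_vectors(docs)
    n = len(docs)
    neighbours = []
    for i in range(n):
        ranked = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: angular_similarity(vectors[i], vectors[j]),
            reverse=True,
        )
        neighbours.append(set(ranked[:k]))

    parent = list(range(n))  # union-find over document indices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            mutual = i in neighbours[j] and j in neighbours[i]
            shared = len(neighbours[i] & neighbours[j])  # SNN affinity
            if mutual and shared >= min_shared:
                parent[find(i)] = find(j)
    return [find(i) for i in range(n)]


if __name__ == "__main__":
    sample = [
        "cat dog pet", "dog pet food", "cat pet food",
        "stock market trade", "market trade price", "stock trade price",
    ]
    print(snn_clusters(sample, k=2, min_shared=1))
```

In this sketch, documents that land in the same connected component receive the same label. Raising `min_shared` makes links stricter and tends to split loosely connected groups, while a larger `k` widens each neighbourhood.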