A Document Clustering Approach Using Shared Nearest Neighbour Affinity, TF-IDF and Angular Similarity

Mausumi Goswami

2021

The volume of data is growing exponentially. Clustering is a major task in many text mining applications, including the automatic organization of text documents, topic extraction, information retrieval and information filtering. Clustering reveals similar patterns within a collection of documents. Understanding, representing and categorizing documents each require different techniques, so the text clustering process draws on both natural language processing and machine learning. This work proposes an unsupervised spatial pattern identification approach for text data: a new algorithm, based on shared nearest neighbour affinity, for finding coherent patterns in a large collection of text documents. The implementation and its validation confirm that the proposed algorithm can cluster text data to identify coherent patterns. The results are visualized as a graph and show that the methodology works well on different text datasets.
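
The abstract does not spell out the algorithm, so what follows is only a minimal sketch of the general technique named in the title, not the author's method. It assumes whitespace tokenization, a smoothed IDF weight, angular similarity defined as 1 - θ/π (θ being the angle between two TF-IDF vectors), and a Jarvis-Patrick-style rule for turning shared nearest neighbour affinity into clusters. The function names, the toy documents and the parameters `k` and `min_shared` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors (dict: term -> weight); whitespace tokenization assumed."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        total = len(toks)
        # Assumed smoothed IDF: log((1 + n) / (1 + df)) + 1.
        vectors.append({t: (c / total) * (math.log((1 + n) / (1 + df[t])) + 1)
                        for t, c in tf.items()})
    return vectors


def angular_similarity(u, v):
    """1 - theta/pi, where theta is the angle between sparse vectors u and v."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0 or nv == 0:
        return 0.0
    cos = max(-1.0, min(1.0, dot / (nu * nv)))  # clamp against rounding error
    return 1.0 - math.acos(cos) / math.pi


def knn_lists(sim, k):
    """Indices of the k most similar other documents for each document."""
    n = len(sim)
    return [set(sorted((j for j in range(n) if j != i), key=lambda j: -sim[i][j])[:k])
            for i in range(n)]


def snn_clusters(docs, k=2, min_shared=1):
    """Jarvis-Patrick-style clustering on shared nearest neighbour affinity."""
    vecs = tfidf_vectors(docs)
    n = len(vecs)
    sim = [[angular_similarity(vecs[i], vecs[j]) for j in range(n)] for i in range(n)]
    knn = knn_lists(sim, k)

    parent = list(range(n))  # union-find over documents

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            # Link i and j if each is in the other's k-NN list and their
            # k-NN lists share at least min_shared documents (SNN affinity).
            if j in knn[i] and i in knn[j] and len(knn[i] & knn[j]) >= min_shared:
                parent[find(i)] = find(j)

    # Relabel union-find roots to 0..c-1 in order of first appearance.
    remap = {}
    return [remap.setdefault(find(i), len(remap)) for i in range(n)]


if __name__ == "__main__":
    docs = ["cat dog pet", "dog cat animal", "pet cat dog",
            "stock market price", "market price trade", "price stock trade"]
    print(snn_clusters(docs, k=2, min_shared=1))  # → [0, 0, 0, 1, 1, 1]
```

With `k=2` and `min_shared=1`, the six toy documents split into a pets group and a finance group. Linking documents only when they are mutual neighbours that also share neighbours, rather than thresholding raw similarity, is the usual motivation for shared nearest neighbour affinity; the resulting links form the graph whose connected components are the clusters.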
