Comparative Study on Punjabi Document Clustering

2020 
Clustering is a class of machine learning techniques whose objective is to automatically separate data into groups called clusters. Clustering of Punjabi documents has numerous applications in natural language processing, yet little work has been done so far for native languages such as Punjabi. This study presents the results of common document clustering techniques, such as agglomerative clustering and K-means, applied with different feature extraction methods, and compares their performance using intrinsic and extrinsic measures. The recently released pre-trained Punjabi word vector model from Facebook is also evaluated as one of the feature extraction methods. The aim is to determine which combination of clustering algorithm and feature extraction technique gives the best results. The study also uses a supervised approach to evaluate the output of an unsupervised learning method such as clustering.
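As a rough illustration of the kind of pipeline the abstract describes, the sketch below clusters a handful of tokenized documents with TF-IDF features and a cosine-based (spherical) K-means, then scores the result against gold labels. It is not the study's implementation: the toy English tokens stand in for tokenized Punjabi text, the function names `tfidf`, `kmeans`, and `purity` are hypothetical, the farthest-first initialization is chosen only to keep the run deterministic, and purity is assumed here as one example of an extrinsic measure (the abstract does not name the measures used).

```python
import math
from collections import Counter


def normalize(vec):
    """Scale a sparse vector (dict of term -> weight) to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


def tfidf(docs):
    """Turn tokenized documents into unit-length sparse TF-IDF vectors."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF so that no term receives a zero or negative weight.
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        vectors.append(normalize(vec))
    return vectors


def dot(a, b):
    """Dot product of two sparse vectors; equals cosine for unit vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def kmeans(vectors, k, iterations=10):
    """Spherical K-means with a deterministic farthest-first initialization."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Pick the document least similar to every centroid chosen so far.
        centroids.append(min(vectors, key=lambda v: max(dot(v, c) for c in centroids)))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each document goes to its most similar centroid.
        labels = [max(range(k), key=lambda j: dot(v, centroids[j])) for v in vectors]
        # Update step: each centroid becomes the normalized sum of its members.
        for j in range(k):
            total = {}
            for v, label in zip(vectors, labels):
                if label == j:
                    for t, w in v.items():
                        total[t] = total.get(t, 0.0) + w
            if total:
                centroids[j] = normalize(total)
    return labels


def purity(labels, gold):
    """Extrinsic measure: share of documents in their cluster's majority class."""
    clusters = {}
    for label, g in zip(labels, gold):
        clusters.setdefault(label, Counter())[g] += 1
    return sum(max(c.values()) for c in clusters.values()) / len(labels)


# Hypothetical toy corpus; real input would be tokenized Punjabi documents.
docs = [
    ["cricket", "match", "team", "score"],
    ["team", "match", "player", "score"],
    ["cricket", "player", "team"],
    ["election", "vote", "party", "minister"],
    ["party", "vote", "minister", "policy"],
    ["election", "policy", "party"],
]
gold = ["sports"] * 3 + ["politics"] * 3

labels = kmeans(tfidf(docs), k=2)
print("cluster labels:", labels)
print("purity:", purity(labels, gold))
```

Comparing the predicted cluster labels with known categories in this way is one reading of the "supervised approach" to evaluation mentioned above. Swapping `tfidf` for averaged pre-trained word vectors, or `kmeans` for an agglomerative method, would give the other combinations the study compares.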