Feature overlap-based dynamic self organizing model for hierarchical text clustering

2013 
In text document clustering, documents are represented as feature vectors whose features are either words or phrases. Documents can belong to different topics when categorized by humans; however, a one-to-one mapping between features and topics is almost impossible to obtain, since the same features can and will appear in documents on different topics. Such common features result in overlap between text clusters, so traditional cluster purity measures may not be practical or meaningful. This paper introduces a new methodology and algorithm that take the feature overlap between clusters into account when clustering text documents. Hierarchical clustering facilitated by the Growing Self-Organizing Map (GSOM) is used together with the calculated feature overlap to check whether clusters with minimum feature overlap can be obtained. We also present experimental results obtained by following the proposed methodology with the new algorithm.
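The abstract does not define how feature overlap is calculated, so the following is only an illustrative sketch of one plausible reading: each cluster is summarized by the set of features that occur in a sufficient share of its documents, and the overlap between two clusters is the Jaccard ratio of those sets. The function names, the `min_fraction` parameter, and the choice of Jaccard are all assumptions made for illustration, not the paper's method.

```python
from itertools import combinations


def cluster_features(docs, min_fraction=0.5):
    """Return the features found in at least min_fraction of a cluster's documents.

    docs is a list of documents, each a list of features (words or phrases).
    The min_fraction cutoff is an assumption, not taken from the paper.
    """
    counts = {}
    for doc in docs:
        for feat in set(doc):  # count each feature once per document
            counts[feat] = counts.get(feat, 0) + 1
    threshold = min_fraction * len(docs)
    return {feat for feat, n in counts.items() if n >= threshold}


def feature_overlap(a, b):
    """Jaccard overlap between two feature sets (an assumed overlap measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(clusters):
    """Map each pair of cluster names to the overlap of their feature sets."""
    feats = {name: cluster_features(docs) for name, docs in clusters.items()}
    return {
        (m, n): feature_overlap(feats[m], feats[n])
        for m, n in combinations(sorted(feats), 2)
    }


# Hypothetical toy clusters; "score" is the feature shared across topics.
clusters = {
    "sports": [["match", "team", "score"], ["team", "coach", "score"]],
    "finance": [["market", "score", "stock"], ["stock", "market", "bank"]],
}
overlaps = pairwise_overlap(clusters)
```

Under this reading, a clustering with lower pairwise values would be preferred, which is one way to interpret "clusters with minimum feature overlap".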