Text categorization based on improved Rocchio algorithm
2012
Text categorization is used to assign each text document to predefined categories. This paper presents a new text classification method for classifying Chinese text based on Rocchio algorithm. We firstly use the TFIDF to extract document vectors from the training documents which have been correctly categorized, and then use those document vectors to generate codebooks as classification models using the LBG and Rocchio algorithm. The codebook is then used to categorize the target documents using vector scores. We tested this method in the experiment and the result shows that this method can achieve better performance.
Keywords:
- Correction
- Source
- Cite
- Save
- Machine Reading By IdeaReader
4
References
9
Citations
NaN
KQI