An Approach for Algorithm of Tobacco Enterprise Archives Text Automatic Classification Based on KNN

2014 
By studying the historical archive text data of a cigarette factory in Yunnan province and taking its actual circumstances into account, we design in detail an algorithm for acquiring subject headings from archive text and an algorithm for automatic classification. The TF-IDF algorithm is introduced into subject-heading acquisition, which solves the problem that subject headings cannot be obtained automatically when a text file lacks a title, document number, and statement items. The k-nearest neighbor (KNN) algorithm is introduced into automatic classification, which handles archive texts that cannot be classified automatically from the title and approval document alone. We also consider classifying archive text according to storage life. The experimental results show that the algorithm clearly improves the efficiency of classifying the tobacco enterprise's archive text.
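
The abstract gives no implementation details, so the following is only a minimal sketch of the two ideas it names: TF-IDF weighting to pick candidate subject headings, and KNN voting to assign a category. Everything specific here is an assumption made for illustration: the function names, the toy English documents, the whitespace tokenization (real Chinese archive text would need a word segmenter), the smoothed IDF formula, cosine similarity as the distance measure, and k = 3.

```python
import math
from collections import Counter


def build_idf(docs):
    """Smoothed inverse document frequency from tokenized training docs (assumed formula)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}


def tfidf(doc, idf):
    """Map a tokenized document to a sparse {term: weight} vector."""
    counts = Counter(t for t in doc if t in idf)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {t: (c / total) * idf[t] for t, c in counts.items()}


def top_terms(vec, k=3):
    """Highest-weighted terms, used here as candidate subject headings."""
    ranked = sorted(vec.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:k]]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_classify(query_vec, train_vecs, labels, k=3):
    """Majority vote among the k training vectors most similar to the query."""
    ranked = sorted(
        zip(train_vecs, labels),
        key=lambda pair: cosine(query_vec, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy corpus: (category, tokenized text).
train = [
    ("production", "cigarette rolling machine maintenance schedule".split()),
    ("production", "tobacco leaf blending machine output report".split()),
    ("production", "rolling line output quality inspection".split()),
    ("personnel", "staff training attendance record".split()),
    ("personnel", "employee salary adjustment notice".split()),
    ("personnel", "staff recruitment and employee training plan".split()),
]
labels = [label for label, _ in train]
docs = [doc for _, doc in train]

idf = build_idf(docs)
train_vecs = [tfidf(doc, idf) for doc in docs]

# A document with no title or document number: only its body text is used.
query_vec = tfidf("machine output inspection report".split(), idf)
keywords = top_terms(query_vec, k=2)
predicted = knn_classify(query_vec, train_vecs, labels, k=3)
print(keywords, predicted)
```

The same `knn_classify` call could be reused with storage-life labels in place of category labels, since only the `labels` list changes. Whether the paper handles storage life this way is not stated in the abstract.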