Automatic classification of academic documents using text mining techniques

Haydemar Núñez, Esmeralda Ramos

2012

This work presents an automatic classifier of undergraduate final projects based on text mining. The dataset, comprising documents from four professional categories, was represented by means of the vector space model with different indexing metrics. In addition, several dimensionality reduction techniques were applied to the word space. The classification model was built with the K-nearest neighbor algorithm. Using 10-fold cross-validation, we obtained a predictive accuracy of 82%. However, accuracy rose to 95% when the classifier recommended up to two categories, which takes into account the interdisciplinary nature of the documents. The classifier was integrated into an application for the automatic assignment of reviewers, which selects them from among the teachers who belong to the recommended areas.
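
The pipeline described in the abstract (vector space representation, nearest-neighbor classification, recommendation of up to two categories) can be illustrated with a minimal sketch. It is not the authors' implementation: TF-IDF weighting, cosine similarity, similarity-weighted voting, the category names, and the toy documents are all assumptions made for illustration, and neither the dimensionality reduction step nor the 10-fold cross-validation is shown.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive whitespace tokenizer; real documents would need proper preprocessing.
    return text.lower().split()

def fit_idf(docs):
    """Compute a smoothed inverse document frequency for each training term."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}

def tfidf_vector(text, idf):
    """Map a document to a sparse, L2-normalised TF-IDF vector (term -> weight)."""
    tf = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}

def cosine(a, b):
    # Both vectors are unit length, so the dot product is the cosine similarity.
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def recommend(text, train_vecs, train_labels, idf, k=4, top=2):
    """Return up to `top` categories, ranked by summed similarity of the k nearest neighbors."""
    query = tfidf_vector(text, idf)
    scored = [(cosine(query, v), label) for v, label in zip(train_vecs, train_labels)]
    nearest = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    votes = Counter()
    for sim, label in nearest:
        if sim > 0:  # ignore neighbors that share no terms with the query
            votes[label] += sim
    return [label for label, _ in votes.most_common(top)]

# Hypothetical toy corpus; the paper's four categories are not named in the abstract.
TRAIN = [
    ("relational database query optimization index sql", "databases"),
    ("sql transaction schema database storage", "databases"),
    ("neural network training classification learning", "ai"),
    ("learning agents planning search classification", "ai"),
    ("routing protocol packet network latency", "networks"),
    ("wireless packet congestion protocol bandwidth", "networks"),
]
TEXTS = [text for text, _ in TRAIN]
LABELS = [label for _, label in TRAIN]
IDF = fit_idf(TEXTS)
VECS = [tfidf_vector(text, IDF) for text in TEXTS]

if __name__ == "__main__":
    # An interdisciplinary project touching two areas receives two recommendations.
    print(recommend("sql database index for classification learning", VECS, LABELS, IDF))
```

Returning a ranked list rather than a single label mirrors the idea of recommending up to two categories: a project that draws on two areas can be routed to reviewers from either one.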

Keywords:

Data reduction
Curse of dimensionality
Vector space model
Classifier (linguistics)
Data mining
Computer science
Pattern recognition
Text mining
Artificial intelligence
Machine learning
K-nearest neighbor algorithm
Further education
