Application of Natural Language Processing and Text Mining Techniques for Classifying Questions in a Digital Questionnaire

As the number of digital documents generated daily in companies, organizations, and institutions grows, so does the need to analyze them and extract relevant information, which in turn supports better management and organization of the data. This work therefore aims to establish a reference guide for the automatic classification of digital questionnaires for the first bimester of Discrete Mathematics in the Open Method of the Universidad Tecnica Particular de Loja. The project follows the CRISP-DM methodology (Cross Industry Standard Process for Data Mining) and applies Text Mining and Natural Language Processing techniques. The data are represented as a term-document matrix (TDM). Among the best-performing text classification algorithms in Weka are DMNBtext-I1 and NaiveBayesMultinomialUpdateable, whose results are similar: the reported accuracy and recall values are 0.847, 0.824, and 0.436, and both algorithms have an error of 0.177. These values were obtained with a Percentage Split setting of 66%, that is, 66% of the data for training and 34% for testing.
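To make the data representation and the evaluation setup more concrete, the following is a minimal sketch in plain Python, not the Weka pipeline used in the work. It builds a small term-document matrix from whitespace-tokenized text and applies a 66% split. The function names `build_tdm` and `percentage_split` and the sample questions are hypothetical, chosen only for illustration. Unlike Weka's Percentage Split, which by default randomizes the instance order first, this sketch keeps the original order.

```python
from collections import Counter


def build_tdm(documents):
    """Return (vocabulary, matrix); matrix[i][j] counts term i in document j."""
    # Naive tokenization: lowercase and split on whitespace (an assumption;
    # the source does not describe its preprocessing steps).
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    counts = [Counter(tokens) for tokens in tokenized]
    matrix = [[c[term] for c in counts] for term in vocabulary]
    return vocabulary, matrix


def percentage_split(items, train_percent=66):
    """Split items in order: the first train_percent% train, the rest test."""
    cut = round(len(items) * train_percent / 100)
    return items[:cut], items[cut:]


# Hypothetical sample questions, not taken from the actual questionnaires.
questions = [
    "is the set empty",
    "is the relation symmetric",
    "define a graph",
]

vocabulary, tdm = build_tdm(questions)
train, test = percentage_split(list(range(100)))
```

With 100 instances, the split yields 66 training items and 34 test items, matching the proportions described above. Each row of `tdm` corresponds to one term of `vocabulary`, and each column to one question.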