CLUSTER ANALYSIS OF TEXTS: A STUDY OF THE ABSTRACTS IN THE CAPES THESIS AND DISSERTATION DATABASE

2018 
Knowledge discovery in large volumes of information has a wide range of applications. Its main tasks (classification, clustering, and association) have been used in many areas of knowledge to identify useful knowledge in large volumes of data. This article analyzes the application of data mining techniques, in particular the K-Means clustering algorithm, in order to verify their effectiveness for analyzing data from the Brazilian Open Data Portal, a public data repository organized and made available to the population. The dataset used for clustering was extracted from the thesis and dissertation database made available by CAPES (Coordination for the Improvement of Higher Education Personnel). The data were processed and loaded into the Apache Solr® platform, where they were indexed, and the clusters were generated with the Carrot2 software using the K-Means algorithm with customized configurations. Clusters were generated for each year and then consolidated, using different configurations of the algorithm, which made it possible to compare the resulting terms. The study concludes that the results of the tools used depend directly on the choice of the initial number of clusters, but that they show clear potential for discovering non-obvious clusters.
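The abstract does not describe how Carrot2 represents documents, initializes K-Means, or labels clusters, so the following is only a minimal, dependency-free Python sketch of the general technique: TF-IDF document vectors clustered with Lloyd's K-Means, with each cluster labeled by its highest-weighted centroid terms. The function names (`tfidf_vectors`, `kmeans`, `top_terms`), the smoothed IDF formula, the "first k documents" initialization, and the sample abstracts are all hypothetical choices made for illustration; they are not the authors' configuration or Carrot2's API.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters (accented letters included, digits dropped).
    return re.findall(r"[^\W\d_]+", text.lower())


def tfidf_vectors(docs):
    # Build L2-normalized TF-IDF vectors over a shared, sorted vocabulary.
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log(n / df[t]) + 1.0 for t in vocab}  # assumed smoothing
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iter=100):
    # Assumption: deterministic init from the first k vectors (real tools differ).
    centroids = [list(v) for v in vectors[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def top_terms(centroid, vocab, n=3):
    # Label a cluster with its n highest-weighted terms (ties broken alphabetically).
    ranked = sorted(zip(vocab, centroid), key=lambda tw: (-tw[1], tw[0]))
    return [term for term, _ in ranked[:n]]


if __name__ == "__main__":
    # Hypothetical sample abstracts, not CAPES data.
    docs = [
        "data mining clustering algorithm",
        "soil crop irrigation farming",
        "clustering text mining data",
        "crop yield soil nutrients",
    ]
    vocab, vectors = tfidf_vectors(docs)
    labels, centroids = kmeans(vectors, k=2)
    print(labels)
    for c, centroid in enumerate(centroids):
        print(c, top_terms(centroid, vocab))
```

Rerunning `kmeans` with a different `k` on the same vectors changes both the partition and the extracted terms, which is the sensitivity to the initial number of clusters that the abstract reports; a year-by-year analysis would simply apply the same steps to each year's subset of abstracts before comparing the terms.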