Minimalist Fitted Bayesian Classifier Based on Likelihood Estimations and Bag-of-Words
2021
The expansion of institutional repositories poses new challenges for autonomous agents that control the quality of semantic annotations across large volumes of scholarly knowledge. Although the evaluation of metadata integrity in documents has been widely addressed in the literature, most existing frameworks become intractable in a big data environment. In this paper, we propose an optimal strategy based on feature engineering to identify spurious objects in large academic repositories. Through an application case involving a Brazilian institutional repository that contains objects such as PhD theses and MSc dissertations, we use maximum likelihood estimation and bag-of-words techniques to fit a minimalist Bayesian classifier that quickly detects inconsistencies in class assertions with approximately 94% accuracy.
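
To make the general idea concrete, the following is a minimal sketch of a bag-of-words naive Bayes classifier whose parameters are fitted from word counts. It is not the authors' implementation: the function names (tokenize, fit, predict, is_inconsistent), the whitespace tokenizer, the toy documents, and the smoothing constant alpha are all assumptions made for illustration. With alpha = 0 the word probabilities would be plain maximum likelihood estimates; a small positive alpha is assumed here only to avoid taking the logarithm of zero.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def fit(docs, labels, alpha=1.0):
    # Count documents per class and word occurrences per class (bag-of-words).
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)

    model = {"vocab": vocab, "log_prior": {}, "log_lik": {}}
    n_docs = len(labels)
    for c, n_c in class_counts.items():
        # Class prior: relative frequency of the class in the training set.
        model["log_prior"][c] = math.log(n_c / n_docs)
        # Word likelihoods: smoothed relative frequencies within the class.
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        model["log_lik"][c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocab
        }
    return model


def predict(model, text):
    # Score each class by log prior plus summed log likelihoods of known words.
    scores = {}
    for c, log_prior in model["log_prior"].items():
        score = log_prior
        for w in tokenize(text):
            if w in model["vocab"]:  # words outside the vocabulary are ignored
                score += model["log_lik"][c][w]
        scores[c] = score
    return max(scores, key=scores.get)


def is_inconsistent(model, text, asserted_label):
    # Flag an object whose asserted class disagrees with the predicted class.
    return predict(model, text) != asserted_label


# Toy data, invented for illustration only.
docs = [
    "thesis on doctoral research contribution",
    "doctoral thesis original contribution",
    "dissertation for master degree",
    "master dissertation coursework",
]
labels = ["PhD", "PhD", "MSc", "MSc"]
model = fit(docs, labels)
```

In this sketch, an object described as "master dissertation" but asserted as a PhD thesis would be flagged by is_inconsistent(model, "master dissertation", "PhD"). Working in log space keeps the scores numerically stable when documents contain many words.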