Neural Networks Based on Latent Dirichlet Allocation For News Web Page Classifications
2020
Popular news websites are part of modern life, offering information to millions of users every day. Although computer technology continues to advance, the volume of data keeps rising. How to structure documents so that their content can be recognized automatically has become one of the main challenges for sophisticated web services. Traditional classification of news text not only requires substantial human and financial resources but also can hardly deliver fast classification. In this work, we introduce a new method that combines Latent Dirichlet Allocation (LDA) and neural networks for Arabic document classification. Text classification applications commonly adopt the Vector Space Model to represent documents, in which a text is represented as a vector of terms or n-grams. Such methods cannot capture the semantic or contextual content of a text, which results in a very large feature space and a loss of semantics. In this research, the proposed method instead uses “topics” sampled by Latent Dirichlet Allocation as text features, which effectively eliminates the problems described. We extract the important themes (topics) from all the texts. Each theme is characterized by a distinct distribution over descriptors, and each text is then represented as a vector over these themes. Our experiments indicate that the proposed solution achieves an accuracy of 85.11% on the Arabic text classification task.
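The idea of representing each text as a vector over topics can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain collapsed Gibbs sampler for LDA, and the function name `lda_topic_vectors`, the hyperparameters `alpha` and `beta`, and the toy English tokens are all hypothetical choices made for illustration. The neural network stage is not shown; the resulting vectors would serve as its input features.

```python
import random

def lda_topic_vectors(docs, num_topics, iters=100, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch).

    docs is a list of token lists. Returns one topic-proportion
    vector of length num_topics per document.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]               # tokens of doc d in topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # word w assigned to topic k
    topic_total = [0] * num_topics                              # tokens assigned to topic k
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic for this token (unnormalized).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1

    # Smoothed topic proportions per document: the feature vectors.
    vectors = []
    for d, doc in enumerate(docs):
        denom = len(doc) + num_topics * alpha
        vectors.append([(doc_topic[d][t] + alpha) / denom for t in range(num_topics)])
    return vectors

# Toy corpus (hypothetical, already tokenized).
docs = [
    ["match", "goal", "team", "goal", "league"],
    ["team", "league", "match", "player"],
    ["market", "stocks", "bank", "market"],
    ["bank", "stocks", "economy", "market"],
]
vectors = lda_topic_vectors(docs, num_topics=2)
```

Each entry of `vectors` has `num_topics` components that sum to 1, so a document's feature dimension is the number of topics rather than the vocabulary size, which is the reduction in feature space the abstract describes.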