Bilingual Auto-Categorization Comparison of Two LSTM Text Classifiers
2019
Multi linguistic problems such as auto-categorization is not an easy task. It is possible to train different models for each language, another way to do auto-categorization is to build the model in one base language and use automatic translation from other languages to that base language. Different languages have a bias to a language specific grammar and syntax and will therefore pose problems to be expressed in other languages. Translating from one language into a non-verbal language could potentially have a positive impact of the categorization results. A non-verbal language could for example be pure information in form of a knowledge graph relation extraction from the text. In this article a comparison is conducted between Chinese and Swedish languages. Two categorization models are developed and validated on each dataset. The purpose is to make an auto-categorization model that works for n'importe quel langage. One model is built upon LSTM and optimized for Swedish and the other is an improved Bidirectional-LSTM Convolution model optimized for Chinese. The improved algorithm is trained on both languages and compared with the LSTM algorithm. The Bidirectional-LSTM algorithm performs approximately 20% units better than the LSTM algorithm, which is significant.
Keywords:
- Correction
- Source
- Cite
- Save
- Machine Reading By IdeaReader
0
References
0
Citations
NaN
KQI