Unicode-8 based linguistics data set of annotated Sindhi text

Mazhar Ali Dootio, Asim Imdad Wagan

2018

Abstract: The Sindhi Unicode-8 based linguistics data set is a multi-class, multi-featured data set. It was developed to address natural language processing (NLP) and linguistics problems in the Sindhi language. The data set presents information on the grammatical and morphological structure of Sindhi text, as well as the sentiment polarity of Sindhi lexicons. It may therefore be used for information retrieval, machine translation, lexicon analysis, language modeling, grammatical and morphological analysis, and semantic and sentiment analysis.

Keywords:

Machine translation
Sindhi
Natural language
Linguistics
Language model
Unicode
Lexicon
Sentiment analysis
Biology
Computational linguistics
