Dynamic Phrase Generation for Detection of Idioms of Gujarati Language using Diacritics and Suffix-based Rules

2021 
Gujarati is the language of everyday communication in the state of Gujarat, India, and is officially recognized by the constitution and the government of India. The Gujarati script is derived from the Devanagari script. An idiom is an expression, phrase, or word whose meaning differs from the literal meaning of the words in it. Idioms represent the cultural heritage of the Gujarati language and are used to communicate effectively and convey a message accurately. No machine translation system accurately translates Gujarati idioms into English or any other language. Different idiom phrases can be generated by adding one or more diacritics, as well as a suffix, to the root or base form of the idiom. These many forms of a single idiom make automatic idiom identification, and therefore machine translation, more challenging. This paper focuses on the design and implementation of diacritic- and suffix-based rules for dynamic phrase generation and detection of Gujarati idioms. The implementation helps identify a Gujarati idiom in any of its possible forms in Gujarati text. Tested on 7050 different Gujarati idiom phrases, it achieves an accuracy of 99.73%. The results are encouraging enough to make the proposed implementation useful for natural language processing tasks involving Gujarati idioms.
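The idea of generating idiom phrases from a base form can be illustrated with a minimal Python sketch. It is not the paper's implementation: the endings in VOWEL_SIGNS and SUFFIXES, the function names generate_phrases and find_idiom, and the example idiom are all assumptions chosen for illustration, and a real rule set would be much larger and would account for Gujarati morphology.

```python
from typing import Optional, Set

# Illustrative only: these endings are assumptions, not the paper's rule set.
VOWEL_SIGNS = ["ી", "ે", "ો"]                 # dependent vowel signs (diacritics)
SUFFIXES = ["વા", "વું", "તા", "તો", "શે"]    # suffix strings appended to the root


def generate_phrases(fixed_part: str, verb_root: str) -> Set[str]:
    """Build candidate phrases: fixed words + root + at most one ending."""
    endings = [""] + VOWEL_SIGNS + SUFFIXES
    return {f"{fixed_part} {verb_root}{ending}" for ending in endings}


def find_idiom(text: str, phrases: Set[str]) -> Optional[str]:
    """Return the longest generated phrase found in text, or None."""
    # Longest first, so an inflected form wins over the bare root it contains.
    for phrase in sorted(phrases, key=len, reverse=True):
        if phrase in text:
            return phrase
    return None


# Hypothetical example idiom: fixed words plus an inflecting verb root.
phrases = generate_phrases("આંખ આડા કાન", "કર")
print(len(phrases))  # 9 candidates: the bare root, 3 vowel signs, 5 suffixes
```

Matching the longest candidate first is a design choice of this sketch: every inflected form contains the bare root as a prefix, so checking the shortest form first would always report the root and lose the inflection.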