    Abstract:
    The world is changing rapidly, and communication between nations and among people who speak different languages has become part of daily life. Everything from buying products to social life depends on communication, which makes language one of the most important parts of human life. A language barrier still hinders communication between people. Language may soon become universal, however, with NLP technology allowing everyone to communicate in any language worldwide. Reaching that point requires understanding each language individually. This research proposes a new approach to Bangla text generation using a bi-directional RNN. The technique is used to predict the next likely word in a Bangla text.
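The idea of scoring the next word with a bi-directional RNN can be illustrated with a minimal sketch. It is not the authors' model: the five-word vocabulary, the hidden size, the random untrained weights, and every function name are assumptions made for illustration, so the predicted word carries no linguistic meaning. It only shows how a forward pass and a backward pass over the context are concatenated and turned into a probability distribution over the vocabulary.

```python
import math
import random


def random_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def rnn_final_state(inputs, w_in, w_rec, reverse=False):
    """Run a simple tanh RNN over the sequence and return its last hidden state."""
    size = len(w_rec)
    h = [0.0] * size
    steps = list(reversed(inputs)) if reverse else inputs
    for x in steps:
        h = [
            math.tanh(
                sum(w_in[i][j] * x[j] for j in range(len(x)))
                + sum(w_rec[i][k] * h[k] for k in range(size))
            )
            for i in range(size)
        ]
    return h


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def next_word_distribution(context, vocab, params):
    """Return P(next word | context) from an untrained bidirectional RNN."""
    index = {word: i for i, word in enumerate(vocab)}
    one_hot = [
        [1.0 if index[w] == j else 0.0 for j in range(len(vocab))]
        for w in context
    ]
    forward = rnn_final_state(one_hot, params["in_f"], params["rec_f"])
    backward = rnn_final_state(one_hot, params["in_b"], params["rec_b"], reverse=True)
    state = forward + backward  # concatenate the two directions
    scores = [sum(row[k] * state[k] for k in range(len(state))) for row in params["out"]]
    return softmax(scores)


# Hypothetical toy vocabulary and randomly initialised (untrained) parameters.
vocab = ["আমি", "ভাত", "খাই", "বই", "পড়ি"]
hidden = 4
rng = random.Random(0)
params = {
    "in_f": random_matrix(hidden, len(vocab), rng),
    "rec_f": random_matrix(hidden, hidden, rng),
    "in_b": random_matrix(hidden, len(vocab), rng),
    "rec_b": random_matrix(hidden, hidden, rng),
    "out": random_matrix(len(vocab), 2 * hidden, rng),
}
probs = next_word_distribution(["আমি", "ভাত"], vocab, params)
best = vocab[probs.index(max(probs))]
```

In a real system the weights would be learned from a Bangla corpus, and a framework layer would typically replace the hand-written recurrence.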
    Keywords:
    Human communication
    Named Entity Recognition (NER) belongs to the fields of Information Extraction (IE) and Natural Language Processing (NLP). NER aims to find named entities in textual data and categorize them into recognizable classes. Named entities play vital roles in related fields such as question answering, relationship extraction, and machine translation. Researchers have done a significant amount of work in this direction (e.g., dataset construction and analysis) for several languages, including English, Spanish, Chinese, Russian, and Arabic. We do not find a comparable amount of work for several South Asian languages such as Bengali/Bangla. Hence, as part of the initial phase, we have constructed a qualitative dataset in Bengali. In this paper, we identify the presence of Named Entities (NEs) in Bengali text (sentences), classify them into standardized categories, and test whether automatic detection of NEs is possible. We present a new corpus and experimental results. Our dataset, annotated by multiple humans, shows promising results (F-measures ranging from 0.72 to 0.84) in different setups: support vector machine (SVM) setups with simple language features and a Long Short-Term Memory (LSTM) setup with various word embeddings.
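For readers unfamiliar with the F-measure reported above, the following sketch shows how precision, recall, and F1 can be computed for entity predictions. The exact-match scoring rule, the span format, and the labels PER, LOC, and ORG are assumptions for illustration. The abstract does not state which matching scheme or category names the authors used.

```python
def precision_recall_f1(gold, predicted):
    """Entity-level scores: a prediction is correct only on an exact span and label match."""
    gold_set, pred_set = set(gold), set(predicted)
    true_pos = len(gold_set & pred_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations: (sentence id, start token, end token, label).
gold = [(0, 0, 1, "PER"), (0, 3, 3, "LOC"), (1, 2, 3, "ORG"), (1, 5, 5, "LOC")]
predicted = [(0, 0, 1, "PER"), (0, 3, 3, "ORG"), (1, 2, 3, "ORG"), (1, 6, 6, "LOC")]

p, r, f = precision_recall_f1(gold, predicted)
```

Here two of the four predictions match a gold entity exactly, so precision, recall, and F1 all equal 0.5. A wrong label or a shifted span counts as both a false positive and a false negative under this rule.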
    Named Entity Recognition
    Speech recognition has received less attention in the Bengali literature because of the lack of a comprehensive dataset. In this paper, we describe the development of the first comprehensive Bengali speech dataset on real numbers. It covers all the words that may arise in uttering any Bengali real number. The corpus has ten speakers, all native Bengali speakers from different regions. It comprises more than two thousand speech samples with a total duration of close to four hours. We also provide a deep analysis of our corpus, highlight some of its notable features, and finally evaluate the performance of two notable Bengali speech recognizers on it.
    In the proposed approach, Word Sense Disambiguation (WSD) in the Bengali language is performed using an unsupervised methodology. The work consists of two sequential sub-tasks. The first groups Bengali sentences into a certain number of clusters, where each cluster contains sentences of similar meaning. The second labels the clusters with their underlying senses with the help of a linguistic expert, so that these sense-tagged clusters can serve as a knowledge reference for the WSD task. Clustering was performed using the weka-3-6-13 tool. The test sentences were collected from the Bengali text corpus developed in the TDIL (Technology Development for Indian Languages) project of the Govt. of India. Type-based and Token-based distributional approaches were developed for Bengali sentence clustering. The Type-based method uses a feature vector of the words co-occurring with a target word in a sentence, and the Token-based method also considers the synsets of the collocating words. These synsets are retrieved from the Bengali WordNet developed at ISI, Kolkata. The baseline result, the achieved result, and the pitfalls of the procedure are discussed in detail in the report.
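The Type-based feature vector described above can be sketched as follows. This is only an illustration under stated assumptions: the English toy sentences stand in for Bengali corpus sentences, the window size and function names are hypothetical, and cosine similarity is used here simply to show why sentences with the same sense end up close together. The original work performed the clustering itself in Weka.

```python
import math
from collections import Counter


def cooccurrence_vector(tokens, target, window=3):
    """Count the words within `window` positions of each occurrence of the target."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            counts.update(
                t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i
            )
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical toy sentences containing the ambiguous target word "bank".
sentences = [
    "she deposited money in the bank today".split(),
    "he withdrew money from the bank account".split(),
    "they sat on the river bank fishing".split(),
]
vectors = [cooccurrence_vector(s, "bank") for s in sentences]

sim_money = cosine(vectors[0], vectors[1])  # both sentences use the financial sense
sim_river = cosine(vectors[0], vectors[2])  # financial sense vs. riverside sense
```

The two financial sentences share more context words than the financial and riverside pair, so a clustering algorithm run on such vectors would tend to group them together. A Token-based variant would additionally add the synsets of the context words to each vector.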
    Paraphrase
    In this paper we present different methodologies for extracting semantic role labels of Bengali nouns using 5W distilling. The 5W task seeks to extract the semantic information of nouns in a natural language sentence by distilling it into the answers to the 5W questions: Who, What, When, Where and Why. As Bengali is a resource-constrained language, this paper also describes the building of an annotated gold-standard corpus and the acquisition of linguistic tools for feature extraction. The reported precision values of the present system, by tag label, are: 79.56% (Who), 65.45% (What), 73.35% (When), 77.66% (Where) and 63.50% (Why).
    The Bengali language has been declared the state language of the Republic in Article 3 of the Constitution of Bangladesh. Bengali is our mother tongue, and we achieved this status at the cost of much blood. Moreover, the Bangla Bhasha Procholon Ain (Bengali Language Implementation Act) was enacted in 1987 to ensure the compulsory use of Bengali in the courts and offices of Bangladesh. In spite of these provisions, English is still used in the judicial system (Higher Courts) of Bangladesh. Delivering judgments in English often creates various problems for poor and illiterate people. People in our country speak Bengali, and the language of the courts should follow the language of the common people. This article attempts to assess the status and enforceability of the Bengali language, covering its historical background, the limitations on bringing it into practice, and some necessary measures for the effective use of Bengali in the courts. Key words: Bengali language, judgments in English, impact on the people. DOI: 10.3329/dujl.v2i3.4143. The Dhaka University Journal of Linguistics: Vol.2 No.3 February, 2009 Page: 53-68
    Solving problems with artificial intelligence in a competitive manner has long been absent in Bangladesh and the Bengali-speaking community. At the same time, there has been no well-structured database of Bengali handwritten digits for public use. To bring together the best minds working in machine learning and use their expertise to create a model that can easily recognize Bengali handwritten digits, we organized the Bengali.AI Computer Vision Challenge. The challenge saw both local and international teams participating with unprecedented effort.