
IndoWordNet

IndoWordNet is a linked lexical knowledge base of wordnets of 18 scheduled languages of India, viz., Assamese, Bangla, Bodo, Gujarati, Hindi, Kannada, Kashmiri, Konkani, Malayalam, Meitei (Manipuri), Marathi, Nepali, Odia, Punjabi, Sanskrit, Tamil, Telugu and Urdu.

In the early 1990s, the wordnet for English, called the Princeton WordNet, was created at Princeton University by George Miller and Christiane Fellbaum, who went on to receive the prestigious Zampolli Prize in 2006. It was followed by EuroWordNet, a conglomeration of wordnets for European languages, created in 1998. Wordnets are now essential resources for natural language processing, information extraction, word-sense disambiguation and other computations involving text.

Indian languages form a very significant part of the world's language landscape. Four language families are represented in the Indian subcontinent: Indo-European, Dravidian, Tibeto-Burman and Austro-Asiatic. Several Indian languages rank among the most widely spoken in the world, e.g., Hindi-Urdu 5th, Bangla 7th and Marathi 12th, as per the List of languages by number of native speakers. Creating wordnets of Indian languages is therefore a highly important techno-scientific and linguistic project. Such a project took off in 2000, with the Hindi WordNet being created by the Natural Language Processing group at the Center for Indian Language Technology (CFILT) in the Computer Science and Engineering Department at IIT Bombay. It was made publicly available in 2006 under the GNU license.
The Hindi WordNet was created with support from the TDIL project of the Ministry of Communications and Information Technology, India, and partially from the Ministry of Human Resource Development, India. Wordnets of other languages of India then followed. The large nationwide project of building Indian language wordnets was called the IndoWordNet project. The Hindi WordNet was created from first principles and was the first wordnet for an Indian language; the method adopted was the same as that used for the Princeton WordNet for English. The wordnets of the other languages are being created using the expansion approach, with the Hindi WordNet as the source. The Polish WordNet is being mapped to the Princeton WordNet based on the strategy followed by IndoWordNet.
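
The idea of linked wordnets built by expansion can be illustrated with a small Python sketch. It is only an illustration under stated assumptions, not IndoWordNet's actual data format or tooling: it assumes that a synset created for a target language reuses the identifier of the source synset it was derived from, so that the shared identifier acts as the cross-language link. The names `Synset`, `Wordnet`, `expand` and `linked_lemmas`, and the identifier `1001`, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    synset_id: int      # assumed to be shared across languages (the "link")
    lemmas: list[str]   # synonymous words expressing the concept
    gloss: str          # short description of the concept


@dataclass
class Wordnet:
    language: str
    synsets: dict[int, Synset] = field(default_factory=dict)

    def add(self, synset: Synset) -> None:
        self.synsets[synset.synset_id] = synset


def expand(source: Wordnet, target: Wordnet, synset_id: int,
           lemmas: list[str], gloss: str) -> Synset:
    """Create the target-language counterpart of a source synset, reusing its ID."""
    if synset_id not in source.synsets:
        raise KeyError(f"synset {synset_id} is not in the {source.language} wordnet")
    synset = Synset(synset_id, lemmas, gloss)
    target.add(synset)
    return synset


def linked_lemmas(wordnets: list[Wordnet], synset_id: int) -> dict[str, list[str]]:
    """Collect the lemmas for one concept from every wordnet that contains it."""
    return {wn.language: wn.synsets[synset_id].lemmas
            for wn in wordnets if synset_id in wn.synsets}


# Illustrative data only; 1001 is a made-up identifier.
hindi = Wordnet("Hindi")
hindi.add(Synset(1001, ["पानी", "जल"], "water"))

marathi = Wordnet("Marathi")
expand(hindi, marathi, 1001, ["पाणी", "जल"], "water")

linked = linked_lemmas([hindi, marathi], 1001)
```

In this sketch a lexicographer working on the target language supplies the lemmas and gloss for a concept already present in the source wordnet; looking up the shared identifier then retrieves the corresponding words in each linked language.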
