A Hybrid Approach to Word Sense Disambiguation

2010 
Words that have more than one distinct meaning are called polysemous words. This paper concentrates on Word Sense Disambiguation (WSD), the resolution of the lexical ambiguity that arises when a given word has several different meanings. The paper presents a hybrid approach to this problem based on the basic principles of Yarowsky's unsupervised algorithm for WSD. It also employs Naive Bayes to find the likelihood ratio of a sense in the given context. In this way, the approach preserves the advantages of Yarowsky's principles (one sense per discourse and one sense per collocation) and uses Bayes' theorem to improve the performance of the system. Seed (sense) selection can be done either manually or automatically, to find the local or global dependency of a given sense in a given window. To find the local dependency, the system uses the definitions that the dictionary (WordNet) provides for the target word, whereas the global dependency is determined on the basis that a word occurring with significantly higher frequency across the entire corpus can be used as a seed word. The proposed approach is applied to several ambiguous words for which training and test data were developed, and the measured performance of the system is listed in tables. Finally, the two seed selection methods, local and global, are compared.

I. Introduction
Natural languages are an integral part of our lives. They not only help us communicate our day-to-day ideas but also play an instrumental role in recording our knowledge. To develop applications such as machine translators, information retrieval systems, grammatical analysers, and speech processors, we have to resolve the ambiguity inherent in natural language. We are often unaware of these ambiguities, perhaps because humans are good at resolving them, but removing them computationally is neither easy nor obvious.
In word sense disambiguation (WSD), a field of computational linguistics, the sense of a word is determined from the context in which the word is used in a sentence or discourse. The algorithms used in WSD can be classified as knowledge based or corpus based, and the corpus-based ones involve supervised or unsupervised learning [4]. In the knowledge-based approach, disambiguation is carried out using information from an explicit lexicon or knowledge base. The lexicon may be a machine-readable dictionary or a thesaurus, or it may be hand-crafted. Supervised learning can be viewed as a classification task, while unsupervised learning can be viewed as a clustering task [4].
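To make the combination of seed-based bootstrapping and a Naive Bayes likelihood ratio more concrete, the following is a minimal sketch, not the implementation evaluated in this paper. It assumes exactly two senses, one seed word per sense, and contexts that have already been reduced to the words in a window around the target. The function names, the add-alpha smoothing, and the confidence threshold are all hypothetical choices made for illustration. Note that it sums the log-likelihood ratios of all context words in Naive Bayes fashion, rather than using only the single strongest collocation as a decision list would.

```python
import math
from collections import Counter


def log_likelihood_ratios(labeled, senses, alpha=0.1):
    """Return {word: log(P(word|sense_1) / P(word|sense_2))}.

    labeled: list of (sense, context_words) pairs.
    alpha:   add-alpha smoothing constant (an assumption of this sketch).
    """
    s1, s2 = senses
    counts = {s1: Counter(), s2: Counter()}
    for sense, words in labeled:
        counts[sense].update(words)
    vocab = set(counts[s1]) | set(counts[s2])
    totals = {s: sum(counts[s].values()) + alpha * len(vocab) for s in senses}
    return {
        w: math.log(((counts[s1][w] + alpha) / totals[s1])
                    / ((counts[s2][w] + alpha) / totals[s2]))
        for w in vocab
    }


def classify(words, llr, senses, threshold=1.0):
    """Sum the ratios of the context words; abstain (None) if not confident."""
    score = sum(llr.get(w, 0.0) for w in words)
    if score > threshold:
        return senses[0]
    if score < -threshold:
        return senses[1]
    return None


def bootstrap(contexts, seeds, senses, rounds=5, threshold=1.0):
    """Label contexts starting from seed words, then grow the labeled set.

    contexts: list of context-word lists, one per occurrence of the target.
    seeds:    {sense: seed_word}, chosen manually or automatically.
    """
    # Step 1: a context containing a seed word receives that seed's sense.
    labels = [None] * len(contexts)
    for i, words in enumerate(contexts):
        for sense, seed in seeds.items():
            if seed in words:
                labels[i] = sense
    # Step 2: re-estimate the ratios and label confident contexts, repeatedly.
    for _ in range(rounds):
        labeled = [(l, c) for l, c in zip(labels, contexts) if l is not None]
        llr = log_likelihood_ratios(labeled, senses)
        changed = False
        for i, words in enumerate(contexts):
            if labels[i] is None:
                guess = classify(words, llr, senses, threshold)
                if guess is not None:
                    labels[i] = guess
                    changed = True
        if not changed:
            break
    return labels


# Toy usage for the ambiguous word "plant" (invented data, for illustration).
senses = ("living", "factory")
seeds = {"living": "life", "factory": "manufacturing"}
contexts = [
    ["life", "grows", "soil"],
    ["manufacturing", "workers", "machines"],
    ["grows", "soil", "water"],
    ["workers", "machines", "strike"],
]
print(bootstrap(contexts, seeds, senses))
```

In this toy run the first two contexts are labeled by their seed words, and the last two are then labeled because they share collocates with the seeded contexts. A context whose summed ratio stays within the threshold is left unlabeled rather than forced into a sense.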