A comparison of statistical measures for the automatic identification of Persian light verb constructions

2012 
A multiword expression (MWE) is a combination of words whose meaning goes beyond the compositional combination of the meanings of its parts. Light verb constructions (LVCs) are a type of MWE that is widely used in many languages, including English, Spanish, French, Japanese, Chinese, Urdu, and Persian, among others. An LVC consists of a semantically light basic verb, such as take in English and gozâshtan (meaning ‘to put’) in Persian, combined with another word that can be an adjective, a prepositional phrase, or a noun. Examples of LVCs are take a walk in English, and ehteram gozâshtan in Persian (lit. put respect, meaning ‘to respect’). In particular, most verbs in Persian take the form of LVCs, and thus many linguistic studies have examined their properties. There is, however, not much computational work on the automatic identification and processing of Persian LVCs, despite the importance of these tasks for the development of natural language processing systems, such as summarization and machine translation. In this study, we focus on the most common form of LVCs in Persian, in which a noun is combined with one of five commonly used light verbs to form an LVC. We use two standard measures of association, together with some linguistically informed measures, as features of candidate expressions. We also propose a position-based fixedness measure and some translation-based measures based on the special properties of Persian LVCs and their translation to English. Our results show that these measures perform well in identifying Persian LVCs.