On international business intelligence Out-Of-Vocabulary processing based on sentence-aligned web corpus

2011 
International business intelligence processing is an important problem in cross-disciplinary artificial intelligence research. Recognizing Out-Of-Vocabulary (OOV) words, and the OOV phrases derived from them, in international commercial activities poses a challenge to widely used machine translation technology. An electronic dictionary with a fixed lexicon cannot keep up with the rapid growth of international commercial OOV phrases. In this paper, we present a recognition and translation technique for OOV phrases in international business intelligence based on a sentence-aligned web corpus. We first obtain the latest and most relevant textual resources from the Internet and build a sentence-aligned corpus. We then calculate the relevancy of adjacent word strings with a Markov model to obtain a maximum-likelihood segmentation, and identify the OOV words and OOV phrases in the business context. Next, we remove redundancy and calculate the probabilities and weights of co-occurring word pairs. This yields the OOV word pairs and the translations of OOV phrases in business intelligence. Experiments show good results in the international business domain and support timely updates.
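The two scoring steps named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the first-order (bigram) transition estimate stands in for the unspecified Markov relevancy measure, the Dice coefficient stands in for the unspecified co-occurrence weight, and the function names and toy tokens are all hypothetical.

```python
from collections import Counter
from itertools import product


def bigram_transition(sentences):
    """Estimate P(next | current) for adjacent tokens (first-order Markov).

    A high value suggests the two tokens belong to one phrase.
    This estimator is an assumption, not the paper's exact measure.
    """
    first = Counter()   # how often a token appears with a successor
    bigram = Counter()  # how often an adjacent pair appears
    for tokens in sentences:
        first.update(tokens[:-1])
        bigram.update(zip(tokens, tokens[1:]))
    return {(a, b): c / first[a] for (a, b), c in bigram.items()}


def cooccurrence_scores(aligned_pairs):
    """Dice coefficient for each (source, target) token pair.

    aligned_pairs is a list of (source_tokens, target_tokens) tuples,
    one per aligned sentence pair. Dice is a stand-in weighting.
    """
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src_tokens, tgt_tokens in aligned_pairs:
        src_set, tgt_set = set(src_tokens), set(tgt_tokens)
        src_count.update(src_set)
        tgt_count.update(tgt_set)
        pair_count.update(product(src_set, tgt_set))
    return {
        (s, t): 2 * c / (src_count[s] + tgt_count[t])
        for (s, t), c in pair_count.items()
    }


def best_translation(word, scores):
    """Return the target token with the highest score for word, or None."""
    candidates = {t: v for (s, t), v in scores.items() if s == word}
    return max(candidates, key=candidates.get) if candidates else None


# Toy data: synthetic tokens, not drawn from the paper's corpus.
mono = [
    ["cloud", "computing", "service"],
    ["cloud", "computing", "market"],
    ["cloud", "storage"],
]
aligned = [
    (["blockchain", "finance"], ["t_chain", "t_fin"]),
    (["blockchain", "platform"], ["t_chain", "t_plat"]),
    (["finance", "platform"], ["t_fin", "t_plat"]),
]

transitions = bigram_transition(mono)
scores = cooccurrence_scores(aligned)
print(transitions[("cloud", "computing")])
print(best_translation("blockchain", scores))
```

In this toy setting, a source token that always co-occurs with one target token across aligned sentence pairs receives the highest score for that pair, which is the intuition behind extracting OOV translation pairs from co-occurrence statistics.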