Language Identification in Code-Switching Scenario

2014 
This paper describes a CRF based token level language identification system entry to Language Identification in CodeSwitched (CS) Data task of CodeSwitch 2014. Our system hinges on using conditional posterior probabilities for the individual codes (words) in code-switched data to solve the language identification task. We also experiment with other linguistically motivated language specific as well as generic features to train the CRF based sequence labeling algorithm achieving reasonable results.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    10
    References
    14
    Citations
    NaN
    KQI
    []