Native Language Identification Using Large, Longitudinal Data

Xiao Jiang, Yufan Guo, Jeroen Geertzen, Dora Alexopoulou, Lin Sun, Anna Korhonen

2014

Native Language Identification (NLI) is the task of determining the native language (L1) of second-language (L2) learners on the basis of their written texts. To date, research on NLI has focused on relatively small corpora. We apply NLI to the recently released EFCamDat corpus, which is not only multiple times larger than previous L2 corpora but also provides longitudinal data at several proficiency levels. Our investigation, which combines accurate machine learning with a wide range of linguistic features, reveals patterns in the longitudinal data that are useful both for the further development of NLI and for its application to research on L2 acquisition.

Keywords:

Artificial intelligence
Natural language processing
Speech recognition
Computer science
Second-language acquisition
First language
Native-language identification
Longitudinal data
Second language
