Automated Detection of Usage Errors in non-native English Writing using One-Class Support Vector Machines

Satoru Fujishima, Shun Ishizaki

2011

To investigate the use of a novelty detection algorithm for identifying inappropriate word combinations in a raw English corpus, we employ an unsupervised detection algorithm based on one-class support vector machines (OC-SVMs) and extract sentences containing word sequences whose frequency of appearance is significantly low in native English writing. Combined with n-gram language models and document categorization techniques, the OC-SVM classifier assigns the given sentences to two groups: sentences containing errors and sentences without errors. The accuracies are 79.30% with the bigram model, 86.63% with the trigram model, and 34.34% with the four-gram model.
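The core intuition, that a sentence is suspicious if it contains word n-grams that are rare in native English text, can be sketched as follows. This is a minimal, hypothetical illustration and not the authors' method: it replaces the OC-SVM with a plain frequency threshold, and the helper names (`tokenize`, `build_reference_counts`, `rare_ngrams`, `flag_sentence`), the whitespace tokenizer, the `<s>`/`</s>` boundary markers, and the `min_count` cutoff are all assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable


def tokenize(sentence: str) -> list[str]:
    # Deliberately naive (assumption): lowercase, split on whitespace,
    # and add sentence-boundary markers so first/last words form n-grams too.
    return ["<s>"] + sentence.lower().split() + ["</s>"]


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Return the contiguous word n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_reference_counts(native_sentences: Iterable[str], n: int) -> Counter:
    """Count n-gram occurrences in a (hypothetical) native-English reference corpus."""
    counts: Counter = Counter()
    for sentence in native_sentences:
        counts.update(ngrams(tokenize(sentence), n))
    return counts


def rare_ngrams(sentence: str, counts: Counter, n: int,
                min_count: int = 1) -> list[tuple[str, ...]]:
    """Return the n-grams of `sentence` seen fewer than `min_count` times in the reference.

    `min_count` is a hypothetical stand-in for "significantly low frequency".
    Indexing a Counter with a missing key returns 0 without inserting it.
    """
    return [g for g in ngrams(tokenize(sentence), n) if counts[g] < min_count]


def flag_sentence(sentence: str, counts: Counter, n: int, min_count: int = 1) -> bool:
    """Flag a sentence as a likely usage error if any of its n-grams is rare."""
    return len(rare_ngrams(sentence, counts, n, min_count)) > 0


if __name__ == "__main__":
    native = [
        "i am interested in music",
        "she is interested in art",
        "we are good at music",
    ]
    bigram_counts = build_reference_counts(native, 2)
    for s in ["i am interested in art", "i am interesting in music"]:
        print(s, "->", flag_sentence(s, bigram_counts, 2), rare_ngrams(s, bigram_counts, 2))
```

With the toy reference corpus above, "i am interested in art" is not flagged, while "i am interesting in music" is flagged because the bigrams `("am", "interesting")` and `("interesting", "in")` never occur in the reference. A fuller pipeline could instead turn each sentence into an n-gram feature vector and fit a one-class model such as scikit-learn's `sklearn.svm.OneClassSVM`, which labels inputs as +1 (inlier) or -1 (outlier); the abstract does not specify the exact feature representation the authors used.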

Keywords:

Trigram
Support vector machine
Classifier (linguistics)
Novelty detection
Bigram
Categorization
Language model
Computer science
Pattern recognition
Artificial intelligence
Native English
Speech recognition
Natural language processing
