Producing an annotated corpus with automatic spelling correction
2013
This paper describes ConSpel, a software system for the automatic detection and correction of non-word misspellings. We also present an ongoing research project to construct an ETS (Educational Testing Service) Spelling Corpus. The corpus consists of essays written by native and non-native speakers of English in response to TOEFL® and GRE® writing prompts. Trained annotators mark misspellings in the essays using a semi-automated methodology. We evaluated ConSpel on data from the completed phase of the annotation project. The system achieves above 95% accuracy in error detection. The evaluation also indicates that an advanced correction algorithm, which takes into account the local context of each misspelling, achieves 77% correction accuracy and consistently outperforms a baseline context-blind approach.
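The abstract does not spell out ConSpel's algorithms. What follows is a minimal Python sketch of the general two-stage idea it names: non-word detection and context-blind versus context-aware correction. It is not ConSpel's actual method and makes several assumptions. Detection is taken to be a lexicon lookup. Candidates are lexicon words within a small Levenshtein distance. The baseline breaks ties by word frequency, while the context-aware ranker uses an add-one smoothed bigram score with the left and right neighbours. The lexicon, counts, and function names (`is_nonword`, `correct_baseline`, `correct_in_context`) are hypothetical illustrations.

```python
import math
from collections import Counter

# Hypothetical toy statistics; a real system would estimate these from a large corpus.
UNIGRAMS = Counter({"read": 30, "a": 120, "book": 15, "box": 40, "big": 25, "today": 20})
BIGRAMS = Counter({("read", "a"): 10, ("a", "book"): 8, ("a", "box"): 5,
                   ("book", "today"): 3, ("a", "big"): 9, ("big", "box"): 4})
LEXICON = set(UNIGRAMS)


def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = curr
    return prev[-1]


def is_nonword(token):
    """Detection (assumed): flag alphabetic tokens that are not in the lexicon."""
    return token.isalpha() and token.lower() not in LEXICON


def candidates(token, max_dist=2):
    """Lexicon words within max_dist edits of the token, as (word, distance) pairs."""
    word = token.lower()
    result = []
    for w in LEXICON:
        d = edit_distance(word, w)
        if d <= max_dist:
            result.append((w, d))
    return result


def bigram_logprob(w1, w2):
    """Add-one smoothed log P(w2 | w1)."""
    return math.log((BIGRAMS[(w1, w2)] + 1) / (UNIGRAMS[w1] + len(LEXICON)))


def correct_baseline(token):
    """Context-blind: fewest edits first, then the most frequent word."""
    cands = candidates(token)
    if not cands:
        return None
    return min(cands, key=lambda c: (c[1], -UNIGRAMS[c[0]], c[0]))[0]


def correct_in_context(tokens, i):
    """Context-aware: fewest edits first, then best bigram fit with the neighbours."""
    cands = candidates(tokens[i])
    if not cands:
        return None
    prev = tokens[i - 1].lower() if i > 0 else None
    nxt = tokens[i + 1].lower() if i + 1 < len(tokens) else None

    def fit(word):
        score = 0.0
        if prev is not None:
            score += bigram_logprob(prev, word)
        if nxt is not None:
            score += bigram_logprob(word, nxt)
        return score

    return min(cands, key=lambda c: (c[1], -fit(c[0]), c[0]))[0]


if __name__ == "__main__":
    sentence = "read a bok today".split()
    for i, tok in enumerate(sentence):
        if is_nonword(tok):
            # Prints: bok -> box (baseline) vs book (context)
            print(tok, "->", correct_baseline(tok), "(baseline) vs",
                  correct_in_context(sentence, i), "(context)")
```

With these toy counts, both "book" and "box" are one edit away from "bok". The frequency baseline picks "box" because it is the more common word. The bigram ranker picks "book" because "a book today" fits the surrounding words better. This illustrates, under the stated assumptions, why a context-aware correction step can outperform a context-blind one.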