Identifying Thesis Statements in Student Essays: The Class Imbalance Challenge and Resolution

2016 
A thesis statement, or controlling idea, is a key component of the Common Core State Standards for writing from grade 6 to grade 12. We developed a machine learning model to identify thesis statements in students' essays so that peer reviewers can focus their comments on the presence and quality of an author's thesis statement. Identifying thesis statements can be framed as a classification task in which a classifier is trained to predict whether a sentence is a thesis statement, based on features extracted from that sentence. However, sentences in the thesis class are usually far fewer than sentences in the non-thesis class. Our initial model could not adequately handle this class imbalance: there were too few thesis-statement instances from which to learn. Our subsequent model employs synthetic over-sampling to address this challenge and improve performance.
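The abstract does not say which synthetic over-sampling method the authors used. A widely used option is SMOTE (Synthetic Minority Over-sampling Technique), which creates new minority-class examples by interpolating between an existing minority example and one of its nearest minority-class neighbours. The following is a minimal pure-Python sketch of SMOTE-style over-sampling applied to sentence feature vectors. It is an illustration, not the authors' implementation: the helper names `smote` and `balance`, the toy two-dimensional features, and the label convention (1 = thesis, 0 = not thesis) are all hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(idx: int, points: Sequence[Vector], k: int) -> List[int]:
    """Indices of the k nearest minority neighbours of points[idx], excluding itself."""
    others = [j for j in range(len(points)) if j != idx]
    others.sort(key=lambda j: euclidean(points[idx], points[j]))
    return others[:k]


def smote(minority: Sequence[Vector], n_synthetic: int, k: int = 5,
          seed: int = 0) -> List[Vector]:
    """Hypothetical helper: generate n_synthetic minority-class vectors.

    Each new vector lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority neighbours (SMOTE-style).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    neighbours = [k_nearest(i, minority, k) for i in range(len(minority))]
    synthetic: List[Vector] = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        j = rng.choice(neighbours[i])
        gap = rng.random()  # uniform in [0, 1)
        base, nb = minority[i], minority[j]
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def balance(features: Sequence[Vector], labels: Sequence[int], k: int = 5,
            seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Hypothetical helper: over-sample label 1 (thesis) up to the count of label 0."""
    minority = [list(x) for x, y in zip(features, labels) if y == 1]
    majority_count = sum(1 for y in labels if y == 0)
    needed = max(0, majority_count - len(minority))
    new = smote(minority, needed, k=k, seed=seed)
    return [list(x) for x in features] + new, list(labels) + [1] * len(new)


if __name__ == "__main__":
    # Toy, hypothetical sentence features: [relative position in essay, prompt-word overlap]
    X = [[0.1, 0.7], [0.15, 0.8], [0.3, 0.1], [0.4, 0.2], [0.5, 0.1],
         [0.6, 0.3], [0.7, 0.2], [0.8, 0.1], [0.9, 0.4], [0.95, 0.2]]
    y = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    X_bal, y_bal = balance(X, y, k=1)
    print(y_bal.count(1), y_bal.count(0))  # 8 8
```

Because every synthetic vector is an interpolation between two real thesis sentences, the new examples stay inside the region of feature space the minority class already occupies, rather than duplicating existing rows as plain random over-sampling would. In practice, over-sampling of this kind is applied only to the training split, so that evaluation still reflects the true, imbalanced distribution of thesis and non-thesis sentences.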