PAPER: Improving the Reliability of Essay Evaluations Using a Tool from the Literature on the Wisdom-of-Crowds

Avi Allalouf, Meir Barneron, Ilan Yaniv

2016

College candidates taking the Psychometric Entrance Test (PET, the Israeli equivalent of the SAT) are required to write a short essay. This task tests the candidate’s academic writing skills and is a central component of the test. The essays are traditionally evaluated and graded by raters who are well trained for this task. Given the importance of obtaining reliable and accurate evaluations, the common practice is to average the evaluations of two independent raters. This practice is known to improve reliability in performance assessment tests. The National Institute for Testing and Evaluation, which is in charge of the PET, accepts essays written in a dozen foreign languages. The rationale is to make entry into institutions of higher education accessible to candidates from various backgrounds, based on the assumption that an examinee expresses her writing skills better in her mother tongue. Yet in some languages (e.g., Amharic) it is hard to find well-trained raters. This raises the question of whether there is a method to improve the accuracy of grades based on a single rater. Recent research in the field of judgment and decision-making suggests that the accuracy of judgments can be improved by eliciting multiple judgments from the same individual (at different times), instead of eliciting single judgments from multiple individuals. This “wisdom-of-crowd effect” within the mind of a single individual implies that the average of two evaluations made by the same rater on two different occasions should be more accurate than a grade based on a single evaluation. Our study used professional raters and real essays. We found robust evidence for the benefits of this method. The project is unique in that it incorporates ideas from the judgment and decision-making literature into the field of assessment and evaluation, suggesting a noteworthy application that should be considered in attempts to improve the reliability of evaluations.
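
The reasoning behind averaging can be illustrated with two standard psychometric formulas. The sketch below is not taken from the study: the function names and all input values are hypothetical. The first function is the Spearman-Brown prediction, which assumes parallel ratings with uncorrelated errors. The second gives the error variance of the mean of two ratings whose errors are correlated, as may happen when the same rater grades an essay twice.

```python
def spearman_brown(single_reliability: float, n_ratings: int) -> float:
    """Predicted reliability of the mean of n parallel ratings
    (Spearman-Brown formula; assumes uncorrelated rating errors)."""
    r = single_reliability
    return n_ratings * r / (1 + (n_ratings - 1) * r)


def error_variance_of_mean(error_variance: float, error_correlation: float) -> float:
    """Error variance of the average of two ratings that each have the
    given error variance and whose errors correlate as specified."""
    return error_variance * (1 + error_correlation) / 2


if __name__ == "__main__":
    # Hypothetical inputs, chosen only for illustration.
    print(spearman_brown(0.6, 2))
    for rho in (0.0, 0.5, 1.0):
        print(rho, error_variance_of_mean(1.0, rho))
```

Under these assumptions, averaging reduces error variance whenever the error correlation is below 1, and the reduction shrinks as the two evaluations become more alike. This is why repeated evaluations by one rater would be expected to help, though typically less than evaluations by two independent raters.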
