An Empirical Study to Determine if Mutants Can Effectively Simulate Students' Programming Mistakes to Increase Tutors' Confidence in Autograding

2021 
Automated grading typically relies on automated test suites to identify students' faults. However, tests may fail to detect some faults, limiting feedback and producing inaccurate grades. This issue can be mitigated by first ensuring that the tests can detect faults. Mutation analysis is a technique that serves this purpose by generating artificial faulty variants of a program, called mutants. Mutants that are not detected by the tests reveal the tests' inadequacies and indicate how the test suite can be improved. By using mutants to improve test suites, tutors can gain confidence that: a) generated grades will not be biased by unidentified faults, and b) students will receive appropriate feedback for their mistakes. Existing work has shown that mutants are suitable substitutes for faults in real-world software, but no work has shown that this holds for students' faults. In this paper, we investigate whether mutants are capable of replicating mistakes made by students. We conducted a quantitative study on 197 Java classes written by students across three introductory programming assignments, together with mutants generated from the assignments' model solutions. We found that the generated mutants capture the observed faulty behaviour of students' solutions. We also found that, in some cases, mutants assess test adequacy better than code coverage. Our results indicate that tutors can use mutants to identify and remedy deficiencies in grading test suites.
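
To make the underlying idea concrete, the following minimal Java sketch (not taken from the paper; the assignment, method names, and inputs are hypothetical) shows how a surviving mutant of a model solution exposes a gap in a grading test suite: if no test distinguishes the mutant from the model solution, the tests could also miss the corresponding student mistake.

class MutationSketch {
    // Model solution for a hypothetical assignment: a mark of 50 or more passes.
    static boolean passesModel(int mark) { return mark >= 50; }

    // Mutant: a relational-operator mutation replaces >= with >.
    // It differs from the model solution only at the boundary mark == 50,
    // the kind of off-by-one slip a student might plausibly make.
    static boolean passesMutant(int mark) { return mark > 50; }

    public static void main(String[] args) {
        // A weak grading test suite that never exercises the boundary value.
        int[] weakInputs = {80, 20};
        boolean mutantKilled = false;
        for (int mark : weakInputs) {
            if (passesModel(mark) != passesMutant(mark)) {
                mutantKilled = true;
            }
        }
        System.out.println(mutantKilled
            ? "Mutant killed: the tests detect this class of fault."
            : "Mutant survived: add a test for the boundary mark == 50.");
    }
}

Running the sketch reports that the mutant survived; adding an input of 50 to the test data would kill it, which is exactly the kind of test-suite improvement the paper argues mutants can guide.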