How higher order mutant testing performs for deep learning models: A fine-grained evaluation of test effectiveness and efficiency improved from second-order mutant-classification tuples

2022 
Given the prevalence of Deep Learning (DL) models in daily life, it is crucial to ensure their reliability through DL testing. Recently, researchers have adapted mutation testing to DL testing to measure the test power of test sets. The bottleneck of DL mutation testing is the high cost of generating a large number of mutants. We want to study whether the traditional ideas of “Higher Order” and “Strongly Subsuming” in Higher Order Mutant Testing are still applicable to DL mutation testing, i.e., whether they can be used to optimize DL mutation testing by reducing the number of mutants. [...] twice, and search for “Strongly Subsuming” HOTs (SSHOTs) from HOTs. Experiments on four widely used datasets and five DL model structures show that (1) we can find a considerable number of SSHOTs (from 720 to 25,840 across the five models), which can greatly reduce the original set of FOTs (with a reduction ratio from 28.69% to 91.97% in our studied DL models). (2) The tuples reduced by SSHOTs perform very well in test case selection, since the selected test set is almost as effective (i.e., it achieves almost the same mutation score) and much more efficient (i.e., its size is reduced by more than 50%) for most studied DL models. Our study shows that “Higher Order” and “Strongly Subsuming” are useful for optimizing DL mutation testing, i.e., SSHOTs can be introduced to reduce the number of mutants and test cases.
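The notions of mutation score, reduction ratio, and “Strongly Subsuming” can be illustrated with a minimal sketch. It uses the classical definition from traditional higher-order mutation testing (a higher-order mutant is strongly subsuming when its non-empty kill set is contained in the intersection of its constituents' kill sets); the tuple-based definition used in this study may differ in detail. All function names and the kill-set representation (sets of test indices) are assumptions made for illustration.

```python
from typing import Dict, Set


def mutation_score(kill_sets: Dict[str, Set[int]], selected_tests: Set[int]) -> float:
    """Fraction of mutants killed by at least one of the selected tests."""
    if not kill_sets:
        return 0.0
    killed = sum(1 for tests in kill_sets.values() if tests & selected_tests)
    return killed / len(kill_sets)


def is_strongly_subsuming(higher_kills: Set[int],
                          first_kills_a: Set[int],
                          first_kills_b: Set[int]) -> bool:
    """Classical criterion (assumed here): the higher-order kill set is
    non-empty and lies inside the intersection of both constituents' kill sets."""
    return bool(higher_kills) and higher_kills <= (first_kills_a & first_kills_b)


def reduction_ratio(original_count: int, reduced_count: int) -> float:
    """Share of the original items removed by the reduction."""
    return (original_count - reduced_count) / original_count
```

Under this criterion, any test that kills the strongly subsuming higher-order item also kills both of its constituents, which is why the constituents can be dropped without losing test power.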