Can test input selection methods for deep neural network guarantee test diversity? A large-scale empirical study
2022
Recently, various methods for test input selection for deep neural networks (TIS-DNN) have been proposed. These methods reduce labeling cost by selecting a subset of the original test inputs that can still accurately estimate the performance (such as accuracy) of the target DNN models. Previous studies on TIS-DNN mainly focused on performance over all classes. However, the selected subset may miss some classes entirely or degrade the performance estimate on individual classes, which reduces the test diversity of the original test inputs. Therefore, we conducted a large-scale empirical study to investigate whether previous TIS-DNN methods can guarantee test diversity in the selected subset. In our study, we selected five state-of-the-art TIS-DNN methods: SRS, CSS, CES, DeepReduce, and PACE. We then selected 18 pairs of DNN models and corresponding test inputs from seven popular DNN datasets. Our experimental results can be summarized as follows. (1) Previous TIS-DNN methods can guarantee performance over all classes; however, they have a negative impact on test diversity, and their performance on each class is unsatisfactory. (2) Reducing the performance estimation error on each class helps reduce the estimation error on the test adequacy of the original inputs as measured by DNN coverage criteria (especially NC and TKNC). (3) There is still substantial room for improvement (i.e., 7.637% over all classes and 12.833% on each class) when comparing the TIS-DNN method PACE with approximately optimal solutions. These findings indicate that the TIS-DNN problem still has a long way to go. Given this, we present observations about the road ahead for this issue.
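To make the per-class diversity concern concrete, the following is a minimal sketch (not the paper's implementation) of how a selected subset can be checked against the full test set: it draws a subset by simple random sampling (SRS, the simplest of the five studied methods) and reports the overall estimation error, the per-class estimation error, and any classes the subset fails to cover. The function name `estimate_errors` and the parameter `n_select` are illustrative assumptions.

```python
import numpy as np

def estimate_errors(y_true, y_pred, n_select, seed=0):
    """Compare subset-based accuracy estimates with the full test set,
    both overall and per class (hypothetical helper, not from the paper)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(y_true), size=n_select, replace=False)  # SRS subset

    overall_err = abs(np.mean(y_pred == y_true) - np.mean(y_pred[idx] == y_true[idx]))

    per_class_err, missed_classes = {}, []
    for c in np.unique(y_true):
        full_acc_c = np.mean(y_pred[y_true == c] == c)
        in_subset = y_true[idx] == c
        if not in_subset.any():                     # class absent: test diversity is lost
            missed_classes.append(int(c))
            continue
        sub_acc_c = np.mean(y_pred[idx][in_subset] == c)
        per_class_err[int(c)] = abs(full_acc_c - sub_acc_c)

    return overall_err, per_class_err, missed_classes
```

Finding (2) refers to DNN coverage criteria; as a hedged sketch of the two criteria named there, neuron coverage (NC) counts neurons whose activation exceeds a threshold on at least one input, while top-k neuron coverage (TKNC) counts neurons that rank among the top k of their layer for at least one input. Here `layer_activations` is assumed to be a list of `(n_inputs, n_neurons)` arrays, one per layer.

```python
import numpy as np

def neuron_coverage(layer_activations, threshold=0.5):
    """NC: fraction of neurons activated above `threshold` by at least one input."""
    covered = sum(int(np.any(a > threshold, axis=0).sum()) for a in layer_activations)
    total = sum(a.shape[1] for a in layer_activations)
    return covered / total

def top_k_neuron_coverage(layer_activations, k=2):
    """TKNC: fraction of neurons that are among the top-k most activated
    neurons of their layer for at least one input."""
    covered = total = 0
    for a in layer_activations:                # a has shape (n_inputs, n_neurons)
        top_k = np.argsort(a, axis=1)[:, -k:]  # per-input indices of top-k neurons
        covered += np.unique(top_k).size
        total += a.shape[1]
    return covered / total
```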