Deep learning library testing via effective model generation

2020 
Deep learning (DL) techniques have developed rapidly and have been widely adopted in practice. However, similar to traditional software systems, DL systems also contain bugs, which can have serious consequences, especially in safety-critical domains. Recently, many research approaches have focused on testing DL models, while little attention has been paid to testing DL libraries, which are the basis for building DL models and directly affect the behavior of DL systems. In this work, we propose a novel approach, LEMON, to testing DL libraries. In particular, we (1) design a series of mutation rules for DL models, with the purpose of exploring different invocation sequences of library code and hard-to-trigger behaviors; and (2) propose a heuristic strategy to guide the model generation process towards amplifying the degree of inconsistency between different DL libraries caused by bugs, so as to mitigate the impact of potential noise introduced by uncertain factors in DL libraries. We conducted an empirical study to evaluate the effectiveness of LEMON with 20 release versions of 4 widely used DL libraries, i.e., TensorFlow, Theano, CNTK, and MXNet. The results demonstrate that LEMON detected 24 new bugs in the latest release versions of these libraries, of which 7 bugs have been confirmed and one bug has been fixed by developers. In addition, the results confirm that the heuristic strategy for model generation indeed effectively guides LEMON in amplifying the degree of inconsistency for bugs.
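To make the differential-testing idea concrete, the sketch below (not the authors' implementation; the `models_by_library` mapping, `threshold` value, and function names are illustrative assumptions) shows how the same model, loaded under different DL library backends, can be run on identical inputs and how the degree of inconsistency between each pair of libraries might be measured to flag suspicious divergences.

```python
import numpy as np

def inconsistency_degree(pred_a: np.ndarray, pred_b: np.ndarray) -> float:
    """Distance between the outputs of two libraries on the same input.
    A large value hints at a potential library bug (or accumulated noise)."""
    return float(np.max(np.abs(pred_a - pred_b)))

def differential_test(models_by_library: dict, inputs: np.ndarray,
                      threshold: float = 0.15):
    """models_by_library maps a library name (e.g. 'tensorflow', 'cntk') to the
    same model architecture loaded under that library's backend. Returns the
    library pairs whose predictions diverge beyond `threshold`."""
    preds = {lib: m.predict(inputs) for lib, m in models_by_library.items()}
    suspicious = []
    libs = sorted(preds)
    for i in range(len(libs)):
        for j in range(i + 1, len(libs)):
            d = inconsistency_degree(preds[libs[i]], preds[libs[j]])
            if d > threshold:
                suspicious.append((libs[i], libs[j], d))
    return suspicious
```

In the LEMON approach described above, mutated models that increase this kind of inconsistency metric are favored during generation; the guided search and the mutation rules themselves are omitted from this sketch.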