Efficient testing of GUI applications by event sequence reduction

2021 
Abstract: Automatic event sequence generation tools are widely used for testing GUI applications. With these tools, developers can easily exercise a target GUI application with a large number of events and collect a set of crash-triggering sequences in a short time. However, some efficiency-oriented tools generate low-level events randomly from screen coordinates rather than from widgets, which produces many ineffective events that contribute nothing to the test. In addition, randomly generated sequences may repeatedly operate on the same widget or jump to the same window, which makes the sequences more complex and makes it difficult to extract the key events that lead to crashes. Sequence reduction can help developers understand the crashes and improve code quality. In this paper, we propose a general model for the event sequence reduction problem on GUI applications. As a concrete instance, we take the random test generation tool Monkey, which is widely used for testing Android applications owing to its simplicity, effectiveness and good compatibility. To address the major drawbacks of original Monkey testing, we enhance Monkey to support sequence record-and-replay and propose a sequence reduction approach for Android apps that aids crash behavior comprehension and fault localization. By manually investigating the effectiveness of Monkey events, we find three types of ineffective events (no-ops, single effect-free events, and effect-free combinations of events) and design nine reduction rules for them. To extract the key events in a sequence for crash understanding, we analyze the state transition relation among events and propose a reduction approach guided by a static GUI state hierarchy tree. Additionally, we implement our approach in a tool, CHARD, to achieve event sequence reduction on real-world apps.
We also design a semi-structured format to describe the actual behavior of events and improve sequence comprehensibility. We collect 890 sequences from 74 applications as our benchmark, including 740 basic sequences of 1,000 events each and 150 longer sequences of 10,000 events each. CHARD can quickly identify 41.3% of the events in the collected sequences as ineffective. For sequences that can be stably replayed, over 94% of the reduced sequences keep the same functionality as the original ones. Because it removes ineffective events, CHARD can serve as a preprocessing step for the traditional delta-debugging process and speed it up significantly. To evaluate the effectiveness of the key event extraction approach, we pick eight buggy applications and collect 40 crash-triggering event sequences generated by Monkey, whose lengths vary from 19 to 2,700 events. The results show that CHARD can successfully remove over 95.4% of the crash-irrelevant events in these crash-triggering sequences within around ten seconds, while the state-of-the-art delta-debugging tool removes 71.3% of them in over 27 hours, which indicates that CHARD can efficiently support crash replay and sequence comprehension.
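To make the pipeline described in the abstract concrete, the following is a minimal sketch of filtering ineffective events before running classic delta debugging (ddmin). It is not CHARD's implementation: the `(kind, target)` event encoding, the `"noop"` label, and the helper `drop_no_ops` are hypothetical stand-ins for the paper's nine reduction rules, and `crashes` is an assumed caller-supplied oracle that replays a sequence and reports whether the crash still occurs.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical event encoding: (kind, target). Not the format used by Monkey or CHARD.
Event = Tuple[str, str]


def drop_no_ops(events: Sequence[Event]) -> List[Event]:
    """Remove events labeled "noop" (an assumed stand-in for rule-based reduction)."""
    return [e for e in events if e[0] != "noop"]


def ddmin(events: Sequence[Event],
          crashes: Callable[[Sequence[Event]], bool]) -> List[Event]:
    """Classic delta debugging: shrink `events` while `crashes` remains True."""
    events = list(events)
    n = 2  # current number of chunks
    while len(events) >= 2:
        chunk = max(1, len(events) // n)
        subsets = [events[i:i + chunk] for i in range(0, len(events), chunk)]
        reduced = False
        for i, subset in enumerate(subsets):
            # Everything except the i-th chunk, in original order.
            complement = [e for j, s in enumerate(subsets) if j != i for e in s]
            if crashes(subset):
                events, n, reduced = subset, 2, True
                break
            if crashes(complement):
                events, n, reduced = complement, max(n - 1, 2), True
                break
        if not reduced:
            if n >= len(events):
                break  # already at single-event granularity
            n = min(len(events), n * 2)
    return events
```

Each call to `crashes` corresponds to one replay on a device, so removing ineffective events first shortens the sequence that ddmin has to search, which is the reason a rule-based pass is useful as a preprocessing step.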