Machine learning is becoming increasingly prevalent and integrated into a wide range of software systems. These systems, referred to as ML systems, must be adequately tested to gain confidence that they behave correctly. Although many research efforts have been devoted to testing technologies for ML systems, industrial teams face new challenges when testing ML systems in real-world settings. To gain insights from industry into the problems of ML testing, we conducted an empirical study comprising a survey with 87 responses and interviews with 7 senior ML practitioners from well-known IT companies. Our study uncovers significant industrial concerns about the major testing activities, i.e., test data collection, test execution, and test result analysis, as well as good practices and open challenges from the industry's perspective. (1) Test data collection is conducted in different ways for the ML model, data, and code, and each faces different challenges. (2) Test execution in ML systems suffers from two major problems: entanglement among components and regressions in model performance. (3) Test result analysis centers on quantitative methods, e.g., metric-based evaluation, combined with qualitative methods based on practitioners' experience. Based on our findings, we highlight research opportunities and provide implications for practitioners.
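As a hedged illustration of the metric-based evaluation and model-performance regression concerns mentioned above, the following Python sketch gates a candidate model against a baseline with per-metric tolerances; the metric names, values, and tolerances are hypothetical and are not taken from the study.

```python
# Minimal sketch of a metric-based regression gate for an ML model.
# Assumes metrics are computed upstream; names and tolerances are illustrative only.

BASELINE = {"accuracy": 0.91, "f1": 0.88}    # metrics of the currently deployed model (hypothetical)
TOLERANCE = {"accuracy": 0.01, "f1": 0.02}   # allowed per-metric drop before flagging a regression

def check_regression(candidate: dict[str, float]) -> list[str]:
    """Return the metrics on which the candidate model regresses beyond tolerance."""
    regressions = []
    for name, baseline_value in BASELINE.items():
        candidate_value = candidate.get(name, 0.0)
        if baseline_value - candidate_value > TOLERANCE.get(name, 0.0):
            regressions.append(f"{name}: {baseline_value:.3f} -> {candidate_value:.3f}")
    return regressions

if __name__ == "__main__":
    candidate_metrics = {"accuracy": 0.92, "f1": 0.85}  # hypothetical evaluation result
    failures = check_regression(candidate_metrics)
    print("regression detected:" if failures else "no regression", failures)
```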
Bug-related user reviews of mobile applications negatively affect their reputation and competitiveness, so developers take these reviews seriously. Before fixing a bug, developers need to manually reproduce the bug reported in a user review, which is an extremely time-consuming and tedious task. Hence, automating this process is highly desirable. However, doing so is challenging because user reviews are hard to understand and poorly informative for bug reproduction (in particular, they typically lack reproduction steps). In this paper, we propose RepRev to automatically Reproduce Android application bugs from user Reviews. Specifically, RepRev leverages natural language processing techniques to extract information valuable for bug reproduction. It then ranks GUI components by their semantic similarity with the user review and dynamically explores the app with a novel one-step exploration technique. To evaluate RepRev, we construct a benchmark of 63 crash-related user reviews from Google Play, each of which was successfully reproduced by three graduate students. On this benchmark, RepRev achieves performance comparable to humans, successfully reproducing 44 user reviews (about 70%) in 432.2 seconds on average. We make the implementation of our approach publicly available, along with the artifacts and experimental data we used [4].
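To make the ranking step concrete, the sketch below shows one plausible way to rank GUI components by textual similarity to a user review using a simple bag-of-words cosine measure. This is not RepRev's actual scoring; the tokenizer, component labels, and similarity function are assumptions for illustration only.

```python
# Illustrative sketch: rank GUI components by lexical similarity to a user review.
# Stands in for a generic semantic-similarity ranking; not RepRev's implementation.
import math
import re
from collections import Counter

def tokens(text: str) -> Counter:
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank_components(review: str, components: list[str]) -> list[tuple[str, float]]:
    review_vec = tokens(review)
    scored = [(c, cosine(review_vec, tokens(c))) for c in components]
    return sorted(scored, key=lambda x: x[1], reverse=True)

if __name__ == "__main__":
    review = "The app crashes every time I tap the share photo button"
    components = ["Share photo", "Edit profile", "Open settings"]  # hypothetical labels from the GUI hierarchy
    print(rank_components(review, components))
```

In practice, a semantic (embedding-based) similarity would handle paraphrases better than this lexical measure; the ranking structure stays the same.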
Record-and-replay tools are indispensable for quality assurance of mobile applications. Due to their importance, an increasing number of tools are being developed to record and replay user interactions on Android. However, an empirical study of various existing tools in industrial settings has revealed a gap between the characteristics required by industry and the performance of publicly available record-and-replay tools, concluding that none of the evaluated tools is sufficient for industrial applications. In this paper, we present a record-and-replay tool called SARA that aims to bridge this gap and target wide adoption. Specifically, a dynamic instrumentation technique is used to accommodate rich sources of inputs in the application layer while satisfying various constraints imposed by industry. A self-replay mechanism is proposed to record richer information about user inputs for accurate replaying without degrading the user experience. In addition, an adaptive replay method is designed to enable replaying events on different devices with diverse screen sizes and OS versions. Through an evaluation on 53 highly popular industrial Android applications and 265 common usage scenarios, we demonstrate the effectiveness of SARA in recording and replaying rich sources of inputs on the same or different devices.
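One common way to make recorded touch events replayable across devices with different screen sizes, which is the goal of the adaptive replay method described above, is to store coordinates as screen-relative proportions and rescale them on the target device. The sketch below illustrates that general idea only; the event format and device resolutions are assumptions, not SARA's internal mechanism.

```python
# Minimal sketch: record a tap as screen-relative coordinates, then rescale for a target device.
# The event representation and resolutions are illustrative, not SARA's actual format.
from dataclasses import dataclass

@dataclass
class TapEvent:
    rel_x: float  # 0.0 .. 1.0, fraction of screen width
    rel_y: float  # 0.0 .. 1.0, fraction of screen height

def record_tap(x: int, y: int, width: int, height: int) -> TapEvent:
    return TapEvent(rel_x=x / width, rel_y=y / height)

def replay_tap(event: TapEvent, width: int, height: int) -> tuple[int, int]:
    return round(event.rel_x * width), round(event.rel_y * height)

if __name__ == "__main__":
    recorded = record_tap(540, 1800, width=1080, height=2340)   # tap captured on a 1080x2340 device
    print(replay_tap(recorded, width=1440, height=3200))        # replayed on a 1440x3200 device
```

Purely proportional scaling breaks down when layouts differ structurally between devices, which is why an adaptive method that also considers the GUI hierarchy is needed.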
The purpose of the General Data Protection Regulation (GDPR) is to provide improved privacy protection. If an app controls personal data from users, it must be GDPR compliant. However, GDPR lists general rules rather than exact step-by-step guidelines on how to develop an app that fulfills its requirements. Therefore, GDPR compliance violations may exist in existing apps, posing severe privacy threats to app users. In this paper, we take mobile health applications (mHealth apps) as a peephole to examine the status quo of GDPR compliance in Android apps. We first propose an automated system, named HPDROID, to bridge the semantic gap between the general rules of GDPR and app implementations by identifying the data practices declared in the app's privacy policy and the data-relevant behaviors in the app's code. Then, based on HPDROID, we detect three kinds of GDPR compliance violations: incomplete privacy policies, inconsistent data collection, and insecure data transmission. We perform an empirical evaluation of 796 mHealth apps. The results reveal that 189 (23.7%) of them do not provide complete privacy policies. Moreover, 59 apps collect sensitive data through different means, but 46 (77.9%) of them contain at least one inconsistent collection behavior. Even worse, among the 59 apps, only 8 try to ensure the transmission security of the collected data, yet all of them contain at least one encryption or SSL misuse. Our work exposes severe privacy issues to raise awareness of privacy protection among app users and developers.
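The inconsistency check between what a privacy policy declares and what the code actually collects can be viewed as a set comparison over data types. The following sketch illustrates that idea with hypothetical data-type labels; it is not HPDROID's implementation, which derives these sets through policy and code analysis.

```python
# Illustrative sketch: flag data types collected in code but not declared in the privacy policy.
# The data-type labels and inputs are hypothetical placeholders for analysis results.

def undeclared_collections(declared: set[str], observed: set[str]) -> set[str]:
    """Data types the app collects without declaring them in its privacy policy."""
    return observed - declared

if __name__ == "__main__":
    declared_in_policy = {"email", "device_id"}
    observed_in_code = {"email", "device_id", "location"}   # e.g., from static/dynamic analysis
    violations = undeclared_collections(declared_in_policy, observed_in_code)
    print("inconsistent collection:", violations)   # -> {'location'}
```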