Software systems suffer from gradual quality decay over time, and costs rise if no proactive countermeasures are taken. Quality controlling is the first step to avoid this cost trap. Continuous quality assessments enable the early identification of quality problems, when their removal is still inexpensive, and support adequate decisions by providing an integrated view of the current status of a software system. As a side effect, continuous and timely feedback enables developers and maintainers to improve their skills and thereby helps to avoid future quality defects. To make regular quality controlling feasible, it has to be highly automated, and assessment results need to be presented in an aggregated manner so as not to overwhelm users with too much data. This article gives an overview of tools that aim to solve these issues. As an example, we present the flexible, open-source toolkit ConQAT, which supports the creation of dashboards for quality controlling, and we report on its application.
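To illustrate the kind of aggregation such dashboards perform, here is a minimal Java sketch that rolls per-file findings up into a single traffic-light rating. It is not ConQAT's API; the class, metric, and thresholds are hypothetical and serve only to show the aggregation idea.

```java
import java.util.Map;

/**
 * Minimal illustration of metric aggregation for a quality dashboard.
 * Per-file clone coverage values are rolled up into a single
 * traffic-light rating. This is NOT ConQAT's API; names and thresholds
 * are hypothetical.
 */
public class DashboardAggregation {

    enum Rating { GREEN, YELLOW, RED }

    /** Aggregates per-file clone coverage (0..1) into one rating. */
    static Rating rateCloneCoverage(Map<String, Double> cloneCoverageByFile) {
        double sum = 0.0;
        for (double value : cloneCoverageByFile.values()) {
            sum += value;
        }
        double average = cloneCoverageByFile.isEmpty() ? 0.0 : sum / cloneCoverageByFile.size();
        // Hypothetical thresholds: below 10% is fine, above 25% needs action.
        if (average < 0.10) {
            return Rating.GREEN;
        } else if (average < 0.25) {
            return Rating.YELLOW;
        }
        return Rating.RED;
    }

    public static void main(String[] args) {
        Map<String, Double> coverage = Map.of(
                "Parser.java", 0.05,
                "Reporting.java", 0.32,
                "Utils.java", 0.12);
        System.out.println("Overall clone-coverage rating: " + rateCloneCoverage(coverage));
    }
}
```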
Manual software testing is tedious and costly as it involves significant human effort. Yet, it is still widely applied in industry and will remain so for the foreseeable future. Although there is arguably a great need to optimize manual testing processes, research focuses mostly on optimization techniques for automated tests. Accordingly, there is no precise understanding of the practices and processes of manual testing in industry, nor of its pitfalls and untapped optimization potential. To shed light on this issue, we conducted a survey among 38 testing professionals from 16 companies to investigate their manual testing processes and to identify potential for optimization. We synthesize guidelines for when optimization techniques from automated testing can be applied to manual testing. By means of case studies on two industrial software projects, we show that fault detection likelihood, test feedback time, and test creation effort can be improved by following our guidelines.
Due to their pivotal role in software engineering, considerable effort is spent on the quality assurance of software requirements specifications. As they are mainly written in natural language, relatively few means of automated quality assessment exist. However, we found that clone detection, a technique widely applied to source code, is promising for assessing one important quality aspect in an automated way, namely redundancy that stems from copy-and-paste operations. This paper describes a large-scale case study that applied clone detection to 28 requirements specifications with a total of 8,667 pages. We report on the amount of redundancy found in real-world specifications, discuss its nature as well as its consequences, and evaluate to what extent existing code clone detection approaches can be applied to assess the quality of requirements specifications in practice.
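As a rough illustration of how clone detection transfers from code to natural-language specifications, the following sketch reports word sequences that occur in more than one specification section. It is a toy example under simplifying assumptions (word-based windows, invented section IDs and text), not the detector used in the study.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy word-based clone detector for requirements text. Every window of
 * WINDOW consecutive normalized words is recorded; windows that occur in
 * more than one specification section are reported as clone candidates.
 * This is a simplified sketch of the general technique, not the detector
 * used in the study above.
 */
public class SpecCloneDetector {

    private static final int WINDOW = 10;

    public static void main(String[] args) {
        Map<String, String> sections = new HashMap<>();
        sections.put("UC-1", "The system shall log every failed login attempt "
                + "and notify the administrator within five minutes.");
        sections.put("UC-7", "The system shall log every failed login attempt "
                + "and notify the security officer within five minutes.");

        // Maps a normalized word window to the sections it occurs in.
        Map<String, List<String>> occurrences = new HashMap<>();
        for (Map.Entry<String, String> section : sections.entrySet()) {
            String[] words = section.getValue().toLowerCase().split("\\W+");
            for (int i = 0; i + WINDOW <= words.length; i++) {
                String window = String.join(" ", Arrays.copyOfRange(words, i, i + WINDOW));
                occurrences.computeIfAbsent(window, k -> new ArrayList<>()).add(section.getKey());
            }
        }

        // Report windows shared by at least two sections as clone candidates.
        for (Map.Entry<String, List<String>> e : occurrences.entrySet()) {
            if (e.getValue().size() > 1) {
                System.out.println("Clone candidate in " + e.getValue() + ": \"" + e.getKey() + "\"");
            }
        }
    }
}
```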
Code review is the manual assessment of source code by humans, mainly intended to identify defects and quality problems. Modern Code Review (MCR), a lightweight variant of the code inspections investigated since the 1970s, prevails today both in industry and in open-source software (OSS) systems. The objective of this paper is to increase our understanding of the practical benefits that the MCR process produces on reviewed source code. To that end, we empirically explore the problems fixed through MCR in OSS systems. We manually classified over 1,400 changes taking place in reviewed code from two OSS projects into a validated categorization scheme. Surprisingly, the results show that the types of changes due to the MCR process in OSS are strikingly similar to those in the industrial and academic systems reported in the literature, with a similar 75:25 ratio of maintainability-related to functional problems. We also reveal that 7–35% of review comments are discarded and that 10–22% of the changes are not triggered by an explicit review comment. Patterns emerged in the review data; investigating them revealed the technical factors that influence the number of changes due to the MCR process. We found that bug-fixing tasks lead to fewer changes, whereas tasks with more altered files and higher code churn lead to more changes. Contrary to intuition, the identity of the reviewer had no impact on the number of changes.
Automated tests play an important role in software evolution because they can rapidly detect faults introduced during changes. In practice, code-coverage metrics are often used as criteria to evaluate the effectiveness of test suites with a focus on regression faults. However, code coverage only expresses which portions of a system are executed by the tests, not how effective the tests actually are at detecting regression faults. Our goal was to evaluate the validity of code coverage as a measure of test effectiveness. To do so, we conducted an empirical study in which we applied an extreme mutation testing approach to analyze the tests of open-source projects written in Java. We assessed the ratio of pseudo-tested methods (those tested in such a way that faults would not be detected) to all covered methods and judged their impact on the software project. The results show that the ratio of pseudo-tested methods is acceptable for unit tests but not for system tests (which execute large portions of the whole system). Therefore, we conclude that the coverage metric is only a valid effectiveness indicator for unit tests.
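To make the notion of a pseudo-tested method concrete, the following self-contained Java sketch shows a weak test that still passes when the whole method body is replaced by a default return value, which is what extreme mutation does. The example is hand-written for illustration; it is not the mutation tooling used in the study, and all names are invented.

```java
import java.util.function.IntBinaryOperator;

/**
 * Illustration of extreme mutation and pseudo-tested methods.
 * In extreme mutation, the whole method body is replaced by a default
 * return value; if the tests still pass, the method is pseudo-tested.
 * This hand-written sketch mimics what a mutation tool does automatically;
 * it is not the tooling used in the study above.
 */
public class PseudoTestedExample {

    /** Original implementation. */
    static int maxOriginal(int a, int b) {
        return a > b ? a : b;
    }

    /** Extreme mutant: the body is replaced by "return 0". */
    static int maxMutant(int a, int b) {
        return 0;
    }

    /** A weak "test" that only checks the call does not throw. */
    static boolean weakTest(IntBinaryOperator max) {
        try {
            max.applyAsInt(3, 5);
            return true; // passes regardless of the returned value
        } catch (RuntimeException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        boolean passesOnOriginal = weakTest(PseudoTestedExample::maxOriginal);
        boolean passesOnMutant = weakTest(PseudoTestedExample::maxMutant);
        // Both runs pass: the method is fully covered, yet the injected fault
        // goes unnoticed, so the method counts as pseudo-tested.
        System.out.println("original: " + passesOnOriginal + ", mutant: " + passesOnMutant);
    }
}
```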
Teamscale is a software intelligence platform; that is, it creates transparency on code quality and the underlying software development process. This makes it possible for developers, testers, and managers to better understand and control the technical debt of their systems. In this paper, we give an overview of Teamscale and how the tool can be used in practice to control and lower technical debt in the long run. We explain which code analyses can be used to identify and address technical debt. Teamscale is available for free for research and teaching purposes at www.teamscale.io.
A significant amount of the source code in software systems consists of comments, i.e., parts of the code that are ignored by the compiler. Comments are a main source of system documentation and are hence key to understanding source code during development and maintenance. Although many software developers consider comments crucial for program understanding, existing approaches for software quality analysis ignore system commenting or make only quantitative claims. Hence, current quality analyses do not take a significant part of the software into account. In this work, we present a first detailed approach for the quality analysis and assessment of code comments. The approach provides a model for comment quality that is based on different comment categories. To categorize comments, we use machine learning on Java and C/C++ programs. The model comprises different quality aspects: by providing metrics tailored to specific categories, we show how the quality aspects of the model can be assessed. The validity of the metrics is evaluated with a survey among 16 experienced software developers, and a case study demonstrates the relevance of the metrics in practice.
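As a rough illustration of comment categorization, the sketch below assigns comments to a few categories using simple keyword heuristics. The paper uses a machine-learning classifier; the category names and rules here are simplified assumptions chosen only to show what "categorizing comments" means.

```java
import java.util.List;

/**
 * Toy comment categorizer. The paper trains a machine-learning classifier;
 * this sketch uses simple keyword heuristics instead and covers only a few
 * illustrative categories, so names and rules are assumptions.
 */
public class CommentCategorizer {

    enum Category { COPYRIGHT, TASK, COMMENTED_OUT_CODE, DOCUMENTATION, INLINE }

    static Category categorize(String comment) {
        String c = comment.toLowerCase();
        if (c.contains("copyright") || c.contains("license")) {
            return Category.COPYRIGHT;
        }
        if (c.contains("todo") || c.contains("fixme") || c.contains("hack")) {
            return Category.TASK;
        }
        // Crude heuristic: comments ending in ';' or containing '()' often
        // hold commented-out code.
        if (c.trim().endsWith(";") || c.contains("()")) {
            return Category.COMMENTED_OUT_CODE;
        }
        if (comment.startsWith("/**")) {
            return Category.DOCUMENTATION;
        }
        return Category.INLINE;
    }

    public static void main(String[] args) {
        List<String> comments = List.of(
                "// Copyright 2024 Example Corp.",
                "// TODO: handle empty input",
                "// result = legacyParser.parse(input);",
                "/** Returns the maximum of the two arguments. */",
                "// fall through intended");
        for (String comment : comments) {
            System.out.println(categorize(comment) + " <- " + comment);
        }
    }
}
```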
Regression testing analyzes whether software maintenance has inadvertently broken existing functionality. Since it is costly, especially for manual testing, it is typically limited to a subset of test cases. Since the impact analysis of code modifications on test cases is far from trivial for real-world software, regression test selection is hard. However, if it misses affected test cases, bugs may remain unnoticed. In response, the research community has proposed numerous test selection approaches. Regression test selection is especially relevant for manual tests, since their execution costs limit the number of tests that can be executed in practice. However, evaluations of existing work focus on automated tests; their applicability to manual tests is thus unclear. We present an industrial case study that demonstrates the challenges that regression test selection techniques face when applied to manual system tests. Furthermore, we sketch how, given these challenges, manual regression test selection can be improved.
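The following sketch illustrates the basic selection idea discussed above: given a (possibly coarse) mapping from manual test cases to the code areas they touch, select the tests whose areas overlap the components changed since the last test run. The mapping granularity, test names, and component names are illustrative assumptions, not the approach from the case study.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/**
 * Minimal change-based regression test selection. Each (manual) test case is
 * mapped to the code components it exercises; a test is selected when its
 * components intersect the components changed since the last run. Names and
 * granularity are illustrative assumptions, not the technique from the case
 * study above.
 */
public class RegressionTestSelection {

    static Set<String> selectTests(Map<String, Set<String>> coveredComponentsByTest,
                                   Set<String> changedComponents) {
        Set<String> selected = new LinkedHashSet<>();
        for (Map.Entry<String, Set<String>> entry : coveredComponentsByTest.entrySet()) {
            for (String component : entry.getValue()) {
                if (changedComponents.contains(component)) {
                    selected.add(entry.getKey());
                    break;
                }
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> coverage = new HashMap<>();
        coverage.put("TC-checkout", Set.of("cart", "payment"));
        coverage.put("TC-search", Set.of("search", "catalog"));
        coverage.put("TC-login", Set.of("authentication"));

        Set<String> changed = Set.of("payment", "catalog");
        System.out.println("Tests to re-run: " + selectTests(coverage, changed));
    }
}
```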