Many coupling measures have been proposed in the context of object-oriented (OO) systems. In addition, due to the numerous dependencies present in OO systems, several studies have highlighted the complexity of using dependency analysis to perform impact analysis. An alternative is to investigate the construction of probabilistic decision models based on coupling measurement to support impact analysis. In addition to providing an ordering of classes in which ripple effects are more likely, such an approach is simple and can be automated. In our investigation, we perform a thorough analysis of a commercial C++ system for which change data has been collected over several years. We identify the coupling dimensions that seem to be significantly related to ripple effects and use these dimensions to rank classes according to their probability of containing ripple effects. We then assess the expected effectiveness of such decision models.
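As an illustration of the general idea (not the paper's actual model or measurement suite), the following sketch ranks classes by a predicted ripple-effect probability obtained by fitting a logistic model to coupling measures; the feature names, class names, and numbers are all invented:

```python
# Hypothetical sketch: rank classes by predicted probability of ripple
# effects, using coupling measures as predictors. All names and values
# below are illustrative, not the paper's data.
from sklearn.linear_model import LogisticRegression

# One row per class: [import_coupling, export_coupling, friend_coupling]
X_train = [[12, 3, 0], [2, 8, 1], [25, 14, 2], [1, 0, 0]]
# 1 = class contained a ripple effect in past changes, 0 = it did not.
y_train = [1, 0, 1, 0]

model = LogisticRegression().fit(X_train, y_train)

classes = ["Parser", "Scheduler", "Logger", "Config"]
X_new = [[18, 5, 1], [3, 2, 0], [9, 11, 0], [0, 1, 0]]
probs = model.predict_proba(X_new)[:, 1]  # P(ripple effect) per class

# Inspect classes in decreasing order of ripple-effect probability.
for name, p in sorted(zip(classes, probs), key=lambda t: -t[1]):
    print(f"{name}: {p:.2f}")
```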
The use of Unified Modeling Language (UML) analysis/design models on large projects leads to a large number of interdependent UML diagrams. As software systems evolve, those diagrams undergo changes to, for instance, correct errors or address changes in the requirements. Those changes can in turn lead to subsequent changes to other elements in the UML diagrams. Impact analysis is then defined as the process of identifying the potential consequences (side effects) of a change, and estimating what needs to be modified to accomplish it. In this article, we propose a UML model-based approach to impact analysis that can be applied before any implementation of the changes, thus enabling early decision making and change planning. We first verify that the UML diagrams are consistent (consistency check). Then changes between two different versions of a UML model are identified according to a change taxonomy, and model elements that are directly or indirectly impacted by those changes (i.e., may undergo changes) are determined using formally defined impact analysis rules (written in the Object Constraint Language, OCL). A measure of distance between a changed element and potentially impacted elements is also proposed to prioritize the results of impact analysis according to their likelihood of occurrence. We also present a prototype tool that provides automated support for our impact analysis strategy, which we then apply to a case study to validate both the implementation and the methodology.
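A minimal sketch of the distance idea, assuming impacted elements have already been identified and that dependencies between model elements are available as a graph (the element names and the breadth-first formulation below are illustrative, not the paper's formal OCL rules):

```python
# Illustrative sketch: compute the distance between a changed model element
# and potentially impacted elements by breadth-first search over model
# dependencies, then use the distance to prioritize impact analysis results.
from collections import deque

# Hypothetical UML model dependencies: element -> elements that depend on it.
dependents = {
    "Order": ["OrderController", "Invoice"],
    "OrderController": ["OrderView"],
    "Invoice": ["InvoiceView"],
    "OrderView": [],
    "InvoiceView": [],
}

def impact_distances(changed: str) -> dict:
    """Distance 1 = directly impacted, >1 = indirectly impacted."""
    dist = {changed: 0}
    queue = deque([changed])
    while queue:
        elem = queue.popleft()
        for dep in dependents.get(elem, []):
            if dep not in dist:
                dist[dep] = dist[elem] + 1
                queue.append(dep)
    dist.pop(changed)  # the changed element itself is not "impacted"
    return dist

# Shorter distances are assumed more likely to require an actual change.
print(sorted(impact_distances("Order").items(), key=lambda kv: kv[1]))
```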
This article is a first attempt towards a comprehensive, systematic methodology for class interface testing in the context of client/server relationships. The proposed approach builds on and combines existing techniques. It first consists of selecting a subset of the method sequences defined for the class testing of the client class, based on an analysis of the interactions between the client and server methods. Coupling information is then used to determine the conditions, i.e., values for parameters and data members, under which the selected client method sequences are to be executed so as to exercise these interactions. The approach is illustrated by means of an abstract example, and its cost-effectiveness is evaluated through a case study.
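The following sketch illustrates only the first step, under the assumption that we already know which server methods each client method invokes; all method names and the simple retention rule (keep a sequence if any of its methods interacts with the server) are hypothetical simplifications of the paper's analysis:

```python
# Hypothetical sketch of the sequence-selection step: a client method
# sequence is retained for interface testing only if at least one of its
# methods interacts with the server class.
calls_server = {
    "open": {"Server.connect"},
    "read": {"Server.fetch"},
    "close": set(),
    "reset": set(),
}

# Method sequences defined for class testing of the client class.
client_sequences = [
    ["open", "read", "close"],
    ["open", "close"],
    ["reset"],
]

selected = [
    seq for seq in client_sequences
    if any(calls_server.get(m) for m in seq)
]
print(selected)  # [['open', 'read', 'close'], ['open', 'close']]
```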
This technical report details a semi-automated framework for the reverse engineering and testing of access control (AC) policies for web-based applications. In practice, AC specifications are often missing or poorly documented, leading to AC vulnerabilities. Our goal is to learn and recover AC policies from their implementation, and to assess them to find AC issues. Built on top of a suite of security tools, our framework automatically explores a system under test, mines domain input specifications from access request logs, and then generates and executes more access requests using combinatorial test generation. We apply machine learning on the obtained data to characterise the attributes that influence access control and to infer policies. Finally, the inferred policies are used for detecting AC issues, whether vulnerabilities or implementation errors. We have evaluated our framework on three open-source applications with respect to correctness and completeness. The results are very promising in terms of the quality of the inferred policies: more than 94% of them are correct with respect to the implemented AC mechanisms. The remaining incorrect policies are mainly due to our unrefined permission classification. Moreover, a careful analysis of these policies has revealed 92 vulnerabilities, many of which are new.
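A hedged sketch of the policy-inference step: learning which request attributes drive access decisions from logged (attributes, allow/deny) pairs with a decision tree, whose branches can then be read off as candidate rules. The attribute names, encodings, and the use of scikit-learn are assumptions; the framework's actual tooling may differ:

```python
# Sketch: infer candidate AC rules from logged access requests using a
# decision tree. Attributes and labels below are invented for illustration.
from sklearn.tree import DecisionTreeClassifier, export_text

# Encoded request attributes: [role (0=guest,1=user,2=admin), method (0=GET,1=POST)]
X = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 0], [2, 1]]
y = [1, 0, 1, 0, 1, 1]  # 1 = access granted, 0 = denied

tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(export_text(tree, feature_names=["role", "method"]))
# The printed branches approximate the implemented AC policy and can be
# compared against the intended policy to flag discrepancies.
```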
Current cost estimation techniques have a number of drawbacks. For example, developing algorithmic models requires extensive past project data. Also, off-the-shelf models have been found to be difficult to calibrate, yet inaccurate without calibration. Informal approaches based on experienced estimators depend on those estimators' availability, are not easily repeatable, and are not much more accurate than algorithmic techniques. We present a method for cost estimation that combines aspects of the algorithmic and experiential approaches, referred to as COBRA (COst estimation, Benchmarking, and Risk Assessment). We find through a case study that cost estimates using COBRA show an average absolute relative error (ARE) of 0.09. Although we do not have the room to describe the benchmarking and risk assessment parts, the reader will find detailed information in (Briand et al., 1997).
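An illustrative sketch of the experiential-plus-algorithmic idea (simplified; the factor names, distributions, and numbers are invented, not the paper's model): experts quantify each cost factor's overhead as a triangular distribution, and Monte Carlo simulation yields a distribution of total cost on top of a nominal, productivity-based estimate:

```python
# Sketch, assuming expert-quantified triangular overheads per cost factor.
import random

# (min%, most likely%, max%) overhead per cost factor, from expert estimates.
cost_factors = {
    "requirements volatility": (0, 10, 30),
    "team inexperience": (5, 15, 25),
}

nominal_cost = 100.0  # e.g., person-days from size/productivity data

samples = []
for _ in range(10_000):
    overhead = sum(random.triangular(lo, hi, mode)
                   for lo, mode, hi in cost_factors.values())
    samples.append(nominal_cost * (1 + overhead / 100))

samples.sort()
# The resulting distribution supports both point estimates and risk
# statements (e.g., probability of exceeding a given budget).
print("median:", samples[len(samples) // 2])
print("75th percentile:", samples[int(len(samples) * 0.75)])
```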
Data quality is crucial in modern software systems, such as data-driven decision support systems. However, data quality is affected by data anomalies, which are instances that deviate from most of the data. These anomalies affect the reliability and trustworthiness of software systems, and may propagate and cause further issues. Although many anomaly detection approaches have been proposed, they mainly focus on numerical data. Moreover, the few approaches targeting anomaly detection for categorical data do not yield consistent results across datasets. In this paper, we propose a novel anomaly detection approach for categorical data named LAFF-AD (LAFF-based Anomaly Detection), which takes advantage of the learning ability of a state-of-the-art form-filling tool (LAFF) to perform value inference on suspicious data. LAFF-AD runs a variant of LAFF that predicts the possible values of a suspicious categorical field in a given instance. LAFF-AD then compares the output of LAFF to the values recorded in that instance, and uses a heuristic-based strategy to detect categorical data anomalies. We evaluated LAFF-AD by assessing its effectiveness and efficiency on six datasets. Our experimental results show that LAFF-AD can accurately detect a wide range of data anomalies, with recall values between 0.6 and 1 and a precision value of at least 0.808. Furthermore, LAFF-AD is efficient, taking at most 7000 s and 735 ms to perform training and prediction, respectively.
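A minimal sketch of the detection heuristic, under the assumption that the form-filling model exposes a ranked list of predicted values for a field; `predict_values` below is a hypothetical stand-in for LAFF's inference, and the top-k flagging rule is one plausible instantiation of the paper's heuristic-based strategy:

```python
# Sketch: flag a categorical field as anomalous if its recorded value is
# absent from the model's top-k predictions for that instance.
def predict_values(instance: dict, field: str) -> list:
    # Placeholder for the form-filling model: a real model would rank
    # candidate values from the instance's other fields; here we return a
    # fixed ranking purely for illustration.
    return ["engineer", "manager", "analyst"]

def is_anomalous(instance: dict, field: str, top_k: int = 2) -> bool:
    """True if the recorded value is not among the top-k predicted values."""
    predicted = predict_values(instance, field)[:top_k]
    return instance[field] not in predicted

record = {"department": "R&D", "seniority": "junior", "role": "director"}
print(is_anomalous(record, "role"))  # True: 'director' not in top-2 predictions
```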