DATA MINING FOR PREDICTORS OF SOFTWARE QUALITY
1999
"Knowledge discovery in data bases" (KDD) for software engineering is a process for finding useful information in the large volumes of data that are a byproduct of software development, such as data bases for configuration management and for problem reporting. This paper presents guidelines for extracting innovative process metrics from these commonly available data bases. This paper also adapts the Classification And Regression Trees algorithm, CART, to the KDD process for software engineering data. To our knowledge, this algorithm has not been used previously for empirical software quality modeling. In particular, we present an innovative way to control the balance between misclassification rates. A KDD case study of a very large legacy telecommunications software system found that variables derived from source code, configuration management transactions, and problem reporting transactions can be useful predictors of software quality. The KDD process discovered that for this software development environment, out of forty software attributes, only a few of the predictor variables were significant. This resulted in a model that predicts whether modules are likely to have faults discovered by customers. Software developers need such predictions early in development to target software enhancement techniques to the modules that need improvement the most.
Keywords:
- Software construction
- Data mining
- Computer science
- Software peer review
- Software Engineering Process Group
- Package development process
- Backporting
- Software quality analyst
- Data science
- Software sizing
- Goal-Driven Software Development Process
- Software analytics
- Software system
- Machine learning
- Software quality
- Software metric
- Artificial intelligence
- Software verification and validation
- Software development