The State of Empirical Evaluation in Static Feature Location
2019
Feature location (FL) is the task of finding the source code that implements a specific, user-observable functionality in a software system. It plays a key role in many software maintenance tasks and a wide variety of Feature Location Techniques (FLTs), which rely on source code structure or textual analysis, have been proposed by researchers. As FLTs evolve and more novel FLTs are introduced, it is important to perform comparison studies to investigate “Which are the best FLTs?” However, an initial reading of the literature suggests that performing such comparisons would be an arduous process, based on the large number of techniques to be compared, the heterogeneous nature of the empirical designs, and the lack of transparency in the literature. This article presents a systematic review of 170 FLT articles, published between the years 2000 and 2015. Results of the systematic review indicate that 95% of the articles studied are directed towards novelty, in that they propose a novel FLT. Sixty-nine percent of these novel FLTs are evaluated through standard empirical methods but, of those, only 9% use baseline technique(s) in their evaluations to allow cross comparison with other techniques. The heterogeneity of empirical evaluation is also clearly apparent: altogether, over 60 different FLT evaluation metrics are used across the 170 articles, 272 subject systems have been used, and 235 different benchmarks employed. The review also identifies numerous user input formats as contributing to the heterogeneity. Analysis of the existing research also suggests that only 27% of the FLTs presented might be reproduced from the published material. These findings suggest that comparison across the existing body of FLT evaluations is very difficult. We conclude by providing guidelines for empirical evaluation of FLTs that may ultimately help to standardise empirical research in the field, cognisant of FLTs with different goals, leveraging common practices in existing empirical evaluations and allied with rationalisations. This is seen as a step towards standardising evaluation in the field, thus facilitating comparison across FLTs.
Keywords:
- Correction
- Source
- Cite
- Save
- Machine Reading By IdeaReader
114
References
17
Citations
NaN
KQI