Will Data Influence the Experiment Results?: A Replication Study of Automatic Identification of Decisions

2021 
Decisions are an important type of artifact in software development and maintenance, yet they are often poorly documented in projects because of limited human resources and budget. Consequently, many studies use automatic approaches to identify decisions in textual artifacts such as mailing lists and issue tracking systems. In this paper, we present a replication study of our previous work (EASE2020), which conducted experiments to automatically identify decisions from the Hibernate developer mailing list. In addition, we used different datasets in the experiments to explore how properties of the dataset (i.e., the quality of positive samples, the choice of negative samples, and the size of the dataset) affect the classification results of decisions. The results show that (1) improving the quality of positive samples in the dataset can appreciably improve the classification results; (2) different negative samples in the dataset affect the classification results; and (3) increasing the dataset size improves the classification results until the size reaches 1200.