Joint feature representation learning and progressive distribution matching for cross-project defect prediction

2021 
Abstract Context: Cross-Project Defect Prediction (CPDP) aims to leverage the knowledge from label-rich source software projects to promote tasks in a label-poor target software project. Existing CPDP methods have two major flaws. One is that previous CPDP methods only consider global feature representation and ignores local relationship between instances in the same category from different projects, resulting in ambiguous predictions near the decision boundary. The other one is that CPDP methods based on pseudo-labels assume that the conditional distribution can be well matched at one stroke, when instances of target project are correctly annotated pseudo labels. However, due to the great gap between projects, the pseudo-labels seriously deviate from the real labels. Objective: To address above issues, this paper proposed a novel CPDP method named Joint Feature Representation with Double Marginalized Denoising Autoencoders (DMDA_JFR). Method: Our method mainly includes two parts: joint feature representation learning and progressive distribution matching. We utilize two novel autoencoders to jointly learn the global and local feature representations simultaneously. To achieve progressive distribution matching, we introduce a repetitious pseudo-labels strategy, which makes it possible that distributions are matched after each stack layer learning rather than in one stroke. Results: The effectiveness of the proposed method was evaluated through experiments conducted on 10 open-source projects, including 29 software releases from PROMISE repository. Overall, experimental results show that our proposed method outperformed several state-of-the-art baseline CPDP methods. Conclusions: It can be concluded that (1) joint deep representations are promising for CPDP compared with only considering global feature representation methods, (2) progressive distribution matching is more effective for adapting probability distributions in CPDP compared with existing CPDP methods based on pseudo-labels.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    65
    References
    1
    Citations
    NaN
    KQI
    []