Automated data function extraction from textual requirements by leveraging semi-supervised CRF and language model

2021 
Abstract Context: Function Point Analysis (FPA) provides an objective, comparative measure for size estimation in the early stage of software development. When practicing FPA, analysts typically abide by the following steps: data function (DF) extraction, transactional function extraction, function type classification and adjustment factor determination. However, due to lack of approach and tool support, these steps are usually conduct by human efforts in practice. Related approaches can hardly be applied in the FPA due to the following three challenges, i.e., FPA rule-driven extraction, domain-specific parsing, and expensive labeled resources. Objective: In this paper, we aim to automate the extraction of DFs, which is the starting and fundamental step in FPA. Method: We propose an automated approach named DEX to extract data functions from textual requirements. Specifically, DEX introduces the popularly-used conditional random field (CRF) model to predict the boundary of a data function. Besides, DEX employs the bootstrapping-based algorithm and DF-oriented language model to further boost the performance. Results: We evaluate DEX from two aspects: evaluation on a real industrial dataset and a manual review by domain experts. The evaluation on the real industrial dataset shows that DEX could achieve 80% precision, 84% recall, and 82% F1, and outperforms three state-of-the-art baselines. The expert review suggests that DEX could increase 16% precision and 13% recall, compared with those produced by engineers. Conclusion: DEX could achieve promising results under a small number of labeled requirements and outperform the state-of-the-art approaches. Moreover, DEX could help engineers produce more accurate and complete DFs in the industrial environment.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    49
    References
    0
    Citations
    NaN
    KQI
    []