Extracting Knowledge from Open Source Projects to Improve Program Security

2018 
Open source repositories contain a wealth of unstructured and unlabeled data from which useful knowledge can be extracted. This knowledge can be applied in a wide range of applications, such as discovering how programmers improve their programs over time and finding patterns to detect and mitigate vulnerabilities. In this work, we propose to use text mining and machine learning to extract knowledge from open source code in order to categorize and structure source code. By mining a subset (over 600,000 Java files) of a 2011 dataset that contains over 70,000 open source projects, we present a case study showing that useful patterns can be extracted from source code and that these patterns can be used to create a recommender system to help programmers avoid unsafe practices. We demonstrate the utility of our proposed techniques by applying them to the detection of SQL injection.
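To make the idea of mining source code for unsafe patterns concrete, the following is a minimal sketch in Python. It is not the method described in this work: it swaps in a simple regular-expression heuristic in place of text mining and machine learning. It flags Java lines in which a string literal containing a SQL keyword is concatenated with a variable, a common sign of a query open to SQL injection. The names `SQL_CONCAT`, `flag_unsafe_queries`, and `sample` are hypothetical and chosen only for illustration.

```python
import re
from typing import List

# Hypothetical heuristic (an assumption, not the paper's technique):
# a string literal containing a SQL keyword, followed by `+ identifier`.
SQL_CONCAT = re.compile(
    r'"[^"]*\b(?:SELECT|INSERT|UPDATE|DELETE)\b[^"]*"\s*\+\s*[A-Za-z_]\w*',
    re.IGNORECASE,
)


def flag_unsafe_queries(java_source: str) -> List[int]:
    """Return 1-based line numbers that look like concatenated SQL queries."""
    return [
        number
        for number, line in enumerate(java_source.splitlines(), start=1)
        if SQL_CONCAT.search(line)
    ]


# Illustrative Java snippet: line 1 concatenates user input into the query,
# lines 2-3 use a parameterized PreparedStatement instead.
sample = '''String q = "SELECT * FROM users WHERE name = '" + userName + "'";
PreparedStatement ps = conn.prepareStatement("SELECT * FROM users WHERE name = ?");
ps.setString(1, userName);'''

flagged = flag_unsafe_queries(sample)
print(flagged)
```

A recommender of the kind the abstract describes could, for a flagged line, suggest the parameterized form shown in the second and third lines of the sample. A line-level regular expression misses queries built across several statements, which is one reason learned patterns may be preferable to hand-written rules.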