Text Mining in Program Code

2009 
Searching a text database for frequently occurring fragments is a well-known problem. Program code, such as C++ source or machine code for embedded systems, is a special kind of text. Filtering out duplicates in large software projects leads to more understandable programs and helps avoid mistakes when reengineering them. On embedded systems, the size of the machine code is an important concern, and avoiding duplicates keeps programs small. Program execution can also be made fast when frequently used duplicates are implemented in hardware. The most successful approaches to finding code duplicates represent the data and control flow of the program as graphs and apply graph mining algorithms. These approaches find more duplicates than methods that apply suffix tries to the code or that use fingerprinting, in which some special form of each program part is computed.
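
To make the fingerprinting baseline mentioned above concrete, the following is a minimal sketch, not the method of this work. It assumes a fingerprint is a SHA-1 hash over a fixed-size window of whitespace-normalized lines; the function names, the window size, and the sample instructions are all hypothetical and chosen only for illustration.

```python
import hashlib
from collections import defaultdict


def normalize(line: str) -> str:
    """Collapse whitespace so formatting differences do not hide duplicates."""
    return " ".join(line.split())


def find_duplicate_windows(lines, window=3):
    """Return sorted lists of start indices for windows whose fingerprints repeat."""
    cleaned = [normalize(line) for line in lines]
    seen = defaultdict(list)
    for start in range(len(cleaned) - window + 1):
        chunk = "\n".join(cleaned[start:start + window])
        # Assumption: the fingerprint is simply a hash of the normalized text.
        fingerprint = hashlib.sha1(chunk.encode("utf-8")).hexdigest()
        seen[fingerprint].append(start)
    return sorted(starts for starts in seen.values() if len(starts) > 1)


# Hypothetical instruction sequence; lines 4-6 repeat lines 0-2.
code = [
    "a = load(x)",
    "b = a + 1",
    "store(b)",
    "call(foo)",
    "a  =  load(x)",
    "b = a + 1",
    "store(b)",
]
duplicates = find_duplicate_windows(code, window=3)
print(duplicates)
```

Because this sketch compares text windows exactly, it would not match two fragments that differ only in the order of independent instructions or in variable names. That limitation is one plausible reason why approaches working on data and control flow graphs can report more duplicates.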