Text Mining in Program Code
2009
Searching a text database for frequently occurring pieces is a well-known problem. Program code, such as C++ source or machine code for embedded systems, is a special kind of text. Filtering out duplicates in large software projects makes programs easier to understand and helps avoid mistakes when re-engineering them. On embedded systems, the size of the machine code is an important issue: to keep programs small, duplicates must be avoided. Fast program execution can be achieved when frequently used duplicates are encoded in hardware. The most successful approaches for finding code duplicates combine graphs that represent the data and control flow of the program with graph mining algorithms. These approaches find more duplicates than those that apply suffix tries to the code or use fingerprinting, where some special form of the program parts is computed.
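To make the fingerprinting baseline mentioned above concrete, here is a minimal sketch. It illustrates only that baseline, not the graph-based method the abstract favors. It assumes a hypothetical ARM-like instruction syntax, a hypothetical normalization that maps register names (`r0`, `r1`, ...) to `REG` and `#`-prefixed immediates to `IMM`, and a fixed window size `k`. The function names `normalize` and `find_duplicates` are illustrative.

```python
from collections import defaultdict
import hashlib
import re


def normalize(instr: str) -> str:
    """Map an instruction to a 'special form' (hypothetical rules):
    register names become REG, '#'-prefixed immediates become IMM."""
    s = instr.strip().lower()
    s = re.sub(r"\br\d+\b", "REG", s)   # e.g. r1 -> REG
    s = re.sub(r"#-?\d+", "IMM", s)     # e.g. #4 -> IMM
    return " ".join(s.split())          # collapse whitespace


def find_duplicates(code: list[str], k: int = 3) -> dict[str, list[int]]:
    """Fingerprint every window of k consecutive normalized instructions
    and return the fingerprints that occur at more than one start index."""
    norm = [normalize(i) for i in code]
    buckets: dict[str, list[int]] = defaultdict(list)
    for start in range(len(norm) - k + 1):
        window = "\n".join(norm[start:start + k])
        fp = hashlib.sha1(window.encode()).hexdigest()
        buckets[fp].append(start)
    # Hash collisions are ignored in this sketch; a real tool would
    # compare the normalized windows to confirm each match.
    return {fp: starts for fp, starts in buckets.items() if len(starts) > 1}


if __name__ == "__main__":
    program = [
        "ldr r1, [r0]",
        "add r1, r1, #4",
        "str r1, [r0]",
        "mov r5, #0",
        "ldr r2, [r3]",
        "add r2, r2, #8",
        "str r2, [r3]",
    ]
    for starts in find_duplicates(program).values():
        print(starts)  # [0, 4]
```

Because this sketch only compares windows of consecutive instructions, it cannot match two fragments whose independent instructions appear in a different order. Graphs of the data and control flow do not depend on that textual order.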