Cross-architecture binary code similarity detection has been widely used in vulnerability discovery, reverse engineering, and patch detection. Identifying the compilation information of a binary file helps improve the accuracy of binary code similarity detection. This compilation information includes the compilation architecture, compiler, optimization option, and obfuscation strategy. For the compilation architecture, we build a compilation architecture feature library based on the ELF header information of the binary file; for the compiler, we use Linux system commands; for the optimization option and obfuscation strategy, we extract 70 static features from the function-level assembly code of the binary file and build a genetic neural network model. In addition, we set up five experimental tasks to examine how the compilation architecture and compiler affect the model's identification of optimization options and obfuscation strategies. The experimental results show that our binary file compilation information identification model achieves 100% accuracy in identifying both compilation architectures and compilers. The F-Score can reach 89.46% for optimization option identification, 88.74% for obfuscation strategy identification, and 84.28% for simultaneous identification of optimization options and obfuscation strategies, which is significantly better than previous work.
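As an illustration of identifying the compilation architecture from ELF header information, the following minimal sketch reads the `EI_CLASS`, `EI_DATA`, and `e_machine` fields defined by the ELF format. It is not the feature library described in the abstract: the function name `identify_architecture` and the small `E_MACHINE_NAMES` table are assumptions made for this example, and the table covers only a handful of machine codes.

```python
import struct

# Illustrative subset of e_machine codes from the ELF specification.
# The table itself is an assumption of this sketch, not the paper's library.
E_MACHINE_NAMES = {3: "x86", 8: "MIPS", 40: "ARM", 62: "x86-64", 183: "AArch64"}


def identify_architecture(header: bytes) -> str:
    """Describe the target architecture given the first bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")

    # e_ident[4] is EI_CLASS (1 = 32-bit, 2 = 64-bit).
    bits = {1: 32, 2: 64}.get(header[4])
    # e_ident[5] is EI_DATA (1 = little-endian, 2 = big-endian).
    byte_order = {1: "<", 2: ">"}.get(header[5])
    if bits is None or byte_order is None:
        raise ValueError("invalid EI_CLASS or EI_DATA")

    # e_machine is a 16-bit field at offset 18, after e_ident and e_type.
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    name = E_MACHINE_NAMES.get(machine, f"unknown({machine})")
    endianness = "little" if byte_order == "<" else "big"
    return f"{name} ({bits}-bit, {endianness}-endian)"
```

In practice the header bytes would come from the file itself, for example `open(path, "rb").read(20)`, where `path` is whatever binary is being inspected.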
Cross-architecture binary code similarity detection plays an important role in many security domains. To address the low accuracy and poor scalability of existing cross-architecture detection technologies, we propose Optir-SBERT, the first technology to detect cross-architecture binary code similarity based on optimized LLVM IR. We also design a new, more diverse data set, BinaryIR, which provides a benchmark for subsequent research based on LLVM IR. In cross-architecture binary code similarity detection, the accuracy of Optir-SBERT reaches 94.38%, with optimization contributing 3.99%. In vulnerability detection, the average accuracy of Optir-SBERT reaches 93.9%, with optimization contributing 7%. These results are better than those of existing state-of-the-art (SOTA) cross-architecture detection technologies. To improve the efficiency of vulnerability detection in realistic scenarios, we introduce a file-level vulnerability identification mechanism on top of Optir-SBERT. The resulting model, Optir-SBERT-F, saves 45.36% of the detection time at the cost of a slight decrease in the detection F value, which greatly improves the efficiency of vulnerability detection.
Binary function similarity analysis evaluates the similarity of functions at the binary level to aid program analysis and is popular in many fields, such as vulnerability detection, binary clone detection, and malware detection. Graph-based methods perform relatively well in practice, but they currently cannot capture similarity in terms of the graph position distribution and lose information during graph processing, which leads to low accuracy. This paper presents PDM, a graph-based method that increases the accuracy of binary function similarity detection by considering position distribution information. First, an enhanced Attributed Control Flow Graph (ACFG+) of a function is constructed from its control flow graph, assisted by the instruction embedding technique and data flow analysis. Then, ACFG+ is fed to a graph embedding model that uses the CapsGNN and DiffPool mechanisms to enrich the information in graph processing by considering the position distribution. The model outputs the corresponding embedding vector, and the similarity between different function embeddings is calculated using the cosine distance. Similarity detection is carried out in a Siamese network. Experiments show that, compared with VulSeeker and PalmTree+VulSeeker, PDM stably obtains three times and two times higher accuracy, respectively, in binary function similarity detection and detects up to six times more results in vulnerability detection. Compared with state-of-the-art tools, PDM achieves Top-5, Top-10, and Top-20 ranking results comparable to those of BinDiff, Diaphora, and Kam1n0 and has significant advantages in the Top-50, Top-100, and Top-200 detection results.
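The comparison of function embeddings by cosine similarity, and the Top-k ranking used in the evaluation, can be sketched as follows. This covers only the final comparison step and assumes the embedding vectors have already been produced by some model; the helper names `cosine_similarity` and `rank_candidates` are hypothetical and are not part of PDM.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_candidates(query, candidates, k):
    """Return the names of the k candidate functions most similar to query.

    candidates maps a function name to its embedding vector.
    """
    ranked = sorted(
        candidates.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]
```

A Top-k result counts as a hit when the true match appears in the list returned by `rank_candidates`; the cosine distance is one minus the similarity computed here.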
In real-world combat scenarios, decision-making serves as the central dynamic between conflicting parties. Introducing innovative technical means of decision support in combat situations is therefore of the utmost importance. Reinforcement learning algorithms can optimize strategies through ongoing interaction with the environment, ultimately deriving the most effective action plan. This optimized strategy aims to improve command effectiveness by providing better support for commanders in making combat decisions. Our study focuses on historical war incidents in the Russian-Ukrainian conflict. We develop an intelligent decision-making system using the classical Q-learning algorithm in reinforcement learning. Taking the battlefield environment into account, the system uses input data to assess the basic condition of the troops of both factions. By evaluating the Q-values of each action in the resulting Q-table, the system provides training and decision-making suggestions. Ultimately, the goal is to provide valuable support to leaders to help them make informed combat decisions.
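For reference, the classical tabular Q-learning update that such a system builds on is Q(s, a) ← Q(s, a) + α [r + γ max Q(s', a') − Q(s, a)], with the maximum taken over the actions a' available in the next state s'. The sketch below is a generic illustration with abstract placeholder states and actions; the function names, the learning rate `alpha`, and the discount factor `gamma` are assumptions of this example and do not reflect the state encoding or parameters of the system described above.

```python
from collections import defaultdict


def new_q_table():
    """Create an empty Q-table in which unseen state-action pairs start at 0.0."""
    return defaultdict(lambda: defaultdict(float))


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update in place and return the new Q(state, action)."""
    best_next = max(q[next_state][a] for a in actions)
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


def best_action(q, state, actions):
    """Greedy suggestion: the action with the highest Q-value in this state."""
    return max(actions, key=lambda a: q[state][a])
```

After enough updates, reading `best_action` for a given state corresponds to the step in which the Q-values of each action in the Q-table are evaluated to produce a suggestion.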
The intermediate representation has a natural advantage in cross-architecture binary code similarity detection: it can greatly reduce the amount of data required to train a machine learning model and improve the versatility and scalability of the model. Optimizing the intermediate representation can resolve the binary differences caused by different compilation architectures, compilers, optimization options, and obfuscation strategies, which helps improve the accuracy of binary code similarity detection. We explored the effects of compilation architectures, compilers, optimization options, and obfuscation strategies on the optimization of intermediate representations through four experiments. The experimental results show that the compilers GCC and Clang used at binary compilation time significantly improve the optimization effect of the intermediate representation. In addition, optimization with opt-O1 and opt-O2 works best, improving the similarity by up to 21.7%. The compilation architectures X86_32, X86_64, ARM32, and ARM64 and the optimization options -O0, -O1, -O2, and -O3 have little effect on the optimization effect of the intermediate representation. The compilation architecture MIPS32 and the obfuscation strategies Fla, Sub, Bcf, and All have a negative effect on the optimization effect of the intermediate representation.