Dynamic binary translation (DBT) has attracted much attention as a powerful technique for adapting software across different ISAs at runtime. It offers unprecedented flexibility in controlling and modifying a program while it runs. However, its inherently high overhead has challenged researchers for many years. To reduce the overhead of DBT, this paper presents a combined dynamic-static approach that reorganizes the layout of the software cache. In the dynamic phase, we run the program under emulation to collect profile information and the translated target code; in particular, the path of the execution flow is tracked. In the static phase, based on the profile information collected in the dynamic phase, we first build traces by code replication and then reorganize the layout of the target code by placing the hottest traces at the top of the software cache. Because of exact prediction and improved locality, the execution stream concentrates on a small area with fewer control transfers. This approach can greatly reduce the overhead of DBT, provided that the program is run repeatedly. Experimental results on the SPEC 2000 benchmarks show that our approach reduces run time by more than 30% on average.
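The static phase described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `build_traces` and `layout_cache`, the greedy "follow the hottest successor" rule, the `max_len` cap, and ranking traces by the execution count of their head block are all assumptions made for illustration. The profile is assumed to be a pair of dictionaries holding block and edge execution counts.

```python
from collections import defaultdict


def build_traces(block_counts, edge_counts, max_len=4):
    """Greedily grow traces from profiled counts (hypothetical heuristic)."""
    # Index the profiled successors of each block as (count, successor).
    succs = defaultdict(list)
    for (src, dst), count in edge_counts.items():
        succs[src].append((count, dst))

    traces = []
    covered = set()  # blocks that already appear in some trace
    # Consider candidate trace heads from hottest to coldest.
    for head in sorted(block_counts, key=block_counts.get, reverse=True):
        if head in covered:
            continue
        trace = [head]
        while len(trace) < max_len and succs[trace[-1]]:
            _, nxt = max(succs[trace[-1]])  # hottest outgoing edge
            if nxt in trace:  # edge loops back into this trace: stop
                break
            # The successor may already live in another trace; appending
            # it anyway models code replication.
            trace.append(nxt)
        covered.update(trace)
        traces.append(trace)
    return traces


def layout_cache(traces, block_counts, block_sizes):
    """Assign cache offsets so that the hottest traces come first."""
    ordered = sorted(traces, key=lambda t: block_counts[t[0]], reverse=True)
    layout, offset = [], 0
    for trace in ordered:
        layout.append((offset, trace))
        offset += sum(block_sizes[b] for b in trace)
    return layout


# Made-up profile for a four-block loop; all numbers are illustrative.
profile_blocks = {"A": 100, "B": 90, "C": 10, "D": 95}
profile_edges = {("A", "B"): 90, ("A", "C"): 10, ("B", "D"): 90,
                 ("C", "D"): 10, ("D", "A"): 99}
sizes = {"A": 16, "B": 8, "C": 12, "D": 4}

for offset, trace in layout_cache(
        build_traces(profile_blocks, profile_edges), profile_blocks, sizes):
    print(offset, trace)
```

In this sketch the hot path is laid out contiguously at offset 0, and the cold trace carries replicated copies of hot blocks, so that execution inside a trace falls through instead of jumping between traces.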
Clustering seeks the natural structure of an input dataset and partitions it into groups, or clusters. As an unsupervised pattern classification method, it has been widely used in data mining, pattern recognition, image processing, and other fields. However, many existing clustering algorithms suffer from problems such as low efficiency, poor clustering accuracy, sensitivity to noise points, and an inability to handle complex big data properly. To address these problems, an improved K-means algorithm (Grid-K-means) is first proposed. In this algorithm, operations on dynamically changing grids replace operations on individual data points, which improves clustering efficiency and reduces the number of initial parameters that must be set manually. Meanwhile, the grids with the highest density are used to determine the initial cluster centers, which yields more accurate and stable clustering results. Then, based on the idea of using each grid as a weighted representative point of the dataset, a new clustering validity index (BCVI) is introduced to better evaluate the quality of clustering results. BCVI can quickly determine the optimal number of clusters, especially for large-scale datasets. Experimental results on 5 simulated datasets (including two large-sample datasets) demonstrate that the Grid-K-means algorithm is faster and more accurate than traditional algorithms. The clustering results are also evaluated with BCVI and 6 other existing clustering validity indexes. The experimental results show that BCVI is superior to the traditional indexes in processing speed and stability.
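The grid idea above can be sketched as follows. This is a simplified illustration, not the Grid-K-means algorithm itself: it uses a fixed cell size (the paper's grids change dynamically), it simply takes the k densest cells as initial centers without checking whether they are adjacent, and it does not implement BCVI. The names `grid_summary`, `densest_centers`, and `weighted_kmeans` and the `cell` parameter are hypothetical.

```python
from collections import defaultdict
import math


def grid_summary(points, cell):
    """Replace points by one (centroid, weight) pair per occupied cell."""
    buckets = defaultdict(list)
    for p in points:
        key = tuple(math.floor(x / cell) for x in p)
        buckets[key].append(p)
    summary = []
    for pts in buckets.values():
        dim = len(pts[0])
        centroid = tuple(sum(p[d] for p in pts) / len(pts) for d in range(dim))
        summary.append((centroid, len(pts)))  # weight = points in the cell
    return summary


def densest_centers(summary, k):
    """Use the centroids of the k densest cells as initial centers."""
    ranked = sorted(summary, key=lambda cw: cw[1], reverse=True)
    return [c for c, _ in ranked[:k]]


def weighted_kmeans(summary, centers, iters=20):
    """Lloyd iterations over cell representatives instead of raw points."""
    for _ in range(iters):
        sums = [[0.0] * len(centers[0]) for _ in centers]
        weights = [0] * len(centers)
        for c, w in summary:
            # Assign the cell to its nearest center, counted w times.
            j = min(range(len(centers)),
                    key=lambda i: math.dist(c, centers[i]))
            weights[j] += w
            for d, x in enumerate(c):
                sums[j][d] += w * x
        new_centers = [
            tuple(s / weights[j] for s in sums[j]) if weights[j] else centers[j]
            for j in range(len(centers))
        ]
        if new_centers == centers:  # no center moved: converged
            break
        centers = new_centers
    return centers


# Illustrative data: two dense groups plus one isolated point.
data = [(0.1, 0.2), (0.3, 0.4), (0.5, 0.1),
        (10.2, 10.1), (10.4, 10.3), (10.6, 10.5), (5.5, 5.5)]
cells = grid_summary(data, cell=1.0)
print(weighted_kmeans(cells, densest_centers(cells, k=2)))
```

Because each iteration loops over occupied cells rather than individual points, its cost depends on the number of cells, which is where an efficiency gain on large datasets would come from under these assumptions.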