A Hybrid Salp Swarm Algorithm with \(\beta\)-Hill Climbing Algorithm for Text Documents Clustering
2021
Partitioning large document collections into meaningful subsets has attracted growing research attention, because the sheer volume of documents complicates pattern recognition, information retrieval, and text mining. This problem is known as the text document clustering (TDC) problem. Several metaheuristic optimization algorithms have been adapted to address TDC. The salp swarm algorithm (SSA), a recent metaheuristic optimization algorithm that mimics the behavior of salps in oceans, has been proposed and adapted to address different optimization problems. Hybridizing one optimization algorithm with another has also become a focus of scholars seeking superior solutions to optimization problems. In this paper, a new hybrid optimization method, namely H-SSA, is proposed that combines SSA with a well-known metaheuristic optimization algorithm called the \(\beta\)-hill climbing algorithm (BHC). The main aims of the proposed method are to improve the quality of the initial candidate solutions and to enhance SSA in terms of local search ability and convergence speed when searching for an optimal partitioning of the clusters. The performance of the proposed H-SSA is tested in the data clustering field using five standard datasets. In addition, the proposed method is tested in the text document clustering domain using two scientific articles’ datasets and six standard text datasets. The experimental results show that the proposed method improved the solutions in terms of convergence rate, recall, precision, F-measure, accuracy, entropy, and purity criteria. For comparative evaluation, the proposed H-SSA is compared with the pure SSA algorithm, with well-known clustering techniques such as DBSCAN, agglomerative, spectral, k-means++, and k-means clustering, and with optimization algorithms such as KHA, PSO, GA, HS, CMAES, COA, and MVO. The comparative results demonstrate the efficiency of the proposed method, which yielded better performance than the compared algorithms and techniques.
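To make the local search component more concrete, the following is a minimal sketch of how a \(\beta\)-hill climbing step of the kind mentioned above might refine a candidate clustering. It is not the authors' implementation. The discrete label encoding, the sum-of-squared-errors objective `sse`, the function name `beta_hill_climbing`, and the parameter values (`beta`, `iterations`, `seed`) are all assumptions made for illustration; the abstract does not specify the objective function or the solution representation used in H-SSA.

```python
import random


def sse(points, labels, k):
    """Sum of squared distances from each point to its cluster centroid.

    Assumed objective for illustration; the paper may use a different one.
    """
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, c in zip(points, labels):
        counts[c] += 1
        for d in range(dim):
            sums[c][d] += p[d]
    total = 0.0
    for p, c in zip(points, labels):
        for d in range(dim):
            total += (p[d] - sums[c][d] / counts[c]) ** 2
    return total


def beta_hill_climbing(points, k, labels=None, beta=0.05, iterations=2000, seed=0):
    """Refine a cluster assignment with a discrete beta-hill climbing loop."""
    rng = random.Random(seed)
    n = len(points)
    if labels is not None:
        current = list(labels)
    else:
        current = [rng.randrange(k) for _ in range(n)]
    current_cost = sse(points, current, k)
    for _ in range(iterations):
        candidate = current[:]
        # N-operator: move one randomly chosen point to a random cluster.
        candidate[rng.randrange(n)] = rng.randrange(k)
        # beta-operator: re-draw each position uniformly with probability beta.
        for i in range(n):
            if rng.random() < beta:
                candidate[i] = rng.randrange(k)
        cost = sse(points, candidate, k)
        # S-operator: keep the candidate only if it is no worse.
        if cost <= current_cost:
            current, current_cost = candidate, cost
    return current, current_cost


# Hypothetical toy data: two well-separated groups of 2-D points.
toy_points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
refined_labels, refined_cost = beta_hill_climbing(toy_points, k=2)
print(refined_labels, refined_cost)
```

In a hybrid of the kind the abstract describes, a routine like this could plausibly be applied to candidate solutions before or between swarm updates, with the swarm handling global exploration and the \(\beta\)-operator adding a small amount of random perturbation to help escape local optima. Where exactly H-SSA invokes BHC is not stated in the abstract.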