Research on Improvement of Parallel Apriori Algorithm Based on Boolean Matrix and Weight

2019 
To address the Apriori algorithm's need to traverse the database multiple times, an improved algorithm based on a Boolean matrix and weights, WF_Apriori (Weight Function Apriori), is proposed. The algorithm adds a weight row to the matrix so that duplicate transactions are trimmed and the stored matrix is compressed, which saves time when scanning the transaction set. It makes full use of the intersection between rows, avoiding the self-join of (k-1)-itemsets that is normally used to generate k-itemsets, so that the required frequent k-itemsets can be obtained in a single pass. The improved algorithm is then parallelized on Hadoop: the large matrix is divided into small matrices that are processed in parallel, which reduces the time complexity of the algorithm and makes it more efficient and more practical. The experimental results show that the improved algorithm greatly shortens processing time in a big data environment, improves mining efficiency, and achieves the expected goal.
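The weighted Boolean matrix idea can be sketched in a few lines of Python. This is an illustrative reading of the abstract, not the authors' WF_Apriori implementation: it assumes that items are stored as rows, that each distinct transaction is stored as one column, and that the weight row holds how many times that transaction occurs. All function names are hypothetical, and the column partitioning is a sequential stand-in for the Hadoop step.

```python
from collections import Counter
from itertools import combinations


def build_weighted_matrix(transactions):
    """Collapse duplicate transactions into weighted Boolean columns.

    Assumed layout: rows[item][j] is True when item occurs in distinct
    transaction j, and weights[j] is how often that transaction occurs.
    """
    counts = Counter(frozenset(t) for t in transactions)
    columns = list(counts)
    weights = [counts[c] for c in columns]
    items = sorted(set().union(*columns))
    rows = {item: [item in col for col in columns] for item in items}
    return items, rows, weights


def support(itemset, rows, weights):
    """Support = sum of weights over columns where all item rows are True."""
    mask = [True] * len(weights)
    for item in itemset:
        mask = [m and r for m, r in zip(mask, rows[item])]
    return sum(w for m, w in zip(mask, weights) if m)


def frequent_itemsets(transactions, k, min_support):
    """Find frequent k-itemsets by intersecting rows, without a self-join."""
    items, rows, weights = build_weighted_matrix(transactions)
    # Only items that are frequent on their own can appear in a frequent set.
    candidates = [i for i in items if support((i,), rows, weights) >= min_support]
    result = {}
    for combo in combinations(candidates, k):
        s = support(combo, rows, weights)
        if s >= min_support:
            result[combo] = s
    return result


def partitioned_support(itemset, rows, weights, n_parts):
    """Split the matrix into column blocks and sum the partial supports.

    Sequential stand-in for the parallel step: each block could be
    handled by a separate worker and the partial sums added afterwards.
    """
    n = len(weights)
    size = max(1, -(-n // n_parts))  # ceiling division
    total = 0
    for start in range(0, n, size):
        block_rows = {i: rows[i][start:start + size] for i in itemset}
        total += support(itemset, block_rows, weights[start:start + size])
    return total
```

Because duplicate transactions share a single column, the row intersection touches each distinct transaction once, and the weight restores the true count. Enumerating every k-combination of frequent items is the simplest way to show the row-intersection step; the paper's pruning strategy may well differ.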