Retrieval of Frequent Itemset Using Improved Mining Algorithm in Hadoop

2021 
In parallel mining, extracting frequent patterns from a huge dataset in a short time is a difficult task. Frequent pattern mining plays a vital role not only in framing association rules but also in effective classification and clustering. Apriori, FP-Growth, and Eclat are the basic algorithms for frequent pattern mining, but they fall short in workload balancing, fault tolerance, and synchronization. To overcome this, recently proposed algorithms focus on parallelization across a large number of machines in a distributed computing environment using the MapReduce framework. As a contribution in this direction, we propose an improved frequent itemset mining algorithm for finding frequent itemsets in a huge dataset. It uses clustering for effective utilization of space and easy retrieval: large pattern sets are divided into discrete, uniform clusters, and each cluster is characterized by its center point. For pattern matching, we use the FP-Growth algorithm. We compare the existing system with the proposed system on time and accuracy, and show that the proposed system is more accurate and requires less time to find frequent itemsets. We used the Online Retail dataset, which provides a large amount of data for mining, and extracted the items bought by each customer. The existing system takes 99 s to discover frequently occurring items, whereas our new approach takes only 10 s. The new technique is implemented with locality-sensitive hashing, moving k-means, and the FP-Growth algorithm.
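To make the mining step concrete, the following is a minimal single-machine sketch of pattern-growth frequent itemset mining in Python. It is not the authors' implementation: it uses projected transaction lists, the same divide-and-conquer idea as FP-Growth, but without the compressed FP-tree, and it omits the locality-sensitive hashing, moving k-means, and MapReduce stages. The function name `frequent_itemsets`, the `min_support` threshold, and the toy baskets are illustrative assumptions.

```python
from collections import Counter


def frequent_itemsets(transactions, min_support):
    """Return {frozenset(items): support_count} for every itemset that
    appears in at least `min_support` transactions.

    Hypothetical helper for illustration; a pattern-growth miner over
    projected databases rather than a full FP-tree implementation.
    """
    results = {}

    def grow(prefix, projected):
        # Each transaction is deduplicated, so this counts how many
        # projected transactions contain each item.
        counts = Counter(item for t in projected for item in t)
        frequent = sorted(i for i, c in counts.items() if c >= min_support)
        for item in frequent:
            new_prefix = prefix + (item,)
            results[frozenset(new_prefix)] = counts[item]
            # Conditional database: transactions containing `item`,
            # restricted to items ordered after it so that each itemset
            # is generated exactly once.
            conditional = [
                [i for i in t if i > item]
                for t in projected
                if item in t
            ]
            grow(new_prefix, conditional)

    grow((), [sorted(set(t)) for t in transactions])
    return results


# Toy baskets (made-up data, one list of items per customer).
baskets = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]
result = frequent_itemsets(baskets, min_support=3)
for itemset, support in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

With a support threshold of 3, this toy input yields the four frequent single items plus four frequent pairs; no triple reaches the threshold. In a distributed setting, each worker would run a step like this on its own partition or cluster of transactions.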