Towards efficient and scalable data mining using Spark

2014 
Driven by the need to discover valuable information in rapidly growing data, data mining technologies have drawn considerable attention over the last decade. However, the big data era places even higher demands on data mining technologies in terms of both processing speed and data volume. A data mining algorithm on its own can hardly meet these requirements for effective big data processing, so the use of distributed systems has been proposed. In this paper, we propose a novel method that integrates a sequential pattern mining algorithm with Spark, a fast large-scale data processing engine, to mine patterns in big data. We use the well-known PrefixSpan algorithm as an example to demonstrate how this method helps handle massive data rapidly and conveniently. The experiments show that this method can make full use of cluster computing resources to accelerate the mining process, with better performance than
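To make the PrefixSpan idea concrete, here is a minimal Python sketch of the sequential algorithm. It assumes each sequence element is a single item (full PrefixSpan also handles itemsets) and that support is an absolute count of sequences. The names `prefixspan`, `mine_partition`, `project`, and `count_items` are hypothetical. This is not the paper's Spark integration. The only gesture toward distribution is that each frequent length-1 prefix is mined from its own projected database, so those subtrees are independent, and the built-in `map` marks where a cluster-wide map could go.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]
Database = List[Sequence[str]]


def count_items(db: Database) -> Counter:
    # Support counts each item once per sequence, not once per occurrence.
    counts: Counter = Counter()
    for seq in db:
        counts.update(set(seq))
    return counts


def project(db: Database, item: str) -> Database:
    # Projected database: for each sequence containing `item`,
    # keep the suffix after its first occurrence.
    projected = []
    for seq in db:
        for i, x in enumerate(seq):
            if x == item:
                projected.append(seq[i + 1:])
                break
    return projected


def mine_partition(prefix: Pattern, db: Database, min_support: int) -> Dict[Pattern, int]:
    # Sequential PrefixSpan: grow `prefix` using only its own projected database.
    found: Dict[Pattern, int] = {}
    for item, count in count_items(db).items():
        if count >= min_support:
            grown = prefix + (item,)
            found[grown] = count
            found.update(mine_partition(grown, project(db, item), min_support))
    return found


def prefixspan(db: Database, min_support: int) -> Dict[Pattern, int]:
    # Step 1: frequent length-1 prefixes from a single pass over the data.
    frequent = [(i, c) for i, c in count_items(db).items() if c >= min_support]

    def mine_subtree(item_count: Tuple[str, int]) -> Dict[Pattern, int]:
        item, count = item_count
        subtree: Dict[Pattern, int] = {(item,): count}
        subtree.update(mine_partition((item,), project(db, item), min_support))
        return subtree

    # Step 2: each subtree depends only on its projected database, so subtrees
    # are independent. The built-in map() stands in for a distributed map
    # (hypothetically, one task per prefix); this is an assumption, not the paper's design.
    results: Dict[Pattern, int] = {}
    for subtree in map(mine_subtree, frequent):
        results.update(subtree)
    return results


if __name__ == "__main__":
    db = [["a", "b", "c"], ["a", "c"], ["a", "b"], ["b", "c"]]
    for pattern, support in sorted(prefixspan(db, min_support=2).items()):
        print(pattern, support)
```

With a minimum support of 2, the toy database yields every single item plus the patterns `a→b`, `a→c`, and `b→c`. Spark's MLlib also ships its own PrefixSpan (for example `pyspark.ml.fpm.PrefixSpan`); whether it follows the approach described in this paper is not stated here.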