Scalable Implementations of Rough Set Algorithms: A Survey

2018 
With the rapid growth in the volume, variety, and velocity of data across real-life domains, learning from big data has become a growing challenge. Rough set theory has been successfully applied to knowledge discovery in databases (KDD) for handling data with imperfections. Most traditional rough set algorithms were implemented sequentially and run on a single machine, making them computationally expensive and inefficient for massive data. Recent computing frameworks, such as MapReduce and Apache Spark, have made it possible to realize parallel rough set algorithms on distributed clusters of commodity computers and to speed up big data analyses. Although a variety of scalable rough set implementations have been developed, (1) most studies compared their work only with outdated sequential implementations; (2) certain distributed computing frameworks were used far more frequently than others, overlooking recently developed frameworks; and (3) guidance on open issues and on adopting new computing frameworks is lacking. The main objective of this paper is to survey the current state of the art in scalable implementations of rough set algorithms. It will help researchers catch up with recent developments in this field and provide insights for developing rough set algorithms in up-to-date high-performance computing environments for big data analytics.
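As a rough illustration of why such frameworks fit this problem, the sketch below (not taken from the paper; the toy decision table, all names, and the choice of PySpark are assumptions for illustration) computes lower approximations of decision classes in parallel: objects are grouped into equivalence classes by their condition attribute values, and a class contributes to the lower approximation of a decision only if all of its members share that decision.

    # A minimal sketch, not from the survey, of a parallel rough set
    # computation of the kind Apache Spark enables. The decision table
    # and all names here are illustrative assumptions.
    from pyspark import SparkContext

    sc = SparkContext(appName="RoughSetLowerApprox")

    # Decision table rows: (object_id, (condition attribute values), decision).
    table = sc.parallelize([
        (1, ("sunny", "hot"), "no"),
        (2, ("sunny", "hot"), "no"),
        (3, ("rainy", "mild"), "yes"),
        (4, ("rainy", "mild"), "no"),
    ])

    # Partition objects into equivalence classes keyed by their condition values.
    eq_classes = (table
                  .map(lambda r: (r[1], (r[0], r[2])))
                  .groupByKey()
                  .mapValues(list))

    # An equivalence class lies in the lower approximation of decision d
    # exactly when every object in the class has decision d.
    def unanimous_decision(members):
        decisions = {d for (_, d) in members}
        return decisions.pop() if len(decisions) == 1 else None

    lower = (eq_classes
             .map(lambda kv: (unanimous_decision(kv[1]),
                              [o for (o, _) in kv[1]]))
             .filter(lambda kv: kv[0] is not None)
             .reduceByKey(lambda a, b: a + b))

    # Objects 1 and 2 form a consistent class, so they are certainly 'no';
    # objects 3 and 4 conflict, so they fall outside every lower approximation.
    print(lower.collect())  # [('no', [1, 2])]
    sc.stop()

Grouping by the condition attribute tuple is a single shuffle, which is one reason MapReduce-style frameworks suit rough set approximations: equivalence classes map directly onto reducer keys.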