An Approach to Fuzzy Clustering of Big Data Inside a Parallel Relational DBMS

2020 
Despite the widespread use of numerous NoSQL systems, relational DBMSs remain the basic tool for data processing in various subject domains. Integrating data mining methods with a relational DBMS is a topical issue, since such an approach avoids the export-import bottleneck and provides the end user with all the built-in DBMS services. Proprietary parallel DBMSs are one candidate for such integration, but they are expensive and oriented to custom hardware that is difficult to expand. At the same time, open-source DBMSs have become a reliable alternative to commercial DBMSs and can serve as a basis for encapsulating parallelism. In this study, we present an approach to fuzzy clustering of very large data sets inside a parallel DBMS (PDBMS). Such a PDBMS is obtained through small-scale modifications of the source code of an open-source serial DBMS to encapsulate partitioned parallelism. The experimental evaluation shows that the proposed approach outperforms parallel out-of-DBMS solutions with respect to export-import overhead.