Detection of Multiple Function Dependency Violations for Distributed Big Data

2019 
Detecting functional dependency violations in a distributed data environment usually requires moving data from one site to another, which makes big data processing inefficient. In this paper, a novel method for detecting violations of multiple functional dependencies is proposed based on the concept of equivalence classes, together with a response time cost model for the method. Because allocating violation-detection tasks in a distributed environment is NP-hard, we convert the minimization of detection response time into an integer programming problem and provide a near-optimal solution. Different task assignment policies are provided for different cluster scales and numbers of functional dependencies, and load balancing is also taken into account. Experimental results on real and synthetic datasets show that, compared with centralized detection methods on Hadoop 2.0, the proposed method is markedly more efficient and scales well for big data processing.
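The equivalence-class idea can be illustrated with a small sketch. For a functional dependency X → Y, tuples are grouped into classes by their X values, and a class violates the dependency when it contains more than one distinct Y value. The sketch below assumes each site computes its own partial classes and ships only those summaries for merging, rather than raw tuples. The function names (`local_classes`, `merge_classes`, `detect_violations`) and the row-as-dict data layout are hypothetical illustrations, not the paper's actual method or cost model.

```python
from collections import defaultdict


def local_classes(rows, lhs, rhs):
    """Group one site's rows into equivalence classes on the LHS attributes.

    Returns {lhs_value_tuple: set of rhs_value_tuples} for the FD lhs -> rhs.
    """
    classes = defaultdict(set)
    for row in rows:
        key = tuple(row[a] for a in lhs)
        classes[key].add(tuple(row[a] for a in rhs))
    return classes


def merge_classes(partials):
    """Union per-site partial classes; these summaries are all that is shipped."""
    merged = defaultdict(set)
    for part in partials:
        for key, values in part.items():
            merged[key] |= values
    return merged


def detect_violations(sites, fds):
    """Return {fd_index: sorted violating LHS values} for every FD in fds."""
    result = {}
    for i, (lhs, rhs) in enumerate(fds):
        partials = [local_classes(rows, lhs, rhs) for rows in sites]
        merged = merge_classes(partials)
        # A class violates lhs -> rhs if it holds more than one distinct RHS value.
        result[i] = sorted(k for k, v in merged.items() if len(v) > 1)
    return result


# Hypothetical example: two sites, two FDs (zip -> city, zip -> state).
sites = [
    [{"zip": "10001", "city": "NYC", "state": "NY"},
     {"zip": "94105", "city": "SF", "state": "CA"}],
    [{"zip": "10001", "city": "New York", "state": "NY"},
     {"zip": "94105", "city": "SF", "state": "CA"}],
]
fds = [(("zip",), ("city",)), (("zip",), ("state",))]
print(detect_violations(sites, fds))
```

Here zip 10001 maps to two different cities across the two sites, so only the first dependency is reported as violated. Shipping one set of RHS values per class, instead of every tuple, is one way to reduce the data movement the abstract identifies as the bottleneck.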
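The abstract frames task allocation as minimizing response time: assign each detection task to exactly one node so that the most heavily loaded node finishes as early as possible, which is an NP-hard makespan problem. The sketch below swaps in a standard near-optimal heuristic, Graham's longest-processing-time-first (LPT) rule, which sorts tasks by estimated cost and always places the next task on the currently least-loaded node. The per-task costs and the function name `assign_tasks` are hypothetical; the paper's own cost model and assignment policies are not reproduced here.

```python
import heapq


def assign_tasks(task_costs, num_nodes):
    """Greedy LPT: place each task (largest first) on the least-loaded node.

    task_costs: {task_name: estimated_time}. Returns (assignment, makespan),
    where assignment maps node index -> list of task names.
    """
    heap = [(0.0, node) for node in range(num_nodes)]  # (current load, node id)
    heapq.heapify(heap)
    assignment = {node: [] for node in range(num_nodes)}
    # Largest cost first; ties broken by task name for deterministic output.
    for task, cost in sorted(task_costs.items(), key=lambda kv: (-kv[1], kv[0])):
        load, node = heapq.heappop(heap)
        assignment[node].append(task)
        heapq.heappush(heap, (load + cost, node))
    makespan = max(load for load, _ in heap)
    return assignment, makespan


# Hypothetical estimated costs for five FD-detection tasks on two nodes.
costs = {"fd1": 7, "fd2": 5, "fd3": 4, "fd4": 3, "fd5": 3}
assignment, makespan = assign_tasks(costs, 2)
print(assignment, makespan)
```

In this example LPT yields a makespan of 12, while the optimal split ({fd1, fd3} and {fd2, fd4, fd5}) achieves 11. This illustrates why such heuristics are called near-optimal: LPT is guaranteed to stay within a factor of 4/3 of the optimal makespan, whereas an exact answer would require solving the integer program.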