TrustMR: Computation integrity assurance system for MapReduce

2015 
Data and computation integrity are major concerns for users of MapReduce systems. Most production-level MapReduce systems optimistically assume that all nodes are trustworthy. Yet even one compromised node can corrupt the integrity of the final results of a computation. The literature addresses this problem in several ways: some approaches rely on special-purpose hardware and thereby lose the ability to run on commodity machines, others inject watermarking patterns and therefore apply only to particular datasets and jobs, and still others replicate entire jobs and incur huge overheads. In this paper, we propose a new replication-based method that can achieve very high attack detection rates (e.g., 99.99%) while incurring only one fifth (20%) of the overhead incurred by competing approaches. The method decomposes a MapReduce computation into smaller pieces (i.e., the production of intermediate results). Only a subset of these pieces is generated in the replicated tasks, which significantly reduces the network transfer of the replicated tasks. Our empirical results show that a relatively small number of replicated intermediate results can provide a high detection rate while considerably reducing the overhead of replication.
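To give a feel for why replicating only a small subset of intermediate results can still catch an attack, the following is a minimal sketch under assumptions that are not taken from the paper: a job produces `n` intermediate results, a compromised node corrupts `m` of them, and the verifier re-generates `k` of them chosen uniformly at random without replacement. The function names and this sampling model are hypothetical; the paper's actual selection strategy and analysis may differ.

```python
from math import comb


def detection_probability(n: int, m: int, k: int) -> float:
    """Chance that at least one of k sampled results is among the m corrupted ones.

    Assumed model (not from the paper): k of the n intermediate results are
    re-generated, chosen uniformly at random without replacement.
    """
    if not (0 <= m <= n and 0 <= k <= n):
        raise ValueError("require 0 <= m <= n and 0 <= k <= n")
    # Probability that every sampled result is clean is C(n-m, k) / C(n, k).
    return 1.0 - comb(n - m, k) / comb(n, k)


def smallest_sample(n: int, m: int, target: float):
    """Smallest k whose detection probability reaches target, or None if none does."""
    for k in range(n + 1):
        if detection_probability(n, m, k) >= target:
            return k
    return None
```

Under this simplified model, the number of results that must be replicated depends on how many results the attacker corrupts: the more widespread the corruption, the fewer samples are needed to reach a given detection rate.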