SideWalk: A Facility of Lightweight Out-of-Band Communications for Augmenting Distributed Data Processing Flows

2015 
The foundation of a data processing engine running on a large cluster is its programming model, which defines data processing operations and data movements. Communication activities that are not normally defined in the programming model, but are often used in ad hoc ways in system development, are called out-of-band communications. Existing ad hoc solutions for out-of-band communications are often hard to reuse, error-prone, and not free from unwanted side effects. To address these issues, we have designed and implemented a standalone facility for out-of-band communications called SideWalk. With this facility, users can add out-of-band communication operations to their distributed data flows through a set of reusable APIs. These APIs have well-defined semantics, which minimizes the chance that users write error-prone programs with SideWalk. To prevent users from introducing unwanted side effects while using SideWalk, we prototype SideWalk to handle lightweight out-of-band communications efficiently, and we restrict the communication patterns that can be conducted through SideWalk without affecting its applicability to typical use cases. Our experimental results show that the execution times of distributed data processing flows in a Hadoop environment with out-of-band communications implemented with SideWalk are reduced by up to 1.53 times compared with those of flows whose out-of-band communications are implemented with a representative ad hoc solution.