On the Optimization of Iterative Programming with Distributed Data Collections

2020 
Big data programming frameworks are becoming increasingly important for the development of applications for which performance and scalability are critical. In those complex frameworks, optimizing code by hand is hard and time-consuming, making automated optimization particularly necessary. In order to automate optimization, a prerequisite is to find suitable abstractions to represent programs; for instance, algebras based on monads or monoids to represent distributed data collections. Currently, however, such algebras do not represent recursive programs in a way which allows analyzing or rewriting them. In this paper, we extend a monoid algebra with a fixpoint operator for representing recursion as a first-class citizen and show how it allows new optimizations. Experiments with the Spark platform illustrate performance gains brought by these systematic optimizations.
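To make the idea of a fixpoint over a collection concrete, here is a minimal sketch in Python. It is not the operator defined in the paper: it uses in-memory sets instead of distributed collections, the names `fixpoint`, `seed`, and `step` are hypothetical, and the incremental evaluation it uses is only valid under the stated assumption that `step` distributes over set union.

```python
from typing import Callable, FrozenSet, Iterable, Tuple

Edge = Tuple[int, int]


def fixpoint(seed: Iterable[Edge],
             step: Callable[[FrozenSet[Edge]], FrozenSet[Edge]]) -> FrozenSet[Edge]:
    """Return the smallest set X such that X = seed | step(X).

    Assumption: step distributes over union, i.e.
    step(A | B) == step(A) | step(B). Under that assumption it is enough
    to apply step to the elements discovered in the previous round.
    """
    result = frozenset(seed)
    delta = result
    while delta:
        # Keep only the elements that were not already known.
        new = step(delta) - result
        result = result | new
        delta = new
    return result


# Example: transitive closure of a small graph, written as a fixpoint.
edges = frozenset({(1, 2), (2, 3), (3, 4)})


def extend_paths(paths: FrozenSet[Edge]) -> FrozenSet[Edge]:
    # Join each known path (a, b) with each edge (b, c) to get (a, c).
    return frozenset((a, c) for (a, b) in paths for (b2, c) in edges if b == b2)


closure = fixpoint(edges, extend_paths)
```

Representing the recursion as a single `fixpoint` call, rather than as a hand-written loop, is what gives an optimizer something to inspect and rewrite; the loop body here is just one possible evaluation strategy.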