Active Data: A Programming Model for Managing Big Data Life Cycle

2012 
The Big Data challenge consists of managing, storing, analyzing and visualizing ever-growing datasets to extract meaning and knowledge. As the volume of data grows exponentially, managing it becomes proportionally more complex. A key issue is handling the complexity of the data life cycle, i.e. the various operations performed on data: transfer, archiving, replication, deletion, and so on. To alleviate this complexity, we propose Active Data, a programming model that automates data management applications and improves their expressiveness. We first introduce the concept of the data life cycle and define a formal model based on Petri nets. We then present the Active Data programming model, which allows code to be executed at each stage of the data life cycle: routines provided by programmers are executed when a set of events (creation, replication, transfer, deletion) happens to any data item. We implement and evaluate the model with three use cases: a storage cache for Amazon S3, a cooperative sensor network, and an incremental implementation of the MapReduce programming model. Together, these scenarios illustrate the adequacy of the model for programming applications that manage distributed and dynamic data. We also show that applications that do not rely on the data life cycle can benefit from Active Data to improve their performance.
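The idea of running programmer-supplied routines on life-cycle events can be pictured with a small sketch. This is a hypothetical illustration, not the Active Data API. The class `LifeCycleNet`, its methods `add_transition`, `subscribe` and `fire`, and the place names `created`, `available` and `deleted` are all invented here.

The sketch models the life cycle as a Petri net:

- Places are life-cycle states.
- Transitions are events.
- Each data item is tracked as its own token. This is a simplification closer to a coloured Petri net than a classic one.
- Routines subscribed to a transition run whenever that transition fires for a data item.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple

# A handler receives the data item id and the name of the transition that fired.
Handler = Callable[[str, str], None]


class LifeCycleNet:
    """Hypothetical Petri-net-style life cycle: places hold data ids, transitions move them."""

    def __init__(self) -> None:
        # transition name -> (input places, output places)
        self.transitions: Dict[str, Tuple[List[str], List[str]]] = {}
        # place name -> ids of data items currently marked in that place
        self.marking: Dict[str, Set[str]] = defaultdict(set)
        # transition name -> routines to run when the transition fires
        self.handlers: Dict[str, List[Handler]] = defaultdict(list)

    def add_transition(self, name: str, inputs: List[str], outputs: List[str]) -> None:
        self.transitions[name] = (list(inputs), list(outputs))

    def subscribe(self, transition: str, handler: Handler) -> None:
        self.handlers[transition].append(handler)

    def enabled(self, transition: str, data_id: str) -> bool:
        # A transition is enabled for a data item if the item sits in every input place.
        inputs, _ = self.transitions[transition]
        return all(data_id in self.marking[p] for p in inputs)

    def fire(self, transition: str, data_id: str) -> None:
        if not self.enabled(transition, data_id):
            raise ValueError(f"{transition} is not enabled for {data_id}")
        inputs, outputs = self.transitions[transition]
        for p in inputs:  # consume the token from the input places
            self.marking[p].discard(data_id)
        for p in outputs:  # produce the token in the output places
            self.marking[p].add(data_id)
        for handler in self.handlers[transition]:  # run user routines for this event
            handler(data_id, transition)


# Usage: a three-step life cycle (create -> transfer -> delete) with two subscribed routines.
log: List[str] = []
net = LifeCycleNet()
net.add_transition("create", [], ["created"])
net.add_transition("transfer", ["created"], ["available"])
net.add_transition("delete", ["available"], ["deleted"])
net.subscribe("transfer", lambda d, t: log.append(f"{t}:{d}"))
net.subscribe("delete", lambda d, t: log.append(f"{t}:{d}"))

net.fire("create", "file-1")
net.fire("transfer", "file-1")
net.fire("delete", "file-1")
print(log)  # ['transfer:file-1', 'delete:file-1']
```

Keeping the enabling rule in the net means handlers fire only for events that are legal in the current state. Firing `delete` a second time for `file-1` therefore raises an error instead of silently running the routines.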