Linked Enterprise Data Model and Its Use in Real Time Analytics and Context-Driven Data Discovery

2015 
Traditional approaches for managing enterprise data revolve around a batch-driven Extract, Transform, Load (ETL) process, a one-size-fits-all approach to storage, and an application architecture that is tightly coupled to the underlying data infrastructure. The emergence of Big Data technologies has led to the creation of alternate instantiations of the traditional approach, in which the storage systems have moved from relational databases to NoSQL technologies like HDFS. This approach to data management has been found wanting as enterprises begin to deal with complex and heterogeneous data, especially in the area of the Internet of Things (IoT). IoT environments are characterized by data producers and data processing requirements. In this paper, we articulate the shortcomings of traditional approaches to data management in the context of IoT. We identify the challenges arising from content heterogeneity, requirements of scale, the robustness of ETL processes, and the need to rapidly onboard and support multiple applications such as analytics. Our approach introduces the Linked Enterprise Data Model (LEDM), a knowledge representation approach derived from Linked Data for modeling and linking the disparate aspects of data infrastructure. We leverage this model in developing a scalable and robust ETL framework. The framework adopts the Lambda architecture approach and supports both stream and batch processing of incoming data. We build this capability for the streaming leg of the Lambda architecture, comprising Amazon Kinesis, Apache Spark Streaming, and Amazon Dynamo.
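The Lambda architecture mentioned in the abstract can be illustrated with a minimal in-memory sketch. This is not the paper's framework: the class `LambdaPipeline`, its methods, and the per-device sum used as the "view" are hypothetical names chosen for illustration, and plain Python containers stand in for the streaming and storage services. It only shows the general pattern of a batch view recomputed from an append-only log, a speed view updated per event, and a query that merges the two.

```python
from collections import defaultdict


class LambdaPipeline:
    """Toy Lambda-architecture pipeline (illustrative assumption, not the paper's design)."""

    def __init__(self):
        self.master_log = []                 # append-only record of every event
        self.batch_view = {}                 # totals recomputed from the full log
        self.speed_view = defaultdict(int)   # totals for events since the last batch run

    def ingest(self, device_id, value):
        # Every event goes to both legs: the log (batch) and the incremental view (speed).
        self.master_log.append((device_id, value))
        self.speed_view[device_id] += value

    def run_batch(self):
        # Batch leg: rebuild the view from scratch over the whole log.
        view = defaultdict(int)
        for device_id, value in self.master_log:
            view[device_id] += value
        self.batch_view = dict(view)
        # The batch view now covers all logged events, so the speed view can be reset.
        self.speed_view.clear()

    def query(self, device_id):
        # Serving step: merge the batch result with the recent increments.
        return self.batch_view.get(device_id, 0) + self.speed_view.get(device_id, 0)
```

In this simplified version the batch run and ingestion never overlap; a real deployment would have to track which events each batch run has covered before discarding speed-layer state.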