Scalable analytics of air quality batches with Apache Spark and Apache Sedona

2021 
According to the U.S. National Institute of Environmental Health Sciences (NIEHS), air pollutants harm the health of humans and other living beings, and damage the climate and ecosystems by polluting lakes, streams, and soils. Recent developments in sensor and Internet of Things (IoT) technologies make it possible to use sensor networks to measure air quality in real time at a large number of locations. The adoption and deployment of IoT technologies for sensing air quality raises a challenging research agenda in big data processing, including data analysis, scalable architectures, and algorithms for managing and processing IoT data at different edges of the IoT ecosystem. In response to the DEBS'2021 contest, we design and implement a scalable solution that compares the previous-year and current-year air quality indexes of German cities and computes each city's longest streak of good air quality. The solution is based on two components: Apache Spark, an open-source unified analytics engine for large-scale data processing, and Apache Sedona, which we use to create spatial indexes and perform spatial operations over large-scale spatial data.
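To make the "longest streak of good air quality" computation concrete, the following is a minimal single-machine sketch in plain Python, not the Spark and Sedona implementation described above. It assumes that readings arrive as time-ordered (city, AQI) pairs and that "good" means an AQI of at most 50; the threshold, the function names, and the input layout are illustrative assumptions rather than details taken from the paper or the contest rules.

```python
from typing import Dict, Iterable, Tuple

# Assumption: an AQI at or below this value counts as "good" air quality.
GOOD_AQI_MAX = 50


def longest_good_streak(aqi_values: Iterable[float],
                        good_max: float = GOOD_AQI_MAX) -> int:
    """Return the length of the longest run of consecutive good readings."""
    longest = 0
    current = 0
    for aqi in aqi_values:
        if aqi <= good_max:
            current += 1
            longest = max(longest, current)
        else:
            # A non-good reading ends the current run.
            current = 0
    return longest


def streaks_by_city(readings: Iterable[Tuple[str, float]]) -> Dict[str, int]:
    """Group time-ordered (city, aqi) pairs by city and compute each streak."""
    per_city: Dict[str, list] = {}
    for city, aqi in readings:
        per_city.setdefault(city, []).append(aqi)
    return {city: longest_good_streak(values)
            for city, values in per_city.items()}


if __name__ == "__main__":
    sample = [
        ("Berlin", 40), ("Berlin", 45), ("Berlin", 80), ("Berlin", 30),
        ("Hamburg", 20), ("Hamburg", 10), ("Hamburg", 15),
    ]
    print(streaks_by_city(sample))
```

Here the streak is counted in consecutive readings; measuring it as a time span instead would require carrying timestamps with each reading. In a distributed setting the same per-city logic would typically run after grouping readings by city, with the city assignment coming from a spatial join of sensor coordinates against city boundaries.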