Using Big Data Technologies with Earth Science Data in HDF5: HDF5 Scalable Solutions

2019 
HDF5 (Hierarchical Data Format 5) is open-source, high-performance software that consists of an abstract data model, a library, and a file format for storing and managing extremely large and/or complex data collections. NASA's Earth Observing System (EOS) Data and Information System uses HDF5 as an archival format for remote sensing data from EOS satellites. HDF5 is also used to store other types of geoscience and astrophysical data, e.g., seismic data and data from the Low-Frequency Array (LOFAR) radio telescope. The volume of data stored in HDF5 has reached tens of petabytes and is growing at an accelerating rate. With the growing amount of HDF5 Earth Science data to analyze and process, scientists need to adopt big data technologies, including new storage paradigms such as cloud and object storage. To run models and perform data analysis, they also need efficient and diverse ways to access data, ranging from high-performance computing (HPC) approaches such as Message Passing Interface (MPI) I/O and deep memory hierarchies (DMH) to non-HPC frameworks such as Apache Hadoop, Spark, and Drill. The HDF Group continually works to enable the use of big data technologies in HDF software.