Managing Rich Metadata in High-Performance Computing Systems Using a Graph Model
2018
High-performance computing (HPC) systems generate huge amounts of metadata about different entities such as jobs, users, and files. Existing systems can efficiently record and manage part of these metadata, mainly the POSIX metadata of data files (e.g., file size, name, and permission mode). However, another important class of metadata, referred to as “rich” metadata in this study, is mostly missing from current HPC systems. Rich metadata record not only a wider range of entities (e.g., running processes and jobs) but also more complex relationships among them. Such metadata are critical for supporting many advanced data management functions, such as identifying the data sources and parameters behind a given result, auditing data usage, and understanding how inputs are transformed into outputs. To manage the rich metadata generated in HPC systems uniformly and efficiently, we propose to use a graph model. We identify the key challenges of implementing a graph-based rich metadata management system for HPC and present GraphMeta, a graph-based rich metadata management system designed and optimized for HPC platforms, to address these challenges. Extensive evaluations on both synthetic and real HPC metadata workloads show GraphMeta's advantages in both performance and scalability compared with existing solutions.