In-memory Query System for Scientific Dataseis
2015
The growing gap between compute performance and I/O bandwidth coupled with the increasing data volumes has resulted in a bottleneck to the traditional post-simulation data processing method. Hence in-situ computing and query-driven data analysis are important techniques to minimize data movement. By taking advantage of the growing memory capacity on supercomputers, we developed an in-memory query system for scientific data analysis. Our approach is a combination of bitmap indexing, spatial data layout re-organization, distributed shared memory, and location-aware parallel execution. Our evaluations using real scientific datasets showed that we can aggregate the memory capacity from thousands of computes nodes to analyze a 750GB simulation dataset without transferring data to remote nodes or storage systems. Comparing to traditional solutions based on out-of-core parallel file systems, we achieve significant higher query performance.
Keywords:
- Correction
- Source
- Cite
- Save
- Machine Reading By IdeaReader
27
References
7
Citations
NaN
KQI