MONTRES : Merge ON-the-Run External Sorting Algorithm for Large Data Volumes on SSD Based Storage Systems

2017 
External sorting algorithms are commonly used by data-centric applications to sort quantities of data that are larger than the main-memory. Many external sorting algorithms were proposed in state-of-the-art studies to take advantage of SSD performance properties to accelerate the sorting process. In this paper, we demonstrate that unfortunately, many of those algorithms fail to scale when it comes to increasing the dataset size under memory pressure. In order to address this issue, we propose a new sorting algorithm named MONTRES. MONTRES relies on SSD performance model while decreasing the overall number of I/O operations. It does this by reducing the amount of temporary data generated during the sorting process by continuously evicting small values in the final sorted file. MONTRES scales well with growing datasets under memory pressure. We tested MONTRES using several data distributions, different amounts of main-memory workspace and three SSD models. Results showed that MONTRES outperforms state-of-the-art algorithms as it reduces the sorting execution time of TPC-H datasets by more than 30 percent when the file size to main-memory size ratio is high.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    16
    References
    12
    Citations
    NaN
    KQI
    []