An Efficient Two-Level-Partitioning-Based Double Array and Its Parallelization

Lianyin Jia,Chongde Zhang,Mengjuan Li,Yinong Chen,Yong Liu,Jiaman Ding

An Efficient Two-Level-Partitioning-Based Double Array and Its Parallelization

2020

Trie is one of the most common data structures for string storage and retrieval. As a fast and efficient implementation of trie, double array (DA) can effectively compress strings to reduce storage spaces. However, this method suffers from the problem of low index construction efficiency. To address this problem, we design a two-level partition (TLP) framework in this paper. We first divide the dataset is into smaller lower-level partitions, and then we merge these partitions into bigger upper-level partitions using a min-heap based greedy merging algorithm (MH-GMerge). TLP has an excellent characteristic of load balancing and can be easily parallelized. We implemented two efficient parallel partitioned DAs based on TLP. Extensive experiments were carried out, and the results showed that the proposed methods can significantly improve the construction efficiency of DA and can achieve a better trade-off between construction and retrieval performance than the existing state-of-the-art methods.

Keywords:

Correction
Source
Cite
Save
Machine Reading By IdeaReader

References

Citations