Large-Scale GPU Search

2012 
Publisher Summary This chapter introduces P-ary search, a scalable parallel search algorithm for single-instruction, multiple-data (SIMD) architectures such as graphics processing units (GPUs). The increases in memory bandwidth over the past decade are a result of the multi/many-core era, with GPUs as a poster child, offering memory bandwidth well in excess of 100 GB/s. Achieving peak bandwidth, however, requires concurrent (parallel) memory access from all cores. The obvious use of increasingly parallel architectures for large-scale search applications is to execute multiple search queries simultaneously, which often yields significant increases in throughput. Query response time, on the other hand, remains memory-latency bound unless one abandons conventional serial algorithms and goes back to the drawing board. To grasp the implications of porting a memory-bound application like search to the GPU, the chapter first conducts a brief memory performance analysis. Conventional threading models and the memory access pattern produced by concurrent search queries do not map well to the GPU's memory subsystem. Using P-ary search, the chapter demonstrates how parallel memory access, combined with the superior synchronization capabilities of SIMD architectures like GPUs, can be leveraged to compensate for memory latency. P-ary search outperforms conventional search algorithms not only in throughput but also in response time. Implemented on a GPU, it can outperform a similarly priced CPU by up to three times, and it is compatible with existing index data structures such as inverted lists and B-trees.
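The core idea can be illustrated with a short serial sketch (the function name and parameters below are illustrative, not taken from the chapter): each iteration, p cooperating threads probe the boundaries of p equal partitions of the current search range, and the range narrows to the one partition that can contain the key, shrinking by a factor of p per step instead of 2 as in binary search.

```python
def p_ary_search(arr, key, p=4):
    """Serial simulation of P-ary search over a sorted list.

    Each iteration splits the range [lo, hi) into p partitions of
    width ceil((hi - lo) / p); the p 'threads' probe the partition
    boundaries (on a GPU, concurrently), and the search keeps the
    single partition whose boundary value is <= key.
    """
    lo, hi = 0, len(arr)
    while hi - lo > 1:
        chunk = -(-(hi - lo) // p)          # ceil division
        nlo, nhi = lo, min(lo + chunk, hi)  # default: first partition
        for t in range(1, p):               # probes done in lockstep on SIMD
            b = lo + t * chunk
            if b >= hi:
                break
            if arr[b] <= key:               # key lies at or after boundary
                nlo, nhi = b, min(b + chunk, hi)
        lo, hi = nlo, nhi
    return lo if lo < len(arr) and arr[lo] == key else -1
```

With p probes per iteration the range shrinks p-fold, so the number of iterations drops from log2(n) to roughly log_p(n); on a GPU the p probes cost one parallel (coalesced) memory access rather than p serial ones, which is how the algorithm trades bandwidth for latency.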