An Efficient Data Selection Policy for Search Engine Cache Management

2015 
Caching is an effective optimization in search engine. The data selection policy plays a key role in caching, which places the data to be cached in memory. However, the current data selection policies are not suitable to the hybrid storage architecture with solid state disks (SSDs), which have gradually replaced hard disk drives (HDDs) in search engines. In this paper, we present an Efficient Data Selection policy (EDS) for search engine cache management, which views cache media as a knapsack, and views results and posting lists as items. The best benefit can be computed by greedy algorithms. In order to verify the effectiveness, we carry out a series of experiments to study essential factors of data selection in different architectures, including HDD, SSD, and SSD-based hybrid storage architecture, which uses SSD as a secondary cache for memory. The experimental results demonstrate that the proposed policy improves the hit ratio by 20.04% and the retrieval performance on HDD, SSD, and hybrid architecture by 31.98%, 28.72% and 23.24%, respectively.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    15
    References
    0
    Citations
    NaN
    KQI
    []