EDS: An Efficient Data Selection policy for search engine storage architectures

2017 
Abstract Caching is an effective optimization in search engine storage architectures. Many caching algorithms have been proposed to improve retrieval performance. The data selection policy of search engine cache management plays an important role, which carefully places the data in memory or other storage, such as solid state disks (SSDs). Considering that the historical query log has a guiding role for the future query, we present an Efficient Data Selection (EDS) policy for search engine cache management, which views cache media as a knapsack, and views results and posting lists as items. The best benefit of EDS can be computed by greedy algorithms. We carry out a series of experiments to study the essential factors of the data selection in different architectures, including hard disk drive (HDD), SSD, and SSD-based hybrid storage architectures. The hybrid storage architecture is a two-level cache architecture, which uses SSD as a secondary cache for the memory. Our main goal is to improve the performance of the search engines and reduce the cost of the servers on two-level cache architecture. The experimental results demonstrate that our proposed policy improves the hit ratio by 20.04% as well as the retrieval performance on HDD, SSD, and hybrid architecture by 31.98%, 28.72% and 23.24%, respectively.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    29
    References
    2
    Citations
    NaN
    KQI
    []