A Content-Based Indexing Scheme for Large-Scale Unstructured Data

Nan Zhu, Yangdi Lu, Wenbo He, Yu Hua

2017

The sheer volume of multimedia content generated by today's Internet services is stored in the cloud. The traditional indexing method, which associates user-generated metadata with the content, is vulnerable to inaccuracy caused by low-quality metadata. Content-based indexing, in contrast, does not depend on error-prone metadata. However, state-of-the-art research focuses on developing descriptive features and overlooks the system-oriented considerations that arise when incorporating these features into practical cloud computing systems. We propose an update-efficient and parallel-friendly content-based multimedia indexing system called Partitioned Hash Forest (PHF). PHF incorporates state-of-the-art content-based indexing models and multiple system-oriented optimizations. It contains an approximate content-based index and leverages the hierarchical memory system to support a high volume of updates. Additionally, its content-aware data partitioning and lock-free concurrency management modules enable parallel processing of concurrent user requests. We evaluate PHF in terms of indexing accuracy and system efficiency by comparing it with the state-of-the-art content-based indexing algorithm and its variants. PHF achieves significantly better accuracy with less resource consumption, processes updates around 37% faster, and delivers up to a 2.5X throughput speedup on a multi-core platform compared with other parallel-friendly designs.
