A distributed incremental information acquisition model for large-scale text data

2017 
Discovering and acquiring information from incremental data on the Internet in a timely manner is a hot topic in the big data era. This paper presents a distributed incremental information acquisition model for large-scale text data. To achieve a lower false positive rate and higher efficiency than the traditional Bloom filter, a distributed multidimensional Bloom filter is proposed for deduplicating large-scale Web URL text data. Three Bloom-filter-based methods were compared in terms of false positive rate and response efficiency. The results show that the proposed model can achieve a high duplicate removal rate with a lower false positive rate.
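For readers unfamiliar with the baseline, the following is a minimal sketch of URL deduplication with a classic single-node Bloom filter. It is not the distributed multidimensional design the paper proposes, whose details are not given in the abstract. The names `BloomFilter`, `deduplicate`, `expected_items`, and `fp_rate` are hypothetical, and the double-hashing scheme over SHA-256 is an assumption chosen for illustration.

```python
import hashlib
import math


class BloomFilter:
    """Classic Bloom filter (illustrative baseline, not the paper's variant)."""

    def __init__(self, expected_items: int, fp_rate: float) -> None:
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        self.m = max(1, math.ceil(-expected_items * math.log(fp_rate) / math.log(2) ** 2))
        self.k = max(1, round(self.m / expected_items * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions from one digest via double hashing (an assumption).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> bool:
        """Insert item; return True if it was possibly present already."""
        seen = True
        for pos in self._positions(item):
            byte, bit = divmod(pos, 8)
            if not self.bits[byte] & (1 << bit):
                seen = False
                self.bits[byte] |= 1 << bit
        return seen


def deduplicate(urls, bloom: BloomFilter):
    """Yield only URLs the filter has not recorded before."""
    for url in urls:
        if not bloom.add(url):
            yield url


if __name__ == "__main__":
    bloom = BloomFilter(expected_items=1000, fp_rate=0.01)
    batch = ["http://a.example/1", "http://a.example/2", "http://a.example/1"]
    for url in deduplicate(batch, bloom):
        print(url)
```

A filter like this never misses a true duplicate, but it can occasionally flag a new URL as already seen; that false positive rate is the quantity the paper aims to reduce. Because the filter state persists between calls, later batches of incremental data are checked against everything recorded so far.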