A distributed incremental information acquisition model for large-scale text data

Shengtao Sun,Jibing Gong,Albert Y. Zomaya,Aizhi Wu

2017

Timely discovery and acquisition of information from incremental data on the Internet is a hot topic in the big data era. This paper presents a distributed incremental information acquisition model for large-scale text data. To achieve a lower false positive rate and higher efficiency than the traditional Bloom filter, a distributed multidimensional Bloom filter is proposed for the deduplication of large-scale Web URL text data. Three Bloom-filter-based methods are compared in terms of false positive rate and response efficiency. The results show that the distributed incremental information acquisition model for large-scale text data can achieve a high duplicate removal rate with a lower false positive rate.
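
For orientation, the baseline that the abstract compares against can be sketched as a plain, single-node Bloom filter applied to URL deduplication. This is only an illustration of the traditional technique, not the distributed multidimensional Bloom filter proposed in the paper, whose design is not described here. The names `BloomFilter` and `deduplicate`, the SHA-256 double-hashing scheme, and the sizing parameters are all assumptions made for this sketch.

```python
import hashlib
import math


class BloomFilter:
    """Traditional single-node Bloom filter (illustrative; not the paper's variant)."""

    def __init__(self, expected_items: int, target_fp_rate: float):
        # Standard sizing formulas: m = -n ln(p) / (ln 2)^2 and k = (m / n) ln 2.
        self.num_bits = max(
            1, math.ceil(-expected_items * math.log(target_fp_rate) / math.log(2) ** 2)
        )
        self.num_hashes = max(1, round(self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, url: str):
        # Double hashing (an assumption of this sketch): derive k bit positions
        # from two 64-bit halves of a single SHA-256 digest.
        digest = hashlib.sha256(url.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the stride is nonzero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, url: str) -> None:
        for pos in self._positions(url):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, url: str) -> bool:
        # True means "possibly seen" (false positives can occur);
        # False means "definitely not seen".
        return all(self.bits[pos >> 3] & (1 << (pos & 7)) for pos in self._positions(url))


def deduplicate(urls, bloom: BloomFilter):
    """Yield each URL the filter has not seen; a false positive drops a new URL."""
    for url in urls:
        if url not in bloom:
            bloom.add(url)
            yield url


if __name__ == "__main__":
    bloom = BloomFilter(expected_items=1000, target_fp_rate=0.01)
    stream = ["http://a.example/1", "http://a.example/2", "http://a.example/1"]
    print(list(deduplicate(stream, bloom)))
```

In this setting a false positive means that a previously unseen URL is wrongly treated as a duplicate and skipped, which is why the false positive rate is one of the two evaluation criteria named in the abstract.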

Keywords:

Big data
Data mining
False positive rate
Real-time computing
Computer science
Data deduplication
The Internet
Bloom filter
Computer communication networks
Information acquisition
