Hashing-Based Hybrid Duplicate Detection for Bayesian Network Structure Learning

2015 
In this work, we address the well-known score-based Bayesian network structure learning problem. Breadth-first branch and bound BFBnB has been shown to be an effective approach for solving this problem. Delayed duplicate detection DDD is an important component of the BFBnB algorithm. Previously, an external sorting-based technique, with complexity $${\text {O}}\left m \log m\right $$, where m is the number of nodes stored in memory, was used for DDD. In this work, we propose a hashing-based technique, with complexity $${\text {O}}\left m\right $$, for DDD. In practice, by removing the $${\text {O}}\left \log m\right $$ overhead of sorting, over an order of magnitude more memory is available for the search. Empirically, we show the extra memory improves locality and decreases the amount of expensive external memory operations. We also give a bin packing algorithm for minimizing the number of external memory files.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    39
    References
    1
    Citations
    NaN
    KQI
    []