Dirichlet Bayesian Network Scores and the Maximum Relative Entropy Principle.

2017 
A classic approach for learning Bayesian networks from data is to identify a maximum a posteriori (MAP) network structure. In the case of discrete Bayesian networks, MAP networks are selected by maximising one of several possible Bayesian Dirichlet (BD) scores; the most famous is the Bayesian Dirichlet equivalent uniform (BDeu) score from Heckerman et al (1995). The key properties of BDeu arise from its uniform prior over the parameters of the network, which makes structure learning computationally efficient; does not require the elicitation of prior knowledge from experts; and satisfies score equivalence. In this paper we will review the derivation and the properties of BD scores, and of BDeu in particular, and we will link them to the corresponding entropy estimates to study them from an information theoretic perspective. To this end, we will work in the context of the foundational work of Giffin and Caticha (2007), who showed that Bayesian inference can be framed as a particular case of the maximum relative entropy principle. We will use this connection to show that BDeu should not be used for structure learning from sparse data, since it contradicts the maximum relative entropy principle; and that it is also problematic from a more classic Bayesian model selection perspective, because it produces Bayes factors that are very sensitive to the value of its only hyperparameter. We will also show that these issues are in fact different aspects of the same problem and a consequence of the distributional assumptions of the prior.
    • Correction
    • Cite
    • Save
    • Machine Reading By IdeaReader
    25
    References
    0
    Citations
    NaN
    KQI
    []