Out-of-bag discriminative graph mining

2013 
In class-labeled graph databases, each graph is associated with one from a finite set of classes, which induces associations between classes and subgraphs occurring in the database graphs. The subgraphs with strong class associations are called discriminative subgraphs. In this work, discriminative subgraphs are repeatedly mined on bootstrap samples of a graph database in order to improve on estimation of subgraph associations. The number of times a subgraph occurs in a graph associated with each class (support values) is recorded over the out-of-bag instances of the bootstrap process. We investigate sample mean and maximum likelihood estimation for the approximation of the true underlying support from these empirical values. It is shown that both significantly improve on the process, compared to single runs of discriminative graph mining, by applying the methods to publicly available toxicological databases, and validating support values, class bias, and class significance. In toxicology, the detection of subgraphs (fragments of chemical structure) that induce toxicity is a major goal. Apart from the subgraph associations being statistically validated, the number of subgraphs created by the proposed methods are much lower than for ordinary discriminative graph mining, which is often a bottleneck in the application of computational models to such databases, and hinders interpretation of the results.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    9
    References
    0
    Citations
    NaN
    KQI
    []