Efficient Construction of Decision Trees by the Dual Information Distance Method

2014 
The construction of efficient decision and classification trees is a fundamental task in Big Data analytics that is known to be NP-hard. Accordingly, many greedy heuristics have been suggested for constructing decision trees, but they were found to yield local-optimum solutions. In this work we present the dual information distance (DID) method for the efficient construction of decision trees, which is computationally attractive yet relatively robust to noise. The DID heuristic selects features by considering both their immediate contribution to the classification and their potential future effects. It represents the construction of classification trees as finding shortest paths over a graph of partitions defined by the selected features. The DID method accounts both for the orthogonality between the selected partitions and for the reduction of uncertainty about the class partition given the selected attributes. We show that the DID method often outperforms popular classifiers in terms of average tree depth and classification accuracy.
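As a rough illustration of the information-distance idea underlying the abstract (not the authors' exact DID formulation), the following Python sketch computes the symmetric information distance d(A, B) = H(A|B) + H(B|A) between two partitions and greedily selects the feature whose induced partition is closest to the class partition. The feature names, data, and helper functions are hypothetical, and the full shortest-path construction over the partition graph is omitted.

```python
import numpy as np
from collections import Counter

def entropy(labels):
    """Shannon entropy H(X) of a sequence of discrete labels."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def conditional_entropy(x, y):
    """H(Y | X): uncertainty remaining in y after observing x."""
    total = len(x)
    h = 0.0
    for value in set(x):
        subset = [yi for xi, yi in zip(x, y) if xi == value]
        h += (len(subset) / total) * entropy(subset)
    return h

def information_distance(a, b):
    """Symmetric information distance d(A, B) = H(A|B) + H(B|A) between two partitions."""
    return conditional_entropy(b, a) + conditional_entropy(a, b)

def select_feature(features, y):
    """Pick the feature whose partition is closest (in information distance) to the class partition."""
    return min(features, key=lambda name: information_distance(features[name], y))

# Hypothetical toy data: each feature is a column of categorical values.
features = {
    "outlook": ["sunny", "sunny", "rain", "rain", "overcast", "overcast"],
    "windy":   ["no", "yes", "no", "yes", "no", "yes"],
}
y = ["no", "yes", "yes", "no", "yes", "yes"]
print(select_feature(features, y))  # feature chosen for the root split
```

A greedy selector based only on this distance captures the "immediate contribution" term; the DID method, as described in the abstract, additionally weighs the orthogonality of candidate partitions to anticipate their future effect on the tree.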