A Review and Experimental Comparison of Multivariate Decision Trees

2021 
Decision trees are popular as stand-alone classifiers or as base learners in ensemble classifiers, largely because they are easy to explain. To improve the classification performance of decision trees, some authors have used Multivariate Decision Trees (MDTs), which allow combinations of features when splitting a node. Although interest in the area is growing, recent research on MDTs shares a common shortcoming: it does not provide an adequate comparison with related work, either because relevant rival techniques are not considered or because algorithm performance is tested on too few databases. As a result, claims lack statistical support and, hence, there is no general understanding of the actual capabilities of existing MDT induction algorithms, which is crucial for improving the state of the art. In this paper, we report an exhaustive review of MDTs. In particular, we give an overview of 37 MDT induction algorithms, of which we experimentally compared 19 on 57 databases. We provide a statistical comparison over all databases and over subsets of databases grouped by number of classes, number of features, number of instances, and degree of class imbalance. This allows us to identify groups of top-performing algorithms for different types of databases.
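
For readers unfamiliar with the distinction, the sketch below illustrates what a multivariate (oblique) split is: instead of testing a single feature against a threshold, a node tests a linear combination of features. This is a minimal illustration only; the weights w, bias b, and toy data are hypothetical and are not tied to any of the induction algorithms reviewed in the paper.

import numpy as np

def multivariate_split(X, w, b):
    """Route each sample by the sign of a linear combination of its features.

    A univariate (axis-parallel) split tests a single feature, e.g. x[j] <= t.
    A multivariate (oblique) split tests a weighted combination of features:
        w . x + b <= 0
    Returns a boolean mask: True -> left child, False -> right child.
    """
    return X @ w + b <= 0

# Toy data: four samples with two features each (values chosen arbitrarily).
X = np.array([[0.2, 1.5],
              [1.0, 0.3],
              [2.1, 2.2],
              [0.5, 0.4]])

# Hypothetical split hyperplane: samples with x0 + 0.5*x1 - 1.2 <= 0 go left.
w = np.array([1.0, 0.5])
b = -1.2

left_mask = multivariate_split(X, w, b)
print("left child:\n", X[left_mask])
print("right child:\n", X[~left_mask])

Different MDT induction algorithms differ mainly in how they search for the weight vector w and threshold b at each node (e.g., via optimization, heuristics, or embedded linear models), which is what the experimental comparison in the paper evaluates.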