Data-generating models under which the random forest algorithm performs badly.

2020 
Examples are given of data-generating models under which some versions of the random forest algorithm may fail to be consistent or at least may be extremely slow to converge to the optimal predictor. Evidence provided for these properties is based on partly intuitive and partly rigorous arguments and on numerical experiments. Although one can always choose a model under which random forests perform very badly, in each case simple methods based on statistics of 'variable use' and 'variable importance' can be used to construct a better predictor based on a sort of mixture of random forests.