Gradient Boosting Forest: a Two-Stage Ensemble Method Enabling Federated Learning of GBDTs

2021 
Gradient Boosting Decision Trees (GBDTs), which train a sequence of decision trees with a gradient boosting strategy to fit the training data, have become very popular in recent years due to their strong performance on machine learning tasks. In many well-known machine learning competitions, GBDT even outperforms highly complex deep neural networks. Nevertheless, training such tree-based models requires access to the whole dataset to find split points on the features, which makes distributed training of GBDT models difficult. In particular, in Federated Learning (FL), where training data is decentralized and cannot be shared for privacy and security reasons, training GBDT becomes challenging. To address this issue, we propose in this paper a new tree-boosting method, named Gradient Boosting Forest (GBF), in which the single decision tree in each gradient boosting round of GBDT is replaced by a set of trees trained on different subsets of the training data (referred to as a forest), enabling GBDT training in federated learning scenarios. We empirically show that GBF outperforms existing GBDT methods in both the centralized (GBF-Cen) and federated (GBF-Fed) settings. In a series of experiments, GBF-Cen achieves 1.1% higher accuracy than XGBoost on the HIGGS-1M dataset, and GBF-Fed obtains 12.2%–48.0% lower RMSE than state-of-the-art federated GBDT methods.
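The sketch below illustrates one plausible reading of the per-round forest idea described in the abstract: in each boosting round, every data partition ("client") fits its own regression tree on the current residuals, and the round's update is the averaged prediction of that forest. The partitioning, the mean aggregation rule, the squared-error loss, and the learning rate are illustrative assumptions for this sketch, not the paper's exact procedure.

```python
# Minimal, assumption-laden sketch of GBF-style boosting: replace the single
# tree per round with a forest of trees trained on disjoint data subsets.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gbf_fit(X_parts, y_parts, n_rounds=50, lr=0.1, max_depth=3):
    """X_parts / y_parts: per-client feature and target arrays."""
    f0 = np.mean(np.concatenate(y_parts))                    # global base prediction
    preds = [np.full(len(y), f0) for y in y_parts]           # current model output per client
    forests = []
    for _ in range(n_rounds):
        forest = []
        for Xc, yc, pc in zip(X_parts, y_parts, preds):      # each client fits a local tree
            tree = DecisionTreeRegressor(max_depth=max_depth)
            tree.fit(Xc, yc - pc)                            # residuals = negative squared-error gradients
            forest.append(tree)
        for i, Xc in enumerate(X_parts):                     # round update = average over the forest
            preds[i] += lr * np.mean([t.predict(Xc) for t in forest], axis=0)
        forests.append(forest)
    return f0, forests

def gbf_predict(X, f0, forests, lr=0.1):
    pred = np.full(len(X), f0)
    for forest in forests:
        pred += lr * np.mean([t.predict(X) for t in forest], axis=0)
    return pred
```

In a federated deployment, the averaging step would be replaced by exchanging the locally trained trees (rather than raw data) through a coordinator, which is what makes the forest-per-round structure compatible with FL constraints.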