Handling Missing Data in Decision Trees: A Probabilistic Approach

Pasha Khosravi, Antonio Vergari, YooJung Choi, Yitao Liang, Guy Van den Broeck

2020

Decision trees are a popular family of models due to attractive properties such as interpretability and the ability to handle heterogeneous data. Meanwhile, missing data is a prevalent problem that hinders the performance of machine learning models. As such, handling missing data in decision trees is a well-studied problem. In this paper, we tackle this problem by taking a probabilistic approach. At deployment time, we use tractable density estimators to compute the "expected prediction" of our models. At learning time, we fine-tune the parameters of already learned trees by minimizing their "expected prediction loss" with respect to our density estimators. We provide brief experiments showcasing the effectiveness of our methods compared to a few baselines.
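
As a rough illustration of the "expected prediction" idea, the sketch below averages a tree's output over the missing features instead of imputing them. It assumes binary features and a fully factorized distribution (independent Bernoulli marginals) as a simple stand-in for the tractable density estimators mentioned in the abstract. The tuple-based tree encoding, the function name `expected_prediction`, and all numbers are hypothetical and not taken from the paper.

```python
from typing import List, Optional, Sequence, Tuple, Union

# Hypothetical encoding: a tree is either a leaf value (float) or a
# tuple (feature_index, subtree_if_feature_is_0, subtree_if_feature_is_1).
Tree = Union[float, Tuple[int, "Tree", "Tree"]]


def expected_prediction(
    tree: Tree,
    x: Sequence[Optional[int]],
    p: Sequence[float],
) -> float:
    """Expected tree output given the observed entries of x.

    x[j] is 0, 1, or None (missing). p[j] is the assumed marginal
    P(X_j = 1); features are assumed independent, so observing some
    features does not change the distribution of the others.
    """
    if not isinstance(tree, tuple):
        return float(tree)  # leaf: the prediction is a constant

    j, if_zero, if_one = tree
    if x[j] is not None:
        # Observed feature: follow the single matching branch.
        branch = if_one if x[j] == 1 else if_zero
        return expected_prediction(branch, x, p)

    # Missing feature: weight both branches by the marginal, and fix the
    # value along each branch so repeated tests of feature j stay consistent.
    x_one: List[Optional[int]] = list(x)
    x_one[j] = 1
    x_zero: List[Optional[int]] = list(x)
    x_zero[j] = 0
    return p[j] * expected_prediction(if_one, x_one, p) + (
        1.0 - p[j]
    ) * expected_prediction(if_zero, x_zero, p)


# Toy regression tree: split on feature 0, then on feature 1.
toy_tree: Tree = (0, 1.0, (1, 2.0, 4.0))
toy_marginals = [0.5, 0.25]
```

With these toy values, `expected_prediction(toy_tree, [1, None], toy_marginals)` averages the two leaves under feature 1 using its marginal, while a fully observed input such as `[1, 1]` reduces to the ordinary tree prediction. A richer density estimator would replace the independent marginals with conditional probabilities given the observed features.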

Keywords:

Mathematics
Deployment time
Decision tree
Estimator
Interpretability
Probabilistic logic
Baseline (configuration management)
Artificial intelligence
Machine learning
Missing data
