The Effects of Confounding When Making Automatic Intervention Decisions Using Machine Learning

2019 
With ubiquitous data and computing power, we see machine learning models making countless decisions automatically. Often these models are deployed with the goal of having a causal effect, that is, of improving an outcome by means of an intervention. Common examples are influencing someone's purchasing behavior with an advertisement or affecting customer retention with a special offer. If these models are built using observational data, as is usually the case, they will likely suffer from confounding bias. Investing in experimental data offers a way to build unconfounded models, but such data is costly and therefore might be in short supply. So, would it be better to use small experimental data or big (confounded) observational data? This paper presents a theoretical comparison between the use of observational and experimental data for building models to make automated intervention decisions. The results reveal different regimes where each approach is preferable. Perhaps surprisingly, confounding may help to make better decisions --- such as when larger causal effects are overestimated more. Even when this is not the case, the benefits of a larger data set may outweigh the detrimental effect of confounding on intervention decisions, an unexpected insight given that confounding bias cannot be corrected with more data when estimating causal effects. The upshot is that large, confounded observational data may be preferable to small experimental data when training models for intervention decisions. This result is important for practitioners using machine learning models to make interventions because experiments entail a variety of costs (design, implementation, opportunity, political).
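The abstract's central claim can be illustrated with a toy simulation (this sketch and all its parameters are illustrative assumptions, not the paper's actual model): when confounding bias scales with the true effect, a large confounded dataset can yield better intervention decisions than a small unbiased experiment, because the biased estimates still preserve the sign of the effect while the experimental estimates are swamped by sampling noise.

```python
import numpy as np

rng = np.random.default_rng(0)
n_contexts = 100_000

# True uplift (causal effect) of intervening in each context.
tau = rng.normal(0.0, 1.0, n_contexts)

# Hypothetical setup: big observational data gives low-variance but
# biased estimates, where the confounding bias inflates larger effects
# more (a multiplicative bias), so the sign of the effect survives.
obs_est = 1.5 * tau + rng.normal(0.0, 0.1, n_contexts)

# Small experimental data gives unbiased but high-variance estimates.
exp_est = tau + rng.normal(0.0, 1.5, n_contexts)

# Decision rule: intervene when the estimated effect is positive.
truth = tau > 0
acc_obs = np.mean((obs_est > 0) == truth)
acc_exp = np.mean((exp_est > 0) == truth)
print(f"big confounded data, decision accuracy:   {acc_obs:.3f}")
print(f"small experimental data, decision accuracy: {acc_exp:.3f}")
```

With these (assumed) parameters the confounded estimates choose the correct action far more often, even though their effect-size estimates are badly biased: for the treat/don't-treat decision, only the sign of the estimate matters, so bias that more data cannot remove can still be harmless, or even helpful.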
References: 27 · Citations: 1