Surveying the Forests and Sampling the Trees: An overview of Classification and Regression Trees and Random Forests with applications in Survey Research

2018 
While survey and social science researchers are well versed in traditional modeling approaches such as multiple regression and logistic regression, more contemporary nonparametric techniques offer greater flexibility in model form and distributional assumptions. Classification and regression trees (CARTs) and random forests are two such methods that are increasingly applied in survey research, both for creating nonresponse adjustments and for creating propensity scores used in responsive/adaptive survey designs. Both methods can be used for regression or classification tasks and offer researchers and practitioners excellent alternatives to the more classical approaches. CARTs and random forests can be applied when typical statistical distributional assumptions are unlikely to be satisfied, and they incorporate interactions automatically. CART models can be estimated in the presence of missing data, and random forest methods can adapt to the complexity of the dataset and can be estimated when the number of predictors is large relative to the sample size. This article provides an accessible description of both methods and illustrates their use by developing models that predict survey response from a collection of demographic variables known for both respondents and nonrespondents.
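As a rough illustration of the kind of model the abstract describes, the following minimal sketch grows a small classification tree and a small random forest that predict survey response, using only the Python standard library. Everything in it is an assumption made for illustration and is not taken from the article: the synthetic sample, the three demographic variables (`age`, `urban`, `hh_size`), the response mechanism, the function names, and the tuning values (`max_depth`, `min_leaf`, `mtry`, `n_trees`). The tree splits on Gini impurity and stores the observed response rate in each leaf, which can be read as a response propensity; the forest averages trees grown on bootstrap samples with a random subset of predictors considered at each split.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(rows, labels, features, min_leaf):
    """Return the (feature, threshold) that most reduces weighted Gini, or None."""
    n = len(rows)
    best, best_score = None, gini(labels)
    for j in features:
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score - 1e-12:
                best, best_score = (j, t), score
    return best

def grow(rows, labels, depth=0, max_depth=3, min_leaf=20, mtry=None, rng=None):
    """Grow a tree; each leaf holds the observed response rate of its members."""
    p = len(rows[0])
    # mtry=None considers every predictor (plain CART-style tree);
    # otherwise a fresh random subset of predictors is drawn at each split.
    features = range(p) if mtry is None else rng.sample(range(p), mtry)
    split = best_split(rows, labels, features, min_leaf) if depth < max_depth else None
    if split is None:
        return sum(labels) / len(labels)
    j, t = split
    go_left = [r[j] <= t for r in rows]
    kids = []
    for side in (True, False):
        sub_rows = [r for r, g in zip(rows, go_left) if g == side]
        sub_y = [y for y, g in zip(labels, go_left) if g == side]
        kids.append(grow(sub_rows, sub_y, depth + 1, max_depth, min_leaf, mtry, rng))
    return (j, t, kids[0], kids[1])

def predict(tree, row):
    """Follow the splits down to a leaf and return its response rate."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree

def grow_forest(rows, labels, n_trees=25, mtry=2, seed=1):
    """Grow trees on bootstrap samples, with random predictor subsets per split."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(grow([rows[i] for i in idx], [labels[i] for i in idx],
                           mtry=mtry, rng=rng))
    return forest

def predict_forest(forest, row):
    """Average the leaf response rates over all trees."""
    return sum(predict(t, row) for t in forest) / len(forest)

# Hypothetical sample: demographics known for respondents and nonrespondents.
data_rng = random.Random(0)
rows, responded = [], []
for _ in range(600):
    age = data_rng.randint(18, 85)
    urban = data_rng.randint(0, 1)
    hh_size = data_rng.randint(1, 6)
    prob = 0.35 + 0.30 * (age >= 50) - 0.15 * urban  # assumed response mechanism
    rows.append((age, urban, hh_size))
    responded.append(1 if data_rng.random() < prob else 0)

tree = grow(rows, responded)
propensity = [predict(tree, r) for r in rows]
# Inverse-propensity nonresponse adjustment weights, for respondents only.
weights = [1.0 / p for p, y in zip(propensity, responded) if y == 1]

forest = grow_forest(rows, responded)
forest_propensity = [predict_forest(forest, r) for r in rows]

print("respondents:", sum(responded), "of", len(rows))
print("sum of adjustment weights:", round(sum(weights), 2))
```

Because each leaf's propensity is its own observed response rate, the respondents in a leaf are weighted up to represent every sample member in that leaf, so the leaves act as weighting classes. In practice an established tree or forest implementation would normally be used; this sketch only makes the mechanics visible.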