When 4 ≈ 10,000: The power of social science knowledge in predictive performance
2018
Computer science has devised leading methods for predicting variables; can social science compete? The author sets
out a social scientific approach to the Fragile Families Challenge. Key insights included new variables constructed
according to theory (e.g., a measure of shame relating to hardship), lagged values of the target variables, using predicted
values of certain outcomes to inform others, and validated scales rather than individual variables. The models were
competitive: a four-variable logistic regression model was placed second for predicting layoffs, narrowly beaten by a
model using all the available variables (>10,000) and an ensemble of algorithms. Similarly, a relatively small random
forest model (25 variables) was ranked seventh in predicting material hardship. However, a similar approach overfitted
the prediction of grit. Machine learning approaches proved superior to linear regression for modeling the continuous
outcomes. Overall, social scientists can contribute to predictive performance while benefiting from learning more
about data science methods.
Keywords:
- Correction
- Source
- Cite
- Save
- Machine Reading By IdeaReader
9
References
1
Citations
NaN
KQI