Winning Models for GPA, Grit, and Layoff in the Fragile Families Challenge

2018 
In this paper, we discuss and analyze our approach to the Fragile Families Challenge. The challenge involved predicting six outcomes for 4,242 children in disadvantaged families across the United States. The data consisted of over 12,000 features (covariates) about the children and their parents, schools, and overall environments from birth to age 9. Our approach relied primarily on existing data science techniques, including: (1) data preprocessing: elimination of low-variance features, imputation of missing data, and construction of composite features; (2) feature selection through univariate mutual information and extraction of non-zero LASSO coefficients; (3) three machine learning models: Random Forest, Elastic Net, and Gradient-Boosted Trees; and finally (4) prediction aggregation according to performance. The top-performing submissions produced winning out-of-sample predictions for three outcomes: GPA, grit, and layoff. However, predictions were at most 20% better than a baseline that predicted the mean value of the training data for each outcome.
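The abstract does not include code, but steps (1)–(4) map directly onto standard scikit-learn components. The sketch below illustrates the pipeline shape on synthetic data; the challenge data itself is access-restricted. All specifics here (the number of selected features, model hyperparameters, and inverse-MSE weighting as the "aggregation according to performance") are illustrative assumptions, not the authors' actual settings.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.feature_selection import SelectKBest, VarianceThreshold, mutual_info_regression
from sklearn.impute import SimpleImputer
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the survey data (the real data is restricted).
X, y = make_regression(n_samples=400, n_features=60, n_informative=10,
                       noise=10.0, random_state=0)
rng = np.random.RandomState(0)
X[rng.rand(*X.shape) < 0.1] = np.nan  # mimic pervasive survey missingness

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
X_fit, X_val, y_fit, y_val = train_test_split(X_train, y_train, random_state=0)

def build(model):
    """One candidate pipeline: preprocessing, feature selection, learner."""
    return Pipeline([
        ("impute", SimpleImputer(strategy="median")),       # (1) impute missing data
        ("variance", VarianceThreshold()),                  # (1) drop zero-variance features
        ("mi", SelectKBest(mutual_info_regression, k=30)),  # (2) univariate mutual information
        ("model", model),                                   # (3) one of the three learners
    ])

models = {
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "elastic_net": ElasticNet(alpha=0.1),
    "gbt": GradientBoostingRegressor(random_state=0),
}

# (4) Weight each model by inverse validation MSE, then average predictions.
fitted, weights = {}, {}
for name, model in models.items():
    pipe = build(model).fit(X_fit, y_fit)
    fitted[name] = pipe
    weights[name] = 1.0 / mean_squared_error(y_val, pipe.predict(X_val))

total = sum(weights.values())
ensemble = sum((w / total) * fitted[name].predict(X_test)
               for name, w in weights.items())

# The paper's baseline: predict the training mean for every child.
baseline = np.full(len(y_test), y_train.mean())
mse_ensemble = mean_squared_error(y_test, ensemble)
mse_baseline = mean_squared_error(y_test, baseline)
```

Weighting by held-out error rather than training error keeps the aggregation step from favoring the model that overfits most; the abstract's "aggregation according to performance" is consistent with, but not necessarily identical to, this scheme.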