Learning about individuals' health from aggregate data

2017 
There is growing awareness that user-generated social media content contains valuable health-related information and is more convenient to collect than typical health data. For example, Twitter has been employed to predict aggregate-level outcomes, such as regional rates of diabetes and child poverty, and to identify individual cases of depression and food poisoning. Models which make aggregate-level inferences can be induced from aggregate data, and consequently are straightforward to build. In contrast, learning models that produce individual-level (IL) predictions, which are more informative, usually requires a large number of difficult-to-acquire labeled IL examples. This paper presents a new machine learning method which achieves the best of both worlds, enabling IL models to be learned from aggregate labels. The algorithm makes predictions by combining unsupervised feature extraction, aggregate-based modeling, and optimal integration of aggregate-level and IL information. Two case studies illustrate how to learn health-relevant IL prediction models using only aggregate labels, and show that these models perform as well as state-of-the-art models trained on hundreds or thousands of labeled individuals.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    13
    References
    3
    Citations
    NaN
    KQI
    []