Insights for Curriculum Development: Identifying Emerging Data Science Topics through Analysis of Q&A Communities

2020 
Updating curricula in new computer science domains is a critical challenge faced by many instructors and programs. In this paper we present an approach for identifying emerging topics and issues in Data Science by using Question and Answer (Q&A) sites as a resource. Q&A sites provide a useful online platform for discussion of topics and through the sharing of information they become a valuable corpus of knowledge. We applied latent Dirichlet allocation (LDA), a statistical topic modeling technique, to analyze data science related threads from from two popular Q&A communities "Stack Exchange and Reddit". We uncovered both important topics as well as useful examples that can be incorporated into teaching. In addition to technical topics, our analysis also identified topics related to professional development. We believe that approaches such as these are critical in order to update curriculum and bridge the workplace-school divide in teaching of newer topics such as data science. Given the pace of technical development and frequent changes in the field, this is an inventive and effective method to keep teaching up to date. We also discuss the limitations of this approach whereby topics of importance such as data ethics are largely missing from online discussions.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    36
    References
    11
    Citations
    NaN
    KQI
    []