Sample Size for Latent Dirichlet Allocation of Constructed-Response Items

2021 
Over the past decade, topic models have been used to analyze students’ responses to constructed-response items. Analyzing students’ responses with topic models has been shown to yield results similar to those of a qualitative analysis. As the use of topic models in educational settings increases, it is important to assess the performance of the underlying statistical model. Simulation studies are an essential tool for evaluating the performance of a statistical model. Using a simulation study to assess the performance of topic models, such as the latent Dirichlet allocation (LDA) model, requires generating simulated text responses rather than scored responses. LDA and related topic models, such as the supervised latent Dirichlet allocation model, assume a generative process by which responses are constructed. Topic models also assume that text data follow a bag-of-words representation, in which word order is ignored and only word counts matter. These two assumptions make it possible to generate simulated text responses. In this paper, we demonstrate the simulation process for topic models and then present a simulation study that assesses the sample size needed to recover the parameters of the LDA model.
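To illustrate why the generative and bag-of-words assumptions make simulation possible, here is a minimal sketch of the standard LDA generative process in Python (standard library only). It is not the authors' simulation code. The helper names (sample_dirichlet, simulate_lda_responses), the symmetric priors alpha and beta, and the fixed response length n_words are hypothetical choices made for illustration. Each topic draws a word distribution phi_k ~ Dirichlet(beta), each response draws topic proportions theta_d ~ Dirichlet(alpha), and each word is produced by first drawing a topic from theta_d and then a word from that topic. Word order is then discarded, leaving only counts.

```python
import random
from collections import Counter


def sample_dirichlet(concentration, rng):
    """Draw one probability vector from a Dirichlet via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in concentration]
    total = sum(draws)
    return [d / total for d in draws]


def simulate_lda_responses(n_responses, n_topics, vocab_size,
                           alpha=0.5, beta=0.1, n_words=40, seed=0):
    """Simulate bag-of-words text responses from the LDA generative process.

    Hypothetical helper: symmetric priors alpha/beta and a fixed response
    length n_words are simplifying assumptions for illustration.
    Returns (phi, theta, responses): phi[k] is topic k's word distribution,
    theta[d] is response d's topic proportions, and responses[d] is a
    Counter mapping word index -> count.
    """
    rng = random.Random(seed)
    topics = range(n_topics)
    vocab = range(vocab_size)
    # Each topic k draws a distribution over the vocabulary: phi_k ~ Dir(beta).
    phi = [sample_dirichlet([beta] * vocab_size, rng) for _ in topics]
    theta, responses = [], []
    for _ in range(n_responses):
        # Each response draws its topic proportions: theta_d ~ Dir(alpha).
        theta_d = sample_dirichlet([alpha] * n_topics, rng)
        words = []
        for _ in range(n_words):
            # Draw a topic for this word, then a word from that topic.
            z = rng.choices(topics, weights=theta_d)[0]
            words.append(rng.choices(vocab, weights=phi[z])[0])
        theta.append(theta_d)
        # Bag of words: keep only the counts and discard word order.
        responses.append(Counter(words))
    return phi, theta, responses


phi, theta, responses = simulate_lda_responses(n_responses=5, n_topics=3, vocab_size=30)
print(responses[0].most_common(5))
```

In a sample-size study along these lines, one would fit LDA to responses simulated at several values of n_responses and compare the estimated topic-word and topic-proportion parameters with the generating phi and theta. The recovery criterion used in the paper is not stated in this abstract.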