Challenges in Automating Maze Detection

2014 
SALT is a widely used annotation approach for analyzing natural language transcripts of children. Nine annotated corpora are distributed along with scoring software to provide norming data. We explore automatic identification of mazes ‐ SALT’s version of disfluency annotations ‐ and find that cross-corpus generalization is very poor. This surprising lack of crosscorpus generalization suggests substantial differences between the corpora. This is the first paper to investigate the SALT corpora from the lens of natural language processing, and to compare the utility of different corpora collected in a clinical setting to train an automatic annotation system.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    25
    References
    6
    Citations
    NaN
    KQI
    []