TBALL data collection: the making of a young children's speech corpus.

Abe Kazemzadeh, Hong You, Markus Iseli, Barbara Jones, Xiaodong Cui, Patti Price, Elaine S. Andersen, Shrikanth Narayanan, Abeer Alwan

2005

In this paper we describe the data collection for the TBALL project (Technology Based Assessment of Language and Literacy) and report the results of our efforts. We focus on the aspects of our corpus that distinguish it from currently available corpora: the speakers are children in grades K-4 who are learning to read, largely nonnative speakers of English, and from diverse socio-economic backgrounds. We describe how we adapted our recording setup, data collection methodology, and transcription scheme to accommodate these differences. Finally, we discuss the task this corpus was designed to serve and our research approach.

Keywords:

Data collection
Speech recognition
Learning to read
Pattern recognition
Data collection methodology
Natural language processing
Making-of
Computer science
Speech corpus
Artificial intelligence
Literacy
