Prague DaTabase of Spoken Czech 1.0

Jan Hajič, Petr Pajas, Pavel Ircing, Jan Romportl, Nino Peterek, Miroslav Spousta, Marie Mikulová, Martin Gruber, Milan Legát

2017

PDTSC 1.0 is a multi-purpose corpus of spoken language. It comprises 768,888 tokens, 73,374 sentences and 7,324 minutes of spontaneous dialog speech, which have been recorded, transcribed and edited in several interlinked layers: audio recordings, automatic and manual transcriptions, and manually reconstructed text. PDTSC 1.0 is a delayed release of data annotated in 2012 and an update of the Prague Dependency Treebank of Spoken Language (PDTSL) 0.5, published in 2009. In 2017, the Prague Dependency Treebank of Spoken Czech (PDTSC) 2.0 was published as an update of PDTSC 1.0.
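As a rough sanity check, the reported totals imply an average sentence length of about 10.5 tokens and roughly 122 hours of recorded speech. The sketch below only divides the published figures; it assumes that the token, sentence and minute counts all describe the same body of recordings, which the description does not state explicitly, so the per-minute rate in particular is indicative only. The function name `corpus_averages` is hypothetical and not part of any PDTSC tooling.

```python
# Totals as reported in the PDTSC 1.0 description.
TOKENS = 768_888
SENTENCES = 73_374
MINUTES = 7_324


def corpus_averages(tokens: int, sentences: int, minutes: int) -> dict[str, float]:
    """Return rough per-unit averages derived from corpus totals.

    Hypothetical helper: assumes all three totals cover the same recordings.
    """
    return {
        "tokens_per_sentence": tokens / sentences,
        "hours_of_speech": minutes / 60,
        "tokens_per_minute": tokens / minutes,
    }


if __name__ == "__main__":
    # Print each average rounded to two decimal places.
    for name, value in corpus_averages(TOKENS, SENTENCES, MINUTES).items():
        print(f"{name}: {value:.2f}")
```

Running it prints about 10.48 tokens per sentence, 122.07 hours of speech and 104.98 tokens per minute.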

Keywords:

Czech
Speech recognition
Spoken language
Natural language processing
Treebank
Speech corpus
Computer science
Artificial intelligence
Speech reconstruction