Reintroducing KAPD as a Dataset for Machine Learning and Data Mining Applications

2016 
The KACST Arabic Phonetic Database (KAPD) has been in use by researchers for around fifteen years since its initial release. Research in acoustics and phonetics has benefited from its phonetically rich content, and KAPD has the potential to serve the research community further. In this work, KAPD is enhanced so that it can serve as a dataset for machine learning and data mining applications. The work involves reviewing and refining the existing KAPD metadata and adding new material that is necessary for such applications. The updated phoneme statistics after the corpus upgrade are presented from different perspectives. The data format and time units are made compatible with those of HTK. The paper discusses the potential of KAPD to serve as either a balanced or an imbalanced dataset.
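As an illustration of what HTK-compatible time units mean in practice, the following minimal sketch converts segment boundaries from seconds into HTK label times, which are integers counted in units of 100 ns. The function names, the phoneme symbols, and the segment times are hypothetical and are not taken from KAPD; the abstract does not describe how the conversion was actually carried out.

```python
# HTK label files store times as integers in units of 100 ns,
# so one second corresponds to 10,000,000 units.
HTK_UNITS_PER_SECOND = 10_000_000


def seconds_to_htk(t: float) -> int:
    """Convert a time in seconds to HTK units (100 ns), rounded to an integer."""
    return round(t * HTK_UNITS_PER_SECOND)


def format_htk_label(start_s: float, end_s: float, phoneme: str) -> str:
    """Build one 'start end name' line in the style of an HTK label file."""
    return f"{seconds_to_htk(start_s)} {seconds_to_htk(end_s)} {phoneme}"


# Hypothetical segments (start, end, label); not real KAPD data.
segments = [(0.0, 0.125, "b"), (0.125, 0.3, "a")]
lines = [format_htk_label(*seg) for seg in segments]
print("\n".join(lines))
```

Storing integer times avoids accumulating floating-point drift when adjacent segments share a boundary, which may be one reason a corpus would adopt this convention.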