Low-Resource Language Identification From Speech Using Transfer Learning

Kexin Feng, Theodora Chaspari

2019

Identification from low-resource data is a traditionally difficult machine learning problem, since the scarcity of available data prevents classifiers from being adequately trained. An effective way to address the inevitable data sparsity in certain applications, such as low-resource spoken language identification, is transfer learning, which applies knowledge learned from tasks with large amounts of labeled data to settings with limited data. Motivated by the fact that various languages share common phonetic and phonotactic characteristics, we explore transfer learning systems that employ various neural network architectures. We leverage readily available large datasets to create robust instantiations of language identification models using feed-forward neural networks. These models are then fine-tuned on the low-resource data from a target domain to improve system performance. We apply the proposed approach to the automatic identification of African languages, a challenging task because little data is available for such languages. We conduct our experiments using two publicly available datasets: the VoxForge corpus, which contains 7 Indo-European languages, as source data, and the Lwazi corpus, which includes 11 African languages, as target data. Our results indicate the effectiveness of transfer learning for the identification of low-resource languages from speech signals.
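
The pretrain-then-fine-tune idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the random Gaussian features stand in for real acoustic features, and the single hidden layer, layer sizes, learning rate, epoch counts, and the choice to re-initialize the output layer for the target languages are all assumptions made for illustration. Only the class counts (7 source languages, 11 target languages) come from the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, HIDDEN = 20, 32  # assumed feature and hidden sizes (not from the paper)

def init_layer(n_in, n_out):
    """Small random weights and zero biases for one dense layer."""
    return rng.normal(0.0, 0.1, (n_in, n_out)), np.zeros(n_out)

def make_data(n_classes, n_per_class):
    """Synthetic stand-in for per-utterance acoustic features (assumption)."""
    centers = rng.normal(0.0, 2.0, (n_classes, DIM))
    y = np.repeat(np.arange(n_classes), n_per_class)
    x = centers[y] + rng.normal(0.0, 1.0, (len(y), DIM))
    return x, y

def forward(x, params):
    """One ReLU hidden layer followed by a softmax output layer."""
    (w1, b1), (w2, b2) = params
    h = np.maximum(0.0, x @ w1 + b1)
    logits = h @ w2 + b2
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=1, keepdims=True)
    return h, probs

def train(x, y, params, epochs=300, lr=0.05):
    """Full-batch gradient descent on the cross-entropy loss."""
    (w1, b1), (w2, b2) = params
    n = len(x)
    for _ in range(epochs):
        h, probs = forward(x, [(w1, b1), (w2, b2)])
        grad_logits = probs.copy()
        grad_logits[np.arange(n), y] -= 1.0
        grad_logits /= n
        grad_h = grad_logits @ w2.T
        grad_h[h <= 0.0] = 0.0  # ReLU gradient
        w2 = w2 - lr * (h.T @ grad_logits)
        b2 = b2 - lr * grad_logits.sum(axis=0)
        w1 = w1 - lr * (x.T @ grad_h)
        b1 = b1 - lr * grad_h.sum(axis=0)
    return [(w1, b1), (w2, b2)]

# 1) Pretrain on the large source set (7 classes, as in VoxForge).
x_src, y_src = make_data(n_classes=7, n_per_class=100)
source_params = train(x_src, y_src, [init_layer(DIM, HIDDEN), init_layer(HIDDEN, 7)])

# 2) Transfer: keep the pretrained hidden layer, attach a fresh 11-way
#    output layer (as in Lwazi), then fine-tune on the small target set.
x_tgt, y_tgt = make_data(n_classes=11, n_per_class=5)
target_params = train(x_tgt, y_tgt, [source_params[0], init_layer(HIDDEN, 11)])

_, probs_tgt = forward(x_tgt, target_params)
target_accuracy = float((probs_tgt.argmax(axis=1) == y_tgt).mean())
print(f"target training accuracy: {target_accuracy:.2f}")
```

In this sketch the whole network is updated during fine-tuning. Freezing the transferred hidden layer and training only the new output layer is a common alternative when the target set is very small; the abstract does not say which variant was used.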
