Analysis of sequence-to-sequence neural networks on the grapheme-to-phoneme conversion task

2016 
In this paper, we analyze the performance of various sequence-to-sequence neural networks on the task of grapheme-to-phoneme (G2P) conversion. G2P is an important component of applications such as text-to-speech and automatic speech recognition. Because the number of graphemes in a word generally differs from the number of its phonemes, the two sequences are traditionally aligned first and then mapped. With the recent advent of sequence-to-sequence neural networks, the alignment step can be skipped, allowing the input sequence to be mapped directly to the output sequence. Although sequence-to-sequence networks have only recently been applied to this task, some questions about their architecture still need to be addressed. We show in this paper that complex recurrent neural network units (such as long short-term memory cells) may not be required to achieve good performance on this task. Instead, simple recurrent neural networks (RNNs) suffice. We also show that the encoder can be a unidirectional RNN rather than the usually preferred bidirectional RNN. Further, our experiments reveal that encoder-decoder models with soft alignment outperform their fixed-context-vector counterparts. The results demonstrate that, with very few parameters, we can achieve performance comparable to that of much more complicated architectures.
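
To make the compared components concrete, here is a minimal, untrained sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the paper's dimensions, training setup, and attention scoring function are not given in this abstract, so this sketch swaps in simple dot-product scoring. It shows a simple Elman-style RNN cell (instead of an LSTM), a unidirectional encoder over grapheme indices, a soft-alignment (attention) context, and a `fixed_context` helper standing in for the fixed-vector alternative. All names (`SimpleRNN`, `encode`, `soft_attention`, `fixed_context`, `decoder_step`) and the toy sizes are hypothetical.

```python
import math
import random


def matvec(W, x):
    # Multiply matrix W (list of rows) by vector x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def vadd(a, b):
    return [x + y for x, y in zip(a, b)]


def rand_matrix(rows, cols, rng, scale=0.1):
    # Small random weights; an untrained toy model (assumption).
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(index, size):
    v = [0.0] * size
    v[index] = 1.0
    return v


class SimpleRNN:
    """Simple (Elman) RNN cell: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""

    def __init__(self, in_dim, hid_dim, rng):
        self.W_x = rand_matrix(hid_dim, in_dim, rng)
        self.W_h = rand_matrix(hid_dim, hid_dim, rng)
        self.b = [0.0] * hid_dim
        self.hid_dim = hid_dim

    def step(self, x, h):
        pre = vadd(vadd(matvec(self.W_x, x), matvec(self.W_h, h)), self.b)
        return [math.tanh(v) for v in pre]


def encode(rnn, grapheme_ids, vocab_size):
    """Unidirectional encoder: one left-to-right pass, keeping every hidden state."""
    h = [0.0] * rnn.hid_dim
    states = []
    for gid in grapheme_ids:
        h = rnn.step(one_hot(gid, vocab_size), h)
        states.append(h)
    return states


def soft_attention(decoder_state, encoder_states):
    """Soft alignment: dot-product scores -> softmax weights -> weighted sum of states."""
    scores = [sum(d * e for d, e in zip(decoder_state, enc)) for enc in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * enc[k] for w, enc in zip(weights, encoder_states)) for k in range(dim)]
    return weights, context


def fixed_context(encoder_states):
    """Fixed-vector alternative: summarize the whole word by the final encoder state."""
    return encoder_states[-1]


def decoder_step(dec_rnn, W_out, prev_phoneme_id, n_phonemes, h, encoder_states):
    # Attend with the current decoder state, then feed [prev phoneme one-hot; context].
    weights, context = soft_attention(h, encoder_states)
    x = one_hot(prev_phoneme_id, n_phonemes) + context  # list concatenation
    h_new = dec_rnn.step(x, h)
    probs = softmax(matvec(W_out, h_new))  # distribution over the next phoneme
    return probs, h_new, weights


# Toy usage: random weights, so the output distribution is meaningless.
rng = random.Random(0)
graphemes = "abcdefghijklmnopqrstuvwxyz"
HID = 8
N_PHONEMES = 5  # hypothetical phoneme inventory; index 0 used as a start symbol
enc_rnn = SimpleRNN(len(graphemes), HID, rng)
dec_rnn = SimpleRNN(N_PHONEMES + HID, HID, rng)
W_out = rand_matrix(N_PHONEMES, HID, rng)

states = encode(enc_rnn, [graphemes.index(c) for c in "cat"], len(graphemes))
h0 = fixed_context(states)  # initialize the decoder from the last encoder state
probs, h1, weights = decoder_step(dec_rnn, W_out, 0, N_PHONEMES, h0, states)
print(len(states), [round(w, 3) for w in weights])
```

In a fixed-context model the decoder would condition only on `h0` for the whole output sequence. With soft alignment, `decoder_step` recomputes a weight for every grapheme position at each output step, which is how the model can learn the grapheme-phoneme correspondence without a separate alignment stage.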