Human Voice Emotion Identification Using Prosodic and Spectral Feature Extraction Based on Deep Neural Networks

2019 
It is well known that, at the perceptual level, the human voice carries multimodal information, so a modality can be shared across neural emotion networks through independent stimulus processes. The expression and identification of emotions are significant steps in human communication. Because this process is continuously and biologically adaptive, voice-based identification is useful for classifying and distinguishing the specific characteristics of different emotions. In this paper, we propose to distinguish six basic human voice emotions using prosodic and spectral feature extraction together with Deep Neural Networks (DNNs). Our experiments achieved an accuracy of 78.83%, which suggests that sound samples with higher emotional intensity tend to yield correspondingly higher classification accuracy. Gender identification was also carried out, with an accuracy of approximately 90%. Furthermore, with an 80:20 training-testing split, the learning process achieved 100% accuracy.