Speaker Identification of Whispering Sound: Effectiveness of Timbre Audio Descriptors

2019 
Identifying a person from a whispered voice is a challenging task, because the speech attributes of the same speaker vary considerably between whispered and neutral modes. The success of a speaker identification system relies on the selection of good audio features, and this paper focuses mainly on feature selection for the task. Hundreds of audio features are available for sound description, but their performance depends on the type of database. The motivation of this paper is to investigate the suitability of timbre features for a whispered-speech database. Timbre features are chosen for their perceptual and multidimensional nature. However, not all features contribute to maximum speaker identification accuracy. Hence, a careful selection of a limited number of audio descriptors from the large available set is essential to increase speaker identification accuracy with low processing time. The Hybrid Selection method is used to select the well-performing audio descriptors from all descriptors available in MPEG-7. Five timbre features, namely roll-off, roughness, brightness, irregularity, and MFCC, are found to perform best on the database used. The results of the traditional MFCC feature are compared with those of the timbre features, where the latter reported an absolute accuracy of 10.4. The database consists of about 480 utterances in neutral and whispered speech modes. A K-NN classifier with three nearest neighbours and Euclidean distance is used.
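The classification step named above (K-NN with three nearest neighbours and Euclidean distance) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `knn_predict`, the two-dimensional toy feature vectors, and the speaker labels are all hypothetical, and a real system would use the selected timbre descriptors as features.

```python
from collections import Counter

import numpy as np


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training vectors nearest to query.

    Distances are Euclidean; k=3 matches the setting stated in the abstract.
    """
    train_X = np.asarray(train_X, dtype=float)
    query = np.asarray(query, dtype=float)
    # Euclidean distance from the query to every training vector.
    dists = np.linalg.norm(train_X - query, axis=1)
    # Indices of the k smallest distances (stable sort keeps ties in input order).
    nearest = np.argsort(dists, kind="stable")[:k]
    # Majority vote over the labels of the nearest neighbours.
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two speakers, three feature vectors each.
train_X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
           [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
train_y = ["spk_a", "spk_a", "spk_a", "spk_b", "spk_b", "spk_b"]

predicted = knn_predict(train_X, train_y, [0.05, 0.1])
```

With an odd `k` and two speakers a tie cannot occur; with more speakers, this sketch breaks ties in favour of the label encountered first among the nearest neighbours, which is an arbitrary choice.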