With advances in machine learning techniques, several speech-related applications deploy end-to-end models to learn relevant features directly from the raw speech signal. In this work, we focus on the speech rate estimation task, using an end-to-end model to learn representations from raw speech in a data-driven manner. The proposed model comprises a 1-d convolutional layer that extracts representations from raw speech and a convolutional dense neural network (CDNN) that predicts the speech rate from these representations. The primary aim of this work is to understand the nature of the representations learned by the end-to-end model for the speech rate estimation task. Experiments are performed on the TIMIT corpus, in both seen and unseen subject conditions. Experimental results reveal that the frequency responses of the learned 1-d CNN filters are low-pass in nature, and that the center frequencies of the majority of the filters lie below 1000 Hz. Comparing the performance of the proposed end-to-end system with the baseline MFCC-based approach, we find that the features learned by the CNN perform on par with MFCCs.
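A minimal sketch of such an end-to-end speech rate estimator is shown below, assuming PyTorch. The layer sizes, kernel width, stride, and pooling choices are illustrative assumptions, not values reported in the paper; the sketch only conveys the overall structure (a learned 1-d filterbank on raw speech followed by a CDNN regressor) and how the learned filters' frequency responses could be inspected.

```python
# Illustrative sketch (not the authors' code): 1-d conv front end + CDNN regressor.
import torch
import torch.nn as nn

class SpeechRateEstimator(nn.Module):
    def __init__(self, n_filters=40, kernel_size=400, stride=160):  # assumed sizes
        super().__init__()
        # 1-d convolution applied directly to the raw waveform; each learned
        # kernel acts as an FIR filter whose frequency response can be inspected.
        self.feature_extractor = nn.Conv1d(1, n_filters, kernel_size, stride=stride)
        # Convolutional dense neural network (CDNN) that regresses a single
        # speech-rate value from the learned representation.
        self.cdnn = nn.Sequential(
            nn.Conv1d(n_filters, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
            nn.Flatten(),
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32, 1),  # predicted speech rate (e.g., syllables per second)
        )

    def forward(self, waveform):
        # waveform: (batch, 1, num_samples) raw speech
        feats = torch.relu(self.feature_extractor(waveform))
        return self.cdnn(feats)

# Analyzing the learned filters: the FFT magnitude of each kernel gives its
# frequency response, from which a center frequency can be read off.
model = SpeechRateEstimator()
kernels = model.feature_extractor.weight.detach().squeeze(1)  # (n_filters, kernel_size)
freq_response = torch.fft.rfft(kernels, n=1024).abs()         # (n_filters, 513)
```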
Facial expression plays an important role in face recognition systems and in image processing techniques for human-machine interfaces. A human facial expression is one or more motions or positions of the muscles beneath the skin of the face. Facial expressions are a form of nonverbal communication; they are a primary means of conveying social information between humans, but they also occur in most other vertebrates and in some other animal species. Humans can adopt a facial expression voluntarily or involuntarily, and the neural mechanisms responsible for controlling the expression differ in each case. Facial emotions play a significant role in recognizing the intentions of others. This paper presents a detailed study of the steps involved in recognizing and extracting facial emotion from different media. Finally, a system is proposed that extracts facial information from both greyscale and RGB images using the Roberts and Sobel edge-detection operators. A Convolutional Neural Network (CNN) model is trained on a dataset and used to predict labels for the test images, which can be verified from the results.
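The edge-extraction step can be illustrated with a short sketch, assuming NumPy and SciPy. The kernels below are the standard Roberts cross and Sobel operators; the greyscale conversion weights, the image path, and the idea of feeding the edge maps to a CNN classifier are illustrative assumptions rather than details taken from the paper.

```python
# Illustrative sketch: Roberts and Sobel edge maps from greyscale or RGB input.
import numpy as np
from scipy.ndimage import convolve

def to_greyscale(image):
    """Convert an RGB image (H, W, 3) to greyscale; pass greyscale through unchanged."""
    if image.ndim == 3:
        return image @ np.array([0.299, 0.587, 0.114])
    return image.astype(float)

def roberts_edges(grey):
    """Roberts cross operator: 2x2 diagonal difference kernels."""
    gx = convolve(grey, np.array([[1.0, 0.0], [0.0, -1.0]]))
    gy = convolve(grey, np.array([[0.0, 1.0], [-1.0, 0.0]]))
    return np.hypot(gx, gy)

def sobel_edges(grey):
    """Sobel operator: 3x3 smoothed derivative kernels."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    gx = convolve(grey, kx)      # horizontal gradient
    gy = convolve(grey, kx.T)    # vertical gradient
    return np.hypot(gx, gy)

# Usage (placeholder path): the edge maps would then be fed to a CNN classifier.
# image = plt.imread("face.png")
# edges = sobel_edges(to_greyscale(image))
```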
Biometrics, broadly speaking, is the use of parts of the body as a means of securing transactions of all kinds. Numerous algorithms have emerged to facilitate this task, and facial recognition is one such modality that is widely used in the field of biometrics. Face recognition involves solving a wide variety of problems, such as variations in pose, illumination, and expression. Facial expression provides an important cue for studying the social and mental behaviour of humans, and facial recognition has recently become a challenging research area; its applications include human emotion analysis and human-computer interfaces. In this work we present a comparative experiment on facial expression recognition using different dimensionality reduction techniques and classifier methods. Fisher Linear Discriminant Analysis (FLDA) and Modular FLDA are used for feature extraction, and the feature vector of each test image is compared with those of the training images. In this experiment we compared 17 distance measures, and their modifications, between feature vectors with respect to the recognition rates. The experimental results reveal that Modular FLDA produces the best recognition rate.
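A minimal sketch of FLDA-based feature extraction followed by distance-based matching is given below, assuming scikit-learn and SciPy. Only three of the distance measures compared in the study are shown, and the train/test arrangement is a placeholder; the modular (block-wise) variant of FLDA is not reproduced here.

```python
# Illustrative sketch: FLDA projection + nearest-neighbour matching under
# different distance measures, scored by recognition rate.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from scipy.spatial.distance import euclidean, cityblock, cosine

def flda_features(X_train, y_train, X_test):
    """Project flattened face images onto the Fisher discriminant axes."""
    flda = LinearDiscriminantAnalysis()
    train_feats = flda.fit_transform(X_train, y_train)
    test_feats = flda.transform(X_test)
    return train_feats, test_feats

def classify_by_distance(train_feats, y_train, test_feats, metric=euclidean):
    """Assign each test feature vector the label of its nearest training vector."""
    preds = []
    for t in test_feats:
        dists = [metric(t, tr) for tr in train_feats]
        preds.append(y_train[int(np.argmin(dists))])
    return np.array(preds)

# Usage: recognition rate for each distance measure (subset of the 17 compared).
# train_feats, test_feats = flda_features(X_train, y_train, X_test)
# for name, metric in [("euclidean", euclidean), ("cityblock", cityblock), ("cosine", cosine)]:
#     preds = classify_by_distance(train_feats, y_train, test_feats, metric)
#     print(name, np.mean(preds == y_test))
```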