    An End-to-End Text-Independent Speaker Identification System on Short Utterances
    17 citations, 0 references, 10 related papers
    Keywords: End-to-end principle, Identification, Speaker identification
    Each speaker's voice is unique, so voice-based speaker identification is possible in principle, but no reliable method exists yet. The reason is that a single speaker's voice can vary considerably, while different voices can sometimes sound quite similar. This research aims to determine which features matter most to humans for speaker identification, as well as which features an automated speaker recognition system relies on.
    Speaker diarisation
    Speaker identification
    Identification
    Voice analysis
    The rise of deep learning techniques has accelerated progress in speaker recognition, and the growth of personalized devices has increased the importance of target speaker recognition (TSR). In particular, the target speaker must be recognized correctly even when several speakers talk at the same time. This paper proposes and evaluates two TSR methods in multi-speaker environments: (1) TSE is performed on the input voice before speaker recognition; (2) the results from (1) are fused with those of speaker recognition. The fusion-based method gave the better results, with a relative performance improvement of at least 11% over an ordinary speaker recognition system.
    Speaker diarisation
    Speaker-adaptive systems are currently the state of the art in automatic speech recognition. A general baseline model is adapted to the current speaker during recognition in order to improve the quality of the results. However, the adaptation procedure needs to be able to distinguish between data from different speakers. Therefore, in a general speaker-adaptive recognizer, speaker recognition has to be performed implicitly. The resulting information about the identity of the person speaking can be of great importance in many applications of speech recognition, e.g. in man-machine communication. We therefore propose an integrated framework for speech and speaker recognition. Our system is able to detect new speakers and to identify already known ones. For a new speaker, both an identification model and an adapted recognition model are learned from limited data. The latter is then used to recognize utterances attributed to this speaker. We present evaluation results on speaker identification performance for two non-trivial speech recognition tasks that demonstrate the effectiveness of our integrated approach.
    Speaker diarisation
    Identification
    Citations (6)
    In this case report a number of problems with speaker identification by lay persons are discussed. It is argued that specific circumstances of the identification procedure(s) and factors such as subjects' emotional states, personal attitudes towards the speaker in question, short-term changes in the speaker's vocal behaviour and the semantic content and type of speech material used may heavily bias subjects' judgements. On the other hand, a 'conventional' forensic voice comparison carried out by professionals provided clear-cut results even under severe time constraints. It is argued that speaker identification by lay persons, if indispensable in a case, should be carried out by means of a formalized, classical voice line-up experiment.
    Speaker identification
    Identification
    Voice analysis
    Citations (0)
    This paper proposes a new method for text-dependent speaker recognition. The scheme is based on learning what we refer to as speaker-specific compensators for each speaker in the system. The compensator is essentially a speaker-to-speaker transformation that enables the speech of one speaker to be recognized by a speaker-dependent speech recognition system built for another. Such a transformation, adequate for our purposes, may be achieved by a simple vector addition in the cepstral domain. This speaker-specific compensator captures the characteristics of the speaker we wish to recognize. For each speaker who is registered in the system, we learn a unique set of compensators. The speaker recognition decision is then based on which compensator achieves the best speech recognition scores.
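The compensator idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the compensator is assumed to be a single mean cepstral offset per speaker, and the "recognition score" is a stand-in (negative mean squared distance to the reference speaker's frames) rather than the output of a real speaker-dependent recognizer. Frames are assumed to be time-aligned, which is plausible only because the task is text-dependent.

```python
import numpy as np

def learn_compensator(speaker_cepstra, reference_cepstra):
    """Additive cepstral offset that maps a speaker onto the reference speaker.

    Assumption: one mean-difference vector per speaker is enough.
    """
    return reference_cepstra.mean(axis=0) - speaker_cepstra.mean(axis=0)

def recognition_score(cepstra, reference_cepstra):
    """Stand-in score (assumption): negative mean squared distance to the
    reference frames. A real system would use the recognizer's own score."""
    return -float(np.mean((cepstra - reference_cepstra) ** 2))

def identify(test_cepstra, compensators, reference_cepstra):
    """Apply every enrolled compensator and return the best-scoring speaker."""
    scores = {
        name: recognition_score(test_cepstra + offset, reference_cepstra)
        for name, offset in compensators.items()
    }
    return max(scores, key=scores.get)

# Synthetic demo data: 50 aligned frames of 13 cepstral coefficients.
rng = np.random.default_rng(0)
reference = rng.normal(size=(50, 13))

def noise():
    return 0.1 * rng.normal(size=reference.shape)

# Each enrolled speaker differs from the reference by a constant offset.
enrollment = {
    "alice": reference - 1.0 + noise(),
    "bob": reference + 1.5 + noise(),
}
compensators = {
    name: learn_compensator(cepstra, reference)
    for name, cepstra in enrollment.items()
}

# A new utterance of the same phrase spoken by "bob".
test = reference + 1.5 + noise()
print(identify(test, compensators, reference))
```

The decision rule mirrors the abstract: the speaker whose compensator brings the test utterance closest to what the reference recognizer expects is chosen.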
    Speaker diarisation
    Mel-frequency cepstrum
    Voice recognition is the identification of a speaker on the basis of voice characteristics. To this end, features of speech patterns that differ between individuals are used. This paper discusses speaker recognition systems. Implementing a speaker voice recognition system in MATLAB makes it possible to use voice in real-life applications. The paper also provides a brief review of different DSP-based techniques applied to speech recognition.
    Identification
    Citations (3)
    Automatic speaker recognition is the identification of a speaker from his or her speech. Speaker (voice) recognition is a biometric technique that recognizes people by their voices. Most current speaker recognition systems rely on spectral information derived from speech signal segments 10-30 ms in length. However, if the received speech signal contains noise, the output of such a cepstral-based system suffers. The primary goal of the study is to examine the factors that improve the performance of speaker recognition systems by modeling prosodic features, as well as the phases of a speaker recognition system. The analysis focuses on a text-independent speaker recognition system in the presence of background noise.
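As a rough illustration of what "spectral information derived from segments of 10-30 ms" means in practice, the sketch below splits a signal into short overlapping frames and computes a magnitude spectrum for each. The 25 ms frame, 10 ms hop, Hamming window, and 16 kHz sample rate are assumptions chosen for the example, not values taken from the paper, and `frame_spectra` is a hypothetical helper name.

```python
import numpy as np

def frame_spectra(signal, sample_rate, frame_ms=25.0, hop_ms=10.0):
    """Short-time magnitude spectra, one row per frame.

    Assumes the signal is at least one frame long.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000))
    hop = int(round(sample_rate * hop_ms / 1000))
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] for i in range(n_frames)]
    )
    # The real-input FFT keeps only the non-negative frequency bins.
    return np.abs(np.fft.rfft(frames * window, axis=1))

# Demo: one second of a 440 Hz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)

spectra = frame_spectra(tone, sample_rate)
peak_hz = np.argmax(spectra[0]) * sample_rate / 400  # 400 samples per frame
print(spectra.shape, peak_hz)
```

Cepstral features such as MFCCs are typically computed from per-frame spectra like these, which is why noise in the signal propagates directly into the features.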
    Speaker diarisation
    Mel-frequency cepstrum
    SIGNAL (programming language)
    Cepstrum
    Citations (0)