Multi-view (Joint) probability linear discrimination analysis for J-vector based text dependent speaker verification

2017 
The j-vector has proven very effective for text-dependent speaker verification with short-duration speech. However, current back-end classifiers cannot make full use of such deep features. In this paper, we propose a method to model the multi-faceted information in the j-vector, such as speaker identity and text content, explicitly and jointly. In our approach, the j-vector is modeled as the output of a generative multi-view (joint) Probability Linear Discriminant Analysis (PLDA) model, which contains multiple kinds of latent variables. The usual PLDA model considers only a single label. In practice, however, when a multi-task learned network is used as the feature extractor, the extracted features are always associated with several labels. A feature of this type, such as the j-vector, is called a multi-view deep feature. With multi-view (joint) PLDA, we can explicitly build a model that combines the multiple heterogeneous kinds of information in the j-vectors. In the verification step, we compute a likelihood that describes whether or not two j-vectors have consistent labels, and this likelihood is then used for decision-making. Experiments were conducted on large-scale corpora in different languages. On the public RSR2015 corpus, our approach achieves 0.02% EER and 0.09% EER for the impostor-wrong and impostor-correct cases, respectively.
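The abstract does not state the model equations, so the following is only a minimal sketch of how such a verification likelihood could be computed, under an assumed linear-Gaussian form x = mu + S u + T v + eps, with a speaker latent variable u, a text latent variable v, and residual noise eps. The names `mu`, `S`, `T`, `Sigma`, `pair_loglik`, and `verification_score` are hypothetical, and the equal weighting of the three non-target hypotheses is an assumption made here for illustration, not a detail taken from the paper.

```python
import numpy as np

def pair_loglik(x1, x2, mu, S, T, Sigma, same_spk, same_txt):
    """Log-likelihood of a j-vector pair under an ASSUMED two-factor model
    x = mu + S u + T v + eps, with u, v ~ N(0, I) and eps ~ N(0, Sigma).
    same_spk / same_txt (0 or 1) say whether the pair shares u / v."""
    d = x1.shape[0]
    B_spk = S @ S.T                      # speaker-related covariance
    B_txt = T @ T.T                      # text-related covariance
    total = B_spk + B_txt + Sigma        # marginal covariance of one vector
    cross = same_spk * B_spk + same_txt * B_txt  # covariance between the two
    cov = np.block([[total, cross], [cross, total]])
    z = np.concatenate([x1 - mu, x2 - mu])
    _, logdet = np.linalg.slogdet(cov)
    quad = z @ np.linalg.solve(cov, z)
    return -0.5 * (2 * d * np.log(2 * np.pi) + logdet + quad)

def verification_score(x1, x2, mu, S, T, Sigma):
    """Log-likelihood ratio: both labels consistent vs. any label mismatch.
    The three mismatch hypotheses are weighted equally (an assumption)."""
    target = pair_loglik(x1, x2, mu, S, T, Sigma, 1, 1)
    mismatches = np.array([
        pair_loglik(x1, x2, mu, S, T, Sigma, a, b)
        for a, b in [(1, 0), (0, 1), (0, 0)]
    ])
    nontarget = np.logaddexp.reduce(mismatches) - np.log(len(mismatches))
    return target - nontarget
```

A trial would then be accepted when `verification_score` exceeds a threshold tuned on development data. The parameters themselves would have to be estimated from labeled j-vectors, which this sketch does not cover.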