Reduction of Speech Data Posteriorgrams by Compressing Maximum-likelihood State Sequences in Query by Example

2020 
Spoken-term detection (STD) has recently attracted increased interest in speech-based retrieval research. STD is the task of finding sections in speech data that match a query consisting of one or more words. Query by example (QbE) using spoken queries is another important research topic in STD. Although the use of posteriorgrams (sequences of output probabilities generated by deep neural networks from speech data) is a promising approach for QbE, that method results in long retrieval times and excessive memory usage. We previously proposed a method that replaces posteriorgrams for spoken queries with sequences of state numbers for triphone hidden Markov models, omitting calculations of local distance. While that method greatly reduced retrieval times, it still required large amounts of memory to store speech data posteriorgrams. We therefore newly propose a method for reducing memory usage and retrieval times by compressing speech data posteriorgrams into sets of posterior probability vectors for each utterance, each speech document, or all speech data, rather than storing all posterior probability vectors for each frame of speech data. Evaluation experiments conducted using open test collections for the “SpokenDoc” tasks of the NTCIR-10 and NTCIR-12 workshops demonstrate that the proposed method reduces memory usage.
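The abstract does not detail how the per-frame posteriorgram is reduced to a small set of posterior probability vectors, but the general idea can be sketched as codebook-style compression: cluster the frame-level posterior vectors of an utterance and store only the resulting representative vectors plus a per-frame index. The sketch below uses plain k-means as the clustering step; the function name, parameters, and the choice of k-means are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np

def compress_posteriorgram(frames, n_codewords=64, n_iter=10, seed=0):
    """Compress a (T x D) posteriorgram into a small codebook of
    representative posterior vectors plus per-frame codeword indices.
    Illustrative sketch using plain k-means, not the paper's method."""
    rng = np.random.default_rng(seed)
    T, D = frames.shape
    k = min(n_codewords, T)
    # Initialise centroids from randomly chosen frames.
    centroids = frames[rng.choice(T, size=k, replace=False)].copy()
    assign = np.zeros(T, dtype=np.int64)
    for _ in range(n_iter):
        # Assign each frame to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(frames[:, None, :] - centroids[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned frames.
        for j in range(k):
            members = frames[assign == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids, assign

# Toy usage: 1000 frames of 32-dim posteriors compressed to 64 vectors.
rng = np.random.default_rng(1)
post = rng.dirichlet(np.ones(32), size=1000)  # rows sum to 1, like posteriors
codebook, idx = compress_posteriorgram(post, n_codewords=64)
print(codebook.shape, idx.shape)  # (64, 32) (1000,)
```

Storing the 64 x 32 codebook plus 1000 small integer indices takes far less memory than the original 1000 x 32 float matrix, which mirrors the memory-reduction goal stated in the abstract; matching against a query then only needs local distances to the codebook vectors rather than to every frame.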