The Psychometrics of Automatic Speech Recognition

2021 
Automatic speech recognition (ASR) software has been suggested as a candidate model of the human auditory system thanks to recent dramatic improvements in performance. To test this hypothesis, we compared the results of several state-of-the-art ASR systems with those of humans on a battery of standard psychometric experiments. While some systems showed qualitative agreement with humans in certain tests, in others all tested systems diverged markedly from humans. In particular, all systems used spectral invariance, temporal fine structure and speech periodicity differently from humans. We conclude that none of the tested ASR systems can yet act as a strong proxy for human speech recognition. However, we note that the more recent, better-performing systems also tend to match human results more closely, suggesting that continued cross-fertilisation of ideas between human and automatic speech recognition may be fruitful. Our open-source toolbox allows researchers to assess future ASR systems or add further psychoacoustic measures.