A High Accuracy Multiple-Command Speech Recognition ASIC Based on Configurable One-Dimension Convolutional Neural Network

2021 
Speech command interaction has drawn much attention in the smart application market. Many previous chips achieve ultra-low power consumption at the cost of some accuracy loss, and they are designed only for fixed speech command recognition tasks, which makes them inflexible and restrains further development. Here, we demonstrate a configurable speech command recognition ASIC with ultra-high accuracy, fabricated in TSMC commercial 180-nm CMOS technology. In this chip, Mel-Frequency Cepstrum Coefficients (MFCCs) are used as speech features and a One-Dimension Convolutional Neural Network (1-D CNN) is adopted for speech feature recognition, which simplifies both the network design and the memory storage scheme. Moreover, the configurable 1-D CNN layer of the network ensures the diversity and flexibility of the commands. The measurement results indicate that the chip achieves 95.6% accuracy on the Google Speech Command Database (GSCD) when working at 16 MHz, while keeping a reasonable power consumption of 26.4 mW. Moreover, the chip supports a maximum of 30 speech commands at a time, which is better than state-of-the-art chips.
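To illustrate the idea of running a configurable 1-D CNN layer over MFCC frames, the following is a minimal software sketch, not the chip's actual datapath. The abstract does not give layer counts, kernel widths, pooling, or number formats, so everything here is an assumption made for illustration: the function names (`make_layer`, `conv1d_relu`, `classify`), the use of one kernel per command, the average pooling over time, and the random placeholder weights. Only the limit of 30 commands comes from the text.

```python
import random

MAX_COMMANDS = 30  # upper limit on simultaneous commands stated in the abstract


def make_layer(num_commands, in_channels, width, seed=0):
    """Build placeholder weights with one kernel per command (an assumption)."""
    if not 1 <= num_commands <= MAX_COMMANDS:
        raise ValueError("num_commands must be between 1 and 30")
    rng = random.Random(seed)
    # kernels[k][w][c]: kernel k, time offset w, input feature channel c
    kernels = [[[rng.uniform(-1.0, 1.0) for _ in range(in_channels)]
                for _ in range(width)]
               for _ in range(num_commands)]
    biases = [0.0] * num_commands
    return kernels, biases


def conv1d_relu(frames, kernels, biases):
    """Valid 1-D convolution along the time axis, followed by ReLU.

    frames: list of T frames, each a list of C features (e.g. MFCCs).
    Returns T - width + 1 output steps, each with one value per kernel.
    """
    width = len(kernels[0])
    channels = len(frames[0])
    out = []
    for t in range(len(frames) - width + 1):
        row = []
        for k, kernel in enumerate(kernels):
            acc = biases[k]
            for w in range(width):
                for c in range(channels):
                    acc += kernel[w][c] * frames[t + w][c]
            row.append(max(0.0, acc))
        out.append(row)
    return out


def classify(frames, kernels, biases):
    """Average each kernel's response over time and return the best index."""
    feature_map = conv1d_relu(frames, kernels, biases)
    steps = len(feature_map)
    scores = [sum(row[k] for row in feature_map) / steps
              for k in range(len(kernels))]
    return max(range(len(scores)), key=scores.__getitem__)
```

In this sketch, changing the command set only means calling `make_layer` with a different `num_commands` and loading different weights; the convolution loop itself is unchanged, which is the kind of flexibility the configurable layer is meant to provide. A call such as `make_layer(10, 13, 3)` would assume 13 MFCCs per frame and a kernel spanning 3 frames, both of which are hypothetical values.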