Interpretation of convolutional neural networks for speech spectrogram regression from intracranial recordings

2019 
Abstract: The direct synthesis of continuously spoken speech from neural activity could provide a fast and natural means of communication for people with speech disorders. Mapping the complex dynamics of neural activity to spectral representations of speech is a demanding task for regression models. Convolutional neural networks have recently shown promise for finding patterns in neural signals and may be good candidates for this regression task. However, the internal workings of the resulting networks are difficult to interpret and thus offer little opportunity to gain insight into the neural processes underlying speech. While activation maximization can give a glimpse into what a network has learned for a classification task, it usually offers little benefit for regression problems. Here, we show that convolutional neural networks can be used to reconstruct an audible waveform from invasively measured brain activity. By adapting activation maximization, we present a method that can provide insights from neural networks targeting regression problems. On experimental data, we achieve statistically significant correlations between spectrograms of synthesized and original speech. Our interpretation approach shows that, in the trained models, electrodes placed in cortical regions associated with speech production have a large impact on the reconstruction of speech segments.
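The abstract reports correlations between spectrograms of synthesized and original speech but does not spell out the computation. A minimal sketch of one common way to score such a reconstruction follows, assuming a Pearson correlation computed per spectral bin over time and then averaged across bins. The function names, the toy data, and the per-bin averaging are illustrative assumptions, not the authors' stated procedure, and the significance test is not covered.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant signal
    return cov / math.sqrt(var_x * var_y)


def mean_bin_correlation(original, reconstructed):
    """Average the per-bin Pearson correlation of two spectrograms.

    Each spectrogram is a list of time frames; each frame is a list of
    spectral bin values (hypothetical layout chosen for this sketch).
    """
    n_bins = len(original[0])
    scores = []
    for k in range(n_bins):
        orig_bin = [frame[k] for frame in original]
        rec_bin = [frame[k] for frame in reconstructed]
        scores.append(pearson(orig_bin, rec_bin))
    return sum(scores) / len(scores)


# Toy data: 3 time frames, 2 spectral bins. The reconstruction is the
# original plus a constant offset per bin, so each bin correlates perfectly.
original = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]]
reconstructed = [[0.5, 2.0], [1.5, 4.0], [2.5, 6.0]]
score = mean_bin_correlation(original, reconstructed)
```

Because Pearson correlation ignores offset and scale, a score like this reflects how well the temporal shape of each bin is recovered rather than its absolute energy.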