Compact and interpretable architecture for speech decoding from stereotactic EEG

2021 
Background: Brain-computer interfaces (BCIs) decode neural activity and extract from it information that can be meaningfully interpreted. One of the most intriguing opportunities is to employ BCIs for decoding speech, a uniquely human trait, which opens up numerous applications, from patient rehabilitation to direct and seamless communication between humans. Complex deep neural networks have had only limited success in deciphering the neuronal code. In such solutions, a questionable performance gain is achieved with uninterpretable decision rules characterised by thousands of parameters that must be identified from a limited amount of training data. Our recent experience shows that, when applied to neural activity data, compact neural networks with trainable and physiologically meaningful feature extraction layers [1] deliver comparable performance, ensure robustness of the learned decision rules and offer the exciting opportunity of automatic knowledge discovery. Methods: We collected approximately one hour of data (from two sessions) in which we recorded stereotactic EEG (sEEG) activity during overt speech (6 different randomly shuffled phrases and rest). We also recorded a synchronized audio speech signal. The sEEG recording was carried out in an epilepsy patient implanted for medical reasons with an sEEG electrode passing through Broca's area, with 6 contacts spaced at 5 mm. We then used a compact convolutional network-based architecture to recover speech mel-cepstrum coefficients, followed by a 2D convolutional network to classify individual words. We then interpreted the weights of the former network using the theoretically justified approach we devised earlier [1]. Results: We achieved on average 44% accuracy in classifying 26 + 1 words (3.7% chance level) using only 6 channels of data recorded with a single minimally invasive sEEG electrode.
We compared the performance of our compact convolutional network to that of a DenseNet-like architecture recently featured in the neural speech decoding literature and did not find statistically significant performance differences. Moreover, our architecture appeared to learn faster and yielded a stable, interpretable and physiologically meaningful decision rule that operated successfully over a contiguous data segment not overlapping with the training data interval. The spatial characteristics of the neuronal population pivotal to the task corroborate the results of the active speech mapping procedure, and the frequency-domain patterns show primary involvement of high-frequency activity. Conclusions: Most of the speech decoding solutions available to date either use potentially harmful intracortical electrodes or rely on data recorded with impractically massive multielectrode grids covering a large cortical area. Here, for the first time, we achieved practically usable decoding accuracy for a vocabulary of 26 words + 1 silence class using only 6 channels of cortical activity sampled with a single sEEG shaft. The decoding was implemented using a compact and interpretable architecture, which ensures robustness of the solution and requires a small amount of training data. The proposed approach is the first step towards a minimally invasive implantable BCI solution for restoring speech function.
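As an illustration only, the following sketch shows one way a compact, interpretable feature-extraction branch of the kind referred to above could be organised: a spatial filter over the 6 contacts, a temporal filter, rectification, and smoothing to obtain an envelope. This layout is an assumption made for illustration; the abstract does not specify the layers, and the function names, filter lengths, and random weights below are hypothetical. The data are synthetic. The sketch also reproduces the quoted chance level for 26 + 1 classes.

```python
import numpy as np

N_CHANNELS = 6      # contacts on the single sEEG shaft (from the abstract)
N_CLASSES = 26 + 1  # 26 words plus one silence class (from the abstract)


def chance_level(n_classes: int) -> float:
    """Accuracy of uniform random guessing over n_classes."""
    return 1.0 / n_classes


def envelope_branch(x, spatial_w, temporal_w, smooth_len=25):
    """One hypothetical feature-extraction branch (assumed design).

    x          : array (n_channels, n_samples), multichannel recording
    spatial_w  : array (n_channels,), weights mixing channels into one source
    temporal_w : array (n_taps,), FIR filter applied to the mixed source
    smooth_len : moving-average window length in samples (assumed value)
    """
    source = spatial_w @ x                                # spatial filtering
    band = np.convolve(source, temporal_w, mode="same")   # temporal filtering
    rectified = np.abs(band)                              # rectification
    kernel = np.ones(smooth_len) / smooth_len             # moving-average kernel
    return np.convolve(rectified, kernel, mode="same")    # envelope smoothing


# Synthetic stand-in data and random (untrained) weights, not real sEEG.
rng = np.random.default_rng(0)
x = rng.standard_normal((N_CHANNELS, 1000))
spatial_w = rng.standard_normal(N_CHANNELS)
temporal_w = rng.standard_normal(31)
envelope = envelope_branch(x, spatial_w, temporal_w)

chance_percent = round(100 * chance_level(N_CLASSES), 1)
```

In a trained model of this kind, the spatial weights and the frequency response of the temporal filter are the quantities one would inspect to obtain the spatial and frequency-domain patterns mentioned in the Results; how the authors actually derive those patterns is described in [1], not here.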