A Cross-Modal Guiding and Fusion Method for Multi-Modal RSVP-based Image Retrieval

2021 
Rapid Serial Visual Presentation (RSVP) is an important paradigm in Brain-Computer Interfaces (BCI), with applications in spellers, image retrieval, anomaly detection, and related tasks. The RSVP paradigm embeds a small number of target images in a rapidly presented image sequence to induce specific event-related potential (ERP) components. However, the practical use of RSVP-based BCI is limited by the accuracy of ERP detection. The goal of this study is therefore to add a related modality to the traditional EEG-based BCI in order to make predictions more robust and improve detection performance. First, we introduce the eye movement modality into the RSVP-based BCI and collect a multi-modality RSVP dataset in which EEG and eye movement data are recorded simultaneously during an image retrieval task. Second, we design a simple but efficient CNN-based network with two modality fusion modules that exploit the multi-modality data in two stages. In the feature extraction stage, we propose a Cross-modality-Guided Feature Calibration (cm-GFC) module, which lets the EEG modality features modify the eye movement modality features so that the features of the two modalities become more complementary. In the feature fusion stage, we propose a Dynamic Gated Fusion (DGF) module, which applies modality-specific gates to retain the complementary information of the two modalities and suppress the information that is redundant between them. To evaluate our method, we conduct extensive experiments on the collected dataset, which contains EEG and eye movement data from 20 subjects. The proposed method achieves a balanced classification accuracy of 87.83 ± 2.31%, outperforming a series of single-modality and multi-modality approaches.
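To make the two technical ideas in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a generic form of gated fusion in which each modality receives a sigmoid gate computed from the concatenated features; the abstract does not specify how the DGF gates are computed, so the function `gated_fusion` and the weight matrices `w_eeg` and `w_eye` are hypothetical. The second function computes balanced accuracy as the mean of per-class recall, which is the standard definition and is a natural choice for RSVP data, where target images are rare.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def gated_fusion(f_eeg, f_eye, w_eeg, w_eye):
    """Generic gated fusion of two modality features.

    ASSUMPTION: this is an illustrative form, not the paper's exact DGF module.
    f_eeg, f_eye: arrays of shape (batch, d), one feature vector per trial.
    w_eeg, w_eye: hypothetical gate weights of shape (2 * d, d).
    """
    # Each gate sees both modalities, so it can down-weight redundant features.
    joint = np.concatenate([f_eeg, f_eye], axis=1)  # (batch, 2 * d)
    g_eeg = sigmoid(joint @ w_eeg)  # modality-specific gate in (0, 1)
    g_eye = sigmoid(joint @ w_eye)
    # The fused feature is a gated sum of the two modality features.
    return g_eeg * f_eeg + g_eye * f_eye


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall for binary labels (0 = non-target, 1 = target)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in (0, 1)]
    return float(np.mean(recalls))


# Usage with random stand-in features (no real EEG or eye movement data).
rng = np.random.default_rng(0)
d = 8
f_eeg = rng.normal(size=(4, d))
f_eye = rng.normal(size=(4, d))
w_eeg = rng.normal(size=(2 * d, d))
w_eye = rng.normal(size=(2 * d, d))
fused = gated_fusion(f_eeg, f_eye, w_eeg, w_eye)
```

A classifier that labels every trial as non-target can score a high plain accuracy on an imbalanced RSVP sequence, but its balanced accuracy is only 0.5, which is why the metric is informative here.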