Sketching as a Tool for Understanding and Accelerating Self-attention for Long Sequences

Yifan Chen, Qi Zeng, Dilek Hakkani Tür, Di Jin, Heng Ji, Yun Yang

2022

Transformer-based models are inefficient at processing long sequences because the self-attention modules have quadratic space and time complexity. To address this limitation, Linformer and Informer reduce the quadratic complexity to linear (modulo logarithmic factors) via low-dimensional projection and row selection, respectively. These two models are intrinsically connected, and to understand their connection we introduce a theoretical framework of matrix sketching. Based on the theoretical analysis, we propose Skeinformer, which accelerates self-attention and further improves the accuracy of the matrix approximation to self-attention through column sampling, adaptive row normalization, and pilot sampling reutilization. Experiments on the Long Range Arena benchmark demonstrate that our methods outperform alternatives with a consistently smaller time/space footprint.
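To make the idea of column sampling concrete, here is a minimal sketch in plain Python. It is not the Skeinformer algorithm: it assumes uniform sampling of d key/value positions without replacement and renormalizes each row over the sampled columns only, which is a simplification chosen for illustration. The function names (`sampled_attention`, `column_sampled_attention`, `exact_attention`) and the `seed` parameter are hypothetical and do not come from the paper.

```python
import math
import random


def sampled_attention(Q, K, V, cols):
    """Softmax attention restricted to the key/value rows listed in `cols`.

    Q, K, V are lists of rows (lists of floats). Each output row is a
    softmax-weighted average of the selected rows of V.
    """
    cols = list(cols)
    scale = 1.0 / math.sqrt(len(Q[0]))
    out = []
    for q in Q:
        # Scaled dot-product scores against the selected keys only.
        scores = [scale * sum(a * b for a, b in zip(q, K[j])) for j in cols]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)  # row normalizer over the selected columns
        out.append([
            sum(w * V[j][c] for w, j in zip(weights, cols)) / total
            for c in range(len(V[0]))
        ])
    return out


def exact_attention(Q, K, V):
    """Full softmax attention over all n keys (quadratic in n)."""
    return sampled_attention(Q, K, V, range(len(K)))


def column_sampled_attention(Q, K, V, d, seed=0):
    """Approximate attention using d uniformly sampled columns (keys).

    Assumption: uniform sampling without replacement; the paper's
    sampling distribution and normalization are not reproduced here.
    """
    rng = random.Random(seed)
    cols = rng.sample(range(len(K)), d)
    return sampled_attention(Q, K, V, cols)


if __name__ == "__main__":
    rng = random.Random(1)
    n, p = 64, 8
    Q = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    K = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    V = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    exact = exact_attention(Q, K, V)
    for d in (8, 32, 64):
        approx = column_sampled_attention(Q, K, V, d)
        err = max(abs(a - b) for ra, rb in zip(approx, exact) for a, b in zip(ra, rb))
        print(f"d={d}: max abs error {err:.4f}")
```

With d sampled columns the cost per query drops from n dot products to d, so the total work is proportional to n times d rather than n squared. When d equals n the sketch reduces to exact attention, which gives a simple sanity check.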
