FNetAR: Mixing Tokens with Autoregressive Fourier Transforms.

Tim Lou, Michael Park, Mohammad Ramezanali, Vincent Tang

2021

In this note we examine the autoregressive generalization of the FNet algorithm, in which self-attention layers from the standard Transformer architecture are substituted with a trivial sparse-uniform sampling procedure based on Fourier transforms. Using the Wikitext-103 benchmark, we demonstrate that FNetAR retains state-of-the-art performance (25.8 ppl) on the task of causal language modeling compared to a Transformer-XL baseline (24.2 ppl) with only half the number of self-attention layers, thus providing further evidence for the superfluity of deep neural networks with heavily compounded attention mechanisms. The autoregressive Fourier transform could likely be used for parameter reduction on most Transformer-based time-series prediction models.
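To make the idea of Fourier token mixing under a causal constraint concrete, the following is a minimal NumPy sketch. The abstract does not spell out how FNetAR constructs its autoregressive transform, so the lower-triangular masking of the DFT matrix used here is an assumption chosen for illustration, not the authors' method; the function names `fnet_mixing` and `causal_fourier_mixing` are likewise hypothetical.

```python
import numpy as np


def fnet_mixing(x):
    """Non-causal FNet-style mixing: real part of a 2D DFT over (sequence, hidden).

    x has shape (seq_len, hidden_dim).
    """
    return np.fft.fft2(x).real


def causal_fourier_mixing(x):
    """Hypothetical causal variant (an assumption, not taken from the source).

    The DFT along the sequence axis is written as a matrix product, and the
    DFT matrix is masked to be lower-triangular so that the output at
    position t depends only on inputs at positions <= t.
    """
    seq_len, _ = x.shape
    idx = np.arange(seq_len)
    # Full DFT matrix along the sequence axis: W[t, s] = exp(-2*pi*i*t*s / seq_len)
    dft = np.exp(-2j * np.pi * np.outer(idx, idx) / seq_len)
    # Zero out entries with s > t to block information flow from future tokens
    causal_dft = np.tril(dft)
    # Mixing along the hidden axis does not cross positions, so it stays unmasked
    hidden_mixed = np.fft.fft(x, axis=-1)
    return (causal_dft @ hidden_mixed).real
```

Calling `causal_fourier_mixing` on an array of shape `(seq_len, hidden_dim)` returns an array of the same shape. Perturbing the input from position t onward leaves the outputs before t unchanged, which is the property a causal language model needs from its mixing layer. Like the FNet layer it imitates, the sketch has no learned parameters.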

Keywords:

Benchmark (computing)
Generalization
Transformer
FNet
Algorithm
Computer science
deep neural networks
Fourier transform
Autoregressive model
Mixing (mathematics)
