Transformer on a Diet.

Chenguang Wang, Zihao Ye, Aston Zhang, Zheng Zhang, Alexander J. Smola

2020

The Transformer has been widely used thanks to its ability to capture sequence information efficiently. However, recent developments such as BERT and GPT-2 deliver only heavy architectures that focus on effectiveness. In this paper, we explore three carefully designed light Transformer architectures to determine whether a Transformer with less computation can produce competitive results. Experimental results on language modeling benchmark datasets suggest that this trade-off is promising: the light Transformer reduces parameters by up to 70% while obtaining perplexity competitive with the standard Transformer. The source code is publicly available.

Keywords:

Artificial intelligence
Computer science
Perplexity
Transformer
Computation
Machine learning
Source code
Computer engineering
