Compressive Transformers for Long-Range Sequence Modelling

2020 
We present the Compressive Transformer, an attentive sequence model which compresses past memories for long-range sequence learning. We find the Compressive Transformer obtains state-of-the-art language modelling results on the WikiText-103 and Enwik8 benchmarks, achieving 17.1 ppl and 0.97 bpc respectively. We also find it can model high-frequency speech effectively and can be used as a memory mechanism for RL, demonstrated on an object matching task. To promote the domain of long-range sequence learning, we propose a new open-vocabulary language modelling benchmark derived from books, PG-19.
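To make the "compresses past memories" idea concrete, the sketch below shows one plausible memory update for a single layer: activations evicted from a fixed-size FIFO memory are pooled into a smaller compressed memory instead of being discarded. This is a minimal illustration, not the paper's implementation; the function names, shapes, and the choice of mean-pooling (one of several compression functions studied in the paper) are assumptions.

```python
import numpy as np

def update_memories(memory, comp_memory, new_acts, mem_size, comp_rate=3):
    """Illustrative compressive-memory update for one layer.

    `memory` holds the most recent activations (FIFO). Activations that
    overflow it are compressed (here by mean-pooling with rate `comp_rate`)
    into `comp_memory` rather than dropped. Arrays have shape
    [time, d_model]. Names and pooling choice are hypothetical.
    """
    # Append the newest segment of activations to the short-term memory.
    memory = np.concatenate([memory, new_acts], axis=0)

    # Evict the oldest activations that no longer fit in the FIFO memory.
    overflow = memory.shape[0] - mem_size
    if overflow > 0:
        evicted, memory = memory[:overflow], memory[overflow:]

        # Compress evicted activations with non-overlapping mean-pooling.
        n = (evicted.shape[0] // comp_rate) * comp_rate
        if n > 0:
            pooled = evicted[:n].reshape(-1, comp_rate, evicted.shape[1]).mean(axis=1)
            comp_memory = np.concatenate([comp_memory, pooled], axis=0)

    return memory, comp_memory
```

Attention at each step would then attend over the current segment, `memory`, and `comp_memory`, extending the effective context by roughly the compression rate at a fixed memory cost.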