Adaptive Semiparametric Language Models

Dani Yogatama, Cyprien de Masson d'Autume, Lingpeng Kong

2021

We present a language model that combines a large parametric neural network (i.e., a transformer) with a non-parametric episodic memory component in an integrated architecture. Our model uses extended short-term context by caching local hidden states (similar to Transformer-XL) and global long-term memory by retrieving a set of nearest-neighbor tokens at each timestep. We design a gating function that adaptively combines these information sources to make a prediction. This mechanism allows the model to use local context, short-term memory, or long-term memory (or any combination of them) on an ad hoc basis, depending on the context. Experiments on word-based and character-based language modeling datasets demonstrate the efficacy of our proposed method compared to strong baselines.
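
The abstract describes the architecture only at a high level, so the following Python sketch illustrates one plausible reading of a single prediction step. It rests on several assumptions that are not stated in the abstract: toy 2-dimensional vectors, single-head dot-product attention with no learned projections, exact (brute-force) nearest-neighbor search over a small key-value store whose values stand in for embeddings of retrieved tokens, and a scalar sigmoid gate g_t = sigmoid(w_gate · h_t) that interpolates between the parametric hidden state and a summary of the retrieved memory. All names (`ShortTermCache`, `knn_values`, `semiparametric_step`, `w_gate`) are hypothetical and are not taken from the authors' code; the paper's exact gate parameterization may differ.

```python
import math
from collections import deque

Vector = list[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: list[float]) -> list[float]:
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights: list[float], vectors: list[Vector]) -> Vector:
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(len(vectors[0]))]


def attend(query: Vector, context: list[Vector]) -> Vector:
    """Single-head dot-product attention with no learned projections (a simplification)."""
    return weighted_sum(softmax([dot(query, c) for c in context]), context)


def sigmoid(x: float) -> float:
    # Branch on the sign so math.exp never overflows.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


class ShortTermCache:
    """Fixed-size FIFO of past hidden states, in the spirit of Transformer-XL's segment cache."""

    def __init__(self, size: int):
        self.states = deque(maxlen=size)  # oldest states are evicted automatically

    def extend(self, hidden_states: list[Vector]) -> None:
        self.states.extend(hidden_states)


def knn_values(query: Vector, keys: list[Vector], values: list[Vector], k: int) -> list[Vector]:
    """Exact k-nearest-neighbor search by squared L2 distance over a (key, value) store."""
    def dist(i: int) -> float:
        return sum((q - x) ** 2 for q, x in zip(query, keys[i]))
    nearest = sorted(range(len(keys)), key=dist)[:k]
    return [values[i] for i in nearest]


def semiparametric_step(x_t: Vector, cache: ShortTermCache, mem_keys: list[Vector],
                        mem_values: list[Vector], w_gate: Vector, k: int = 2):
    """One hypothetical step: local + short-term context, long-term kNN memory, then a gate."""
    # Parametric path: attend over cached short-term states plus the current input.
    h_t = attend(x_t, list(cache.states) + [x_t])
    # Long-term path: retrieve neighbors of h_t and summarize them with attention.
    m_t = attend(h_t, knn_values(h_t, mem_keys, mem_values, k))
    # Scalar gate chooses, per timestep, how much to trust h_t versus m_t.
    g_t = sigmoid(dot(w_gate, h_t))
    z_t = [g_t * h + (1.0 - g_t) * m for h, m in zip(h_t, m_t)]
    cache.extend([h_t])  # h_t becomes short-term memory for later steps
    return z_t, g_t


if __name__ == "__main__":
    cache = ShortTermCache(size=4)
    keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]    # toy hidden states stored in long-term memory
    values = [[0.5, 0.5], [0.0, 1.0], [1.0, 0.0]]  # toy embeddings of the tokens that followed them
    z, g = semiparametric_step([1.0, 0.0], cache, keys, values, w_gate=[0.0, 0.0])
    print(round(g, 3), [round(v, 3) for v in z])
```

In this sketch a gate near 1 means the step relies on the parametric (local and short-term) state, while a gate near 0 means it leans on the retrieved long-term memory. A full model would presumably project z_t onto the vocabulary to obtain next-token probabilities, and a realistic datastore would likely use an approximate nearest-neighbor index rather than a linear scan.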

