BLiMP: A Benchmark of Linguistic Minimal Pairs for English

Alex Warstadt, Alicia Parrish, Haokun Liu, Anhad Mohananey, Wei Peng, Sheng-Fu Wang, Samuel R. Bowman

2020

We introduce the Benchmark of Linguistic Minimal Pairs (BLiMP), a challenge set for evaluating what language models (LMs) know about major grammatical phenomena in English. BLiMP consists of 67 sub-datasets, each containing 1000 minimal pairs isolating specific contrasts in syntax, morphology, or semantics. The data is automatically generated according to expert-crafted grammars, and aggregate human agreement with the labels is 96.4%. We use it to evaluate n-gram, LSTM, and Transformer (GPT-2 and Transformer-XL) LMs. We find that state-of-the-art models identify morphological contrasts reliably, but they struggle with semantic restrictions on the distribution of quantifiers and negative polarity items, and with subtle syntactic phenomena such as extraction islands.
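To make the evaluation setup concrete, the following is a minimal sketch of scoring a model on minimal pairs. It assumes the common convention that a model is counted correct on a pair when it assigns a higher probability to the acceptable sentence than to the unacceptable one; the abstract does not spell this criterion out. The function names, the toy add-one-smoothed bigram scorer, and the example sentences are all hypothetical illustrations and are not drawn from the BLiMP data or code.

```python
import math
from collections import Counter
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]  # (acceptable sentence, unacceptable sentence)


def minimal_pair_accuracy(pairs: Iterable[Pair],
                          score: Callable[[str], float]) -> float:
    """Fraction of pairs where the acceptable sentence scores strictly higher.

    `score` is any function returning a log-probability-like value for a
    sentence. Ties are counted as errors (an assumption of this sketch).
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one minimal pair")
    correct = sum(1 for good, bad in pairs if score(good) > score(bad))
    return correct / len(pairs)


def make_bigram_scorer(corpus: List[str]) -> Callable[[str], float]:
    """Build a toy add-one-smoothed bigram LM scorer from whitespace tokens."""
    context_counts: Counter = Counter()
    bigram_counts: Counter = Counter()
    vocab = set()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    vocab_size = len(vocab)

    def score(sentence: str) -> float:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        # Sum of smoothed log P(cur | prev) over all bigrams in the sentence.
        return sum(
            math.log((bigram_counts[(prev, cur)] + 1)
                     / (context_counts[prev] + vocab_size))
            for prev, cur in zip(tokens, tokens[1:])
        )

    return score


if __name__ == "__main__":
    # Hypothetical training text and agreement-style pairs for illustration.
    toy_corpus = ["the cats sleep", "the cat sleeps",
                  "the dogs bark", "the dog barks"]
    toy_pairs = [("the cats sleep", "the cats sleeps"),
                 ("the dog barks", "the dog bark")]
    scorer = make_bigram_scorer(toy_corpus)
    print(minimal_pair_accuracy(toy_pairs, scorer))
```

Because each pair differs minimally, both sentences have nearly the same length and vocabulary, so comparing whole-sentence scores is a reasonable proxy for the contrast being tested. Any scorer with the same signature (for example, one wrapping a neural LM) could be passed in place of the toy bigram model.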

Keywords: Natural language processing, BLiMP, Artificial intelligence, n-gram, Language model, Computer science
