We Need to Talk About Random Splits.

Anders Søgaard,Sebastian Ebert,Joost Bastings,Katja Filippova

We Need to Talk About Random Splits.

2020

Anders Søgaard
Sebastian Ebert
Joost Bastings
Katja Filippova

Gorman and Bedrick (2019) recently argued for using random splits rather than standard splits in NLP experiments. We argue that random splits, like standard splits, lead to overly optimistic performance estimates. In some cases, even worst-case splits under-estimate the error observed on new samples of in-domain data, i.e., the data that models should minimally generalize to at test time. This proves wrong the common conjecture that bias can be corrected for by re-weighting data (Shimodaira, 2000; Shah et al., 2020). Instead of using multiple random splits, we propose that future benchmarks instead include multiple, independent test sets.

Keywords:

Computer science
Conjecture
Machine learning
Artificial intelligence
Theoretical computer science

Correction
Source
Cite
Save
Machine Reading By IdeaReader

References

Citations