Document Classification for COVID Literature
2020
The global pandemic has made it more important than ever to quickly and accurately retrieve relevant scientific literature for effective consumption by researchers in a wide range of fields. We provide an analysis of several multi-label document classification models on the LitCovid dataset. We find that pre-trained language models outperform other models in both low and high data regimes, achieving a maximum F1 score of around 86. We note that even the highest performing models still struggle with label correlation, distraction from introductory text and CORD-19 generalization. Both data and code are available on GitHub.
- Correction
- Source
- Cite
- Save
- Machine Reading By IdeaReader
0
References
0
Citations
NaN
KQI