How BERT’s Dropout Fine-Tuning Affects Text Classification?
2021
Language model pretraining has made it easier to fit models on new and small datasets by retaining the knowledge acquired during pretraining. These task-agnostic models are meant to be fine-tuned on any NLP task. In this paper, we study the effect of fine-tuning BERT on a small amount of data for news classification and sentiment analysis. Our experiments highlight the impact of tweaking the dropout hyper-parameters on classification performance. We conclude that tuning the hidden-layer and attention dropout probabilities together reduces overfitting.
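As an illustration of the two hyper-parameters discussed above, the sketch below enumerates combinations of hidden-layer and attention dropout probabilities. The candidate values are hypothetical (the abstract does not state which probabilities were explored), and the dictionary keys are chosen to match the `hidden_dropout_prob` and `attention_probs_dropout_prob` fields of `BertConfig` in the Hugging Face Transformers library, which is assumed here and not named in the source.

```python
from itertools import product

# Hypothetical candidate values; the abstract does not list the ones actually tested.
HIDDEN_DROPOUT_VALUES = [0.1, 0.2, 0.3]
ATTENTION_DROPOUT_VALUES = [0.1, 0.2, 0.3]


def dropout_grid(hidden_values, attention_values):
    """Return one config-override dict per (hidden, attention) dropout pair."""
    return [
        {"hidden_dropout_prob": h, "attention_probs_dropout_prob": a}
        for h, a in product(hidden_values, attention_values)
    ]


grid = dropout_grid(HIDDEN_DROPOUT_VALUES, ATTENTION_DROPOUT_VALUES)
for overrides in grid:
    # Each dict holds the two dropout settings for one fine-tuning run.
    print(overrides)
```

Under the same assumption, each dictionary could be passed as keyword arguments when building a `BertConfig`, so that every fine-tuning run uses one pair of dropout probabilities.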