RobeCzech: Czech RoBERTa, a monolingual contextualized language representation model.

Milan Straka,Jakub Náplava,Jana Straková,David Samuel

RobeCzech: Czech RoBERTa, a monolingual contextualized language representation model.

2021

Milan Straka
Jakub Náplava
Jana Straková
David Samuel

We present RobeCzech, a monolingual RoBERTa language representation model trained on Czech data. RoBERTa is a robustly optimized Transformer-based pretraining approach. We show that RobeCzech considerably outperforms equally-sized multilingual and Czech-trained contextualized language representation models, surpasses current state of the art in all five evaluated NLP tasks and reaches state-of-theart results in four of them. The RobeCzech model is released publicly at this https URL and this https URL.

Correction
Source
Cite
Save
Machine Reading By IdeaReader

References

Citations