VML-HP: Hebrew Paleography Dataset

Ahmad Droby, Berat Kurar Barakat, Daria Vasyutinsky-Shapira, Irina Rabaev, Jihad El-Sana

2021

This paper presents VML-HP, a public dataset for Hebrew paleography analysis. The dataset consists of 537 document page images labeled with 15 script sub-types. The ground truth was created manually by a Hebrew paleographer at the page level. In addition, we propose a patch generation tool that extracts patches containing an approximately equal number of text lines regardless of font size. The VML-HP dataset contains a train set and two test sets: a typical test set and a blind test set for evaluating algorithms in a more challenging setting. We evaluated several deep learning classifiers on both test sets. The results show that convolutional networks classify Hebrew script sub-types with much higher accuracy on the typical test set than on the blind test set.
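
The idea behind patches with an approximately equal number of text lines can be illustrated with a small sketch. This is not the authors' tool: the abstract does not describe its algorithm, so the code below assumes one plausible approach, in which the line pitch is estimated from a horizontal projection profile of a binarized page and the crop size is scaled accordingly. The function names `estimate_line_pitch` and `extract_patches`, the parameters `lines_per_patch` and `out_size`, and the thresholding rule are all hypothetical.

```python
import numpy as np


def estimate_line_pitch(binary: np.ndarray) -> float:
    """Estimate the distance in pixels between the tops of consecutive text lines.

    Assumes `binary` is a 2-D array with 1 for ink and 0 for background,
    and that text lines run horizontally.
    """
    # Amount of ink in every pixel row (horizontal projection profile).
    profile = binary.sum(axis=1).astype(float)
    # Hypothetical rule: a row belongs to a text line if it holds more
    # than half of the average ink per row.
    is_text = profile > 0.5 * profile.mean()
    # A line starts where a text row follows a non-text row.
    padded = np.concatenate(([False], is_text))
    starts = np.flatnonzero(~padded[:-1] & padded[1:])
    if len(starts) < 2:
        raise ValueError("need at least two text lines to estimate the pitch")
    return float(np.median(np.diff(starts)))


def extract_patches(binary: np.ndarray, lines_per_patch: int = 5,
                    out_size: int = 64) -> list:
    """Cut square patches spanning about `lines_per_patch` text lines each,
    then resize every patch to `out_size` x `out_size` pixels."""
    pitch = estimate_line_pitch(binary)
    # The crop side grows with the line pitch, so large and small scripts
    # both yield patches with a similar number of lines.
    side = int(round(lines_per_patch * pitch))
    height, width = binary.shape
    # Nearest-neighbour sampling grid that maps the crop to the output size.
    idx = (np.arange(out_size) * side) // out_size
    patches = []
    for top in range(0, height - side + 1, side):
        for left in range(0, width - side + 1, side):
            crop = binary[top:top + side, left:left + side]
            patches.append(crop[np.ix_(idx, idx)])
    return patches
```

Under these assumptions, a page written in a script twice as large is cropped with windows twice as wide, and both pages produce fixed-size patches covering roughly five lines. Real manuscript pages would likely need binarization, deskewing, and a more robust line detector before a profile-based estimate is reliable.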

Keywords:

Palaeography
Convolutional neural network
Deep learning
test
Ground truth
Computer science
Natural language processing
Set (abstract data type)
Hebrew
Test set
Artificial intelligence
