LABA: Logical Layout Analysis of Book Page Images in Arabic Using Multiple Support Vector Machines

2018 
Logical layout analysis, which determines the function of a document region (for example, whether it is a title, paragraph, or caption), is an indispensable part of a document understanding system. Rule-based algorithms have long been used for such systems. Because the available datasets have been small, it is difficult to assess how well the performance of these systems generalizes. In this paper, we present LABA, a supervised machine learning system based on multiple support vector machines for conducting a logical Layout Analysis of scanned pages of Books in Arabic. Our system labels the function (class) of each region of a scanned book page based on its position on the page and other features. We evaluated LABA with the benchmark "BCE-Arabic-v1" dataset, which contains scanned pages of illustrated Arabic books. We obtained high recall and precision values, and found that the F-measure of LABA is higher than that of a neural network method based on prior work for all classes except the "noise" class.
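
The evaluation above is reported per class in terms of precision, recall, and F-measure. As a minimal sketch of how such per-class scores are commonly computed, the following Python snippet uses the standard definitions (F-measure as the harmonic mean of precision and recall). The function name `per_class_scores` and the toy label lists are hypothetical and chosen only for illustration; they are not taken from the paper or from the BCE-Arabic-v1 dataset.

```python
from collections import Counter


def per_class_scores(true_labels, predicted_labels):
    """Return {label: (precision, recall, f_measure)} for every label seen.

    Hypothetical helper for illustration; not the authors' evaluation code.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(true_labels, predicted_labels):
        if truth == guess:
            tp[truth] += 1   # region labeled correctly
        else:
            fp[guess] += 1   # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    scores = {}
    for label in sorted(set(true_labels) | set(predicted_labels)):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        # F-measure: harmonic mean of precision and recall
        f_measure = 2 * precision * recall / pr_sum if pr_sum else 0.0
        scores[label] = (precision, recall, f_measure)
    return scores


# Toy region labels, invented for this example only.
truth = ["title", "paragraph", "paragraph", "caption", "noise", "paragraph"]
guess = ["title", "paragraph", "caption", "caption", "paragraph", "paragraph"]
scores = per_class_scores(truth, guess)

for label, (p, r, f) in scores.items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

Reporting the scores per class, rather than as a single average, is what makes a statement such as "higher for all classes except noise" possible to check.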