LAMBERT: Layout-Aware Language Modeling using BERT for Information Extraction

2020 
In this paper we introduce a novel approach to the problem of understanding documents whose local semantics is influenced by a non-trivial layout. Namely, we modify the Transformer architecture so that it can use the graphical features defined by the layout, without having to re-learn the language semantics from scratch, because training starts from a model pretrained on classical language modeling tasks.
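To make the idea concrete, below is a minimal sketch (not the authors' exact implementation) of one way to inject layout information into a pretrained BERT encoder: each token's normalized bounding box is projected by a hypothetical `layout_proj` layer and added to the pretrained word embeddings, so the language semantics learned during pretraining are reused while the model learns to exploit layout. The model name, bounding-box format, and the additive combination are all assumptions for illustration.

```python
# Sketch: layout-aware input embeddings on top of a pretrained BERT.
# Assumptions (not from the paper): bounding boxes are (x0, y0, x1, y1)
# normalized to [0, 1], and layout is injected by a simple linear projection
# added to the word embeddings.
import torch
import torch.nn as nn
from transformers import BertModel


class LayoutAwareBert(nn.Module):
    def __init__(self, model_name: str = "bert-base-uncased", bbox_dim: int = 4):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)  # pretrained LM weights
        hidden = self.bert.config.hidden_size
        # Hypothetical layout embedding: maps a token's bounding box into the
        # same space as the token embeddings.
        self.layout_proj = nn.Linear(bbox_dim, hidden)

    def forward(self, input_ids, bboxes, attention_mask=None):
        # Word embeddings from the pretrained model.
        word_emb = self.bert.embeddings.word_embeddings(input_ids)
        # Add layout features; position and token-type embeddings plus
        # LayerNorm are still applied inside BERT when inputs_embeds is given.
        inputs_embeds = word_emb + self.layout_proj(bboxes)
        return self.bert(inputs_embeds=inputs_embeds,
                         attention_mask=attention_mask).last_hidden_state


# Usage: one batch of three tokens with per-token bounding boxes.
model = LayoutAwareBert()
ids = torch.tensor([[101, 2054, 102]])
boxes = torch.rand(1, 3, 4)  # assumed normalized (x0, y0, x1, y1) per token
hidden_states = model(ids, boxes)
```

Starting from pretrained weights and only adding a small layout component is what allows the model to avoid re-learning language semantics from scratch, as the abstract describes.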