Revising FUNSD dataset for key-value detection in document images.

Hieu M. Vu, Diep Thi Ngoc Nguyen

2020

FUNSD is one of the few publicly available datasets for information extraction from document images. The information in the FUNSD dataset is defined by text areas of four categories ("key", "value", "header", "other"), plus "background", and by connectivity between areas as key-value relations. Inspecting FUNSD, we found several labeling inconsistencies that impede its applicability to the key-value extraction problem. In this report, we describe some labeling issues in FUNSD and the revision we made to the dataset. We also report our implementation for key-value detection on FUNSD, using a UNet model for baseline results and an improved UNet model with Channel-Invariant Deformable Convolution.
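
To make the data model in the abstract concrete, the following is a minimal sketch of labeled text areas connected by key-value links. The class name TextArea, its fields, the helper key_value_pairs, and the sample values are all hypothetical and chosen for illustration; they are not the dataset's actual schema or the authors' code.

```python
from dataclasses import dataclass, field


@dataclass
class TextArea:
    """Hypothetical container for one annotated text area."""
    id: int
    text: str
    label: str    # assumed to be one of "key", "value", "header", "other"
    box: tuple    # (x0, y0, x1, y1) in pixels; illustrative values only
    links: list = field(default_factory=list)  # ids of linked areas


def key_value_pairs(areas):
    """Return (key text, value text) for every link from a key to a value."""
    by_id = {area.id: area for area in areas}
    pairs = []
    for area in areas:
        if area.label != "key":
            continue
        for target_id in area.links:
            target = by_id.get(target_id)
            # Skip dangling links and links that do not end at a "value" area.
            if target is not None and target.label == "value":
                pairs.append((area.text, target.text))
    return pairs


# Made-up sample data, not taken from FUNSD.
areas = [
    TextArea(0, "Date:", "key", (10, 10, 60, 25), links=[1]),
    TextArea(1, "March 3", "value", (70, 10, 140, 25)),
    TextArea(2, "FAX COVER SHEET", "header", (10, 0, 200, 8)),
]
print(key_value_pairs(areas))  # [('Date:', 'March 3')]
```

A link is only counted when it runs from a "key" area to a "value" area, so inconsistently labeled or dangling links simply produce no pair in this sketch.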

Keywords:

Information retrieval
Computer science
