A workflow for creating, analysing, and storing multi-layer corpora: Pepper, Atomic, ANNIS and LAUDATIO

Stephan Druskat, Carolin Odebrecht, Thomas Krause, Florian Zipser

2016

• The creation and analysis of corpus linguistic resources pose technical challenges: different tools and formats have to be combined in a single workflow.
• These challenges are best met with a generic architecture and metamodel shared by the complete tool chain.
• We present an open source set of tools which support the conversion (Pepper, Zipser et al. (2011)), annotation (Atomic, Druskat et al. (2014)), analysis (ANNIS, Krause & Zeldes (2014)), and long-term accessibility (LAUDATIO, Odebrecht et al. (2015)) of corpora.
• Our tools are well aligned because they share a common, generic graph-based data model, Salt (Zipser & Romary, 2010), which is theory-neutral and supports any annotation type that can be represented as key-value pairs.
• Our tools can be freely combined to form a complete, iterative workflow for the creation of corpus linguistic resources (cf. central graphic).
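
To make the idea of a graph-based model with key-value annotations more concrete, the following is a minimal sketch in Python. It is not Salt's actual API: the class names `Node`, `Edge`, and `AnnotationGraph`, the method names, and the annotation keys (`pos`, `cat`, `relation`) are all hypothetical and chosen only for illustration. It shows how tokens, a phrase node, and the relations between them can each carry arbitrary key-value pairs without committing to a particular linguistic theory.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A graph node (e.g. a token or phrase) with key-value annotations."""
    node_id: str
    annotations: dict[str, str] = field(default_factory=dict)


@dataclass
class Edge:
    """A directed relation between two nodes, also annotated with key-value pairs."""
    source: str
    target: str
    annotations: dict[str, str] = field(default_factory=dict)


@dataclass
class AnnotationGraph:
    """Hypothetical illustration only; this is NOT the Salt API."""
    nodes: dict[str, Node] = field(default_factory=dict)
    edges: list[Edge] = field(default_factory=list)

    def add_node(self, node_id: str, **annotations: str) -> Node:
        node = Node(node_id, dict(annotations))
        self.nodes[node_id] = node
        return node

    def add_edge(self, source: str, target: str, **annotations: str) -> Edge:
        # Both endpoints must already exist in the graph.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        edge = Edge(source, target, dict(annotations))
        self.edges.append(edge)
        return edge

    def find_nodes(self, key: str, value: str) -> list[str]:
        """Return ids of all nodes whose annotation `key` equals `value`."""
        return [
            node.node_id
            for node in self.nodes.values()
            if node.annotations.get(key) == value
        ]


# Two tokens and one phrase node; the annotation keys are assumptions.
graph = AnnotationGraph()
graph.add_node("tok1", text="the", pos="DET")
graph.add_node("tok2", text="corpus", pos="NOUN")
graph.add_node("np1", cat="NP")
graph.add_edge("np1", "tok1", relation="dominance")
graph.add_edge("np1", "tok2", relation="dominance")

nouns = graph.find_nodes("pos", "NOUN")
```

Because every layer is stored as plain key-value pairs on nodes and edges, a new annotation layer can be added without changing the structure itself, which is the property that allows conversion, annotation, and query tools to share one model.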
