SumatraTT: a generic data pre-processing system

P. Aubrecht, P. Miksovsky, L. Kral

2003

The systematic indexing of cultural heritage artefacts began well before the era of computers. The first step in digitising such archives of handwritten and typewritten records naturally focused on transferring the files into digital form, either by re-typing the original data manually or by applying OCR methods to scanned documents. As a result, there are huge digital archives of data and metadata in Europe, which describe millions of artefacts kept by thousands of galleries, museums, and private collections. To explore such archives (including by data mining methods), the data need to be converted into a unified format and data model. Moreover, the original indexing methodologies may vary significantly, so conversion to a unified metadata (ontology) model is needed as well. Any data transformation is a tedious task, which usually requires designing, implementing, and testing a number of scripts that are then executed to transform the data sets. To simplify such data transformation processes, a generic data transformation system called SumatraTT has been developed at the Gerstner Laboratory of the Czech Technical University in Prague. The system has been verified in a number of applications, mostly as a data pre-processing system in the data mining process. Currently, the goals of the CIPHER project have opened new research directions aimed at investigating ontology transformation and unification problems using SumatraTT.
