SumatraTT: a generic data pre-processing system
2003
A systematic process of indexing cultural heritage artefacts began well before the era of computers. The first step in digitising such archives of handwritten and typewritten data naturally focused on transferring these files into digital form, either by manually re-typing the original data or by applying OCR methods to scanned documents. As a result, huge digital archives of data and metadata now exist in Europe, describing millions of artefacts kept by thousands of galleries, museums, and private collections. To explore such archives (including with data mining methods), the data need to be converted into a unified format and data model. Moreover, the original indexing methodologies may vary significantly, so conversion to a unified metadata (ontology) model is also needed. Any data transformation is a tedious task that usually requires designing, implementing, and testing a number of scripts, which are then executed to transform the data sets. To simplify such data transformation processes, a generic data transformation system called SumatraTT has been developed at the Gerstner Laboratory of the Czech Technical University in Prague. The system has been verified on a number of applications, mostly as a data pre-processing system within the data mining process. Currently, the goals of the CIPHER project have opened new research directions aimed at investigating ontology transformation and unification problems using SumatraTT.