SumatraTT: a generic data pre-processing system

2003 
The systematic indexing of cultural heritage artefacts began well before the era of computers. The first step in digitising such archives of handwritten and typewritten records naturally focused on transferring them into digital form, either by re-typing the original data manually or by applying OCR methods to scanned documents. As a result, there exist huge digital archives of data and metadata in Europe, which describe millions of artefacts kept by thousands of galleries, museums, and private collections. To explore such archives (including by data mining methods), the data need to be converted into a unified format and data model. Moreover, the original indexing methodologies may vary significantly, so conversion to a unified metadata (ontology) model is also needed. Any data transformation is a tedious task, which usually requires designing, implementing, and testing a number of scripts that are then executed to transform the data sets. To simplify such data transformation processes, a generic data transformation system called SumatraTT has been developed at the Gerstner Laboratory of the Czech Technical University in Prague. The system has been verified in a number of applications, mostly as a data pre-processing system for data mining. Currently, the goals of the CIPHER project have opened new research directions aimed at investigating ontology transformation and unification problems using SumatraTT.