Data quality in ETL process: A preliminary study

2019 
Abstract The accuracy and relevance of Business Intelligence & Analytics (BI&A) rely on the ability to bring high data quality to the data warehouse from both internal and external sources using the ETL process. The latter is complex and time-consuming as it manages data with heterogeneous content and diverse quality problems. Ensuring data quality requires tracking quality defects along the ETL process. In this paper, we present the main ETL quality characteristics. We provide an overview of the existing ETL process data quality approaches. We also present a comparative study of some commercial ETL tools to show how much these tools consider data quality dimensions. To illustrate our study, we carry out experiments using an ETL dedicated solution (Talend Data Integration) and a data quality dedicated solution (Talend Data Quality). Based on our study, we identify and discuss quality challenges to be addressed in our future research.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    18
    References
    10
    Citations
    NaN
    KQI
    []