Data Readiness Report

2021 
Data exploration and quality analysis are important yet tedious steps in the AI pipeline. Current data cleaning and data readiness assessment practices for machine learning tasks are mostly conducted in an ad hoc manner, which limits their reuse and often results in lost productivity. We introduce the concept of a Data Readiness Report as accompanying documentation for a dataset that gives data consumers detailed insight into the quality of the data. Data characteristics and challenges along various quality dimensions are identified and documented, keeping in mind the principles of transparency and explainability. The Data Readiness Report also serves as a record of all data assessment operations, including applied transformations, which provides a detailed lineage for data governance and management. In effect, the report captures and documents the actions taken by the various personas in a data readiness and assessment workflow. Over time, this record becomes a repository of best practices and can potentially drive a recommendation system for building automated data readiness workflows along the lines of AutoML [1]. The Data Readiness Report could serve as a valuable asset for organizing and operationalizing data in a Data-as-a-Service model, as it increases the trust in and reliability of the datasets. We anticipate that, together with Datasheets [2], the Dataset Nutrition Label [3], FactSheets [4], and Model Cards [5], the Data Readiness Report completes the AI documentation pipeline and increases the trust in and reusability of data.