Data Readiness Report

Shazia Afzal,Rajmohan C,Manish Kesarwani,Sameep Mehta,Hima Patel

Data Readiness Report

2021

Data exploration and quality analysis is an important yet tedious process in the AI pipeline. Current data cleaning and data readiness assessment practices for machine learning tasks are mostly conducted in an arbitrary manner which limits their reuse and often results in loss of productivity. We introduce the concept of a Data Readiness Report as accompanying documentation to a dataset that allows data consumers to get detailed insights into the quality of data. Data characteristics and challenges on various quality dimensions are identified and documented, keeping in mind the principles of transparency and explainability. The Data Readiness Report also serves as a record of all data assessment operations, including applied transformations. This provides a detailed lineage for data governance and management. In effect, the report captures and documents the actions taken by various personas in a data readiness and assessment workflow. Over time this becomes a repository of best practices and can potentially drive a recommendation system for building automated data readiness workflows on the lines of AutoML [1]. The data readiness report could serve as a valuable asset for organizing and operationalizing data in a Data-as-a-service model as it augments the trust and reliability of the datasets. We anticipate that together with the Datasheets [2], Dataset Nutrition Label [3], FactSheets [4] and Model Cards [5], the Data Readiness Report completes the AI documentation pipeline and increases trust and re-useability of data.

Keywords:

Correction
Source
Cite
Save
Machine Reading By IdeaReader

References

Citations