Risk-based Quality Management in CDM: An inquiry into the value of generalized query-based data cleaning

2021 
INTRODUCTION: The essence of Risk-based Quality Management is to focus on what matters most. Sponsor companies spend many billions of dollars annually on automated and manual data cleaning, using internal and site resources to create the illusion of an error-free database. However, over the last 15 years a robust data acquisition process has been developed, and much of the current query-based data cleaning approach is a self-imposed remnant of the error-prone processes of the past.

OBJECTIVE: Motivated by previously reported insights into the limited value of generalized source data verification (SDV), seven pharmaceutical companies and a clinical trials solution vendor teamed up to answer the question “How efficient is the query-based data cleaning process?”

METHODS: Twenty completed Phase III studies representing different sponsors and therapeutic areas (TAs) were randomly selected from a collective data pool. Query data were aggregated across studies and classified by query type (automatic/manual) and initiator role. Form types were standardized using a semi-supervised machine learning model. Each query record was classified by whether the query resulted in a data change (query efficacy) and, if so, whether the change was direct or indirect. Query efficacy was characterized by TA, query type, initiator role and form type, and presented graphically using bar charts, histograms, heat maps and waffle plots.

RESULTS: Combined, the studies represented more than 20,000 study participants, nearly 50 million data points and over 1.9 million queries. The overall query rate was 3.9%, yet even when indirect and non-informative modifications are included, fewer than half of these queries resulted in a data change, affecting less than 1.7% of entered data.

CONCLUSION: While clear differences between query types, form types and query initiators provide important insights for further improving the Clinical Data Management (CDM) process, our data show that, overall, the current acquisition process is very robust. Considering the tremendous effort that goes into the generalized query-based data cleaning process and its limited impact, we conclude that this process does not contribute proportionally to the quality of the final database used for analysis. We recommend ending the current practice of raising queries to correct errors in non-critical data in Phase III studies, and placing more emphasis on tools and techniques that help identify systemic issues in the data collection process. This would be a great leap forward for CDM toward a risk-based approach to quality management as described in ICH E6 (R2).
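The abstract relies on two headline metrics. The query rate is the number of queries per entered data point. Query efficacy is the share of queries that resulted in a data change. A minimal Python sketch can make both concrete. This is not the study's analysis code. The `QueryRecord` layout, its field names, the helper functions and the toy data are all hypothetical. For simplicity, the share of data changed also assumes that each effective query modifies exactly one data point.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical record layout: one row per query. Field names are illustrative
# and are not taken from the study's data model.
@dataclass
class QueryRecord:
    query_type: str      # "automatic" or "manual"
    initiator_role: str  # e.g. "system", "monitor", "data manager" (hypothetical roles)
    form_type: str       # standardized form type, e.g. "AE", "Vitals"
    changed_data: bool   # True if the query led to a data change (direct or indirect)


def query_rate(n_queries: int, n_data_points: int) -> float:
    """Queries raised per entered data point."""
    return n_queries / n_data_points


def query_efficacy(records: list[QueryRecord]) -> float:
    """Fraction of queries that resulted in a data change."""
    if not records:
        return 0.0
    return sum(r.changed_data for r in records) / len(records)


def efficacy_by(records: list[QueryRecord], attribute: str) -> dict[str, float]:
    """Query efficacy broken down by one attribute (e.g. "query_type")."""
    groups: dict[str, list[QueryRecord]] = defaultdict(list)
    for r in records:
        groups[getattr(r, attribute)].append(r)
    return {key: query_efficacy(group) for key, group in groups.items()}


# Toy data: 4 queries against 100 entered data points (illustrative only).
records = [
    QueryRecord("automatic", "system", "Vitals", False),
    QueryRecord("automatic", "system", "Labs", True),
    QueryRecord("manual", "monitor", "AE", False),
    QueryRecord("manual", "data manager", "AE", False),
]
n_data_points = 100

rate = query_rate(len(records), n_data_points)
efficacy = query_efficacy(records)
# Simplifying assumption: each effective query changes exactly one data point.
changed_share = sum(r.changed_data for r in records) / n_data_points

print(f"query rate {rate:.1%}, efficacy {efficacy:.0%}, data changed {changed_share:.1%}")
# prints "query rate 4.0%, efficacy 25%, data changed 1.0%"
print(efficacy_by(records, "query_type"))
# prints "{'automatic': 0.5, 'manual': 0.0}"
```

Calling `efficacy_by` with `"initiator_role"` or `"form_type"` gives a small version of the study's breakdown by query type, initiator role and form type. A TA breakdown would need only one more attribute on the record.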