A Model-Agnostic Framework to Correct Label-Bias in Training Data Using a Sample of Trusted Data

2021 
The importance of massive datasets in machine learning continues to grow. Such datasets pose a serious challenge to a model's robustness to label noise, which is a critical property for any classifier. The challenge stems largely from the fact that such datasets often contain biases that unfairly disadvantage certain groups. Non-random label noise can be introduced into the data by automatic labeling, by a lack of domain expertise during manual labeling, or by data poisoning attacks by adversaries. Classifiers trained on such datasets can inherit these biases. Most prior work assumes that the entire training set is potentially corrupted. However, it is usually possible to curate a small amount of trusted data for testing and validation, and a portion of it can be used to denoise the training data. Recently, a methodology was proposed to denoise label-corrupted data using a sample of trusted data. At its core, this methodology assumes that the adversarial process has no access to the true labels of the data. This is quite a strong assumption, as it underestimates the capacity of the adversarial process. In this work, we propose a more generic framework that corrects label noise without any assumption about the adversarial process's access to the true labels. Experimental results suggest that our method improves on existing methods.
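The abstract does not spell out the paper's generalized framework, but the trusted-data methodology it builds on follows a well-known pattern: estimate a label corruption matrix from the trusted sample, then correct the training loss on the noisy set. Below is a minimal NumPy sketch of that shared core idea; the function names, array shapes, and fallback behavior are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def estimate_corruption_matrix(model_probs, trusted_labels, num_classes):
    """Estimate C[i, j] ~= p(noisy label j | true label i) from a trusted set.

    model_probs:    (n, num_classes) softmax outputs of a model trained on the
                    noisy data, evaluated on the trusted examples.
    trusted_labels: (n,) clean labels for those same examples.
    """
    C = np.zeros((num_classes, num_classes))
    for i in range(num_classes):
        rows = model_probs[trusted_labels == i]
        if len(rows) > 0:
            # Average noisy-label predictions over trusted examples of class i.
            C[i] = rows.mean(axis=0)
        else:
            # No trusted example of class i: assume its labels are clean.
            C[i, i] = 1.0
    return C

def corrected_log_likelihood(clean_probs, noisy_labels, C):
    """Log-likelihood of observed noisy labels under the corruption model:
    p(noisy j | x) = sum_i p(true i | x) * C[i, j]."""
    noisy_probs = clean_probs @ C  # (n, num_classes)
    picked = noisy_probs[np.arange(len(noisy_labels)), noisy_labels]
    return np.log(np.clip(picked, 1e-12, None))
```

In this pattern, the classifier is trained by maximizing the corrected likelihood on the noisy set (and the standard likelihood on the trusted set); the paper's contribution is to relax the assumption, implicit in how C is estimated above, that the corruption process acted without knowledge of the true labels.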