Explaining black box models by means of local rules

2019 
Many high-performance machine learning methods produce black-box models that do not disclose the internal logic yielding their predictions. However, in many application domains, understanding the motivation behind a prediction is becoming a prerequisite for trusting the prediction itself. We propose a novel rule-based method that explains the prediction of any classifier on a specific instance by analyzing the joint effect of feature subsets on the classifier's prediction. The relevant subsets are identified by learning a local rule-based model in the neighborhood of the prediction to be explained. While local rules give qualitative insight into the local behavior, their relevance is quantified using the concept of prediction difference. Preliminary experiments show that, despite the approximation introduced by the local model, the explanations provided by our method are effective in detecting the effects of attribute correlation. Our method is model-agnostic; hence, experts can compare the explanations and local behaviors of predictions made by different classifiers for the same instance.
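The abstract outlines a two-step pipeline: learn a local rule-based surrogate around the instance, then score the feature subsets appearing in its rules by prediction difference. The sketch below illustrates that pipeline under stated assumptions, since the paper's actual implementation is not shown here: a shallow decision tree stands in for the local rule learner, the neighborhood is generated by hypothetical Gaussian perturbation, and the prediction difference of a feature subset is estimated by Monte-Carlo resampling of those features from the data. All function names (`neighborhood`, `local_rule`, `prediction_difference`) are illustrative, not from the paper.

```python
# Minimal sketch, NOT the authors' method: a decision tree approximates the
# local rule-based model, and the relevance of the feature subset along the
# decision path covering x is quantified by a prediction-difference estimate.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier  # any black-box classifier
from sklearn.tree import DecisionTreeClassifier      # stand-in rule learner

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=6, random_state=0)
black_box = RandomForestClassifier(random_state=0).fit(X, y)

def neighborhood(x, X, n=300, scale=0.3):
    """Sample a local neighborhood of x (assumed: Gaussian perturbation
    scaled by each feature's standard deviation in the data)."""
    noise = rng.normal(0.0, scale * X.std(axis=0), size=(n, x.size))
    return x + noise

def local_rule(x, X):
    """Fit a shallow tree on the neighborhood, labeled by the black box,
    and return the feature subset along the decision path covering x
    (a proxy for the local rule that fires on x)."""
    Z = neighborhood(x, X)
    tree = DecisionTreeClassifier(max_depth=3, random_state=0)
    tree.fit(Z, black_box.predict(Z))
    path = tree.decision_path(x.reshape(1, -1)).indices
    feats = {tree.tree_.feature[n] for n in path if tree.tree_.feature[n] >= 0}
    return sorted(feats)

def prediction_difference(x, subset, X, n=200):
    """Drop in the black box's predicted probability for its own class
    when the features in `subset` are replaced by values resampled from
    the data (marginalizing out the subset's joint effect)."""
    target = black_box.predict(x.reshape(1, -1))[0]
    p_orig = black_box.predict_proba(x.reshape(1, -1))[0, target]
    Z = np.tile(x, (n, 1))
    Z[:, subset] = X[rng.integers(len(X), size=n)][:, subset]
    return p_orig - black_box.predict_proba(Z)[:, target].mean()

x0 = X[0]
subset = local_rule(x0, X)
print("feature subset from local rule:", subset)
print("prediction difference:", prediction_difference(x0, subset, X))
```

Because the only access to the classifier is through `predict` and `predict_proba`, the sketch is model-agnostic in the sense the abstract describes: swapping `black_box` for any other fitted classifier lets one compare the local rules and prediction differences obtained for the same instance.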