Interpreting Interpretations: Organizing Attribution Methods by Criteria

Zifan Wang, Piotr Mardziel, Anupam Datta, Matt Fredrikson

2020

Attribution methods that explain the behaviour of machine learning models, e.g. Convolutional Neural Networks (CNNs), have developed into many different forms, motivated by distinct, though related, desirable criteria. Given this diversity of attribution methods, evaluation tools are needed to answer: which method is better for what purpose, and why? This paper introduces a new way to decompose the evaluation of attribution methods into two criteria: ordering and proportionality. We argue that existing evaluations follow an ordering criterion roughly corresponding to either the logical concept of necessity or that of sufficiency. The paper further demonstrates a notion of Proportionality for Necessity and Sufficiency, a quantitative evaluation for comparing existing attribution methods, as a refinement of the ordering criteria. Evaluating existing attribution methods on explaining CNNs for image classification, we conclude that some attribution methods perform better in the necessity analysis and others in the sufficiency analysis, but no method is always the winner on both.
