Game-theoretic Understanding of Adversarially Learned Features.

2021 
This paper aims to understand adversarial attacks and defense from a new perspective, i.e., the signal-processing behavior of DNNs. We define a new multi-order interaction in game theory, which satisfies six properties. Using the multi-order interaction, we discover that adversarial attacks mainly affect high-order interactions to fool the DNN. Furthermore, we find that the robustness of adversarially trained DNNs comes from category-specific low-order interactions. Our findings provide more insight into, and revise, the previous understanding of the shape bias of adversarially learned features. In addition, the multi-order interaction can also explain the recoverability of adversarial examples.
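The abstract does not spell out the multi-order interaction, so the following is only a minimal sketch under a stated assumption: that the order-m interaction between two input variables i and j is the average, over all contexts S of exactly m other variables, of f(S ∪ {i, j}) − f(S ∪ {i}) − f(S ∪ {j}) + f(S). The function name `multi_order_interaction` and the toy game `toy_game` are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from statistics import mean


def multi_order_interaction(f, n, i, j, m):
    """Order-m interaction between players i and j (assumed definition).

    f maps a frozenset of player indices to a scalar output; n is the
    number of players. The result averages the marginal interaction
    over every context S of size m drawn from the other n - 2 players.
    """
    others = [k for k in range(n) if k not in (i, j)]
    if not 0 <= m <= len(others):
        raise ValueError("order m must lie in [0, n - 2]")
    deltas = []
    for context in combinations(others, m):
        S = frozenset(context)
        deltas.append(f(S | {i, j}) - f(S | {i}) - f(S | {j}) + f(S))
    return mean(deltas)


def toy_game(S):
    """Hypothetical 4-player game: output 1 only when all players are present."""
    return 1.0 if len(S) == 4 else 0.0


# In this toy game, players 0 and 1 interact only in the largest context.
for order in range(3):
    print(order, multi_order_interaction(toy_game, 4, 0, 1, order))
```

In this toy game the interaction is zero at orders 0 and 1 and nonzero only at order 2, which is the kind of purely high-order collaboration the abstract says adversarial attacks mainly affect. For a real DNN, f would be the network output on an input with only the variables in S left unmasked, and the exhaustive enumeration would have to be replaced by sampling.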