Adversarial Robustness of Model Sets

2020 
Machine learning models are vulnerable to very small adversarial input perturbations. Here, we study the question of whether the list of predictions made by a list of models can also be changed arbitrarily by a single small perturbation. Clearly, this is a harder problem, since one has to simultaneously mislead several models using the same perturbation, where the target classes assigned to the models might differ. This attack has several applications when models designed by different manufacturers for a similar purpose are deployed together. One might want a single perturbation that acts differently on each model, for example misleading only a subset of the models, or making each model predict a different label. Alternatively, one might want a perturbation that misleads every model in the same way and thereby creates a transferable perturbation. Current approaches are not directly applicable to this general problem. Here, we propose an algorithm that is able to find a perturbation satisfying several kinds of attack patterns: for example, all the models could have the same target class, different random target classes, or target classes designed to be maximally contradictory. We evaluated our algorithm using three model sets consisting of publicly available pre-trained ImageNet models of varying capacity and architecture. We demonstrate that, in all the scenarios, our method is able to find visually insignificant perturbations that achieve the target adversarial patterns.
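The abstract does not describe the algorithm itself, so the following PyTorch sketch is only an illustration of the general setting: a single L-infinity-bounded perturbation optimized to push each model in a set toward its own assigned target class. The function name, hyperparameters, model choices, and loss formulation are assumptions for illustration, not the authors' method.

```python
# Hypothetical sketch of a multi-model targeted perturbation search (not the paper's algorithm):
# one eps-bounded perturbation is optimized so that each model predicts its own target class.
import torch
import torch.nn.functional as F
from torchvision import models

def multi_model_targeted_attack(image, model_list, target_classes,
                                eps=4 / 255, step=1 / 255, iters=100):
    """image: (1, 3, H, W) tensor in [0, 1]; target_classes: one class index per model."""
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(iters):
        loss = 0.0
        for model, target in zip(model_list, target_classes):
            logits = model(image + delta)
            # Targeted loss: drive this model's prediction toward its assigned class.
            loss = loss + F.cross_entropy(logits, torch.tensor([target]))
        loss.backward()
        with torch.no_grad():
            # Descend on the summed targeted losses, then project back to the eps-ball
            # and keep the perturbed image inside the valid pixel range.
            delta -= step * delta.grad.sign()
            delta.clamp_(-eps, eps)
            delta.add_(image).clamp_(0.0, 1.0).sub_(image)
        delta.grad.zero_()
    return delta.detach()

if __name__ == "__main__":
    # Example of a "different target class for every model" pattern on three
    # publicly available pre-trained ImageNet models (placeholder input image).
    nets = [models.resnet50(pretrained=True).eval(),
            models.vgg16(pretrained=True).eval(),
            models.densenet121(pretrained=True).eval()]
    x = torch.rand(1, 3, 224, 224)       # stand-in for a preprocessed input image
    targets = [207, 340, 510]            # arbitrary ImageNet class indices
    perturbation = multi_model_targeted_attack(x, nets, targets)
    print([net(x + perturbation).argmax(1).item() for net in nets])
```

Other attack patterns mentioned in the abstract (a shared target class for all models, or misleading only a subset) would correspond to changing how `target_classes` is chosen or which models contribute to the summed loss.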