Prof. Song Fu’s Research Group Makes Significant Progress in AI Security

As a new programming paradigm, deep learning has been applied to many real-world problems. At the same time, deep-learning-based software has been found to be vulnerable to adversarial attacks, making the study of attacks on and defenses against adversarial examples an important and challenging task. Professor Song Fu from SIST and his collaborators have conducted long-term research in this field and made significant progress.


Professor Song Fu’s research group from the School of Information Science and Technology (SIST) has recently published a research paper entitled “Attack as Defense: Characterizing Adversarial Examples using Robustness” at the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2021), one of the most prestigious software engineering conferences in the world (CCF-A).


This paper proposes a novel characterization for distinguishing adversarial examples from benign ones, based on the observation that adversarial examples are significantly less robust than benign ones. Since existing robustness measurements do not scale to large networks, the paper introduces a novel defense framework, named attack as defense (A2D), which detects adversarial examples by efficiently evaluating an example’s robustness. A2D uses the cost of attacking an input as a proxy for its robustness and identifies less robust examples as adversarial, since less robust examples are easier to attack. Extensive experimental results on MNIST, CIFAR-10, and ImageNet show that A2D is more effective than recent promising approaches.
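The core idea can be sketched in a few lines of Python. The sketch below is illustrative only and is not the authors’ implementation: the PGD-style attack, the toy network, and the cost threshold are assumptions made for demonstration; it simply measures how many attack steps are needed to flip a prediction and flags inputs that flip too easily as likely adversarial.

# Minimal sketch of the "attack as defense" idea (illustrative assumptions,
# not the paper's actual attack, model, or threshold).
import torch
import torch.nn as nn

def attack_cost(model, x, label, eps=0.3, step=0.01, max_iters=200):
    """Run an untargeted PGD-style attack and return the number of steps
    needed to flip the model's prediction (a proxy for robustness)."""
    x_adv = x.clone().detach()
    for i in range(max_iters):
        x_adv.requires_grad_(True)
        loss = nn.functional.cross_entropy(model(x_adv), label)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + step * grad.sign()
            x_adv = torch.clamp(x_adv, x - eps, x + eps).clamp(0, 1)
        if model(x_adv).argmax(dim=1) != label:
            return i + 1          # few steps => the input was not robust
    return max_iters              # never flipped within the attack budget

def is_adversarial(model, x, cost_threshold=5):
    """Flag inputs that are 'too easy' to attack as likely adversarial."""
    pred = model(x).argmax(dim=1)
    return attack_cost(model, x, pred) <= cost_threshold

if __name__ == "__main__":
    # Toy usage: an untrained linear classifier and a random "image".
    model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
    x = torch.rand(1, 1, 28, 28)
    print("flagged as adversarial:", bool(is_adversarial(model, x)))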


The figure above shows adversarial examples of targeted attacks from ‘airplane’ to ‘cat’ and ‘horse’. The first column shows the original image. Without any defense, adversarial examples can be crafted with little distortion, cf. the second column. These adversarial examples are highly deceptive: the perturbations are difficult for humans to detect, yet they fool the neural network. When both our defense method and adversarial training are enabled, attackers need much more distortion to craft adversarial examples, cf. the last column. The distortion is then too large to remain imperceptible to humans, and users can clearly see the silhouette of a ‘cat’ or ‘horse’ on the adversarial examples.


Ph.D. student Zhao Zhe is the first author, Professor Song Fu is the corresponding author for this paper, and ShanghaiTech University is the first affiliation. Graduate student Chen Guangke and undergraduate student Yang Yiwei are the second and fourth authors, respectively.