A Discussion of ‘Adversarial Examples Are Not Bugs, They Are Features’: Robust Feature Leakage
Ilyas et al. report a surprising result: a model trained on adversarial examples is effective on clean data. They suggest this transfer is driven by non-robust cues overlaid on the dataset by the adversarial examples. However, an alternate mechanism for the transfer could be a kind of “robust feature leakage” where the model picks up on faint robust cues in the attacks.
Lower Bounding Leakage
Our technique for quantifying leakage consists of two steps:
- First, we construct features that are provably robust, in a sense we will soon specify.
- Next, we train a linear classifier on the datasets D_rand and D_det (adversarial examples generated toward random and deterministically chosen target labels, respectively) using these robust features only; a minimal code sketch of the procedure follows this list.
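Below is a minimal, self-contained sketch of this two-step procedure. It is not the comment's actual implementation: the synthetic arrays stand in for the adversarially constructed training set and the clean test set, and the fixed random directions W are only a placeholder for whatever provably robust features are actually used.

```python
# Sketch of the leakage lower bound. Placeholder data and features throughout;
# the real experiment would use D_rand / D_det and the clean test set.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d, k, n = 3072, 128, 5000   # input dim (flattened 32x32x3 image), #features, #examples

# Placeholders: x_adv/y_adv stand in for an adversarially constructed training
# set (e.g. D_rand); x_clean/y_clean stand in for the clean test set.
x_adv, y_adv = rng.normal(size=(n, d)), rng.integers(0, 10, size=n)
x_clean, y_clean = rng.normal(size=(n, d)), rng.integers(0, 10, size=n)

# Step 1: provably robust features. A linear feature f_i(x) = <w_i, x> satisfies
# |f_i(x + delta) - f_i(x)| <= eps * ||w_i||_2 for any ||delta||_2 <= eps, so the
# adversarial perturbation itself can only move it by a bounded amount.
W = rng.normal(size=(k, d)) / np.sqrt(d)

def robust_features(x):
    return x @ W.T

# Step 2: train a linear classifier on these robust features of the adversarial
# dataset, then evaluate on clean data. Any above-chance clean accuracy must come
# from robust signal that leaked into the dataset, so it lower-bounds the leakage.
clf = LogisticRegression(max_iter=1000).fit(robust_features(x_adv), y_adv)
print("clean accuracy (leakage lower bound):", clf.score(robust_features(x_clean), y_clean))
```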
The D_rand dataset was meant to show that (a) PGD-based adversarial examples actually alter features in the data and (b) models can learn from human-meaningless/mislabeled training data. The D_det dataset, on the other hand, illustrates that non-robust features are sufficient for generalization and can be preferred over robust ones in natural settings.
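For context, the following is a hedged sketch of how such relabeled datasets can be constructed, paraphrasing the paper's description rather than reproducing the authors' released code: each training image is perturbed with targeted PGD toward a target class and then stored with that target as its new label (uniformly random targets for D_rand, a fixed permutation of the true labels for D_det). The attack radius, step size, and iteration count below are illustrative placeholders.

```python
import torch
import torch.nn.functional as F

def targeted_pgd_l2(model, x, target, eps=0.5, step=0.1, iters=20):
    """Push a batch x toward `target` classes under a pretrained `model` with L2-bounded PGD."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(iters):
        loss = F.cross_entropy(model(x + delta), target)
        grad, = torch.autograd.grad(loss, delta)
        # Gradient *descent* on the targeted loss: move toward the target class.
        g_norm = grad.flatten(1).norm(dim=1).clamp(min=1e-12).view(-1, 1, 1, 1)
        delta = delta - step * grad / g_norm
        # Project the perturbation back into the L2 ball of radius eps.
        d_norm = delta.flatten(1).norm(dim=1).clamp(min=1e-12).view(-1, 1, 1, 1)
        delta = (delta * (eps / d_norm).clamp(max=1.0)).detach().requires_grad_(True)
    return (x + delta).detach()

# Usage sketch (x: image batch, y: true labels, model: a standard classifier):
#   t_rand = torch.randint(0, 10, y.shape)       # D_rand: uniformly random targets
#   t_det  = (y + 1) % 10                        # D_det: one possible fixed permutation
#   x_new  = targeted_pgd_l2(model, x, t_rand)   # stored with label t_rand, not y
```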
Conclusion
The experiment put forth in the comment is a clever way of showing that such leakage is indeed possible. However, we want to stress (as the comment itself does) that robust feature leakage does not have an impact on our main thesis: the D_det dataset explicitly controls for robust feature leakage (and in fact allows us to quantify models' preference for robust features vs. non-robust features; see Appendix D.6 in the paper).
Acknowledgments
Shan Carter (started the project), Preetum (technical discussion), Chris Olah (technical discussion), Ria (technical discussion), Aditya (feedback)
References
- Ilyas, A., Santurkar, S., Tsipras, D., Engstrom, L., Tran, B. and Madry, A., 2019. Adversarial examples are not bugs, they are features. arXiv preprint arXiv:1905.02175.
Updates and Corrections
If you see mistakes or want to suggest changes, please create an issue on GitHub.
Reuse
Diagrams and text are licensed under Creative Commons Attribution CC-BY 4.0 with the source available on GitHub, unless noted otherwise. The figures that have been reused from other sources don’t fall under this license and can be recognized by a note in their caption: “Figure from …”.
Citation
For attribution in academic contexts, please cite this work as
Goh, "A Discussion of 'Adversarial Examples Are Not Bugs, They Are Features': Robust Feature Leakage", Distill, 2019.
@article{goh2019a,
author = {Goh, Gabriel},
title = {A Discussion of 'Adversarial Examples Are Not Bugs, They Are Features': Robust Feature Leakage},
journal = {Distill},
year = {2019},
note = {https://distill.pub/2019/advex-bugs-discussion/response-2},
doi = {10.23915/distill.00019.2}
}

