On the Existence of Non-Robust Features in Neural Networks
Discussion Themes
- Clarifications: Discussion between respondents and original authors surfaced several misunderstandings or opportunities to sharpen claims.
- Successful Replication: Respondents successfully reproduced many of the experiments in Ilyas et al., and none of their replication attempts failed.
- Exploring the Boundaries of Non-Robust Transfer: Three comments focused on variants of the "non-robust dataset" experiment, in which training on relabeled adversarial examples transfers to real data (a minimal sketch of this experiment appears after this list).
- Properties of Robust and Non-Robust Features: The other three comments focused on the properties of robust and non-robust models.
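For concreteness, here is a minimal sketch of the "non-robust dataset" experiment in PyTorch. It assumes a pretrained `source_model`, a `train_loader` over a 10-class dataset, and a fresh `new_model`; these names, the targeted L2 PGD attack, and all hyperparameters are illustrative placeholders, not the authors' exact setup.

```python
import torch
import torch.nn.functional as F

def pgd_to_target(model, x, target, eps=0.5, alpha=0.1, steps=20):
    """Targeted L2 PGD: perturb x within an L2 ball so the model predicts `target`."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), target)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            # Descend the loss toward the target class along the normalized gradient.
            g = grad / (grad.flatten(1).norm(dim=1).view(-1, 1, 1, 1) + 1e-12)
            x_adv = x_adv - alpha * g
            # Project back onto the L2 ball of radius eps around the original x.
            delta = x_adv - x
            norms = delta.flatten(1).norm(dim=1).view(-1, 1, 1, 1)
            delta = delta * (eps / norms).clamp(max=1.0)
            x_adv = (x + delta).clamp(0, 1)
    return x_adv.detach()

# Build the relabeled "non-robust" dataset: perturb each image toward a random
# target class, then label it AS that target, so only non-robust features
# correlate with the new labels. (Assumes 10 classes, as in CIFAR-10.)
adv_batches = []
for x, y in train_loader:
    t = torch.randint(0, 10, y.shape)  # random targets (the D_rand variant)
    adv_batches.append((pgd_to_target(source_model, x, t), t))

# Training `new_model` from scratch on adv_batches and then evaluating it on
# the unmodified test set yields non-trivial clean accuracy in Ilyas et al.,
# which is the evidence that non-robust features generalize.
```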
Comments
- Adversarial Example Researchers Need to Expand What is Meant by "Robustness"
- Justin and Dan discuss "non-robust features" as a special case of models being non-robust because they latch on to superficial correlations.
- Original authors: The authors fully agree that studying a wider notion of robustness will become increasingly important in ML.
- Adversarially Robust Neural Style Transfer
- Reiichiro shows that adversarial robustness makes neural style transfer work by default on a non-VGG architecture (see the sketch after this comment list).
- Original authors: Very interesting results that highlight the potential role of non-robust features and the utility of robust models for downstream tasks.
- Adversarial Examples are Just Bugs, Too
- Preetum constructs a family of adversarial examples with no transfer to real data, suggesting that some adversarial examples are "bugs" in the original paper’s framing.
- Original authors: A fine-grained look at adversarial examples that neatly complements our thesis (i.e. that non-robust features exist and adversarial examples arise from them) while providing an example of adversarial examples that arise from "bugs".
- Learning from Incorrectly Labeled Data
- Eric shows that training on a model's own training-set errors, or on its predictions for an unrelated dataset, transfers to the true test set in both cases.
- Original authors: These experiments are a creative demonstration of the fact that the underlying phenomenon of learning features from "human-meaningless" data can actually arise in a broad range of settings.
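To make Reiichiro's setup concrete, here is a minimal Gatys-style transfer loop in PyTorch driven by an adversarially robust feature extractor. The checkpoint path, the choice of layers, the loss weights, and the omission of input normalization are assumptions for illustration, not the comment's exact configuration.

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

# Assumption: a robustly trained ResNet-50 saved as a plain state_dict;
# the filename is a placeholder. Input normalization is omitted for brevity.
model = models.resnet50()
model.load_state_dict(torch.load("robust_resnet50.pt"))
model.eval()

# Capture intermediate activations with forward hooks.
feats = {}
def make_hook(name):
    def hook(module, inputs, output):
        feats[name] = output
    return hook
for name in ("layer1", "layer2", "layer3"):
    getattr(model, name).register_forward_hook(make_hook(name))

def gram(a):
    # Gram matrix of channel activations: the standard style statistic.
    b, c, h, w = a.shape
    a = a.reshape(b, c, h * w)
    return a @ a.transpose(1, 2) / (c * h * w)

def extract(img):
    feats.clear()
    model(img)
    return dict(feats)

content_img = torch.rand(1, 3, 224, 224)  # placeholders; load real images here
style_img = torch.rand(1, 3, 224, 224)

with torch.no_grad():
    content_target = extract(content_img)["layer3"]
    style_targets = {k: gram(v) for k, v in extract(style_img).items()}

# Optimize the image to match content features and style (Gram) statistics.
x = content_img.clone().requires_grad_(True)
opt = torch.optim.Adam([x], lr=0.05)
for step in range(300):
    opt.zero_grad()
    f = extract(x)
    content_loss = F.mse_loss(f["layer3"], content_target)
    style_loss = sum(F.mse_loss(gram(f[k]), style_targets[k]) for k in f)
    (content_loss + 1e3 * style_loss).backward()
    opt.step()
    with torch.no_grad():
        x.clamp_(0, 1)
```

The contrast Reiichiro's comment highlights is that running the same loop with a standard (non-robust) ResNet-50 tends to produce noisy, high-frequency artifacts, whereas the robust extractor behaves more like VGG.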
Original Author Discussion and Responses
The original authors describe their takeaways and some clarifications that resulted from the conversation.
Citation Information
If you wish to cite this discussion as a whole, citation information can be found below; the authors are all participants in the conversation, listed in alphabetical order. You can also cite individual comments or the author responses using the citation information provided at the bottom of the corresponding article.
Editorial Note
This discussion article is an experiment organized by Chris Olah and Ludwig Schubert. Chris Olah facilitated and edited the comments and discussion process. Ludwig Schubert assisted by assembling the responses into their current presentation.
Conclusion
This discussion article gave researchers an opportunity to engage in a deep, nuanced discussion of the paper's findings. The responses and comments offer valuable insight into the properties of robust and non-robust features, and into what these findings imply for the broader field of machine learning.
FAQs
Q: What are non-robust features?
A: Non-robust features are patterns in the input that are genuinely predictive of the true label, yet so brittle that small, human-imperceptible perturbations can flip them. They are real statistical signal in the data, not noise or artifacts.
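For readers who want the formal version, the definitions below restate those from Ilyas et al.: here $\mathcal{D}$ is the data distribution over inputs $x$ and labels $y \in \{\pm 1\}$, $f$ is a feature (a function of the input), and $\Delta(x)$ is the set of allowed perturbations.

```latex
% A feature f is rho-useful if it correlates with the label in expectation:
\[
  \text{$\rho$-useful:}\quad
  \mathbb{E}_{(x,y)\sim\mathcal{D}}\bigl[\, y \cdot f(x) \,\bigr] \;\ge\; \rho .
\]
% It is gamma-robustly useful if that correlation survives worst-case
% perturbations delta drawn from the allowed set Delta(x):
\[
  \text{$\gamma$-robustly useful:}\quad
  \mathbb{E}_{(x,y)\sim\mathcal{D}}\Bigl[\, \inf_{\delta\in\Delta(x)} y \cdot f(x+\delta) \,\Bigr] \;\ge\; \gamma .
\]
```

A useful, non-robust feature is then one that is $\rho$-useful for some $\rho > 0$ but not $\gamma$-robustly useful for any $\gamma \ge 0$: an adversary operating within $\Delta$ can always destroy or reverse its correlation with the label.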
Q: Why do adversarial examples exist?
A: In the framing of Ilyas et al., adversarial examples arise because models rely on non-robust features: since those features are predictive, models learn to use them, and an adversary can flip them with imperceptible perturbations, changing the prediction without changing what a human sees.
Q: Can we eliminate adversarial examples?
A: It is unclear whether it is possible to eliminate adversarial examples entirely, but developing a deeper understanding of non-robust features may help to reduce their impact.

