Introduction
Neural networks can learn to classify images more accurately than any system humans directly design. This raises a natural question: What have these networks learned that allows them to classify images so well?
Feature Visualization
Feature visualization is a thread of research that tries to answer this question by letting us "see through the eyes" of the network. It began with research into visualizing individual neurons and trying to determine what they respond to. Because neurons don’t work in isolation, this led to applying feature visualization to simple combinations of neurons. However, there was still a problem – what combinations of neurons should we be studying? A natural answer (foreshadowed by work on model inversion) is to visualize activations, the combination of neurons firing in response to a particular input.
Activation Atlases
These approaches are exciting because they can make the hidden layers of networks comprehensible. These layers are the heart of how neural networks outperform more traditional approaches to machine learning, and historically, we’ve had little understanding of what happens in them. Feature visualization addresses this by connecting hidden layers back to the input, making them meaningful.
Unfortunately, visualizing activations has a major weakness – it shows only how the network sees a single input. Because of this, it doesn’t give us a big-picture view of the network. When what we want is a map of an entire forest, inspecting one tree at a time will not suffice.
Aggregating Multiple Images
Activation grids show how the network sees a single image, but what if we want to see more? What if we want to understand how it reacts to millions of images? Of course, we could look at individual activation grids for those images one by one. But looking at millions of examples doesn’t scale, and human brains aren’t good at comparing lots of examples without structure. In the same way that we need a tool like a histogram to understand millions of numbers, we need a way to aggregate and organize activations if we want to see meaningful patterns in millions of them.
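The histogram analogy can be made concrete. Just as a histogram summarizes a million scalars by binning them, we can summarize a million activation vectors by binning them and averaging each bin. The sketch below is a hypothetical illustration with random data – the layer width and the one-dimensional random projection are stand-ins, not the method used in practice:

```python
import numpy as np

rng = np.random.default_rng(0)

# A million raw scalars are unreadable; a histogram makes them legible.
values = rng.normal(size=1_000_000)
counts, edges = np.histogram(values, bins=20)

# The same idea for activation vectors: assign each vector a coordinate
# (here a random 1-D projection, purely illustrative), bin on that
# coordinate, and average the vectors that fall in each bin.
activations = rng.normal(size=(10_000, 512))   # hypothetical layer width
coords = activations @ rng.normal(size=512)    # 1-D coordinate per vector
edges1d = np.linspace(coords.min(), coords.max(), 20)
bins = np.digitize(coords, edges1d)

bin_means = {b: activations[bins == b].mean(axis=0)
             for b in np.unique(bins)}
# Each entry is one "average activation" standing in for many inputs.
```

Each averaged vector can then be studied (for example, with feature visualization) instead of inspecting the ten thousand examples one by one.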
Activation Atlases in Practice
To create an activation atlas, we collect activations from one million images, randomly selecting one spatial activation per image and avoiding the edges because of boundary effects. This gives us one million activation vectors. We lay these vectors out in two dimensions with a dimensionality-reduction technique (UMAP), divide the layout into a grid, average the activations that land in each cell, and show a feature visualization of each average. Unlike clustering, the grid preserves spatial consistency, which is what makes atlases zoomable; for that reason we preferred it in this article, though the trade-offs between the two techniques remain an open question.
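The pipeline above can be sketched end to end. This is a minimal illustration on random data: the layer shape is hypothetical, a PCA via SVD stands in for UMAP so the sketch needs no extra dependencies, and the final feature-visualization step is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical activations for 1,000 images at 7x7 spatial resolution
# with 512 channels (stand-ins for a real layer's shape).
n_images, h, w, channels = 1_000, 7, 7, 512
acts = rng.normal(size=(n_images, h, w, channels))

# 1. One random spatial activation per image, skipping the outermost
#    ring of positions because of boundary effects.
ys = rng.integers(1, h - 1, size=n_images)
xs = rng.integers(1, w - 1, size=n_images)
vectors = acts[np.arange(n_images), ys, xs]        # (n_images, channels)

# 2. Lay the vectors out in two dimensions. The article uses UMAP;
#    here PCA via SVD stands in for it.
centered = vectors - vectors.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
layout = centered @ vt[:2].T                       # (n_images, 2)

# 3. Divide the layout into a grid and average the activations in each
#    cell. (Feature visualization of each average is omitted here.)
grid = 8
span = np.ptp(layout, axis=0) + 1e-9
ix = np.clip(((layout - layout.min(axis=0)) / span * grid).astype(int),
             0, grid - 1)
cells = {}
for cell, v in zip(map(tuple, ix), vectors):
    cells.setdefault(cell, []).append(v)
averages = {cell: np.mean(vs, axis=0) for cell, vs in cells.items()}
```

In the real pipeline, each averaged vector in `averages` would be rendered with feature visualization and placed at its grid cell to form the zoomable atlas.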
Discussion and Review
In this article, we add activation atlases to the quiver of techniques that give a global view of a network. By combining a dimensionality-reduced layout with feature visualizations of averaged activations, we get the advantages of each in one view – a global map seen through the eyes of the network.
Author Contributions
Shan Carter wrote the majority of the article and performed most of the experiments. Zan Armstrong helped with the interactive diagrams and the writing. Ludwig Schubert provided technical help throughout and performed the numerical analysis of the manual patches. Ian Johnson provided inspiration for the original idea and advice throughout. Chris Olah provided essential technical contributions and substantial writing contributions throughout.
Acknowledgments
Thanks to Kevin Quealy and Sam Greydanus for substantial editing help. Thanks to Colin Raffel, Arvind Satyanarayan, Alexander Mordvintsev, and Nick Cammarata for additional feedback during development. We’re also very grateful to Phillip Isola for stepping in as acting Distill editor for this article, and to our reviewers who took time to give us feedback, significantly improving our paper.
FAQs
Q: What are activation atlases?
A: Activation atlases are a new way to visualize the internal workings of neural networks, showing how they respond to different inputs and patterns in the data.
Q: How do they work?
A: By collecting and organizing the activations from millions of images, we can create a zoomable map of how the network sees the world, showing how different parts of the network respond to different types of images.
Q: What are the limitations of activation atlases?
A: While activation atlases can reveal high-level misunderstandings in a model, they are limited to the distribution of the images used to build them, and may miss behaviors on inputs outside that distribution.
Q: How can I use activation atlases?
A: By studying the activation atlases, you can gain a deeper understanding of how your own models work, and identify potential biases or limitations in your training data.