The Grand Tour is a classic visualization technique for high-dimensional point clouds that projects a high-dimensional dataset into two dimensions.
Over time, the Grand Tour smoothly animates its projection so that every possible view of the dataset is (eventually) presented to the viewer.
Unlike modern nonlinear projection methods such as t-SNE and UMAP, the Grand Tour is fundamentally a linear method.
In this article, we show how to leverage the linearity of the Grand Tour to enable a number of capabilities that are uniquely useful for visualizing the behavior of neural networks.
Concretely, we present three use cases of interest: visualizing the training process as the network weights change, visualizing the layer-to-layer behavior as the data goes through the network, and visualizing both how adversarial examples are crafted and how they fool a neural network.
Introduction
Deep neural networks often achieve best-in-class performance in supervised learning contests such as the ImageNet Large Scale Visual Recognition Challenge (ILSVRC).
Unfortunately, their decision process is notoriously hard to interpret.
In this article, we present a method to visualize the responses of a neural network that leverages properties of deep neural networks and properties of the Grand Tour.
Notably, our method enables us to reason more directly about the relationship between changes in the data and changes in the resulting visualization.
Working Examples
To illustrate the technique we will present, we trained deep neural network models (DNNs) on three common image classification datasets: MNIST, Fashion-MNIST, and CIFAR-10.

[Figure: MNIST example digits. Image credit: https://en.wikipedia.org/wiki/File:MnistExamples.png]
[Figure: Fashion-MNIST example images. Image credit: https://towardsdatascience.com/multi-label-classification-and-class-activation-map-on-fashion-mnist-1454f09f5925]
[Figure: CIFAR-10 example images. Image credit: https://www.cs.toronto.edu/~kriz/cifar.html]
While our architecture is simpler and smaller than current DNNs, it’s still indicative of modern networks, and is complex enough to demonstrate both our proposed techniques and shortcomings of typical approaches.
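The exact architecture is not spelled out in this excerpt; as a purely illustrative sketch (the framework, layer widths, and kernel sizes below are our assumptions, not the authors' model), a small convolutional classifier of the kind described might look as follows:

```python
# Hypothetical small image classifier, sized for 28x28 single-channel inputs
# (MNIST / Fashion-MNIST); CIFAR-10 would instead use 3x32x32 inputs.
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, in_channels=1, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),   # 28x28 -> 14x14
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),   # 14x14 -> 7x7
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 128),
            nn.ReLU(),
            nn.Linear(128, n_classes),  # one logit per class
        )

    def forward(self, x):
        # The per-layer activations are what the Grand Tour visualizes.
        return self.classifier(self.features(x))
```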
Convolutional Layers
With a change of representation, we can animate a convolutional layer in the same way as in the previous section.
For 2D convolutions, this change of representation involves flattening the input and output, and repeating the kernel pattern in a sparse matrix $M \in \mathbb{R}^{d_2 \times d_1}$, where $d_1$ and $d_2$ are the dimensionalities of the input and output respectively.
This change of representation is only practical for small dimensionalities (e.g. up to 1000), since we need to compute the SVD of the resulting matrix, just as we do for linear layers.
However, the singular value decomposition of multi-channel 2D convolutions can be computed efficiently.
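As a concrete sketch of this change of representation (not the article's implementation, and assuming a small single-channel input for simplicity), the matrix form of a 2D convolution can be assembled column by column by applying the convolution to unit impulses, after which its SVD is computed exactly as for a fully-connected layer:

```python
import numpy as np
from scipy.signal import convolve2d

def conv_as_matrix(kernel, in_shape):
    """Build M so that M @ x.ravel() equals convolve2d(x, kernel, mode='same').ravel()."""
    h, w = in_shape
    cols = []
    for i in range(h * w):
        impulse = np.zeros(in_shape)
        impulse.flat[i] = 1.0
        # Each column of M is the flattened response to a unit impulse, which
        # reproduces the repeated, sparse kernel pattern described above.
        cols.append(convolve2d(impulse, kernel, mode="same").ravel())
    return np.stack(cols, axis=1)

kernel = np.array([[0.0,  1.0, 0.0],
                   [1.0, -4.0, 1.0],
                   [0.0,  1.0, 0.0]])      # an arbitrary 3x3 kernel, for illustration
M = conv_as_matrix(kernel, (8, 8))         # 64 x 64 matrix form of the convolution
U, S, Vt = np.linalg.svd(M)                # SVD used to align the views across this layer
```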
Max-pooling Layers
Max-pooling is not a linear operation, so we replace it with average-pooling followed by a scaling by the ratio of the max to the average.
We compute the matrix form of average-pooling and use its SVD to align the view before and after this layer.
Functionally, our operations produce results equivalent to max-pooling, but this substitution introduces unexpected artifacts: the scaling factor depends on the data, so the overall operation is no longer a fixed linear map, and the alignment based on average-pooling cannot capture it exactly. For example, max-pooling the vector (1, 0.5) gives 1, while average-pooling gives 0.75 and must be scaled by 1/0.75 to match; a different input generally requires a different scale factor.
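A minimal sketch of this max-pooling substitution, assuming a 2x2 pooling window on a small single-channel input (the helper names and sizes are ours, for illustration only):

```python
import numpy as np

def avgpool_matrix(h, w, k=2):
    """Matrix A with A @ x.ravel() giving the k-by-k average-pooling of an (h, w) image x."""
    A = np.zeros((h // k * (w // k), h * w))
    for r in range(h // k):
        for c in range(w // k):
            out_idx = r * (w // k) + c
            for dr in range(k):
                for dc in range(k):
                    A[out_idx, (r * k + dr) * w + (c * k + dc)] = 1.0 / (k * k)
    return A

h, w = 4, 4
x = np.random.rand(h, w)
A = avgpool_matrix(h, w)                                        # linear map used for alignment
avg = A @ x.ravel()                                             # average-pooling
mx = x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3)).ravel()   # true max-pooling
scaled = avg * (mx / avg)                                       # data-dependent rescaling
assert np.allclose(scaled, mx)   # values match max-pooling exactly,
U, S, Vt = np.linalg.svd(A)      # but the view alignment uses the SVD of the linear map A
```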