
Visualizing Neural Networks with the Grand Tour


The Grand Tour is a classic technique for visualizing high-dimensional point clouds by projecting them into two dimensions.

Over time, the Grand Tour smoothly animates its projection so that every possible view of the dataset is (eventually) presented to the viewer.

Unlike modern nonlinear projection methods such as t-SNE and UMAP, the Grand Tour is fundamentally a linear method.
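To make that linearity concrete, here is a minimal sketch (in Python with NumPy; all names are our own) of a Grand Tour-style animation: each frame applies a small rotation of $\mathbb{R}^d$ to an orthonormal basis and projects the data onto the first two rotated basis vectors. The classic algorithm chooses its sequence of views more carefully than this random walk; the sketch only conveys that every view is a linear projection.

import numpy as np

rng = np.random.default_rng(0)

def plane_rotation(d, angle, rng):
    """Rotation of R^d by `angle` within a randomly chosen 2D plane."""
    q, _ = np.linalg.qr(rng.standard_normal((d, 2)))  # orthonormal plane basis
    u, v = q[:, 0], q[:, 1]
    c, s = np.cos(angle), np.sin(angle)
    return (np.eye(d)
            + (c - 1.0) * (np.outer(u, u) + np.outer(v, v))
            + s * (np.outer(v, u) - np.outer(u, v)))

d = 10
points = rng.standard_normal((500, d))   # stand-in for network activations
basis = np.eye(d)                        # orthonormal basis, rotated each frame

frames = []
for _ in range(100):
    basis = plane_rotation(d, angle=0.02, rng=rng) @ basis
    frames.append(points @ basis[:, :2]) # the 2D view shown at this frame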

In this article, we show how to leverage the linearity of the Grand Tour to enable a number of capabilities that are uniquely useful for visualizing the behavior of neural networks.

Concretely, we present three use cases of interest: visualizing the training process as the network weights change, visualizing the layer-to-layer behavior as data passes through the network, and visualizing both how adversarial examples are crafted and how they fool a neural network.

Introduction

Deep neural networks often achieve best-in-class performance in supervised learning contests such as the ImageNet Large Scale Visual Recognition Challenge (ILSVRC).

Unfortunately, their decision process is notoriously hard to interpret, and their training process is often hard to debug.

In this article, we present a method for visualizing the responses of a neural network that leverages properties of both deep neural networks and the Grand Tour.

Notably, our method enables us to more directly reason about the relationship between changes in the data and changes in the resulting visualization.

Working Examples

To illustrate the technique we will present, we trained deep neural network models (DNNs) on three common image classification datasets: MNIST, Fashion-MNIST, and CIFAR-10.

MNIST contains grayscale images of 10 handwritten digits (image credit: https://en.wikipedia.org/wiki/File:MnistExamples.png). Fashion-MNIST contains grayscale images of 10 types of fashion items (image credit: https://towardsdatascience.com/multi-label-classification-and-class-activation-map-on-fashion-mnist-1454f09f5925). CIFAR-10 contains RGB images of 10 classes of objects (image credit: https://www.cs.toronto.edu/~kriz/cifar.html).
While our architecture is simpler and smaller than current DNNs, it is still indicative of modern networks and complex enough to demonstrate both our proposed techniques and the shortcomings of typical approaches.

Convolutional Layers

Convolutional layers can be represented as special linear layers.
With a change of representation, we can animate a convolutional layer just as in the previous section.
For 2D convolutions, this change of representation involves flattening the input and output and repeating the kernel pattern in a sparse matrix $M \in \mathbb{R}^{m \times n}$.
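As a sketch of this change of representation (assuming a single channel, stride 1, and "valid" padding; a dense array stands in for the sparse matrix for clarity, and all names are our own):

import numpy as np

def conv2d_to_matrix(kernel, h, w):
    """Matrix M with (M @ x.flatten()) equal to the flattened 'valid'
    2D cross-correlation of an h-by-w input x with `kernel`."""
    kh, kw = kernel.shape
    oh, ow = h - kh + 1, w - kw + 1            # output height and width
    M = np.zeros((oh * ow, h * w))             # mostly zeros: sparse structure
    for i in range(oh):                        # each output pixel is one row
        for j in range(ow):
            for a in range(kh):                # scatter the kernel pattern
                for b in range(kw):
                    M[i * ow + j, (i + a) * w + (j + b)] = kernel[a, b]
    return M

# Check against a direct sliding-window computation on a 5x5 input.
rng = np.random.default_rng(0)
x, k = rng.standard_normal((5, 5)), rng.standard_normal((3, 3))
direct = np.array([[np.sum(x[i:i+3, j:j+3] * k) for j in range(3)]
                   for i in range(3)])
assert np.allclose((conv2d_to_matrix(k, 5, 5) @ x.flatten()).reshape(3, 3), direct)

Since the convolution is now an explicit matrix, it can be animated exactly like the linear layers of the previous section.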

Max-pooling Layers

Animating max-pooling layers is nontrivial because max-pooling is neither linear (a max-pooling layer is piecewise linear) nor coordinate-wise.
We replace it with average-pooling followed by scaling each pooled value by the ratio of the max to the average.
We compute the matrix form of average-pooling and use its SVD to align the view before and after this layer.
Functionally, our operations produce results equivalent to max-pooling, but they can introduce
unexpected artifacts. For example, the vector $[0.9, 0.9, 0.9, 1.0]$ max-pools to $1.0$ while its average is $0.925$, a scaling factor of only about $1.08$; $[0, 0, 0, 1.0]$ max-pools to the same value but requires a factor of $4$, so points with equal max-pooled values can move very differently during the animation.
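Here is a minimal sketch of the average-pool-and-rescale replacement described above (the function name is ours; we assume non-negative inputs such as post-ReLU activations, so that a zero average implies a zero max):

import numpy as np

def max_pool_via_scaled_average(x, size=2):
    """Non-overlapping size-by-size pooling of a 2D array: average-pool,
    then rescale each window by max/average so values match max-pooling."""
    h, w = x.shape
    win = (x.reshape(h // size, size, w // size, size)
            .transpose(0, 2, 1, 3)
            .reshape(h // size, w // size, size * size))
    avg, mx = win.mean(axis=-1), win.max(axis=-1)
    # Assumes non-negative inputs (e.g. after a ReLU), so avg == 0 => mx == 0.
    scale = np.divide(mx, avg, out=np.zeros_like(mx), where=avg > 0)
    return avg * scale  # numerically equal to max-pooling

# The rescaled average matches true max-pooling on a random input.
x = np.random.default_rng(1).random((4, 4))
expected = (x.reshape(2, 2, 2, 2).transpose(0, 2, 1, 3)
             .reshape(2, 2, 4).max(axis=-1))
assert np.allclose(max_pool_via_scaled_average(x), expected)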
