Weight Banding

Open up any ImageNet conv net and look at the weights in the last layer. You’ll find a uniform spatial pattern to them, dramatically unlike anything we see elsewhere in the network. No individual weight is unusual, but the uniformity is so striking that when we first discovered it we thought it must be a bug. Just as different biological tissue types jump out as distinct under a microscope, the weights in this final layer jump out as distinct when visualized with NMF. We call this phenomenon weight banding.

1. When visualized with NMF, the weight banding in layer mixed_5b of InceptionV1 stands out from any other layer (here shown: mixed_3a), just as the smooth, regular striation of muscle tissue stands out from other tissue types (here shown: cardiac muscle tissue and epithelial tissue).

So far, the Circuits thread has mostly focused on studying very small pieces of neural networks: individual neurons and small circuits. In contrast, weight banding is an example of what we call a "structural phenomenon," a larger-scale pattern in the circuits and features of a neural network. Other examples of structural phenomena are the recurring symmetries we see in equivariance motifs and the specialized slices of neural networks we see in branch specialization.

Where weight banding occurs

Weight banding consistently forms in the final convolutional layer of vision models with global average pooling.
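Banding like this can be surfaced by factoring the spatial structure of a layer's weights with NMF. The sketch below is a minimal, self-contained illustration using a hand-rolled multiplicative-update NMF (Lee & Seung style) rather than any particular library; taking the absolute value of the weights to obtain a non-negative matrix is our assumption for illustration, and the random weights here are a stand-in for a real checkpoint.

```python
import numpy as np

def nmf_spatial_factors(weights, k=3, iters=200, seed=0):
    """Reduce conv weights [h, w, c_in, c_out] to k spatial factor maps
    via simple multiplicative-update NMF."""
    h, w, ci, co = weights.shape
    # Non-negativity via absolute value; rows index spatial positions.
    V = np.abs(weights).reshape(h * w, ci * co)
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], k)) + 1e-4
    H = rng.random((k, V.shape[1])) + 1e-4
    for _ in range(iters):
        # Multiplicative updates keep W and H non-negative throughout.
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)
    # Each output channel is one spatial factor map over the kernel.
    return W.reshape(h, w, k)

# For a layer with weight banding, the factor maps vary along y
# while staying roughly constant along x.
maps = nmf_spatial_factors(np.random.rand(5, 5, 32, 64), k=3)
```

Visualizing `maps` as an h-by-w color image (one factor per color channel) is what produces the banded pictures shown in this article.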

Technical Notes

Training the simplified network

The simplified network used to study this phenomenon was trained on ImageNet (1.2 million images) for 90 epochs. Training was done on 8 GPUs with a global batch size of 512 for the first 30 epochs and 1024 for the remaining 60 epochs. The network was built using TF-Slim. Batch norm was used on convolutional layers and fully connected layers, except for the final fully connected layer with 1001 outputs.
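For reference, the training setup described above can be collected into a single configuration sketch. The dictionary is illustrative only (the original training code used TF-Slim); the values are taken directly from the text.

```python
# Training configuration for the simplified network, as described above.
# Illustrative summary only, not the authors' actual training code.
train_config = {
    "dataset": "ImageNet (1.2M images)",
    "epochs": 90,
    "gpus": 8,
    # Global batch size changed partway through training.
    "global_batch_size": {"epochs_1_to_30": 512, "epochs_31_to_90": 1024},
    "batch_norm": "conv and fully connected layers, except the final "
                  "1001-way fully connected layer",
}
```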

12. Types of banding across different experiments.

Follow-up experiment ideas

  • Using x-pooling and y-pooling together before the fully connected layer, to present a lossy form of spatial position to the fully connected layer. (Alec Radford's suggestion)
  • Randomly rotating the input as a regularization technique, to see whether it prevents banding from forming. (This would likely work, but at some cost to performance.)
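The first idea above can be sketched as follows. This is a hypothetical head, not an implementation from the article; it assumes NHWC activations and uses mean pooling, and the function name is ours.

```python
import numpy as np

def xy_pooled_features(acts):
    """Pool over x and y separately, keeping a lossy 1-D summary of
    position along each spatial axis, then flatten for a fully
    connected layer. acts: [batch, h, w, c] from the last conv layer."""
    x_pool = acts.mean(axis=2)  # [batch, h, c]: position along y survives
    y_pool = acts.mean(axis=1)  # [batch, w, c]: position along x survives
    batch = acts.shape[0]
    return np.concatenate(
        [x_pool.reshape(batch, -1), y_pool.reshape(batch, -1)], axis=1)

# A [2, 7, 7, 128] activation tensor yields [2, (7 + 7) * 128] features.
feats = xy_pooled_features(np.random.rand(2, 7, 7, 128))
```

Unlike global average pooling, which discards all spatial position, this head retains a coarse summary of where activations occur along each axis, which is exactly the information the banding appears to be compensating for.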

Author Contributions

As with many scientific collaborations, the contributions are difficult to separate because it was a collaborative effort that we wrote together.

Research. Ludwig Schubert accidentally discovered weight banding, thinking it was a bug. Michael Petrov performed an array of systematic investigations into when it occurs and how architectural decisions affect it. This investigation was done in the context of and informed by collaborative research into circuits by Nick Cammarata, Gabe Goh, Chelsea Voss, Chris Olah, and Ludwig.

Writing and Diagrams. Michael wrote and illustrated a first version of this article. Chelsea improved the text and illustrations, and thought about big picture framing. Chris helped with editing.

Acknowledgments

We are grateful to participants of #circuits in the Distill Slack for their engagement on this article, and especially to Alex Bäuerle, Ben Egan, Patrick Mineault, Vincent Tjeng, and David Valdman for their remarks on a first draft.

Conclusion

In this article, we presented a phenomenon called weight banding, which is a uniform spatial pattern in the weights of the final layer of vision models with global average pooling. We demonstrated that this phenomenon is consistently observed in these models and is not a bug. We also discussed the relationship between weight banding and architectural decisions, such as the use of global average pooling and fully connected layers.

FAQs

What is weight banding?

Weight banding is a uniform spatial pattern in the weights of the final layer of vision models with global average pooling.

Why does weight banding occur?

Weight banding appears to be related to architectural decisions, such as the use of global average pooling and fully connected layers. Further research is needed to fully understand the causes and implications of weight banding.

Can weight banding be used for image recognition?

Weight banding is a phenomenon rather than a technique, so it is not directly used for image recognition. However, studying it offers a way to visualize and understand the spatial structure of the final-layer weights of vision models, and further research is needed to determine its practical implications.

Can weight banding be used for other tasks?

Weight banding has so far been observed in image classification models. Whether analogous structure appears in models trained for other tasks, such as object detection and segmentation, is an open question requiring further research.
