The Challenges of Computation on Graphs
Graphs are extremely flexible mathematical models, but this flexibility means they often lack consistent structure across instances.
Lack of Consistent Structure
Consider the task of predicting whether a given chemical molecule is toxic:
Looking at a few examples, we quickly notice the following issues:
- Molecules may have different numbers of atoms.
- The atoms in a molecule may be of different types.
- Each of these atoms may have a different number of connections.
- These connections can have different strengths.
Representing graphs in a format that can be computed over is non-trivial, and the final representation chosen often depends significantly on the actual problem.
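As a concrete illustration, here is a minimal sketch (plain NumPy, with hypothetical one-hot atom features) of one common representation: an adjacency matrix together with a node-feature matrix.

```python
# A minimal sketch of one common graph representation: an adjacency matrix A
# plus a node-feature matrix X. Atom features here are illustrative only.
import numpy as np

# Water (H2O): node 0 = O, nodes 1 and 2 = H.
# A[i, j] = 1 if atoms i and j are bonded; bond strengths could instead be
# encoded as edge weights rather than 0/1 entries.
A = np.array([
    [0, 1, 1],
    [1, 0, 0],
    [1, 0, 0],
], dtype=float)

# One-hot atom types as node features: columns = [O, H].
X = np.array([
    [1, 0],   # O
    [0, 1],   # H
    [0, 1],   # H
], dtype=float)

# A different molecule gives matrices of a different shape,
# which is exactly the "lack of consistent structure" problem.
```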
Node-Order Equivariance
Extending the point above: graphs often have no inherent ordering present amongst the nodes. Compare this to images, where every pixel is uniquely determined by its absolute position within the image!
Above: the same graph labelled in two different ways. The letters indicate the ordering of the nodes.
As a result, we would like our algorithms to be node-order equivariant: they should not depend on the ordering of the nodes of the graph. If we permute the nodes in some way, the resulting representations of the nodes as computed by our algorithms should also be permuted in the same way.
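To illustrate, the following sketch checks this property for a single, simplified graph-convolution-style update f(A, X) = (A + I) X W; the update rule and random data are assumptions chosen purely for demonstration, not a specific published architecture.

```python
# A minimal sketch of node-order equivariance for a simple
# graph-convolution-style update f(A, X) = (A + I) X W.
import numpy as np

rng = np.random.default_rng(0)

n, d = 4, 3
A = (rng.random((n, n)) < 0.5).astype(float)
A = np.triu(A, 1)
A = A + A.T                               # symmetric adjacency, no self-loops
X = rng.random((n, d))                    # node features
W = rng.random((d, d))                    # shared weights

def f(A, X):
    A_hat = A + np.eye(len(A))            # add self-loops
    return A_hat @ X @ W                  # aggregate neighbours, then transform

# Permute the node order with a permutation matrix P.
perm = rng.permutation(n)
P = np.eye(n)[perm]

# Equivariance: permuting the inputs permutes the outputs in the same way.
lhs = f(P @ A @ P.T, P @ X)
rhs = P @ f(A, X)
assert np.allclose(lhs, rhs)
```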
Scalability
Graphs can be really large! Think about social networks like Facebook and Twitter, whose user counts run into the hundreds of millions and billions. Operating on data of this scale is not easy.
Luckily, most naturally occurring graphs are ‘sparse’: their number of edges tends to grow roughly linearly with their number of vertices. We will see that this allows the use of clever methods to efficiently compute representations of nodes within the graph. Further, the methods that we look at here will have significantly fewer parameters in comparison to the size of the graphs they operate on.
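As a rough illustration of why sparsity matters, the sketch below (plain NumPy; the edge list and helper name are hypothetical) performs one pass of neighbourhood aggregation directly over an edge list, so its cost scales with the number of edges rather than with the square of the number of nodes.

```python
# A minimal sketch of why sparsity helps: storing the graph as an edge list
# lets one pass of neighbourhood aggregation run in O(|E|) rather than O(|V|^2).
import numpy as np

num_nodes = 5
# Directed edge list (src -> dst); an undirected graph stores both directions.
edges = np.array([[0, 1], [1, 0], [1, 2], [2, 1], [3, 4], [4, 3]])
X = np.random.default_rng(0).random((num_nodes, 8))   # node features

def sum_neighbours(edges, X):
    out = np.zeros_like(X)
    src, dst = edges[:, 0], edges[:, 1]
    # Scatter-add each source's features into its destination:
    # a single pass over the edge list, so the cost scales with |E|.
    np.add.at(out, dst, X[src])
    return out

H = sum_neighbours(edges, X)   # aggregated neighbour features, shape (5, 8)
```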
Problem Setting and Notation
There are many useful problems that can be formulated over graphs:
- Node Classification: Classifying individual nodes.
- Graph Classification: Classifying entire graphs.
- Node Clustering: Grouping together similar nodes based on connectivity.
- Link Prediction: Predicting missing links.
- Influence Maximization: Identifying the most influential nodes in a graph.
For graph-level tasks such as graph classification, node representations must be ‘pooled’ together into a single representation of the whole graph. Beyond simple aggregations such as taking the mean or sum of node representations, more powerful pooling techniques exist (a simplified sketch follows this list):
- SortPool: Sort vertices of the graph to get a fixed-size node-order invariant representation of the graph, and then apply any standard neural network architecture.
- DiffPool: Learn to cluster vertices, build a coarser graph over clusters instead of nodes, then apply a GNN over the coarser graph. Repeat until only one cluster is left.
- SAGPool: Apply a GNN to learn node scores, then keep only the nodes with the top scores, throwing away the rest. Repeat until only one node is left.
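To make the idea concrete, here is a heavily simplified, score-based top-k pooling step in the spirit of SAGPool. The random linear scoring function is an assumption for illustration only; the actual method learns node scores with a GNN and trains them end-to-end.

```python
# A highly simplified sketch of score-based top-k pooling:
# score nodes, keep the top-k, and restrict the graph to them.
import numpy as np

rng = np.random.default_rng(0)

n, d, k = 6, 4, 3
A = (rng.random((n, n)) < 0.4).astype(float)
A = np.triu(A, 1)
A = A + A.T                               # symmetric adjacency
X = rng.random((n, d))                    # node features
w = rng.random(d)                         # hypothetical scoring weights

scores = X @ w                            # one score per node
keep = np.argsort(scores)[-k:]            # indices of the top-k nodes

# Coarsened graph: keep only the selected nodes' rows and columns.
A_pool = A[np.ix_(keep, keep)]
X_pool = X[keep]

# Repeating this until one node remains yields a single graph representation.
```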
Conclusion
Graph neural networks are a family of neural networks that can operate naturally on graph-structured data. By extracting and utilizing features from the underlying graph, GNNs can make more informed predictions about these entities and their interactions, compared to models that consider individual entities in isolation.
Frequently Asked Questions
What are the main challenges of computation on graphs?
The main challenges of computation on graphs are the lack of consistent structure across instances, the need for node-order equivariance, and scalability.
What are some examples of pooling techniques for graph neural networks?
Some examples of pooling techniques used with graph neural networks include SortPool, DiffPool, and SAGPool.
What are some applications of graph neural networks?
Some applications of graph neural networks include node classification, graph classification, node clustering, link prediction, and influence maximization.