Large heterogeneous graphs are common in many real-world datasets.
In this context, large means single graphs with thousands of nodes and edges. Heterogeneous means the nodes in the graph represent different types of entities, while the edges represent diverse relations between those entity types.
For example, a social network can be modelled in graph form, with nodes representing users and the edges between them indicating friendship connections. A heterogeneous graph can accommodate other node types such as posts users have made, groups they belong to, and events they are attending.
An entire social network can become extremely large, with many users, posts, etc., and all of the relations between them.
With the latest release of Graphcore’s Poplar SDK 3.3, we have extended our PyTorch Geometric IPU support to enable this class of application to be accelerated using Graphcore IPUs.
In this blog we will briefly show the latest features that have enabled using large heterogeneous graphs with IPUs. We also have a number of tutorials on these topics that can be run on Paperspace Gradient Notebooks, as well as an example that demonstrates using GNNs for fraud detection on a large heterogeneous graph.
Large graphs on IPUs
As graph size increases, there comes a point where training with full-batch requires an amount of memory too large for accelerators – even the IPU with its industry-leading on-chip SRAM.
Full-batch is when the input to your model is the entire graph and so each iteration involves all nodes and edges.
The solution is to sample from the large graph, forming smaller mini-batches that are used as inputs to the model.
A common approach is GraphSAGE neighbour sampling, where a mini-batch is formed from the target nodes whose representations are to be computed, then randomly selected neighbours of those nodes, then neighbours of those neighbours, and so on. In this way a good representation of the target nodes can be formed while remaining scalable.
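The sampling idea can be sketched in a few lines of plain Python. This is only an illustration of the general technique, not PyG's or Graphcore's implementation: the function name `sample_neighbourhood`, the `fan_outs` argument and the toy graph are all hypothetical.

```python
import random

def sample_neighbourhood(adjacency, seed_nodes, fan_outs, rng):
    """Sample a subgraph around seed_nodes, one hop per entry in fan_outs."""
    sampled_nodes = set(seed_nodes)
    sampled_edges = set()
    frontier = list(seed_nodes)
    for fan_out in fan_outs:
        next_frontier = []
        for node in frontier:
            neighbours = adjacency.get(node, [])
            # Keep at most `fan_out` randomly chosen neighbours of this node.
            chosen = rng.sample(neighbours, min(fan_out, len(neighbours)))
            for neighbour in chosen:
                # Messages flow from the sampled neighbour to the node.
                sampled_edges.add((neighbour, node))
                if neighbour not in sampled_nodes:
                    sampled_nodes.add(neighbour)
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return sampled_nodes, sampled_edges

# Toy graph (hypothetical): adjacency[i] lists the neighbours node i samples from.
adjacency = {i: [(i + 1) % 10, (i - 1) % 10, (i + 2) % 10] for i in range(10)}
rng = random.Random(0)
nodes, edges = sample_neighbourhood(
    adjacency, seed_nodes=[0, 5], fan_outs=[2, 2], rng=rng
)
```

Because neighbours are chosen at random and duplicates are merged, the number of nodes and edges in each sampled mini-batch generally varies from one batch to the next.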
Using PyTorch Geometric, this approach is straightforward: the NeighborLoader object provides a data loader that produces mini-batches of samples.
The IPU uses ahead-of-time compilation, meaning the entire compute graph must be static, including the sizes of its inputs. This enables an efficient layout of memory and communication, as well as certain optimisations to be made during compilation.
The NeighborLoader provided by PyTorch Geometric produces mini-batches that are not fixed-size, so in the PopTorch Geometric package of Graphcore’s latest Poplar SDK 3.3 we provide a FixedSizeNeighborLoader object, which wraps PyG’s NeighborLoader but additionally makes the output mini-batches fixed in size. It is straightforward to use, being a drop-in replacement.
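To illustrate what "fixed in size" means, here is a minimal plain-Python sketch of padding a sampled mini-batch up to a fixed number of nodes and edges. It is not the PopTorch Geometric implementation: `pad_mini_batch` and its arguments are hypothetical, and the padding scheme (zero-valued padding nodes, with padding edges placed as self-loops on the last padding node) is an assumption chosen so that padding never touches real nodes.

```python
def pad_mini_batch(node_features, edges, num_nodes, num_edges, pad_value=0.0):
    """Pad a mini-batch so it always has num_nodes nodes and num_edges edges."""
    if len(node_features) >= num_nodes:
        raise ValueError("need at least one spare slot for a padding node")
    if len(edges) > num_edges:
        raise ValueError("mini-batch has more edges than the fixed size allows")
    feature_dim = len(node_features[0])
    n_pad_nodes = num_nodes - len(node_features)
    # Real node features first, then zero-valued padding nodes.
    padded_features = [list(row) for row in node_features]
    padded_features += [[pad_value] * feature_dim for _ in range(n_pad_nodes)]
    # Padding edges are self-loops on the last padding node (assumed scheme).
    pad_node = num_nodes - 1
    padded_edges = list(edges) + [(pad_node, pad_node)] * (num_edges - len(edges))
    # The mask lets the loss ignore padding nodes.
    node_mask = [True] * len(node_features) + [False] * n_pad_nodes
    return padded_features, padded_edges, node_mask

# Two mini-batches of different sizes end up with identical shapes.
small = pad_mini_batch([[1.0, 2.0], [3.0, 4.0]], [(0, 1)], num_nodes=5, num_edges=4)
large = pad_mini_batch(
    [[1.0, 2.0]] * 4, [(0, 1), (1, 2), (2, 3)], num_nodes=5, num_edges=4
)
```

With every mini-batch padded to the same shape, the model can be compiled once ahead of time and reused for all batches.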
Additionally, we have provided a FixedSizeClusterLoader, equivalent to the PyG ClusterLoader, with fixed-size mini-batch outputs.
Take a look at our tutorial on getting started with large graphs on the IPU.
Heterogeneous graphs on IPUs
We previously discussed that many real-world graphs are heterogeneous, containing multiple types of nodes and multiple types of relations between them. PyTorch Geometric already has great support for using heterogeneous graphs, enabling the construction of models in a flexible and concise way.
For more information, see the PyG documentation.
Using PyG’s heterogeneous functionality with IPUs is a simple case of taking the existing functionality and wrapping your module in another that includes the loss function.
All existing data loaders in Graphcore’s latest Poplar SDK 3.3 PopTorch Geometric package now support heterogeneous graph data, enabling the creation of fixed-size heterogeneous mini-batches.
This is done either by providing a single number of nodes and edges to pad each mini-batch up to, so that it becomes fixed-size, or by setting a separate value for each node and edge type, so that the mini-batches contain fewer padding nodes and edges.
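The trade-off between one shared size and per-type sizes can be seen with some simple arithmetic. The sketch below is plain Python with made-up counts; `padding_per_type`, the node and edge type names, and all of the numbers are hypothetical and are not taken from the PopTorch Geometric API.

```python
def padding_per_type(counts, targets):
    """Return how many padding items each type needs to reach its target size.

    `targets` is either one int shared by every type or a dict with one int per type.
    """
    padding = {}
    for type_name, count in counts.items():
        target = targets[type_name] if isinstance(targets, dict) else targets
        if count > target:
            raise ValueError(f"{type_name}: {count} items exceed the fixed size {target}")
        padding[type_name] = target - count
    return padding

# Hypothetical counts for one sampled mini-batch of a social network graph.
node_counts = {"user": 120, "post": 900}
edge_counts = {("user", "writes", "post"): 900, ("user", "follows", "user"): 300}

# Option 1: one shared size for every node type and every edge type.
shared_nodes = padding_per_type(node_counts, 1000)
shared_edges = padding_per_type(edge_counts, 1000)

# Option 2: a separate size for each node type and edge type.
typed_nodes = padding_per_type(node_counts, {"user": 128, "post": 1000})
typed_edges = padding_per_type(
    edge_counts,
    {("user", "writes", "post"): 1000, ("user", "follows", "user"): 320},
)

shared_total = sum(shared_nodes.values()) + sum(shared_edges.values())
typed_total = sum(typed_nodes.values()) + sum(typed_edges.values())
```

In this made-up case the shared size needs 1780 padding items in total, while the per-type sizes need only 228, because the rarer types are no longer padded up to the size of the most common one.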
This also includes helper functionality to automatically get the required number of nodes and edges to pad a mini-batch from an existing PyG dynamic data loader.
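One way such a helper could work, sketched here in plain Python under stated assumptions, is to scan the mini-batches produced by a dynamic loader and record the largest count seen for each node and edge type. The function `required_sizes` and the `observed_batches` data are hypothetical stand-ins and do not reflect how the PopTorch Geometric helper is actually implemented.

```python
def required_sizes(batches):
    """Return the largest node and edge counts seen per type across all batches."""
    max_nodes, max_edges = {}, {}
    for node_counts, edge_counts in batches:
        for node_type, count in node_counts.items():
            max_nodes[node_type] = max(max_nodes.get(node_type, 0), count)
        for edge_type, count in edge_counts.items():
            max_edges[edge_type] = max(max_edges.get(edge_type, 0), count)
    return max_nodes, max_edges

# Stand-in for iterating a dynamic loader: (node counts, edge counts) per mini-batch.
observed_batches = [
    ({"user": 110, "post": 870}, {("user", "writes", "post"): 870}),
    ({"user": 120, "post": 640}, {("user", "writes", "post"): 640}),
    ({"user": 95, "post": 900}, {("user", "writes", "post"): 900}),
]
num_nodes, num_edges = required_sizes(observed_batches)
```

Since sampling is random, a later epoch could in principle produce a batch larger than any observed here, so a real helper may need to leave some headroom above these maxima.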
We have written a tutorial to guide you through this in more detail.
Conclusions
With this additional support to use PyG’s sampling approaches and guidance on how to use PyG’s heterogeneous functionality with IPUs, it is now straightforward to start doing heterogeneous graph learning with large graphs on Graphcore IPUs.
Our tutorials and Paperspace Gradient Notebooks examples will help you get started and can be run using Paperspace's six-hour free IPU trial.
Sampling Large Graphs for IPUs using PyG
Heterogeneous Graph Learning on IPUs
Training a GNN for fraud detection on IPUs with PyG