# torch_geometric.nn¶

class Sequential(input_args: str, modules: List[Union[Tuple[Callable, str], Callable]])[source]

An extension of the torch.nn.Sequential container in order to define a sequential GNN model. Since GNN operators take in multiple input arguments, torch_geometric.nn.Sequential expects both global input arguments, and function header definitions of individual operators. If omitted, an intermediate module will operate on the output of its preceding module:

from torch.nn import Linear, ReLU
from torch_geometric.nn import Sequential, GCNConv

model = Sequential('x, edge_index', [
(GCNConv(in_channels, 64), 'x, edge_index -> x'),
ReLU(inplace=True),
(GCNConv(64, 64), 'x, edge_index -> x'),
ReLU(inplace=True),
Linear(64, out_channels),
])


where 'x, edge_index' defines the input arguments of model, and 'x, edge_index -> x' defines the function header, i.e. input arguments and return types, of GCNConv.

In particular, this also allows to create more sophisticated models, such as utilizing JumpingKnowledge:

from torch.nn import Linear, ReLU, Dropout
from torch_geometric.nn import Sequential, GCNConv, JumpingKnowledge
from torch_geometric.nn import global_mean_pool

model = Sequential('x, edge_index, batch', [
(Dropout(p=0.5), 'x -> x'),
(GCNConv(dataset.num_features, 64), 'x, edge_index -> x1'),
ReLU(inplace=True),
(GCNConv(64, 64), 'x1, edge_index -> x2'),
ReLU(inplace=True),
(lambda x1, x2: [x1, x2], 'x1, x2 -> xs'),
(JumpingKnowledge("cat", 64, num_layers=2), 'xs -> x'),
(global_mean_pool, 'x, batch -> x'),
Linear(2 * 64, dataset.num_classes),
])

Parameters
• input_args (str) – The input arguments of the model.

• modules ([(str, Callable) or Callable]) – A list of modules (with optional function header definitions).

## Convolutional Layers¶

 MessagePassing Base class for creating message passing layers of the form GCNConv The graph convolutional operator from the “Semi-supervised Classification with Graph Convolutional Networks” paper ChebConv The chebyshev spectral graph convolutional operator from the “Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering” paper SAGEConv The GraphSAGE operator from the “Inductive Representation Learning on Large Graphs” paper GraphConv The graph neural network operator from the “Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks” paper GravNetConv The GravNet operator from the “Learning Representations of Irregular Particle-detector Geometry with Distance-weighted Graph Networks” paper, where the graph is dynamically constructed using nearest neighbors. GatedGraphConv The gated graph convolution operator from the “Gated Graph Sequence Neural Networks” paper ResGatedGraphConv The residual gated graph convolutional operator from the “Residual Gated Graph ConvNets” paper GATConv The graph attentional operator from the “Graph Attention Networks” paper GATv2Conv The GATv2 operator from the “How Attentive are Graph Attention Networks?” paper, which fixes the static attention problem of the standard GATConv layer: since the linear layers in the standard GAT are applied right after each other, the ranking of attended nodes is unconditioned on the query node. TransformerConv The graph transformer operator from the “Masked Label Prediction: Unified Message Passing Model for Semi-Supervised Classification” paper AGNNConv The graph attentional propagation layer from the “Attention-based Graph Neural Network for Semi-Supervised Learning” paper TAGConv The topology adaptive graph convolutional networks operator from the “Topology Adaptive Graph Convolutional Networks” paper GINConv The graph isomorphism operator from the “How Powerful are Graph Neural Networks?” paper GINEConv The modified GINConv operator from the “Strategies for Pre-training Graph Neural Networks” paper ARMAConv The ARMA graph convolutional operator from the “Graph Neural Networks with Convolutional ARMA Filters” paper SGConv The simple graph convolutional operator from the “Simplifying Graph Convolutional Networks” paper APPNP The approximate personalized propagation of neural predictions layer from the “Predict then Propagate: Graph Neural Networks meet Personalized PageRank” paper MFConv The graph neural network operator from the “Convolutional Networks on Graphs for Learning Molecular Fingerprints” paper RGCNConv The relational graph convolutional operator from the “Modeling Relational Data with Graph Convolutional Networks” paper FastRGCNConv SignedConv The signed graph convolutional operator from the “Signed Graph Convolutional Network” paper DNAConv The dynamic neighborhood aggregation operator from the “Just Jump: Towards Dynamic Neighborhood Aggregation in Graph Neural Networks” paper PointNetConv The PointNet set layer from the “PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation” and “PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space” papers PointConv GMMConv The gaussian mixture model convolutional operator from the “Geometric Deep Learning on Graphs and Manifolds using Mixture Model CNNs” paper SplineConv The spline-based convolutional operator from the “SplineCNN: Fast Geometric Deep Learning with Continuous B-Spline Kernels” paper NNConv The continuous kernel-based convolutional operator from the “Neural Message Passing for Quantum Chemistry” paper. ECConv CGConv The crystal graph convolutional operator from the “Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties” paper EdgeConv The edge convolutional operator from the “Dynamic Graph CNN for Learning on Point Clouds” paper DynamicEdgeConv The dynamic edge convolutional operator from the “Dynamic Graph CNN for Learning on Point Clouds” paper (see torch_geometric.nn.conv.EdgeConv), where the graph is dynamically constructed using nearest neighbors in the feature space. XConv The convolutional operator on $$\mathcal{X}$$-transformed points from the “PointCNN: Convolution On X-Transformed Points” paper PPFConv The PPFNet operator from the “PPFNet: Global Context Aware Local Features for Robust 3D Point Matching” paper FeaStConv The (translation-invariant) feature-steered convolutional operator from the “FeaStNet: Feature-Steered Graph Convolutions for 3D Shape Analysis” paper HypergraphConv The hypergraph convolutional operator from the “Hypergraph Convolution and Hypergraph Attention” paper LEConv The local extremum graph neural network operator from the “ASAP: Adaptive Structure Aware Pooling for Learning Hierarchical Graph Representations” paper, which finds the importance of nodes with respect to their neighbors using the difference operator: PNAConv The Principal Neighbourhood Aggregation graph convolution operator from the “Principal Neighbourhood Aggregation for Graph Nets” paper ClusterGCNConv The ClusterGCN graph convolutional operator from the “Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks” paper GENConv The GENeralized Graph Convolution (GENConv) from the “DeeperGCN: All You Need to Train Deeper GCNs” paper. GCN2Conv The graph convolutional operator with initial residual connections and identity mapping (GCNII) from the “Simple and Deep Graph Convolutional Networks” paper PANConv The path integral based convolutional operator from the “Path Integral Based Convolution and Pooling for Graph Neural Networks” paper WLConv The Weisfeiler Lehman operator from the “A Reduction of a Graph to a Canonical Form and an Algebra Arising During this Reduction” paper, which iteratively refines node colorings: FiLMConv The FiLM graph convolutional operator from the “GNN-FiLM: Graph Neural Networks with Feature-wise Linear Modulation” paper SuperGATConv The self-supervised graph attentional operator from the “How to Find Your Friendly Neighborhood: Graph Attention Design with Self-Supervision” paper FAConv The Frequency Adaptive Graph Convolution operator from the “Beyond Low-Frequency Information in Graph Convolutional Networks” paper EGConv The Efficient Graph Convolution from the “Adaptive Filters and Aggregator Fusion for Efficient Graph Convolutions” paper. PDNConv The pathfinder discovery network convolutional operator from the “Pathfinder Discovery Networks for Neural Message Passing” paper GeneralConv A general GNN layer adapted from the “Design Space for Graph Neural Networks” paper. HGTConv The Heterogeneous Graph Transformer (HGT) operator from the “Heterogeneous Graph Transformer” paper. HeteroConv A generic wrapper for computing graph convolution on heterogeneous graphs.
class MessagePassing(aggr: Optional[str] = 'add', flow: str = 'source_to_target', node_dim: int = - 2)[source]

Base class for creating message passing layers of the form

$\mathbf{x}_i^{\prime} = \gamma_{\mathbf{\Theta}} \left( \mathbf{x}_i, \square_{j \in \mathcal{N}(i)} \, \phi_{\mathbf{\Theta}} \left(\mathbf{x}_i, \mathbf{x}_j,\mathbf{e}_{j,i}\right) \right),$

where $$\square$$ denotes a differentiable, permutation invariant function, e.g., sum, mean or max, and $$\gamma_{\mathbf{\Theta}}$$ and $$\phi_{\mathbf{\Theta}}$$ denote differentiable functions such as MLPs. See here for the accompanying tutorial.

Parameters
• aggr (string, optional) – The aggregation scheme to use ("add", "mean", "max" or None). (default: "add")

• flow (string, optional) – The flow direction of message passing ("source_to_target" or "target_to_source"). (default: "source_to_target")

• node_dim (int, optional) – The axis along which to propagate. (default: -2)

propagate(edge_index: Union[torch.Tensor, torch_sparse.tensor.SparseTensor], size: Optional[Tuple[int, int]] = None, **kwargs)[source]

The initial call to start propagating messages.

Parameters
• edge_index (Tensor or SparseTensor) – A torch.LongTensor or a torch_sparse.SparseTensor that defines the underlying graph connectivity/message passing flow. edge_index holds the indices of a general (sparse) assignment matrix of shape [N, M]. If edge_index is of type torch.LongTensor, its shape must be defined as [2, num_messages], where messages from nodes in edge_index[0] are sent to nodes in edge_index[1] (in case flow="source_to_target"). If edge_index is of type torch_sparse.SparseTensor, its sparse indices (row, col) should relate to row = edge_index[1] and col = edge_index[0]. The major difference between both formats is that we need to input the transposed sparse adjacency matrix into propagate().

• size (tuple, optional) – The size (N, M) of the assignment matrix in case edge_index is a LongTensor. If set to None, the size will be automatically inferred and assumed to be quadratic. This argument is ignored in case edge_index is a torch_sparse.SparseTensor. (default: None)

• **kwargs – Any additional data which is needed to construct and aggregate messages, and to update node embeddings.

message(x_j: torch.Tensor)torch.Tensor[source]

Constructs messages from node $$j$$ to node $$i$$ in analogy to $$\phi_{\mathbf{\Theta}}$$ for each edge in edge_index. This function can take any argument as input which was initially passed to propagate(). Furthermore, tensors passed to propagate() can be mapped to the respective nodes $$i$$ and $$j$$ by appending _i or _j to the variable name, .e.g. x_i and x_j.

aggregate(inputs: torch.Tensor, index: torch.Tensor, ptr: Optional[torch.Tensor] = None, dim_size: Optional[int] = None)torch.Tensor[source]

Aggregates messages from neighbors as $$\square_{j \in \mathcal{N}(i)}$$.

Takes in the output of message computation as first argument and any argument which was initially passed to propagate().

By default, this function will delegate its call to scatter functions that support “add”, “mean” and “max” operations as specified in __init__() by the aggr argument.

Fuses computations of message() and aggregate() into a single function. If applicable, this saves both time and memory since messages do not explicitly need to be materialized. This function will only gets called in case it is implemented and propagation takes place based on a torch_sparse.SparseTensor.

update(inputs: torch.Tensor)torch.Tensor[source]

Updates node embeddings in analogy to $$\gamma_{\mathbf{\Theta}}$$ for each node $$i \in \mathcal{V}$$. Takes in the output of aggregation as first argument and any argument which was initially passed to propagate().

register_propagate_forward_pre_hook(hook: Callable)torch.utils.hooks.RemovableHandle[source]

Registers a forward pre-hook on the module. The hook will be called every time before propagate() is invoked. It should have the following signature:

hook(module, inputs) -> None or modified input


The hook can modify the input. Input keyword arguments are passed to the hook as a dictionary in inputs[-1].

Returns a torch.utils.hooks.RemovableHandle that can be used to remove the added hook by calling handle.remove().

register_propagate_forward_hook(hook: Callable)torch.utils.hooks.RemovableHandle[source]

Registers a forward hook on the module. The hook will be called every time after propagate() has computed an output. It should have the following signature:

hook(module, inputs, output) -> None or modified output


The hook can modify the output. Input keyword arguments are passed to the hook as a dictionary in inputs[-1].

Returns a torch.utils.hooks.RemovableHandle that can be used to remove the added hook by calling handle.remove().

register_message_forward_pre_hook(hook: Callable)torch.utils.hooks.RemovableHandle[source]

Registers a forward pre-hook on the module. The hook will be called every time before message() is invoked. See register_propagate_forward_pre_hook() for more information.

register_message_forward_hook(hook: Callable)torch.utils.hooks.RemovableHandle[source]

Registers a forward hook on the module. The hook will be called every time after message() has computed an output. See register_propagate_forward_hook() for more information.

register_aggregate_forward_pre_hook(hook: Callable)torch.utils.hooks.RemovableHandle[source]

Registers a forward pre-hook on the module. The hook will be called every time before aggregate() is invoked. See register_propagate_forward_pre_hook() for more information.

register_aggregate_forward_hook(hook: Callable)torch.utils.hooks.RemovableHandle[source]

Registers a forward hook on the module. The hook will be called every time after aggregate() has computed an output. See register_propagate_forward_hook() for more information.

register_message_and_aggregate_forward_pre_hook(hook: Callable)torch.utils.hooks.RemovableHandle[source]

Registers a forward pre-hook on the module. The hook will be called every time before message_and_aggregate() is invoked. See register_propagate_forward_pre_hook() for more information.

register_message_and_aggregate_forward_hook(hook: Callable)torch.utils.hooks.RemovableHandle[source]

Registers a forward hook on the module. The hook will be called every time after message_and_aggregate() has computed an output. See register_propagate_forward_hook() for more information.

jittable(typing: Optional[str] = None)[source]

Analyzes the MessagePassing instance and produces a new jittable module.

Parameters

typing (string, optional) – If given, will generate a concrete instance with forward() types based on typing, e.g.: "(Tensor, Optional[Tensor]) -> Tensor".

class GCNConv(in_channels: int, out_channels: int, improved: bool = False, cached: bool = False, add_self_loops: bool = True, normalize: bool = True, bias: bool = True, **kwargs)[source]

The graph convolutional operator from the “Semi-supervised Classification with Graph Convolutional Networks” paper

$\mathbf{X}^{\prime} = \mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}} \mathbf{\hat{D}}^{-1/2} \mathbf{X} \mathbf{\Theta},$

where $$\mathbf{\hat{A}} = \mathbf{A} + \mathbf{I}$$ denotes the adjacency matrix with inserted self-loops and $$\hat{D}_{ii} = \sum_{j=0} \hat{A}_{ij}$$ its diagonal degree matrix. The adjacency matrix can include other values than 1 representing edge weights via the optional edge_weight tensor.

Its node-wise formulation is given by:

$\mathbf{x}^{\prime}_i = \mathbf{\Theta} \sum_{j \in \mathcal{N}(v) \cup \{ i \}} \frac{e_{j,i}}{\sqrt{\hat{d}_j \hat{d}_i}} \mathbf{x}_j$

with $$\hat{d}_i = 1 + \sum_{j \in \mathcal{N}(i)} e_{j,i}$$, where $$e_{j,i}$$ denotes the edge weight from source node j to target node i (default: 1.0)

Parameters
• in_channels (int) – Size of each input sample, or -1 to derive the size from the first input(s) to the forward method.

• out_channels (int) – Size of each output sample.

• improved (bool, optional) – If set to True, the layer computes $$\mathbf{\hat{A}}$$ as $$\mathbf{A} + 2\mathbf{I}$$. (default: False)

• cached (bool, optional) – If set to True, the layer will cache the computation of $$\mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}} \mathbf{\hat{D}}^{-1/2}$$ on first execution, and will use the cached version for further executions. This parameter should only be set to True in transductive learning scenarios. (default: False)

• add_self_loops (bool, optional) – If set to False, will not add self-loops to the input graph. (default: True)

• normalize (bool, optional) – Whether to add self-loops and compute symmetric normalization coefficients on the fly. (default: True)

• bias (bool, optional) – If set to False, the layer will not learn an additive bias. (default: True)

• **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

reset_parameters()[source]
forward(x: torch.Tensor, edge_index: Union[torch.Tensor, torch_sparse.tensor.SparseTensor], edge_weight: Optional[torch.Tensor] = None)torch.Tensor[source]
class ChebConv(in_channels: int, out_channels: int, K: int, normalization: Optional[str] = 'sym', bias: bool = True, **kwargs)[source]

The chebyshev spectral graph convolutional operator from the “Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering” paper

$\mathbf{X}^{\prime} = \sum_{k=1}^{K} \mathbf{Z}^{(k)} \cdot \mathbf{\Theta}^{(k)}$

where $$\mathbf{Z}^{(k)}$$ is computed recursively by

\begin{align}\begin{aligned}\mathbf{Z}^{(1)} &= \mathbf{X}\\\mathbf{Z}^{(2)} &= \mathbf{\hat{L}} \cdot \mathbf{X}\\\mathbf{Z}^{(k)} &= 2 \cdot \mathbf{\hat{L}} \cdot \mathbf{Z}^{(k-1)} - \mathbf{Z}^{(k-2)}\end{aligned}\end{align}

and $$\mathbf{\hat{L}}$$ denotes the scaled and normalized Laplacian $$\frac{2\mathbf{L}}{\lambda_{\max}} - \mathbf{I}$$.

Parameters
• in_channels (int) – Size of each input sample, or -1 to derive the size from the first input(s) to the forward method.

• out_channels (int) – Size of each output sample.

• K (int) – Chebyshev filter size $$K$$.

• normalization (str, optional) –

The normalization scheme for the graph Laplacian (default: "sym"):

1. None: No normalization $$\mathbf{L} = \mathbf{D} - \mathbf{A}$$

2. "sym": Symmetric normalization $$\mathbf{L} = \mathbf{I} - \mathbf{D}^{-1/2} \mathbf{A} \mathbf{D}^{-1/2}$$

3. "rw": Random-walk normalization $$\mathbf{L} = \mathbf{I} - \mathbf{D}^{-1} \mathbf{A}$$

You need to pass lambda_max to the forward() method of this operator in case the normalization is non-symmetric. lambda_max should be a torch.Tensor of size [num_graphs] in a mini-batch scenario and a scalar/zero-dimensional tensor when operating on single graphs. You can pre-compute lambda_max via the torch_geometric.transforms.LaplacianLambdaMax transform.

• bias (bool, optional) – If set to False, the layer will not learn an additive bias. (default: True)

• **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

reset_parameters()[source]
forward(x, edge_index, edge_weight: Optional[torch.Tensor] = None, batch: Optional[torch.Tensor] = None, lambda_max: Optional[torch.Tensor] = None)[source]
class SAGEConv(in_channels: Union[int, Tuple[int, int]], out_channels: int, normalize: bool = False, root_weight: bool = True, bias: bool = True, **kwargs)[source]

The GraphSAGE operator from the “Inductive Representation Learning on Large Graphs” paper

$\mathbf{x}^{\prime}_i = \mathbf{W}_1 \mathbf{x}_i + \mathbf{W}_2 \cdot \mathrm{mean}_{j \in \mathcal{N(i)}} \mathbf{x}_j$
Parameters
• in_channels (int or tuple) – Size of each input sample, or -1 to derive the size from the first input(s) to the forward method. A tuple corresponds to the sizes of source and target dimensionalities.

• out_channels (int) – Size of each output sample.

• normalize (bool, optional) – If set to True, output features will be $$\ell_2$$-normalized, i.e., $$\frac{\mathbf{x}^{\prime}_i} {\| \mathbf{x}^{\prime}_i \|_2}$$. (default: False)

• root_weight (bool, optional) – If set to False, the layer will not add transformed root node features to the output. (default: True)

• bias (bool, optional) – If set to False, the layer will not learn an additive bias. (default: True)

• **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

reset_parameters()[source]
forward(x: Union[torch.Tensor, Tuple[torch.Tensor, Optional[torch.Tensor]]], edge_index: Union[torch.Tensor, torch_sparse.tensor.SparseTensor], size: Optional[Tuple[int, int]] = None)torch.Tensor[source]
class GraphConv(in_channels: Union[int, Tuple[int, int]], out_channels: int, aggr: str = 'add', bias: bool = True, **kwargs)[source]

The graph neural network operator from the “Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks” paper

$\mathbf{x}^{\prime}_i = \mathbf{\Theta}_1 \mathbf{x}_i + \mathbf{\Theta}_2 \sum_{j \in \mathcal{N}(i)} e_{j,i} \cdot \mathbf{x}_j$

where $$e_{j,i}$$ denotes the edge weight from source node j to target node i (default: 1)

Parameters
• in_channels (int or tuple) – Size of each input sample, or -1 to derive the size from the first input(s) to the forward method. A tuple corresponds to the sizes of source and target dimensionalities.

• out_channels (int) – Size of each output sample.

• aggr (string, optional) – The aggregation scheme to use ("add", "mean", "max"). (default: "add")

• bias (bool, optional) – If set to False, the layer will not learn an additive bias. (default: True)

• **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

reset_parameters()[source]
forward(x: Union[torch.Tensor, Tuple[torch.Tensor, Optional[torch.Tensor]]], edge_index: Union[torch.Tensor, torch_sparse.tensor.SparseTensor], edge_weight: Optional[torch.Tensor] = None, size: Optional[Tuple[int, int]] = None)torch.Tensor[source]
class GravNetConv(in_channels: int, out_channels: int, space_dimensions: int, propagate_dimensions: int, k: int, num_workers: int = 1, **kwargs)[source]

The GravNet operator from the “Learning Representations of Irregular Particle-detector Geometry with Distance-weighted Graph Networks” paper, where the graph is dynamically constructed using nearest neighbors. The neighbors are constructed in a learnable low-dimensional projection of the feature space. A second projection of the input feature space is then propagated from the neighbors to each vertex using distance weights that are derived by applying a Gaussian function to the distances.

Parameters
• in_channels (int) – Size of each input sample, or -1 to derive the size from the first input(s) to the forward method.

• out_channels (int) – The number of output channels.

• space_dimensions (int) – The dimensionality of the space used to construct the neighbors; referred to as $$S$$ in the paper.

• propagate_dimensions (int) – The number of features to be propagated between the vertices; referred to as $$F_{\textrm{LR}}$$ in the paper.

• k (int) – The number of nearest neighbors.

• num_workers (int) – Number of workers to use for k-NN computation. Has no effect in case batch is not None, or the input lies on the GPU. (default: 1)

• **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

reset_parameters()[source]
forward(x: Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]], batch: Union[torch.Tensor, None, Tuple[torch.Tensor, torch.Tensor]] = None)torch.Tensor[source]
class GatedGraphConv(out_channels: int, num_layers: int, aggr: str = 'add', bias: bool = True, **kwargs)[source]

The gated graph convolution operator from the “Gated Graph Sequence Neural Networks” paper

\begin{align}\begin{aligned}\mathbf{h}_i^{(0)} &= \mathbf{x}_i \, \Vert \, \mathbf{0}\\\mathbf{m}_i^{(l+1)} &= \sum_{j \in \mathcal{N}(i)} e_{j,i} \cdot \mathbf{\Theta} \cdot \mathbf{h}_j^{(l)}\\\mathbf{h}_i^{(l+1)} &= \textrm{GRU} (\mathbf{m}_i^{(l+1)}, \mathbf{h}_i^{(l)})\end{aligned}\end{align}

up to representation $$\mathbf{h}_i^{(L)}$$. The number of input channels of $$\mathbf{x}_i$$ needs to be less or equal than out_channels. $$e_{j,i}$$ denotes the edge weight from source node j to target node i (default: 1)

Parameters
• out_channels (int) – Size of each input sample.

• num_layers (int) – The sequence length $$L$$.

• aggr (string, optional) – The aggregation scheme to use ("add", "mean", "max"). (default: "add")

• bias (bool, optional) – If set to False, the layer will not learn an additive bias. (default: True)

• **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

reset_parameters()[source]
forward(x: torch.Tensor, edge_index: Union[torch.Tensor, torch_sparse.tensor.SparseTensor], edge_weight: Optional[torch.Tensor] = None)torch.Tensor[source]
class ResGatedGraphConv(in_channels: Union[int, Tuple[int, int]], out_channels: int, act: Optional[Callable] = Sigmoid(), root_weight: bool = True, bias: bool = True, **kwargs)[source]

The residual gated graph convolutional operator from the “Residual Gated Graph ConvNets” paper

$\mathbf{x}^{\prime}_i = \mathbf{W}_1 \mathbf{x}_i + \sum_{j \in \mathcal{N}(i)} \eta_{i,j} \odot \mathbf{W}_2 \mathbf{x}_j$

where the gate $$\eta_{i,j}$$ is defined as

$\eta_{i,j} = \sigma(\mathbf{W}_3 \mathbf{x}_i + \mathbf{W}_4 \mathbf{x}_j)$

with $$\sigma$$ denoting the sigmoid function.

Parameters
• in_channels (int or tuple) – Size of each input sample, or -1 to derive the size from the first input(s) to the forward method. A tuple corresponds to the sizes of source and target dimensionalities.

• out_channels (int) – Size of each output sample.

• act (callable, optional) – Gating function $$\sigma$$. (default: torch.nn.Sigmoid())

• bias (bool, optional) – If set to False, the layer will not learn an additive bias. (default: True)

• root_weight (bool, optional) – If set to False, the layer will not add transformed root node features to the output. (default: True)

• **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

reset_parameters()[source]
forward(x: Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]], edge_index: Union[torch.Tensor, torch_sparse.tensor.SparseTensor])torch.Tensor[source]
class GATConv(in_channels: Union[int, Tuple[int, int]], out_channels: int, heads: int = 1, concat: bool = True, negative_slope: float = 0.2, dropout: float = 0.0, add_self_loops: bool = True, bias: bool = True, **kwargs)[source]

The graph attentional operator from the “Graph Attention Networks” paper

$\mathbf{x}^{\prime}_i = \alpha_{i,i}\mathbf{\Theta}\mathbf{x}_{i} + \sum_{j \in \mathcal{N}(i)} \alpha_{i,j}\mathbf{\Theta}\mathbf{x}_{j},$

where the attention coefficients $$\alpha_{i,j}$$ are computed as

$\alpha_{i,j} = \frac{ \exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{\top} [\mathbf{\Theta}\mathbf{x}_i \, \Vert \, \mathbf{\Theta}\mathbf{x}_j] \right)\right)} {\sum_{k \in \mathcal{N}(i) \cup \{ i \}} \exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{\top} [\mathbf{\Theta}\mathbf{x}_i \, \Vert \, \mathbf{\Theta}\mathbf{x}_k] \right)\right)}.$
Parameters
• in_channels (int or tuple) – Size of each input sample, or -1 to derive the size from the first input(s) to the forward method. A tuple corresponds to the sizes of source and target dimensionalities.

• out_channels (int) – Size of each output sample.

• heads (int, optional) – Number of multi-head-attentions. (default: 1)

• concat (bool, optional) – If set to False, the multi-head attentions are averaged instead of concatenated. (default: True)

• negative_slope (float, optional) – LeakyReLU angle of the negative slope. (default: 0.2)

• dropout (float, optional) – Dropout probability of the normalized attention coefficients which exposes each node to a stochastically sampled neighborhood during training. (default: 0)

• add_self_loops (bool, optional) – If set to False, will not add self-loops to the input graph. (default: True)

• bias (bool, optional) – If set to False, the layer will not learn an additive bias. (default: True)

• **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

reset_parameters()[source]
forward(x: Union[torch.Tensor, Tuple[torch.Tensor, Optional[torch.Tensor]]], edge_index: Union[torch.Tensor, torch_sparse.tensor.SparseTensor], size: Optional[Tuple[int, int]] = None, return_attention_weights=None)[source]
Parameters

return_attention_weights (bool, optional) – If set to True, will additionally return the tuple (edge_index, attention_weights), holding the computed attention weights for each edge. (default: None)

class GATv2Conv(in_channels: int, out_channels: int, heads: int = 1, concat: bool = True, negative_slope: float = 0.2, dropout: float = 0.0, add_self_loops: bool = True, bias: bool = True, share_weights: bool = False, **kwargs)[source]

The GATv2 operator from the “How Attentive are Graph Attention Networks?” paper, which fixes the static attention problem of the standard GATConv layer: since the linear layers in the standard GAT are applied right after each other, the ranking of attended nodes is unconditioned on the query node. In contrast, in GATv2, every node can attend to any other node.

$\mathbf{x}^{\prime}_i = \alpha_{i,i}\mathbf{\Theta}\mathbf{x}_{i} + \sum_{j \in \mathcal{N}(i)} \alpha_{i,j}\mathbf{\Theta}\mathbf{x}_{j},$

where the attention coefficients $$\alpha_{i,j}$$ are computed as

$\alpha_{i,j} = \frac{ \exp\left(\mathbf{a}^{\top}\mathrm{LeakyReLU}\left(\mathbf{\Theta} [\mathbf{x}_i \, \Vert \, \mathbf{x}_j] \right)\right)} {\sum_{k \in \mathcal{N}(i) \cup \{ i \}} \exp\left(\mathbf{a}^{\top}\mathrm{LeakyReLU}\left(\mathbf{\Theta} [\mathbf{x}_i \, \Vert \, \mathbf{x}_k] \right)\right)}.$
Parameters
• in_channels (int) – Size of each input sample, or -1 to derive the size from the first input(s) to the forward method.

• out_channels (int) – Size of each output sample.

• heads (int, optional) – Number of multi-head-attentions. (default: 1)

• concat (bool, optional) – If set to False, the multi-head attentions are averaged instead of concatenated. (default: True)

• negative_slope (float, optional) – LeakyReLU angle of the negative slope. (default: 0.2)

• dropout (float, optional) – Dropout probability of the normalized attention coefficients which exposes each node to a stochastically sampled neighborhood during training. (default: 0)

• add_self_loops (bool, optional) – If set to False, will not add self-loops to the input graph. (default: True)

• bias (bool, optional) – If set to False, the layer will not learn an additive bias. (default: True)

• share_weights (bool, optional) – If set to True, the same matrix will be applied to the source and the target node of every edge. (default: False)

• **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

reset_parameters()[source]
forward(x: Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]], edge_index: Union[torch.Tensor, torch_sparse.tensor.SparseTensor], size: Optional[Tuple[int, int]] = None, return_attention_weights: Optional[bool] = None)[source]
Parameters

return_attention_weights (bool, optional) – If set to True, will additionally return the tuple (edge_index, attention_weights), holding the computed attention weights for each edge. (default: None)

class TransformerConv(in_channels: Union[int, Tuple[int, int]], out_channels: int, heads: int = 1, concat: bool = True, beta: bool = False, dropout: float = 0.0, edge_dim: Optional[int] = None, bias: bool = True, root_weight: bool = True, **kwargs)[source]

The graph transformer operator from the “Masked Label Prediction: Unified Message Passing Model for Semi-Supervised Classification” paper

$\mathbf{x}^{\prime}_i = \mathbf{W}_1 \mathbf{x}_i + \sum_{j \in \mathcal{N}(i)} \alpha_{i,j} \mathbf{W}_2 \mathbf{x}_{j},$

where the attention coefficients $$\alpha_{i,j}$$ are computed via multi-head dot product attention:

$\alpha_{i,j} = \textrm{softmax} \left( \frac{(\mathbf{W}_3\mathbf{x}_i)^{\top} (\mathbf{W}_4\mathbf{x}_j)} {\sqrt{d}} \right)$
Parameters
• in_channels (int or tuple) – Size of each input sample, or -1 to derive the size from the first input(s) to the forward method. A tuple corresponds to the sizes of source and target dimensionalities.

• out_channels (int) – Size of each output sample.

• heads (int, optional) – Number of multi-head-attentions. (default: 1)

• concat (bool, optional) – If set to False, the multi-head attentions are averaged instead of concatenated. (default: True)

• beta (bool, optional) –

If set, will combine aggregation and skip information via

$\mathbf{x}^{\prime}_i = \beta_i \mathbf{W}_1 \mathbf{x}_i + (1 - \beta_i) \underbrace{\left(\sum_{j \in \mathcal{N}(i)} \alpha_{i,j} \mathbf{W}_2 \vec{x}_j \right)}_{=\mathbf{m}_i}$

with $$\beta_i = \textrm{sigmoid}(\mathbf{w}_5^{\top} [ \mathbf{x}_i, \mathbf{m}_i, \mathbf{x}_i - \mathbf{m}_i ])$$ (default: False)

• dropout (float, optional) – Dropout probability of the normalized attention coefficients which exposes each node to a stochastically sampled neighborhood during training. (default: 0)

• edge_dim (int, optional) –

Edge feature dimensionality (in case there are any). Edge features are added to the keys after linear transformation, that is, prior to computing the attention dot product. They are also added to final values after the same linear transformation. The model is:

$\mathbf{x}^{\prime}_i = \mathbf{W}_1 \mathbf{x}_i + \sum_{j \in \mathcal{N}(i)} \alpha_{i,j} \left( \mathbf{W}_2 \mathbf{x}_{j} + \mathbf{W}_6 \mathbf{e}_{ij} \right),$

where the attention coefficients $$\alpha_{i,j}$$ are now computed via:

$\alpha_{i,j} = \textrm{softmax} \left( \frac{(\mathbf{W}_3\mathbf{x}_i)^{\top} (\mathbf{W}_4\mathbf{x}_j + \mathbf{W}_6 \mathbf{e}_{ij})} {\sqrt{d}} \right)$

(default None)

• bias (bool, optional) – If set to False, the layer will not learn an additive bias. (default: True)

• root_weight (bool, optional) – If set to False, the layer will not add the transformed root node features to the output and the option beta is set to False. (default: True)

• **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

reset_parameters()[source]
forward(x: Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]], edge_index: Union[torch.Tensor, torch_sparse.tensor.SparseTensor], edge_attr: Optional[torch.Tensor] = None, return_attention_weights=None)[source]
Parameters

return_attention_weights (bool, optional) – If set to True, will additionally return the tuple (edge_index, attention_weights), holding the computed attention weights for each edge. (default: None)

The graph attentional propagation layer from the “Attention-based Graph Neural Network for Semi-Supervised Learning” paper

$\mathbf{X}^{\prime} = \mathbf{P} \mathbf{X},$

where the propagation matrix $$\mathbf{P}$$ is computed as

$P_{i,j} = \frac{\exp( \beta \cdot \cos(\mathbf{x}_i, \mathbf{x}_j))} {\sum_{k \in \mathcal{N}(i)\cup \{ i \}} \exp( \beta \cdot \cos(\mathbf{x}_i, \mathbf{x}_k))}$

with trainable parameter $$\beta$$.

Parameters
reset_parameters()[source]
forward(x: torch.Tensor, edge_index: Union[torch.Tensor, torch_sparse.tensor.SparseTensor])torch.Tensor[source]
class TAGConv(in_channels: int, out_channels: int, K: int = 3, bias: bool = True, normalize: bool = True, **kwargs)[source]

The topology adaptive graph convolutional networks operator from the “Topology Adaptive Graph Convolutional Networks” paper

$\mathbf{X}^{\prime} = \sum_{k=0}^K \left( \mathbf{D}^{-1/2} \mathbf{A} \mathbf{D}^{-1/2} \right)^k \mathbf{X} \mathbf{\Theta}_{k},$

where $$\mathbf{A}$$ denotes the adjacency matrix and $$D_{ii} = \sum_{j=0} A_{ij}$$ its diagonal degree matrix. The adjacency matrix can include other values than 1 representing edge weights via the optional edge_weight tensor.

Parameters
• in_channels (int) – Size of each input sample, or -1 to derive the size from the first input(s) to the forward method.

• out_channels (int) – Size of each output sample.

• K (int, optional) – Number of hops $$K$$. (default: 3)

• bias (bool, optional) – If set to False, the layer will not learn an additive bias. (default: True)

• normalize (bool, optional) – Whether to apply symmetric normalization. (default: True)

• **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

reset_parameters()[source]
forward(x: torch.Tensor, edge_index: Union[torch.Tensor, torch_sparse.tensor.SparseTensor], edge_weight: Optional[torch.Tensor] = None)torch.Tensor[source]
class GINConv(nn: Callable, eps: float = 0.0, train_eps: bool = False, **kwargs)[source]

The graph isomorphism operator from the “How Powerful are Graph Neural Networks?” paper

$\mathbf{x}^{\prime}_i = h_{\mathbf{\Theta}} \left( (1 + \epsilon) \cdot \mathbf{x}_i + \sum_{j \in \mathcal{N}(i)} \mathbf{x}_j \right)$

or

$\mathbf{X}^{\prime} = h_{\mathbf{\Theta}} \left( \left( \mathbf{A} + (1 + \epsilon) \cdot \mathbf{I} \right) \cdot \mathbf{X} \right),$

here $$h_{\mathbf{\Theta}}$$ denotes a neural network, .i.e. an MLP.

Parameters
• nn (torch.nn.Module) – A neural network $$h_{\mathbf{\Theta}}$$ that maps node features x of shape [-1, in_channels] to shape [-1, out_channels], e.g., defined by torch.nn.Sequential.

• eps (float, optional) – (Initial) $$\epsilon$$-value. (default: 0.)

• train_eps (bool, optional) – If set to True, $$\epsilon$$ will be a trainable parameter. (default: False)

• **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

reset_parameters()[source]
forward(x: Union[torch.Tensor, Tuple[torch.Tensor, Optional[torch.Tensor]]], edge_index: Union[torch.Tensor, torch_sparse.tensor.SparseTensor], size: Optional[Tuple[int, int]] = None)torch.Tensor[source]
class GINEConv(nn: Callable, eps: float = 0.0, train_eps: bool = False, **kwargs)[source]

The modified GINConv operator from the “Strategies for Pre-training Graph Neural Networks” paper

$\mathbf{x}^{\prime}_i = h_{\mathbf{\Theta}} \left( (1 + \epsilon) \cdot \mathbf{x}_i + \sum_{j \in \mathcal{N}(i)} \mathrm{ReLU} ( \mathbf{x}_j + \mathbf{e}_{j,i} ) \right)$

that is able to incorporate edge features $$\mathbf{e}_{j,i}$$ into the aggregation procedure.

Parameters
• nn (torch.nn.Module) – A neural network $$h_{\mathbf{\Theta}}$$ that maps node features x of shape [-1, in_channels] to shape [-1, out_channels], e.g., defined by torch.nn.Sequential.

• eps (float, optional) – (Initial) $$\epsilon$$-value. (default: 0.)

• train_eps (bool, optional) – If set to True, $$\epsilon$$ will be a trainable parameter. (default: False)

• **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

reset_parameters()[source]
forward(x: Union[torch.Tensor, Tuple[torch.Tensor, Optional[torch.Tensor]]], edge_index: Union[torch.Tensor, torch_sparse.tensor.SparseTensor], edge_attr: Optional[torch.Tensor] = None, size: Optional[Tuple[int, int]] = None)torch.Tensor[source]
class ARMAConv(in_channels: int, out_channels: int, num_stacks: int = 1, num_layers: int = 1, shared_weights: bool = False, act: Optional[Callable] = ReLU(), dropout: float = 0.0, bias: bool = True, **kwargs)[source]

The ARMA graph convolutional operator from the “Graph Neural Networks with Convolutional ARMA Filters” paper

$\mathbf{X}^{\prime} = \frac{1}{K} \sum_{k=1}^K \mathbf{X}_k^{(T)},$

with $$\mathbf{X}_k^{(T)}$$ being recursively defined by

$\mathbf{X}_k^{(t+1)} = \sigma \left( \mathbf{\hat{L}} \mathbf{X}_k^{(t)} \mathbf{W} + \mathbf{X}^{(0)} \mathbf{V} \right),$

where $$\mathbf{\hat{L}} = \mathbf{I} - \mathbf{L} = \mathbf{D}^{-1/2} \mathbf{A} \mathbf{D}^{-1/2}$$ denotes the modified Laplacian $$\mathbf{L} = \mathbf{I} - \mathbf{D}^{-1/2} \mathbf{A} \mathbf{D}^{-1/2}$$.

Parameters
• in_channels (int) – Size of each input sample, or -1 to derive the size from the first input(s) to the forward method.

• out_channels (int) – Size of each output sample $$\mathbf{x}^{(t+1)}$$.

• num_stacks (int, optional) – Number of parallel stacks $$K$$. (default: 1).

• num_layers (int, optional) – Number of layers $$T$$. (default: 1)

• act (callable, optional) – Activation function $$\sigma$$. (default: torch.nn.ReLU())

• shared_weights (int, optional) – If set to True the layers in each stack will share the same parameters. (default: False)

• dropout (float, optional) – Dropout probability of the skip connection. (default: 0.)

• bias (bool, optional) – If set to False, the layer will not learn an additive bias. (default: True)

• **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

reset_parameters()[source]
forward(x: torch.Tensor, edge_index: Union[torch.Tensor, torch_sparse.tensor.SparseTensor], edge_weight: Optional[torch.Tensor] = None)torch.Tensor[source]
initialize_parameters(module, input)[source]
class SGConv(in_channels: int, out_channels: int, K: int = 1, cached: bool = False, add_self_loops: bool = True, bias: bool = True, **kwargs)[source]

The simple graph convolutional operator from the “Simplifying Graph Convolutional Networks” paper

$\mathbf{X}^{\prime} = {\left(\mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}} \mathbf{\hat{D}}^{-1/2} \right)}^K \mathbf{X} \mathbf{\Theta},$

where $$\mathbf{\hat{A}} = \mathbf{A} + \mathbf{I}$$ denotes the adjacency matrix with inserted self-loops and $$\hat{D}_{ii} = \sum_{j=0} \hat{A}_{ij}$$ its diagonal degree matrix. The adjacency matrix can include other values than 1 representing edge weights via the optional edge_weight tensor.

Parameters
• in_channels (int) – Size of each input sample, or -1 to derive the size from the first input(s) to the forward method.

• out_channels (int) – Size of each output sample.

• K (int, optional) – Number of hops $$K$$. (default: 1)

• cached (bool, optional) – If set to True, the layer will cache the computation of $${\left(\mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}} \mathbf{\hat{D}}^{-1/2} \right)}^K \mathbf{X}$$ on first execution, and will use the cached version for further executions. This parameter should only be set to True in transductive learning scenarios. (default: False)

• add_self_loops (bool, optional) – If set to False, will not add self-loops to the input graph. (default: True)

• bias (bool, optional) – If set to False, the layer will not learn an additive bias. (default: True)

• **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

reset_parameters()[source]
forward(x: torch.Tensor, edge_index: Union[torch.Tensor, torch_sparse.tensor.SparseTensor], edge_weight: Optional[torch.Tensor] = None)torch.Tensor[source]
class APPNP(K: int, alpha: float, dropout: float = 0.0, cached: bool = False, add_self_loops: bool = True, normalize: bool = True, **kwargs)[source]

The approximate personalized propagation of neural predictions layer from the “Predict then Propagate: Graph Neural Networks meet Personalized PageRank” paper

\begin{align}\begin{aligned}\mathbf{X}^{(0)} &= \mathbf{X}\\\mathbf{X}^{(k)} &= (1 - \alpha) \mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}} \mathbf{\hat{D}}^{-1/2} \mathbf{X}^{(k-1)} + \alpha \mathbf{X}^{(0)}\\\mathbf{X}^{\prime} &= \mathbf{X}^{(K)},\end{aligned}\end{align}

where $$\mathbf{\hat{A}} = \mathbf{A} + \mathbf{I}$$ denotes the adjacency matrix with inserted self-loops and $$\hat{D}_{ii} = \sum_{j=0} \hat{A}_{ij}$$ its diagonal degree matrix. The adjacency matrix can include other values than 1 representing edge weights via the optional edge_weight tensor.

Parameters
• K (int) – Number of iterations $$K$$.

• alpha (float) – Teleport probability $$\alpha$$.

• dropout (float, optional) – Dropout probability of edges during training. (default: 0)

• cached (bool, optional) – If set to True, the layer will cache the computation of $$\mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}} \mathbf{\hat{D}}^{-1/2}$$ on first execution, and will use the cached version for further executions. This parameter should only be set to True in transductive learning scenarios. (default: False)

• add_self_loops (bool, optional) – If set to False, will not add self-loops to the input graph. (default: True)

• normalize (bool, optional) – Whether to add self-loops and apply symmetric normalization. (default: True)

• **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

reset_parameters()[source]
forward(x: torch.Tensor, edge_index: Union[torch.Tensor, torch_sparse.tensor.SparseTensor], edge_weight: Optional[torch.Tensor] = None)torch.Tensor[source]
class MFConv(in_channels: Union[int, Tuple[int, int]], out_channels: int, max_degree: int = 10, bias=True, **kwargs)[source]

The graph neural network operator from the “Convolutional Networks on Graphs for Learning Molecular Fingerprints” paper

$\mathbf{x}^{\prime}_i = \mathbf{W}^{(\deg(i))}_1 \mathbf{x}_i + \mathbf{W}^{(\deg(i))}_2 \sum_{j \in \mathcal{N}(i)} \mathbf{x}_j$

which trains a distinct weight matrix for each possible vertex degree.

Parameters
• in_channels (int or tuple) – Size of each input sample, or -1 to derive the size from the first input(s) to the forward method. A tuple corresponds to the sizes of source and target dimensionalities.

• out_channels (int) – Size of each output sample.

• max_degree (int, optional) – The maximum node degree to consider when updating weights (default: 10)

• bias (bool, optional) – If set to False, the layer will not learn an additive bias. (default: True)

• **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

reset_parameters()[source]
forward(x: Union[torch.Tensor, Tuple[torch.Tensor, Optional[torch.Tensor]]], edge_index: Union[torch.Tensor, torch_sparse.tensor.SparseTensor], size: Optional[Tuple[int, int]] = None)torch.Tensor[source]
class RGCNConv(in_channels: Union[int, Tuple[int, int]], out_channels: int, num_relations: int, num_bases: Optional[int] = None, num_blocks: Optional[int] = None, aggr: str = 'mean', root_weight: bool = True, bias: bool = True, **kwargs)[source]

The relational graph convolutional operator from the “Modeling Relational Data with Graph Convolutional Networks” paper

$\mathbf{x}^{\prime}_i = \mathbf{\Theta}_{\textrm{root}} \cdot \mathbf{x}_i + \sum_{r \in \mathcal{R}} \sum_{j \in \mathcal{N}_r(i)} \frac{1}{|\mathcal{N}_r(i)|} \mathbf{\Theta}_r \cdot \mathbf{x}_j,$

where $$\mathcal{R}$$ denotes the set of relations, i.e. edge types. Edge type needs to be a one-dimensional torch.long tensor which stores a relation identifier $$\in \{ 0, \ldots, |\mathcal{R}| - 1\}$$ for each edge.

Note

This implementation is as memory-efficient as possible by iterating over each individual relation type. Therefore, it may result in low GPU utilization in case the graph has a large number of relations. As an alternative approach, FastRGCNConv does not iterate over each individual type, but may consume a large amount of memory to compensate. We advise to check out both implementations to see which one fits your needs.

Parameters
• in_channels (int or tuple) – Size of each input sample. A tuple corresponds to the sizes of source and target dimensionalities. In case no input features are given, this argument should correspond to the number of nodes in your graph.

• out_channels (int) – Size of each output sample.

• num_relations (int) – Number of relations.

• num_bases (int, optional) – If set to not None, this layer will use the basis-decomposition regularization scheme where num_bases denotes the number of bases to use. (default: None)

• num_blocks (int, optional) – If set to not None, this layer will use the block-diagonal-decomposition regularization scheme where num_blocks denotes the number of blocks to use. (default: None)

• aggr (string, optional) – The aggregation scheme to use ("add", "mean", "max"). (default: "mean")

• root_weight (bool, optional) – If set to False, the layer will not add transformed root node features to the output. (default: True)

• bias (bool, optional) – If set to False, the layer will not learn an additive bias. (default: True)

• **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

reset_parameters()[source]
forward(x: Union[torch.Tensor, None, Tuple[Optional[torch.Tensor], torch.Tensor]], edge_index: Union[torch.Tensor, torch_sparse.tensor.SparseTensor], edge_type: Optional[torch.Tensor] = None)[source]
Parameters
• x – The input node features. Can be either a [num_nodes, in_channels] node feature matrix, or an optional one-dimensional node index tensor (in which case input features are treated as trainable node embeddings). Furthermore, x can be of type tuple denoting source and destination node features.

• edge_type – The one-dimensional relation type/index for each edge in edge_index. Should be only None in case edge_index is of type torch_sparse.tensor.SparseTensor. (default: None)

class FastRGCNConv(in_channels: Union[int, Tuple[int, int]], out_channels: int, num_relations: int, num_bases: Optional[int] = None, num_blocks: Optional[int] = None, aggr: str = 'mean', root_weight: bool = True, bias: bool = True, **kwargs)[source]
forward(x: Union[torch.Tensor, None, Tuple[Optional[torch.Tensor], torch.Tensor]], edge_index: Union[torch.Tensor, torch_sparse.tensor.SparseTensor], edge_type: Optional[torch.Tensor] = None)[source]
class SignedConv(in_channels: int, out_channels: int, first_aggr: bool, bias: bool = True, **kwargs)[source]

The signed graph convolutional operator from the “Signed Graph Convolutional Network” paper

\begin{align}\begin{aligned}\mathbf{x}_v^{(\textrm{pos})} &= \mathbf{\Theta}^{(\textrm{pos})} \left[ \frac{1}{|\mathcal{N}^{+}(v)|} \sum_{w \in \mathcal{N}^{+}(v)} \mathbf{x}_w , \mathbf{x}_v \right]\\\mathbf{x}_v^{(\textrm{neg})} &= \mathbf{\Theta}^{(\textrm{neg})} \left[ \frac{1}{|\mathcal{N}^{-}(v)|} \sum_{w \in \mathcal{N}^{-}(v)} \mathbf{x}_w , \mathbf{x}_v \right]\end{aligned}\end{align}

if first_aggr is set to True, and

\begin{align}\begin{aligned}\mathbf{x}_v^{(\textrm{pos})} &= \mathbf{\Theta}^{(\textrm{pos})} \left[ \frac{1}{|\mathcal{N}^{+}(v)|} \sum_{w \in \mathcal{N}^{+}(v)} \mathbf{x}_w^{(\textrm{pos})}, \frac{1}{|\mathcal{N}^{-}(v)|} \sum_{w \in \mathcal{N}^{-}(v)} \mathbf{x}_w^{(\textrm{neg})}, \mathbf{x}_v^{(\textrm{pos})} \right]\\\mathbf{x}_v^{(\textrm{neg})} &= \mathbf{\Theta}^{(\textrm{pos})} \left[ \frac{1}{|\mathcal{N}^{+}(v)|} \sum_{w \in \mathcal{N}^{+}(v)} \mathbf{x}_w^{(\textrm{neg})}, \frac{1}{|\mathcal{N}^{-}(v)|} \sum_{w \in \mathcal{N}^{-}(v)} \mathbf{x}_w^{(\textrm{pos})}, \mathbf{x}_v^{(\textrm{neg})} \right]\end{aligned}\end{align}

otherwise. In case first_aggr is False, the layer expects x to be a tensor where x[:, :in_channels] denotes the positive node features $$\mathbf{X}^{(\textrm{pos})}$$ and x[:, in_channels:] denotes the negative node features $$\mathbf{X}^{(\textrm{neg})}$$.

Parameters
• in_channels (int or tuple) – Size of each input sample, or -1 to derive the size from the first input(s) to the forward method. A tuple corresponds to the sizes of source and target dimensionalities.

• out_channels (int) – Size of each output sample.

• first_aggr (bool) – Denotes which aggregation formula to use.

• bias (bool, optional) – If set to False, the layer will not learn an additive bias. (default: True)

• **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

reset_parameters()[source]
forward(x: Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]], pos_edge_index: Union[torch.Tensor, torch_sparse.tensor.SparseTensor], neg_edge_index: Union[torch.Tensor, torch_sparse.tensor.SparseTensor])[source]
class DNAConv(channels: int, heads: int = 1, groups: int = 1, dropout: float = 0.0, cached: bool = False, normalize: bool = True, add_self_loops: bool = True, bias: bool = True, **kwargs)[source]

The dynamic neighborhood aggregation operator from the “Just Jump: Towards Dynamic Neighborhood Aggregation in Graph Neural Networks” paper

$\mathbf{x}_v^{(t)} = h_{\mathbf{\Theta}}^{(t)} \left( \mathbf{x}_{v \leftarrow v}^{(t)}, \left\{ \mathbf{x}_{v \leftarrow w}^{(t)} : w \in \mathcal{N}(v) \right\} \right)$

$\mathbf{x}_{v \leftarrow w}^{(t)} = \textrm{Attention} \left( \mathbf{x}^{(t-1)}_v \, \mathbf{\Theta}_Q^{(t)}, [\mathbf{x}_w^{(1)}, \ldots, \mathbf{x}_w^{(t-1)}] \, \mathbf{\Theta}_K^{(t)}, \, [\mathbf{x}_w^{(1)}, \ldots, \mathbf{x}_w^{(t-1)}] \, \mathbf{\Theta}_V^{(t)} \right)$

with $$\mathbf{\Theta}_Q^{(t)}, \mathbf{\Theta}_K^{(t)}, \mathbf{\Theta}_V^{(t)}$$ denoting (grouped) projection matrices for query, key and value information, respectively. $$h^{(t)}_{\mathbf{\Theta}}$$ is implemented as a non-trainable version of torch_geometric.nn.conv.GCNConv.

Note

In contrast to other layers, this operator expects node features as shape [num_nodes, num_layers, channels].

Parameters
• channels (int) – Size of each input/output sample.

• heads (int, optional) – Number of multi-head-attentions. (default: 1)

• groups (int, optional) – Number of groups to use for all linear projections. (default: 1)

• dropout (float, optional) – Dropout probability of attention coefficients. (default: 0.)

• cached (bool, optional) – If set to True, the layer will cache the computation of $$\mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}} \mathbf{\hat{D}}^{-1/2}$$ on first execution, and will use the cached version for further executions. This parameter should only be set to True in transductive learning scenarios. (default: False)

• normalize (bool, optional) – Whether to add self-loops and apply symmetric normalization. (default: True)

• add_self_loops (bool, optional) – If set to False, will not add self-loops to the input graph. (default: True)

• bias (bool, optional) – If set to False, the layer will not learn an additive bias. (default: True)

• **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

reset_parameters()[source]
forward(x: torch.Tensor, edge_index: Union[torch.Tensor, torch_sparse.tensor.SparseTensor], edge_weight: Optional[torch.Tensor] = None)torch.Tensor[source]
Parameters

x – The input node features of shape [num_nodes, num_layers, channels].

class PointNetConv(local_nn: Optional[Callable] = None, global_nn: Optional[Callable] = None, add_self_loops: bool = True, **kwargs)[source]

The PointNet set layer from the “PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation” and “PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space” papers

$\mathbf{x}^{\prime}_i = \gamma_{\mathbf{\Theta}} \left( \max_{j \in \mathcal{N}(i) \cup \{ i \}} h_{\mathbf{\Theta}} ( \mathbf{x}_j, \mathbf{p}_j - \mathbf{p}_i) \right),$

where $$\gamma_{\mathbf{\Theta}}$$ and $$h_{\mathbf{\Theta}}$$ denote neural networks, .i.e. MLPs, and $$\mathbf{P} \in \mathbb{R}^{N \times D}$$ defines the position of each point.

Parameters
• local_nn (torch.nn.Module, optional) – A neural network $$h_{\mathbf{\Theta}}$$ that maps node features x and relative spatial coordinates pos_j - pos_i of shape [-1, in_channels + num_dimensions] to shape [-1, out_channels], e.g., defined by torch.nn.Sequential. (default: None)

• global_nn (torch.nn.Module, optional) – A neural network $$\gamma_{\mathbf{\Theta}}$$ that maps aggregated node features of shape [-1, out_channels] to shape [-1, final_out_channels], e.g., defined by torch.nn.Sequential. (default: None)

• add_self_loops (bool, optional) – If set to False, will not add self-loops to the input graph. (default: True)

• **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

reset_parameters()[source]
forward(x: Union[torch.Tensor, None, Tuple[Optional[torch.Tensor], Optional[torch.Tensor]]], pos: Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]], edge_index: Union[torch.Tensor, torch_sparse.tensor.SparseTensor])torch.Tensor[source]
PointConv
class GMMConv(in_channels: Union[int, Tuple[int, int]], out_channels: int, dim: int, kernel_size: int, separate_gaussians: bool = False, aggr: str = 'mean', root_weight: bool = True, bias: bool = True, **kwargs)[source]

The gaussian mixture model convolutional operator from the “Geometric Deep Learning on Graphs and Manifolds using Mixture Model CNNs” paper

$\mathbf{x}^{\prime}_i = \frac{1}{|\mathcal{N}(i)|} \sum_{j \in \mathcal{N}(i)} \frac{1}{K} \sum_{k=1}^K \mathbf{w}_k(\mathbf{e}_{i,j}) \odot \mathbf{\Theta}_k \mathbf{x}_j,$

where

$\mathbf{w}_k(\mathbf{e}) = \exp \left( -\frac{1}{2} {\left( \mathbf{e} - \mathbf{\mu}_k \right)}^{\top} \Sigma_k^{-1} \left( \mathbf{e} - \mathbf{\mu}_k \right) \right)$

denotes a weighting function based on trainable mean vector $$\mathbf{\mu}_k$$ and diagonal covariance matrix $$\mathbf{\Sigma}_k$$.

Note

The edge attribute $$\mathbf{e}_{ij}$$ is usually given by $$\mathbf{e}_{ij} = \mathbf{p}_j - \mathbf{p}_i$$, where $$\mathbf{p}_i$$ denotes the position of node $$i$$ (see torch_geometric.transform.Cartesian).

Parameters
• in_channels (int or tuple) – Size of each input sample, or -1 to derive the size from the first input(s) to the forward method. A tuple corresponds to the sizes of source and target dimensionalities.

• out_channels (int) – Size of each output sample.

• dim (int) – Pseudo-coordinate dimensionality.

• kernel_size (int) – Number of kernels $$K$$.

• separate_gaussians (bool, optional) – If set to True, will learn separate GMMs for every pair of input and output channel, inspired by traditional CNNs. (default: False)

• aggr (string, optional) – The aggregation operator to use ("add", "mean", "max"). (default: "mean")

• root_weight (bool, optional) – If set to False, the layer will not add transformed root node features to the output. (default: True)

• bias (bool, optional) – If set to False, the layer will not learn an additive bias. (default: True)

• **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

reset_parameters()[source]
forward(x: Union[torch.Tensor, Tuple[torch.Tensor, Optional[torch.Tensor]]], edge_index: Union[torch.Tensor, torch_sparse.tensor.SparseTensor], edge_attr: Optional[torch.Tensor] = None, size: Optional[Tuple[int, int]] = None)[source]
initialize_parameters(module, input)[source]
class SplineConv(in_channels: Union[int, Tuple[int, int]], out_channels: int, dim: int, kernel_size: Union[int, List[int]], is_open_spline: bool = True, degree: int = 1, aggr: str = 'mean', root_weight: bool = True, bias: bool = True, **kwargs)[source]

The spline-based convolutional operator from the “SplineCNN: Fast Geometric Deep Learning with Continuous B-Spline Kernels” paper

$\mathbf{x}^{\prime}_i = \frac{1}{|\mathcal{N}(i)|} \sum_{j \in \mathcal{N}(i)} \mathbf{x}_j \cdot h_{\mathbf{\Theta}}(\mathbf{e}_{i,j}),$

where $$h_{\mathbf{\Theta}}$$ denotes a kernel function defined over the weighted B-Spline tensor product basis.

Note

Pseudo-coordinates must lay in the fixed interval $$[0, 1]$$ for this method to work as intended.

Parameters
• in_channels (int or tuple) – Size of each input sample, or -1 to derive the size from the first input(s) to the forward method. A tuple corresponds to the sizes of source and target dimensionalities.

• out_channels (int) – Size of each output sample.

• dim (int) – Pseudo-coordinate dimensionality.

• kernel_size (int or [int]) – Size of the convolving kernel.

• is_open_spline (bool or [bool], optional) – If set to False, the operator will use a closed B-spline basis in this dimension. (default True)

• degree (int, optional) – B-spline basis degrees. (default: 1)

• aggr (string, optional) – The aggregation operator to use ("add", "mean", "max"). (default: "mean")

• root_weight (bool, optional) – If set to False, the layer will not add transformed root node features to the output. (default: True)

• bias (bool, optional) – If set to False, the layer will not learn an additive bias. (default: True)

• **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

reset_parameters()[source]
forward(x: Union[torch.Tensor, Tuple[torch.Tensor, Optional[torch.Tensor]]], edge_index: Union[torch.Tensor, torch_sparse.tensor.SparseTensor], edge_attr: Optional[torch.Tensor] = None, size: Optional[Tuple[int, int]] = None)torch.Tensor[source]
initialize_parameters(module, input)[source]
class NNConv(in_channels: Union[int, Tuple[int, int]], out_channels: int, nn: Callable, aggr: str = 'add', root_weight: bool = True, bias: bool = True, **kwargs)[source]

The continuous kernel-based convolutional operator from the “Neural Message Passing for Quantum Chemistry” paper. This convolution is also known as the edge-conditioned convolution from the “Dynamic Edge-Conditioned Filters in Convolutional Neural Networks on Graphs” paper (see torch_geometric.nn.conv.ECConv for an alias):

$\mathbf{x}^{\prime}_i = \mathbf{\Theta} \mathbf{x}_i + \sum_{j \in \mathcal{N}(i)} \mathbf{x}_j \cdot h_{\mathbf{\Theta}}(\mathbf{e}_{i,j}),$

where $$h_{\mathbf{\Theta}}$$ denotes a neural network, .i.e. a MLP.

Parameters
• in_channels (int or tuple) – Size of each input sample, or -1 to derive the size from the first input(s) to the forward method. A tuple corresponds to the sizes of source and target dimensionalities.

• out_channels (int) – Size of each output sample.

• nn (torch.nn.Module) – A neural network $$h_{\mathbf{\Theta}}$$ that maps edge features edge_attr of shape [-1, num_edge_features] to shape [-1, in_channels * out_channels], e.g., defined by torch.nn.Sequential.

• aggr (string, optional) – The aggregation scheme to use ("add", "mean", "max"). (default: "add")

• root_weight (bool, optional) – If set to False, the layer will not add the transformed root node features to the output. (default: True)

• bias (bool, optional) – If set to False, the layer will not learn an additive bias. (default: True)

• **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

reset_parameters()[source]
forward(x: Union[torch.Tensor, Tuple[torch.Tensor, Optional[torch.Tensor]]], edge_index: Union[torch.Tensor, torch_sparse.tensor.SparseTensor], edge_attr: Optional[torch.Tensor] = None, size: Optional[Tuple[int, int]] = None)torch.Tensor[source]
ECConv
class CGConv(channels: Union[int, Tuple[int, int]], dim: int = 0, aggr: str = 'add', batch_norm: bool = False, bias: bool = True, **kwargs)[source]

The crystal graph convolutional operator from the “Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties” paper

$\mathbf{x}^{\prime}_i = \mathbf{x}_i + \sum_{j \in \mathcal{N}(i)} \sigma \left( \mathbf{z}_{i,j} \mathbf{W}_f + \mathbf{b}_f \right) \odot g \left( \mathbf{z}_{i,j} \mathbf{W}_s + \mathbf{b}_s \right)$

where $$\mathbf{z}_{i,j} = [ \mathbf{x}_i, \mathbf{x}_j, \mathbf{e}_{i,j} ]$$ denotes the concatenation of central node features, neighboring node features and edge features. In addition, $$\sigma$$ and $$g$$ denote the sigmoid and softplus functions, respectively.

Parameters
• channels (int or tuple) – Size of each input sample. A tuple corresponds to the sizes of source and target dimensionalities.

• dim (int, optional) – Edge feature dimensionality. (default: 0)

• aggr (string, optional) – The aggregation operator to use ("add", "mean", "max"). (default: "add")

• batch_norm (bool, optional) – If set to True, will make use of batch normalization. (default: False)

• bias (bool, optional) – If set to False, the layer will not learn an additive bias. (default: True)

• **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

reset_parameters()[source]
forward(x: Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]], edge_index: Union[torch.Tensor, torch_sparse.tensor.SparseTensor], edge_attr: Optional[torch.Tensor] = None, size: Optional[Tuple[int, int]] = None)torch.Tensor[source]
class EdgeConv(nn: Callable, aggr: str = 'max', **kwargs)[source]

The edge convolutional operator from the “Dynamic Graph CNN for Learning on Point Clouds” paper

$\mathbf{x}^{\prime}_i = \sum_{j \in \mathcal{N}(i)} h_{\mathbf{\Theta}}(\mathbf{x}_i \, \Vert \, \mathbf{x}_j - \mathbf{x}_i),$

where $$h_{\mathbf{\Theta}}$$ denotes a neural network, .i.e. a MLP.

Parameters
• nn (torch.nn.Module) – A neural network $$h_{\mathbf{\Theta}}$$ that maps pair-wise concatenated node features x of shape [-1, 2 * in_channels] to shape [-1, out_channels], e.g., defined by torch.nn.Sequential.

• aggr (string, optional) – The aggregation scheme to use ("add", "mean", "max"). (default: "max")

• **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

reset_parameters()[source]
forward(x: Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]], edge_index: Union[torch.Tensor, torch_sparse.tensor.SparseTensor])torch.Tensor[source]
class DynamicEdgeConv(nn: Callable, k: int, aggr: str = 'max', num_workers: int = 1, **kwargs)[source]

The dynamic edge convolutional operator from the “Dynamic Graph CNN for Learning on Point Clouds” paper (see torch_geometric.nn.conv.EdgeConv), where the graph is dynamically constructed using nearest neighbors in the feature space.

Parameters
• nn (torch.nn.Module) – A neural network $$h_{\mathbf{\Theta}}$$ that maps pair-wise concatenated node features x of shape :obj:[-1, 2 * in_channels] to shape [-1, out_channels], e.g. defined by torch.nn.Sequential.

• k (int) – Number of nearest neighbors.

• aggr (string) – The aggregation operator to use ("add", "mean", "max"). (default: "max")

• num_workers (int) – Number of workers to use for k-NN computation. Has no effect in case batch is not None, or the input lies on the GPU. (default: 1)

• **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

reset_parameters()[source]
forward(x: Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]], batch: Union[torch.Tensor, None, Tuple[torch.Tensor, torch.Tensor]] = None)torch.Tensor[source]
class XConv(in_channels: int, out_channels: int, dim: int, kernel_size: int, hidden_channels: Optional[int] = None, dilation: int = 1, bias: bool = True, num_workers: int = 1)[source]

The convolutional operator on $$\mathcal{X}$$-transformed points from the “PointCNN: Convolution On X-Transformed Points” paper

$\mathbf{x}^{\prime}_i = \mathrm{Conv}\left(\mathbf{K}, \gamma_{\mathbf{\Theta}}(\mathbf{P}_i - \mathbf{p}_i) \times \left( h_\mathbf{\Theta}(\mathbf{P}_i - \mathbf{p}_i) \, \Vert \, \mathbf{x}_i \right) \right),$

where $$\mathbf{K}$$ and $$\mathbf{P}_i$$ denote the trainable filter and neighboring point positions of $$\mathbf{x}_i$$, respectively. $$\gamma_{\mathbf{\Theta}}$$ and $$h_{\mathbf{\Theta}}$$ describe neural networks, i.e. MLPs, where $$h_{\mathbf{\Theta}}$$ individually lifts each point into a higher-dimensional space, and $$\gamma_{\mathbf{\Theta}}$$ computes the $$\mathcal{X}$$- transformation matrix based on all points in a neighborhood.

Parameters
• in_channels (int) – Size of each input sample.

• out_channels (int) – Size of each output sample.

• dim (int) – Point cloud dimensionality.

• kernel_size (int) – Size of the convolving kernel, i.e. number of neighbors including self-loops.

• hidden_channels (int, optional) – Output size of $$h_{\mathbf{\Theta}}$$, i.e. dimensionality of lifted points. If set to None, will be automatically set to in_channels / 4. (default: None)

• dilation (int, optional) – The factor by which the neighborhood is extended, from which kernel_size neighbors are then uniformly sampled. Can be interpreted as the dilation rate of classical convolutional operators. (default: 1)

• bias (bool, optional) – If set to False, the layer will not learn an additive bias. (default: True)

• num_workers (int) – Number of workers to use for k-NN computation. Has no effect in case batch is not None, or the input lies on the GPU. (default: 1)

reset_parameters()[source]
forward(x: torch.Tensor, pos: torch.Tensor, batch: Optional[torch.Tensor] = None)[source]
class PPFConv(local_nn: Optional[Callable] = None, global_nn: Optional[Callable] = None, add_self_loops: bool = True, **kwargs)[source]

The PPFNet operator from the “PPFNet: Global Context Aware Local Features for Robust 3D Point Matching” paper

$\mathbf{x}^{\prime}_i = \gamma_{\mathbf{\Theta}} \left( \max_{j \in \mathcal{N}(i) \cup \{ i \}} h_{\mathbf{\Theta}} ( \mathbf{x}_j, \| \mathbf{d_{j,i}} \|, \angle(\mathbf{n}_i, \mathbf{d_{j,i}}), \angle(\mathbf{n}_j, \mathbf{d_{j,i}}), \angle(\mathbf{n}_i, \mathbf{n}_j) \right)$

where $$\gamma_{\mathbf{\Theta}}$$ and $$h_{\mathbf{\Theta}}$$ denote neural networks, .i.e. MLPs, which takes in node features and torch_geometric.transforms.PointPairFeatures.

Parameters
• local_nn (torch.nn.Module, optional) – A neural network $$h_{\mathbf{\Theta}}$$ that maps node features x and relative spatial coordinates pos_j - pos_i of shape [-1, in_channels + num_dimensions] to shape [-1, out_channels], e.g., defined by torch.nn.Sequential. (default: None)

• global_nn (torch.nn.Module, optional) – A neural network $$\gamma_{\mathbf{\Theta}}$$ that maps aggregated node features of shape [-1, out_channels] to shape [-1, final_out_channels], e.g., defined by torch.nn.Sequential. (default: None)

• add_self_loops (bool, optional) – If set to False, will not add self-loops to the input graph. (default: True)

• **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

reset_parameters()[source]
forward(x: Union[torch.Tensor, None, Tuple[Optional[torch.Tensor], Optional[torch.Tensor]]], pos: Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]], normal: Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]], edge_index: Union[torch.Tensor, torch_sparse.tensor.SparseTensor])torch.Tensor[source]
class FeaStConv(in_channels: int, out_channels: int, heads: int = 1, add_self_loops: bool = True, bias: bool = True, **kwargs)[source]

The (translation-invariant) feature-steered convolutional operator from the “FeaStNet: Feature-Steered Graph Convolutions for 3D Shape Analysis” paper

$\mathbf{x}^{\prime}_i = \frac{1}{|\mathcal{N}(i)|} \sum_{j \in \mathcal{N}(i)} \sum_{h=1}^H q_h(\mathbf{x}_i, \mathbf{x}_j) \mathbf{W}_h \mathbf{x}_j$

with $$q_h(\mathbf{x}_i, \mathbf{x}_j) = \mathrm{softmax}_j (\mathbf{u}_h^{\top} (\mathbf{x}_j - \mathbf{x}_i) + c_h)$$, where $$H$$ denotes the number of attention heads, and $$\mathbf{W}_h$$, $$\mathbf{u}_h$$ and $$c_h$$ are trainable parameters.

Parameters
• in_channels (int) – Size of each input sample, or -1 to derive the size from the first input(s) to the forward method.

• out_channels (int) – Size of each output sample.

• heads (int, optional) – Number of attention heads $$H$$. (default: 1)

• add_self_loops (bool, optional) – If set to False, will not add self-loops to the input graph. (default: True)

• bias (bool, optional) – If set to False, the layer will not learn an additive bias. (default: True)

• **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

reset_parameters()[source]
forward(x: Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]], edge_index: Union[torch.Tensor, torch_sparse.tensor.SparseTensor])torch.Tensor[source]
class HypergraphConv(in_channels, out_channels, use_attention=False, heads=1, concat=True, negative_slope=0.2, dropout=0, bias=True, **kwargs)[source]

The hypergraph convolutional operator from the “Hypergraph Convolution and Hypergraph Attention” paper

$\mathbf{X}^{\prime} = \mathbf{D}^{-1} \mathbf{H} \mathbf{W} \mathbf{B}^{-1} \mathbf{H}^{\top} \mathbf{X} \mathbf{\Theta}$

where $$\mathbf{H} \in {\{ 0, 1 \}}^{N \times M}$$ is the incidence matrix, $$\mathbf{W} \in \mathbb{R}^M$$ is the diagonal hyperedge weight matrix, and $$\mathbf{D}$$ and $$\mathbf{B}$$ are the corresponding degree matrices.

For example, in the hypergraph scenario $$\mathcal{G} = (\mathcal{V}, \mathcal{E})$$ with $$\mathcal{V} = \{ 0, 1, 2, 3 \}$$ and $$\mathcal{E} = \{ \{ 0, 1, 2 \}, \{ 1, 2, 3 \} \}$$, the hyperedge_index is represented as:

hyperedge_index = torch.tensor([
[0, 1, 2, 1, 2, 3],
[0, 0, 0, 1, 1, 1],
])

Parameters
• in_channels (int) – Size of each input sample, or -1 to derive the size from the first input(s) to the forward method.

• out_channels (int) – Size of each output sample.

• use_attention (bool, optional) – If set to True, attention will be added to this layer. (default: False)

• heads (int, optional) – Number of multi-head-attentions. (default: 1)

• concat (bool, optional) – If set to False, the multi-head attentions are averaged instead of concatenated. (default: True)

• negative_slope (float, optional) – LeakyReLU angle of the negative slope. (default: 0.2)

• dropout (float, optional) – Dropout probability of the normalized attention coefficients which exposes each node to a stochastically sampled neighborhood during training. (default: 0)

• bias (bool, optional) – If set to False, the layer will not learn an additive bias. (default: True)

• **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

reset_parameters()[source]
forward(x: torch.Tensor, hyperedge_index: torch.Tensor, hyperedge_weight: Optional[torch.Tensor] = None, hyperedge_attr: Optional[torch.Tensor] = None)torch.Tensor[source]
Parameters
• x (Tensor) – Node feature matrix $$\mathbf{X} \in \mathbb{R}^{N \times F}$$.

• hyperedge_index (LongTensor) – The hyperedge indices, i.e. the sparse incidence matrix $$\mathbf{H} \in {\{ 0, 1 \}}^{N \times M}$$ mapping from nodes to edges.

• hyperedge_weight (Tensor, optional) – Hyperedge weights $$\mathbf{W} \in \mathbb{R}^M$$. (default: None)

• hyperedge_attr (Tensor, optional) – Hyperedge feature matrix in $$\mathbb{R}^{M \times F}$$. These features only need to get passed in case use_attention=True. (default: None)

class LEConv(in_channels: Union[int, Tuple[int, int]], out_channels: int, bias: bool = True, **kwargs)[source]

The local extremum graph neural network operator from the “ASAP: Adaptive Structure Aware Pooling for Learning Hierarchical Graph Representations” paper, which finds the importance of nodes with respect to their neighbors using the difference operator:

$\mathbf{x}^{\prime}_i = \mathbf{x}_i \cdot \mathbf{\Theta}_1 + \sum_{j \in \mathcal{N}(i)} e_{j,i} \cdot (\mathbf{\Theta}_2 \mathbf{x}_i - \mathbf{\Theta}_3 \mathbf{x}_j)$

where $$e_{j,i}$$ denotes the edge weight from source node j to target node i (default: 1)

Parameters
• in_channels (int or tuple) – Size of each input sample, or -1 to derive the size from the first input(s) to the forward method. A tuple corresponds to the sizes of source and target dimensionalities.

• out_channels (int) – Size of each output sample.

• bias (bool, optional) – If set to False, the layer will not learn an additive bias. (default: True).

• **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

reset_parameters()[source]
forward(x: Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]], edge_index: Union[torch.Tensor, torch_sparse.tensor.SparseTensor], edge_weight: Optional[torch.Tensor] = None)torch.Tensor[source]
class PNAConv(in_channels: int, out_channels: int, aggregators: List[str], scalers: List[str], deg: torch.Tensor, edge_dim: Optional[int] = None, towers: int = 1, pre_layers: int = 1, post_layers: int = 1, divide_input: bool = False, **kwargs)[source]

The Principal Neighbourhood Aggregation graph convolution operator from the “Principal Neighbourhood Aggregation for Graph Nets” paper

$\mathbf{x}_i^{\prime} = \gamma_{\mathbf{\Theta}} \left( \mathbf{x}_i, \underset{j \in \mathcal{N}(i)}{\bigoplus} h_{\mathbf{\Theta}} \left( \mathbf{x}_i, \mathbf{x}_j \right) \right)$

with

$\begin{split}\bigoplus = \underbrace{\begin{bmatrix} 1 \\ S(\mathbf{D}, \alpha=1) \\ S(\mathbf{D}, \alpha=-1) \end{bmatrix} }_{\text{scalers}} \otimes \underbrace{\begin{bmatrix} \mu \\ \sigma \\ \max \\ \min \end{bmatrix}}_{\text{aggregators}},\end{split}$

where $$\gamma_{\mathbf{\Theta}}$$ and $$h_{\mathbf{\Theta}}$$ denote MLPs.

Note

For an example of using PNAConv, see examples/pna.py.

Parameters
• in_channels (int) – Size of each input sample, or -1 to derive the size from the first input(s) to the forward method.

• out_channels (int) – Size of each output sample.

• aggregators (list of str) – Set of aggregation function identifiers, namely "sum", "mean", "min", "max", "var" and "std".

• scalers – (list of str): Set of scaling function identifiers, namely "identity", "amplification", "attenuation", "linear" and "inverse_linear".

• deg (Tensor) – Histogram of in-degrees of nodes in the training set, used by scalers to normalize.

• edge_dim (int, optional) – Edge feature dimensionality (in case there are any). (default None)

• towers (int, optional) – Number of towers (default: 1).

• pre_layers (int, optional) – Number of transformation layers before aggregation (default: 1).

• post_layers (int, optional) – Number of transformation layers after aggregation (default: 1).

• divide_input (bool, optional) – Whether the input features should be split between towers or not (default: False).

• **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

reset_parameters()[source]
forward(x: torch.Tensor, edge_index: Union[torch.Tensor, torch_sparse.tensor.SparseTensor], edge_attr: Optional[torch.Tensor] = None)torch.Tensor[source]
class ClusterGCNConv(in_channels: int, out_channels: int, diag_lambda: float = 0.0, add_self_loops: bool = True, bias: bool = True, **kwargs)[source]

The ClusterGCN graph convolutional operator from the “Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks” paper

$\mathbf{X}^{\prime} = \left( \mathbf{\hat{A}} + \lambda \cdot \textrm{diag}(\mathbf{\hat{A}}) \right) \mathbf{X} \mathbf{W}_1 + \mathbf{X} \mathbf{W}_2$

where $$\mathbf{\hat{A}} = {(\mathbf{D} + \mathbf{I})}^{-1}(\mathbf{A} + \mathbf{I})$$.

Parameters
• in_channels (int) – Size of each input sample, or -1 to derive the size from the first input(s) to the forward method.

• out_channels (int) – Size of each output sample.

• diag_lambda – (float, optional): Diagonal enhancement value $$\lambda$$. (default: 0.)

• add_self_loops (bool, optional) – If set to False, will not add self-loops to the input graph. (default: True)

• bias (bool, optional) – If set to False, the layer will not learn an additive bias. (default: True)

• **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

reset_parameters()[source]
forward(x: torch.Tensor, edge_index: Union[torch.Tensor, torch_sparse.tensor.SparseTensor], size: Optional[Tuple[int, int]] = None)torch.Tensor[source]
class GENConv(in_channels: int, out_channels: int, aggr: str = 'softmax', t: float = 1.0, learn_t: bool = False, p: float = 1.0, learn_p: bool = False, msg_norm: bool = False, learn_msg_scale: bool = False, norm: str = 'batch', num_layers: int = 2, eps: float = 1e-07, **kwargs)[source]

The GENeralized Graph Convolution (GENConv) from the “DeeperGCN: All You Need to Train Deeper GCNs” paper. Supports SoftMax & PowerMean aggregation. The message construction is:

$\mathbf{x}_i^{\prime} = \mathrm{MLP} \left( \mathbf{x}_i + \mathrm{AGG} \left( \left\{ \mathrm{ReLU} \left( \mathbf{x}_j + \mathbf{e_{ji}} \right) +\epsilon : j \in \mathcal{N}(i) \right\} \right) \right)$

Note

For an example of using GENConv, see examples/ogbn_proteins_deepgcn.py.

Parameters
• in_channels (int or tuple) – Size of each input sample, or -1 to derive the size from the first input(s) to the forward method. A tuple corresponds to the sizes of source and target dimensionalities.

• out_channels (int) – Size of each output sample.

• aggr (str, optional) – The aggregation scheme to use ("softmax", "softmax_sg", "power", "add", "mean", max). (default: "softmax")

• t (float, optional) – Initial inverse temperature for softmax aggregation. (default: 1.0)

• learn_t (bool, optional) – If set to True, will learn the value t for softmax aggregation dynamically. (default: False)

• p (float, optional) – Initial power for power mean aggregation. (default: 1.0)

• learn_p (bool, optional) – If set to True, will learn the value p for power mean aggregation dynamically. (default: False)

• msg_norm (bool, optional) – If set to True, will use message normalization. (default: False)

• learn_msg_scale (bool, optional) – If set to True, will learn the scaling factor of message normalization. (default: False)

• norm (str, optional) – Norm layer of MLP layers ("batch", "layer", "instance") (default: batch)

• num_layers (int, optional) – The number of MLP layers. (default: 2)

• eps (float, optional) – The epsilon value of the message construction function. (default: 1e-7)

• **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.GenMessagePassing.

reset_parameters()[source]
forward(x: Union[torch.Tensor, Tuple[torch.Tensor, Optional[torch.Tensor]]], edge_index: Union[torch.Tensor, torch_sparse.tensor.SparseTensor], edge_attr: Optional[torch.Tensor] = None, size: Optional[Tuple[int, int]] = None)torch.Tensor[source]
class GCN2Conv(channels: int, alpha: float, theta: Optional[float] = None, layer: Optional[int] = None, shared_weights: bool = True, cached: bool = False, add_self_loops: bool = True, normalize: bool = True, **kwargs)[source]

The graph convolutional operator with initial residual connections and identity mapping (GCNII) from the “Simple and Deep Graph Convolutional Networks” paper

$\mathbf{X}^{\prime} = \left( (1 - \alpha) \mathbf{\hat{P}}\mathbf{X} + \alpha \mathbf{X^{(0)}}\right) \left( (1 - \beta) \mathbf{I} + \beta \mathbf{\Theta} \right)$

with $$\mathbf{\hat{P}} = \mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}} \mathbf{\hat{D}}^{-1/2}$$, where $$\mathbf{\hat{A}} = \mathbf{A} + \mathbf{I}$$ denotes the adjacency matrix with inserted self-loops and $$\hat{D}_{ii} = \sum_{j=0} \hat{A}_{ij}$$ its diagonal degree matrix, and $$\mathbf{X}^{(0)}$$ being the initial feature representation. Here, $$\alpha$$ models the strength of the initial residual connection, while $$\beta$$ models the strength of the identity mapping. The adjacency matrix can include other values than 1 representing edge weights via the optional edge_weight tensor.

Parameters
• channels (int) – Size of each input and output sample.

• alpha (float) – The strength of the initial residual connection $$\alpha$$.

• theta (float, optional) – The hyperparameter $$\theta$$ to compute the strength of the identity mapping $$\beta = \log \left( \frac{\theta}{\ell} + 1 \right)$$. (default: None)

• layer (int, optional) – The layer $$\ell$$ in which this module is executed. (default: None)

• shared_weights (bool, optional) – If set to False, will use different weight matrices for the smoothed representation and the initial residual (“GCNII*”). (default: True)

• cached (bool, optional) – If set to True, the layer will cache the computation of $$\mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}} \mathbf{\hat{D}}^{-1/2}$$ on first execution, and will use the cached version for further executions. This parameter should only be set to True in transductive learning scenarios. (default: False)

• normalize (bool, optional) – Whether to add self-loops and apply symmetric normalization. (default: True)

• add_self_loops (bool, optional) – If set to False, will not add self-loops to the input graph. (default: True)

• **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

reset_parameters()[source]
forward(x: torch.Tensor, x_0: torch.Tensor, edge_index: Union[torch.Tensor, torch_sparse.tensor.SparseTensor], edge_weight: Optional[torch.Tensor] = None)torch.Tensor[source]
class PANConv(in_channels: int, out_channels: int, filter_size: int, **kwargs)[source]

The path integral based convolutional operator from the “Path Integral Based Convolution and Pooling for Graph Neural Networks” paper

$\mathbf{X}^{\prime} = \mathbf{M} \mathbf{X} \mathbf{W}$

where $$\mathbf{M}$$ denotes the normalized and learned maximal entropy transition (MET) matrix that includes neighbors up to filter_size hops:

$\mathbf{M} = \mathbf{Z}^{-1/2} \sum_{n=0}^L e^{-\frac{E(n)}{T}} \mathbf{A}^n \mathbf{Z}^{-1/2}$
Parameters
• in_channels (int) – Size of each input sample, or -1 to derive the size from the first input(s) to the forward method.

• out_channels (int) – Size of each output sample.

• filter_size (int) – The filter size $$L$$.

• **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

reset_parameters()[source]
forward(x: torch.Tensor, edge_index: Union[torch.Tensor, torch_sparse.tensor.SparseTensor])Tuple[torch.Tensor, torch_sparse.tensor.SparseTensor][source]
panentropy(adj_t: torch_sparse.tensor.SparseTensor, dtype: Optional[int] = None)torch_sparse.tensor.SparseTensor[source]
class WLConv[source]

The Weisfeiler Lehman operator from the “A Reduction of a Graph to a Canonical Form and an Algebra Arising During this Reduction” paper, which iteratively refines node colorings:

$\mathbf{x}^{\prime}_i = \textrm{hash} \left( \mathbf{x}_i, \{ \mathbf{x}_j \colon j \in \mathcal{N}(i) \} \right)$
reset_parameters()[source]
forward(x: torch.Tensor, edge_index: Union[torch.Tensor, torch_sparse.tensor.SparseTensor])torch.Tensor[source]
histogram(x: torch.Tensor, batch: Optional[torch.Tensor] = None, norm: bool = False)torch.Tensor[source]

Given a node coloring x, computes the color histograms of the respective graphs (separated by batch).

class FiLMConv(in_channels: Union[int, Tuple[int, int]], out_channels: int, num_relations: int = 1, nn: Optional[Callable] = None, act: Optional[Callable] = ReLU(), aggr: str = 'mean', **kwargs)[source]

The FiLM graph convolutional operator from the “GNN-FiLM: Graph Neural Networks with Feature-wise Linear Modulation” paper

$\mathbf{x}^{\prime}_i = \sum_{r \in \mathcal{R}} \sum_{j \in \mathcal{N}(i)} \sigma \left( \boldsymbol{\gamma}_{r,i} \odot \mathbf{W}_r \mathbf{x}_j + \boldsymbol{\beta}_{r,i} \right)$

where $$\boldsymbol{\beta}_{r,i}, \boldsymbol{\gamma}_{r,i} = g(\mathbf{x}_i)$$ with $$g$$ being a single linear layer by default. Self-loops are automatically added to the input graph and represented as its own relation type.

Note

For an example of using FiLM, see examples/gcn.py.

Parameters
• in_channels (int or tuple) – Size of each input sample, or -1 to derive the size from the first input(s) to the forward method. A tuple corresponds to the sizes of source and target dimensionalities.

• out_channels (int) – Size of each output sample.

• num_relations (int, optional) – Number of relations. (default: 1)

• nn (torch.nn.Module, optional) – The neural network $$g$$ that maps node features x_i of shape [-1, in_channels] to shape [-1, 2 * out_channels]. If set to None, $$g$$ will be implemented as a single linear layer. (default: None)

• act (callable, optional) – Activation function $$\sigma$$. (default: torch.nn.ReLU())

• aggr (string, optional) – The aggregation scheme to use ("add", "mean", "max"). (default: "mean")

• **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

reset_parameters()[source]
forward(x: Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]], edge_index: Union[torch.Tensor, torch_sparse.tensor.SparseTensor], edge_type: Optional[torch.Tensor] = None)torch.Tensor[source]
class SuperGATConv(in_channels: int, out_channels: int, heads: int = 1, concat: bool = True, negative_slope: float = 0.2, dropout: float = 0.0, add_self_loops: bool = True, bias: bool = True, attention_type: str = 'MX', neg_sample_ratio: float = 0.5, edge_sample_ratio: float = 1.0, is_undirected: bool = False, **kwargs)[source]

The self-supervised graph attentional operator from the “How to Find Your Friendly Neighborhood: Graph Attention Design with Self-Supervision” paper

$\mathbf{x}^{\prime}_i = \alpha_{i,i}\mathbf{\Theta}\mathbf{x}_{i} + \sum_{j \in \mathcal{N}(i)} \alpha_{i,j}\mathbf{\Theta}\mathbf{x}_{j},$

where the two types of attention $$\alpha_{i,j}^{\mathrm{MX\ or\ SD}}$$ are computed as:

\begin{align}\begin{aligned}\alpha_{i,j}^{\mathrm{MX\ or\ SD}} &= \frac{ \exp\left(\mathrm{LeakyReLU}\left( e_{i,j}^{\mathrm{MX\ or\ SD}} \right)\right)} {\sum_{k \in \mathcal{N}(i) \cup \{ i \}} \exp\left(\mathrm{LeakyReLU}\left( e_{i,k}^{\mathrm{MX\ or\ SD}} \right)\right)}\\e_{i,j}^{\mathrm{MX}} &= \mathbf{a}^{\top} [\mathbf{\Theta}\mathbf{x}_i \, \Vert \, \mathbf{\Theta}\mathbf{x}_j] \cdot \sigma \left( \left( \mathbf{\Theta}\mathbf{x}_i \right)^{\top} \mathbf{\Theta}\mathbf{x}_j \right)\\e_{i,j}^{\mathrm{SD}} &= \frac{ \left( \mathbf{\Theta}\mathbf{x}_i \right)^{\top} \mathbf{\Theta}\mathbf{x}_j }{ \sqrt{d} }\end{aligned}\end{align}

The self-supervised task is a link prediction using the attention values as input to predict the likelihood $$\phi_{i,j}^{\mathrm{MX\ or\ SD}}$$ that an edge exists between nodes:

\begin{align}\begin{aligned}\phi_{i,j}^{\mathrm{MX}} &= \sigma \left( \left( \mathbf{\Theta}\mathbf{x}_i \right)^{\top} \mathbf{\Theta}\mathbf{x}_j \right)\\\phi_{i,j}^{\mathrm{SD}} &= \sigma \left( \frac{ \left( \mathbf{\Theta}\mathbf{x}_i \right)^{\top} \mathbf{\Theta}\mathbf{x}_j }{ \sqrt{d} } \right)\end{aligned}\end{align}

Note

For an example of using SuperGAT, see examples/super_gat.py.

Parameters
• in_channels (int) – Size of each input sample, or -1 to derive the size from the first input(s) to the forward method.

• out_channels (int) – Size of each output sample.

• heads (int, optional) – Number of multi-head-attentions. (default: 1)

• concat (bool, optional) – If set to False, the multi-head attentions are averaged instead of concatenated. (default: True)

• negative_slope (float, optional) – LeakyReLU angle of the negative slope. (default: 0.2)

• dropout (float, optional) – Dropout probability of the normalized attention coefficients which exposes each node to a stochastically sampled neighborhood during training. (default: 0)

• add_self_loops (bool, optional) – If set to False, will not add self-loops to the input graph. (default: True)

• bias (bool, optional) – If set to False, the layer will not learn an additive bias. (default: True)

• attention_type (string, optional) – Type of attention to use. ('MX', 'SD'). (default: 'MX')

• neg_sample_ratio (float, optional) – The ratio of the number of sampled negative edges to the number of positive edges. (default: 0.5)

• edge_sample_ratio (float, optional) – The ratio of samples to use for training among the number of training edges. (default: 1.0)

• is_undirected (bool, optional) – Whether the input graph is undirected. If not given, will be automatically computed with the input graph when negative sampling is performed. (default: False)

• **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

att_x: Optional[torch.Tensor]
att_y: Optional[torch.Tensor]
reset_parameters()[source]
forward(x: torch.Tensor, edge_index: torch.Tensor, neg_edge_index: Optional[torch.Tensor] = None, batch: Optional[torch.Tensor] = None)torch.Tensor[source]
Parameters

neg_edge_index (Tensor, optional) – The negative edges to train against. If not given, uses negative sampling to calculate negative edges. (default: None)

negative_sampling(edge_index: torch.Tensor, num_nodes: int, batch: Optional[torch.Tensor] = None)torch.Tensor[source]
positive_sampling(edge_index: torch.Tensor)torch.Tensor[source]
get_attention(edge_index_i: torch.Tensor, x_i: torch.Tensor, x_j: torch.Tensor, num_nodes: Optional[int], return_logits: bool = False)torch.Tensor[source]
get_attention_loss()torch.Tensor[source]

Compute the self-supervised graph attention loss.

class FAConv(channels: int, eps: float = 0.1, dropout: float = 0.0, cached: bool = False, add_self_loops: bool = True, normalize: bool = True, **kwargs)[source]

The Frequency Adaptive Graph Convolution operator from the “Beyond Low-Frequency Information in Graph Convolutional Networks” paper

$\mathbf{x}^{\prime}_i= \epsilon \cdot \mathbf{x}^{(0)}_i + \sum_{j \in \mathcal{N}(i)} \frac{\alpha_{i,j}}{\sqrt{d_i d_j}} \mathbf{x}_{j}$

where $$\mathbf{x}^{(0)}_i$$ and $$d_i$$ denote the initial feature representation and node degree of node $$i$$, respectively. The attention coefficients $$\alpha_{i,j}$$ are computed as

$\mathbf{\alpha}_{i,j} = \textrm{tanh}(\mathbf{a}^{\top}[\mathbf{x}_i, \mathbf{x}_j])$

based on the trainable parameter vector $$\mathbf{a}$$.

Parameters
• channels (int) – Size of each input sample, or -1 to derive the size from the first input(s) to the forward method.

• eps (float, optional) – $$\epsilon$$-value. (default: 0.1)

• dropout (float, optional) – Dropout probability of the normalized coefficients which exposes each node to a stochastically sampled neighborhood during training. (default: 0).

• cached (bool, optional) – If set to True, the layer will cache the computation of $$\sqrt{d_i d_j}$$ on first execution, and will use the cached version for further executions. This parameter should only be set to True in transductive learning scenarios. (default: False)

• add_self_loops (bool, optional) – If set to False, will not add self-loops to the input graph. (default: True)

• normalize (bool, optional) – Whether to add self-loops (if add_self_loops is True) and compute symmetric normalization coefficients on the fly. If set to False, edge_weight needs to be provided in the layer’s forward() method. (default: True)

• **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

reset_parameters()[source]
forward(x: torch.Tensor, x_0: torch.Tensor, edge_index: Union[torch.Tensor, torch_sparse.tensor.SparseTensor], edge_weight: Optional[torch.Tensor] = None, return_attention_weights=None)[source]
Parameters

return_attention_weights (bool, optional) – If set to True, will additionally return the tuple (edge_index, attention_weights), holding the computed attention weights for each edge. (default: None)

class EGConv(in_channels: int, out_channels: int, aggregators: List[str] = ['symnorm'], num_heads: int = 8, num_bases: int = 4, cached: bool = False, add_self_loops: bool = True, bias: bool = True, **kwargs)[source]

The Efficient Graph Convolution from the “Adaptive Filters and Aggregator Fusion for Efficient Graph Convolutions” paper.

Its node-wise formulation is given by:

$\mathbf{x}_i^{\prime} = {\LARGE ||}_{h=1}^H \sum_{\oplus \in \mathcal{A}} \sum_{b = 1}^B w_{i, h, \oplus, b} \; \underset{j \in \mathcal{N}(i) \cup \{i\}}{\bigoplus} \mathbf{\Theta}_b \mathbf{x}_{j}$

with $$\mathbf{\Theta}_b$$ denoting a basis weight, $$\oplus$$ denoting an aggregator, and $$w$$ denoting per-vertex weighting coefficients across different heads, bases and aggregators.

EGC retains $$\mathcal{O}(|\mathcal{V}|)$$ memory usage, making it a sensible alternative to GCNConv, SAGEConv or GINConv.

Note

For an example of using EGConv, see examples/egc.py.

Parameters
• in_channels (int) – Size of each input sample, or -1 to derive the size from the first input(s) to the forward method.

• out_channels (int) – Size of each output sample.

• aggregators (List[str], optional) – Aggregators to be used. Supported aggregators are "sum", "mean", "symnorm", "max", "min", "std", "var". Multiple aggregators can be used to improve the performance. (default: ["symnorm"])

• num_heads (int, optional) – Number of heads $$H$$ to use. Must have out_channels % num_heads == 0. It is recommended to set num_heads >= num_bases. (default: 8)

• num_bases (int, optional) – Number of basis weights $$B$$ to use. (default: 4)

• cached (bool, optional) – If set to True, the layer will cache the computation of the edge index with added self loops on first execution, along with caching the calculation of the symmetric normalized edge weights if the "symnorm" aggregator is being used. This parameter should only be set to True in transductive learning scenarios. (default: False)

• add_self_loops (bool, optional) – If set to False, will not add self-loops to the input graph. (default: True)

• bias (bool, optional) – If set to False, the layer will not learn an additive bias. (default: True)

• **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

reset_parameters()[source]
forward(x: torch.Tensor, edge_index: Union[torch.Tensor, torch_sparse.tensor.SparseTensor])torch.Tensor[source]
class PDNConv(in_channels: int, out_channels: int, edge_dim: int, hidden_channels: int, add_self_loops: bool = True, normalize: bool = True, bias: bool = True, **kwargs)[source]

The pathfinder discovery network convolutional operator from the “Pathfinder Discovery Networks for Neural Message Passing” paper

$\mathbf{x}^{\prime}_i = \sum_{j \in \mathcal{N}(v) \cup \{i\}}f_{\Theta}(\textbf{e}_{(j,i)}) \cdot f_{\Omega}(\mathbf{x}_{j})$

where $$z_{i,j}$$ denotes the edge feature vector from source node $$j$$ to target node $$i$$, and $$\mathbf{x}_{j}$$ denotes the node feature vector of node $$j$$.

Parameters
• in_channels (int) – Size of each input sample.

• out_channels (int) – Size of each output sample.

• edge_dim (int) – Edge feature dimensionality.

• hidden_channels (int) – Hidden edge feature dimensionality.

• add_self_loops (bool, optional) – If set to False, will not add self-loops to the input graph. (default: True)

• normalize (bool, optional) – Whether to add self-loops and compute symmetric normalization coefficients on the fly. (default: True)

• bias (bool, optional) – If set to False, the layer will not learn an additive bias. (default: True)

• **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

reset_parameters()[source]
forward(x: torch.Tensor, edge_index: Union[torch.Tensor, torch_sparse.tensor.SparseTensor], edge_attr: Optional[torch.Tensor] = None)torch.Tensor[source]
class GeneralConv(in_channels: Union[int, Tuple[int, int]], out_channels: Optional[int], in_edge_channels: Optional[int] = None, aggr: str = 'add', skip_linear: str = False, directed_msg: bool = True, heads: int = 1, attention: bool = False, attention_type: str = 'additive', l2_normalize: bool = False, bias: bool = True, **kwargs)[source]

A general GNN layer adapted from the “Design Space for Graph Neural Networks” paper.

Parameters
• in_channels (int or tuple) – Size of each input sample, or -1 to derive the size from the first input(s) to the forward method. A tuple corresponds to the sizes of source and target dimensionalities.

• out_channels (int) – Size of each output sample.

• in_edge_channels (int, optional) – Size of each input edge. (default: None)

• aggr (string, optional) – The aggregation scheme to use ("add", "mean", "max"). (default: "mean")

• skip_linear (bool, optional) – Whether apply linear function in skip connection. (default: False)

• directed_msg (bool, optional) – If message passing is directed; otherwise, message passing is bi-directed. (default: True)

• heads (int, optional) – Number of message passing ensembles. If heads > 1, the GNN layer will output an ensemble of multiple messages. If attention is used (attention=True), this corresponds to multi-head attention. (default: 1)

• attention (bool, optional) – Whether to add attention to message computation. (default: False)

• attention_type (str, optional) – Type of attention: "additive", "dot_product". (default: "additive")

• l2_normalize (bool, optional) – If set to True, output features will be $$\ell_2$$-normalized, i.e., $$\frac{\mathbf{x}^{\prime}_i} {\| \mathbf{x}^{\prime}_i \|_2}$$. (default: False)

• bias (bool, optional) – If set to False, the layer will not learn an additive bias. (default: True)

• **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

reset_parameters()[source]
forward(x: Union[torch.Tensor, Tuple[torch.Tensor, Optional[torch.Tensor]]], edge_index: Union[torch.Tensor, torch_sparse.tensor.SparseTensor], size: Optional[Tuple[int, int]] = None, edge_feature: Optional[torch.Tensor] = None)torch.Tensor[source]
message_basic(x_i: torch.Tensor, x_j: torch.Tensor, edge_feature: torch.Tensor)[source]
class HGTConv(in_channels: Union[int, Dict[str, int]], out_channels: int, metadata: Tuple[List[str], List[Tuple[str, str, str]]], heads: int = 1, group: str = 'sum', **kwargs)[source]

The Heterogeneous Graph Transformer (HGT) operator from the “Heterogeneous Graph Transformer” paper.

Note

For an example of using HGT, see examples/hetero/hgt_dblp.py.

Parameters
• in_channels (int or Dict[str, int]) – Size of each input sample of every node type, or -1 to derive the size from the first input(s) to the forward method.

• out_channels (int) – Size of each output sample.

• metadata (Tuple[List[str], List[Tuple[str, str, str]]]) – The metadata of the heterogeneous graph, i.e. its node and edge types given by a list of strings and a list of string triplets, respectively. See torch_geometric.data.HeteroData.metadata() for more information.

• heads (int, optional) – Number of multi-head-attentions. (default: 1)

• group (string, optional) – The aggregation scheme to use for grouping node embeddings generated by different relations. ("sum", "mean", "min", "max"). (default: "sum")

• **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

reset_parameters()[source]
forward(x_dict: Dict[str, torch.Tensor], edge_index_dict: Union[Dict[Tuple[str, str, str], torch.Tensor], Dict[Tuple[str, str, str], torch_sparse.tensor.SparseTensor]])Dict[str, Optional[torch.Tensor]][source]
Parameters
• x_dict (Dict[str, Tensor]) – A dictionary holding input node features for each individual node type.

• edge_index_dict – (Dict[str, Union[Tensor, SparseTensor]]): A dictionary holding graph connectivity information for each individual edge type, either as a torch.LongTensor of shape [2, num_edges] or a torch_sparse.SparseTensor.

Return type

Dict[str, Optional[Tensor]] - The ouput node embeddings for each node type. In case a node type does not receive any message, its output will be set to None.

class HeteroConv(convs: Dict[Tuple[str, str, str], torch.nn.modules.module.Module], aggr: str = 'sum')[source]

A generic wrapper for computing graph convolution on heterogeneous graphs. This layer will pass messages from source nodes to target nodes based on the bipartite GNN layer given for a specific edge type. If multiple relations point to the same destination, their results will be aggregated according to aggr. In comparison to torch_geometric.nn.to_hetero(), this layer is especially useful if you want to apply different message passing modules for different edge types.

hetero_conv = HeteroConv({
('paper', 'cites', 'paper'): GCNConv(-1, 64),
('author', 'writes', 'paper'): SAGEConv((-1, -1), 64),
('paper', 'written_by', 'author'): GATConv((-1, -1), 64),
}, aggr='sum')

out_dict = hetero_conv(x_dict, edge_index_dict)

print(list(out_dict.keys()))
>>> ['paper', 'author']

Parameters
• convs (Dict[Tuple[str, str, str], Module]) – A dictionary holding a bipartite MessagePassing layer for each individual edge type.

• aggr (string, optional) – The aggregation scheme to use for grouping node embeddings generated by different relations. ("sum", "mean", "min", "max"). (default: "sum")

reset_parameters()[source]
forward(x_dict: Dict[str, torch.Tensor], edge_index_dict: Dict[Tuple[str, str, str], Union[torch.Tensor, torch_sparse.tensor.SparseTensor]], **kwargs_dict)Dict[str, torch.Tensor][source]
Parameters
• x_dict (Dict[str, Tensor]) – A dictionary holding node feature information for each individual node type.

• edge_index_dict (Dict[Tuple[str, str, str], Tensor]) – A dictionary holding graph connectivity information for each individual edge type.

• **kwargs_dict (optional) – Additional forward arguments of individual torch_geometric.nn.conv.MessagePassing layers. For example, if a specific GNN layer at edge type edge_type expects edge attributes edge_attr as a forward argument, then you can pass them to forward() via edge_attr_dict = { edge_type: edge_attr }.

class MetaLayer(edge_model=None, node_model=None, global_model=None)[source]

A meta layer for building any kind of graph network, inspired by the “Relational Inductive Biases, Deep Learning, and Graph Networks” paper.

A graph network takes a graph as input and returns an updated graph as output (with same connectivity). The input graph has node features x, edge features edge_attr as well as global-level features u. The output graph has the same structure, but updated features.

Edge features, node features as well as global features are updated by calling the modules edge_model, node_model and global_model, respectively.

To allow for batch-wise graph processing, all callable functions take an additional argument batch, which determines the assignment of edges or nodes to their specific graphs.

Parameters
• edge_model (Module, optional) – A callable which updates a graph’s edge features based on its source and target node features, its current edge features and its global features. (default: None)

• node_model (Module, optional) – A callable which updates a graph’s node features based on its current node features, its graph connectivity, its edge features and its global features. (default: None)

• global_model (Module, optional) – A callable which updates a graph’s global features based on its node features, its graph connectivity, its edge features and its current global features.

from torch.nn import Sequential as Seq, Linear as Lin, ReLU
from torch_scatter import scatter_mean
from torch_geometric.nn import MetaLayer

class EdgeModel(torch.nn.Module):
def __init__(self):
super(EdgeModel, self).__init__()
self.edge_mlp = Seq(Lin(..., ...), ReLU(), Lin(..., ...))

def forward(self, src, dest, edge_attr, u, batch):
# src, dest: [E, F_x], where E is the number of edges.
# edge_attr: [E, F_e]
# u: [B, F_u], where B is the number of graphs.
# batch: [E] with max entry B - 1.
out = torch.cat([src, dest, edge_attr, u[batch]], 1)
return self.edge_mlp(out)

class NodeModel(torch.nn.Module):
def __init__(self):
super(NodeModel, self).__init__()
self.node_mlp_1 = Seq(Lin(..., ...), ReLU(), Lin(..., ...))
self.node_mlp_2 = Seq(Lin(..., ...), ReLU(), Lin(..., ...))

def forward(self, x, edge_index, edge_attr, u, batch):
# x: [N, F_x], where N is the number of nodes.
# edge_index: [2, E] with max entry N - 1.
# edge_attr: [E, F_e]
# u: [B, F_u]
# batch: [N] with max entry B - 1.
row, col = edge_index
out = torch.cat([x[row], edge_attr], dim=1)
out = self.node_mlp_1(out)
out = scatter_mean(out, col, dim=0, dim_size=x.size(0))
out = torch.cat([x, out, u[batch]], dim=1)
return self.node_mlp_2(out)

class GlobalModel(torch.nn.Module):
def __init__(self):
super(GlobalModel, self).__init__()
self.global_mlp = Seq(Lin(..., ...), ReLU(), Lin(..., ...))

def forward(self, x, edge_index, edge_attr, u, batch):
# x: [N, F_x], where N is the number of nodes.
# edge_index: [2, E] with max entry N - 1.
# edge_attr: [E, F_e]
# u: [B, F_u]
# batch: [N] with max entry B - 1.
out = torch.cat([u, scatter_mean(x, batch, dim=0)], dim=1)
return self.global_mlp(out)

op = MetaLayer(EdgeModel(), NodeModel(), GlobalModel())
x, edge_attr, u = op(x, edge_index, edge_attr, u, batch)

reset_parameters()[source]
forward(x: torch.Tensor, edge_index: torch.Tensor, edge_attr: Optional[torch.Tensor] = None, u: Optional[torch.Tensor] = None, batch: Optional[torch.Tensor] = None)Tuple[torch.Tensor, torch.Tensor, torch.Tensor][source]

## Dense Convolutional Layers¶

 Linear Applies a linear tranformation to the incoming data DenseGCNConv DenseGINConv DenseGraphConv
class Linear(in_channels: int, out_channels: int, bias: bool = True, weight_initializer: Optional[str] = None, bias_initializer: Optional[str] = None)[source]

Applies a linear tranformation to the incoming data

$\mathbf{x}^{\prime} = \mathbf{x} \mathbf{W}^{\top} + \mathbf{b}$

similar to torch.nn.Linear. It supports lazy initialization and customizable weight and bias initialization.

Parameters
• in_channels (int) – Size of each input sample. Will be initialized lazily in case -1.

• out_channels (int) – Size of each output sample.

• bias (bool, optional) – If set to False, the layer will not learn an additive bias. (default: True)

• weight_initializer (str, optional) – The initializer for the weight matrix ("glorot", "uniform", "kaiming_uniform" or None). If set to None, will match default weight initialization of torch.nn.Linear. (default: None)

• bias_initializer (str, optional) – The initializer for the bias vector ("zeros" or None). If set to None, will match default bias initialization of torch.nn.Linear. (default: None)

reset_parameters()[source]
forward(x: torch.Tensor)torch.Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

initialize_parameters(module, input)[source]
class DenseGCNConv(in_channels, out_channels, improved=False, bias=True)[source]
reset_parameters()[source]
Parameters
• x (Tensor) – Node feature tensor $$\mathbf{X} \in \mathbb{R}^{B \times N \times F}$$, with batch-size $$B$$, (maximum) number of nodes $$N$$ for each graph, and feature dimension $$F$$.

• adj (Tensor) – Adjacency tensor $$\mathbf{A} \in \mathbb{R}^{B \times N \times N}$$. The adjacency tensor is broadcastable in the batch dimension, resulting in a shared adjacency matrix for the complete batch.

• mask (BoolTensor, optional) – Mask matrix $$\mathbf{M} \in {\{ 0, 1 \}}^{B \times N}$$ indicating the valid nodes for each graph. (default: None)

• add_loop (bool, optional) – If set to False, the layer will not automatically add self-loops to the adjacency matrices. (default: True)

class DenseGINConv(nn, eps=0, train_eps=False)[source]
Return type

Tensor

reset_parameters()[source]
Parameters
• x (Tensor) – Node feature tensor $$\mathbf{X} \in \mathbb{R}^{B \times N \times F}$$, with batch-size $$B$$, (maximum) number of nodes $$N$$ for each graph, and feature dimension $$F$$.

• adj (Tensor) – Adjacency tensor $$\mathbf{A} \in \mathbb{R}^{B \times N \times N}$$. The adjacency tensor is broadcastable in the batch dimension, resulting in a shared adjacency matrix for the complete batch.

• mask (BoolTensor, optional) – Mask matrix $$\mathbf{M} \in {\{ 0, 1 \}}^{B \times N}$$ indicating the valid nodes for each graph. (default: None)

• add_loop (bool, optional) – If set to False, the layer will not automatically add self-loops to the adjacency matrices. (default: True)

reset_parameters()[source]
Parameters
• x (Tensor) – Node feature tensor $$\mathbf{X} \in \mathbb{R}^{B \times N \times F}$$, with batch-size $$B$$, (maximum) number of nodes $$N$$ for each graph, and feature dimension $$F$$.

• adj (Tensor) – Adjacency tensor $$\mathbf{A} \in \mathbb{R}^{B \times N \times N}$$. The adjacency tensor is broadcastable in the batch dimension, resulting in a shared adjacency matrix for the complete batch.

• mask (BoolTensor, optional) – Mask matrix $$\mathbf{M} \in {\{ 0, 1 \}}^{B \times N}$$ indicating the valid nodes for each graph. (default: None)

## Normalization Layers¶

 BatchNorm Applies batch normalization over a batch of node features as described in the “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift” paper InstanceNorm Applies instance normalization over each individual example in a batch of node features as described in the “Instance Normalization: The Missing Ingredient for Fast Stylization” paper LayerNorm Applies layer normalization over each individual example in a batch of node features as described in the “Layer Normalization” paper GraphNorm Applies graph normalization over individual graphs as described in the “GraphNorm: A Principled Approach to Accelerating Graph Neural Network Training” paper GraphSizeNorm Applies Graph Size Normalization over each individual graph in a batch of node features as described in the “Benchmarking Graph Neural Networks” paper PairNorm Applies pair normalization over node features as described in the “PairNorm: Tackling Oversmoothing in GNNs” paper MessageNorm Applies message normalization over the aggregated messages as described in the “DeeperGCNs: All You Need to Train Deeper GCNs” paper DiffGroupNorm The differentiable group normalization layer from the “Towards Deeper Graph Neural Networks with Differentiable Group Normalization” paper, which normalizes node features group-wise via a learnable soft cluster assignment
class BatchNorm(in_channels, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)[source]

Applies batch normalization over a batch of node features as described in the “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift” paper

$\mathbf{x}^{\prime}_i = \frac{\mathbf{x} - \textrm{E}[\mathbf{x}]}{\sqrt{\textrm{Var}[\mathbf{x}] + \epsilon}} \odot \gamma + \beta$

The mean and standard-deviation are calculated per-dimension over all nodes inside the mini-batch.

Parameters
• in_channels (int) – Size of each input sample.

• eps (float, optional) – A value added to the denominator for numerical stability. (default: 1e-5)

• momentum (float, optional) – The value used for the running mean and running variance computation. (default: 0.1)

• affine (bool, optional) – If set to True, this module has learnable affine parameters $$\gamma$$ and $$\beta$$. (default: True)

• track_running_stats (bool, optional) – If set to True, this module tracks the running mean and variance, and when set to False, this module does not track such statistics and always uses batch statistics in both training and eval modes. (default: True)

reset_parameters()[source]
forward(x: torch.Tensor)torch.Tensor[source]
class InstanceNorm(in_channels, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)[source]

Applies instance normalization over each individual example in a batch of node features as described in the “Instance Normalization: The Missing Ingredient for Fast Stylization” paper

$\mathbf{x}^{\prime}_i = \frac{\mathbf{x} - \textrm{E}[\mathbf{x}]}{\sqrt{\textrm{Var}[\mathbf{x}] + \epsilon}} \odot \gamma + \beta$

The mean and standard-deviation are calculated per-dimension separately for each object in a mini-batch.

Parameters
• in_channels (int) – Size of each input sample.

• eps (float, optional) – A value added to the denominator for numerical stability. (default: 1e-5)

• momentum (float, optional) – The value used for the running mean and running variance computation. (default: 0.1)

• affine (bool, optional) – If set to True, this module has learnable affine parameters $$\gamma$$ and $$\beta$$. (default: False)

• track_running_stats (bool, optional) – If set to True, this module tracks the running mean and variance, and when set to False, this module does not track such statistics and always uses instance statistics in both training and eval modes. (default: False)

forward(x: torch.Tensor, batch: Optional[torch.Tensor] = None)torch.Tensor[source]
num_features: int
eps: float
momentum: float
affine: bool
track_running_stats: bool
class LayerNorm(in_channels, eps=1e-05, affine=True)[source]

Applies layer normalization over each individual example in a batch of node features as described in the “Layer Normalization” paper

$\mathbf{x}^{\prime}_i = \frac{\mathbf{x} - \textrm{E}[\mathbf{x}]}{\sqrt{\textrm{Var}[\mathbf{x}] + \epsilon}} \odot \gamma + \beta$

The mean and standard-deviation are calculated across all nodes and all node channels separately for each object in a mini-batch.

Parameters
• in_channels (int) – Size of each input sample.

• eps (float, optional) – A value added to the denominator for numerical stability. (default: 1e-5)

• affine (bool, optional) – If set to True, this module has learnable affine parameters $$\gamma$$ and $$\beta$$. (default: True)

reset_parameters()[source]
forward(x: torch.Tensor, batch: Optional[torch.Tensor] = None)torch.Tensor[source]
class GraphNorm(in_channels: int, eps: float = 1e-05)[source]

Applies graph normalization over individual graphs as described in the “GraphNorm: A Principled Approach to Accelerating Graph Neural Network Training” paper

$\mathbf{x}^{\prime}_i = \frac{\mathbf{x} - \alpha \odot \textrm{E}[\mathbf{x}]} {\sqrt{\textrm{Var}[\mathbf{x} - \alpha \odot \textrm{E}[\mathbf{x}]] + \epsilon}} \odot \gamma + \beta$

where $$\alpha$$ denotes parameters that learn how much information to keep in the mean.

Parameters
• in_channels (int) – Size of each input sample.

• eps (float, optional) – A value added to the denominator for numerical stability. (default: 1e-5)

reset_parameters()[source]
forward(x: torch.Tensor, batch: Optional[torch.Tensor] = None)torch.Tensor[source]
class GraphSizeNorm[source]

Applies Graph Size Normalization over each individual graph in a batch of node features as described in the “Benchmarking Graph Neural Networks” paper

$\mathbf{x}^{\prime}_i = \frac{\mathbf{x}_i}{\sqrt{|\mathcal{V}|}}$
forward(x, batch=None)[source]
class PairNorm(scale: float = 1.0, scale_individually: bool = False, eps: float = 1e-05)[source]

Applies pair normalization over node features as described in the “PairNorm: Tackling Oversmoothing in GNNs” paper

\begin{align}\begin{aligned}\begin{split}\mathbf{x}_i^c &= \mathbf{x}_i - \frac{1}{n} \sum_{i=1}^n \mathbf{x}_i \\\end{split}\\\mathbf{x}_i^{\prime} &= s \cdot \frac{\mathbf{x}_i^c}{\sqrt{\frac{1}{n} \sum_{i=1}^n {\| \mathbf{x}_i^c \|}^2_2}}\end{aligned}\end{align}
Parameters
• scale (float, optional) – Scaling factor $$s$$ of normalization. (default, 1.)

• scale_individually (bool, optional) – If set to True, will compute the scaling step as $$\mathbf{x}^{\prime}_i = s \cdot \frac{\mathbf{x}_i^c}{{\| \mathbf{x}_i^c \|}_2}$$. (default: False)

• eps (float, optional) – A value added to the denominator for numerical stability. (default: 1e-5)

forward(x: torch.Tensor, batch: Optional[torch.Tensor] = None)torch.Tensor[source]
class MessageNorm(learn_scale: bool = False)[source]

Applies message normalization over the aggregated messages as described in the “DeeperGCNs: All You Need to Train Deeper GCNs” paper

$\mathbf{x}_i^{\prime} = \mathrm{MLP} \left( \mathbf{x}_{i} + s \cdot {\| \mathbf{x}_i \|}_2 \cdot \frac{\mathbf{m}_{i}}{{\|\mathbf{m}_i\|}_2} \right)$
Parameters

learn_scale (bool, optional) – If set to True, will learn the scaling factor $$s$$ of message normalization. (default: False)

reset_parameters()[source]
forward(x: torch.Tensor, msg: torch.Tensor, p: int = 2)[source]
class DiffGroupNorm(in_channels, groups, lamda=0.01, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)[source]

The differentiable group normalization layer from the “Towards Deeper Graph Neural Networks with Differentiable Group Normalization” paper, which normalizes node features group-wise via a learnable soft cluster assignment

$\mathbf{S} = \text{softmax} (\mathbf{X} \mathbf{W})$

where $$\mathbf{W} \in \mathbb{R}^{F \times G}$$ denotes a trainable weight matrix mapping each node into one of $$G$$ clusters. Normalization is then performed group-wise via:

$\mathbf{X}^{\prime} = \mathbf{X} + \lambda \sum_{i = 1}^G \text{BatchNorm}(\mathbf{S}[:, i] \odot \mathbf{X})$
Parameters
• in_channels (int) – Size of each input sample $$F$$.

• groups (int) – The number of groups $$G$$.

• lamda (float, optional) – The balancing factor $$\lambda$$ between input embeddings and normalized embeddings. (default: 0.01)

• eps (float, optional) – A value added to the denominator for numerical stability. (default: 1e-5)

• momentum (float, optional) – The value used for the running mean and running variance computation. (default: 0.1)

• affine (bool, optional) – If set to True, this module has learnable affine parameters $$\gamma$$ and $$\beta$$. (default: True)

• track_running_stats (bool, optional) – If set to True, this module tracks the running mean and variance, and when set to False, this module does not track such statistics and always uses batch statistics in both training and eval modes. (default: True)

reset_parameters()[source]
forward(x: torch.Tensor)torch.Tensor[source]
static group_distance_ratio(x: torch.Tensor, y: torch.Tensor, eps: float = 1e-05)float[source]

Measures the ratio of inter-group distance over intra-group distance

$R_{\text{Group}} = \frac{\frac{1}{(C-1)^2} \sum_{i!=j} \frac{1}{|\mathbf{X}_i||\mathbf{X}_j|} \sum_{\mathbf{x}_{iv} \in \mathbf{X}_i } \sum_{\mathbf{x}_{jv^{\prime}} \in \mathbf{X}_j} {\| \mathbf{x}_{iv} - \mathbf{x}_{jv^{\prime}} \|}_2 }{ \frac{1}{C} \sum_{i} \frac{1}{{|\mathbf{X}_i|}^2} \sum_{\mathbf{x}_{iv}, \mathbf{x}_{iv^{\prime}} \in \mathbf{X}_i } {\| \mathbf{x}_{iv} - \mathbf{x}_{iv^{\prime}} \|}_2 }$

where $$\mathbf{X}_i$$ denotes the set of all nodes that belong to class $$i$$, and $$C$$ denotes the total number of classes in y.

## Global Pooling Layers¶

 global_add_pool Returns batch-wise graph-level-outputs by adding node features across the node dimension, so that for a single graph $$\mathcal{G}_i$$ its output is computed by global_mean_pool Returns batch-wise graph-level-outputs by averaging node features across the node dimension, so that for a single graph $$\mathcal{G}_i$$ its output is computed by global_max_pool Returns batch-wise graph-level-outputs by taking the channel-wise maximum across the node dimension, so that for a single graph $$\mathcal{G}_i$$ its output is computed by global_sort_pool The global pooling operator from the “An End-to-End Deep Learning Architecture for Graph Classification” paper, where node features are sorted in descending order based on their last feature channel. GlobalAttention Global soft attention layer from the “Gated Graph Sequence Neural Networks” paper Set2Set The global pooling operator based on iterative content-based attention from the “Order Matters: Sequence to sequence for sets” paper
global_add_pool(x, batch, size: Optional[int] = None)[source]

Returns batch-wise graph-level-outputs by adding node features across the node dimension, so that for a single graph $$\mathcal{G}_i$$ its output is computed by

$\mathbf{r}_i = \sum_{n=1}^{N_i} \mathbf{x}_n$
Parameters
• x (Tensor) – Node feature matrix $$\mathbf{X} \in \mathbb{R}^{(N_1 + \ldots + N_B) \times F}$$.

• batch (LongTensor) – Batch vector $$\mathbf{b} \in {\{ 0, \ldots, B-1\}}^N$$, which assigns each node to a specific example.

• size (int, optional) – Batch-size $$B$$. Automatically calculated if not given. (default: None)

Return type

Tensor

global_mean_pool(x, batch, size: Optional[int] = None)[source]

Returns batch-wise graph-level-outputs by averaging node features across the node dimension, so that for a single graph $$\mathcal{G}_i$$ its output is computed by

$\mathbf{r}_i = \frac{1}{N_i} \sum_{n=1}^{N_i} \mathbf{x}_n$
Parameters
• x (Tensor) – Node feature matrix $$\mathbf{X} \in \mathbb{R}^{(N_1 + \ldots + N_B) \times F}$$.

• batch (LongTensor) – Batch vector $$\mathbf{b} \in {\{ 0, \ldots, B-1\}}^N$$, which assigns each node to a specific example.

• size (int, optional) – Batch-size $$B$$. Automatically calculated if not given. (default: None)

Return type

Tensor

global_max_pool(x, batch, size: Optional[int] = None)[source]

Returns batch-wise graph-level-outputs by taking the channel-wise maximum across the node dimension, so that for a single graph $$\mathcal{G}_i$$ its output is computed by

$\mathbf{r}_i = \mathrm{max}_{n=1}^{N_i} \, \mathbf{x}_n$
Parameters
• x (Tensor) – Node feature matrix $$\mathbf{X} \in \mathbb{R}^{(N_1 + \ldots + N_B) \times F}$$.

• batch (LongTensor) – Batch vector $$\mathbf{b} \in {\{ 0, \ldots, B-1\}}^N$$, which assigns each node to a specific example.

• size (int, optional) – Batch-size $$B$$. Automatically calculated if not given. (default: None)

Return type

Tensor

global_sort_pool(x, batch, k)[source]

The global pooling operator from the “An End-to-End Deep Learning Architecture for Graph Classification” paper, where node features are sorted in descending order based on their last feature channel. The first $$k$$ nodes form the output of the layer.

Parameters
• x (Tensor) – Node feature matrix $$\mathbf{X} \in \mathbb{R}^{N \times F}$$.

• batch (LongTensor) – Batch vector $$\mathbf{b} \in {\{ 0, \ldots, B-1\}}^N$$, which assigns each node to a specific example.

• k (int) – The number of nodes to hold for each graph.

Return type

Tensor

class GlobalAttention(gate_nn, nn=None)[source]

Global soft attention layer from the “Gated Graph Sequence Neural Networks” paper

$\mathbf{r}_i = \sum_{n=1}^{N_i} \mathrm{softmax} \left( h_{\mathrm{gate}} ( \mathbf{x}_n ) \right) \odot h_{\mathbf{\Theta}} ( \mathbf{x}_n ),$

where $$h_{\mathrm{gate}} \colon \mathbb{R}^F \to \mathbb{R}$$ and $$h_{\mathbf{\Theta}}$$ denote neural networks, i.e. MLPS.

Parameters
• gate_nn (torch.nn.Module) – A neural network $$h_{\mathrm{gate}}$$ that computes attention scores by mapping node features x of shape [-1, in_channels] to shape [-1, 1], e.g., defined by torch.nn.Sequential.

• nn (torch.nn.Module, optional) – A neural network $$h_{\mathbf{\Theta}}$$ that maps node features x of shape [-1, in_channels] to shape [-1, out_channels] before combining them with the attention scores, e.g., defined by torch.nn.Sequential. (default: None)

reset_parameters()[source]
forward(x, batch, size=None)[source]
class Set2Set(in_channels, processing_steps, num_layers=1)[source]

The global pooling operator based on iterative content-based attention from the “Order Matters: Sequence to sequence for sets” paper

\begin{align}\begin{aligned}\mathbf{q}_t &= \mathrm{LSTM}(\mathbf{q}^{*}_{t-1})\\\alpha_{i,t} &= \mathrm{softmax}(\mathbf{x}_i \cdot \mathbf{q}_t)\\\mathbf{r}_t &= \sum_{i=1}^N \alpha_{i,t} \mathbf{x}_i\\\mathbf{q}^{*}_t &= \mathbf{q}_t \, \Vert \, \mathbf{r}_t,\end{aligned}\end{align}

where $$\mathbf{q}^{*}_T$$ defines the output of the layer with twice the dimensionality as the input.

Parameters
• in_channels (int) – Size of each input sample.

• processing_steps (int) – Number of iterations $$T$$.

• num_layers (int, optional) – Number of recurrent layers, .e.g, setting num_layers=2 would mean stacking two LSTMs together to form a stacked LSTM, with the second LSTM taking in outputs of the first LSTM and computing the final results. (default: 1)

reset_parameters()[source]
forward(x, batch)[source]

## Pooling Layers¶

 TopKPooling $$\mathrm{top}_k$$ pooling operator from the “Graph U-Nets”, “Towards Sparse Hierarchical Graph Classifiers” and “Understanding Attention and Generalization in Graph Neural Networks” papers SAGPooling The self-attention pooling operator from the “Self-Attention Graph Pooling” and “Understanding Attention and Generalization in Graph Neural Networks” papers EdgePooling The edge pooling operator from the “Towards Graph Pooling by Edge Contraction” and “Edge Contraction Pooling for Graph Neural Networks” papers. ASAPooling The Adaptive Structure Aware Pooling operator from the “ASAP: Adaptive Structure Aware Pooling for Learning Hierarchical Graph Representations” paper. PANPooling The path integral based pooling operator from the “Path Integral Based Convolution and Pooling for Graph Neural Networks” paper. MemPooling Memory based pooling layer from “Memory-Based Graph Networks” paper, which learns a coarsened graph representation based on soft cluster assignments max_pool Pools and coarsens a graph given by the torch_geometric.data.Data object according to the clustering defined in cluster. avg_pool Pools and coarsens a graph given by the torch_geometric.data.Data object according to the clustering defined in cluster. max_pool_x Max-Pools node features according to the clustering defined in cluster. max_pool_neighbor_x Max pools neighboring node features, where each feature in data.x is replaced by the feature value with the maximum value from the central node and its neighbors. avg_pool_x Average pools node features according to the clustering defined in cluster. avg_pool_neighbor_x Average pools neighboring node features, where each feature in data.x is replaced by the average feature values from the central node and its neighbors. graclus A greedy clustering algorithm from the “Weighted Graph Cuts without Eigenvectors: A Multilevel Approach” paper of picking an unmarked vertex and matching it with one of its unmarked neighbors (that maximizes its edge weight). voxel_grid Voxel grid pooling from the, e.g., Dynamic Edge-Conditioned Filters in Convolutional Networks on Graphs paper, which overlays a regular grid of user-defined size over a point cloud and clusters all points within the same voxel. fps A sampling algorithm from the “PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space” paper, which iteratively samples the most distant point with regard to the rest points. knn Finds for each element in y the k nearest points in x. knn_graph Computes graph edges to the nearest k points. radius Finds for each element in y all points in x within distance r. radius_graph Computes graph edges to all points within a given distance. nearest Clusters points in x together which are nearest to a given query point in y.
class TopKPooling(in_channels: int, ratio: Union[int, float] = 0.5, min_score: Optional[float] = None, multiplier: float = 1.0, nonlinearity: Callable = <built-in method tanh of type object>)[source]

$$\mathrm{top}_k$$ pooling operator from the “Graph U-Nets”, “Towards Sparse Hierarchical Graph Classifiers” and “Understanding Attention and Generalization in Graph Neural Networks” papers

if min_score $$\tilde{\alpha}$$ is None:

\begin{align}\begin{aligned}\mathbf{y} &= \frac{\mathbf{X}\mathbf{p}}{\| \mathbf{p} \|}\\\mathbf{i} &= \mathrm{top}_k(\mathbf{y})\\\mathbf{X}^{\prime} &= (\mathbf{X} \odot \mathrm{tanh}(\mathbf{y}))_{\mathbf{i}}\\\mathbf{A}^{\prime} &= \mathbf{A}_{\mathbf{i},\mathbf{i}}\end{aligned}\end{align}

if min_score $$\tilde{\alpha}$$ is a value in [0, 1]:

\begin{align}\begin{aligned}\mathbf{y} &= \mathrm{softmax}(\mathbf{X}\mathbf{p})\\\mathbf{i} &= \mathbf{y}_i > \tilde{\alpha}\\\mathbf{X}^{\prime} &= (\mathbf{X} \odot \mathbf{y})_{\mathbf{i}}\\\mathbf{A}^{\prime} &= \mathbf{A}_{\mathbf{i},\mathbf{i}},\end{aligned}\end{align}

where nodes are dropped based on a learnable projection score $$\mathbf{p}$$.

Parameters
• in_channels (int) – Size of each input sample.

• ratio (float or int) – Graph pooling ratio, which is used to compute $$k = \lceil \mathrm{ratio} \cdot N \rceil$$, or the value of $$k$$ itself, depending on whether the type of ratio is float or int. This value is ignored if min_score is not None. (default: 0.5)

• min_score (float, optional) – Minimal node score $$\tilde{\alpha}$$ which is used to compute indices of pooled nodes $$\mathbf{i} = \mathbf{y}_i > \tilde{\alpha}$$. When this value is not None, the ratio argument is ignored. (default: None)

• multiplier (float, optional) – Coefficient by which features gets multiplied after pooling. This can be useful for large graphs and when min_score is used. (default: 1)

• nonlinearity (torch.nn.functional, optional) – The nonlinearity to use. (default: torch.tanh)

reset_parameters()[source]
forward(x, edge_index, edge_attr=None, batch=None, attn=None)[source]
class SAGPooling(in_channels: int, ratio: Union[float, int] = 0.5, GNN: Callable = <class 'torch_geometric.nn.conv.graph_conv.GraphConv'>, min_score: Optional[float] = None, multiplier: float = 1.0, nonlinearity: Callable = <built-in method tanh of type object>, **kwargs)[source]

The self-attention pooling operator from the “Self-Attention Graph Pooling” and “Understanding Attention and Generalization in Graph Neural Networks” papers

if min_score $$\tilde{\alpha}$$ is None:

\begin{align}\begin{aligned}\mathbf{y} &= \textrm{GNN}(\mathbf{X}, \mathbf{A})\\\mathbf{i} &= \mathrm{top}_k(\mathbf{y})\\\mathbf{X}^{\prime} &= (\mathbf{X} \odot \mathrm{tanh}(\mathbf{y}))_{\mathbf{i}}\\\mathbf{A}^{\prime} &= \mathbf{A}_{\mathbf{i},\mathbf{i}}\end{aligned}\end{align}

if min_score $$\tilde{\alpha}$$ is a value in [0, 1]:

\begin{align}\begin{aligned}\mathbf{y} &= \mathrm{softmax}(\textrm{GNN}(\mathbf{X},\mathbf{A}))\\\mathbf{i} &= \mathbf{y}_i > \tilde{\alpha}\\\mathbf{X}^{\prime} &= (\mathbf{X} \odot \mathbf{y})_{\mathbf{i}}\\\mathbf{A}^{\prime} &= \mathbf{A}_{\mathbf{i},\mathbf{i}},\end{aligned}\end{align}

where nodes are dropped based on a learnable projection score $$\mathbf{p}$$. Projections scores are learned based on a graph neural network layer.

Parameters
reset_parameters()[source]
forward(x, edge_index, edge_attr=None, batch=None, attn=None)[source]

The edge pooling operator from the “Towards Graph Pooling by Edge Contraction” and “Edge Contraction Pooling for Graph Neural Networks” papers.

In short, a score is computed for each edge. Edges are contracted iteratively according to that score unless one of their nodes has already been part of a contracted edge.

To duplicate the configuration from the “Towards Graph Pooling by Edge Contraction” paper, use either EdgePooling.compute_edge_score_softmax() or EdgePooling.compute_edge_score_tanh(), and set add_to_edge_score to 0.

To duplicate the configuration from the “Edge Contraction Pooling for Graph Neural Networks” paper, set dropout to 0.2.

Parameters
unpool_description

alias of torch_geometric.nn.pool.edge_pool.UnpoolDescription

reset_parameters()[source]
static compute_edge_score_softmax(raw_edge_score, edge_index, num_nodes)[source]
static compute_edge_score_tanh(raw_edge_score, edge_index, num_nodes)[source]
static compute_edge_score_sigmoid(raw_edge_score, edge_index, num_nodes)[source]
forward(x, edge_index, batch)[source]

Forward computation which computes the raw edge score, normalizes it, and merges the edges.

Parameters
• x (Tensor) – The node features.

• edge_index (LongTensor) – The edge indices.

• batch (LongTensor) – Batch vector $$\mathbf{b} \in {\{ 0, \ldots, B-1\}}^N$$, which assigns each node to a specific example.

Return types:
• x (Tensor) - The pooled node features.

• edge_index (LongTensor) - The coarsened edge indices.

• batch (LongTensor) - The coarsened batch vector.

• unpool_info (unpool_description) - Information that is consumed by EdgePooling.unpool() for unpooling.

unpool(x, unpool_info)[source]

Unpools a previous edge pooling step.

For unpooling, x should be of same shape as those produced by this layer’s forward() function. Then, it will produce an unpooled x in addition to edge_index and batch.

Parameters
Return types:
• x (Tensor) - The unpooled node features.

• edge_index (LongTensor) - The new edge indices.

• batch (LongTensor) - The new batch vector.

class ASAPooling(in_channels: int, ratio: Union[float, int] = 0.5, GNN: Optional[Callable] = None, dropout: float = 0.0, negative_slope: float = 0.2, add_self_loops: bool = False, **kwargs)[source]

The Adaptive Structure Aware Pooling operator from the “ASAP: Adaptive Structure Aware Pooling for Learning Hierarchical Graph Representations” paper.

Parameters
• in_channels (int) – Size of each input sample.

• ratio (float or int) – Graph pooling ratio, which is used to compute $$k = \lceil \mathrm{ratio} \cdot N \rceil$$, or the value of $$k$$ itself, depending on whether the type of ratio is float or int. (default: 0.5)

• GNN (torch.nn.Module, optional) – A graph neural network layer for using intra-cluster properties. Especially helpful for graphs with higher degree of neighborhood (one of torch_geometric.nn.conv.GraphConv, torch_geometric.nn.conv.GCNConv or any GNN which supports the edge_weight parameter). (default: None)

• dropout (float, optional) – Dropout probability of the normalized attention coefficients which exposes each node to a stochastically sampled neighborhood during training. (default: 0)

• negative_slope (float, optional) – LeakyReLU angle of the negative slope. (default: 0.2)

• add_self_loops (bool, optional) – If set to True, will add self loops to the new graph connectivity. (default: False)

• **kwargs (optional) – Additional parameters for initializing the graph neural network layer.

Returns

A tuple of tensors containing

• x (Tensor): The pooled node embeddings.

• edge_index (Tensor): The coarsened graph connectivity.

• edge_weight (Tensor): The edge weights corresponding to the coarsened graph connectivity.

• batch (Tensor): The pooled batch vector.

• perm (Tensor): The top-$$k$$ node indices of nodes which are kept after pooling.

reset_parameters()[source]
forward(x, edge_index, edge_weight=None, batch=None)[source]
class PANPooling(in_channels, ratio=0.5, min_score=None, multiplier=1.0, nonlinearity=<built-in method tanh of type object>)[source]

The path integral based pooling operator from the “Path Integral Based Convolution and Pooling for Graph Neural Networks” paper. PAN pooling performs top-$$k$$ pooling where global node importance is measured based on node features and the MET matrix:

${\rm score} = \beta_1 \mathbf{X} \cdot \mathbf{p} + \beta_2 {\rm deg}(M)$
Parameters
• in_channels (int) – Size of each input sample.

• ratio (float) – Graph pooling ratio, which is used to compute $$k = \lceil \mathrm{ratio} \cdot N \rceil$$. This value is ignored if min_score is not None. (default: 0.5)

• min_score (float, optional) – Minimal node score $$\tilde{\alpha}$$ which is used to compute indices of pooled nodes $$\mathbf{i} = \mathbf{y}_i > \tilde{\alpha}$$. When this value is not None, the ratio argument is ignored. (default: None)

• multiplier (float, optional) – Coefficient by which features gets multiplied after pooling. This can be useful for large graphs and when min_score is used. (default: 1.0)

• nonlinearity (torch.nn.functional, optional) – The nonlinearity to use. (default: torch.tanh)

reset_parameters()[source]
forward(x: torch.Tensor, M: torch_sparse.tensor.SparseTensor, batch: Optional[torch.Tensor] = None)[source]
class MemPooling(in_channels: int, out_channels: int, heads: int, num_clusters: int, tau: float = 1.0)[source]

Memory based pooling layer from “Memory-Based Graph Networks” paper, which learns a coarsened graph representation based on soft cluster assignments

\begin{align}\begin{aligned}S_{i,j}^{(h)} &= \frac{ (1+{\| \mathbf{x}_i-\mathbf{k}^{(h)}_j \|}^2 / \tau)^{ -\frac{1+\tau}{2}}}{ \sum_{k=1}^K (1 + {\| \mathbf{x}_i-\mathbf{k}^{(h)}_k \|}^2 / \tau)^{ -\frac{1+\tau}{2}}}\\\mathbf{S} &= \textrm{softmax}(\textrm{Conv2d} (\Vert_{h=1}^H \mathbf{S}^{(h)})) \in \mathbb{R}^{N \times K}\\\mathbf{X}^{\prime} &= \mathbf{S}^{\top} \mathbf{X} \mathbf{W} \in \mathbb{R}^{K \times F^{\prime}}\end{aligned}\end{align}

Where $$H$$ denotes the number of heads, and $$K$$ denotes the number of clusters.

Parameters
• in_channels (int) – Size of each input sample $$F$$.

• out_channels (int) – Size of each output sample $$F^{\prime}$$.

• heads (int) – The number of heads $$H$$.

• num_clusters (int) – number of clusters $$K$$ per head.

• tau (int, optional) – The temperature $$\tau$$. (default: 1.)

reset_parameters()[source]
static kl_loss(S: torch.Tensor)torch.Tensor[source]

\begin{align}\begin{aligned}P_{i,j} &= \frac{S_{i,j}^2 / \sum_{n=1}^N S_{n,j}}{\sum_{k=1}^K S_{i,k}^2 / \sum_{n=1}^N S_{n,k}}\\\mathcal{L}_{\textrm{KL}} &= \textrm{KLDiv}(\mathbf{P} \Vert \mathbf{S})\end{aligned}\end{align}
forward(x: torch.Tensor, batch: Optional[torch.Tensor] = None, mask: Optional[torch.Tensor] = None)Tuple[torch.Tensor, torch.Tensor][source]
Parameters
• x (Tensor) – Dense or sparse node feature tensor $$\mathbf{X} \in \mathbb{R}^{N \times F}$$ or $$\mathbf{X} \in \mathbb{R}^{B \times N \times F}$$, respectively.

• batch (LongTensor, optional) – Batch vector $$\mathbf{b} \in {\{ 0, \ldots, B-1\}}^N$$, which assigns each node to a specific example. This argument should be just to separate graphs when using sparse node features. (default: None)

• mask (BoolTensor, optional) – Mask matrix $$\mathbf{M} \in {\{ 0, 1 \}}^{B \times N}$$, which indicates valid nodes for each graph when using dense node features. (default: None)

max_pool(cluster, data, transform=None)[source]

Pools and coarsens a graph given by the torch_geometric.data.Data object according to the clustering defined in cluster. All nodes within the same cluster will be represented as one node. Final node features are defined by the maximum features of all nodes within the same cluster, node positions are averaged and edge indices are defined to be the union of the edge indices of all nodes within the same cluster.

Parameters
• cluster (LongTensor) – Cluster vector $$\mathbf{c} \in \{ 0, \ldots, N - 1 \}^N$$, which assigns each node to a specific cluster.

• data (Data) – Graph data object.

• transform (callable, optional) – A function/transform that takes in the coarsened and pooled torch_geometric.data.Data object and returns a transformed version. (default: None)

Return type

torch_geometric.data.Data

avg_pool(cluster, data, transform=None)[source]

Pools and coarsens a graph given by the torch_geometric.data.Data object according to the clustering defined in cluster. Final node features are defined by the average features of all nodes within the same cluster. See torch_geometric.nn.pool.max_pool() for more details.

Parameters
• cluster (LongTensor) – Cluster vector $$\mathbf{c} \in \{ 0, \ldots, N - 1 \}^N$$, which assigns each node to a specific cluster.

• data (Data) – Graph data object.

• transform (callable, optional) – A function/transform that takes in the coarsened and pooled torch_geometric.data.Data object and returns a transformed version. (default: None)

Return type

torch_geometric.data.Data

max_pool_x(cluster, x, batch, size: Optional[int] = None)[source]

Max-Pools node features according to the clustering defined in cluster.

Parameters
• cluster (LongTensor) – Cluster vector $$\mathbf{c} \in \{ 0, \ldots, N - 1 \}^N$$, which assigns each node to a specific cluster.

• x (Tensor) – Node feature matrix $$\mathbf{X} \in \mathbb{R}^{(N_1 + \ldots + N_B) \times F}$$.

• batch (LongTensor) – Batch vector $$\mathbf{b} \in {\{ 0, \ldots, B-1\}}^N$$, which assigns each node to a specific example.

• size (int, optional) – The maximum number of clusters in a single example. This property is useful to obtain a batch-wise dense representation, e.g. for applying FC layers, but should only be used if the size of the maximum number of clusters per example is known in advance. (default: None)

Return type

(Tensor, LongTensor) if size is None, else Tensor

max_pool_neighbor_x(data, flow='source_to_target')[source]

Max pools neighboring node features, where each feature in data.x is replaced by the feature value with the maximum value from the central node and its neighbors.

avg_pool_x(cluster, x, batch, size: Optional[int] = None)[source]

Average pools node features according to the clustering defined in cluster. See torch_geometric.nn.pool.max_pool_x() for more details.

Parameters
• cluster (LongTensor) – Cluster vector $$\mathbf{c} \in \{ 0, \ldots, N - 1 \}^N$$, which assigns each node to a specific cluster.

• x (Tensor) – Node feature matrix $$\mathbf{X} \in \mathbb{R}^{(N_1 + \ldots + N_B) \times F}$$.

• batch (LongTensor) – Batch vector $$\mathbf{b} \in {\{ 0, \ldots, B-1\}}^N$$, which assigns each node to a specific example.

• size (int, optional) – The maximum number of clusters in a single example. (default: None)

Return type

(Tensor, LongTensor) if size is None, else Tensor

avg_pool_neighbor_x(data, flow='source_to_target')[source]

Average pools neighboring node features, where each feature in data.x is replaced by the average feature values from the central node and its neighbors.

graclus(edge_index, weight: Optional[torch.Tensor] = None, num_nodes: Optional[int] = None)[source]

A greedy clustering algorithm from the “Weighted Graph Cuts without Eigenvectors: A Multilevel Approach” paper of picking an unmarked vertex and matching it with one of its unmarked neighbors (that maximizes its edge weight). The GPU algoithm is adapted from the “A GPU Algorithm for Greedy Graph Matching” paper.

Parameters
• edge_index (LongTensor) – The edge indices.

• weight (Tensor, optional) – One-dimensional edge weights. (default: None)

• num_nodes (int, optional) – The number of nodes, i.e. max_val + 1 of edge_index. (default: None)

Return type

LongTensor

voxel_grid(pos, batch, size, start=None, end=None)[source]

Voxel grid pooling from the, e.g., Dynamic Edge-Conditioned Filters in Convolutional Networks on Graphs paper, which overlays a regular grid of user-defined size over a point cloud and clusters all points within the same voxel.

Parameters
• pos (Tensor) – Node position matrix $$\mathbf{X} \in \mathbb{R}^{(N_1 + \ldots + N_B) \times D}$$.

• batch (LongTensor) – Batch vector $$\mathbf{b} \in {\{ 0, \ldots, B-1\}}^N$$, which assigns each node to a specific example.

• size (float or [float] or Tensor) – Size of a voxel (in each dimension).

• start (float or [float] or Tensor, optional) – Start coordinates of the grid (in each dimension). If set to None, will be set to the minimum coordinates found in pos. (default: None)

• end (float or [float] or Tensor, optional) – End coordinates of the grid (in each dimension). If set to None, will be set to the maximum coordinates found in pos. (default: None)

Return type

LongTensor

fps(x, batch=None, ratio=0.5, random_start=True)[source]

A sampling algorithm from the “PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space” paper, which iteratively samples the most distant point with regard to the rest points.

Parameters
• x (Tensor) – Node feature matrix $$\mathbf{X} \in \mathbb{R}^{N \times F}$$.

• batch (LongTensor, optional) – Batch vector $$\mathbf{b} \in {\{ 0, \ldots, B-1\}}^N$$, which assigns each node to a specific example. (default: None)

• ratio (float, optional) – Sampling ratio. (default: 0.5)

• random_start (bool, optional) – If set to False, use the first node in $$\mathbf{X}$$ as starting node. (default: obj:True)

Return type

LongTensor

import torch
from torch_geometric.nn import fps

x = torch.Tensor([[-1, -1], [-1, 1], [1, -1], [1, 1]])
batch = torch.tensor([0, 0, 0, 0])
index = fps(x, batch, ratio=0.5)

knn(x, y, k, batch_x=None, batch_y=None, cosine=False, num_workers=1)[source]

Finds for each element in y the k nearest points in x.

Parameters
• x (Tensor) – Node feature matrix $$\mathbf{X} \in \mathbb{R}^{N \times F}$$.

• y (Tensor) – Node feature matrix $$\mathbf{X} \in \mathbb{R}^{M \times F}$$.

• k (int) – The number of neighbors.

• batch_x (LongTensor, optional) – Batch vector $$\mathbf{b} \in {\{ 0, \ldots, B-1\}}^N$$, which assigns each node to a specific example. (default: None)

• batch_y (LongTensor, optional) – Batch vector $$\mathbf{b} \in {\{ 0, \ldots, B-1\}}^M$$, which assigns each node to a specific example. (default: None)

• cosine (boolean, optional) – If True, will use the cosine distance instead of euclidean distance to find nearest neighbors. (default: False)

• num_workers (int) – Number of workers to use for computation. Has no effect in case batch_x or batch_y is not None, or the input lies on the GPU. (default: 1)

Return type

LongTensor

import torch
from torch_geometric.nn import knn

x = torch.Tensor([[-1, -1], [-1, 1], [1, -1], [1, 1]])
batch_x = torch.tensor([0, 0, 0, 0])
y = torch.Tensor([[-1, 0], [1, 0]])
batch_y = torch.tensor([0, 0])
assign_index = knn(x, y, 2, batch_x, batch_y)

knn_graph(x, k, batch=None, loop=False, flow='source_to_target', cosine=False, num_workers=1)[source]

Computes graph edges to the nearest k points.

Parameters
• x (Tensor) – Node feature matrix $$\mathbf{X} \in \mathbb{R}^{N \times F}$$.

• k (int) – The number of neighbors.

• batch (LongTensor, optional) – Batch vector $$\mathbf{b} \in {\{ 0, \ldots, B-1\}}^N$$, which assigns each node to a specific example. (default: None)

• loop (bool, optional) – If True, the graph will contain self-loops. (default: False)

• flow (string, optional) – The flow direction when using in combination with message passing ("source_to_target" or "target_to_source"). (default: "source_to_target")

• cosine (boolean, optional) – If True, will use the cosine distance instead of euclidean distance to find nearest neighbors. (default: False)

• num_workers (int) – Number of workers to use for computation. Has no effect in case batch is not None, or the input lies on the GPU. (default: 1)

Return type

LongTensor

import torch
from torch_geometric.nn import knn_graph

x = torch.Tensor([[-1, -1], [-1, 1], [1, -1], [1, 1]])
batch = torch.tensor([0, 0, 0, 0])
edge_index = knn_graph(x, k=2, batch=batch, loop=False)

radius(x, y, r, batch_x=None, batch_y=None, max_num_neighbors=32, num_workers=1)[source]

Finds for each element in y all points in x within distance r.

Parameters
• x (Tensor) – Node feature matrix $$\mathbf{X} \in \mathbb{R}^{N \times F}$$.

• y (Tensor) – Node feature matrix $$\mathbf{Y} \in \mathbb{R}^{M \times F}$$.

• r (float) – The radius.

• batch_x (LongTensor, optional) – Batch vector $$\mathbf{b} \in {\{ 0, \ldots, B-1\}}^N$$, which assigns each node to a specific example. (default: None)

• batch_y (LongTensor, optional) – Batch vector $$\mathbf{b} \in {\{ 0, \ldots, B-1\}}^M$$, which assigns each node to a specific example. (default: None)

• max_num_neighbors (int, optional) – The maximum number of neighbors to return for each element in y. (default: 32)

• num_workers (int) – Number of workers to use for computation. Has no effect in case batch_x or batch_y is not None, or the input lies on the GPU. (default: 1)

Return type

LongTensor

import torch

x = torch.Tensor([[-1, -1], [-1, 1], [1, -1], [1, 1]])
batch_x = torch.tensor([0, 0, 0, 0])
y = torch.Tensor([[-1, 0], [1, 0]])
batch_y = torch.tensor([0, 0])
assign_index = radius(x, y, 1.5, batch_x, batch_y)

radius_graph(x, r, batch=None, loop=False, max_num_neighbors=32, flow='source_to_target', num_workers=1)[source]

Computes graph edges to all points within a given distance.

Parameters
• x (Tensor) – Node feature matrix $$\mathbf{X} \in \mathbb{R}^{N \times F}$$.

• r (float) – The radius.

• batch (LongTensor, optional) – Batch vector $$\mathbf{b} \in {\{ 0, \ldots, B-1\}}^N$$, which assigns each node to a specific example. (default: None)

• loop (bool, optional) – If True, the graph will contain self-loops. (default: False)

• max_num_neighbors (int, optional) – The maximum number of neighbors to return for each element in y. (default: 32)

• flow (string, optional) – The flow direction when using in combination with message passing ("source_to_target" or "target_to_source"). (default: "source_to_target")

• num_workers (int) – Number of workers to use for computation. Has no effect in case batch is not None, or the input lies on the GPU. (default: 1)

Return type

LongTensor

import torch

x = torch.Tensor([[-1, -1], [-1, 1], [1, -1], [1, 1]])
batch = torch.tensor([0, 0, 0, 0])
edge_index = radius_graph(x, r=1.5, batch=batch, loop=False)

nearest(x, y, batch_x=None, batch_y=None)[source]

Clusters points in x together which are nearest to a given query point in y.

Parameters
• x (Tensor) – Node feature matrix $$\mathbf{X} \in \mathbb{R}^{N \times F}$$.

• y (Tensor) – Node feature matrix $$\mathbf{Y} \in \mathbb{R}^{M \times F}$$.

• batch_x (LongTensor, optional) – Batch vector $$\mathbf{b} \in {\{ 0, \ldots, B-1\}}^N$$, which assigns each node to a specific example. (default: None)

• batch_y (LongTensor, optional) – Batch vector $$\mathbf{b} \in {\{ 0, \ldots, B-1\}}^M$$, which assigns each node to a specific example. (default: None)

Return type

LongTensor

import torch
from torch_geometric.nn import nearest

x = torch.Tensor([[-1, -1], [-1, 1], [1, -1], [1, 1]])
batch_x = torch.tensor([0, 0, 0, 0])
y = torch.Tensor([[-1, 0], [1, 0]])
batch_y = torch.tensor([0, 0])
cluster = nearest(x, y, batch_x, batch_y)


## Dense Pooling Layers¶

 DenseSAGEConv dense_diff_pool Differentiable pooling operator from the “Hierarchical Graph Representation Learning with Differentiable Pooling” paper dense_mincut_pool MinCUt pooling operator from the “Mincut Pooling in Graph Neural Networks” paper

Differentiable pooling operator from the “Hierarchical Graph Representation Learning with Differentiable Pooling” paper

\begin{align}\begin{aligned}\mathbf{X}^{\prime} &= {\mathrm{softmax}(\mathbf{S})}^{\top} \cdot \mathbf{X}\\\mathbf{A}^{\prime} &= {\mathrm{softmax}(\mathbf{S})}^{\top} \cdot \mathbf{A} \cdot \mathrm{softmax}(\mathbf{S})\end{aligned}\end{align}

based on dense learned assignments $$\mathbf{S} \in \mathbb{R}^{B \times N \times C}$$. Returns pooled node feature matrix, coarsened adjacency matrix and two auxiliary objectives: (1) The link prediction loss

$\mathcal{L}_{LP} = {\| \mathbf{A} - \mathrm{softmax}(\mathbf{S}) {\mathrm{softmax}(\mathbf{S})}^{\top} \|}_F,$

and the entropy regularization

$\mathcal{L}_E = \frac{1}{N} \sum_{n=1}^N H(\mathbf{S}_n).$
Parameters
• x (Tensor) – Node feature tensor $$\mathbf{X} \in \mathbb{R}^{B \times N \times F}$$ with batch-size $$B$$, (maximum) number of nodes $$N$$ for each graph, and feature dimension $$F$$.

• adj (Tensor) – Adjacency tensor $$\mathbf{A} \in \mathbb{R}^{B \times N \times N}$$.

• s (Tensor) – Assignment tensor $$\mathbf{S} \in \mathbb{R}^{B \times N \times C}$$ with number of clusters $$C$$. The softmax does not have to be applied beforehand, since it is executed within this method.

• mask (BoolTensor, optional) – Mask matrix $$\mathbf{M} \in {\{ 0, 1 \}}^{B \times N}$$ indicating the valid nodes for each graph. (default: None)

Return type

(Tensor, Tensor, Tensor, Tensor)

MinCUt pooling operator from the “Mincut Pooling in Graph Neural Networks” paper

\begin{align}\begin{aligned}\mathbf{X}^{\prime} &= {\mathrm{softmax}(\mathbf{S})}^{\top} \cdot \mathbf{X}\\\mathbf{A}^{\prime} &= {\mathrm{softmax}(\mathbf{S})}^{\top} \cdot \mathbf{A} \cdot \mathrm{softmax}(\mathbf{S})\end{aligned}\end{align}

based on dense learned assignments $$\mathbf{S} \in \mathbb{R}^{B \times N \times C}$$. Returns pooled node feature matrix, coarsened symmetrically normalized adjacency matrix and two auxiliary objectives: (1) The minCUT loss

$\mathcal{L}_c = - \frac{\mathrm{Tr}(\mathbf{S}^{\top} \mathbf{A} \mathbf{S})} {\mathrm{Tr}(\mathbf{S}^{\top} \mathbf{D} \mathbf{S})}$

where $$\mathbf{D}$$ is the degree matrix, and (2) the orthogonality loss

$\mathcal{L}_o = {\left\| \frac{\mathbf{S}^{\top} \mathbf{S}} {{\|\mathbf{S}^{\top} \mathbf{S}\|}_F} -\frac{\mathbf{I}_C}{\sqrt{C}} \right\|}_F.$
Parameters
• x (Tensor) – Node feature tensor $$\mathbf{X} \in \mathbb{R}^{B \times N \times F}$$ with batch-size $$B$$, (maximum) number of nodes $$N$$ for each graph, and feature dimension $$F$$.

• adj (Tensor) – Symmetrically normalized adjacency tensor $$\mathbf{A} \in \mathbb{R}^{B \times N \times N}$$.

• s (Tensor) – Assignment tensor $$\mathbf{S} \in \mathbb{R}^{B \times N \times C}$$ with number of clusters $$C$$. The softmax does not have to be applied beforehand, since it is executed within this method.

• mask (BoolTensor, optional) – Mask matrix $$\mathbf{M} \in {\{ 0, 1 \}}^{B \times N}$$ indicating the valid nodes for each graph. (default: None)

Return type

(Tensor, Tensor, Tensor, Tensor)

## Unpooling Layers¶

 knn_interpolate The k-NN interpolation from the “PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space” paper.
knn_interpolate(x, pos_x, pos_y, batch_x=None, batch_y=None, k=3, num_workers=1)[source]

The k-NN interpolation from the “PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space” paper. For each point $$y$$ with position $$\mathbf{p}(y)$$, its interpolated features $$\mathbf{f}(y)$$ are given by

$\mathbf{f}(y) = \frac{\sum_{i=1}^k w(x_i) \mathbf{f}(x_i)}{\sum_{i=1}^k w(x_i)} \textrm{, where } w(x_i) = \frac{1}{d(\mathbf{p}(y), \mathbf{p}(x_i))^2}$

and $$\{ x_1, \ldots, x_k \}$$ denoting the $$k$$ nearest points to $$y$$.

Parameters
• x (Tensor) – Node feature matrix $$\mathbf{X} \in \mathbb{R}^{N \times F}$$.

• pos_x (Tensor) – Node position matrix $$\in \mathbb{R}^{N \times d}$$.

• pos_y (Tensor) – Upsampled node position matrix $$\in \mathbb{R}^{M \times d}$$.

• batch_x (LongTensor, optional) – Batch vector $$\mathbf{b_x} \in {\{ 0, \ldots, B-1\}}^N$$, which assigns each node from $$\mathbf{X}$$ to a specific example. (default: None)

• batch_y (LongTensor, optional) – Batch vector $$\mathbf{b_y} \in {\{ 0, \ldots, B-1\}}^N$$, which assigns each node from $$\mathbf{Y}$$ to a specific example. (default: None)

• k (int, optional) – Number of neighbors. (default: 3)

• num_workers (int) – Number of workers to use for computation. Has no effect in case batch_x or batch_y is not None, or the input lies on the GPU. (default: 1)

## Models¶

 GCN The Graph Neural Network from the “Semi-supervised Classification with Graph Convolutional Networks” paper, using the GCNConv operator for message passing. GraphSAGE The Graph Neural Network from the “Inductive Representation Learning on Large Graphs” paper, using the SAGEConv operator for message passing. GIN The Graph Neural Network from the “How Powerful are Graph Neural Networks?” paper, using the GINConv operator for message passing. GAT The Graph Neural Network from the “Graph Attention Networks” paper, using the GATConv operator for message passing. JumpingKnowledge The Jumping Knowledge layer aggregation module from the “Representation Learning on Graphs with Jumping Knowledge Networks” paper based on either concatenation ("cat") Node2Vec The Node2Vec model from the “node2vec: Scalable Feature Learning for Networks” paper where random walks of length walk_length are sampled in a given graph, and node embeddings are learned via negative sampling optimization. DeepGraphInfomax The Deep Graph Infomax model from the “Deep Graph Infomax” paper based on user-defined encoder and summary model $$\mathcal{E}$$ and $$\mathcal{R}$$ respectively, and a corruption function $$\mathcal{C}$$. InnerProductDecoder The inner product decoder from the “Variational Graph Auto-Encoders” paper GAE The Graph Auto-Encoder model from the “Variational Graph Auto-Encoders” paper based on user-defined encoder and decoder models. VGAE The Variational Graph Auto-Encoder model from the “Variational Graph Auto-Encoders” paper. ARGA The Adversarially Regularized Graph Auto-Encoder model from the “Adversarially Regularized Graph Autoencoder for Graph Embedding” paper. ARGVA The Adversarially Regularized Variational Graph Auto-Encoder model from the “Adversarially Regularized Graph Autoencoder for Graph Embedding” paper. SignedGCN The signed graph convolutional network model from the “Signed Graph Convolutional Network” paper. RENet The Recurrent Event Network model from the “Recurrent Event Network for Reasoning over Temporal Knowledge Graphs” paper GraphUNet The Graph U-Net model from the “Graph U-Nets” paper which implements a U-Net like architecture with graph pooling and unpooling operations. SchNet The continuous-filter convolutional neural network SchNet from the “SchNet: A Continuous-filter Convolutional Neural Network for Modeling Quantum Interactions” paper that uses the interactions blocks of the form DimeNet The directional message passing neural network (DimeNet) from the “Directional Message Passing for Molecular Graphs” paper. GNNExplainer The GNN-Explainer model from the “GNNExplainer: Generating Explanations for Graph Neural Networks” paper for identifying compact subgraph structures and small subsets node features that play a crucial role in a GNN’s node-predictions. MetaPath2Vec The MetaPath2Vec model from the “metapath2vec: Scalable Representation Learning for Heterogeneous Networks” paper where random walks based on a given metapath are sampled in a heterogeneous graph, and node embeddings are learned via negative sampling optimization. DeepGCNLayer The skip connection operations from the “DeepGCNs: Can GCNs Go as Deep as CNNs?” and “All You Need to Train Deeper GCNs” papers. TGNMemory The Temporal Graph Network (TGN) memory model from the “Temporal Graph Networks for Deep Learning on Dynamic Graphs” paper. LabelPropagation The label propagation operator from the “Learning from Labeled and Unlabeled Datawith Label Propagation” paper CorrectAndSmooth The correct and smooth (C&S) post-processing model from the “Combining Label Propagation And Simple Models Out-performs Graph Neural Networks” paper, where soft predictions $$\mathbf{Z}$$ (obtained from a simple base predictor) are first corrected based on ground-truth training label information $$\mathbf{Y}$$ and residual propagation AttentiveFP The Attentive FP model for molecular representation learning from the “Pushing the Boundaries of Molecular Representation for Drug Discovery with the Graph Attention Mechanism” paper, based on graph attention mechanisms. RECT_L The RECT model, i.e. its supervised RECT-L part, from the “Network Embedding with Completely-imbalanced Labels” paper.
class GCN(in_channels: int, hidden_channels: int, num_layers: int, out_channels: Optional[int] = None, dropout: float = 0.0, act: Optional[Callable] = ReLU(inplace=True), norm: Optional[torch.nn.modules.module.Module] = None, jk: str = 'last', **kwargs)[source]

The Graph Neural Network from the “Semi-supervised Classification with Graph Convolutional Networks” paper, using the GCNConv operator for message passing.

Parameters
• in_channels (int) – Size of each input sample.

• hidden_channels (int) – Size of each hidden sample.

• num_layers (int) – Number of message passing layers.

• out_channels (int, optional) – If not set to None, will apply a final linear transformation to convert hidden node embeddings to output size out_channels. (default: None)

• dropout (float, optional) – Dropout probability. (default: 0.)

• act (Callable, optional) – The non-linear activation function to use. (default: torch.nn.ReLU(inplace=True)())

• norm (torch.nn.Module, optional) – The normalization operator to use. (default: None)

• jk (str, optional) – The Jumping Knowledge mode ("last", "cat", "max", "last"). (default: "last")

• **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.GCNConv.

class GraphSAGE(in_channels: int, hidden_channels: int, num_layers: int, out_channels: Optional[int] = None, dropout: float = 0.0, act: Optional[Callable] = ReLU(inplace=True), norm: Optional[torch.nn.modules.module.Module] = None, jk: str = 'last', **kwargs)[source]

The Graph Neural Network from the “Inductive Representation Learning on Large Graphs” paper, using the SAGEConv operator for message passing.

Parameters
• in_channels (int) – Size of each input sample.

• hidden_channels (int) – Size of each hidden sample.

• num_layers (int) – Number of message passing layers.

• out_channels (int, optional) – If not set to None, will apply a final linear transformation to convert hidden node embeddings to output size out_channels. (default: None)

• dropout (float, optional) – Dropout probability. (default: 0.)

• act (Callable, optional) – The non-linear activation function to use. (default: torch.nn.ReLU(inplace=True)())

• norm (torch.nn.Module, optional) – The normalization operator to use. (default: None)

• jk (str, optional) – The Jumping Knowledge mode ("last", "cat", "max", "last"). (default: "last")

• **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.SAGEConv.

class GIN(in_channels: int, hidden_channels: int, num_layers: int, out_channels: Optional[int] = None, dropout: float = 0.0, act: Optional[Callable] = ReLU(inplace=True), norm: Optional[torch.nn.modules.module.Module] = None, jk: str = 'last', **kwargs)[source]

The Graph Neural Network from the “How Powerful are Graph Neural Networks?” paper, using the GINConv operator for message passing.

Parameters
• in_channels (int) – Size of each input sample.

• hidden_channels (int) – Size of each hidden sample.

• num_layers (int) – Number of message passing layers.

• out_channels (int, optional) – If not set to None, will apply a final linear transformation to convert hidden node embeddings to output size out_channels. (default: None)

• dropout (float, optional) – Dropout probability. (default: 0.)

• act (Callable, optional) – The non-linear activation function to use. (default: torch.nn.ReLU(inplace=True)())

• norm (torch.nn.Module, optional) – The normalization operator to use. (default: None)

• jk (str, optional) – The Jumping Knowledge mode ("last", "cat", "max", "last"). (default: "last")

• **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.GINConv.

static MLP(in_channels: int, out_channels: int)torch.nn.modules.module.Module[source]
class GAT(in_channels: int, hidden_channels: int, num_layers: int, out_channels: Optional[int] = None, dropout: float = 0.0, act: Optional[Callable] = ReLU(inplace=True), norm: Optional[torch.nn.modules.module.Module] = None, jk: str = 'last', **kwargs)[source]

The Graph Neural Network from the “Graph Attention Networks” paper, using the GATConv operator for message passing.

Parameters
• in_channels (int) – Size of each input sample.

• hidden_channels (int) – Size of each hidden sample.

• num_layers (int) – Number of message passing layers.

• out_channels (int, optional) – If not set to None, will apply a final linear transformation to convert hidden node embeddings to output size out_channels. (default: None)

• dropout (float, optional) – Dropout probability. (default: 0.)

• act (Callable, optional) – The non-linear activation function to use. (default: torch.nn.ReLU(inplace=True)())

• norm (torch.nn.Module, optional) – The normalization operator to use. (default: None)

• jk (str, optional) – The Jumping Knowledge mode ("last", "cat", "max", "last"). (default: "last")

• **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.GATConv.

class JumpingKnowledge(mode, channels=None, num_layers=None)[source]

The Jumping Knowledge layer aggregation module from the “Representation Learning on Graphs with Jumping Knowledge Networks” paper based on either concatenation ("cat")

$\mathbf{x}_v^{(1)} \, \Vert \, \ldots \, \Vert \, \mathbf{x}_v^{(T)}$

max pooling ("max")

$\max \left( \mathbf{x}_v^{(1)}, \ldots, \mathbf{x}_v^{(T)} \right)$

or weighted summation

$\sum_{t=1}^T \alpha_v^{(t)} \mathbf{x}_v^{(t)}$

with attention scores $$\alpha_v^{(t)}$$ obtained from a bi-directional LSTM ("lstm").

Parameters
• mode (string) – The aggregation scheme to use ("cat", "max" or "lstm").

• channels (int, optional) – The number of channels per representation. Needs to be only set for LSTM-style aggregation. (default: None)

• num_layers (int, optional) – The number of layers to aggregate. Needs to be only set for LSTM-style aggregation. (default: None)

reset_parameters()[source]
forward(xs)[source]

Aggregates representations across different layers.

Parameters

xs (list or tuple) – List containing layer-wise representations.

class Node2Vec(edge_index, embedding_dim, walk_length, context_size, walks_per_node=1, p=1, q=1, num_negative_samples=1, num_nodes=None, sparse=False)[source]

The Node2Vec model from the “node2vec: Scalable Feature Learning for Networks” paper where random walks of length walk_length are sampled in a given graph, and node embeddings are learned via negative sampling optimization.

Note

For an example of using Node2Vec, see examples/node2vec.py.

Parameters
• edge_index (LongTensor) – The edge indices.

• embedding_dim (int) – The size of each embedding vector.

• walk_length (int) – The walk length.

• context_size (int) – The actual context size which is considered for positive samples. This parameter increases the effective sampling rate by reusing samples across different source nodes.

• walks_per_node (int, optional) – The number of walks to sample for each node. (default: 1)

• p (float, optional) – Likelihood of immediately revisiting a node in the walk. (default: 1)

• q (float, optional) – Control parameter to interpolate between breadth-first strategy and depth-first strategy (default: 1)

• num_negative_samples (int, optional) – The number of negative samples to use for each positive sample. (default: 1)

• num_nodes (int, optional) – The number of nodes. (default: None)

• sparse (bool, optional) – If set to True, gradients w.r.t. to the weight matrix will be sparse. (default: False)

reset_parameters()[source]
forward(batch=None)[source]

Returns the embeddings for the nodes in batch.

pos_sample(batch)[source]
neg_sample(batch)[source]
sample(batch)[source]
loss(pos_rw, neg_rw)[source]

Computes the loss given positive and negative random walks.

test(train_z, train_y, test_z, test_y, solver='lbfgs', multi_class='auto', *args, **kwargs)[source]

Evaluates latent space quality via a logistic regression downstream task.

class DeepGraphInfomax(hidden_channels, encoder, summary, corruption)[source]

The Deep Graph Infomax model from the “Deep Graph Infomax” paper based on user-defined encoder and summary model $$\mathcal{E}$$ and $$\mathcal{R}$$ respectively, and a corruption function $$\mathcal{C}$$.

Parameters
• hidden_channels (int) – The latent space dimensionality.

• encoder (Module) – The encoder module $$\mathcal{E}$$.

• summary (callable) – The readout function $$\mathcal{R}$$.

• corruption (callable) – The corruption function $$\mathcal{C}$$.

reset_parameters()[source]
forward(*args, **kwargs)[source]

Returns the latent space for the input arguments, their corruptions and their summary representation.

discriminate(z, summary, sigmoid=True)[source]

Given the patch-summary pair z and summary, computes the probability scores assigned to this patch-summary pair.

Parameters
loss(pos_z, neg_z, summary)[source]

Computes the mutual information maximization objective.

test(train_z, train_y, test_z, test_y, solver='lbfgs', multi_class='auto', *args, **kwargs)[source]

Evaluates latent space quality via a logistic regression downstream task.

class InnerProductDecoder[source]

The inner product decoder from the “Variational Graph Auto-Encoders” paper

$\sigma(\mathbf{Z}\mathbf{Z}^{\top})$

where $$\mathbf{Z} \in \mathbb{R}^{N \times d}$$ denotes the latent space produced by the encoder.

forward(z, edge_index, sigmoid=True)[source]

Decodes the latent variables z into edge probabilities for the given node-pairs edge_index.

Parameters
forward_all(z, sigmoid=True)[source]

Decodes the latent variables z into a probabilistic dense adjacency matrix.

Parameters
class GAE(encoder, decoder=None)[source]

The Graph Auto-Encoder model from the “Variational Graph Auto-Encoders” paper based on user-defined encoder and decoder models.

Parameters
reset_parameters()[source]
encode(*args, **kwargs)[source]

Runs the encoder and computes node-wise latent variables.

decode(*args, **kwargs)[source]

Runs the decoder and computes edge probabilities.

recon_loss(z, pos_edge_index, neg_edge_index=None)[source]

Given latent variables z, computes the binary cross entropy loss for positive edges pos_edge_index and negative sampled edges.

Parameters
• z (Tensor) – The latent space $$\mathbf{Z}$$.

• pos_edge_index (LongTensor) – The positive edges to train against.

• neg_edge_index (LongTensor, optional) – The negative edges to train against. If not given, uses negative sampling to calculate negative edges. (default: None)

test(z, pos_edge_index, neg_edge_index)[source]

Given latent variables z, positive edges pos_edge_index and negative edges neg_edge_index, computes area under the ROC curve (AUC) and average precision (AP) scores.

Parameters
• z (Tensor) – The latent space $$\mathbf{Z}$$.

• pos_edge_index (LongTensor) – The positive edges to evaluate against.

• neg_edge_index (LongTensor) – The negative edges to evaluate against.

class VGAE(encoder, decoder=None)[source]

The Variational Graph Auto-Encoder model from the “Variational Graph Auto-Encoders” paper.

Parameters
reparametrize(mu, logstd)[source]
encode(*args, **kwargs)[source]
kl_loss(mu=None, logstd=None)[source]

Computes the KL loss, either for the passed arguments mu and logstd, or based on latent variables from last encoding.

Parameters
class ARGA(encoder, discriminator, decoder=None)[source]

The Adversarially Regularized Graph Auto-Encoder model from the “Adversarially Regularized Graph Autoencoder for Graph Embedding” paper. paper.

Parameters
reset_parameters()[source]
reg_loss(z)[source]

Computes the regularization loss of the encoder.

Parameters

z (Tensor) – The latent space $$\mathbf{Z}$$.

discriminator_loss(z)[source]

Computes the loss of the discriminator.

Parameters

z (Tensor) – The latent space $$\mathbf{Z}$$.

class ARGVA(encoder, discriminator, decoder=None)[source]

The Adversarially Regularized Variational Graph Auto-Encoder model from the “Adversarially Regularized Graph Autoencoder for Graph Embedding” paper. paper.

Parameters
reparametrize(mu, logstd)[source]
encode(*args, **kwargs)[source]
kl_loss(mu=None, logstd=None)[source]
class SignedGCN(in_channels, hidden_channels, num_layers, lamb=5, bias=True)[source]

The signed graph convolutional network model from the “Signed Graph Convolutional Network” paper. Internally, this module uses the torch_geometric.nn.conv.SignedConv operator.

Parameters
• in_channels (int) – Size of each input sample.

• hidden_channels (int) – Size of each hidden sample.

• num_layers (int) – Number of layers.

• lamb (float, optional) – Balances the contributions of the overall objective. (default: 5)

• bias (bool, optional) – If set to False, all layers will not learn an additive bias. (default: True)

reset_parameters()[source]
split_edges(edge_index, test_ratio=0.2)[source]

Splits the edges edge_index into train and test edges.

Parameters
• edge_index (LongTensor) – The edge indices.

• test_ratio (float, optional) – The ratio of test edges. (default: 0.2)

create_spectral_features(pos_edge_index, neg_edge_index, num_nodes=None)[source]

Creates in_channels spectral node features based on positive and negative edges.

Parameters
• pos_edge_index (LongTensor) – The positive edge indices.

• neg_edge_index (LongTensor) – The negative edge indices.

• num_nodes (int, optional) – The number of nodes, i.e. max_val + 1 of pos_edge_index and neg_edge_index. (default: None)

forward(x, pos_edge_index, neg_edge_index)[source]

Computes node embeddings z based on positive edges pos_edge_index and negative edges neg_edge_index.

Parameters
• x (Tensor) – The input node features.

• pos_edge_index (LongTensor) – The positive edge indices.

• neg_edge_index (LongTensor) – The negative edge indices.

discriminate(z, edge_index)[source]

Given node embeddings z, classifies the link relation between node pairs edge_index to be either positive, negative or non-existent.

Parameters
• x (Tensor) – The input node features.

• edge_index (LongTensor) – The edge indices.

nll_loss(z, pos_edge_index, neg_edge_index)[source]

Computes the discriminator loss based on node embeddings z, and positive edges pos_edge_index and negative nedges neg_edge_index.

Parameters
• z (Tensor) – The node embeddings.

• pos_edge_index (LongTensor) – The positive edge indices.

• neg_edge_index (LongTensor) – The negative edge indices.

pos_embedding_loss(z, pos_edge_index)[source]

Computes the triplet loss between positive node pairs and sampled non-node pairs.

Parameters
• z (Tensor) – The node embeddings.

• pos_edge_index (LongTensor) – The positive edge indices.

neg_embedding_loss(z, neg_edge_index)[source]

Computes the triplet loss between negative node pairs and sampled non-node pairs.

Parameters
• z (Tensor) – The node embeddings.

• neg_edge_index (LongTensor) – The negative edge indices.

loss(z, pos_edge_index, neg_edge_index)[source]

Computes the overall objective.

Parameters
• z (Tensor) – The node embeddings.

• pos_edge_index (LongTensor) – The positive edge indices.

• neg_edge_index (LongTensor) – The negative edge indices.

test(z, pos_edge_index, neg_edge_index)[source]

Evaluates node embeddings z on positive and negative test edges by computing AUC and F1 scores.

Parameters
• z (Tensor) – The node embeddings.

• pos_edge_index (LongTensor) – The positive edge indices.

• neg_edge_index (LongTensor) – The negative edge indices.

class RENet(num_nodes, num_rels, hidden_channels, seq_len, num_layers=1, dropout=0.0, bias=True)[source]

The Recurrent Event Network model from the “Recurrent Event Network for Reasoning over Temporal Knowledge Graphs” paper

$f_{\mathbf{\Theta}}(\mathbf{e}_s, \mathbf{e}_r, \mathbf{h}^{(t-1)}(s, r))$

based on a RNN encoder

$\mathbf{h}^{(t)}(s, r) = \textrm{RNN}(\mathbf{e}_s, \mathbf{e}_r, g(\mathcal{O}^{(t)}_r(s)), \mathbf{h}^{(t-1)}(s, r))$

where $$\mathbf{e}_s$$ and $$\mathbf{e}_r$$ denote entity and relation embeddings, and $$\mathcal{O}^{(t)}_r(s)$$ represents the set of objects interacted with subject $$s$$ under relation $$r$$ at timestamp $$t$$. This model implements $$g$$ as the Mean Aggregator and $$f_{\mathbf{\Theta}}$$ as a linear projection.

Parameters
• num_nodes (int) – The number of nodes in the knowledge graph.

• num_rels (int) – The number of relations in the knowledge graph.

• hidden_channels (int) – Hidden size of node and relation embeddings.

• seq_len (int) – The sequence length of past events.

• num_layers (int, optional) – The number of recurrent layers. (default: 1)

• dropout (float) – If non-zero, introduces a dropout layer before the final prediction. (default: 0.)

• bias (bool, optional) – If set to False, all layers will not learn an additive bias. (default: True)

reset_parameters()[source]
static pre_transform(seq_len)[source]

Precomputes history objects

$\{ \mathcal{O}^{(t-k-1)}_r(s), \ldots, \mathcal{O}^{(t-1)}_r(s) \}$

of a torch_geometric.datasets.icews.EventDataset with $$k$$ denoting the sequence length seq_len.

forward(data)[source]

Given a data batch, computes the forward pass.

Parameters

data (torch_geometric.data.Data) – The input data, holding subject sub, relation rel and object obj information with shape [batch_size]. In addition, data needs to hold history information for subjects, given by a vector of node indices h_sub and their relative timestamps h_sub_t and batch assignments h_sub_batch. The same information must be given for objects (h_obj, h_obj_t, h_obj_batch).

test(logits, y)[source]

Given ground-truth y, computes Mean Reciprocal Rank (MRR) and Hits at 1/3/10.

class GraphUNet(in_channels, hidden_channels, out_channels, depth, pool_ratios=0.5, sum_res=True, act=<function relu>)[source]

The Graph U-Net model from the “Graph U-Nets” paper which implements a U-Net like architecture with graph pooling and unpooling operations.

Parameters
• in_channels (int) – Size of each input sample.

• hidden_channels (int) – Size of each hidden sample.

• out_channels (int) – Size of each output sample.

• depth (int) – The depth of the U-Net architecture.

• pool_ratios (float or [float], optional) – Graph pooling ratio for each depth. (default: 0.5)

• sum_res (bool, optional) – If set to False, will use concatenation for integration of skip connections instead summation. (default: True)

• act (torch.nn.functional, optional) – The nonlinearity to use. (default: torch.nn.functional.relu)

reset_parameters()[source]
forward(x, edge_index, batch=None)[source]
class SchNet(hidden_channels: int = 128, num_filters: int = 128, num_interactions: int = 6, num_gaussians: int = 50, cutoff: float = 10.0, max_num_neighbors: int = 32, readout: str = 'add', dipole: bool = False, mean: Optional[float] = None, std: Optional[float] = None, atomref: Optional[torch.Tensor] = None)[source]

The continuous-filter convolutional neural network SchNet from the “SchNet: A Continuous-filter Convolutional Neural Network for Modeling Quantum Interactions” paper that uses the interactions blocks of the form

$\mathbf{x}^{\prime}_i = \sum_{j \in \mathcal{N}(i)} \mathbf{x}_j \odot h_{\mathbf{\Theta}} ( \exp(-\gamma(\mathbf{e}_{j,i} - \mathbf{\mu}))),$

here $$h_{\mathbf{\Theta}}$$ denotes an MLP and $$\mathbf{e}_{j,i}$$ denotes the interatomic distances between atoms.

Note

For an example of using a pretrained SchNet variant, see examples/qm9_pretrained_schnet.py.

Parameters
• hidden_channels (int, optional) – Hidden embedding size. (default: 128)

• num_filters (int, optional) – The number of filters to use. (default: 128)

• num_interactions (int, optional) – The number of interaction blocks. (default: 6)

• num_gaussians (int, optional) – The number of gaussians $$\mu$$. (default: 50)

• cutoff (float, optional) – Cutoff distance for interatomic interactions. (default: 10.0)

• max_num_neighbors (int, optional) – The maximum number of neighbors to collect for each node within the cutoff distance. (default: 32)

• readout (string, optional) – Whether to apply "add" or "mean" global aggregation. (default: "add")

• dipole (bool, optional) – If set to True, will use the magnitude of the dipole moment to make the final prediction, e.g., for target 0 of torch_geometric.datasets.QM9. (default: False)

• mean (float, optional) – The mean of the property to predict. (default: None)

• std (float, optional) – The standard deviation of the property to predict. (default: None)

• atomref (torch.Tensor, optional) – The reference of single-atom properties. Expects a vector of shape (max_atomic_number, )`.

url = 'http://www.quantum-machine.org/datasets/trained_schnet_models.zip'
reset_parameters()[source]
static from_qm9_pretrained(root: str, dataset: torc