class SuperGATConv(in_channels: int, out_channels: int, heads: int = 1, concat: bool = True, negative_slope: float = 0.2, dropout: float = 0.0, add_self_loops: bool = True, bias: bool = True, attention_type: str = 'MX', neg_sample_ratio: float = 0.5, edge_sample_ratio: float = 1.0, is_undirected: bool = False, **kwargs)[source]

Bases: MessagePassing

The self-supervised graph attentional operator from the “How to Find Your Friendly Neighborhood: Graph Attention Design with Self-Supervision” paper.

\[\mathbf{x}^{\prime}_i = \alpha_{i,i}\mathbf{\Theta}\mathbf{x}_{i} + \sum_{j \in \mathcal{N}(i)} \alpha_{i,j}\mathbf{\Theta}\mathbf{x}_{j},\]

where the two types of attention \(\alpha_{i,j}^{\mathrm{MX\ or\ SD}}\) are computed as:

\[ \begin{align}\begin{aligned}\alpha_{i,j}^{\mathrm{MX\ or\ SD}} &= \frac{ \exp\left(\mathrm{LeakyReLU}\left( e_{i,j}^{\mathrm{MX\ or\ SD}} \right)\right)} {\sum_{k \in \mathcal{N}(i) \cup \{ i \}} \exp\left(\mathrm{LeakyReLU}\left( e_{i,k}^{\mathrm{MX\ or\ SD}} \right)\right)}\\e_{i,j}^{\mathrm{MX}} &= \mathbf{a}^{\top} [\mathbf{\Theta}\mathbf{x}_i \, \Vert \, \mathbf{\Theta}\mathbf{x}_j] \cdot \sigma \left( \left( \mathbf{\Theta}\mathbf{x}_i \right)^{\top} \mathbf{\Theta}\mathbf{x}_j \right)\\e_{i,j}^{\mathrm{SD}} &= \frac{ \left( \mathbf{\Theta}\mathbf{x}_i \right)^{\top} \mathbf{\Theta}\mathbf{x}_j }{ \sqrt{d} }\end{aligned}\end{align} \]

The self-supervised task is a link prediction using the attention values as input to predict the likelihood \(\phi_{i,j}^{\mathrm{MX\ or\ SD}}\) that an edge exists between nodes:

\[ \begin{align}\begin{aligned}\phi_{i,j}^{\mathrm{MX}} &= \sigma \left( \left( \mathbf{\Theta}\mathbf{x}_i \right)^{\top} \mathbf{\Theta}\mathbf{x}_j \right)\\\phi_{i,j}^{\mathrm{SD}} &= \sigma \left( \frac{ \left( \mathbf{\Theta}\mathbf{x}_i \right)^{\top} \mathbf{\Theta}\mathbf{x}_j }{ \sqrt{d} } \right)\end{aligned}\end{align} \]


For an example of using SuperGAT, see examples/

  • in_channels (int) – Size of each input sample, or -1 to derive the size from the first input(s) to the forward method.

  • out_channels (int) – Size of each output sample.

  • heads (int, optional) – Number of multi-head-attentions. (default: 1)

  • concat (bool, optional) – If set to False, the multi-head attentions are averaged instead of concatenated. (default: True)

  • negative_slope (float, optional) – LeakyReLU angle of the negative slope. (default: 0.2)

  • dropout (float, optional) – Dropout probability of the normalized attention coefficients which exposes each node to a stochastically sampled neighborhood during training. (default: 0)

  • add_self_loops (bool, optional) – If set to False, will not add self-loops to the input graph. (default: True)

  • bias (bool, optional) – If set to False, the layer will not learn an additive bias. (default: True)

  • attention_type (str, optional) – Type of attention to use ('MX', 'SD'). (default: 'MX')

  • neg_sample_ratio (float, optional) – The ratio of the number of sampled negative edges to the number of positive edges. (default: 0.5)

  • edge_sample_ratio (float, optional) – The ratio of samples to use for training among the number of training edges. (default: 1.0)

  • is_undirected (bool, optional) – Whether the input graph is undirected. If not given, will be automatically computed with the input graph when negative sampling is performed. (default: False)

  • **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

  • input: node features \((|\mathcal{V}|, F_{in})\), edge indices \((2, |\mathcal{E}|)\), negative edge indices \((2, |\mathcal{E}^{(-)}|)\) (optional)

  • output: node features \((|\mathcal{V}|, H * F_{out})\)

forward(x: Tensor, edge_index: Union[Tensor, SparseTensor], neg_edge_index: Optional[Tensor] = None, batch: Optional[Tensor] = None) Tensor[source]

Runs the forward pass of the module.

  • x (torch.Tensor) – The input node features.

  • edge_index (torch.Tensor or SparseTensor) – The edge indices.

  • neg_edge_index (torch.Tensor, optional) – The negative edges to train against. If not given, uses negative sampling to calculate negative edges. (default: None)

  • batch (torch.Tensor, optional) – The batch vector \(\mathbf{b} \in {\{ 0, \ldots, B-1\}}^N\), which assigns each element to a specific example. Used when sampling negatives on-the-fly in mini-batch scenarios. (default: None)


Resets all learnable parameters of the module.

get_attention_loss() Tensor[source]

Computes the self-supervised graph attention loss.