# torch_geometric.nn.conv.SuperGATConv

class SuperGATConv(in_channels: int, out_channels: int, heads: int = 1, concat: bool = True, negative_slope: float = 0.2, dropout: float = 0.0, add_self_loops: bool = True, bias: bool = True, attention_type: str = 'MX', neg_sample_ratio: float = 0.5, edge_sample_ratio: float = 1.0, is_undirected: bool = False, **kwargs)[source]

Bases: MessagePassing

The self-supervised graph attentional operator from the “How to Find Your Friendly Neighborhood: Graph Attention Design with Self-Supervision” paper.

$$
\mathbf{x}^{\prime}_i = \alpha_{i,i}\mathbf{\Theta}\mathbf{x}_{i} + \sum_{j \in \mathcal{N}(i)} \alpha_{i,j}\mathbf{\Theta}\mathbf{x}_{j},
$$

where the two types of attention $$\alpha_{i,j}^{\mathrm{MX\ or\ SD}}$$ are computed as:

$$
\begin{aligned}
\alpha_{i,j}^{\mathrm{MX\ or\ SD}} &= \frac{\exp\left(\mathrm{LeakyReLU}\left(e_{i,j}^{\mathrm{MX\ or\ SD}}\right)\right)}{\sum_{k \in \mathcal{N}(i) \cup \{ i \}} \exp\left(\mathrm{LeakyReLU}\left(e_{i,k}^{\mathrm{MX\ or\ SD}}\right)\right)} \\
e_{i,j}^{\mathrm{MX}} &= \mathbf{a}^{\top} [\mathbf{\Theta}\mathbf{x}_i \, \Vert \, \mathbf{\Theta}\mathbf{x}_j] \cdot \sigma\left(\left(\mathbf{\Theta}\mathbf{x}_i\right)^{\top} \mathbf{\Theta}\mathbf{x}_j\right) \\
e_{i,j}^{\mathrm{SD}} &= \frac{\left(\mathbf{\Theta}\mathbf{x}_i\right)^{\top} \mathbf{\Theta}\mathbf{x}_j}{\sqrt{d}}
\end{aligned}
$$

The self-supervised task is link prediction: the attention values are used to predict the likelihood $$\phi_{i,j}^{\mathrm{MX\ or\ SD}}$$ that an edge exists between nodes $$i$$ and $$j$$:

$$
\begin{aligned}
\phi_{i,j}^{\mathrm{MX}} &= \sigma\left(\left(\mathbf{\Theta}\mathbf{x}_i\right)^{\top} \mathbf{\Theta}\mathbf{x}_j\right) \\
\phi_{i,j}^{\mathrm{SD}} &= \sigma\left(\frac{\left(\mathbf{\Theta}\mathbf{x}_i\right)^{\top} \mathbf{\Theta}\mathbf{x}_j}{\sqrt{d}}\right)
\end{aligned}
$$
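The SD variant of the formulas above can be traced numerically with a small sketch. This is an illustration with numpy, not PyG code; the transformed features `h` stand in for $$\mathbf{\Theta}\mathbf{x}$$, and a fully connected toy graph (with self-loops) plays the role of the neighborhood:

```python
import numpy as np

d = 4                        # per-head output dimension
rng = np.random.default_rng(0)
h = rng.normal(size=(3, d))  # transformed features Theta @ x for 3 nodes

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# e_ij^SD: scaled dot product between transformed features
e = h @ h.T / np.sqrt(d)

# alpha_ij: softmax of LeakyReLU(e_ij) over the neighborhood of i
# (here: all nodes, i.e. a fully connected toy graph with self-loops)
leaky = np.where(e > 0, e, 0.2 * e)   # LeakyReLU with negative_slope=0.2
alpha = np.exp(leaky) / np.exp(leaky).sum(axis=1, keepdims=True)

# phi_ij^SD: predicted likelihood that edge (i, j) exists
phi = sigmoid(e)
```

Each row of `alpha` sums to one (a proper attention distribution over the neighborhood), while `phi` lies strictly in (0, 1) and serves as the link-prediction target of the self-supervised task.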

Note

For an example of using SuperGAT, see examples/super_gat.py.

Parameters:
• in_channels (int) – Size of each input sample, or -1 to derive the size from the first input(s) to the forward method.

• out_channels (int) – Size of each output sample.

• heads (int, optional) – Number of multi-head-attentions. (default: 1)

• concat (bool, optional) – If set to False, the multi-head attentions are averaged instead of concatenated. (default: True)

• negative_slope (float, optional) – LeakyReLU angle of the negative slope. (default: 0.2)

• dropout (float, optional) – Dropout probability of the normalized attention coefficients which exposes each node to a stochastically sampled neighborhood during training. (default: 0)

• add_self_loops (bool, optional) – If set to False, will not add self-loops to the input graph. (default: True)

• bias (bool, optional) – If set to False, the layer will not learn an additive bias. (default: True)

• attention_type (str, optional) – Type of attention to use ('MX', 'SD'). (default: 'MX')

• neg_sample_ratio (float, optional) – The ratio of the number of sampled negative edges to the number of positive edges. (default: 0.5)

• edge_sample_ratio (float, optional) – The ratio of samples to use for training among the number of training edges. (default: 1.0)

• is_undirected (bool, optional) – Whether the input graph is undirected. If not given, will be automatically computed with the input graph when negative sampling is performed. (default: False)

• **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

Shapes:
• input: node features $$(|\mathcal{V}|, F_{in})$$, edge indices $$(2, |\mathcal{E}|)$$, negative edge indices $$(2, |\mathcal{E}^{(-)}|)$$ (optional)

• output: node features $$(|\mathcal{V}|, H * F_{out})$$

forward(x: Tensor, edge_index: Union[Tensor, SparseTensor], neg_edge_index: Optional[Tensor] = None, batch: Optional[Tensor] = None) → Tensor [source]

Runs the forward pass of the module.

Parameters:
• x (torch.Tensor) – The input node features.

• edge_index (torch.Tensor or SparseTensor) – The edge indices.

• neg_edge_index (torch.Tensor, optional) – The negative edges to train against. If not given, uses negative sampling to calculate negative edges. (default: None)

• batch (torch.Tensor, optional) – The batch vector $$\mathbf{b} \in {\{ 0, \ldots, B-1\}}^N$$, which assigns each element to a specific example. Used when sampling negatives on-the-fly in mini-batch scenarios. (default: None)

reset_parameters()[source]

Resets all learnable parameters of the module.

get_attention_loss()[source]

Computes the self-supervised graph attention loss.