torch_geometric.nn.conv.SuperGATConv
- class SuperGATConv(in_channels: int, out_channels: int, heads: int = 1, concat: bool = True, negative_slope: float = 0.2, dropout: float = 0.0, add_self_loops: bool = True, bias: bool = True, attention_type: str = 'MX', neg_sample_ratio: float = 0.5, edge_sample_ratio: float = 1.0, is_undirected: bool = False, **kwargs)
Bases: MessagePassing
The self-supervised graph attentional operator from the “How to Find Your Friendly Neighborhood: Graph Attention Design with Self-Supervision” paper.
\[\mathbf{x}^{\prime}_i = \alpha_{i,i}\mathbf{\Theta}\mathbf{x}_{i} + \sum_{j \in \mathcal{N}(i)} \alpha_{i,j}\mathbf{\Theta}\mathbf{x}_{j},\]
where the two types of attention \(\alpha_{i,j}^{\mathrm{MX\ or\ SD}}\) are computed as:
\[\begin{aligned}
\alpha_{i,j}^{\mathrm{MX\ or\ SD}} &= \frac{ \exp\left(\mathrm{LeakyReLU}\left( e_{i,j}^{\mathrm{MX\ or\ SD}} \right)\right)} {\sum_{k \in \mathcal{N}(i) \cup \{ i \}} \exp\left(\mathrm{LeakyReLU}\left( e_{i,k}^{\mathrm{MX\ or\ SD}} \right)\right)}\\
e_{i,j}^{\mathrm{MX}} &= \mathbf{a}^{\top} [\mathbf{\Theta}\mathbf{x}_i \, \Vert \, \mathbf{\Theta}\mathbf{x}_j] \cdot \sigma \left( \left( \mathbf{\Theta}\mathbf{x}_i \right)^{\top} \mathbf{\Theta}\mathbf{x}_j \right)\\
e_{i,j}^{\mathrm{SD}} &= \frac{ \left( \mathbf{\Theta}\mathbf{x}_i \right)^{\top} \mathbf{\Theta}\mathbf{x}_j }{ \sqrt{d} }
\end{aligned}\]
The self-supervised task is link prediction: the attention values are used as input to predict the likelihood \(\phi_{i,j}^{\mathrm{MX\ or\ SD}}\) that an edge exists between nodes \(i\) and \(j\):
\[\begin{aligned}
\phi_{i,j}^{\mathrm{MX}} &= \sigma \left( \left( \mathbf{\Theta}\mathbf{x}_i \right)^{\top} \mathbf{\Theta}\mathbf{x}_j \right)\\
\phi_{i,j}^{\mathrm{SD}} &= \sigma \left( \frac{ \left( \mathbf{\Theta}\mathbf{x}_i \right)^{\top} \mathbf{\Theta}\mathbf{x}_j }{ \sqrt{d} } \right)
\end{aligned}\]
Note: For an example of using SuperGAT, see examples/super_gat.py.
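The formulas above can be traced by hand on a tiny neighborhood. The following is a minimal standard-library sketch, not the library's implementation: it assumes a single attention head, takes the already-transformed features \(\mathbf{\Theta}\mathbf{x}\) as plain lists, and reads \(d\) as the length of those vectors. The helper names (`e_mx`, `e_sd`, `attention`) and the values of `a`, `h_i`, and `neighborhood` are hypothetical and chosen only for illustration.

```python
import math


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def leaky_relu(z, negative_slope=0.2):
    return z if z >= 0 else negative_slope * z


def e_mx(h_i, h_j, a):
    # a^T [h_i || h_j] * sigmoid(h_i^T h_j); `+` concatenates the two lists.
    return dot(a, h_i + h_j) * sigmoid(dot(h_i, h_j))


def e_sd(h_i, h_j):
    # Scaled dot product; d is assumed to be the feature dimension.
    return dot(h_i, h_j) / math.sqrt(len(h_i))


def attention(scores, negative_slope=0.2):
    # Softmax over LeakyReLU(scores), taken across N(i) together with i itself.
    z = [leaky_relu(s, negative_slope) for s in scores]
    z_max = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - z_max) for v in z]
    total = sum(exps)
    return [v / total for v in exps]


# Hypothetical transformed features: node i and the set N(i) ∪ {i}.
h_i = [1.0, 0.0]
neighborhood = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # first entry is i itself
a = [0.5, -0.5, 0.25, 0.25]  # hypothetical attention vector of length 2d

alpha_mx = attention([e_mx(h_i, h_j, a) for h_j in neighborhood])
alpha_sd = attention([e_sd(h_i, h_j) for h_j in neighborhood])

# Link-prediction likelihoods used by the self-supervised task.
phi_mx = [sigmoid(dot(h_i, h_j)) for h_j in neighborhood]
phi_sd = [sigmoid(e_sd(h_i, h_j)) for h_j in neighborhood]

print(alpha_mx, alpha_sd, phi_mx, phi_sd)
```

Each list of attention coefficients sums to 1, and a neighbor whose transformed features are orthogonal to those of node \(i\) receives a likelihood of exactly 0.5 under both variants.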
- Parameters:
in_channels (int) – Size of each input sample, or -1 to derive the size from the first input(s) to the forward method.
out_channels (int) – Size of each output sample.
heads (int, optional) – Number of multi-head-attentions. (default: 1)
concat (bool, optional) – If set to False, the multi-head attentions are averaged instead of concatenated. (default: True)
negative_slope (float, optional) – LeakyReLU angle of the negative slope. (default: 0.2)
dropout (float, optional) – Dropout probability of the normalized attention coefficients which exposes each node to a stochastically sampled neighborhood during training. (default: 0)
add_self_loops (bool, optional) – If set to False, will not add self-loops to the input graph. (default: True)
bias (bool, optional) – If set to False, the layer will not learn an additive bias. (default: True)
attention_type (str, optional) – Type of attention to use ('MX', 'SD'). (default: 'MX')
neg_sample_ratio (float, optional) – The ratio of the number of sampled negative edges to the number of positive edges. (default: 0.5)
edge_sample_ratio (float, optional) – The ratio of samples to use for training among the number of training edges. (default: 1.0)
is_undirected (bool, optional) – Whether the input graph is undirected. If not given, will be automatically computed with the input graph when negative sampling is performed. (default: False)
**kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.
- Shapes:
input: node features \((|\mathcal{V}|, F_{in})\), edge indices \((2, |\mathcal{E}|)\), negative edge indices \((2, |\mathcal{E}^{(-)}|)\) (optional)
output: node features \((|\mathcal{V}|, H * F_{out})\), or \((|\mathcal{V}|, F_{out})\) if concat is set to False
- forward(x: Tensor, edge_index: Union[Tensor, SparseTensor], neg_edge_index: Optional[Tensor] = None, batch: Optional[Tensor] = None) → Tensor
Runs the forward pass of the module.
- Parameters:
x (torch.Tensor) – The input node features.
edge_index (torch.Tensor or SparseTensor) – The edge indices.
neg_edge_index (torch.Tensor, optional) – The negative edges to train against. If not given, uses negative sampling to calculate negative edges. (default: None)
batch (torch.Tensor, optional) – The batch vector \(\mathbf{b} \in {\{ 0, \ldots, B-1\}}^N\), which assigns each element to a specific example. Used when sampling negatives on-the-fly in mini-batch scenarios. (default: None)