torch_geometric.nn.conv.GATv2Conv

class GATv2Conv(in_channels: Union[int, Tuple[int, int]], out_channels: int, heads: int = 1, concat: bool = True, negative_slope: float = 0.2, dropout: float = 0.0, add_self_loops: bool = True, edge_dim: Optional[int] = None, fill_value: Union[float, Tensor, str] = 'mean', bias: bool = True, share_weights: bool = False, **kwargs)[source]

Bases: MessagePassing

The GATv2 operator from the “How Attentive are Graph Attention Networks?” paper, which fixes the static attention problem of the standard GATConv layer. Because the two linear layers in standard GAT are applied consecutively, the ranking of attended nodes is unconditioned on the query node. GATv2 instead applies the attention vector after the non-linearity, so every node can attend to any other node.

\[\mathbf{x}^{\prime}_i = \alpha_{i,i}\mathbf{\Theta}_{s}\mathbf{x}_{i} + \sum_{j \in \mathcal{N}(i)} \alpha_{i,j}\mathbf{\Theta}_{t}\mathbf{x}_{j},\]

where the attention coefficients \(\alpha_{i,j}\) are computed as

\[\alpha_{i,j} = \frac{ \exp\left(\mathbf{a}^{\top}\mathrm{LeakyReLU}\left( \mathbf{\Theta}_{s} \mathbf{x}_i + \mathbf{\Theta}_{t} \mathbf{x}_j \right)\right)} {\sum_{k \in \mathcal{N}(i) \cup \{ i \}} \exp\left(\mathbf{a}^{\top}\mathrm{LeakyReLU}\left( \mathbf{\Theta}_{s} \mathbf{x}_i + \mathbf{\Theta}_{t} \mathbf{x}_k \right)\right)}.\]

If the graph has multi-dimensional edge features \(\mathbf{e}_{i,j}\), the attention coefficients \(\alpha_{i,j}\) are computed as

\[\alpha_{i,j} = \frac{ \exp\left(\mathbf{a}^{\top}\mathrm{LeakyReLU}\left( \mathbf{\Theta}_{s} \mathbf{x}_i + \mathbf{\Theta}_{t} \mathbf{x}_j + \mathbf{\Theta}_{e} \mathbf{e}_{i,j} \right)\right)} {\sum_{k \in \mathcal{N}(i) \cup \{ i \}} \exp\left(\mathbf{a}^{\top}\mathrm{LeakyReLU}\left( \mathbf{\Theta}_{s} \mathbf{x}_i + \mathbf{\Theta}_{t} \mathbf{x}_k + \mathbf{\Theta}_{e} \mathbf{e}_{i,k} \right)\right)}.\]
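
For intuition, the scoring rule above can be written out for a single head in a few lines of plain PyTorch. This is an illustrative sketch, not the layer's implementation; all tensor names and sizes here are made up for the example:

    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    F_in, F_out = 8, 16
    theta_s = torch.randn(F_out, F_in)  # Theta_s, applied to the query node x_i
    theta_t = torch.randn(F_out, F_in)  # Theta_t, applied to the attended nodes
    a = torch.randn(F_out)              # attention vector a

    x_i = torch.randn(F_in)             # features of the query node i
    x_nbrs = torch.randn(5, F_in)       # features x_k for all k in N(i) and i itself

    # e_{i,k} = a^T LeakyReLU(Theta_s x_i + Theta_t x_k)
    scores = F.leaky_relu(x_i @ theta_s.T + x_nbrs @ theta_t.T,
                          negative_slope=0.2) @ a
    alpha = torch.softmax(scores, dim=0)  # alpha_{i,k}, normalized over N(i) and i

Because Theta_s x_i sits inside the LeakyReLU together with Theta_t x_k, the ranking over attended nodes genuinely depends on the query node, which is exactly the dynamic-attention property described above.
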
Parameters:
  • in_channels (int or tuple) – Size of each input sample, or -1 to derive the size from the first input(s) to the forward method. A tuple corresponds to the sizes of source and target dimensionalities in case of a bipartite graph.

  • out_channels (int) – Size of each output sample.

  • heads (int, optional) – Number of multi-head-attentions. (default: 1)

  • concat (bool, optional) – If set to False, the multi-head attentions are averaged instead of concatenated. (default: True)

  • negative_slope (float, optional) – LeakyReLU angle of the negative slope. (default: 0.2)

  • dropout (float, optional) – Dropout probability of the normalized attention coefficients which exposes each node to a stochastically sampled neighborhood during training. (default: 0)

  • add_self_loops (bool, optional) – If set to False, will not add self-loops to the input graph. (default: True)

  • edge_dim (int, optional) – Edge feature dimensionality (in case there are any). (default: None)

  • fill_value (float or torch.Tensor or str, optional) – The way to generate edge features of self-loops (in case edge_dim != None). If given as float or torch.Tensor, edge features of self-loops will be directly given by fill_value. If given as str, edge features of self-loops are computed by aggregating all features of edges that point to the specific node, according to a reduce operation. ("add", "mean", "min", "max", "mul"). (default: "mean")

  • bias (bool, optional) – If set to False, the layer will not learn an additive bias. (default: True)

  • share_weights (bool, optional) – If set to True, the same matrix will be applied to the source and the target node of every edge, i.e. \(\mathbf{\Theta}_{s} = \mathbf{\Theta}_{t}\). (default: False)

  • **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

Shapes:
  • input: node features \((|\mathcal{V}|, F_{in})\) or \(((|\mathcal{V}_s|, F_{s}), (|\mathcal{V}_t|, F_{t}))\) if bipartite, edge indices \((2, |\mathcal{E}|)\), edge features \((|\mathcal{E}|, D)\) (optional)

  • output: node features \((|\mathcal{V}|, H * F_{out})\) or \((|\mathcal{V}_t|, H * F_{out})\) if bipartite. If return_attention_weights=True, then \(((|\mathcal{V}|, H * F_{out}), ((2, |\mathcal{E}|), (|\mathcal{E}|, H)))\) or \(((|\mathcal{V}_t|, H * F_{out}), ((2, |\mathcal{E}|), (|\mathcal{E}|, H)))\) if bipartite
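
A minimal usage sketch with toy tensors (shapes chosen to match the conventions above; not taken from the official examples):

    import torch
    from torch_geometric.nn import GATv2Conv

    x = torch.randn(4, 8)                           # |V| = 4 nodes, F_in = 8
    edge_index = torch.tensor([[0, 1, 2, 3],        # source nodes
                               [1, 0, 3, 2]])       # target nodes
    edge_attr = torch.randn(edge_index.size(1), 3)  # |E| = 4 edge features, D = 3

    conv = GATv2Conv(in_channels=8, out_channels=16, heads=2, edge_dim=3)
    out = conv(x, edge_index, edge_attr)
    print(out.shape)  # torch.Size([4, 32]), i.e. (|V|, H * F_out) with concat=True
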

forward(x: Union[Tensor, Tuple[Tensor, Tensor]], edge_index: Union[Tensor, SparseTensor], edge_attr: Optional[Tensor] = None, return_attention_weights: Optional[bool] = None) → Tensor[source]
forward(x: Union[Tensor, Tuple[Tensor, Tensor]], edge_index: Tensor, edge_attr: Optional[Tensor] = None, return_attention_weights: bool = None) → Tuple[Tensor, Tuple[Tensor, Tensor]]
forward(x: Union[Tensor, Tuple[Tensor, Tensor]], edge_index: SparseTensor, edge_attr: Optional[Tensor] = None, return_attention_weights: bool = None) → Tuple[Tensor, SparseTensor]

Runs the forward pass of the module.

Parameters:
  • x (torch.Tensor or (torch.Tensor, torch.Tensor)) – The input node features.

  • edge_index (torch.Tensor or SparseTensor) – The edge indices.

  • edge_attr (torch.Tensor, optional) – The edge features. (default: None)

  • return_attention_weights (bool, optional) – If set to True, will additionally return the tuple (edge_index, attention_weights), holding the computed attention weights for each edge, as shown in the example below. (default: None)
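
For example, continuing the usage sketch above (note that with the default add_self_loops=True, the returned edge_index also contains the inserted self-loops, so it is longer than the input edge_index):

    out, (edge_index_sl, alpha) = conv(x, edge_index, edge_attr,
                                       return_attention_weights=True)
    # alpha has shape (num_edges_incl_self_loops, heads), here (4 + 4, 2)
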

reset_parameters()[source]

Resets all learnable parameters of the module.