torch_geometric.transforms.RandomLinkSplit

class RandomLinkSplit(num_val: Union[int, float] = 0.1, num_test: Union[int, float] = 0.2, is_undirected: bool = False, key: str = 'edge_label', split_labels: bool = False, add_negative_train_samples: bool = True, neg_sampling_ratio: float = 1.0, disjoint_train_ratio: Union[int, float] = 0.0, edge_types: Optional[Union[Tuple[str, str, str], List[Tuple[str, str, str]]]] = None, rev_edge_types: Optional[Union[Tuple[str, str, str], List[Optional[Tuple[str, str, str]]]]] = None)

Bases: BaseTransform

Performs an edge-level random split of a Data or HeteroData object into training, validation and test sets (functional name: random_link_split). The split guarantees that the training split does not include edges from the validation and test splits, and that the validation split does not include edges from the test split.

from torch_geometric.transforms import RandomLinkSplit

# `data` is the Data or HeteroData object holding the full graph.
transform = RandomLinkSplit(is_undirected=True)
train_data, val_data, test_data = transform(data)
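The split guarantee described above can be illustrated with a simplified pure-Python sketch. It is not PyG's implementation: the function `split_edges`, the `"mp"` (message passing) and `"sup"` (supervision) keys, and the plain tuple edge lists are hypothetical stand-ins for `edge_index` and the label attributes. To our understanding, it mirrors how the transform assigns edges: each split supervises on its own edges, the validation graph passes messages over training edges only, and the test graph passes messages over training plus validation edges. With `is_undirected=True`, the sketch assumes both directions of every edge are stored, splits only one direction, and re-expands the message-passing edges afterwards, so supervision edges stay one-directional.

```python
import random
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def split_edges(edges: List[Edge], num_val: float, num_test: float,
                is_undirected: bool = False,
                seed: int = 0) -> Dict[str, Dict[str, List[Edge]]]:
    """Toy edge-level split; a simplified sketch, not PyG's implementation."""
    if is_undirected:
        # Assumes both (u, v) and (v, u) are stored: keep one direction only,
        # so an edge and its reverse always land in the same split.
        edges = [(u, v) for u, v in edges if u <= v]
    perm = list(range(len(edges)))
    random.Random(seed).shuffle(perm)

    n_val = int(num_val * len(perm))
    n_test = int(num_test * len(perm))
    n_train = len(perm) - n_val - n_test
    if n_train <= 0:
        raise ValueError("not enough edges left for training")

    train = [edges[i] for i in perm[:n_train]]
    val = [edges[i] for i in perm[n_train:n_train + n_val]]
    test = [edges[i] for i in perm[n_train + n_val:]]

    def message_passing(es: List[Edge]) -> List[Edge]:
        # Graph connectivity is re-expanded to both directions when undirected.
        return es + [(v, u) for u, v in es] if is_undirected else es

    return {
        # Each split only sees edges of "earlier" splits for message passing.
        "train": {"mp": message_passing(train), "sup": train},
        "val": {"mp": message_passing(train), "sup": val},
        "test": {"mp": message_passing(train + val), "sup": test},
    }
```

For example, on a 4-cycle stored in both directions with `num_val=0.25` and `num_test=0.25`, one undirected edge lands in each of the validation and test splits, and neither direction of those edges appears among the validation message-passing edges.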

Parameters:

num_val (int or float, optional) – The number of validation edges. If set to a floating-point value in \([0, 1]\), it represents the ratio of edges to include in the validation set. (default: 0.1)
num_test (int or float, optional) – The number of test edges. If set to a floating-point value in \([0, 1]\), it represents the ratio of edges to include in the test set. (default: 0.2)
is_undirected (bool, optional) – If set to True, the graph is assumed to be undirected, and positive and negative samples will not leak (reverse) edge connectivity across different splits. This only affects the graph split; label data is not returned undirected, i.e., each supervision edge appears in one direction only. This option is ignored for bipartite edge types or whenever edge_type != rev_edge_type. (default: False)
key (str, optional) – The name of the attribute holding ground-truth labels. If data[key] does not exist, it is created automatically and represents a binary classification task (1 = edge, 0 = no edge). If data[key] exists, it must hold categorical labels from 0 to num_classes - 1. After negative sampling, label 0 represents negative edges, and the labels of positive edges are shifted by one to the range 1 to num_classes. (default: "edge_label")
split_labels (bool, optional) – If set to True, positive and negative labels are split and saved in the distinct attributes "pos_edge_label" and "neg_edge_label", respectively (names shown for the default key). (default: False)
add_negative_train_samples (bool, optional) – Whether to add negative training samples for link prediction. If the model already performs negative sampling, this option should be set to False. If it is left as True, the added negative training samples stay the same across training iterations unless negative sampling is performed again. (default: True)
neg_sampling_ratio (float, optional) – The ratio of sampled negative edges to the number of positive edges. (default: 1.0)
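disjoint_train_ratio (int or float, optional) – If set to a value greater than 0.0, training edges are not shared between message passing and supervision. Instead, disjoint_train_ratio edges (a fraction of the training edges if a float, an absolute number if an int) are used only as ground-truth labels for supervision during training, and the remaining training edges are used for message passing. (default: 0.0)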
edge_types (EdgeType or List[EdgeType], optional) – The edge types used for performing edge-level splitting in case of operating on HeteroData objects. (default: None)
rev_edge_types (EdgeType or List[Optional[EdgeType]], optional) – The reverse edge types of edge_types in case of operating on HeteroData objects. This ensures that edges in the reverse direction are split in the same way, preventing data leakage. An entry can be None if no reverse connection exists for the corresponding edge type. (default: None)
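The following sketch shows how the count-type parameters (num_val, num_test, neg_sampling_ratio, add_negative_train_samples, disjoint_train_ratio) could combine into per-split sizes. The function `split_sizes` and its output keys are hypothetical. The `int(...)` rounding and the exact point where `disjoint_train_ratio` and `neg_sampling_ratio` are applied reflect our reading of the transform and should be treated as assumptions. With `is_undirected=True`, `num_edges` would be the number of undirected edges.

```python
from typing import Dict, Union

Number = Union[int, float]


def split_sizes(num_edges: int, num_val: Number = 0.1, num_test: Number = 0.2,
                neg_sampling_ratio: float = 1.0,
                add_negative_train_samples: bool = True,
                disjoint_train_ratio: Number = 0.0) -> Dict[str, int]:
    """Hypothetical helper: per-split edge counts (a sketch, not PyG code)."""

    def resolve(value: Number, total: int) -> int:
        # A float is a ratio of `total`; an int is an absolute count.
        return int(value * total) if isinstance(value, float) else value

    n_val = resolve(num_val, num_edges)
    n_test = resolve(num_test, num_edges)
    n_train = num_edges - n_val - n_test

    # Training edges reserved purely for supervision (0 means not disjoint).
    n_disjoint = resolve(disjoint_train_ratio, n_train)
    if n_train - n_disjoint <= 0:
        raise ValueError("not enough edges left for training")
    n_train_sup = n_disjoint if n_disjoint > 0 else n_train
    n_train_mp = n_train - n_disjoint

    return {
        "train_message_passing": n_train_mp,
        "train_pos": n_train_sup,
        # Negatives are only added for training if requested.
        "train_neg": (int(n_train_sup * neg_sampling_ratio)
                      if add_negative_train_samples else 0),
        "val_pos": n_val,
        "val_neg": int(n_val * neg_sampling_ratio),
        "test_pos": n_test,
        "test_neg": int(n_test * neg_sampling_ratio),
    }
```

With the defaults and 100 edges, this gives 70/10/20 positive edges and as many negatives per split; setting `disjoint_train_ratio=0.5` divides the 70 training edges into 35 for message passing and 35 for supervision.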
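The labeling rule for `key` and `split_labels` can be sketched with plain lists instead of tensors. `build_labels` is hypothetical. The `<key>_index` attributes holding the labeled node pairs (e.g. `edge_label_index`) reflect our understanding of the transform's output rather than something stated on this page, and leaving categorical labels unshifted when no negatives are added is likewise an assumption.

```python
from typing import Dict, List, Optional, Tuple

Edge = Tuple[int, int]


def build_labels(pos_edges: List[Edge], neg_edges: List[Edge],
                 pos_labels: Optional[List[int]] = None,
                 key: str = "edge_label",
                 split_labels: bool = False) -> Dict[str, list]:
    """Hypothetical helper mimicking the label layout (a sketch, not PyG)."""
    if pos_labels is None:
        # No existing labels: binary task, 1 = edge, 0 = no edge.
        labels = [1] * len(pos_edges)
    elif neg_edges:
        # Categorical labels 0..C-1 are shifted to 1..C so 0 can mark negatives.
        labels = [y + 1 for y in pos_labels]
    else:
        # Assumption: without negatives, the labels are left unshifted.
        labels = list(pos_labels)
    neg_labels = [0] * len(neg_edges)

    if split_labels:
        out: Dict[str, list] = {f"pos_{key}": labels,
                                f"pos_{key}_index": pos_edges}
        if neg_edges:
            out[f"neg_{key}"] = neg_labels
            out[f"neg_{key}_index"] = neg_edges
        return out
    # Default: positives first, then negatives, in a single attribute.
    return {key: labels + neg_labels, f"{key}_index": pos_edges + neg_edges}
```

For instance, two positive edges with categorical labels `[0, 2]` plus one negative edge yield `edge_label == [1, 3, 0]`.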
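The negative edges counted by neg_sampling_ratio are node pairs that are not edges of the graph. Below is a naive rejection-sampling sketch. PyG provides its own `torch_geometric.utils.negative_sampling` utility, so `sample_negatives`, the uniform sampling, the skipping of self-loops and the `max_tries` cap are all assumptions of this illustration. Our understanding is that the transform samples negatives against the full edge set before splitting, so a negative never coincides with a positive edge of any split. The sketch mimics that by checking candidates against all positive edges.

```python
import random
from typing import List, Set, Tuple

Edge = Tuple[int, int]


def sample_negatives(pos_edges: List[Edge], num_nodes: int,
                     ratio: float = 1.0, seed: int = 0,
                     max_tries: int = 100_000) -> List[Edge]:
    """Hypothetical rejection sampler for directed negative edges."""
    existing: Set[Edge] = set(pos_edges)
    num_neg = int(ratio * len(pos_edges))
    rng = random.Random(seed)
    neg: Set[Edge] = set()
    for _ in range(max_tries):
        if len(neg) >= num_neg:
            break
        u, v = rng.randrange(num_nodes), rng.randrange(num_nodes)
        # Reject self-loops and pairs that are actual (positive) edges.
        if u != v and (u, v) not in existing:
            neg.add((u, v))
    # May return fewer than num_neg pairs if the graph is too dense.
    return sorted(neg)
```

On a dense graph there may be too few non-edges to reach the requested ratio; the sketch then returns fewer negatives instead of looping forever.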
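For HeteroData, the reason rev_edge_types matters can be shown with plain edge lists. In this sketch, the helpers `leaked` and `rebuild_reverse` and the user/movie node ids are hypothetical. A held-out user-to-movie edge stays visible through the reverse (movie-to-user) relation unless the reverse relation's edges are rebuilt from the forward training edges. To our understanding, the transform rebuilds them by flipping the forward message-passing edges for each pair in edge_types/rev_edge_types, but treat that as an assumption.

```python
from typing import List, Tuple

Edge = Tuple[int, int]


def rebuild_reverse(fwd_mp_edges: List[Edge]) -> List[Edge]:
    # The reverse relation only sees flipped copies of the forward
    # message-passing edges of the same split.
    return [(v, u) for u, v in fwd_mp_edges]


def leaked(fwd_sup_edges: List[Edge], rev_mp_edges: List[Edge]) -> List[Edge]:
    # Held-out forward edges whose reverse is still usable for message passing.
    rev = set(rev_mp_edges)
    return [(u, v) for u, v in fwd_sup_edges if (v, u) in rev]


# Hypothetical user -> movie edges; (2, 12) is held out for validation.
user_to_movie = [(0, 10), (1, 11), (2, 12)]
movie_to_user = [(m, u) for u, m in user_to_movie]  # original reverse relation
train_mp, val_sup = [(0, 10), (1, 11)], [(2, 12)]

print(leaked(val_sup, movie_to_user))              # [(2, 12)]
print(leaked(val_sup, rebuild_reverse(train_mp)))  # []
```

Keeping the original reverse relation leaks the validation edge `(2, 12)` as `(12, 2)`; rebuilding the reverse relation from the training edges removes the leak.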