torch_geometric.transforms.RandomLinkSplit
- class RandomLinkSplit(num_val: Union[int, float] = 0.1, num_test: Union[int, float] = 0.2, is_undirected: bool = False, key: str = 'edge_label', split_labels: bool = False, add_negative_train_samples: bool = True, neg_sampling_ratio: float = 1.0, disjoint_train_ratio: Union[int, float] = 0.0, edge_types: Optional[Union[Tuple[str, str, str], List[Tuple[str, str, str]]]] = None, rev_edge_types: Optional[Union[Tuple[str, str, str], List[Optional[Tuple[str, str, str]]]]] = None)
Bases: BaseTransform
Performs an edge-level random split into training, validation and test sets of a Data or a HeteroData object (functional name: random_link_split). The split is performed such that the training split does not include edges in validation and test splits; and the validation split does not include edges in the test split.

    from torch_geometric.transforms import RandomLinkSplit

    transform = RandomLinkSplit(is_undirected=True)
    train_data, val_data, test_data = transform(data)
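
To make the split guarantee concrete, the following library-free sketch shuffles a toy edge list and cuts it into three disjoint groups. It only illustrates the property described above and is not the transform's implementation: the helper name `sketch_edge_split`, the fixed seed, and the truncating `int(...)` rounding of ratios are assumptions made for this example.

```python
import random


def sketch_edge_split(edges, num_val=0.1, num_test=0.2, seed=0):
    """Illustrative only: cut a shuffled edge list into train/val/test groups."""
    edges = list(edges)
    random.Random(seed).shuffle(edges)

    # A float is read as a ratio of all edges, an int as an absolute count.
    # Truncating with int() is an assumption of this sketch.
    n_val = int(num_val * len(edges)) if isinstance(num_val, float) else num_val
    n_test = int(num_test * len(edges)) if isinstance(num_test, float) else num_test
    n_train = len(edges) - n_val - n_test
    if n_train <= 0:
        raise ValueError("Not enough edges left for training.")

    # Consecutive, non-overlapping slices of the shuffled list.
    train = edges[:n_train]
    val = edges[n_train:n_train + n_val]
    test = edges[n_train + n_val:]
    return train, val, test


# Toy directed ring graph with 10 edges.
edges = [(i, (i + 1) % 10) for i in range(10)]
train, val, test = sketch_edge_split(edges, num_val=0.1, num_test=0.2)

print(len(train), len(val), len(test))  # 7 1 2
```

Because the three groups are slices of one shuffled list, no edge can appear in more than one of them, which is the no-leakage property the split is meant to provide.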
- Parameters:
  - num_val (int or float, optional) – The number of validation edges. If set to a floating-point value in \([0, 1]\), it represents the ratio of edges to include in the validation set. (default: 0.1)
  - num_test (int or float, optional) – The number of test edges. If set to a floating-point value in \([0, 1]\), it represents the ratio of edges to include in the test set. (default: 0.2)
  - is_undirected (bool) – If set to True, the graph is assumed to be undirected, and positive and negative samples will not leak (reverse) edge connectivity across different splits. This only affects the graph split, label data will not be returned undirected. This option is ignored for bipartite edge types or whenever edge_type != rev_edge_type. (default: False)
  - key (str, optional) – The name of the attribute holding ground-truth labels. If data[key] does not exist, it will be automatically created and represents a binary classification task (1 = edge, 0 = no edge). If data[key] exists, it has to be a categorical label from 0 to num_classes - 1. After negative sampling, label 0 represents negative edges, and labels 1 to num_classes represent the labels of positive edges. (default: "edge_label")
  - split_labels (bool, optional) – If set to True, will split positive and negative labels and save them in distinct attributes "pos_edge_label" and "neg_edge_label", respectively. (default: False)
  - add_negative_train_samples (bool, optional) – Whether to add negative training samples for link prediction. If the model already performs negative sampling, then the option should be set to False. Otherwise, the added negative samples will be the same across training iterations unless negative sampling is performed again. (default: True)
  - neg_sampling_ratio (float, optional) – The ratio of sampled negative edges to the number of positive edges. (default: 1.0)
  - disjoint_train_ratio (int or float, optional) – If set to a value greater than 0.0, training edges will not be shared for message passing and supervision. Instead, disjoint_train_ratio edges are used as ground-truth labels for supervision during training. (default: 0.0)
  - edge_types (Tuple[EdgeType] or List[EdgeType], optional) – The edge types used for performing edge-level splitting in case of operating on HeteroData objects. (default: None)
  - rev_edge_types (Tuple[EdgeType] or List[Tuple[EdgeType]], optional) – The reverse edge types of edge_types in case of operating on HeteroData objects. This will ensure that edges of the reverse direction will be split accordingly to prevent any data leakage. Can be None in case no reverse connection exists. (default: None)
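
The labeling rules described for key, neg_sampling_ratio and split_labels can be sketched without the library. The example below is a hedged illustration, not the transform's code: `sketch_labels` and its arguments are hypothetical names, and negatives are drawn by simple rejection sampling, which is an assumption of this sketch rather than a statement about how the transform samples.

```python
import random


def sketch_labels(pos_edges, graph_edges, num_nodes,
                  existing_labels=None, neg_sampling_ratio=1.0, seed=0):
    """Illustrative only: build supervision edges and labels for one split."""
    rng = random.Random(seed)

    # Positive labels: 1 for a binary task, or the existing categorical
    # label shifted by one so that 0 stays free for negative edges.
    if existing_labels is None:
        pos_labels = [1] * len(pos_edges)
    else:
        pos_labels = [label + 1 for label in existing_labels]

    # Draw distinct node pairs that are not edges of the graph
    # (rejection sampling is an assumption of this sketch).
    known = set(graph_edges)
    num_neg = int(len(pos_edges) * neg_sampling_ratio)
    neg_edges = []
    while len(neg_edges) < num_neg:
        candidate = (rng.randrange(num_nodes), rng.randrange(num_nodes))
        if candidate not in known:
            known.add(candidate)
            neg_edges.append(candidate)

    return pos_edges + neg_edges, pos_labels + [0] * len(neg_edges)


# Toy ring graph with 6 edges; the first 3 act as this split's positives.
graph_edges = [(i, (i + 1) % 6) for i in range(6)]
pos_edges = graph_edges[:3]

edge_label_index, edge_label = sketch_labels(
    pos_edges, graph_edges, num_nodes=6,
    existing_labels=[0, 2, 1],   # categorical labels in 0..num_classes - 1
    neg_sampling_ratio=2.0,      # two negatives per positive
)
print(edge_label)  # [1, 3, 2, 0, 0, 0, 0, 0, 0]

# What split_labels=True expresses: positives and negatives kept apart.
pos_edge_label = [label for label in edge_label if label > 0]
neg_edge_label = [label for label in edge_label if label == 0]
```

Shifting existing labels by one keeps 0 unambiguous as the negative class, which is why positive labels end up in the range 1 to num_classes after negative sampling.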