torch_geometric.datasets.OGB_MAG

class OGB_MAG(root: str, preprocess: Optional[str] = None, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, force_reload: bool = False)

Bases: InMemoryDataset

The ogbn-mag dataset from the “Open Graph Benchmark: Datasets for Machine Learning on Graphs” paper. ogbn-mag is a heterogeneous graph composed of a subset of the Microsoft Academic Graph (MAG). It contains four types of entities: papers (736,389 nodes), authors (1,134,649 nodes), institutions (8,740 nodes), and fields of study (59,965 nodes). It also contains four types of directed relations, each connecting two types of entities. Each paper is associated with a 128-dimensional word2vec feature vector, while all other node types have no input features. The task is to predict the venue (conference or journal) of each paper. In total, there are 349 different venues.
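The structure described above can be summarized in a small, dependency-free sketch. The node counts, feature dimension, and venue count are taken from the description; the node-type keys and relation triplets are assumptions about how the loaded object names them and should be checked against its `node_types` and `edge_types`.

```python
# Node counts as stated in the dataset description.
# The key names are assumed to match the loaded object's node types.
NODE_COUNTS = {
    "paper": 736_389,
    "author": 1_134_649,
    "institution": 8_740,
    "field_of_study": 59_965,
}

# Assumed (source type, relation, destination type) triplets for the
# four directed relations; verify against the loaded data object.
EDGE_TYPES = [
    ("author", "affiliated_with", "institution"),
    ("author", "writes", "paper"),
    ("paper", "cites", "paper"),
    ("paper", "has_topic", "field_of_study"),
]

NUM_PAPER_FEATURES = 128  # word2vec dimension of each paper feature vector
NUM_VENUES = 349          # number of target classes for paper nodes


def featureless_node_types():
    """Return the node types that come without input features."""
    return [node_type for node_type in NODE_COUNTS if node_type != "paper"]


total_nodes = sum(NODE_COUNTS.values())
print(f"total nodes: {total_nodes}")
print(f"featureless node types: {featureless_node_types()}")
```

Only paper nodes carry features and labels, which is why the `preprocess` option exists for the remaining three node types.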

Parameters:
  • root (str) – Root directory where the dataset should be saved.

  • preprocess (str, optional) – Pre-processes the original dataset by adding structural features ("metapath2vec", "TransE") to featureless nodes. (default: None)

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.HeteroData object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.HeteroData object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • force_reload (bool, optional) – Whether to re-process the dataset. (default: False)
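As a minimal usage sketch, the parameters above might be combined as follows. The helper names `check_preprocess` and `load_ogb_mag` and the path `data/OGB_MAG` are hypothetical and not part of the library; the early check on `preprocess` is an added convenience, not documented behavior. The first call downloads and processes the dataset, which requires `torch_geometric` to be installed.

```python
from typing import Callable, Optional

# The two structural-feature options named in the parameter description.
_PREPROCESS_OPTIONS = ("metapath2vec", "transe")


def check_preprocess(preprocess: Optional[str]) -> None:
    """Hypothetical helper: reject unknown `preprocess` values early."""
    if preprocess is not None and preprocess.lower() not in _PREPROCESS_OPTIONS:
        raise ValueError(
            f"preprocess must be None, 'metapath2vec' or 'TransE', got {preprocess!r}"
        )


def load_ogb_mag(
    root: str,
    preprocess: Optional[str] = None,
    transform: Optional[Callable] = None,
    pre_transform: Optional[Callable] = None,
    force_reload: bool = False,
):
    """Hypothetical wrapper: build the dataset and return its single graph."""
    check_preprocess(preprocess)

    # Imported lazily so the check above works without torch_geometric.
    from torch_geometric.datasets import OGB_MAG

    dataset = OGB_MAG(
        root=root,
        preprocess=preprocess,
        transform=transform,
        pre_transform=pre_transform,
        force_reload=force_reload,
    )
    # The dataset holds one heterogeneous graph (a HeteroData object).
    return dataset[0]
```

For example, `data = load_ogb_mag("data/OGB_MAG", preprocess="metapath2vec")` would return the graph with structural features added to the otherwise featureless node types; printing `data` lists the stored node and edge types.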