torch_geometric.datasets¶

`KarateClub`	Zachary’s karate club network from the “An Information Flow Model for Conflict and Fission in Small Groups” paper, containing 34 nodes, connected by 156 (undirected and unweighted) edges.
`TUDataset`	A variety of graph kernel benchmark datasets, e.g., “IMDB-BINARY”, “REDDIT-BINARY” or “PROTEINS”, collected from the TU Dortmund University.
`GNNBenchmarkDataset`	A variety of artificially and semi-artificially generated graph datasets from the “Benchmarking Graph Neural Networks” paper.
`Planetoid`	The citation network datasets “Cora”, “CiteSeer” and “PubMed” from the “Revisiting Semi-Supervised Learning with Graph Embeddings” paper.
`NELL`	The NELL dataset, a knowledge graph from the “Toward an Architecture for Never-Ending Language Learning” paper.
`CitationFull`	The full citation network datasets from the “Deep Gaussian Embedding of Graphs: Unsupervised Inductive Learning via Ranking” paper.
`CoraFull`	Alias for `torch_geometric.datasets.CitationFull` with `name="cora"`.
`Coauthor`	The Coauthor CS and Coauthor Physics networks from the “Pitfalls of Graph Neural Network Evaluation” paper.
`Amazon`	The Amazon Computers and Amazon Photo networks from the “Pitfalls of Graph Neural Network Evaluation” paper.
`PPI`	The protein-protein interaction networks from the “Predicting Multicellular Function through Multi-layer Tissue Networks” paper, containing positional gene sets, motif gene sets and immunological signatures as features (50 in total) and gene ontology sets as labels (121 in total).
`Reddit`	The Reddit dataset from the “Inductive Representation Learning on Large Graphs” paper, containing Reddit posts belonging to different communities.
`Reddit2`	The Reddit dataset from the “GraphSAINT: Graph Sampling Based Inductive Learning Method” paper, containing Reddit posts belonging to different communities.
`Flickr`	The Flickr dataset from the “GraphSAINT: Graph Sampling Based Inductive Learning Method” paper, containing descriptions and common properties of images.
`Yelp`	The Yelp dataset from the “GraphSAINT: Graph Sampling Based Inductive Learning Method” paper, containing customer reviewers and their friendship.
`AmazonProducts`	The Amazon dataset from the “GraphSAINT: Graph Sampling Based Inductive Learning Method” paper, containing products and their categories.
`QM7b`	The QM7b dataset from the “MoleculeNet: A Benchmark for Molecular Machine Learning” paper, consisting of 7,211 molecules with 14 regression targets.
`QM9`	The QM9 dataset from the “MoleculeNet: A Benchmark for Molecular Machine Learning” paper, consisting of about 130,000 molecules with 19 regression targets.
`ZINC`	The ZINC dataset from the ZINC database and the “Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules” paper, containing about 250,000 molecular graphs with up to 38 heavy atoms.
`MoleculeNet`	The MoleculeNet benchmark collection from the “MoleculeNet: A Benchmark for Molecular Machine Learning” paper, containing datasets from physical chemistry, biophysics and physiology.
`Entities`	The relational entities networks “AIFB”, “MUTAG”, “BGS” and “AM” from the “Modeling Relational Data with Graph Convolutional Networks” paper.
`GEDDataset`	The GED datasets from the “Graph Edit Distance Computation via Graph Neural Networks” paper.
`MNISTSuperpixels`	MNIST superpixels dataset from the “Geometric Deep Learning on Graphs and Manifolds Using Mixture Model CNNs” paper, containing 70,000 graphs with 75 nodes each.
`FAUST`	The FAUST humans dataset from the “FAUST: Dataset and Evaluation for 3D Mesh Registration” paper, containing 100 watertight meshes representing 10 different poses for 10 different subjects.
`DynamicFAUST`	The dynamic FAUST humans dataset from the “Dynamic FAUST: Registering Human Bodies in Motion” paper.
`ShapeNet`	The ShapeNet part level segmentation dataset from the “A Scalable Active Framework for Region Annotation in 3D Shape Collections” paper, containing about 17,000 3D shape point clouds from 16 shape categories.
`ModelNet`	The ModelNet10/40 datasets from the “3D ShapeNets: A Deep Representation for Volumetric Shapes” paper, containing CAD models of 10 and 40 categories, respectively.
`CoMA`	The CoMA 3D faces dataset from the “Generating 3D faces using Convolutional Mesh Autoencoders” paper, containing 20,466 meshes of extreme expressions captured over 12 different subjects.
`SHREC2016`	The SHREC 2016 partial matching dataset from the “SHREC’16: Partial Matching of Deformable Shapes” paper.
`TOSCA`	The TOSCA dataset from the “Numerical Geometry of Non-Rigid Shapes” book, containing 80 meshes.
`PCPNetDataset`	The PCPNet dataset from the “PCPNet: Learning Local Shape Properties from Raw Point Clouds” paper, consisting of 30 shapes, each given as a point cloud, densely sampled with 100k points.
`S3DIS`	The (pre-processed) Stanford Large-Scale 3D Indoor Spaces dataset from the “3D Semantic Parsing of Large-Scale Indoor Spaces” paper, containing point clouds of six large-scale indoor parts in three buildings with 12 semantic elements (and one clutter class).
`GeometricShapes`	Synthetic dataset of various geometric shapes like cubes, spheres or pyramids.
`BitcoinOTC`	The Bitcoin-OTC dataset from the “EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs” paper, consisting of 138 who-trusts-whom networks of sequential time steps.
`ICEWS18`	The Integrated Crisis Early Warning System (ICEWS) dataset used in, e.g., the “Recurrent Event Network for Reasoning over Temporal Knowledge Graphs” paper, consisting of events collected from 1/1/2018 to 10/31/2018 (24-hour time granularity).
`GDELT`	The Global Database of Events, Language, and Tone (GDELT) dataset used in, e.g., the “Recurrent Event Network for Reasoning over Temporal Knowledge Graphs” paper, consisting of events collected from 1/1/2018 to 1/31/2018 (15-minute time granularity).
`DBP15K`	The DBP15K dataset from the “Cross-lingual Entity Alignment via Joint Attribute-Preserving Embedding” paper, where Chinese, Japanese and French versions of DBpedia were linked to its English version.
`WILLOWObjectClass`	The WILLOW-ObjectClass dataset from the “Learning Graphs to Match” paper, containing 10 equal keypoints of at least 40 images in each category.
`PascalVOCKeypoints`	The Pascal VOC 2011 dataset with Berkeley annotations of keypoints from the “Poselets: Body Part Detectors Trained Using 3D Human Pose Annotations” paper, containing 0 to 23 keypoints per example over 20 categories.
`PascalPF`	The Pascal-PF dataset from the “Proposal Flow” paper, containing 4 to 16 keypoints per example over 20 categories.
`SNAPDataset`	A variety of graph datasets collected from SNAP at Stanford University.
`SuiteSparseMatrixCollection`	A suite of sparse matrix benchmarks known as the Suite Sparse Matrix Collection collected from a wide range of applications.
`TrackMLParticleTrackingDataset`	The TrackML Particle Tracking Challenge dataset to reconstruct particle tracks from 3D points left in the silicon detectors.
`AMiner`	The heterogeneous AMiner dataset from the “metapath2vec: Scalable Representation Learning for Heterogeneous Networks” paper, consisting of nodes of type `"paper"`, `"author"` and `"venue"`.
`WordNet18`	The WordNet18 dataset from the “Translating Embeddings for Modeling Multi-Relational Data” paper, containing 40,943 entities, 18 relations and 151,442 fact triplets, e.g., furniture includes bed.
`WordNet18RR`	The WordNet18RR dataset from the “Convolutional 2D Knowledge Graph Embeddings” paper, containing 40,943 entities, 11 relations and 93,003 fact triplets.
`WikiCS`	The semi-supervised Wikipedia-based dataset from the “Wiki-CS: A Wikipedia-Based Benchmark for Graph Neural Networks” paper, containing 11,701 nodes, 216,123 edges, 10 classes and 20 different training splits.
`WebKB`	The WebKB datasets used in the “Geom-GCN: Geometric Graph Convolutional Networks” paper.
`WikipediaNetwork`	The Wikipedia networks used in the “Geom-GCN: Geometric Graph Convolutional Networks” paper.
`Actor`	The actor-only induced subgraph of the film-director-actor-writer network used in the “Geom-GCN: Geometric Graph Convolutional Networks” paper.
`JODIEDataset`
`MixHopSyntheticDataset`	The MixHop synthetic dataset from the “MixHop: Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing” paper, containing 10 graphs, each with varying degree of homophily (ranging from 0.0 to 0.9).
`UPFD`	The tree-structured fake news propagation graph classification dataset from the “User Preference-aware Fake News Detection” paper.

class AMiner(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]¶

The heterogeneous AMiner dataset from the “metapath2vec: Scalable Representation Learning for Heterogeneous Networks” paper, consisting of nodes of type "paper", "author" and "venue". Venue categories and author research interests are available as ground truth labels for a subset of nodes.

Parameters

root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
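
The difference between transform and pre_transform, which recurs for every class on this page, is easiest to see with a toy stand-in. The sketch below is not PyG code: `ToyDataset` and its in-memory `_stored` list (standing in for the on-disk cache) are hypothetical, and only the timing of the two callables follows the parameter descriptions above.

```python
from typing import Any, Callable, List, Optional


class ToyDataset:
    """Hypothetical stand-in that mimics when the two callables run."""

    def __init__(self, raw: List[Any],
                 transform: Optional[Callable] = None,
                 pre_transform: Optional[Callable] = None):
        self.transform = transform
        # pre_transform runs once per item, before the item is "saved"
        # (a plain list here; a real dataset would write to disk).
        if pre_transform is not None:
            raw = [pre_transform(item) for item in raw]
        self._stored = list(raw)

    def __getitem__(self, idx: int) -> Any:
        item = self._stored[idx]
        # transform runs again on every access and never alters the cache.
        return item if self.transform is None else self.transform(item)


calls = {"pre": 0, "post": 0}


def add_one(x):
    calls["pre"] += 1
    return x + 1


def double(x):
    calls["post"] += 1
    return x * 2


ds = ToyDataset([1, 2, 3], transform=double, pre_transform=add_one)
first = ds[0]   # (1 + 1) * 2
again = ds[0]   # same value, but `double` ran a second time
```

Here `add_one` runs three times in total (once per stored item), while `double` runs once per access. Expensive, deterministic processing therefore fits pre_transform, and cheap or randomized processing fits transform.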

class Actor(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]¶

The actor-only induced subgraph of the film-director-actor-writer network used in the “Geom-GCN: Geometric Graph Convolutional Networks” paper. Each node corresponds to an actor, and the edge between two nodes denotes co-occurrence on the same Wikipedia page. Node features correspond to some keywords in the Wikipedia pages. The task is to classify the nodes into five categories based on the words of the actor’s Wikipedia page.

Parameters

root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class Amazon(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]¶

The Amazon Computers and Amazon Photo networks from the “Pitfalls of Graph Neural Network Evaluation” paper. Nodes represent goods and edges represent that two goods are frequently bought together. Given product reviews as bag-of-words node features, the task is to map goods to their respective product category.

Parameters

root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset ("Computers", "Photo").
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
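
A minimal sketch of constructing this dataset, assuming PyG is installed and the download succeeds. The wrapper `load_amazon` and the constant `AMAZON_NAMES` are hypothetical helpers introduced for illustration. The exact-match name check is stricter than anything the documentation above promises, and the import is deferred so that the check itself runs without PyG.

```python
from typing import Callable, Optional

# Names listed in the parameter description above.
AMAZON_NAMES = ("Computers", "Photo")


def load_amazon(root: str, name: str,
                transform: Optional[Callable] = None,
                pre_transform: Optional[Callable] = None):
    """Validate `name`, then build the dataset (downloads on first use)."""
    if name not in AMAZON_NAMES:
        raise ValueError(f"name must be one of {AMAZON_NAMES}, got {name!r}")
    # Imported lazily so that the name check works without PyG installed.
    from torch_geometric.datasets import Amazon
    return Amazon(root, name, transform=transform,
                  pre_transform=pre_transform)
```

A call such as `load_amazon("data/Amazon", "Photo")` would then return the dataset object. The path `data/Amazon` is only an example root directory.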

class AmazonProducts(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]¶

The Amazon dataset from the “GraphSAINT: Graph Sampling Based Inductive Learning Method” paper, containing products and their categories.

Parameters

root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class BitcoinOTC(root: str, edge_window_size: int = 10, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]¶

The Bitcoin-OTC dataset from the “EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs” paper, consisting of 138 who-trusts-whom networks of sequential time steps.

Parameters

root (string) – Root directory where the dataset should be saved.
edge_window_size (int, optional) – The window size for the existence of an edge in the graph sequence since its initial creation. (default: 10)
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
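
The edge_window_size description can be read as "an edge stays in the graph sequence for that many time steps, starting with the step in which it was created". The toy function below sketches that reading in plain Python. It is an assumption about the semantics, not the library's implementation, and `snapshots` and the `(src, dst, created_at)` edge format are hypothetical.

```python
from typing import List, Tuple

Edge = Tuple[int, int]


def snapshots(edges: List[Tuple[int, int, int]], num_steps: int,
              edge_window_size: int = 10) -> List[List[Edge]]:
    """Group (src, dst, created_at) edges into one edge list per time step.

    An edge created at step t is kept for steps t, t + 1, ...,
    t + edge_window_size - 1 and dropped afterwards.
    """
    result: List[List[Edge]] = []
    for step in range(num_steps):
        alive = [(u, v) for (u, v, t) in edges
                 if t <= step < t + edge_window_size]
        result.append(alive)
    return result


toy_edges = [(0, 1, 0), (1, 2, 1), (2, 0, 3)]
steps = snapshots(toy_edges, num_steps=5, edge_window_size=2)
# steps[1] holds both (0, 1) and (1, 2); by step 2 the edge (0, 1) has expired.
```

Under this reading, a larger window yields denser snapshots, because older edges linger longer.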

class CitationFull(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]¶

The full citation network datasets from the “Deep Gaussian Embedding of Graphs: Unsupervised Inductive Learning via Ranking” paper. Nodes represent documents and edges represent citation links. Datasets include citeseer, cora, cora_ml, dblp, pubmed.

Parameters

root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset ("Cora", "Cora_ML", "CiteSeer", "DBLP", "PubMed").
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class CoMA(root: str, train: bool = True, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]¶

The CoMA 3D faces dataset from the “Generating 3D faces using Convolutional Mesh Autoencoders” paper, containing 20,466 meshes of extreme expressions captured over 12 different subjects.

Note

Data objects hold mesh faces instead of edge indices. To convert the mesh to a graph, use the torch_geometric.transforms.FaceToEdge as pre_transform. To convert the mesh to a point cloud, use the torch_geometric.transforms.SamplePoints as transform to sample a fixed number of points on the mesh faces according to their face area.

Parameters

root (string) – Root directory where the dataset should be saved.
train (bool, optional) – If True, loads the training dataset, otherwise the test dataset. (default: True)
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
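
The note above says that data objects hold mesh faces rather than edge indices. What "converting the mesh to a graph" means can be sketched without PyG: every triangle contributes its three sides as edges, stored in both directions, and sides shared between triangles are kept only once. The function name `faces_to_edges` and the tuple-based mesh format are hypothetical; the library transform operates on tensors instead.

```python
from typing import List, Set, Tuple


def faces_to_edges(faces: List[Tuple[int, int, int]]) -> List[Tuple[int, int]]:
    """Return the sorted, de-duplicated directed edges of a triangle mesh."""
    edges: Set[Tuple[int, int]] = set()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            edges.add((u, v))
            edges.add((v, u))  # store both directions (undirected graph)
    return sorted(edges)


# Two triangles sharing the side between vertices 1 and 2.
mesh_faces = [(0, 1, 2), (1, 3, 2)]
edges = faces_to_edges(mesh_faces)
# 5 distinct sides, each stored in both directions.
```

With PyG installed, the corresponding step would be to pass `torch_geometric.transforms.FaceToEdge()` as pre_transform when constructing the dataset, so the conversion runs once before the data is saved to disk.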

class Coauthor(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]¶

The Coauthor CS and Coauthor Physics networks from the “Pitfalls of Graph Neural Network Evaluation” paper. Nodes represent authors that are connected by an edge if they co-authored a paper. Given paper keywords for each author’s papers, the task is to map authors to their respective field of study.

Parameters

root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset ("CS", "Physics").
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class CoraFull(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]¶

Alias for torch_geometric.datasets.CitationFull with name="cora".

class DBP15K(root: str, pair: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]¶

The DBP15K dataset from the “Cross-lingual Entity Alignment via Joint Attribute-Preserving Embedding” paper, where Chinese, Japanese and French versions of DBpedia were linked to its English version. Node features are given by pre-trained and aligned monolingual word embeddings from the “Cross-lingual Knowledge Graph Alignment via Graph Matching Neural Network” paper.

Parameters

root (string) – Root directory where the dataset should be saved.
pair (string) – The pair of languages ("en_zh", "en_fr", "en_ja", "zh_en", "fr_en", "ja_en").
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class DynamicFAUST(root: str, subjects: Optional[List[str]] = None, categories: Optional[List[str]] = None, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]¶

The dynamic FAUST humans dataset from the “Dynamic FAUST: Registering Human Bodies in Motion” paper.

Note

Data objects hold mesh faces instead of edge indices. To convert the mesh to a graph, use the torch_geometric.transforms.FaceToEdge as pre_transform. To convert the mesh to a point cloud, use the torch_geometric.transforms.SamplePoints as transform to sample a fixed number of points on the mesh faces according to their face area.

Parameters

root (string) – Root directory where the dataset should be saved.
subjects (list, optional) – List of subjects to include in the dataset. Can include the subjects "50002", "50004", "50007", "50009", "50020", "50021", "50022", "50025", "50026", "50027". If set to None, the dataset will contain all subjects. (default: None)
categories (list, optional) – List of categories to include in the dataset. Can include the categories "chicken_wings", "hips", "jiggle_on_toes", "jumping_jacks", "knees", "light_hopping_loose", "light_hopping_stiff", "one_leg_jump", "one_leg_loose", "personal_move", "punching", "running_on_spot", "running_on_spot_bugfix", "shake_arms", "shake_hips", "shoulders". If set to None, the dataset will contain all categories. (default: None)
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class Entities(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]¶

The relational entities networks “AIFB”, “MUTAG”, “BGS” and “AM” from the “Modeling Relational Data with Graph Convolutional Networks” paper. Training and test splits are given by node indices.

Parameters

root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset ("AIFB", "MUTAG", "BGS", "AM").
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class FAUST(root: str, train: bool = True, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]¶

The FAUST humans dataset from the “FAUST: Dataset and Evaluation for 3D Mesh Registration” paper, containing 100 watertight meshes representing 10 different poses for 10 different subjects.

Note

Data objects hold mesh faces instead of edge indices. To convert the mesh to a graph, use the torch_geometric.transforms.FaceToEdge as pre_transform. To convert the mesh to a point cloud, use the torch_geometric.transforms.SamplePoints as transform to sample a fixed number of points on the mesh faces according to their face area.

Parameters

root (string) – Root directory where the dataset should be saved.
train (bool, optional) – If True, loads the training dataset, otherwise the test dataset. (default: True)
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
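
The note for this class mentions sampling a fixed number of points on the mesh faces according to their face area. The following plain-Python sketch illustrates that idea under stated assumptions: `triangle_area` and `sample_points` are hypothetical names, the mesh is given as coordinate tuples rather than tensors, and the uniform-in-triangle step is a standard technique rather than a copy of the library's code.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def triangle_area(a: Point, b: Point, c: Point) -> float:
    """Half the norm of the cross product (b - a) x (c - a)."""
    ux, uy, uz = (b[i] - a[i] for i in range(3))
    vx, vy, vz = (c[i] - a[i] for i in range(3))
    cx, cy, cz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    return 0.5 * (cx * cx + cy * cy + cz * cz) ** 0.5


def sample_points(pos: Sequence[Point],
                  faces: Sequence[Tuple[int, int, int]],
                  num: int, seed: int = 0) -> List[Point]:
    """Draw `num` points; faces are picked in proportion to their area."""
    rng = random.Random(seed)
    areas = [triangle_area(pos[i], pos[j], pos[k]) for i, j, k in faces]
    chosen = rng.choices(list(faces), weights=areas, k=num)
    points: List[Point] = []
    for i, j, k in chosen:
        r1, r2 = rng.random(), rng.random()
        if r1 + r2 > 1.0:  # reflect so the point stays inside the triangle
            r1, r2 = 1.0 - r1, 1.0 - r2
        a, b, c = pos[i], pos[j], pos[k]
        points.append(tuple(a[d] + r1 * (b[d] - a[d]) + r2 * (c[d] - a[d])
                            for d in range(3)))
    return points


# Unit square in the z = 0 plane, split into two triangles.
square_pos = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
square_faces = [(0, 1, 2), (0, 2, 3)]
cloud = sample_points(square_pos, square_faces, num=100)
```

Because the sampling is random, it belongs in transform (re-run on every access) rather than pre_transform if a fresh point cloud per access is wanted.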

class Flickr(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]¶

The Flickr dataset from the “GraphSAINT: Graph Sampling Based Inductive Learning Method” paper, containing descriptions and common properties of images.

Parameters

root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class GDELT(root: str, split: str = 'train', transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]¶

The Global Database of Events, Language, and Tone (GDELT) dataset used in, e.g., the “Recurrent Event Network for Reasoning over Temporal Knowledge Graphs” paper, consisting of events collected from 1/1/2018 to 1/31/2018 (15-minute time granularity).

Parameters

root (string) – Root directory where the dataset should be saved.
split (string, optional) – If "train", loads the training dataset. If "val", loads the validation dataset. If "test", loads the test dataset. (default: "train")
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class GEDDataset(root: str, name: str, train: bool = True, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]¶

The GED datasets from the “Graph Edit Distance Computation via Graph Neural Networks” paper. GEDs can be accessed via the global attributes ged and norm_ged for all train/train graph pairs and all train/test graph pairs:

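from torch_geometric.datasets import GEDDataset

root = "data/GED"  # example path (an assumption); any writable directory works
dataset = GEDDataset(root, name="LINUX")
data1, data2 = dataset[0], dataset[1]
ged = dataset.ged[data1.i, data2.i]  # GED between `data1` and `data2`.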

Note

ALKANE is missing GEDs for train/test graph pairs since they are not provided in the official datasets.

Parameters

root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset (one of "AIDS700nef", "LINUX", "ALKANE", "IMDBMulti").
train (bool, optional) – If True, loads the training dataset, otherwise the test dataset. (default: True)
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class GNNBenchmarkDataset(root: str, name: str, split: str = 'train', transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]¶

A variety of artificially and semi-artificially generated graph datasets from the “Benchmarking Graph Neural Networks” paper.

Note

The ZINC dataset is provided via torch_geometric.datasets.ZINC.

Parameters

root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset (one of "PATTERN", "CLUSTER", "MNIST", "CIFAR10", "TSP", "CSL").
split (string, optional) – If "train", loads the training dataset. If "val", loads the validation dataset. If "test", loads the test dataset. (default: "train")
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
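
Since split selects one partition per object, loading all three partitions takes three constructor calls. A minimal sketch, assuming PyG is installed: `load_splits`, `GNN_BENCHMARK_NAMES` and `SPLITS` are hypothetical helpers, and whether every listed name ships all three predefined splits is not stated above, so treat that as an assumption too.

```python
from typing import Dict

# Names and split values listed in the parameter descriptions above.
GNN_BENCHMARK_NAMES = ("PATTERN", "CLUSTER", "MNIST", "CIFAR10", "TSP", "CSL")
SPLITS = ("train", "val", "test")


def load_splits(root: str, name: str) -> Dict[str, object]:
    """Return {"train": ..., "val": ..., "test": ...} for one benchmark."""
    if name not in GNN_BENCHMARK_NAMES:
        raise ValueError(
            f"name must be one of {GNN_BENCHMARK_NAMES}, got {name!r}")
    # Imported lazily so that the name check works without PyG installed.
    from torch_geometric.datasets import GNNBenchmarkDataset
    return {split: GNNBenchmarkDataset(root, name, split=split)
            for split in SPLITS}
```

Passing "ZINC" raises the `ValueError` here, in line with the note that ZINC is provided through its own class.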

class GeometricShapes(root: str, train: bool = True, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]¶

Synthetic dataset of various geometric shapes like cubes, spheres or pyramids.

Note

Data objects hold mesh faces instead of edge indices. To convert the mesh to a graph, use the torch_geometric.transforms.FaceToEdge as pre_transform. To convert the mesh to a point cloud, use the torch_geometric.transforms.SamplePoints as transform to sample a fixed number of points on the mesh faces according to their face area.

Parameters

root (string) – Root directory where the dataset should be saved.
train (bool, optional) – If True, loads the training dataset, otherwise the test dataset. (default: True)
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class ICEWS18(root, split='train', transform=None, pre_transform=None, pre_filter=None)[source]¶

The Integrated Crisis Early Warning System (ICEWS) dataset used in, e.g., the “Recurrent Event Network for Reasoning over Temporal Knowledge Graphs” paper, consisting of events collected from 1/1/2018 to 10/31/2018 (24-hour time granularity).

Parameters

root (string) – Root directory where the dataset should be saved.
split (string, optional) – If "train", loads the training dataset. If "val", loads the validation dataset. If "test", loads the test dataset. (default: "train")
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
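
The date ranges and time granularities quoted for the two temporal knowledge graph datasets (GDELT and ICEWS18) imply an upper bound on the number of distinct time steps. The arithmetic below works that bound out. The helper `num_time_steps` is hypothetical, and the files themselves may contain fewer distinct timestamps than the bound.

```python
from datetime import datetime, timedelta


def num_time_steps(start: datetime, last_day: datetime,
                   granularity: timedelta) -> int:
    """Count granularity-sized steps from `start` through the end of `last_day`."""
    span = (last_day + timedelta(days=1)) - start
    return span // granularity


# ICEWS18: 1/1/2018 to 10/31/2018 at a 24-hour granularity.
icews18_steps = num_time_steps(datetime(2018, 1, 1), datetime(2018, 10, 31),
                               timedelta(hours=24))   # 304 daily steps

# GDELT: 1/1/2018 to 1/31/2018 at a 15-minute granularity.
gdelt_steps = num_time_steps(datetime(2018, 1, 1), datetime(2018, 1, 31),
                             timedelta(minutes=15))   # 31 * 96 = 2976 steps
```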

class JODIEDataset(root, name, transform=None, pre_transform=None)[source]¶

class KarateClub(transform=None)[source]¶

Zachary’s karate club network from the “An Information Flow Model for Conflict and Fission in Small Groups” paper, containing 34 nodes, connected by 156 (undirected and unweighted) edges. Every node is labeled by one of four classes obtained via modularity-based clustering, following the “Semi-supervised Classification with Graph Convolutional Networks” paper. Training is based on a single labeled example per class, i.e. a total number of 4 labeled nodes.

Parameters

transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

class MNISTSuperpixels(root, train=True, transform=None, pre_transform=None, pre_filter=None)[source]¶

MNIST superpixels dataset from the “Geometric Deep Learning on Graphs and Manifolds Using Mixture Model CNNs” paper, containing 70,000 graphs with 75 nodes each. Every graph is labeled by one of 10 classes.

Parameters

root (string) – Root directory where the dataset should be saved.
train (bool, optional) – If True, loads the training dataset, otherwise the test dataset. (default: True)
transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
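
The three callables above differ mainly in when they run. The following is a minimal, library-free sketch of that lifecycle. `ToyDataset`, its dict-based items, and the helper functions are hypothetical stand-ins for the real dataset classes and `torch_geometric.data.Data`. The relative order of `pre_filter` and `pre_transform` is an assumption, since the descriptions above do not specify it.

```python
import copy


class ToyDataset:
    """Hypothetical stand-in that mimics when each callable runs."""

    def __init__(self, raw_items, transform=None, pre_transform=None,
                 pre_filter=None):
        self.transform = transform
        items = list(raw_items)
        # Assumed order: filter first, then pre-transform the survivors.
        if pre_filter is not None:
            items = [item for item in items if pre_filter(item)]
        if pre_transform is not None:
            items = [pre_transform(item) for item in items]
        # In the real classes this result would be saved to disk once.
        self._processed = items

    def __len__(self):
        return len(self._processed)

    def __getitem__(self, idx):
        # Copy so that the per-access transform never mutates stored data.
        item = copy.deepcopy(self._processed[idx])
        return item if self.transform is None else self.transform(item)


def mark_non_empty(item):
    # Runs once, at processing time.
    item["non_empty"] = item["num_nodes"] > 0
    return item


def scale_label(item):
    # Runs on every access.
    item["y"] = item["y"] * 10
    return item


raw = [{"num_nodes": n, "y": n % 2} for n in (0, 3, 5, 8)]
dataset = ToyDataset(
    raw,
    transform=scale_label,
    pre_transform=mark_non_empty,
    pre_filter=lambda item: item["num_nodes"] >= 3,
)
```

Accessing `dataset[0]` repeatedly yields the same scaled label each time, because `transform` is applied to a fresh copy rather than to the stored item.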

class MixHopSyntheticDataset(root, homophily, transform=None, pre_transform=None)[source]¶

The MixHop synthetic dataset from the “MixHop: Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing” paper, containing 10 graphs, each with a different degree of homophily (ranging from 0.0 to 0.9). All graphs have 5,000 nodes, where each node corresponds to 1 out of 10 classes. The feature values of the nodes are sampled from a 2D Gaussian distribution, which is distinct for each class.

Parameters

root (string) – Root directory where the dataset should be saved.
homophily (float) – The degree of homophily (one of 0.0, 0.1, …, 0.9).
transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
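
One common way to quantify homophily is the fraction of edges whose endpoints share a class label. Whether this dataset uses exactly that definition is an assumption here; the sketch below only illustrates the idea, and `edge_homophily` is a hypothetical helper, not part of the library.

```python
def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints share a class label."""
    if not edges:
        raise ValueError("homophily is undefined for a graph without edges")
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


# A 4-cycle in which two of the four edges connect same-class nodes.
labels = [0, 0, 1, 1]
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
ratio = edge_homophily(edges, labels)
```

Under this definition, a value near 0.0 means that neighbors mostly belong to different classes, and a value near 0.9 means that they mostly belong to the same class.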

class ModelNet(root, name='10', train=True, transform=None, pre_transform=None, pre_filter=None)[source]¶

The ModelNet10/40 datasets from the “3D ShapeNets: A Deep Representation for Volumetric Shapes” paper, containing CAD models of 10 and 40 categories, respectively.

Note

Data objects hold mesh faces instead of edge indices. To convert the mesh to a graph, use the torch_geometric.transforms.FaceToEdge as pre_transform. To convert the mesh to a point cloud, use the torch_geometric.transforms.SamplePoints as transform to sample a fixed number of points on the mesh faces according to their face area.
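
The idea behind turning mesh faces into graph edges can be illustrated without the library. The sketch below is not the implementation of `FaceToEdge`; `faces_to_edges` is a hypothetical helper that works on plain tuples instead of tensors.

```python
def faces_to_edges(faces):
    """Turn triangles (i, j, k) into a sorted list of directed edge pairs.

    Every triangle side is emitted in both directions, so the result
    describes an undirected graph.
    """
    edges = set()
    for i, j, k in faces:
        for u, v in ((i, j), (j, k), (k, i)):
            edges.add((u, v))
            edges.add((v, u))
    return sorted(edges)


# Two triangles sharing the side between vertices 1 and 2.
faces = [(0, 1, 2), (1, 3, 2)]
edges = faces_to_edges(faces)
```

Using a set means that a side shared by two triangles appears only once per direction.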

Parameters

root (string) – Root directory where the dataset should be saved.
name (string, optional) – The name of the dataset ("10" for ModelNet10, "40" for ModelNet40). (default: "10")
train (bool, optional) – If True, loads the training dataset, otherwise the test dataset. (default: True)
transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class MoleculeNet(root, name, transform=None, pre_transform=None, pre_filter=None)[source]¶

The MoleculeNet benchmark collection from the “MoleculeNet: A Benchmark for Molecular Machine Learning” paper, containing datasets from physical chemistry, biophysics and physiology. All datasets come with the additional node and edge features introduced by the Open Graph Benchmark.

Parameters

root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset ("ESOL", "FreeSolv", "Lipo", "PCBA", "MUV", "HIV", "BACE", "BBPB", "Tox21", "ToxCast", "SIDER", "ClinTox").
transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class NELL(root, transform=None, pre_transform=None)[source]¶

The NELL dataset, a knowledge graph from the “Toward an Architecture for Never-Ending Language Learning” paper. The dataset is processed as in the “Revisiting Semi-Supervised Learning with Graph Embeddings” paper.

Note

Entity nodes are described by sparse feature vectors of type torch_sparse.SparseTensor, which can be either used directly, or can be converted via data.x.to_dense(), data.x.to_scipy() or data.x.to_torch_sparse_coo_tensor().

Parameters

root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class PCPNetDataset(root, category, split='train', transform=None, pre_transform=None, pre_filter=None)[source]¶

The PCPNet dataset from the “PCPNet: Learning Local Shape Properties from Raw Point Clouds” paper, consisting of 30 shapes, each given as a point cloud, densely sampled with 100k points. For each shape, surface normals and local curvatures are given as node features.

Parameters

root (string) – Root directory where the dataset should be saved.
category (string) – The training set category (one of "NoNoise", "Noisy", "VarDensity", "NoisyAndVarDensity" for split="train" or split="val", or one of "All", "LowNoise", "MedNoise", "HighNoise", "VarDensityStriped", "VarDensityGradient" for split="test").
split (string) – If "train", loads the training dataset. If "val", loads the validation dataset. If "test", loads the test dataset. (default: "train")
transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
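
Because the valid `category` values depend on `split`, it can help to check the combination before constructing the dataset. The sketch below encodes the lists from the `category` description; `check_category` is a hypothetical helper, not a library function.

```python
TRAIN_VAL_CATEGORIES = {"NoNoise", "Noisy", "VarDensity", "NoisyAndVarDensity"}
TEST_CATEGORIES = {
    "All", "LowNoise", "MedNoise", "HighNoise",
    "VarDensityStriped", "VarDensityGradient",
}


def check_category(category, split="train"):
    """Return `category` if it is valid for `split`, else raise ValueError."""
    if split in ("train", "val"):
        allowed = TRAIN_VAL_CATEGORIES
    elif split == "test":
        allowed = TEST_CATEGORIES
    else:
        raise ValueError(f"unknown split: {split!r}")
    if category not in allowed:
        raise ValueError(
            f"category {category!r} is not valid for split {split!r}; "
            f"expected one of {sorted(allowed)}"
        )
    return category
```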

class PPI(root, split='train', transform=None, pre_transform=None, pre_filter=None)[source]¶

The protein-protein interaction networks from the “Predicting Multicellular Function through Multi-layer Tissue Networks” paper, containing positional gene sets, motif gene sets and immunological signatures as features (50 in total) and gene ontology sets as labels (121 in total).

Parameters

root (string) – Root directory where the dataset should be saved.
split (string) – If "train", loads the training dataset. If "val", loads the validation dataset. If "test", loads the test dataset. (default: "train")
transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class PascalPF(root, category, transform=None, pre_transform=None, pre_filter=None)[source]¶

The Pascal-PF dataset from the “Proposal Flow” paper, containing 4 to 16 keypoints per example over 20 categories.

Parameters

root (string) – Root directory where the dataset should be saved.
category (string) – The category of the images (one of "Aeroplane", "Bicycle", "Bird", "Boat", "Bottle", "Bus", "Car", "Cat", "Chair", "Cow", "Diningtable", "Dog", "Horse", "Motorbike", "Person", "Pottedplant", "Sheep", "Sofa", "Train", "TVMonitor").
transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class PascalVOCKeypoints(root, category, train=True, transform=None, pre_transform=None, pre_filter=None)[source]¶

The Pascal VOC 2011 dataset with Berkeley annotations of keypoints from the “Poselets: Body Part Detectors Trained Using 3D Human Pose Annotations” paper, containing 0 to 23 keypoints per example over 20 categories. The dataset is pre-filtered to exclude difficult, occluded and truncated objects. The keypoints contain interpolated features from a pre-trained VGG16 model on ImageNet (relu4_2 and relu5_1).

Parameters

root (string) – Root directory where the dataset should be saved.
category (string) – The category of the images (one of "Aeroplane", "Bicycle", "Bird", "Boat", "Bottle", "Bus", "Car", "Cat", "Chair", "Cow", "Diningtable", "Dog", "Horse", "Motorbike", "Person", "Pottedplant", "Sheep", "Sofa", "Train", "TVMonitor").
train (bool, optional) – If True, loads the training dataset, otherwise the test dataset. (default: True)
transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class Planetoid(root: str, name: str, split: str = 'public', num_train_per_class: int = 20, num_val: int = 500, num_test: int = 1000, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]¶

The citation network datasets “Cora”, “CiteSeer” and “PubMed” from the “Revisiting Semi-Supervised Learning with Graph Embeddings” paper. Nodes represent documents and edges represent citation links. Training, validation and test splits are given by binary masks.

Parameters

root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset ("Cora", "CiteSeer", "PubMed").
split (string) – The type of dataset split ("public", "full", "random"). If set to "public", the split will be the public fixed split from the “Revisiting Semi-Supervised Learning with Graph Embeddings” paper. If set to "full", all nodes except those in the validation and test sets will be used for training (as in the “FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling” paper). If set to "random", train, validation, and test sets will be randomly generated, according to num_train_per_class, num_val and num_test. (default: "public")
num_train_per_class (int, optional) – The number of training samples per class in case of "random" split. (default: 20)
num_val (int, optional) – The number of validation samples in case of "random" split. (default: 500)
num_test (int, optional) – The number of test samples in case of "random" split. (default: 1000)
transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
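
The "random" split can be pictured as follows. This is a library-free sketch under stated assumptions, not the code used by `Planetoid`: `random_split_masks` and its `seed` argument are hypothetical, masks are plain Python lists rather than tensors, and validation and test nodes are assumed to be drawn from the nodes not chosen for training.

```python
import random


def random_split_masks(labels, num_train_per_class=20, num_val=500,
                       num_test=1000, seed=0):
    """Build disjoint boolean train/val/test masks over the nodes."""
    rng = random.Random(seed)
    num_nodes = len(labels)
    train_mask = [False] * num_nodes
    val_mask = [False] * num_nodes
    test_mask = [False] * num_nodes

    # Pick a fixed number of training nodes from every class.
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        count = min(num_train_per_class, len(members))
        for i in rng.sample(members, count):
            train_mask[i] = True

    # Draw validation and test nodes from what is left, without overlap.
    remaining = [i for i in range(num_nodes) if not train_mask[i]]
    rng.shuffle(remaining)
    for i in remaining[:num_val]:
        val_mask[i] = True
    for i in remaining[num_val:num_val + num_test]:
        test_mask[i] = True
    return train_mask, val_mask, test_mask


# Toy graph: 60 nodes spread evenly over 3 classes.
labels = [i % 3 for i in range(60)]
train_mask, val_mask, test_mask = random_split_masks(
    labels, num_train_per_class=5, num_val=10, num_test=20
)
```

Each mask has one entry per node, so a node belongs to a split exactly when its entry in the corresponding mask is `True`.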

class QM7b(root, transform=None, pre_transform=None, pre_filter=None)[source]¶

The QM7b dataset from the “MoleculeNet: A Benchmark for Molecular Machine Learning” paper, consisting of 7,211 molecules with 14 regression targets.

Parameters

root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class QM9(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]¶

The QM9 dataset from the “MoleculeNet: A Benchmark for Molecular Machine Learning” paper, consisting of about 130,000 molecules with 19 regression targets. Each molecule includes complete spatial information for the single low energy conformation of the atoms in the molecule. In addition, we provide the atom features from the “Neural Message Passing for Quantum Chemistry” paper.

Target	Property	Description	Unit
0	\(\mu\)	Dipole moment	\(\textrm{D}\)
1	\(\alpha\)	Isotropic polarizability	\({a_0}^3\)
2	\(\epsilon_{\textrm{HOMO}}\)	Highest occupied molecular orbital energy	\(\textrm{eV}\)
3	\(\epsilon_{\textrm{LUMO}}\)	Lowest unoccupied molecular orbital energy	\(\textrm{eV}\)
4	\(\Delta \epsilon\)	Gap between \(\epsilon_{\textrm{HOMO}}\) and \(\epsilon_{\textrm{LUMO}}\)	\(\textrm{eV}\)
5	\(\langle R^2 \rangle\)	Electronic spatial extent	\({a_0}^2\)
6	\(\textrm{ZPVE}\)	Zero point vibrational energy	\(\textrm{eV}\)
7	\(U_0\)	Internal energy at 0K	\(\textrm{eV}\)
8	\(U\)	Internal energy at 298.15K	\(\textrm{eV}\)
9	\(H\)	Enthalpy at 298.15K	\(\textrm{eV}\)
10	\(G\)	Free energy at 298.15K	\(\textrm{eV}\)
11	\(c_{\textrm{v}}\)	Heat capacity at 298.15K	\(\frac{\textrm{cal}}{\textrm{mol K}}\)
12	\(U_0^{\textrm{ATOM}}\)	Atomization energy at 0K	\(\textrm{eV}\)
13	\(U^{\textrm{ATOM}}\)	Atomization energy at 298.15K	\(\textrm{eV}\)
14	\(H^{\textrm{ATOM}}\)	Atomization enthalpy at 298.15K	\(\textrm{eV}\)
15	\(G^{\textrm{ATOM}}\)	Atomization free energy at 298.15K	\(\textrm{eV}\)
16	\(A\)	Rotational constant	\(\textrm{GHz}\)
17	\(B\)	Rotational constant	\(\textrm{GHz}\)
18	\(C\)	Rotational constant	\(\textrm{GHz}\)

Parameters

root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
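
Since every molecule carries all 19 targets, a per-access transform is a natural place to keep only the one of interest. The sketch below is illustrative only: `SelectTarget` is a hypothetical class, and the molecule is a `SimpleNamespace` whose `y` is a list of rows, standing in for a real data object with a tensor-valued `y`.

```python
from types import SimpleNamespace


class SelectTarget:
    """Hypothetical transform that keeps one regression target in `y`."""

    def __init__(self, target):
        # Target indices follow the table above (0 to 18).
        if not 0 <= target <= 18:
            raise ValueError("target must be between 0 and 18")
        self.target = target

    def __call__(self, data):
        # Build a new object so the input is left untouched.
        fields = dict(vars(data))
        fields["y"] = [row[self.target] for row in data.y]
        return SimpleNamespace(**fields)


# One fake molecule whose i-th target value is simply float(i).
molecule = SimpleNamespace(y=[[float(i) for i in range(19)]])
gap_only = SelectTarget(4)(molecule)
```

Index 4 corresponds to the HOMO-LUMO gap in the table above; any other row index can be chosen in the same way.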

class Reddit(root, transform=None, pre_transform=None)[source]¶

The Reddit dataset from the “Inductive Representation Learning on Large Graphs” paper, containing Reddit posts belonging to different communities.

Parameters

root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class Reddit2(root, transform=None, pre_transform=None)[source]¶

The Reddit dataset from the “GraphSAINT: Graph Sampling Based Inductive Learning Method” paper, containing Reddit posts belonging to different communities.

Note

This is a sparser version of the original Reddit dataset (~23M edges instead of ~114M edges), and is used in papers such as SGC and GraphSAINT.

Parameters

root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class S3DIS(root, test_area=6, train=True, transform=None, pre_transform=None, pre_filter=None)[source]¶

The (pre-processed) Stanford Large-Scale 3D Indoor Spaces dataset from the “3D Semantic Parsing of Large-Scale Indoor Spaces” paper, containing point clouds of six large-scale indoor parts in three buildings with 12 semantic elements (and one clutter class).

Parameters

root (string) – Root directory where the dataset should be saved.
test_area (int, optional) – Which area to use for testing (1-6). (default: 6)
train (bool, optional) – If True, loads the training dataset, otherwise the test dataset. (default: True)
transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
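
The interplay of `test_area` and `train` can be sketched as below. The assumption, not stated explicitly above, is that the training set is built from the five areas other than `test_area`; `split_areas` is a hypothetical helper.

```python
def split_areas(test_area=6):
    """Return (training areas, test areas) for the six indoor areas."""
    all_areas = range(1, 7)
    if test_area not in all_areas:
        raise ValueError("test_area must be an integer from 1 to 6")
    train_areas = [area for area in all_areas if area != test_area]
    return train_areas, [test_area]


train_areas, test_areas = split_areas(test_area=5)
```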

class SHREC2016(root, partiality, category, train=True, transform=None, pre_transform=None, pre_filter=None)[source]¶

The SHREC 2016 partial matching dataset from the “SHREC’16: Partial Matching of Deformable Shapes” paper. The reference shape can be referenced via dataset.ref.

Note

Data objects hold mesh faces instead of edge indices. To convert the mesh to a graph, use the torch_geometric.transforms.FaceToEdge as pre_transform. To convert the mesh to a point cloud, use the torch_geometric.transforms.SamplePoints as transform to sample a fixed number of points on the mesh faces according to their face area.

Parameters

root (string) – Root directory where the dataset should be saved.
partiality (string) – The partiality of the dataset (one of "Holes", "Cuts").
category (string) – The category of the dataset (one of "Cat", "Centaur", "David", "Dog", "Horse", "Michael", "Victoria", "Wolf").
train (bool, optional) – If True, loads the training dataset, otherwise the test dataset. (default: True)
transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class SNAPDataset(root, name, transform=None, pre_transform=None, pre_filter=None)[source]¶

A variety of graph datasets collected from SNAP at Stanford University.

Parameters

root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset.
transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class ShapeNet(root, categories=None, include_normals=True, split='trainval', transform=None, pre_transform=None, pre_filter=None)[source]¶

The ShapeNet part level segmentation dataset from the “A Scalable Active Framework for Region Annotation in 3D Shape Collections” paper, containing about 17,000 3D shape point clouds from 16 shape categories. Each category is annotated with 2 to 6 parts.

Parameters

root (string) – Root directory where the dataset should be saved.
categories (string or [string], optional) – The category of the CAD models (one or a combination of "Airplane", "Bag", "Cap", "Car", "Chair", "Earphone", "Guitar", "Knife", "Lamp", "Laptop", "Motorbike", "Mug", "Pistol", "Rocket", "Skateboard", "Table"). Can be explicitly set to None to load all categories. (default: None)
include_normals (bool, optional) – If set to False, will not include normal vectors as input features. (default: True)
split (string, optional) – If "train", loads the training dataset. If "val", loads the validation dataset. If "trainval", loads the training and validation dataset. If "test", loads the test dataset. (default: "trainval")
transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class SuiteSparseMatrixCollection(root, group, name, transform=None, pre_transform=None)[source]¶

A suite of sparse matrix benchmarks known as the Suite Sparse Matrix Collection collected from a wide range of applications.

Parameters

root (string) – Root directory where the dataset should be saved.
group (string) – The group of the sparse matrix.
name (string) – The name of the sparse matrix.
transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class TOSCA(root, categories=None, transform=None, pre_transform=None, pre_filter=None)[source]¶

The TOSCA dataset from the “Numerical Geometry of Non-Rigid Shapes” book, containing 80 meshes. Meshes within the same category have the same triangulation and an equal number of vertices numbered in a compatible way.

Note

Data objects hold mesh faces instead of edge indices. To convert the mesh to a graph, use the torch_geometric.transforms.FaceToEdge as pre_transform. To convert the mesh to a point cloud, use the torch_geometric.transforms.SamplePoints as transform to sample a fixed number of points on the mesh faces according to their face area.
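
Sampling points "according to their face area" means that larger triangles receive proportionally more points. The sketch below illustrates that idea with the standard library only. It is not the implementation of `SamplePoints`; `triangle_area`, `sample_points`, and the `seed` argument are hypothetical.

```python
import math
import random


def triangle_area(a, b, c):
    """Area of a 3D triangle: half the norm of the cross product."""
    ab = [b[i] - a[i] for i in range(3)]
    ac = [c[i] - a[i] for i in range(3)]
    cross = (
        ab[1] * ac[2] - ab[2] * ac[1],
        ab[2] * ac[0] - ab[0] * ac[2],
        ab[0] * ac[1] - ab[1] * ac[0],
    )
    return 0.5 * math.sqrt(sum(x * x for x in cross))


def sample_points(pos, faces, num, seed=0):
    """Sample `num` points on the mesh, choosing faces by area."""
    rng = random.Random(seed)
    areas = [triangle_area(*(pos[i] for i in face)) for face in faces]
    chosen = rng.choices(faces, weights=areas, k=num)
    points = []
    for i, j, k in chosen:
        # Uniform barycentric coordinates via the reflection trick.
        u, v = rng.random(), rng.random()
        if u + v > 1.0:
            u, v = 1.0 - u, 1.0 - v
        w = 1.0 - u - v
        points.append(tuple(
            w * pos[i][d] + u * pos[j][d] + v * pos[k][d] for d in range(3)
        ))
    return points


# Unit square in the z = 0 plane, split into two triangles.
pos = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
faces = [(0, 1, 2), (0, 2, 3)]
points = sample_points(pos, faces, num=100)
```

Applying such sampling as a `transform` rather than a `pre_transform` would give a different point cloud on every access, which is one reason the note recommends it as `transform`.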

Parameters

root (string) – Root directory where the dataset should be saved.
categories (list, optional) – List of categories to include in the dataset. Can include the categories "Cat", "Centaur", "David", "Dog", "Gorilla", "Horse", "Michael", "Victoria", "Wolf". If set to None, the dataset will contain all categories. (default: None)
transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class TUDataset(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None, use_node_attr: bool = False, use_edge_attr: bool = False, cleaned: bool = False)[source]¶

A variety of graph kernel benchmark datasets, e.g. “IMDB-BINARY”, “REDDIT-BINARY” or “PROTEINS”, collected from the TU Dortmund University. In addition, this dataset wrapper provides cleaned dataset versions as motivated by the “Understanding Isomorphism Bias in Graph Data Sets” paper, containing only non-isomorphic graphs.

Note

Some datasets may not come with any node labels. You can then either make use of the argument use_node_attr to load additional continuous node attributes (if present) or provide synthetic node features using transforms such as torch_geometric.transforms.Constant or torch_geometric.transforms.OneHotDegree.

Parameters

root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset.
transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
use_node_attr (bool, optional) – If True, the dataset will contain additional continuous node attributes (if present). (default: False)
use_edge_attr (bool, optional) – If True, the dataset will contain additional continuous edge attributes (if present). (default: False)
cleaned (bool, optional) – If True, the dataset will contain only non-isomorphic graphs. (default: False)
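
For graphs without node labels, a degree-based one-hot encoding is one simple kind of synthetic feature. The sketch below shows the idea on plain lists. It is not the implementation of `torch_geometric.transforms.OneHotDegree`; `one_hot_degree` is a hypothetical helper, and the edge list is assumed to contain both directions of every undirected edge.

```python
def one_hot_degree(edges, num_nodes, max_degree):
    """Return one feature row per node, one-hot encoding its degree."""
    degree = [0] * num_nodes
    for source, _ in edges:
        # Each undirected edge appears twice, once per direction,
        # so counting sources gives the node degree.
        degree[source] += 1

    features = []
    for d in degree:
        if d > max_degree:
            raise ValueError(f"degree {d} exceeds max_degree {max_degree}")
        row = [0.0] * (max_degree + 1)
        row[d] = 1.0
        features.append(row)
    return features


# Path graph 0 - 1 - 2, stored as directed pairs in both directions.
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
features = one_hot_degree(edges, num_nodes=3, max_degree=2)
```

The feature width is `max_degree + 1`, so `max_degree` has to be at least the largest degree in the dataset.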

class TrackMLParticleTrackingDataset(root, transform=None)[source]¶

The TrackML Particle Tracking Challenge dataset to reconstruct particle tracks from 3D points left in the silicon detectors.

Parameters

root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

class UPFD(root, name, feature, split='train', transform=None, pre_transform=None, pre_filter=None)[source]¶

The tree-structured fake news propagation graph classification dataset from the “User Preference-aware Fake News Detection” paper. It includes two sets of tree-structured fake and real news propagation graphs extracted from Twitter. For a single graph, the root node represents the source news, and leaf nodes represent Twitter users who retweeted the same root news. A user node has an edge to the news node if and only if the user retweeted the root news directly. Two user nodes have an edge if and only if one user retweeted the root news from the other user. Four different node features are encoded using different encoders. Please refer to the GNN-FakeNews repo for more details.

Note

For an example of using UPFD, see examples/upfd.py.

Parameters

root (string) – Root directory where the dataset should be saved.
name (string) – The name of the graph set ("politifact", "gossipcop").
feature (string) – The node feature type ("profile", "spacy", "bert", "content"). If set to "profile", the 10-dimensional node feature is composed of ten Twitter user profile attributes. If set to "spacy", the 300-dimensional node feature is composed of Twitter user historical tweets encoded by the spaCy word2vec encoder. If set to "bert", the 768-dimensional node feature is composed of Twitter user historical tweets encoded by bert-as-service. If set to "content", the 310-dimensional node feature is composed of a 300-dimensional “spacy” vector plus a 10-dimensional “profile” vector.
split (string, optional) – If "train", loads the training dataset. If "val", loads the validation dataset. If "test", loads the test dataset. (default: "train")
transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
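
The edge rules in the class description (a user connects to the news node after a direct retweet, or to the user the news was retweeted from) can be sketched as follows. The record format, the `"news"` root label, and `build_propagation_tree` are all hypothetical, and the records are assumed to be ordered so that every source appears before it is retweeted from.

```python
def build_propagation_tree(retweets, root="news"):
    """Map (user, source) retweet records to node indices and edges.

    Node 0 is the news item; each edge is a (source, user) index pair.
    """
    index = {root: 0}
    edges = []
    for user, source in retweets:
        if source not in index:
            raise ValueError(f"unknown retweet source: {source!r}")
        if user in index:
            raise ValueError(f"user {user!r} already appears in the tree")
        index[user] = len(index)
        edges.append((index[source], index[user]))
    return index, edges


retweets = [
    ("alice", "news"),   # direct retweet of the news item
    ("bob", "news"),     # direct retweet of the news item
    ("carol", "alice"),  # retweeted the news via alice
]
index, edges = build_propagation_tree(retweets)
```

Because every user is attached to exactly one source, the result always has one edge fewer than it has nodes, which is what makes the graph a tree.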

class WILLOWObjectClass(root, category, transform=None, pre_transform=None, pre_filter=None)[source]¶

The WILLOW-ObjectClass dataset from the “Learning Graphs to Match” paper, containing 10 equal keypoints of at least 40 images in each category. The keypoints contain interpolated features from a pre-trained VGG16 model on ImageNet (relu4_2 and relu5_1).

Parameters

root (string) – Root directory where the dataset should be saved.
category (string) – The category of the images (one of "Car", "Duck", "Face", "Motorbike", "Winebottle").
transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class WebKB(root, name, transform=None, pre_transform=None)[source]¶

The WebKB datasets used in the “Geom-GCN: Geometric Graph Convolutional Networks” paper. Nodes represent web pages and edges represent hyperlinks between them. Node features are the bag-of-words representation of web pages. The task is to classify the nodes into one of five categories: student, project, course, staff, and faculty.

Parameters

root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset ("Cornell", "Texas", "Wisconsin").
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class WikiCS(root, transform=None, pre_transform=None)[source]¶

The semi-supervised Wikipedia-based dataset from the “Wiki-CS: A Wikipedia-Based Benchmark for Graph Neural Networks” paper, containing 11,701 nodes, 216,123 edges, 10 classes and 20 different training splits.

Parameters

root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class WikipediaNetwork(root, name, transform=None, pre_transform=None)[source]¶

The Wikipedia networks used in the “Geom-GCN: Geometric Graph Convolutional Networks” paper. Nodes represent web pages and edges represent hyperlinks between them. Node features represent several informative nouns in the Wikipedia pages. The task is to classify the nodes into five categories in terms of the average monthly traffic of the web page.

Parameters

root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset ("Chameleon", "Squirrel").
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class WordNet18(root, transform=None, pre_transform=None)[source]¶

The WordNet18 dataset from the “Translating Embeddings for Modeling Multi-Relational Data” paper, containing 40,943 entities, 18 relations and 151,442 fact triplets, e.g., furniture includes bed.

Note

The original WordNet18 dataset suffers from test leakage, i.e. more than 80% of test triplets can be found in the training set with another relation type. Therefore, it should no longer be used for research evaluation. We recommend using its cleaned version, WordNet18RR, instead.

Parameters

root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
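To make the test leakage described in the note above concrete, the following sketch counts how many test triplets connect an entity pair that already appears in the training set under a different relation type. It is an illustration only: the function name leaked_fraction, the toy triplets and the relation names are made up for this example and are not taken from the dataset, and treating the entity pair as unordered (so that inverse relations count as leakage) is an assumption.

```python
def leaked_fraction(train, test):
    """Fraction of test triplets whose entity pair occurs in `train`
    with a different relation type (pair direction is ignored)."""
    # Map each unordered entity pair to the relations seen for it in training.
    seen = {}
    for head, relation, tail in train:
        seen.setdefault(frozenset((head, tail)), set()).add(relation)

    leaked = 0
    for head, relation, tail in test:
        other_relations = seen.get(frozenset((head, tail)), set()) - {relation}
        if other_relations:
            leaked += 1
    return leaked / len(test) if test else 0.0


# Toy (head, relation, tail) triplets, invented for illustration.
train = [("bed", "hyponym_of", "furniture"), ("dog", "hyponym_of", "animal")]
test = [("furniture", "hypernym_of", "bed"), ("cat", "hyponym_of", "animal")]

# The first test triplet is the inverse of a training triplet, the second is not.
print(leaked_fraction(train, test))  # 0.5
```

A model can answer such leaked test triplets by memorizing the inverse training triplet, which is why a high leaked fraction makes evaluation results hard to interpret.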

class WordNet18RR(root, transform=None, pre_transform=None)[source]¶

The WordNet18RR dataset from the “Convolutional 2D Knowledge Graph Embeddings” paper, containing 40,943 entities, 11 relations and 93,003 fact triplets.

Parameters

root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class Yelp(root, transform=None, pre_transform=None)[source]¶

The Yelp dataset from the “GraphSAINT: Graph Sampling Based Inductive Learning Method” paper, containing customer reviewers and their friendships.

Parameters

root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class ZINC(root, subset=False, split='train', transform=None, pre_transform=None, pre_filter=None)[source]¶

The ZINC dataset from the ZINC database and the “Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules” paper, containing about 250,000 molecular graphs with up to 38 heavy atoms. The task is to regress a synthetically computed property dubbed the constrained solubility.

Parameters

root (string) – Root directory where the dataset should be saved.
subset (boolean, optional) – If set to True, will only load a subset of the dataset (12,000 molecular graphs), following the “Benchmarking Graph Neural Networks” paper. (default: False)
split (string, optional) – If "train", loads the training dataset. If "val", loads the validation dataset. If "test", loads the test dataset. (default: "train")
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
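The split, subset and pre_filter arguments can be combined as in the following sketch. The helper names at_most_n_atoms and load_zinc_splits, and the threshold of 20, are hypothetical choices for illustration. The filter assumes that each data object exposes a num_nodes attribute, with one node per heavy atom. The library import is deferred so that the filter can be tried on stand-in objects without downloading the dataset.

```python
from types import SimpleNamespace

# Split names as documented for the `split` parameter.
SPLITS = ("train", "val", "test")


def at_most_n_atoms(max_atoms):
    """Build a pre_filter that keeps graphs with at most `max_atoms` nodes."""
    def pre_filter(data):
        return data.num_nodes <= max_atoms
    return pre_filter


def load_zinc_splits(root, subset=True, pre_filter=None):
    """Return a dict mapping each split name to its ZINC dataset (hypothetical helper)."""
    # Imported lazily: constructing the dataset may download and process files.
    from torch_geometric.datasets import ZINC

    return {
        split: ZINC(root, subset=subset, split=split, pre_filter=pre_filter)
        for split in SPLITS
    }


# Stand-in objects that only expose `num_nodes`, used to exercise the filter.
keep = at_most_n_atoms(20)
small = SimpleNamespace(num_nodes=12)
large = SimpleNamespace(num_nodes=30)
print(keep(small), keep(large))  # True False
```

With the library installed, load_zinc_splits("data/ZINC", pre_filter=at_most_n_atoms(20)) would be one way to obtain all three splits of the 12,000-graph subset restricted to smaller molecules; the root path is only an example.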