torch_geometric.datasets

KarateClub

Zachary’s karate club network from the “An Information Flow Model for Conflict and Fission in Small Groups” paper, containing 34 nodes, connected by 156 (undirected and unweighted) edges.

TUDataset

A variety of graph kernel benchmark datasets, .e.g. “IMDB-BINARY”, “REDDIT-BINARY” or “PROTEINS”, collected from the TU Dortmund University.

GNNBenchmarkDataset

A variety of artificially and semi-artificially generated graph datasets from the “Benchmarking Graph Neural Networks” paper.

Planetoid

The citation network datasets “Cora”, “CiteSeer” and “PubMed” from the “Revisiting Semi-Supervised Learning with Graph Embeddings” paper.

NELL

The NELL dataset, a knowledge graph from the “Toward an Architecture for Never-Ending Language Learning” paper.

CitationFull

The full citation network datasets from the “Deep Gaussian Embedding of Graphs: Unsupervised Inductive Learning via Ranking” paper.

CoraFull

Alias for torch_geometric.dataset.CitationFull with name="cora".

Coauthor

The Coauthor CS and Coauthor Physics networks from the “Pitfalls of Graph Neural Network Evaluation” paper.

Amazon

The Amazon Computers and Amazon Photo networks from the “Pitfalls of Graph Neural Network Evaluation” paper.

PPI

The protein-protein interaction networks from the “Predicting Multicellular Function through Multi-layer Tissue Networks” paper, containing positional gene sets, motif gene sets and immunological signatures as features (50 in total) and gene ontology sets as labels (121 in total).

Reddit

The Reddit dataset from the “Inductive Representation Learning on Large Graphs” paper, containing Reddit posts belonging to different communities.

Reddit2

The Reddit dataset from the “GraphSAINT: Graph Sampling Based Inductive Learning Method” paper, containing Reddit posts belonging to different communities.

Flickr

The Flickr dataset from the “GraphSAINT: Graph Sampling Based Inductive Learning Method” paper, containing descriptions and common properties of images.

Yelp

The Yelp dataset from the “GraphSAINT: Graph Sampling Based Inductive Learning Method” paper, containing customer reviewers and their friendship.

AmazonProducts

The Amazon dataset from the “GraphSAINT: Graph Sampling Based Inductive Learning Method” paper, containing products and its categories.

QM7b

The QM7b dataset from the “MoleculeNet: A Benchmark for Molecular Machine Learning” paper, consisting of 7,211 molecules with 14 regression targets.

QM9

The QM9 dataset from the “MoleculeNet: A Benchmark for Molecular Machine Learning” paper, consisting of about 130,000 molecules with 19 regression targets.

MD17

A variety of ab-initio molecular dynamics trajectories from the authors of sGDML.

ZINC

The ZINC dataset from the ZINC database and the “Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules” paper, containing about 250,000 molecular graphs with up to 38 heavy atoms.

MoleculeNet

The MoleculeNet benchmark collection from the “MoleculeNet: A Benchmark for Molecular Machine Learning” paper, containing datasets from physical chemistry, biophysics and physiology.

Entities

The relational entities networks “AIFB”, “MUTAG”, “BGS” and “AM” from the “Modeling Relational Data with Graph Convolutional Networks” paper.

RelLinkPredDataset

The relational link prediction datasets from the “Modeling Relational Data with Graph Convolutional Networks” paper.

GEDDataset

The GED datasets from the “Graph Edit Distance Computation via Graph Neural Networks” paper.

MNISTSuperpixels

MNIST superpixels dataset from the “Geometric Deep Learning on Graphs and Manifolds Using Mixture Model CNNs” paper, containing 70,000 graphs with 75 nodes each.

FAUST

The FAUST humans dataset from the “FAUST: Dataset and Evaluation for 3D Mesh Registration” paper, containing 100 watertight meshes representing 10 different poses for 10 different subjects.

DynamicFAUST

The dynamic FAUST humans dataset from the “Dynamic FAUST: Registering Human Bodies in Motion” paper.

ShapeNet

The ShapeNet part level segmentation dataset from the “A Scalable Active Framework for Region Annotation in 3D Shape Collections” paper, containing about 17,000 3D shape point clouds from 16 shape categories.

ModelNet

The ModelNet10/40 datasets from the “3D ShapeNets: A Deep Representation for Volumetric Shapes” paper, containing CAD models of 10 and 40 categories, respectively.

CoMA

The CoMA 3D faces dataset from the “Generating 3D faces using Convolutional Mesh Autoencoders” paper, containing 20,466 meshes of extreme expressions captured over 12 different subjects.

SHREC2016

The SHREC 2016 partial matching dataset from the “SHREC’16: Partial Matching of Deformable Shapes” paper.

TOSCA

The TOSCA dataset from the “Numerical Geometry of Non-Ridig Shapes” book, containing 80 meshes.

PCPNetDataset

The PCPNet dataset from the “PCPNet: Learning Local Shape Properties from Raw Point Clouds” paper, consisting of 30 shapes, each given as a point cloud, densely sampled with 100k points.

S3DIS

The (pre-processed) Stanford Large-Scale 3D Indoor Spaces dataset from the “3D Semantic Parsing of Large-Scale Indoor Spaces” paper, containing point clouds of six large-scale indoor parts in three buildings with 12 semantic elements (and one clutter class).

GeometricShapes

Synthetic dataset of various geometric shapes like cubes, spheres or pyramids.

BitcoinOTC

The Bitcoin-OTC dataset from the “EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs” paper, consisting of 138 who-trusts-whom networks of sequential time steps.

ICEWS18

The Integrated Crisis Early Warning System (ICEWS) dataset used in the, e.g., “Recurrent Event Network for Reasoning over Temporal Knowledge Graphs” paper, consisting of events collected from 1/1/2018 to 10/31/2018 (24 hours time granularity).

GDELT

The Global Database of Events, Language, and Tone (GDELT) dataset used in the, e.g., “Recurrent Event Network for Reasoning over Temporal Knowledge Graphs” paper, consisting of events collected from 1/1/2018 to 1/31/2018 (15 minutes time granularity).

DBP15K

The DBP15K dataset from the “Cross-lingual Entity Alignment via Joint Attribute-Preserving Embedding” paper, where Chinese, Japanese and French versions of DBpedia were linked to its English version.

WILLOWObjectClass

The WILLOW-ObjectClass dataset from the “Learning Graphs to Match” paper, containing 10 equal keypoints of at least 40 images in each category.

PascalVOCKeypoints

The Pascal VOC 2011 dataset with Berkely annotations of keypoints from the “Poselets: Body Part Detectors Trained Using 3D Human Pose Annotations” paper, containing 0 to 23 keypoints per example over 20 categories.

PascalPF

The Pascal-PF dataset from the “Proposal Flow” paper, containing 4 to 16 keypoints per example over 20 categories.

SNAPDataset

A variety of graph datasets collected from SNAP at Stanford University.

SuiteSparseMatrixCollection

A suite of sparse matrix benchmarks known as the Suite Sparse Matrix Collection collected from a wide range of applications.

TrackMLParticleTrackingDataset

The TrackML Particle Tracking Challenge dataset to reconstruct particle tracks from 3D points left in the silicon detectors.

AMiner

The heterogeneous AMiner dataset from the “metapath2vec: Scalable Representation Learning for Heterogeneous Networks” paper, consisting of nodes from type "paper", "author" and "venue".

WordNet18

The WordNet18 dataset from the “Translating Embeddings for Modeling Multi-Relational Data” paper, containing 40,943 entities, 18 relations and 151,442 fact triplets, e.g., furniture includes bed.

WordNet18RR

The WordNet18RR dataset from the “Convolutional 2D Knowledge Graph Embeddings” paper, containing 40,943 entities, 11 relations and 93,003 fact triplets.

WikiCS

The semi-supervised Wikipedia-based dataset from the “Wiki-CS: A Wikipedia-Based Benchmark for Graph Neural Networks” paper, containing 11,701 nodes, 216,123 edges, 10 classes and 20 different training splits.

WebKB

The WebKB datasets used in the “Geom-GCN: Geometric Graph Convolutional Networks” paper.

WikipediaNetwork

The Wikipedia networks introduced in the “Multi-scale Attributed Node Embedding” paper.

Actor

The actor-only induced subgraph of the film-director-actor-writer network used in the “Geom-GCN: Geometric Graph Convolutional Networks” paper.

JODIEDataset

MixHopSyntheticDataset

The MixHop synthetic dataset from the “MixHop: Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing” paper, containing 10 graphs, each with varying degree of homophily (ranging from 0.0 to 0.9).

UPFD

The tree-structured fake news propagation graph classification dataset from the “User Preference-aware Fake News Detection” paper.

GitHub

The GitHub Web and ML Developers dataset introduced in the “Multi-scale Attributed Node Embedding” paper.

FacebookPagePage

The Facebook Page-Page network dataset introduced in the “Multi-scale Attributed Node Embedding” paper.

LastFMAsia

The LastFM Asia Network dataset introduced in the “Characteristic Functions on Graphs: Birds of a Feather, from Statistical Descriptors to Parametric Models” paper.

DeezerEurope

The Deezer Europe dataset introduced in the “Characteristic Functions on Graphs: Birds of a Feather, from Statistical Descriptors to Parametric Models” paper.

GemsecDeezer

The Deezer User Network datasets introduced in the “GEMSEC: Graph Embedding with Self Clustering” paper.

Twitch

The Twitch Gamer networks introduced in the “Multi-scale Attributed Node Embedding” paper.

class AMiner(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The heterogeneous AMiner dataset from the “metapath2vec: Scalable Representation Learning for Heterogeneous Networks” paper, consisting of nodes from type "paper", "author" and "venue". Venue categories and author research interests are available as ground truth labels for a subset of nodes.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class Actor(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The actor-only induced subgraph of the film-director-actor-writer network used in the “Geom-GCN: Geometric Graph Convolutional Networks” paper. Each node corresponds to an actor, and the edge between two nodes denotes co-occurrence on the same Wikipedia page. Node features correspond to some keywords in the Wikipedia pages. The task is to classify the nodes into five categories in term of words of actor’s Wikipedia.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class Amazon(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The Amazon Computers and Amazon Photo networks from the “Pitfalls of Graph Neural Network Evaluation” paper. Nodes represent goods and edges represent that two goods are frequently bought together. Given product reviews as bag-of-words node features, the task is to map goods to their respective product category.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • name (string) – The name of the dataset ("Computers", "Photo").

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class AmazonProducts(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The Amazon dataset from the “GraphSAINT: Graph Sampling Based Inductive Learning Method” paper, containing products and its categories.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class BitcoinOTC(root: str, edge_window_size: int = 10, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The Bitcoin-OTC dataset from the “EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs” paper, consisting of 138 who-trusts-whom networks of sequential time steps.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • edge_window_size (int, optional) – The window size for the existence of an edge in the graph sequence since its initial creation. (default: 10)

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class CitationFull(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The full citation network datasets from the “Deep Gaussian Embedding of Graphs: Unsupervised Inductive Learning via Ranking” paper. Nodes represent documents and edges represent citation links. Datasets include citeseer, cora, cora_ml, dblp, pubmed.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • name (string) – The name of the dataset ("Cora", "Cora_ML" "CiteSeer", "DBLP", "PubMed").

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class CoMA(root: str, train: bool = True, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]

The CoMA 3D faces dataset from the “Generating 3D faces using Convolutional Mesh Autoencoders” paper, containing 20,466 meshes of extreme expressions captured over 12 different subjects.

Note

Data objects hold mesh faces instead of edge indices. To convert the mesh to a graph, use the torch_geometric.transforms.FaceToEdge as pre_transform. To convert the mesh to a point cloud, use the torch_geometric.transforms.SamplePoints as transform to sample a fixed number of points on the mesh faces according to their face area.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • train (bool, optional) – If True, loads the training dataset, otherwise the test dataset. (default: True)

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class Coauthor(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The Coauthor CS and Coauthor Physics networks from the “Pitfalls of Graph Neural Network Evaluation” paper. Nodes represent authors that are connected by an edge if they co-authored a paper. Given paper keywords for each author’s papers, the task is to map authors to their respective field of study.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • name (string) – The name of the dataset ("CS", "Physics").

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class CoraFull(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

Alias for torch_geometric.dataset.CitationFull with name="cora".

class DBP15K(root: str, pair: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The DBP15K dataset from the “Cross-lingual Entity Alignment via Joint Attribute-Preserving Embedding” paper, where Chinese, Japanese and French versions of DBpedia were linked to its English version. Node features are given by pre-trained and aligned monolingual word embeddings from the “Cross-lingual Knowledge Graph Alignment via Graph Matching Neural Network” paper.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • pair (string) – The pair of languages ("en_zh", "en_fr", "en_ja", "zh_en", "fr_en", "ja_en").

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class DeezerEurope(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The Deezer Europe dataset introduced in the “Characteristic Functions on Graphs: Birds of a Feather, from Statistical Descriptors to Parametric Models” paper. Nodes represent European users of Deezer and edges are mutual follower relationships. It contains 28,281 nodes, 185,504 edges, 128 node features and 2 classes.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class DynamicFAUST(root: str, subjects: Optional[List[str]] = None, categories: Optional[List[str]] = None, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]

The dynamic FAUST humans dataset from the “Dynamic FAUST: Registering Human Bodies in Motion” paper.

Note

Data objects hold mesh faces instead of edge indices. To convert the mesh to a graph, use the torch_geometric.transforms.FaceToEdge as pre_transform. To convert the mesh to a point cloud, use the torch_geometric.transforms.SamplePoints as transform to sample a fixed number of points on the mesh faces according to their face area.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • subjects (list, optional) – List of subjects to include in the dataset. Can include the subjects "50002", "50004", "50007", "50009", "50020", "50021", "50022", "50025", "50026", "50027". If set to None, the dataset will contain all subjects. (default: None)

  • categories (list, optional) – List of categories to include in the dataset. Can include the categories "chicken_wings", "hips", "jiggle_on_toes", "jumping_jacks", "knees", "light_hopping_loose", "light_hopping_stiff", "one_leg_jump", "one_leg_loose", "personal_move", "punching", "running_on_spot", "running_on_spot_bugfix", "shake_arms", "shake_hips", "shoulders". If set to None, the dataset will contain all categories. (default: None)

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class Entities(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The relational entities networks “AIFB”, “MUTAG”, “BGS” and “AM” from the “Modeling Relational Data with Graph Convolutional Networks” paper. Training and test splits are given by node indices.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • name (string) – The name of the dataset ("AIFB", "MUTAG", "BGS", "AM").

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class FAUST(root: str, train: bool = True, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]

The FAUST humans dataset from the “FAUST: Dataset and Evaluation for 3D Mesh Registration” paper, containing 100 watertight meshes representing 10 different poses for 10 different subjects.

Note

Data objects hold mesh faces instead of edge indices. To convert the mesh to a graph, use the torch_geometric.transforms.FaceToEdge as pre_transform. To convert the mesh to a point cloud, use the torch_geometric.transforms.SamplePoints as transform to sample a fixed number of points on the mesh faces according to their face area.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • train (bool, optional) – If True, loads the training dataset, otherwise the test dataset. (default: True)

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class FacebookPagePage(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The Facebook Page-Page network dataset introduced in the “Multi-scale Attributed Node Embedding” paper. Nodes represent verified pages on Facebook and edges are mutual likes. It contains 22,470 nodes, 342,004 edges, 128 node features and 4 classes.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class Flickr(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The Flickr dataset from the “GraphSAINT: Graph Sampling Based Inductive Learning Method” paper, containing descriptions and common properties of images.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class GDELT(root: str, split: str = 'train', transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]

The Global Database of Events, Language, and Tone (GDELT) dataset used in the, e.g., “Recurrent Event Network for Reasoning over Temporal Knowledge Graphs” paper, consisting of events collected from 1/1/2018 to 1/31/2018 (15 minutes time granularity).

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • split (string) – If "train", loads the training dataset. If "val", loads the validation dataset. If "test", loads the test dataset. (default: "train")

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class GEDDataset(root: str, name: str, train: bool = True, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]

The GED datasets from the “Graph Edit Distance Computation via Graph Neural Networks” paper. GEDs can be accessed via the global attributes ged and norm_ged for all train/train graph pairs and all train/test graph pairs:

dataset = GEDDataset(root, name="LINUX")
data1, data2 = dataset[0], dataset[1]
ged = dataset.ged[data1.i, data2.i]  # GED between `data1` and `data2`.

Note that GEDs are not available if both graphs are from the test set. For evaluation, it is recommended to pair up each graph from the test set with each graph in the training set.

Note

ALKANE is missing GEDs for train/test graph pairs since they are not provided in the official datasets.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • name (string) – The name of the dataset (one of "AIDS700nef", "LINUX", "ALKANE", "IMDBMulti").

  • train (bool, optional) – If True, loads the training dataset, otherwise the test dataset. (default: True)

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class GNNBenchmarkDataset(root: str, name: str, split: str = 'train', transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]

A variety of artificially and semi-artificially generated graph datasets from the “Benchmarking Graph Neural Networks” paper.

Note

The ZINC dataset is provided via torch_geometric.datasets.ZINC.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • name (string) – The name of the dataset (one of "PATTERN", "CLUSTER", "MNIST", "CIFAR10", "TSP", "CSL")

  • split (string, optional) – If "train", loads the training dataset. If "val", loads the validation dataset. If "test", loads the test dataset. (default: "train")

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class GemsecDeezer(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The Deezer User Network datasets introduced in the “GEMSEC: Graph Embedding with Self Clustering” paper. Nodes represent Deezer user and edges are mutual friendships. The task is multi-label multi-class node classification about the genres liked by the users.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • name (string) – The name of the dataset ("HU", "HR", "RO").

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class GeometricShapes(root: str, train: bool = True, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]

Synthetic dataset of various geometric shapes like cubes, spheres or pyramids.

Note

Data objects hold mesh faces instead of edge indices. To convert the mesh to a graph, use the torch_geometric.transforms.FaceToEdge as pre_transform. To convert the mesh to a point cloud, use the torch_geometric.transforms.SamplePoints as transform to sample a fixed number of points on the mesh faces according to their face area.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • train (bool, optional) – If True, loads the training dataset, otherwise the test dataset. (default: True)

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class GitHub(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The GitHub Web and ML Developers dataset introduced in the “Multi-scale Attributed Node Embedding” paper. Nodes represent developers on GitHub and edges are mutual follower relationships. It contains 37,300 nodes, 578,006 edges, 128 node features and 2 classes.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class ICEWS18(root, split='train', transform=None, pre_transform=None, pre_filter=None)[source]

The Integrated Crisis Early Warning System (ICEWS) dataset used in the, e.g., “Recurrent Event Network for Reasoning over Temporal Knowledge Graphs” paper, consisting of events collected from 1/1/2018 to 10/31/2018 (24 hours time granularity).

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • split (string) – If "train", loads the training dataset. If "val", loads the validation dataset. If "test", loads the test dataset. (default: "train")

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class JODIEDataset(root, name, transform=None, pre_transform=None)[source]
class KarateClub(transform=None)[source]

Zachary’s karate club network from the “An Information Flow Model for Conflict and Fission in Small Groups” paper, containing 34 nodes, connected by 156 (undirected and unweighted) edges. Every node is labeled by one of four classes obtained via modularity-based clustering, following the “Semi-supervised Classification with Graph Convolutional Networks” paper. Training is based on a single labeled example per class, i.e. a total number of 4 labeled nodes.

Parameters

transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

class LastFMAsia(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The LastFM Asia Network dataset introduced in the “Characteristic Functions on Graphs: Birds of a Feather, from Statistical Descriptors to Parametric Models” paper. Nodes represent LastFM users from Asia and edges are friendships. It contains 7,624 nodes, 55,612 edges, 128 node features and 18 classes.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class MD17(root: str, name: str, train: Optional[bool] = None, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]

A variety of ab-initio molecular dynamics trajectories from the authors of sGDML. This class provides access to the original MD17 datasets as well as all other datasets released by sGDML since then (15 in total).

For every trajectory, the dataset contains the Cartesian positions of atoms (in Angstrom), their atomic numbers, as well as the total energy (in kcal/mol) and forces (kcal/mol/Angstrom) on each atom. The latter two are the regression targets for this collection.

Note

Data objects contain no edge indices as these are most commonly constructed via the torch_geometric.transforms.RadiusGraph transform, with its cut-off being a hyperparameter.

Some of the trajectories were computed at different levels of theory, and for most molecules there exists two versions: a long trajectory on DFT level of theory and a short trajectory on coupled cluster level of theory. Check the table below for detailed information on the molecule, level of theory and number of data points contained in each dataset. Which trajectory is loaded is determined by the name argument. For the coupled cluster trajectories, the dataset comes with pre-defined training and testing splits which are loaded separately via the train argument.

When using these datasets, make sure to cite the appropriate publications listed on the sGDML website.

Molecule

Level of Theory

Name

#Examples

Benzene

DFT

benzene

49,863

Benzene

DFT FHI-aims

benzene FHI-aims

627,983

Benzene

CCSD(T)

benzene CCSD(T)

1,500

Uracil

DFT

uracil

133,770

Naphthalene

DFT

naphthalene

326,250

Aspirin

DFT

aspirin

627,983

Aspirin

CCSD

aspirin CCSD

1,500

Salicylic acid

DFT

salicylic acid

320,231

Malonaldehyde

DFT

malonaldehyde

993,237

Malonaldehyde

CCSD(T)

malonaldehyde CCSD(T)

1,500

Ethanol

DFT

ethanol

555,092

Ethanol

CCSD(T)

ethanol CCSD(T)

2,000

Toluene

DFT

toluene

442,790

Toluene

CCSD(T)

toluene CCSD(T)

1,501

Paracetamol

DFT

paracetamol

106,490

Azobenzene

DFT

azobenzene

99,999

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • name (string) – Keyword of the trajectory that should be loaded.

  • train (bool, optional) – Determines whether the train or test split gets loaded for the coupled cluster trajectories. (default: None)

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class MNISTSuperpixels(root, train=True, transform=None, pre_transform=None, pre_filter=None)[source]

MNIST superpixels dataset from the “Geometric Deep Learning on Graphs and Manifolds Using Mixture Model CNNs” paper, containing 70,000 graphs with 75 nodes each. Every graph is labeled by one of 10 classes.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • train (bool, optional) – If True, loads the training dataset, otherwise the test dataset. (default: True)

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class MixHopSyntheticDataset(root, homophily, transform=None, pre_transform=None)[source]

The MixHop synthetic dataset from the “MixHop: Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing” paper, containing 10 graphs, each with varying degree of homophily (ranging from 0.0 to 0.9). All graphs have 5,000 nodes, where each node corresponds to 1 out of 10 classes. The feature values of the nodes are sampled from a 2D Gaussian distribution, which are distinct for each class.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • homophily (float) – The degree of homophily (one of 0.0, 0.1, …, 0.9).

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class ModelNet(root, name='10', train=True, transform=None, pre_transform=None, pre_filter=None)[source]

The ModelNet10/40 datasets from the “3D ShapeNets: A Deep Representation for Volumetric Shapes” paper, containing CAD models of 10 and 40 categories, respectively.

Note

Data objects hold mesh faces instead of edge indices. To convert the mesh to a graph, use the torch_geometric.transforms.FaceToEdge as pre_transform. To convert the mesh to a point cloud, use the torch_geometric.transforms.SamplePoints as transform to sample a fixed number of points on the mesh faces according to their face area.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • name (string, optional) – The name of the dataset ("10" for ModelNet10, "40" for ModelNet40). (default: "10")

  • train (bool, optional) – If True, loads the training dataset, otherwise the test dataset. (default: True)

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class MoleculeNet(root, name, transform=None, pre_transform=None, pre_filter=None)[source]

The MoleculeNet benchmark collection from the “MoleculeNet: A Benchmark for Molecular Machine Learning” paper, containing datasets from physical chemistry, biophysics and physiology. All datasets come with the additional node and edge features introduced by the Open Graph Benchmark.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • name (string) – The name of the dataset ("ESOL", "FreeSolv", "Lipo", "PCBA", "MUV", "HIV", "BACE", "BBPB", "Tox21", "ToxCast", "SIDER", "ClinTox").

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class NELL(root, transform=None, pre_transform=None)[source]

The NELL dataset, a knowledge graph from the “Toward an Architecture for Never-Ending Language Learning” paper. The dataset is processed as in the “Revisiting Semi-Supervised Learning with Graph Embeddings” paper.

Note

Entity nodes are described by sparse feature vectors of type torch_sparse.SparseTensor, which can be either used directly, or can be converted via data.x.to_dense(), data.x.to_scipy() or data.x.to_torch_sparse_coo_tensor().

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class PCPNetDataset(root, category, split='train', transform=None, pre_transform=None, pre_filter=None)[source]

The PCPNet dataset from the “PCPNet: Learning Local Shape Properties from Raw Point Clouds” paper, consisting of 30 shapes, each given as a point cloud, densely sampled with 100k points. For each shape, surface normals and local curvatures are given as node features.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • category (string) – The training set category (one of "NoNoise", "Noisy", "VarDensity", "NoisyAndVarDensity" for split="train" or split="val", or one of "All", "LowNoise", "MedNoise", "HighNoise", :obj:”VarDensityStriped”, "VarDensityGradient" for split="test").

  • split (string) – If "train", loads the training dataset. If "val", loads the validation dataset. If "test", loads the test dataset. (default: "train")

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class PPI(root, split='train', transform=None, pre_transform=None, pre_filter=None)[source]

The protein-protein interaction networks from the “Predicting Multicellular Function through Multi-layer Tissue Networks” paper, containing positional gene sets, motif gene sets and immunological signatures as features (50 in total) and gene ontology sets as labels (121 in total).

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • split (string) – If "train", loads the training dataset. If "val", loads the validation dataset. If "test", loads the test dataset. (default: "train")

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class PascalPF(root, category, transform=None, pre_transform=None, pre_filter=None)[source]

The Pascal-PF dataset from the “Proposal Flow” paper, containing 4 to 16 keypoints per example over 20 categories.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • category (string) – The category of the images (one of "Aeroplane", "Bicycle", "Bird", "Boat", "Bottle", "Bus", "Car", "Cat", "Chair", "Diningtable", "Dog", "Horse", "Motorbike", "Person", "Pottedplant", "Sheep", "Sofa", "Train", "TVMonitor")

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class PascalVOCKeypoints(root, category, train=True, transform=None, pre_transform=None, pre_filter=None)[source]

The Pascal VOC 2011 dataset with Berkely annotations of keypoints from the “Poselets: Body Part Detectors Trained Using 3D Human Pose Annotations” paper, containing 0 to 23 keypoints per example over 20 categories. The dataset is pre-filtered to exclude difficult, occluded and truncated objects. The keypoints contain interpolated features from a pre-trained VGG16 model on ImageNet (relu4_2 and relu5_1).

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • category (string) – The category of the images (one of "Aeroplane", "Bicycle", "Bird", "Boat", "Bottle", "Bus", "Car", "Cat", "Chair", "Diningtable", "Dog", "Horse", "Motorbike", "Person", "Pottedplant", "Sheep", "Sofa", "Train", "TVMonitor")

  • train (bool, optional) – If True, loads the training dataset, otherwise the test dataset. (default: True)

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class Planetoid(root: str, name: str, split: str = 'public', num_train_per_class: int = 20, num_val: int = 500, num_test: int = 1000, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The citation network datasets “Cora”, “CiteSeer” and “PubMed” from the “Revisiting Semi-Supervised Learning with Graph Embeddings” paper. Nodes represent documents and edges represent citation links. Training, validation and test splits are given by binary masks.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • name (string) – The name of the dataset ("Cora", "CiteSeer", "PubMed").

  • split (string) –

    The type of dataset split ("public", "full", "random"). If set to "public", the split will be the public fixed split from the “Revisiting Semi-Supervised Learning with Graph Embeddings” paper. If set to "full", all nodes except those in the validation and test sets will be used for training (as in the “FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling” paper). If set to "random", train, validation, and test sets will be randomly generated, according to num_train_per_class, num_val and num_test. (default: "public")

  • num_train_per_class (int, optional) – The number of training samples per class in case of "random" split. (default: 20)

  • num_val (int, optional) – The number of validation samples in case of "random" split. (default: 500)

  • num_test (int, optional) – The number of test samples in case of "random" split. (default: 1000)

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class QM7b(root, transform=None, pre_transform=None, pre_filter=None)[source]

The QM7b dataset from the “MoleculeNet: A Benchmark for Molecular Machine Learning” paper, consisting of 7,211 molecules with 14 regression targets.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class QM9(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]

The QM9 dataset from the “MoleculeNet: A Benchmark for Molecular Machine Learning” paper, consisting of about 130,000 molecules with 19 regression targets. Each molecule includes complete spatial information for the single low energy conformation of the atoms in the molecule. In addition, we provide the atom features from the “Neural Message Passing for Quantum Chemistry” paper.

Target

Property

Description

Unit

0

\(\mu\)

Dipole moment

\(\textrm{D}\)

1

\(\alpha\)

Isotropic polarizability

\({a_0}^3\)

2

\(\epsilon_{\textrm{HOMO}}\)

Highest occupied molecular orbital energy

\(\textrm{eV}\)

3

\(\epsilon_{\textrm{LUMO}}\)

Lowest unoccupied molecular orbital energy

\(\textrm{eV}\)

4

\(\Delta \epsilon\)

Gap between \(\epsilon_{\textrm{HOMO}}\) and \(\epsilon_{\textrm{LUMO}}\)

\(\textrm{eV}\)

5

\(\langle R^2 \rangle\)

Electronic spatial extent

\({a_0}^2\)

6

\(\textrm{ZPVE}\)

Zero point vibrational energy

\(\textrm{eV}\)

7

\(U_0\)

Internal energy at 0K

\(\textrm{eV}\)

8

\(U\)

Internal energy at 298.15K

\(\textrm{eV}\)

9

\(H\)

Enthalpy at 298.15K

\(\textrm{eV}\)

10

\(G\)

Free energy at 298.15K

\(\textrm{eV}\)

11

\(c_{\textrm{v}}\)

Heat capavity at 298.15K

\(\frac{\textrm{cal}}{\textrm{mol K}}\)

12

\(U_0^{\textrm{ATOM}}\)

Atomization energy at 0K

\(\textrm{eV}\)

13

\(U^{\textrm{ATOM}}\)

Atomization energy at 298.15K

\(\textrm{eV}\)

14

\(H^{\textrm{ATOM}}\)

Atomization enthalpy at 298.15K

\(\textrm{eV}\)

15

\(G^{\textrm{ATOM}}\)

Atomization free energy at 298.15K

\(\textrm{eV}\)

16

\(A\)

Rotational constant

\(\textrm{GHz}\)

17

\(B\)

Rotational constant

\(\textrm{GHz}\)

18

\(C\)

Rotational constant

\(\textrm{GHz}\)

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class Reddit(root, transform=None, pre_transform=None)[source]

The Reddit dataset from the “Inductive Representation Learning on Large Graphs” paper, containing Reddit posts belonging to different communities.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class Reddit2(root, transform=None, pre_transform=None)[source]

The Reddit dataset from the “GraphSAINT: Graph Sampling Based Inductive Learning Method” paper, containing Reddit posts belonging to different communities.

Note

This is a sparser version of the original Reddit dataset (~23M edges instead of ~114M edges), and is used in papers such as SGC and GraphSAINT.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class RelLinkPredDataset(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The relational link prediction datasets from the “Modeling Relational Data with Graph Convolutional Networks” paper. Training and test splits are given by sets of triplets.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • name (string) – The name of the dataset ("FB15k-237").

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class S3DIS(root, test_area=6, train=True, transform=None, pre_transform=None, pre_filter=None)[source]

The (pre-processed) Stanford Large-Scale 3D Indoor Spaces dataset from the “3D Semantic Parsing of Large-Scale Indoor Spaces” paper, containing point clouds of six large-scale indoor parts in three buildings with 12 semantic elements (and one clutter class).

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • test_area (int, optional) – Which area to use for testing (1-6). (default: 6)

  • train (bool, optional) – If True, loads the training dataset, otherwise the test dataset. (default: True)

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class SHREC2016(root, partiality, category, train=True, transform=None, pre_transform=None, pre_filter=None)[source]

The SHREC 2016 partial matching dataset from the “SHREC’16: Partial Matching of Deformable Shapes” paper. The reference shape can be referenced via dataset.ref.

Note

Data objects hold mesh faces instead of edge indices. To convert the mesh to a graph, use the torch_geometric.transforms.FaceToEdge as pre_transform. To convert the mesh to a point cloud, use the torch_geometric.transforms.SamplePoints as transform to sample a fixed number of points on the mesh faces according to their face area.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • partiality (string) – The partiality of the dataset (one of "Holes", "Cuts").

  • category (string) – The category of the dataset (one of "Cat", "Centaur", "David", "Dog", "Horse", "Michael", "Victoria", "Wolf").

  • train (bool, optional) – If True, loads the training dataset, otherwise the test dataset. (default: True)

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class SNAPDataset(root, name, transform=None, pre_transform=None, pre_filter=None)[source]

A variety of graph datasets collected from SNAP at Stanford University.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • name (string) – The name of the dataset.

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class ShapeNet(root, categories=None, include_normals=True, split='trainval', transform=None, pre_transform=None, pre_filter=None)[source]

The ShapeNet part level segmentation dataset from the “A Scalable Active Framework for Region Annotation in 3D Shape Collections” paper, containing about 17,000 3D shape point clouds from 16 shape categories. Each category is annotated with 2 to 6 parts.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • categories (string or [string], optional) – The category of the CAD models (one or a combination of "Airplane", "Bag", "Cap", "Car", "Chair", "Earphone", "Guitar", "Knife", "Lamp", "Laptop", "Motorbike", "Mug", "Pistol", "Rocket", "Skateboard", "Table"). Can be explicitly set to None to load all categories. (default: None)

  • include_normals (bool, optional) – If set to False, will not include normal vectors as input features. (default: True)

  • split (string, optional) – If "train", loads the training dataset. If "val", loads the validation dataset. If "trainval", loads the training and validation dataset. If "test", loads the test dataset. (default: "trainval")

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class SuiteSparseMatrixCollection(root, group, name, transform=None, pre_transform=None)[source]

A suite of sparse matrix benchmarks known as the Suite Sparse Matrix Collection collected from a wide range of applications.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • group (string) – The group of the sparse matrix.

  • name (string) – The name of the sparse matrix.

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class TOSCA(root, categories=None, transform=None, pre_transform=None, pre_filter=None)[source]

The TOSCA dataset from the “Numerical Geometry of Non-Ridig Shapes” book, containing 80 meshes. Meshes within the same category have the same triangulation and an equal number of vertices numbered in a compatible way.

Note

Data objects hold mesh faces instead of edge indices. To convert the mesh to a graph, use the torch_geometric.transforms.FaceToEdge as pre_transform. To convert the mesh to a point cloud, use the torch_geometric.transforms.SamplePoints as transform to sample a fixed number of points on the mesh faces according to their face area.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • categories (list, optional) – List of categories to include in the dataset. Can include the categories "Cat", "Centaur", "David", "Dog", "Gorilla", "Horse", "Michael", "Victoria", "Wolf". If set to None, the dataset will contain all categories. (default: None)

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class TUDataset(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None, use_node_attr: bool = False, use_edge_attr: bool = False, cleaned: bool = False)[source]

A variety of graph kernel benchmark datasets, .e.g. “IMDB-BINARY”, “REDDIT-BINARY” or “PROTEINS”, collected from the TU Dortmund University. In addition, this dataset wrapper provides cleaned dataset versions as motivated by the “Understanding Isomorphism Bias in Graph Data Sets” paper, containing only non-isomorphic graphs.

Note

Some datasets may not come with any node labels. You can then either make use of the argument use_node_attr to load additional continuous node attributes (if present) or provide synthetic node features using transforms such as like torch_geometric.transforms.Constant or torch_geometric.transforms.OneHotDegree.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • name (string) – The name of the dataset.

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

  • use_node_attr (bool, optional) – If True, the dataset will contain additional continuous node attributes (if present). (default: False)

  • use_edge_attr (bool, optional) – If True, the dataset will contain additional continuous edge attributes (if present). (default: False)

  • cleaned – (bool, optional): If True, the dataset will contain only non-isomorphic graphs. (default: False)

class TrackMLParticleTrackingDataset(root, transform=None)[source]

The TrackML Particle Tracking Challenge dataset to reconstruct particle tracks from 3D points left in the silicon detectors.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

class Twitch(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The Twitch Gamer networks introduced in the “Multi-scale Attributed Node Embedding” paper. Nodes represent gamers on Twitch and edges are followerships between them. Node features represent embeddings of games played by the Twitch users. The task is to predict whether a user streams mature content.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • name (string) – The name of the dataset ("DE", "EN", "ES", "FR", "PT", "RU").

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class UPFD(root, name, feature, split='train', transform=None, pre_transform=None, pre_filter=None)[source]

The tree-structured fake news propagation graph classification dataset from the “User Preference-aware Fake News Detection” paper. It includes two sets of tree-structured fake & real news propagation graphs extracted from Twitter. For a single graph, the root node represents the source news, and leaf nodes represent Twitter users who retweeted the same root news. A user node has an edge to the news node if and only if the user retweeted the root news directly. Two user nodes have an edge if and only if one user retweeted the root news from the other user. Four different node features are encoded using different encoders. Please refer to GNN-FakeNews repo for more details.

Note

For an example of using UPFD, see examples/upfd.py.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • name (string) – The name of the graph set ("politifact", "gossipcop").

  • feature (string) – The node feature type ("profile", "spacy", "bert", "content"). If set to "profile", the 10-dimensional node feature is composed of ten Twitter user profile attributes. If set to "spacy", the 300-dimensional node feature is composed of Twitter user historical tweets encoded by the spaCy word2vec encoder. If set to "bert", the 768-dimensional node feature is composed of Twitter user historical tweets encoded by the bert-as-service. If set to "content", the 310-dimensional node feature is composed of a 300-dimensional “spacy” vector plus a 10-dimensional “profile” vector.

  • split (string, optional) – If "train", loads the training dataset. If "val", loads the validation dataset. If "test", loads the test dataset. (default: "train")

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class WILLOWObjectClass(root, category, transform=None, pre_transform=None, pre_filter=None)[source]

The WILLOW-ObjectClass dataset from the “Learning Graphs to Match” paper, containing 10 equal keypoints of at least 40 images in each category. The keypoints contain interpolated features from a pre-trained VGG16 model on ImageNet (relu4_2 and relu5_1).

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • category (string) – The category of the images (one of "Car", "Duck", "Face", "Motorbike", "Winebottle").

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class WebKB(root, name, transform=None, pre_transform=None)[source]

The WebKB datasets used in the “Geom-GCN: Geometric Graph Convolutional Networks” paper. Nodes represent web pages and edges represent hyperlinks between them. Node features are the bag-of-words representation of web pages. The task is to classify the nodes into one of the five categories, student, project, course, staff, and faculty.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • name (string) – The name of the dataset ("Cornell", "Texas", "Wisconsin").

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class WikiCS(root, transform=None, pre_transform=None)[source]

The semi-supervised Wikipedia-based dataset from the “Wiki-CS: A Wikipedia-Based Benchmark for Graph Neural Networks” paper, containing 11,701 nodes, 216,123 edges, 10 classes and 20 different training splits.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class WikipediaNetwork(root: str, name: str, geom_gcn_preprocess: bool = True, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The Wikipedia networks introduced in the “Multi-scale Attributed Node Embedding” paper. Nodes represent web pages and edges represent hyperlinks between them. Node features represent several informative nouns in the Wikipedia pages. The task is to predict the average daily traffic of the web page.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • name (string) – The name of the dataset ("chameleon", "crocodile", "squirrel").

  • geom_gcn_preprocess (bool) – If set to True, will apply the pre-processing technique introduced in the “Geom-GCN: Geometric Graph Convolutional Networks” <https://arxiv.org/abs/2002.05287>_, in which the average monthly traffic of the web page is converted into five categories to predict.

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class WordNet18(root, transform=None, pre_transform=None)[source]

The WordNet18 dataset from the “Translating Embeddings for Modeling Multi-Relational Data” paper, containing 40,943 entities, 18 relations and 151,442 fact triplets, e.g., furniture includes bed.

Note

The original WordNet18 dataset suffers from test leakage, i.e. more than 80% of test triplets can be found in the training set with another relation type. Therefore, it should not be used for research evaluation anymore. We recommend to use its cleaned version WordNet18RR instead.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class WordNet18RR(root, transform=None, pre_transform=None)[source]

The WordNet18RR dataset from the “Convolutional 2D Knowledge Graph Embeddings” paper, containing 40,943 entities, 11 relations and 93,003 fact triplets.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class Yelp(root, transform=None, pre_transform=None)[source]

The Yelp dataset from the “GraphSAINT: Graph Sampling Based Inductive Learning Method” paper, containing customer reviewers and their friendship.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class ZINC(root, subset=False, split='train', transform=None, pre_transform=None, pre_filter=None)[source]

The ZINC dataset from the ZINC database and the “Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules” paper, containing about 250,000 molecular graphs with up to 38 heavy atoms. The task is to regress a synthetic computed property dubbed as the constrained solubility.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • subset (boolean, optional) –

    If set to True, will only load a subset of the dataset (12,000 molecular graphs), following the “Benchmarking Graph Neural Networks” paper. (default: False)

  • split (string, optional) – If "train", loads the training dataset. If "val", loads the validation dataset. If "test", loads the test dataset. (default: "train")

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)