torch_geometric.datasets

KarateClub

Zachary's karate club network from the "An Information Flow Model for Conflict and Fission in Small Groups" paper, containing 34 nodes, connected by 156 (undirected and unweighted) edges.

TUDataset

A variety of graph kernel benchmark datasets, e.g., "IMDB-BINARY", "REDDIT-BINARY" or "PROTEINS", collected from the TU Dortmund University.

GNNBenchmarkDataset

A variety of artificially and semi-artificially generated graph datasets from the "Benchmarking Graph Neural Networks" paper.

Planetoid

The citation network datasets "Cora", "CiteSeer" and "PubMed" from the "Revisiting Semi-Supervised Learning with Graph Embeddings" paper.

FakeDataset

A fake dataset that returns randomly generated Data objects.

FakeHeteroDataset

A fake dataset that returns randomly generated HeteroData objects.

NELL

The NELL dataset, a knowledge graph from the "Toward an Architecture for Never-Ending Language Learning" paper.

CitationFull

The full citation network datasets from the "Deep Gaussian Embedding of Graphs: Unsupervised Inductive Learning via Ranking" paper.

CoraFull

Alias for torch_geometric.datasets.CitationFull with name="cora".

Coauthor

The Coauthor CS and Coauthor Physics networks from the "Pitfalls of Graph Neural Network Evaluation" paper.

Amazon

The Amazon Computers and Amazon Photo networks from the "Pitfalls of Graph Neural Network Evaluation" paper.

PPI

The protein-protein interaction networks from the "Predicting Multicellular Function through Multi-layer Tissue Networks" paper, containing positional gene sets, motif gene sets and immunological signatures as features (50 in total) and gene ontology sets as labels (121 in total).

Reddit

The Reddit dataset from the "Inductive Representation Learning on Large Graphs" paper, containing Reddit posts belonging to different communities.

Reddit2

The Reddit dataset from the "GraphSAINT: Graph Sampling Based Inductive Learning Method" paper, containing Reddit posts belonging to different communities.

Flickr

The Flickr dataset from the "GraphSAINT: Graph Sampling Based Inductive Learning Method" paper, containing descriptions and common properties of images.

Yelp

The Yelp dataset from the "GraphSAINT: Graph Sampling Based Inductive Learning Method" paper, containing customer reviewers and their friendship.

AmazonProducts

The Amazon dataset from the "GraphSAINT: Graph Sampling Based Inductive Learning Method" paper, containing products and their categories.

QM7b

The QM7b dataset from the "MoleculeNet: A Benchmark for Molecular Machine Learning" paper, consisting of 7,211 molecules with 14 regression targets.

QM9

The QM9 dataset from the "MoleculeNet: A Benchmark for Molecular Machine Learning" paper, consisting of about 130,000 molecules with 19 regression targets.

MD17

A variety of ab-initio molecular dynamics trajectories from the authors of sGDML.

ZINC

The ZINC dataset from the ZINC database and the "Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules" paper, containing about 250,000 molecular graphs with up to 38 heavy atoms.

AQSOL

The AQSOL dataset from the "Benchmarking Graph Neural Networks" paper based on AqSolDB, a standardized database of 9,982 molecular graphs with their aqueous solubility values, collected from 9 different data sources.

MoleculeNet

The MoleculeNet benchmark collection from the "MoleculeNet: A Benchmark for Molecular Machine Learning" paper, containing datasets from physical chemistry, biophysics and physiology.

Entities

The relational entities networks "AIFB", "MUTAG", "BGS" and "AM" from the "Modeling Relational Data with Graph Convolutional Networks" paper.

RelLinkPredDataset

The relational link prediction datasets from the "Modeling Relational Data with Graph Convolutional Networks" paper.

GEDDataset

The GED datasets from the "Graph Edit Distance Computation via Graph Neural Networks" paper.

AttributedGraphDataset

A variety of attributed graph datasets from the "Scaling Attributed Network Embedding to Massive Graphs" paper.

MNISTSuperpixels

MNIST superpixels dataset from the "Geometric Deep Learning on Graphs and Manifolds Using Mixture Model CNNs" paper, containing 70,000 graphs with 75 nodes each.

FAUST

The FAUST humans dataset from the "FAUST: Dataset and Evaluation for 3D Mesh Registration" paper, containing 100 watertight meshes representing 10 different poses for 10 different subjects.

DynamicFAUST

The dynamic FAUST humans dataset from the "Dynamic FAUST: Registering Human Bodies in Motion" paper.

ShapeNet

The ShapeNet part level segmentation dataset from the "A Scalable Active Framework for Region Annotation in 3D Shape Collections" paper, containing about 17,000 3D shape point clouds from 16 shape categories.

ModelNet

The ModelNet10/40 datasets from the "3D ShapeNets: A Deep Representation for Volumetric Shapes" paper, containing CAD models of 10 and 40 categories, respectively.

CoMA

The CoMA 3D faces dataset from the "Generating 3D faces using Convolutional Mesh Autoencoders" paper, containing 20,466 meshes of extreme expressions captured over 12 different subjects.

SHREC2016

The SHREC 2016 partial matching dataset from the "SHREC'16: Partial Matching of Deformable Shapes" paper.

TOSCA

The TOSCA dataset from the "Numerical Geometry of Non-Rigid Shapes" book, containing 80 meshes.

PCPNetDataset

The PCPNet dataset from the "PCPNet: Learning Local Shape Properties from Raw Point Clouds" paper, consisting of 30 shapes, each given as a point cloud, densely sampled with 100k points.

S3DIS

The (pre-processed) Stanford Large-Scale 3D Indoor Spaces dataset from the "3D Semantic Parsing of Large-Scale Indoor Spaces" paper, containing point clouds of six large-scale indoor parts in three buildings with 12 semantic elements (and one clutter class).

GeometricShapes

Synthetic dataset of various geometric shapes like cubes, spheres or pyramids.

BitcoinOTC

The Bitcoin-OTC dataset from the "EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs" paper, consisting of 138 who-trusts-whom networks of sequential time steps.

ICEWS18

The Integrated Crisis Early Warning System (ICEWS) dataset used in, e.g., the "Recurrent Event Network for Reasoning over Temporal Knowledge Graphs" paper, consisting of events collected from 1/1/2018 to 10/31/2018 (24-hour time granularity).

GDELT

The Global Database of Events, Language, and Tone (GDELT) dataset used in, e.g., the "Recurrent Event Network for Reasoning over Temporal Knowledge Graphs" paper, consisting of events collected from 1/1/2018 to 1/31/2018 (15-minute time granularity).

DBP15K

The DBP15K dataset from the "Cross-lingual Entity Alignment via Joint Attribute-Preserving Embedding" paper, where Chinese, Japanese and French versions of DBpedia were linked to its English version.

WILLOWObjectClass

The WILLOW-ObjectClass dataset from the "Learning Graphs to Match" paper, containing 10 equal keypoints of at least 40 images in each category.

PascalVOCKeypoints

The Pascal VOC 2011 dataset with Berkeley annotations of keypoints from the "Poselets: Body Part Detectors Trained Using 3D Human Pose Annotations" paper, containing 0 to 23 keypoints per example over 20 categories.

PascalPF

The Pascal-PF dataset from the "Proposal Flow" paper, containing 4 to 16 keypoints per example over 20 categories.

SNAPDataset

A variety of graph datasets collected from SNAP at Stanford University.

SuiteSparseMatrixCollection

A suite of sparse matrix benchmarks known as the Suite Sparse Matrix Collection collected from a wide range of applications.

AMiner

The heterogeneous AMiner dataset from the "metapath2vec: Scalable Representation Learning for Heterogeneous Networks" paper, consisting of nodes of type "paper", "author" and "venue".

WordNet18

The WordNet18 dataset from the "Translating Embeddings for Modeling Multi-Relational Data" paper, containing 40,943 entities, 18 relations and 151,442 fact triplets, e.g., furniture includes bed.

WordNet18RR

The WordNet18RR dataset from the "Convolutional 2D Knowledge Graph Embeddings" paper, containing 40,943 entities, 11 relations and 93,003 fact triplets.

WikiCS

The semi-supervised Wikipedia-based dataset from the "Wiki-CS: A Wikipedia-Based Benchmark for Graph Neural Networks" paper, containing 11,701 nodes, 216,123 edges, 10 classes and 20 different training splits.

WebKB

The WebKB datasets used in the "Geom-GCN: Geometric Graph Convolutional Networks" paper.

WikipediaNetwork

The Wikipedia networks introduced in the "Multi-scale Attributed Node Embedding" paper.

Actor

The actor-only induced subgraph of the film-director-actor-writer network used in the "Geom-GCN: Geometric Graph Convolutional Networks" paper.

OGB_MAG

The ogbn-mag dataset from the "Open Graph Benchmark: Datasets for Machine Learning on Graphs" paper.

DBLP

A subset of the DBLP computer science bibliography website, as collected in the "MAGNN: Metapath Aggregated Graph Neural Network for Heterogeneous Graph Embedding" paper.

MovieLens

A heterogeneous rating dataset, assembled by GroupLens Research from the MovieLens web site, consisting of nodes of type "movie" and "user".

IMDB

A subset of the Internet Movie Database (IMDB), as collected in the "MAGNN: Metapath Aggregated Graph Neural Network for Heterogeneous Graph Embedding" paper.

LastFM

A subset of the last.fm music website keeping track of users' listening information from various sources, as collected in the "MAGNN: Metapath Aggregated Graph Neural Network for Heterogeneous Graph Embedding" paper.

HGBDataset

A variety of heterogeneous graph benchmark datasets from the "Are We Really Making Much Progress? Revisiting, Benchmarking, and Refining Heterogeneous Graph Neural Networks" paper.

JODIEDataset

MixHopSyntheticDataset

The MixHop synthetic dataset from the "MixHop: Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing" paper, containing 10 graphs, each with varying degree of homophily (ranging from 0.0 to 0.9).

UPFD

The tree-structured fake news propagation graph classification dataset from the "User Preference-aware Fake News Detection" paper.

GitHub

The GitHub Web and ML Developers dataset introduced in the "Multi-scale Attributed Node Embedding" paper.

FacebookPagePage

The Facebook Page-Page network dataset introduced in the "Multi-scale Attributed Node Embedding" paper.

LastFMAsia

The LastFM Asia Network dataset introduced in the "Characteristic Functions on Graphs: Birds of a Feather, from Statistical Descriptors to Parametric Models" paper.

DeezerEurope

The Deezer Europe dataset introduced in the "Characteristic Functions on Graphs: Birds of a Feather, from Statistical Descriptors to Parametric Models" paper.

GemsecDeezer

The Deezer User Network datasets introduced in the "GEMSEC: Graph Embedding with Self Clustering" paper.

Twitch

The Twitch Gamer networks introduced in the "Multi-scale Attributed Node Embedding" paper.

Airports

The Airports dataset from the "struc2vec: Learning Node Representations from Structural Identity" paper, where nodes denote airports and labels correspond to activity levels.

BAShapes

The BA-Shapes dataset from the "GNNExplainer: Generating Explanations for Graph Neural Networks" paper, containing a Barabasi-Albert (BA) graph with 300 nodes and a set of 80 "house"-structured graphs connected to it.

MalNetTiny

The MalNet Tiny dataset from the "A Large-Scale Database for Graph Representation Learning" paper.

OMDB

The Organic Materials Database (OMDB) of bulk organic crystals.

PolBlogs

The Political Blogs dataset from the "The Political Blogosphere and the 2004 US Election: Divided they Blog" paper.

EmailEUCore

An e-mail communication network of a large European research institution, taken from the "Local Higher-order Graph Clustering" paper.

StochasticBlockModelDataset

A synthetic graph dataset generated by the stochastic block model.

RandomPartitionGraphDataset

The random partition graph dataset from the "How to Find Your Friendly Neighborhood: Graph Attention Design with Self-Supervision" paper.

LINKXDataset

A variety of non-homophilous graph datasets from the "Large Scale Learning on Non-Homophilous Graphs: New Benchmarks and Strong Simple Methods" paper.

EllipticBitcoinDataset

The Elliptic Bitcoin dataset of Bitcoin transactions from the "Anti-Money Laundering in Bitcoin: Experimenting with Graph Convolutional Networks for Financial Forensics" paper.

class KarateClub(transform: Optional[Callable] = None)

Zachary’s karate club network from the “An Information Flow Model for Conflict and Fission in Small Groups” paper, containing 34 nodes, connected by 156 (undirected and unweighted) edges. Every node is labeled by one of four classes obtained via modularity-based clustering, following the “Semi-supervised Classification with Graph Convolutional Networks” paper. Training is based on a single labeled example per class, i.e., a total of 4 labeled nodes.

Parameters

transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

Stats:

#nodes  #edges  #features  #classes
34      156     34         4
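
The edge count above can be read as directed entries. Assuming each undirected connection is stored once per direction (an assumption about the storage layout, not something stated in the stats), the figures translate into unique connections and an average degree as in the following plain-Python sketch. All variable names are illustrative.

```python
# Figures taken from the stats above.
num_nodes = 34
num_edge_entries = 156  # assumption: each undirected edge is stored in both directions
num_classes = 4

# Each undirected connection contributes two directed entries.
num_undirected_edges = num_edge_entries // 2

# Average number of neighbors per node.
avg_degree = num_edge_entries / num_nodes

# One labeled training node per class.
num_train_nodes = num_classes * 1
train_fraction = num_train_nodes / num_nodes

print(num_undirected_edges)
print(round(avg_degree, 2))
print(round(train_fraction, 3))
```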

class TUDataset(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None, use_node_attr: bool = False, use_edge_attr: bool = False, cleaned: bool = False)

A variety of graph kernel benchmark datasets, e.g., “IMDB-BINARY”, “REDDIT-BINARY” or “PROTEINS”, collected from the TU Dortmund University. In addition, this dataset wrapper provides cleaned dataset versions as motivated by the “Understanding Isomorphism Bias in Graph Data Sets” paper, containing only non-isomorphic graphs.

Note

Some datasets may not come with any node labels. You can then either make use of the argument use_node_attr to load additional continuous node attributes (if present) or provide synthetic node features using transforms such as torch_geometric.transforms.Constant or torch_geometric.transforms.OneHotDegree.
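
To make the idea of synthetic node features concrete, here is a minimal plain-Python sketch of one-hot degree encoding. It does not call torch_geometric; the function name `one_hot_degree` and the toy edge list are made up for illustration, and the real transform may differ in details.

```python
from typing import List, Tuple


def one_hot_degree(num_nodes: int, edges: List[Tuple[int, int]],
                   max_degree: int) -> List[List[int]]:
    """Return one one-hot vector of length max_degree + 1 per node.

    `edges` holds directed (source, target) pairs; the degree of a node is
    counted as the number of pairs in which it appears as the source.
    """
    degree = [0] * num_nodes
    for source, _ in edges:
        degree[source] += 1

    features = []
    for d in degree:
        if d > max_degree:
            raise ValueError(f"degree {d} exceeds max_degree={max_degree}")
        row = [0] * (max_degree + 1)
        row[d] = 1  # mark the position that corresponds to this node's degree
        features.append(row)
    return features


# Toy undirected path graph 0 - 1 - 2, stored with both edge directions.
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
features = one_hot_degree(num_nodes=3, edges=edges, max_degree=2)
print(features)  # [[0, 1, 0], [0, 0, 1], [0, 1, 0]]
```

A constant feature (the same value for every node) is the simpler alternative when degree information is not wanted.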

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • name (string) – The name of the dataset.

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

  • use_node_attr (bool, optional) – If True, the dataset will contain additional continuous node attributes (if present). (default: False)

  • use_edge_attr (bool, optional) – If True, the dataset will contain additional continuous edge attributes (if present). (default: False)

  • cleaned (bool, optional) – If True, the dataset will contain only non-isomorphic graphs. (default: False)

Stats:

Name           #graphs  #nodes  #edges    #features  #classes
MUTAG          188      ~17.9   ~39.6     7          2
ENZYMES        600      ~32.6   ~124.3    3          6
PROTEINS       1,113    ~39.1   ~145.6    3          2
COLLAB         5,000    ~74.5   ~4,914.4  0          3
IMDB-BINARY    1,000    ~19.8   ~193.1    0          2
REDDIT-BINARY  2,000    ~429.6  ~995.5    0          2
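
The three hooks in the parameter list (pre_filter, pre_transform, transform) differ in when they run. The sketch below uses a hypothetical `ToyDataset` class, not part of torch_geometric, to illustrate the distinction: filtering and pre-transforming happen once during processing, while transform runs on every access. Applying pre_filter before pre_transform is an assumption of this sketch.

```python
from typing import Callable, List, Optional


class ToyDataset:
    """Hypothetical stand-in that mimics the documented hook timing."""

    def __init__(self, raw_items: List[dict],
                 transform: Optional[Callable] = None,
                 pre_transform: Optional[Callable] = None,
                 pre_filter: Optional[Callable] = None):
        self.transform = transform
        # "Processing" happens once, up front: filter first, then pre-transform.
        items = raw_items
        if pre_filter is not None:
            items = [item for item in items if pre_filter(item)]
        if pre_transform is not None:
            items = [pre_transform(item) for item in items]
        self._stored = items  # stands in for what would be saved to disk

    def __len__(self) -> int:
        return len(self._stored)

    def __getitem__(self, idx: int) -> dict:
        item = dict(self._stored[idx])  # copy so the stored item stays untouched
        # `transform` runs on every access.
        return self.transform(item) if self.transform is not None else item


raw = [{"num_nodes": n} for n in (3, 12, 40)]
calls = {"transform": 0}


def count_access(item: dict) -> dict:
    calls["transform"] += 1
    return item


dataset = ToyDataset(
    raw,
    pre_filter=lambda item: item["num_nodes"] >= 10,  # drop tiny graphs once
    pre_transform=lambda item: {**item, "is_large": item["num_nodes"] > 20},
    transform=count_access,  # invoked each time an item is read
)
first = dataset[0]
first_again = dataset[0]
print(len(dataset), first, calls["transform"])
```

Expensive, deterministic preprocessing is a natural fit for pre_transform, while random augmentation belongs in transform so that it is re-sampled on each access.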

class GNNBenchmarkDataset(root: str, name: str, split: str = 'train', transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)

A variety of artificially and semi-artificially generated graph datasets from the “Benchmarking Graph Neural Networks” paper.

Note

The ZINC dataset is provided via torch_geometric.datasets.ZINC.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • name (string) – The name of the dataset (one of "PATTERN", "CLUSTER", "MNIST", "CIFAR10", "TSP", "CSL").

  • split (string, optional) – If "train", loads the training dataset. If "val", loads the validation dataset. If "test", loads the test dataset. (default: "train")

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

Stats:

Name     #graphs  #nodes  #edges    #features  #classes
PATTERN  10,000   ~118.9  ~6,098.9  3          2
CLUSTER  10,000   ~117.2  ~4,303.9  7          6
MNIST    55,000   ~70.6   ~564.5    3          10
CIFAR10  45,000   ~117.6  ~941.2    5          10
TSP      10,000   ~275.4  ~6,885.0  2          2
CSL      150      ~41.0   ~164.0    0          10
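
Because both name and split accept only a fixed set of strings, it can help to validate them before constructing the dataset. The helper below is a hypothetical sketch (`check_benchmark_args` is not a library function); the allowed values are copied from the parameter descriptions above.

```python
ALLOWED_NAMES = ("PATTERN", "CLUSTER", "MNIST", "CIFAR10", "TSP", "CSL")
ALLOWED_SPLITS = ("train", "val", "test")


def check_benchmark_args(name: str, split: str = "train") -> None:
    """Raise ValueError if name or split is not one of the documented values."""
    if name not in ALLOWED_NAMES:
        raise ValueError(
            f"unknown dataset name {name!r}; expected one of {ALLOWED_NAMES}")
    if split not in ALLOWED_SPLITS:
        raise ValueError(
            f"unknown split {split!r}; expected one of {ALLOWED_SPLITS}")


check_benchmark_args("MNIST", split="val")  # passes silently

try:
    # ZINC is not served by this class; it is provided via a separate ZINC class.
    check_benchmark_args("ZINC")
except ValueError as err:
    print(err)
```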

class Planetoid(root: str, name: str, split: str = 'public', num_train_per_class: int = 20, num_val: int = 500, num_test: int = 1000, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)

The citation network datasets “Cora”, “CiteSeer” and “PubMed” from the “Revisiting Semi-Supervised Learning with Graph Embeddings” paper. Nodes represent documents and edges represent citation links. Training, validation and test splits are given by binary masks.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • name (string) – The name of the dataset ("Cora", "CiteSeer", "PubMed").

  • split (string) – The type of dataset split ("public", "full", "geom-gcn", "random"). If set to "public", the split will be the public fixed split from the “Revisiting Semi-Supervised Learning with Graph Embeddings” paper. If set to "full", all nodes except those in the validation and test sets will be used for training (as in the “FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling” paper). If set to "geom-gcn", the 10 public fixed splits from the “Geom-GCN: Geometric Graph Convolutional Networks” paper are given. If set to "random", train, validation, and test sets will be randomly generated, according to num_train_per_class, num_val and num_test. (default: "public")

  • num_train_per_class (int, optional) – The number of training samples per class in case of "random" split. (default: 20)

  • num_val (int, optional) – The number of validation samples in case of "random" split. (default: 500)

  • num_test (int, optional) – The number of test samples in case of "random" split. (default: 1000)

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

Stats:

Name      #nodes  #edges  #features  #classes
Cora      2,708   10,556  1,433      7
CiteSeer  3,327   9,104   3,703      6
PubMed    19,717  88,648  500        3
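
The "random" split described in the parameters can be pictured as follows. This is a plain-Python sketch under stated assumptions, not the library's implementation: the function name `random_split`, the `seed` argument and the toy label list are all hypothetical, and masks are plain lists of booleans rather than tensors.

```python
import random
from typing import Dict, List


def random_split(labels: List[int], num_train_per_class: int = 20,
                 num_val: int = 500, num_test: int = 1000,
                 seed: int = 0) -> Dict[str, List[bool]]:
    """Build disjoint boolean train/val/test masks over the nodes."""
    rng = random.Random(seed)
    num_nodes = len(labels)
    train_mask = [False] * num_nodes

    # Pick num_train_per_class random nodes from every class.
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for i in members[:num_train_per_class]:
            train_mask[i] = True

    # Shuffle the remaining nodes and carve out validation and test sets.
    remaining = [i for i in range(num_nodes) if not train_mask[i]]
    rng.shuffle(remaining)
    val_mask = [False] * num_nodes
    test_mask = [False] * num_nodes
    for i in remaining[:num_val]:
        val_mask[i] = True
    for i in remaining[num_val:num_val + num_test]:
        test_mask[i] = True
    return {"train": train_mask, "val": val_mask, "test": test_mask}


# Toy example: 7 classes with 300 nodes each (2,100 nodes in total).
labels = [c for c in range(7) for _ in range(300)]
masks = random_split(labels)
print(sum(masks["train"]), sum(masks["val"]), sum(masks["test"]))  # 140 500 1000
```

With the default arguments, a 7-class dataset gets 7 × 20 = 140 training nodes, which matches the small-label regime these benchmarks are typically used for.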

class FakeDataset(num_graphs: int = 1, avg_num_nodes: int = 1000, avg_degree: int = 10, num_channels: int = 64, edge_dim: int = 0, num_classes: int = 10, task: str = 'auto', is_undirected: bool = True, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, **kwargs)

A fake dataset that returns randomly generated Data objects.

Parameters
  • num_graphs (int, optional) – The number of graphs. (default: 1)

  • avg_num_nodes (int, optional) – The average number of nodes in a graph. (default: 1000)

  • avg_degree (int, optional) – The average degree per node. (default: 10)

  • num_channels (int, optional) – The number of node features. (default: 64)

  • edge_dim (int, optional) – The number of edge features. (default: 0)

  • num_classes (int, optional) – The number of classes in the dataset. (default: 10)

  • task (str, optional) – Whether to return node-level or graph-level labels ("node", "graph", "auto"). If set to "auto", will return graph-level labels if num_graphs > 1, and node-level labels otherwise. (default: "auto")

  • is_undirected (bool, optional) – Whether the graphs to generate are undirected. (default: True)

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • **kwargs (optional) – Additional attributes and their shapes e.g. global_features=5.
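
The "auto" rule for the task parameter is small enough to state as code. The helper below is a hypothetical illustration (`resolve_task` is not a library function) of the rule as described above.

```python
def resolve_task(task: str, num_graphs: int) -> str:
    """Resolve "auto" to "graph" or "node" following the documented rule."""
    if task not in ("node", "graph", "auto"):
        raise ValueError(f"task must be 'node', 'graph' or 'auto', got {task!r}")
    if task == "auto":
        # Several graphs suggest graph-level labels; a single graph, node-level labels.
        return "graph" if num_graphs > 1 else "node"
    return task


print(resolve_task("auto", num_graphs=1))   # node
print(resolve_task("auto", num_graphs=32))  # graph
print(resolve_task("node", num_graphs=32))  # node
```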

class FakeHeteroDataset(num_graphs: int = 1, num_node_types: int = 3, num_edge_types: int = 6, avg_num_nodes: int = 1000, avg_degree: int = 10, avg_num_channels: int = 64, edge_dim: int = 0, num_classes: int = 10, task: str = 'auto', transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, **kwargs)

A fake dataset that returns randomly generated HeteroData objects.

Parameters
  • num_graphs (int, optional) – The number of graphs. (default: 1)

  • num_node_types (int, optional) – The number of node types. (default: 3)

  • num_edge_types (int, optional) – The number of edge types. (default: 6)

  • avg_num_nodes (int, optional) – The average number of nodes in a graph. (default: 1000)

  • avg_degree (int, optional) – The average degree per node. (default: 10)

  • avg_num_channels (int, optional) – The average number of node features. (default: 64)

  • edge_dim (int, optional) – The number of edge features. (default: 0)

  • num_classes (int, optional) – The number of classes in the dataset. (default: 10)

  • task (str, optional) – Whether to return node-level or graph-level labels ("node", "graph", "auto"). If set to "auto", will return graph-level labels if num_graphs > 1, and node-level labels otherwise. (default: "auto")

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.HeteroData object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • **kwargs (optional) – Additional attributes and their shapes e.g. global_features=5.

class NELL(root, transform=None, pre_transform=None)

The NELL dataset, a knowledge graph from the “Toward an Architecture for Never-Ending Language Learning” paper. The dataset is processed as in the “Revisiting Semi-Supervised Learning with Graph Embeddings” paper.

Note

Entity nodes are described by sparse feature vectors of type torch_sparse.SparseTensor, which can be either used directly, or can be converted via data.x.to_dense(), data.x.to_scipy() or data.x.to_torch_sparse_coo_tensor().

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

Stats:

#nodes  #edges   #features  #classes
65,755  251,550  61,278     186
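
The node and feature counts above suggest why the features are kept sparse. The following back-of-the-envelope sketch estimates the memory a dense feature matrix would need; the 4 bytes per value is an assumption (32-bit floats), and the variable names are illustrative.

```python
# Figures taken from the stats above.
num_nodes = 65_755
num_features = 61_278
bytes_per_value = 4  # assumption: 32-bit floating point values

# A dense matrix stores one value for every (node, feature) pair.
dense_bytes = num_nodes * num_features * bytes_per_value
dense_gib = dense_bytes / 2**30

print(f"{dense_gib:.1f} GiB")
```

Under these assumptions the dense matrix lands on the order of 15 GiB, so converting with to_dense() is worth doing only when that much memory is actually available.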

class CitationFull(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)

The full citation network datasets from the “Deep Gaussian Embedding of Graphs: Unsupervised Inductive Learning via Ranking” paper. Nodes represent documents and edges represent citation links. Datasets include citeseer, cora, cora_ml, dblp, pubmed.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • name (string) – The name of the dataset ("Cora", "Cora_ML", "CiteSeer", "DBLP", "PubMed").

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

Stats:

Name      #nodes  #edges   #features  #classes
Cora      19,793  126,842  8,710      70
Cora_ML   2,995   16,316   2,879      7
CiteSeer  4,230   10,674   602        6
DBLP      17,716  105,734  1,639      4
PubMed    19,717  88,648   500        3

class CoraFull(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)

Alias for torch_geometric.datasets.CitationFull with name="cora".

Stats:

#nodes  #edges   #features  #classes
19,793  126,842  8,710      70

class Coauthor(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)

The Coauthor CS and Coauthor Physics networks from the “Pitfalls of Graph Neural Network Evaluation” paper. Nodes represent authors that are connected by an edge if they co-authored a paper. Given paper keywords for each author’s papers, the task is to map authors to their respective field of study.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • name (string) – The name of the dataset ("CS", "Physics").

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

Stats:

Name     #nodes  #edges   #features  #classes
CS       18,333  163,788  6,805      15
Physics  34,493  495,924  8,415      5

class Amazon(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)

The Amazon Computers and Amazon Photo networks from the “Pitfalls of Graph Neural Network Evaluation” paper. Nodes represent goods and edges represent that two goods are frequently bought together. Given product reviews as bag-of-words node features, the task is to map goods to their respective product category.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • name (string) – The name of the dataset ("Computers", "Photo").

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

Stats:

Name       #nodes  #edges   #features  #classes
Computers  13,752  491,722  767        10
Photo      7,650   238,162  745        8

class PPI(root, split='train', transform=None, pre_transform=None, pre_filter=None)

The protein-protein interaction networks from the “Predicting Multicellular Function through Multi-layer Tissue Networks” paper, containing positional gene sets, motif gene sets and immunological signatures as features (50 in total) and gene ontology sets as labels (121 in total).

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • split (string) – If "train", loads the training dataset. If "val", loads the validation dataset. If "test", loads the test dataset. (default: "train")

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

Stats:

#graphs  #nodes    #edges     #features  #tasks
20       ~2,245.3  ~61,318.4  50         121
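
The stats list 121 tasks rather than classes because a node can carry several gene ontology labels at once. One common way to represent such targets is a multi-hot vector with one slot per label. The sketch below illustrates that encoding in plain Python; the helper name `multi_hot` and the example label indices are hypothetical, and the actual on-disk format may differ.

```python
from typing import Iterable, List

NUM_LABELS = 121  # number of gene ontology label sets, from the stats above


def multi_hot(active_labels: Iterable[int],
              num_labels: int = NUM_LABELS) -> List[float]:
    """Encode a set of active label indices as a 0/1 vector."""
    target = [0.0] * num_labels
    for label in active_labels:
        if not 0 <= label < num_labels:
            raise IndexError(f"label index {label} out of range")
        target[label] = 1.0  # several positions may be set for one node
    return target


# A node annotated with three labels at once (indices chosen arbitrarily).
target = multi_hot([0, 5, 120])
print(len(target), sum(target))  # 121 3.0
```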

class Reddit(root, transform=None, pre_transform=None)

The Reddit dataset from the “Inductive Representation Learning on Large Graphs” paper, containing Reddit posts belonging to different communities.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

Stats:

#nodes   #edges       #features  #classes
232,965  114,615,892  602        41

class Reddit2(root, transform=None, pre_transform=None)

The Reddit dataset from the “GraphSAINT: Graph Sampling Based Inductive Learning Method” paper, containing Reddit posts belonging to different communities.

Note

This is a sparser version of the original Reddit dataset (~23M edges instead of ~114M edges), and is used in papers such as SGC and GraphSAINT.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

Stats:

#nodes   #edges      #features  #classes
232,965  23,213,838  602        41
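
To put "sparser" into numbers, the sketch below compares the average number of edge entries per node in the two Reddit variants, using the edge counts from the respective stats. It is plain arithmetic with illustrative variable names, and it treats the reported edge counts as directed entries, which is an assumption.

```python
# Node count shared by both variants; edge counts from the two stats tables.
num_nodes = 232_965
edges_reddit = 114_615_892
edges_reddit2 = 23_213_838

# Average number of edge entries per node in each variant.
avg_degree_reddit = edges_reddit / num_nodes
avg_degree_reddit2 = edges_reddit2 / num_nodes

# How many times more edges the original dataset has.
ratio = edges_reddit / edges_reddit2

print(round(avg_degree_reddit), round(avg_degree_reddit2), round(ratio, 1))
```

The original graph has roughly five times as many edges, which matters for neighbor sampling cost and memory.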

class Flickr(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)

The Flickr dataset from the “GraphSAINT: Graph Sampling Based Inductive Learning Method” paper, containing descriptions and common properties of images.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

Stats:

#nodes  #edges   #features  #classes
89,250  899,756  500        7

class Yelp(root, transform=None, pre_transform=None)

The Yelp dataset from the “GraphSAINT: Graph Sampling Based Inductive Learning Method” paper, containing customer reviewers and their friendship.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

Stats:

  #nodes    #edges      #features  #tasks
  716,847   13,954,819  300        100

class AmazonProducts(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The Amazon dataset from the “GraphSAINT: Graph Sampling Based Inductive Learning Method” paper, containing products and their categories.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

Stats:

  #nodes     #edges       #features  #classes
  1,569,960  264,339,468  200        107

class QM7b(root, transform=None, pre_transform=None, pre_filter=None)[source]

The QM7b dataset from the “MoleculeNet: A Benchmark for Molecular Machine Learning” paper, consisting of 7,211 molecules with 14 regression targets.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
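As a rough illustration of how the three callables above interact, the following sketch uses a hypothetical TinyDataset class (not part of torch_geometric) with plain dictionaries standing in for data objects. It assumes that the filter runs before pre_transform during processing, that both run exactly once, and that transform runs on every access.

```python
from typing import Callable, Dict, List, Optional

Item = Dict[str, float]


class TinyDataset:
    """Hypothetical stand-in for a dataset class; NOT torch_geometric code."""

    def __init__(
        self,
        raw: List[Item],
        transform: Optional[Callable[[Item], Item]] = None,
        pre_transform: Optional[Callable[[Item], Item]] = None,
        pre_filter: Optional[Callable[[Item], bool]] = None,
    ) -> None:
        self.transform = transform
        items = list(raw)
        # Assumed order: filter first, then pre-transform, each exactly once.
        if pre_filter is not None:
            items = [item for item in items if pre_filter(item)]
        if pre_transform is not None:
            items = [pre_transform(item) for item in items]
        # A real dataset would save this processed list to disk.
        self._processed = items

    def __len__(self) -> int:
        return len(self._processed)

    def __getitem__(self, idx: int) -> Item:
        item = dict(self._processed[idx])  # copy; the stored item stays unchanged
        if self.transform is not None:
            item = self.transform(item)  # applied on every access
        return item


calls = {"pre_transform": 0, "transform": 0}


def keep_small(item: Item) -> bool:
    # pre_filter: keep only items with at most 20 nodes.
    return item["num_nodes"] <= 20


def mark_cached(item: Item) -> Item:
    # pre_transform: runs once per kept item.
    calls["pre_transform"] += 1
    return {**item, "cached": 1.0}


def double_target(item: Item) -> Item:
    # transform: runs each time an item is read.
    calls["transform"] += 1
    return {**item, "y": item["y"] * 2}


# Made-up items for illustration only.
raw = [
    {"num_nodes": 10, "y": 1.0},
    {"num_nodes": 30, "y": 2.0},
    {"num_nodes": 5, "y": 3.0},
]
dataset = TinyDataset(raw, transform=double_target,
                      pre_transform=mark_cached, pre_filter=keep_small)
first = dataset[0]
again = dataset[0]
```

In this sketch the item with 30 nodes is dropped before pre_transform ever sees it, and reading the same index twice calls transform twice while the stored item is left as it was.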

Stats:

  #graphs  #nodes  #edges  #features  #tasks
  7,211    ~15.4   ~245.0  0          14

class QM9(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]

The QM9 dataset from the “MoleculeNet: A Benchmark for Molecular Machine Learning” paper, consisting of about 130,000 molecules with 19 regression targets. Each molecule includes complete spatial information for the single low energy conformation of the atoms in the molecule. In addition, we provide the atom features from the “Neural Message Passing for Quantum Chemistry” paper.

Target  Property                      Description                                 Unit
0       \(\mu\)                       Dipole moment                               \(\textrm{D}\)
1       \(\alpha\)                    Isotropic polarizability                    \({a_0}^3\)
2       \(\epsilon_{\textrm{HOMO}}\)  Highest occupied molecular orbital energy   \(\textrm{eV}\)
3       \(\epsilon_{\textrm{LUMO}}\)  Lowest unoccupied molecular orbital energy  \(\textrm{eV}\)
4       \(\Delta \epsilon\)           Gap between \(\epsilon_{\textrm{HOMO}}\) and \(\epsilon_{\textrm{LUMO}}\)  \(\textrm{eV}\)
5       \(\langle R^2 \rangle\)       Electronic spatial extent                   \({a_0}^2\)
6       \(\textrm{ZPVE}\)             Zero point vibrational energy               \(\textrm{eV}\)
7       \(U_0\)                       Internal energy at 0K                       \(\textrm{eV}\)
8       \(U\)                         Internal energy at 298.15K                  \(\textrm{eV}\)
9       \(H\)                         Enthalpy at 298.15K                         \(\textrm{eV}\)
10      \(G\)                         Free energy at 298.15K                      \(\textrm{eV}\)
11      \(c_{\textrm{v}}\)            Heat capacity at 298.15K                    \(\frac{\textrm{cal}}{\textrm{mol K}}\)
12      \(U_0^{\textrm{ATOM}}\)       Atomization energy at 0K                    \(\textrm{eV}\)
13      \(U^{\textrm{ATOM}}\)         Atomization energy at 298.15K               \(\textrm{eV}\)
14      \(H^{\textrm{ATOM}}\)         Atomization enthalpy at 298.15K             \(\textrm{eV}\)
15      \(G^{\textrm{ATOM}}\)         Atomization free energy at 298.15K          \(\textrm{eV}\)
16      \(A\)                         Rotational constant                         \(\textrm{GHz}\)
17      \(B\)                         Rotational constant                         \(\textrm{GHz}\)
18      \(C\)                         Rotational constant                         \(\textrm{GHz}\)
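To make the target indexing concrete, here is a small plain-Python sketch that looks up entries by the indices in the target table. The row of 19 numbers is made up for illustration (it is not taken from QM9), TARGET_INDEX and select_target are hypothetical helper names, and the gap is taken here as LUMO minus HOMO.

```python
from typing import Dict, List

# Indices copied from the target table; only a few are listed here.
TARGET_INDEX: Dict[str, int] = {
    "mu": 0,
    "alpha": 1,
    "homo": 2,
    "lumo": 3,
    "gap": 4,
}


def select_target(row: List[float], name: str) -> float:
    """Return one regression target from a row of 19 target values."""
    return row[TARGET_INDEX[name]]


# Made-up target row with 19 entries (NOT real QM9 values), in eV where relevant.
row = [0.0] * 19
row[TARGET_INDEX["homo"]] = -6.5
row[TARGET_INDEX["lumo"]] = 0.5
# Assumption: the gap is the LUMO energy minus the HOMO energy.
row[TARGET_INDEX["gap"]] = select_target(row, "lumo") - select_target(row, "homo")

gap = select_target(row, "gap")
```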

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

Stats:

  #graphs  #nodes  #edges  #features  #tasks
  130,831  ~18.0   ~37.3   11         19

class MD17(root: str, name: str, train: Optional[bool] = None, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]

A variety of ab-initio molecular dynamics trajectories from the authors of sGDML. This class provides access to the original MD17 datasets as well as all other datasets released by sGDML since then (15 in total).

For every trajectory, the dataset contains the Cartesian positions of atoms (in Angstrom), their atomic numbers, as well as the total energy (in kcal/mol) and forces (kcal/mol/Angstrom) on each atom. The latter two are the regression targets for this collection.

Note

Data objects contain no edge indices as these are most commonly constructed via the torch_geometric.transforms.RadiusGraph transform, with its cut-off being a hyperparameter.
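For intuition, a radius graph connects every pair of atoms whose distance lies within a cut-off. The sketch below is a plain-Python illustration of that idea, not the library's implementation; the helper name radius_graph_sketch, the `<=` comparison, and the toy coordinates are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def radius_graph_sketch(
    pos: Sequence[Sequence[float]], r: float
) -> List[Tuple[int, int]]:
    """Return directed edges (i, j) for all i != j with distance <= r."""
    edges = []
    for i, p in enumerate(pos):
        for j, q in enumerate(pos):
            if i != j and math.dist(p, q) <= r:
                edges.append((i, j))
    return edges


# Toy positions in Angstrom (made up, not an MD17 frame).
pos = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]

tight = radius_graph_sketch(pos, r=1.5)  # only atoms 0 and 1 are connected
loose = radius_graph_sketch(pos, r=2.5)  # atoms 1 and 2 are connected as well
```

A larger cut-off yields a denser graph, which is why the cut-off is treated as a hyperparameter.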

Some of the trajectories were computed at different levels of theory, and for most molecules there exist two versions: a long trajectory at the DFT level of theory and a short trajectory at the coupled cluster level of theory. Check the table below for detailed information on the molecule, level of theory and number of data points contained in each dataset. Which trajectory is loaded is determined by the name argument. For the coupled cluster trajectories, the dataset comes with pre-defined training and testing splits which are loaded separately via the train argument.

When using these datasets, make sure to cite the appropriate publications listed on the sGDML website.

Molecule        Level of Theory  Name                   #Examples
Benzene         DFT              benzene                49,863
Benzene         DFT FHI-aims     benzene FHI-aims       627,983
Benzene         CCSD(T)          benzene CCSD(T)        1,500
Uracil          DFT              uracil                 133,770
Naphthalene     DFT              napthalene             326,250
Aspirin         DFT              aspirin                211,762
Aspirin         CCSD             aspirin CCSD           1,500
Salicylic acid  DFT              salicylic acid         320,231
Malonaldehyde   DFT              malonaldehyde          993,237
Malonaldehyde   CCSD(T)          malonaldehyde CCSD(T)  1,500
Ethanol         DFT              ethanol                555,092
Ethanol         CCSD(T)          ethanol CCSD(T)        2,000
Toluene         DFT              toluene                442,790
Toluene         CCSD(T)          toluene CCSD(T)        1,501
Paracetamol     DFT              paracetamol            106,490
Azobenzene      DFT              azobenzene             99,999
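In the table above, the coupled cluster trajectories are the ones whose name contains "CCSD". A small sketch of how one might tell, from the name alone, whether the train argument is relevant; has_predefined_split is a hypothetical helper and not part of the library, and the names are copied verbatim from the table.

```python
from typing import List

# Names copied verbatim from the table above.
NAMES: List[str] = [
    "benzene", "benzene FHI-aims", "benzene CCSD(T)", "uracil", "napthalene",
    "aspirin", "aspirin CCSD", "salicylic acid", "malonaldehyde",
    "malonaldehyde CCSD(T)", "ethanol", "ethanol CCSD(T)", "toluene",
    "toluene CCSD(T)", "paracetamol", "azobenzene",
]


def has_predefined_split(name: str) -> bool:
    """Hypothetical helper: True for coupled cluster (CCSD / CCSD(T)) names."""
    return "CCSD" in name


ccsd_names = [name for name in NAMES if has_predefined_split(name)]
```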

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • name (string) – Keyword of the trajectory that should be loaded.

  • train (bool, optional) – Determines whether the train or test split gets loaded for the coupled cluster trajectories. (default: None)

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class ZINC(root, subset=False, split='train', transform=None, pre_transform=None, pre_filter=None)[source]

The ZINC dataset from the ZINC database and the “Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules” paper, containing about 250,000 molecular graphs with up to 38 heavy atoms. The task is to regress the penalized logP (also called constrained solubility in some works), given by y = logP - SAS - cycles, where logP is the water-octanol partition coefficient, SAS is the synthetic accessibility score, and cycles denotes the number of cycles with more than six atoms. Penalized logP is a score commonly used for training molecular generation models, see, e.g., the “Junction Tree Variational Autoencoder for Molecular Graph Generation” and “Grammar Variational Autoencoder” papers.
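The target formula can be checked with a short worked example. The numbers below are made up for illustration and do not describe a real molecule; penalized_logp and count_large_cycles are hypothetical helper names.

```python
from typing import Iterable


def count_large_cycles(ring_sizes: Iterable[int]) -> int:
    """Number of cycles with more than six atoms."""
    return sum(1 for size in ring_sizes if size > 6)


def penalized_logp(logp: float, sas: float, cycles: int) -> float:
    """y = logP - SAS - cycles, as stated in the description above."""
    return logp - sas - cycles


# Illustrative inputs: one six-membered and one seven-membered ring.
cycles = count_large_cycles([6, 7])
y = penalized_logp(logp=2.5, sas=3.0, cycles=cycles)
```

Here only the seven-membered ring counts, so y = 2.5 - 3.0 - 1 = -1.5.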

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • subset (boolean, optional) – If set to True, will only load a subset of the dataset (12,000 molecular graphs), following the “Benchmarking Graph Neural Networks” paper. (default: False)

  • split (string, optional) – If "train", loads the training dataset. If "val", loads the validation dataset. If "test", loads the test dataset. (default: "train")

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class AQSOL(root: str, split: str = 'train', transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]

The AQSOL dataset from the “Benchmarking Graph Neural Networks” paper, based on AqSolDB, a standardized database of 9,982 molecular graphs with their aqueous solubility values, collected from 9 different data sources.

The aqueous solubility targets are collected from experimental measurements and standardized to LogS units in AqSolDB. These final values denote the property to regress in the AQSOL dataset. After filtering out a few graphs with no bonds/edges, the total number of molecular graphs is 9,833. For each molecular graph, the node features are the types of heavy atoms and the edge features are the types of bonds between them, similar to the ZINC dataset.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • split (string, optional) – If "train", loads the training dataset. If "val", loads the validation dataset. If "test", loads the test dataset. (default: "train")

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class MoleculeNet(root, name, transform=None, pre_transform=None, pre_filter=None)[source]

The MoleculeNet benchmark collection from the “MoleculeNet: A Benchmark for Molecular Machine Learning” paper, containing datasets from physical chemistry, biophysics and physiology. All datasets come with the additional node and edge features introduced by the Open Graph Benchmark.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • name (string) – The name of the dataset ("ESOL", "FreeSolv", "Lipo", "PCBA", "MUV", "HIV", "BACE", "BBBP", "Tox21", "ToxCast", "SIDER", "ClinTox").

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class Entities(root: str, name: str, hetero: bool = False, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The relational entities networks “AIFB”, “MUTAG”, “BGS” and “AM” from the “Modeling Relational Data with Graph Convolutional Networks” paper. Training and test splits are given by node indices.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • name (string) – The name of the dataset ("AIFB", "MUTAG", "BGS", "AM").

  • hetero (bool, optional) – If set to True, will save the dataset as a HeteroData object. (default: False)

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class RelLinkPredDataset(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The relational link prediction datasets from the “Modeling Relational Data with Graph Convolutional Networks” paper. Training and test splits are given by sets of triplets.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • name (string) – The name of the dataset ("FB15k-237").

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class GEDDataset(root: str, name: str, train: bool = True, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]

The GED datasets from the “Graph Edit Distance Computation via Graph Neural Networks” paper. GEDs can be accessed via the global attributes ged and norm_ged for all train/train graph pairs and all train/test graph pairs:

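from torch_geometric.datasets import GEDDataset

root = "data/GED"  # example path (an assumption); any writable directory works
dataset = GEDDataset(root, name="LINUX")
data1, data2 = dataset[0], dataset[1]
ged = dataset.ged[data1.i, data2.i]  # GED between `data1` and `data2`.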

Note that GEDs are not available if both graphs are from the test set. For evaluation, it is recommended to pair up each graph from the test set with each graph in the training set.
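The recommended evaluation pairing can be sketched with plain indices. The index values below are made up, and evaluation_pairs is a hypothetical helper; in practice the indices would come from the graphs of the training and test datasets.

```python
from itertools import product
from typing import List, Tuple


def evaluation_pairs(
    test_ids: List[int], train_ids: List[int]
) -> List[Tuple[int, int]]:
    """Pair every test graph with every training graph (no test/test pairs)."""
    return list(product(test_ids, train_ids))


# Made-up global graph indices for illustration.
train_ids = [0, 1, 2]
test_ids = [3, 4]

pairs = evaluation_pairs(test_ids, train_ids)
```

Every pair contains exactly one training graph, so a GED entry is expected to exist for it, subject to the dataset-specific exceptions noted for this class.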

Note

ALKANE is missing GEDs for train/test graph pairs since they are not provided in the official datasets.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • name (string) – The name of the dataset (one of "AIDS700nef", "LINUX", "ALKANE", "IMDBMulti").

  • train (bool, optional) – If True, loads the training dataset, otherwise the test dataset. (default: True)

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class AttributedGraphDataset(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

A variety of attributed graph datasets from the “Scaling Attributed Network Embedding to Massive Graphs” paper.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • name (string) – The name of the dataset ("Wiki", "Cora", "CiteSeer", "PubMed", "BlogCatalog", "PPI", "Flickr", "Facebook", "Twitter", "TWeibo", "MAG").

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class MNISTSuperpixels(root, train=True, transform=None, pre_transform=None, pre_filter=None)[source]

MNIST superpixels dataset from the “Geometric Deep Learning on Graphs and Manifolds Using Mixture Model CNNs” paper, containing 70,000 graphs with 75 nodes each. Every graph is labeled by one of 10 classes.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • train (bool, optional) – If True, loads the training dataset, otherwise the test dataset. (default: True)

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class FAUST(root: str, train: bool = True, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]

The FAUST humans dataset from the “FAUST: Dataset and Evaluation for 3D Mesh Registration” paper, containing 100 watertight meshes representing 10 different poses for 10 different subjects.

Note

Data objects hold mesh faces instead of edge indices. To convert the mesh to a graph, use the torch_geometric.transforms.FaceToEdge as pre_transform. To convert the mesh to a point cloud, use the torch_geometric.transforms.SamplePoints as transform to sample a fixed number of points on the mesh faces according to their face area.
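To illustrate what converting faces to edges means, the sketch below turns triangular faces into a set of undirected edges, stored in both directions. It is a plain-Python illustration of the idea, not the library's transform; faces_to_edges_sketch and the toy mesh are assumptions.

```python
from typing import List, Sequence, Tuple


def faces_to_edges_sketch(
    faces: Sequence[Tuple[int, int, int]]
) -> List[Tuple[int, int]]:
    """Collect each triangle side once, in both directions, sorted."""
    edges = set()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            edges.add((u, v))
            edges.add((v, u))
    return sorted(edges)


# Toy mesh: two triangles sharing the side between vertices 1 and 2.
faces = [(0, 1, 2), (1, 2, 3)]
edges = faces_to_edges_sketch(faces)
```

The shared side appears only once per direction, so the two triangles yield five undirected edges rather than six.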

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • train (bool, optional) – If True, loads the training dataset, otherwise the test dataset. (default: True)

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class DynamicFAUST(root: str, subjects: Optional[List[str]] = None, categories: Optional[List[str]] = None, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]

The dynamic FAUST humans dataset from the “Dynamic FAUST: Registering Human Bodies in Motion” paper.

Note

Data objects hold mesh faces instead of edge indices. To convert the mesh to a graph, use the torch_geometric.transforms.FaceToEdge as pre_transform. To convert the mesh to a point cloud, use the torch_geometric.transforms.SamplePoints as transform to sample a fixed number of points on the mesh faces according to their face area.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • subjects (list, optional) – List of subjects to include in the dataset. Can include the subjects "50002", "50004", "50007", "50009", "50020", "50021", "50022", "50025", "50026", "50027". If set to None, the dataset will contain all subjects. (default: None)

  • categories (list, optional) – List of categories to include in the dataset. Can include the categories "chicken_wings", "hips", "jiggle_on_toes", "jumping_jacks", "knees", "light_hopping_loose", "light_hopping_stiff", "one_leg_jump", "one_leg_loose", "personal_move", "punching", "running_on_spot", "running_on_spot_bugfix", "shake_arms", "shake_hips", "shoulders". If set to None, the dataset will contain all categories. (default: None)

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class ShapeNet(root, categories=None, include_normals=True, split='trainval', transform=None, pre_transform=None, pre_filter=None)[source]

The ShapeNet part level segmentation dataset from the “A Scalable Active Framework for Region Annotation in 3D Shape Collections” paper, containing about 17,000 3D shape point clouds from 16 shape categories. Each category is annotated with 2 to 6 parts.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • categories (string or [string], optional) – The category of the CAD models (one or a combination of "Airplane", "Bag", "Cap", "Car", "Chair", "Earphone", "Guitar", "Knife", "Lamp", "Laptop", "Motorbike", "Mug", "Pistol", "Rocket", "Skateboard", "Table"). Can be explicitly set to None to load all categories. (default: None)

  • include_normals (bool, optional) – If set to False, will not include normal vectors as input features to data.x. As a result, data.x will be None. (default: True)

  • split (string, optional) – If "train", loads the training dataset. If "val", loads the validation dataset. If "trainval", loads the training and validation dataset. If "test", loads the test dataset. (default: "trainval")

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class ModelNet(root, name='10', train=True, transform=None, pre_transform=None, pre_filter=None)[source]

The ModelNet10/40 datasets from the “3D ShapeNets: A Deep Representation for Volumetric Shapes” paper, containing CAD models of 10 and 40 categories, respectively.

Note

Data objects hold mesh faces instead of edge indices. To convert the mesh to a graph, use the torch_geometric.transforms.FaceToEdge as pre_transform. To convert the mesh to a point cloud, use the torch_geometric.transforms.SamplePoints as transform to sample a fixed number of points on the mesh faces according to their face area.
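Sampling points "according to their face area" can be pictured as follows: pick a face with probability proportional to its area, then pick a uniform point inside that triangle. This is a plain-Python sketch of the idea under those assumptions, not the library's transform; sample_points_sketch, the seed argument, and the toy mesh are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Face = Tuple[int, int, int]


def triangle_area(a: Point, b: Point, c: Point) -> float:
    """Half the norm of the cross product of two triangle sides."""
    ab = [b[k] - a[k] for k in range(3)]
    ac = [c[k] - a[k] for k in range(3)]
    cross = (
        ab[1] * ac[2] - ab[2] * ac[1],
        ab[2] * ac[0] - ab[0] * ac[2],
        ab[0] * ac[1] - ab[1] * ac[0],
    )
    return 0.5 * math.sqrt(sum(x * x for x in cross))


def sample_points_sketch(
    pos: Sequence[Point], faces: Sequence[Face], num: int, seed: int = 0
) -> List[Point]:
    rng = random.Random(seed)
    areas = [triangle_area(pos[i], pos[j], pos[k]) for i, j, k in faces]
    # Faces are drawn with probability proportional to their area.
    chosen = rng.choices(list(faces), weights=areas, k=num)
    points = []
    for i, j, k in chosen:
        u, v = rng.random(), rng.random()
        if u + v > 1.0:  # reflect so the point stays inside the triangle
            u, v = 1.0 - u, 1.0 - v
        w = 1.0 - u - v
        a, b, c = pos[i], pos[j], pos[k]
        points.append(tuple(w * a[d] + u * b[d] + v * c[d] for d in range(3)))
    return points


# Toy mesh: a unit square in the z = 0 plane, split into two triangles.
pos = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
faces = [(0, 1, 2), (0, 2, 3)]
points = sample_points_sketch(pos, faces, num=100)
```

The number of returned points is fixed by num regardless of how many faces the mesh has, which is what makes the result usable as a fixed-size point cloud.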

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • name (string, optional) – The name of the dataset ("10" for ModelNet10, "40" for ModelNet40). (default: "10")

  • train (bool, optional) – If True, loads the training dataset, otherwise the test dataset. (default: True)

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class CoMA(root: str, train: bool = True, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]

The CoMA 3D faces dataset from the “Generating 3D faces using Convolutional Mesh Autoencoders” paper, containing 20,466 meshes of extreme expressions captured over 12 different subjects.

Note

Data objects hold mesh faces instead of edge indices. To convert the mesh to a graph, use the torch_geometric.transforms.FaceToEdge as pre_transform. To convert the mesh to a point cloud, use the torch_geometric.transforms.SamplePoints as transform to sample a fixed number of points on the mesh faces according to their face area.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • train (bool, optional) – If True, loads the training dataset, otherwise the test dataset. (default: True)

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class SHREC2016(root, partiality, category, train=True, transform=None, pre_transform=None, pre_filter=None)[source]

The SHREC 2016 partial matching dataset from the “SHREC’16: Partial Matching of Deformable Shapes” paper. The reference shape can be accessed via dataset.ref.

Note

Data objects hold mesh faces instead of edge indices. To convert the mesh to a graph, use the torch_geometric.transforms.FaceToEdge as pre_transform. To convert the mesh to a point cloud, use the torch_geometric.transforms.SamplePoints as transform to sample a fixed number of points on the mesh faces according to their face area.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • partiality (string) – The partiality of the dataset (one of "Holes", "Cuts").

  • category (string) – The category of the dataset (one of "Cat", "Centaur", "David", "Dog", "Horse", "Michael", "Victoria", "Wolf").

  • train (bool, optional) – If True, loads the training dataset, otherwise the test dataset. (default: True)

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class TOSCA(root, categories=None, transform=None, pre_transform=None, pre_filter=None)[source]

The TOSCA dataset from the “Numerical Geometry of Non-Rigid Shapes” book, containing 80 meshes. Meshes within the same category have the same triangulation and an equal number of vertices numbered in a compatible way.

Note

Data objects hold mesh faces instead of edge indices. To convert the mesh to a graph, use the torch_geometric.transforms.FaceToEdge as pre_transform. To convert the mesh to a point cloud, use the torch_geometric.transforms.SamplePoints as transform to sample a fixed number of points on the mesh faces according to their face area.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • categories (list, optional) – List of categories to include in the dataset. Can include the categories "Cat", "Centaur", "David", "Dog", "Gorilla", "Horse", "Michael", "Victoria", "Wolf". If set to None, the dataset will contain all categories. (default: None)

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class PCPNetDataset(root, category, split='train', transform=None, pre_transform=None, pre_filter=None)[source]

The PCPNet dataset from the “PCPNet: Learning Local Shape Properties from Raw Point Clouds” paper, consisting of 30 shapes, each given as a point cloud, densely sampled with 100k points. For each shape, surface normals and local curvatures are given as node features.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • category (string) – The training set category (one of "NoNoise", "Noisy", "VarDensity", "NoisyAndVarDensity" for split="train" or split="val", or one of "All", "LowNoise", "MedNoise", "HighNoise", "VarDensityStriped", "VarDensityGradient" for split="test").

  • split (string) – If "train", loads the training dataset. If "val", loads the validation dataset. If "test", loads the test dataset. (default: "train")

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
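
Because the valid category values depend on split, it can help to check the pair before constructing the dataset. The sketch below encodes the pairing listed above in a hypothetical helper, check_pcpnet_args. The helper is not part of torch_geometric, and the library's own error handling may differ.

```python
TRAIN_VAL_CATEGORIES = {"NoNoise", "Noisy", "VarDensity", "NoisyAndVarDensity"}
TEST_CATEGORIES = {
    "All", "LowNoise", "MedNoise", "HighNoise",
    "VarDensityStriped", "VarDensityGradient",
}


def check_pcpnet_args(category: str, split: str = "train") -> None:
    """Raise ValueError if category is not listed for the given split."""
    if split in ("train", "val"):
        allowed = TRAIN_VAL_CATEGORIES
    elif split == "test":
        allowed = TEST_CATEGORIES
    else:
        raise ValueError(f"unknown split: {split!r}")
    if category not in allowed:
        raise ValueError(
            f"category {category!r} is not listed for split={split!r}; "
            f"expected one of {sorted(allowed)}"
        )


check_pcpnet_args("NoNoise", split="train")   # accepted
check_pcpnet_args("LowNoise", split="test")   # accepted
```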

class S3DIS(root, test_area=6, train=True, transform=None, pre_transform=None, pre_filter=None)[source]

The (pre-processed) Stanford Large-Scale 3D Indoor Spaces dataset from the “3D Semantic Parsing of Large-Scale Indoor Spaces” paper, containing point clouds of six large-scale indoor parts in three buildings with 12 semantic elements (and one clutter class).

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • test_area (int, optional) – Which area to use for testing (1-6). (default: 6)

  • train (bool, optional) – If True, loads the training dataset, otherwise the test dataset. (default: True)

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
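
The interaction between test_area and train can be sketched as follows. The helper s3dis_areas is hypothetical, and the sketch assumes that the training set consists of the five areas other than test_area, which the description above implies but does not state outright.

```python
from typing import List


def s3dis_areas(test_area: int = 6, train: bool = True) -> List[int]:
    """Return the area numbers assumed to back the requested split."""
    if test_area not in range(1, 7):
        raise ValueError("test_area must be between 1 and 6")
    if train:
        # Assumption: every area except the held-out one is used for training.
        return [area for area in range(1, 7) if area != test_area]
    return [test_area]


train_areas = s3dis_areas(test_area=5, train=True)
test_areas = s3dis_areas(test_area=5, train=False)
```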

class GeometricShapes(root: str, train: bool = True, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]

Synthetic dataset of various geometric shapes like cubes, spheres or pyramids.

Note

Data objects hold mesh faces instead of edge indices. To convert the mesh to a graph, use the torch_geometric.transforms.FaceToEdge as pre_transform. To convert the mesh to a point cloud, use the torch_geometric.transforms.SamplePoints as transform to sample a fixed number of points on the mesh faces according to their face area.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • train (bool, optional) – If True, loads the training dataset, otherwise the test dataset. (default: True)

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
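
To illustrate what converting mesh faces to graph edges means, here is a minimal pure-Python sketch. It is not the implementation of torch_geometric.transforms.FaceToEdge. The function name faces_to_edges is hypothetical, and it assumes triangular faces given as index triples and an undirected graph stored as directed edge pairs.

```python
from typing import Iterable, List, Tuple


def faces_to_edges(faces: Iterable[Tuple[int, int, int]]) -> List[Tuple[int, int]]:
    """Collect each triangle side once, in both directions."""
    edges = set()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            edges.add((u, v))
            edges.add((v, u))  # store both directions for an undirected graph
    return sorted(edges)


# Two triangles sharing the side (1, 2); the shared side is only counted once.
edges = faces_to_edges([(0, 1, 2), (1, 2, 3)])
```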

class BitcoinOTC(root: str, edge_window_size: int = 10, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The Bitcoin-OTC dataset from the “EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs” paper, consisting of 138 who-trusts-whom networks of sequential time steps.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • edge_window_size (int, optional) – The window size for the existence of an edge in the graph sequence since its initial creation. (default: 10)

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
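
One way to read edge_window_size is that an edge stays in the graph sequence for a fixed number of time steps after it was created. The sketch below illustrates that reading with a hypothetical helper, edges_in_snapshot. The exact boundary convention (whether the creation step counts as the first step of the window) is an assumption and may differ from the library.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def edges_in_snapshot(
    creation_step: Dict[Edge, int],
    step: int,
    edge_window_size: int = 10,
) -> List[Edge]:
    """Edges assumed present at `step`: created no more than a window ago."""
    return [
        edge
        for edge, created in creation_step.items()
        if created <= step < created + edge_window_size
    ]


# Toy trust edges with the time step at which each was first created.
creation_step = {("a", "b"): 0, ("b", "c"): 5}
snapshot = edges_in_snapshot(creation_step, step=9)
```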

class ICEWS18(root, split='train', transform=None, pre_transform=None, pre_filter=None)[source]

The Integrated Crisis Early Warning System (ICEWS) dataset used, e.g., in the “Recurrent Event Network for Reasoning over Temporal Knowledge Graphs” paper, consisting of events collected from 1/1/2018 to 10/31/2018 (24-hour time granularity).

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • split (string) – If "train", loads the training dataset. If "val", loads the validation dataset. If "test", loads the test dataset. (default: "train")

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class GDELT(root: str, split: str = 'train', transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]

The Global Database of Events, Language, and Tone (GDELT) dataset used, e.g., in the “Recurrent Event Network for Reasoning over Temporal Knowledge Graphs” paper, consisting of events collected from 1/1/2018 to 1/31/2018 (15-minute time granularity).

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • split (string) – If "train", loads the training dataset. If "val", loads the validation dataset. If "test", loads the test dataset. (default: "train")

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
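
As a rough sanity check on the time granularities stated for ICEWS18 and GDELT, the sketch below counts how many time slots fit into each collection period. It assumes that both end dates are inclusive. The number of distinct timestamps actually present in the data may be smaller, since a slot without events would not appear.

```python
from datetime import datetime, timedelta


def num_time_slots(first_day: datetime, last_day: datetime,
                   granularity: timedelta) -> int:
    """Number of granularity-sized slots from first_day through last_day."""
    # Assumption: last_day is inclusive, so the period ends at the next midnight.
    period = (last_day + timedelta(days=1)) - first_day
    return period // granularity


icews18_slots = num_time_slots(
    datetime(2018, 1, 1), datetime(2018, 10, 31), timedelta(hours=24)
)
gdelt_slots = num_time_slots(
    datetime(2018, 1, 1), datetime(2018, 1, 31), timedelta(minutes=15)
)
```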

class DBP15K(root: str, pair: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The DBP15K dataset from the “Cross-lingual Entity Alignment via Joint Attribute-Preserving Embedding” paper, where Chinese, Japanese and French versions of DBpedia were linked to its English version. Node features are given by pre-trained and aligned monolingual word embeddings from the “Cross-lingual Knowledge Graph Alignment via Graph Matching Neural Network” paper.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • pair (string) – The pair of languages ("en_zh", "en_fr", "en_ja", "zh_en", "fr_en", "ja_en").

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class WILLOWObjectClass(root, category, transform=None, pre_transform=None, pre_filter=None)[source]

The WILLOW-ObjectClass dataset from the “Learning Graphs to Match” paper, containing 10 equal keypoints of at least 40 images in each category. The keypoints contain interpolated features from a pre-trained VGG16 model on ImageNet (relu4_2 and relu5_1).

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • category (string) – The category of the images (one of "Car", "Duck", "Face", "Motorbike", "Winebottle").

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class PascalVOCKeypoints(root, category, train=True, transform=None, pre_transform=None, pre_filter=None)[source]

The Pascal VOC 2011 dataset with Berkeley annotations of keypoints from the “Poselets: Body Part Detectors Trained Using 3D Human Pose Annotations” paper, containing 0 to 23 keypoints per example over 20 categories. The dataset is pre-filtered to exclude difficult, occluded and truncated objects. The keypoints contain interpolated features from a pre-trained VGG16 model on ImageNet (relu4_2 and relu5_1).

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • category (string) – The category of the images (one of "Aeroplane", "Bicycle", "Bird", "Boat", "Bottle", "Bus", "Car", "Cat", "Chair", "Cow", "Diningtable", "Dog", "Horse", "Motorbike", "Person", "Pottedplant", "Sheep", "Sofa", "Train", "TVMonitor").

  • train (bool, optional) – If True, loads the training dataset, otherwise the test dataset. (default: True)

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class PascalPF(root, category, transform=None, pre_transform=None, pre_filter=None)[source]

The Pascal-PF dataset from the “Proposal Flow” paper, containing 4 to 16 keypoints per example over 20 categories.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • category (string) – The category of the images (one of "Aeroplane", "Bicycle", "Bird", "Boat", "Bottle", "Bus", "Car", "Cat", "Chair", "Cow", "Diningtable", "Dog", "Horse", "Motorbike", "Person", "Pottedplant", "Sheep", "Sofa", "Train", "TVMonitor").

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class SNAPDataset(root, name, transform=None, pre_transform=None, pre_filter=None)[source]

A variety of graph datasets collected from SNAP at Stanford University.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • name (string) – The name of the dataset.

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class SuiteSparseMatrixCollection(root, group, name, transform=None, pre_transform=None)[source]

A suite of sparse matrix benchmarks known as the Suite Sparse Matrix Collection collected from a wide range of applications.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • group (string) – The group of the sparse matrix.

  • name (string) – The name of the sparse matrix.

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class AMiner(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The heterogeneous AMiner dataset from the “metapath2vec: Scalable Representation Learning for Heterogeneous Networks” paper, consisting of nodes of type "paper", "author" and "venue". Venue categories and author research interests are available as ground truth labels for a subset of nodes.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.HeteroData object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.HeteroData object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class WordNet18(root, transform=None, pre_transform=None)[source]

The WordNet18 dataset from the “Translating Embeddings for Modeling Multi-Relational Data” paper, containing 40,943 entities, 18 relations and 151,442 fact triplets, e.g., furniture includes bed.

Note

The original WordNet18 dataset suffers from test leakage, i.e. more than 80% of test triplets can be found in the training set with another relation type. Therefore, it should no longer be used for research evaluation. We recommend using its cleaned version WordNet18RR instead.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class WordNet18RR(root, transform=None, pre_transform=None)[source]

The WordNet18RR dataset from the “Convolutional 2D Knowledge Graph Embeddings” paper, containing 40,943 entities, 11 relations and 93,003 fact triplets.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class WikiCS(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, is_undirected: Optional[bool] = None)[source]

The semi-supervised Wikipedia-based dataset from the “Wiki-CS: A Wikipedia-Based Benchmark for Graph Neural Networks” paper, containing 11,701 nodes, 216,123 edges, 10 classes and 20 different training splits.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • is_undirected (bool, optional) – Whether the graph is undirected. (default: True)

class WebKB(root, name, transform=None, pre_transform=None)[source]

The WebKB datasets used in the “Geom-GCN: Geometric Graph Convolutional Networks” paper. Nodes represent web pages and edges represent hyperlinks between them. Node features are the bag-of-words representation of web pages. The task is to classify the nodes into one of five categories: student, project, course, staff, and faculty.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • name (string) – The name of the dataset ("Cornell", "Texas", "Wisconsin").

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class WikipediaNetwork(root: str, name: str, geom_gcn_preprocess: bool = True, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The Wikipedia networks introduced in the “Multi-scale Attributed Node Embedding” paper. Nodes represent web pages and edges represent hyperlinks between them. Node features represent several informative nouns in the Wikipedia pages. The task is to predict the average daily traffic of the web page.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • name (string) – The name of the dataset ("chameleon", "crocodile", "squirrel").

  • geom_gcn_preprocess (bool) – If set to True, will load the pre-processed data as introduced in the “Geom-GCN: Geometric Graph Convolutional Networks” paper (https://arxiv.org/abs/2002.05287), in which the average monthly traffic of the web page is converted into five categories to predict. If set to True, the dataset "crocodile" is not available. (default: True)

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class Actor(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The actor-only induced subgraph of the film-director-actor-writer network used in the “Geom-GCN: Geometric Graph Convolutional Networks” paper. Each node corresponds to an actor, and the edge between two nodes denotes co-occurrence on the same Wikipedia page. Node features correspond to some keywords in the Wikipedia pages. The task is to classify the nodes into five categories in terms of the words on the actor’s Wikipedia page.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class OGB_MAG(root: str, preprocess: Optional[str] = None, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The ogbn-mag dataset from the “Open Graph Benchmark: Datasets for Machine Learning on Graphs” paper. ogbn-mag is a heterogeneous graph composed of a subset of the Microsoft Academic Graph (MAG). It contains four types of entities — papers (736,389 nodes), authors (1,134,649 nodes), institutions (8,740 nodes), and fields of study (59,965 nodes) — as well as four types of directed relations connecting two types of entities. Each paper is associated with a 128-dimensional word2vec feature vector, while all other node types are not associated with any input features. The task is to predict the venue (conference or journal) of each paper. In total, there are 349 different venues.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • preprocess (string, optional) – Pre-processes the original dataset by adding structural features ("metapath2vec", "TransE") to featureless nodes. (default: None)

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.HeteroData object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.HeteroData object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class DBLP(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

A subset of the DBLP computer science bibliography website, as collected in the “MAGNN: Metapath Aggregated Graph Neural Network for Heterogeneous Graph Embedding” paper. DBLP is a heterogeneous graph containing four types of entities - authors (4,057 nodes), papers (14,328 nodes), terms (7,723 nodes), and conferences (20 nodes). The authors are divided into four research areas (database, data mining, artificial intelligence, information retrieval). Each author is described by a bag-of-words representation of their paper keywords.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.HeteroData object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.HeteroData object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class MovieLens(root, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, model_name: Optional[str] = 'all-MiniLM-L6-v2')[source]

A heterogeneous rating dataset, assembled by GroupLens Research from the MovieLens web site, consisting of nodes of type "movie" and "user". User ratings for movies are available as ground truth labels for the edges between the users and the movies ("user", "rates", "movie").

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.HeteroData object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.HeteroData object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • model_name (str) – Name of the model used to transform movie titles to node features. The model comes from Huggingface SentenceTransformer (https://huggingface.co/sentence-transformers). (default: "all-MiniLM-L6-v2")

class IMDB(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

A subset of the Internet Movie Database (IMDB), as collected in the “MAGNN: Metapath Aggregated Graph Neural Network for Heterogeneous Graph Embedding” paper. IMDB is a heterogeneous graph containing three types of entities - movies (4,278 nodes), actors (5,257 nodes), and directors (2,081 nodes). The movies are divided into three classes (action, comedy, drama) according to their genre. Movie features correspond to elements of a bag-of-words representation of its plot keywords.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.HeteroData object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.HeteroData object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class LastFM(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

A subset of the last.fm music website keeping track of users’ listening information from various sources, as collected in the “MAGNN: Metapath Aggregated Graph Neural Network for Heterogeneous Graph Embedding” paper. last.fm is a heterogeneous graph containing three types of entities - users (1,892 nodes), artists (17,632 nodes), and artist tags (1,088 nodes). This dataset can be used for link prediction, and no labels or features are provided.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.HeteroData object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.HeteroData object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class HGBDataset(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

A variety of heterogeneous graph benchmark datasets from the “Are We Really Making Much Progress? Revisiting, Benchmarking, and Refining Heterogeneous Graph Neural Networks” paper.

Note

Test labels are randomly given to prevent data leakage issues. If you want to obtain final test performance, you will need to submit your model predictions to the HGB leaderboard.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • name (string) – The name of the dataset (one of "ACM", "DBLP", "Freebase", "IMDB").

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.HeteroData object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.HeteroData object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class JODIEDataset(root, name, transform=None, pre_transform=None)[source]

class MixHopSyntheticDataset(root, homophily, transform=None, pre_transform=None)[source]

The MixHop synthetic dataset from the “MixHop: Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing” paper, containing 10 graphs, each with a different degree of homophily (ranging from 0.0 to 0.9). All graphs have 5,000 nodes, where each node corresponds to 1 out of 10 classes. The feature values of the nodes are sampled from a 2D Gaussian distribution, which is distinct for each class.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • homophily (float) – The degree of homophily (one of 0.0, 0.1, …, 0.9).

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class UPFD(root, name, feature, split='train', transform=None, pre_transform=None, pre_filter=None)[source]

The tree-structured fake news propagation graph classification dataset from the “User Preference-aware Fake News Detection” paper. It includes two sets of tree-structured fake & real news propagation graphs extracted from Twitter. For a single graph, the root node represents the source news, and leaf nodes represent Twitter users who retweeted the same root news. A user node has an edge to the news node if and only if the user retweeted the root news directly. Two user nodes have an edge if and only if one user retweeted the root news from the other user. Four different node features are encoded using different encoders. Please refer to the GNN-FakeNews repo for more details.

Note

For an example of using UPFD, see examples/upfd.py.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • name (string) – The name of the graph set ("politifact", "gossipcop").

  • feature (string) – The node feature type ("profile", "spacy", "bert", "content"). If set to "profile", the 10-dimensional node feature is composed of ten Twitter user profile attributes. If set to "spacy", the 300-dimensional node feature is composed of Twitter user historical tweets encoded by the spaCy word2vec encoder. If set to "bert", the 768-dimensional node feature is composed of Twitter user historical tweets encoded by the bert-as-service. If set to "content", the 310-dimensional node feature is composed of a 300-dimensional “spacy” vector plus a 10-dimensional “profile” vector.

  • split (string, optional) – If "train", loads the training dataset. If "val", loads the validation dataset. If "test", loads the test dataset. (default: "train")

  • transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class GitHub(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The GitHub Web and ML Developers dataset introduced in the “Multi-scale Attributed Node Embedding” paper. Nodes represent developers on GitHub and edges are mutual follower relationships. It contains 37,300 nodes, 578,006 edges, 128 node features and 2 classes.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class FacebookPagePage(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The Facebook Page-Page network dataset introduced in the “Multi-scale Attributed Node Embedding” paper. Nodes represent verified pages on Facebook and edges are mutual likes. It contains 22,470 nodes, 342,004 edges, 128 node features and 4 classes.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class LastFMAsia(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The LastFM Asia Network dataset introduced in the “Characteristic Functions on Graphs: Birds of a Feather, from Statistical Descriptors to Parametric Models” paper. Nodes represent LastFM users from Asia and edges are friendships. It contains 7,624 nodes, 55,612 edges, 128 node features and 18 classes.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class DeezerEurope(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The Deezer Europe dataset introduced in the “Characteristic Functions on Graphs: Birds of a Feather, from Statistical Descriptors to Parametric Models” paper. Nodes represent European users of Deezer and edges are mutual follower relationships. It contains 28,281 nodes, 185,504 edges, 128 node features and 2 classes.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class GemsecDeezer(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The Deezer User Network datasets introduced in the “GEMSEC: Graph Embedding with Self Clustering” paper. Nodes represent Deezer users and edges are mutual friendships. The task is multi-label multi-class node classification of the genres liked by the users.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • name (string) – The name of the dataset ("HU", "HR", "RO").

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class Twitch(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The Twitch Gamer networks introduced in the “Multi-scale Attributed Node Embedding” paper. Nodes represent gamers on Twitch and edges are followerships between them. Node features represent embeddings of games played by the Twitch users. The task is to predict whether a user streams mature content.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • name (string) – The name of the dataset ("DE", "EN", "ES", "FR", "PT", "RU").

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class Airports(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The Airports dataset from the “struc2vec: Learning Node Representations from Structural Identity” paper, where nodes denote airports and labels correspond to activity levels. Features are given by one-hot encoded node identifiers, as described in the “GraLSP: Graph Neural Networks with Local Structural Patterns” paper.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • name (string) – The name of the dataset ("USA", "Brazil", "Europe").

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
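
To illustrate what "one-hot encoded node identifiers" means for the Airports features, the sketch below builds such a feature matrix in plain Python. The function name is hypothetical and this is not the library's loading code. Each node's feature vector has a single 1 at its own index, so the matrix is the identity and the number of features equals the number of nodes.

```python
def one_hot_node_ids(num_nodes):
    """Return a num_nodes x num_nodes one-hot (identity) feature matrix."""
    return [[1.0 if col == row else 0.0 for col in range(num_nodes)]
            for row in range(num_nodes)]


features = one_hot_node_ids(4)
# Row i encodes node i: a single 1.0 at position i, zeros elsewhere.
```

Such features carry no attribute information beyond node identity, so a model trained on them has only the graph structure to learn from.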

class BAShapes(connection_distribution: str = 'random', transform: Optional[Callable] = None)[source]

The BA-Shapes dataset from the “GNNExplainer: Generating Explanations for Graph Neural Networks” paper, containing a Barabasi-Albert (BA) graph with 300 nodes and a set of 80 “house”-structured graphs connected to it.

Parameters
  • connection_distribution (string, optional) – Specifies how the houses and the BA graph get connected. Valid inputs are "random" (random BA graph nodes are selected for connection to the houses), and "uniform" (uniformly distributed BA graph nodes are selected for connection to the houses). (default: "random")

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

class MalNetTiny(root: str, split: Optional[str] = None, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]

The MalNet Tiny dataset from the “A Large-Scale Database for Graph Representation Learning” paper. MalNetTiny contains 5,000 malicious and benign software function call graphs across 5 different types. Each graph contains at most 5,000 nodes.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • split (string, optional) – If "train", loads the training dataset. If "val", loads the validation dataset. If "trainval", loads the training and validation dataset. If "test", loads the test dataset. If None, loads the entire dataset. (default: None)

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class OMDB(root: str, train: bool = True, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]

The Organic Materials Database (OMDB) of bulk organic crystals.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • train (bool, optional) – If True, loads the training dataset, otherwise the test dataset. (default: True)

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
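
The transform, pre_transform and pre_filter callables differ in when they run. The toy class below is a plain-Python stand-in, not the PyTorch Geometric implementation, and ToyDataset is a hypothetical name. It assumes that filtering happens before pre-transforming, and mimics the documented timing: pre_filter and pre_transform run once while the dataset is built, whereas transform runs on every access.

```python
class ToyDataset:
    """Plain-Python stand-in that mimics when each callable is applied."""

    def __init__(self, items, transform=None, pre_transform=None,
                 pre_filter=None):
        # "Processing" step: runs once, like the work done before saving to disk.
        if pre_filter is not None:
            items = [item for item in items if pre_filter(item)]
        if pre_transform is not None:
            items = [pre_transform(item) for item in items]
        self._items = items
        self.transform = transform

    def __len__(self):
        return len(self._items)

    def __getitem__(self, idx):
        item = self._items[idx]
        # Access step: runs every time an item is read.
        return item if self.transform is None else self.transform(item)


dataset = ToyDataset(
    items=[1, 2, 3, 4],
    pre_filter=lambda x: x % 2 == 0,   # keep even items only
    pre_transform=lambda x: x * 10,    # applied once, result is stored
    transform=lambda x: x + 1,         # applied on each access
)
```

Here the stored items are 20 and 40, and reading dataset[0] yields 21. Expensive, deterministic preprocessing therefore fits pre_transform, while cheap or random augmentation fits transform.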

class PolBlogs(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The Political Blogs dataset from the “The Political Blogosphere and the 2004 US Election: Divided they Blog” paper.

PolBlogs is a graph with 1,490 vertices (representing political blogs) and 19,025 edges (links between blogs). The links are automatically extracted from a crawl of each blog's front page. Each vertex receives a label indicating the political leaning of the blog: liberal or conservative.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class EmailEUCore(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

An e-mail communication network of a large European research institution, taken from the “Local Higher-order Graph Clustering” paper. Nodes indicate members of the institution. An edge between a pair of members indicates that they exchanged at least one e-mail. Node labels indicate membership in one of the 42 departments.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class StochasticBlockModelDataset(root: str, block_sizes: Union[List[int], Tensor], edge_probs: Union[List[List[float]], Tensor], num_channels: Optional[int] = None, is_undirected: bool = True, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, **kwargs)[source]

A synthetic graph dataset generated by the stochastic block model. The node features of each block are sampled from normal distributions where the centers of clusters are vertices of a hypercube, as computed by the sklearn.datasets.make_classification() method.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • block_sizes ([int] or LongTensor) – The sizes of blocks.

  • edge_probs ([[float]] or FloatTensor) – The density of edges going from each block to each other block. Must be symmetric if the graph is undirected.

  • num_channels (int, optional) – The number of node features. If given as None, node features are not generated. (default: None)

  • is_undirected (bool, optional) – Whether the graph to generate is undirected. (default: True)

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • **kwargs (optional) – The keyword arguments that are passed down to the sklearn.datasets.make_classification() method for drawing node features.
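
As a rough illustration of how block_sizes and edge_probs interact, the following plain-Python sketch samples an undirected stochastic block model graph. It is not the library's generator; sample_sbm_edges and its seed argument are hypothetical names. An edge between nodes in blocks a and b is drawn independently with probability edge_probs[a][b], which is why the matrix must be symmetric in the undirected case.

```python
import random


def sample_sbm_edges(block_sizes, edge_probs, seed=0):
    """Sample undirected edges (u, v) with u < v from a stochastic block model."""
    num_blocks = len(block_sizes)
    for a in range(num_blocks):
        for b in range(num_blocks):
            if edge_probs[a][b] != edge_probs[b][a]:
                raise ValueError("edge_probs must be symmetric")

    # block_of[i] is the block index of node i; nodes are numbered block by block.
    block_of = [b for b, size in enumerate(block_sizes) for _ in range(size)]
    rng = random.Random(seed)
    edges = []
    for u in range(len(block_of)):
        for v in range(u + 1, len(block_of)):
            if rng.random() < edge_probs[block_of[u]][block_of[v]]:
                edges.append((u, v))
    return block_of, edges


# Two blocks of 3 and 2 nodes: always connected inside a block, never across.
block_of, edges = sample_sbm_edges([3, 2], [[1.0, 0.0], [0.0, 1.0]])
```

With these extreme probabilities the result is deterministic: a triangle on nodes 0 to 2 plus the single edge (3, 4). Intermediate probabilities give random graphs whose density within and between blocks follows edge_probs.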

class RandomPartitionGraphDataset(root, num_classes: int, num_nodes_per_class: int, node_homophily_ratio: float, average_degree: float, num_channels: Optional[int] = None, is_undirected: bool = True, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, **kwargs)[source]

The random partition graph dataset from the “How to Find Your Friendly Neighborhood: Graph Attention Design with Self-Supervision” paper. This is a synthetic graph of communities controlled by the node homophily and the average degree, and each community is considered as a class. The node features are sampled from normal distributions where the centers of clusters are vertices of a hypercube, as computed by the sklearn.datasets.make_classification() method.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • num_classes (int) – The number of classes.

  • num_nodes_per_class (int) – The number of nodes per class.

  • node_homophily_ratio (float) – The degree of node homophily.

  • average_degree (float) – The average degree of the graph.

  • num_channels (int, optional) – The number of node features. If given as None, node features are not generated. (default: None)

  • is_undirected (bool, optional) – Whether the graph to generate is undirected. (default: True)

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • **kwargs (optional) – The keyword arguments that are passed down to the sklearn.datasets.make_classification() method for drawing node features.
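
One way to read node_homophily_ratio and average_degree is as constraints on the within-class and between-class edge probabilities. The derivation below is a sketch under two assumptions: homophily is taken as the expected fraction of a node's neighbors that share its class, and a node's exclusion from its own class is ignored. The function name is hypothetical and the library's exact formula may differ.

```python
def partition_edge_probs(num_classes, num_nodes_per_class,
                         node_homophily_ratio, average_degree):
    """Return (p_in, p_out) matching a target homophily and average degree."""
    if num_classes < 2:
        raise ValueError("need at least two classes")
    # Expected same-class neighbors: n * p_in = h * d
    p_in = node_homophily_ratio * average_degree / num_nodes_per_class
    # Expected other-class neighbors: (C - 1) * n * p_out = (1 - h) * d
    p_out = ((1.0 - node_homophily_ratio) * average_degree
             / ((num_classes - 1) * num_nodes_per_class))
    return p_in, p_out


p_in, p_out = partition_edge_probs(
    num_classes=3, num_nodes_per_class=100,
    node_homophily_ratio=0.5, average_degree=10.0,
)
```

For this configuration each node expects 5 same-class and 5 other-class neighbors, so the same-class probability is twice the cross-class one because there are twice as many candidate nodes outside the class.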

class LINKXDataset(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

A variety of non-homophilous graph datasets from the “Large Scale Learning on Non-Homophilous Graphs: New Benchmarks and Strong Simple Methods” paper.

Note

Some of the datasets provided in LINKXDataset are from other sources, but have been updated with new features and/or labels.

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • name (string) – The name of the dataset ("penn94", "reed98", "amherst41", "cornell5", "johnshopkins55", "genius").

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class EllipticBitcoinDataset(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The Elliptic Bitcoin dataset of Bitcoin transactions from the “Anti-Money Laundering in Bitcoin: Experimenting with Graph Convolutional Networks for Financial Forensics” paper.

EllipticBitcoinDataset maps Bitcoin transactions to real entities belonging to licit categories (exchanges, wallet providers, miners, licit services, etc.) versus illicit ones (scams, malware, terrorist organizations, ransomware, Ponzi schemes, etc.).

There are 203,769 nodes (transactions) and 234,355 directed edges (payment flows), with two percent of nodes (4,545) labelled as illicit and twenty-one percent of nodes (42,019) labelled as licit. The labels of the remaining transactions are unknown.
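
The quoted percentages follow directly from the node counts. As a quick arithmetic check (variable names are illustrative):

```python
num_nodes = 203_769
num_illicit = 4_545
num_licit = 42_019

illicit_pct = 100 * num_illicit / num_nodes
licit_pct = 100 * num_licit / num_nodes
num_unknown = num_nodes - num_illicit - num_licit

print(f"illicit: {illicit_pct:.1f}%, licit: {licit_pct:.1f}%, "
      f"unknown: {num_unknown:,}")
```

This leaves 157,205 transactions without a label, so only about a quarter of the nodes can be used as supervised training or evaluation targets.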

Parameters
  • root (string) – Root directory where the dataset should be saved.

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

Stats:

#nodes    #edges    #features  #classes
203,769   234,355   165        2