torch_geometric.datasets

`KarateClub`	Zachary's karate club network from the "An Information Flow Model for Conflict and Fission in Small Groups" paper, containing 34 nodes, connected by 156 (undirected and unweighted) edges.
`TUDataset`	A variety of graph kernel benchmark datasets, e.g., "IMDB-BINARY", "REDDIT-BINARY" or "PROTEINS", collected from the TU Dortmund University.
`GNNBenchmarkDataset`	A variety of artificially and semi-artificially generated graph datasets from the "Benchmarking Graph Neural Networks" paper.
`Planetoid`	The citation network datasets "Cora", "CiteSeer" and "PubMed" from the "Revisiting Semi-Supervised Learning with Graph Embeddings" paper.
`FakeDataset`	A fake dataset that returns randomly generated `Data` objects.
`FakeHeteroDataset`	A fake dataset that returns randomly generated `HeteroData` objects.
`NELL`	The NELL dataset, a knowledge graph from the "Toward an Architecture for Never-Ending Language Learning" paper.
`CitationFull`	The full citation network datasets from the "Deep Gaussian Embedding of Graphs: Unsupervised Inductive Learning via Ranking" paper.
`CoraFull`	Alias for `torch_geometric.datasets.CitationFull` with `name="cora"`.
`Coauthor`	The Coauthor CS and Coauthor Physics networks from the "Pitfalls of Graph Neural Network Evaluation" paper.
`Amazon`	The Amazon Computers and Amazon Photo networks from the "Pitfalls of Graph Neural Network Evaluation" paper.
`PPI`	The protein-protein interaction networks from the "Predicting Multicellular Function through Multi-layer Tissue Networks" paper, containing positional gene sets, motif gene sets and immunological signatures as features (50 in total) and gene ontology sets as labels (121 in total).
`Reddit`	The Reddit dataset from the "Inductive Representation Learning on Large Graphs" paper, containing Reddit posts belonging to different communities.
`Reddit2`	The Reddit dataset from the "GraphSAINT: Graph Sampling Based Inductive Learning Method" paper, containing Reddit posts belonging to different communities.
`Flickr`	The Flickr dataset from the "GraphSAINT: Graph Sampling Based Inductive Learning Method" paper, containing descriptions and common properties of images.
`Yelp`	The Yelp dataset from the "GraphSAINT: Graph Sampling Based Inductive Learning Method" paper, containing customer reviewers and their friendships.
`AmazonProducts`	The Amazon dataset from the "GraphSAINT: Graph Sampling Based Inductive Learning Method" paper, containing products and their categories.
`QM7b`	The QM7b dataset from the "MoleculeNet: A Benchmark for Molecular Machine Learning" paper, consisting of 7,211 molecules with 14 regression targets.
`QM9`	The QM9 dataset from the "MoleculeNet: A Benchmark for Molecular Machine Learning" paper, consisting of about 130,000 molecules with 19 regression targets.
`MD17`	A variety of ab-initio molecular dynamics trajectories from the authors of sGDML.
`ZINC`	The ZINC dataset from the ZINC database and the "Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules" paper, containing about 250,000 molecular graphs with up to 38 heavy atoms.
`AQSOL`	The AQSOL dataset from the "Benchmarking Graph Neural Networks" paper based on AqSolDB, a standardized database of 9,982 molecular graphs with their aqueous solubility values, collected from 9 different data sources.
`MoleculeNet`	The MoleculeNet benchmark collection from the "MoleculeNet: A Benchmark for Molecular Machine Learning" paper, containing datasets from physical chemistry, biophysics and physiology.
`Entities`	The relational entities networks "AIFB", "MUTAG", "BGS" and "AM" from the "Modeling Relational Data with Graph Convolutional Networks" paper.
`RelLinkPredDataset`	The relational link prediction datasets from the "Modeling Relational Data with Graph Convolutional Networks" paper.
`GEDDataset`	The GED datasets from the "Graph Edit Distance Computation via Graph Neural Networks" paper.
`AttributedGraphDataset`	A variety of attributed graph datasets from the "Scaling Attributed Network Embedding to Massive Graphs" paper.
`MNISTSuperpixels`	MNIST superpixels dataset from the "Geometric Deep Learning on Graphs and Manifolds Using Mixture Model CNNs" paper, containing 70,000 graphs with 75 nodes each.
`FAUST`	The FAUST humans dataset from the "FAUST: Dataset and Evaluation for 3D Mesh Registration" paper, containing 100 watertight meshes representing 10 different poses for 10 different subjects.
`DynamicFAUST`	The dynamic FAUST humans dataset from the "Dynamic FAUST: Registering Human Bodies in Motion" paper.
`ShapeNet`	The ShapeNet part level segmentation dataset from the "A Scalable Active Framework for Region Annotation in 3D Shape Collections" paper, containing about 17,000 3D shape point clouds from 16 shape categories.
`ModelNet`	The ModelNet10/40 datasets from the "3D ShapeNets: A Deep Representation for Volumetric Shapes" paper, containing CAD models of 10 and 40 categories, respectively.
`CoMA`	The CoMA 3D faces dataset from the "Generating 3D faces using Convolutional Mesh Autoencoders" paper, containing 20,466 meshes of extreme expressions captured over 12 different subjects.
`SHREC2016`	The SHREC 2016 partial matching dataset from the "SHREC'16: Partial Matching of Deformable Shapes" paper.
`TOSCA`	The TOSCA dataset from the "Numerical Geometry of Non-Rigid Shapes" book, containing 80 meshes.
`PCPNetDataset`	The PCPNet dataset from the "PCPNet: Learning Local Shape Properties from Raw Point Clouds" paper, consisting of 30 shapes, each given as a point cloud, densely sampled with 100k points.
`S3DIS`	The (pre-processed) Stanford Large-Scale 3D Indoor Spaces dataset from the "3D Semantic Parsing of Large-Scale Indoor Spaces" paper, containing point clouds of six large-scale indoor parts in three buildings with 12 semantic elements (and one clutter class).
`GeometricShapes`	Synthetic dataset of various geometric shapes like cubes, spheres or pyramids.
`BitcoinOTC`	The Bitcoin-OTC dataset from the "EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs" paper, consisting of 138 who-trusts-whom networks of sequential time steps.
`ICEWS18`	The Integrated Crisis Early Warning System (ICEWS) dataset used in, e.g., the "Recurrent Event Network for Reasoning over Temporal Knowledge Graphs" paper, consisting of events collected from 1/1/2018 to 10/31/2018 (24-hour time granularity).
`GDELT`	The Global Database of Events, Language, and Tone (GDELT) dataset used in, e.g., the "Recurrent Event Network for Reasoning over Temporal Knowledge Graphs" paper, consisting of events collected from 1/1/2018 to 1/31/2018 (15-minute time granularity).
`DBP15K`	The DBP15K dataset from the "Cross-lingual Entity Alignment via Joint Attribute-Preserving Embedding" paper, where Chinese, Japanese and French versions of DBpedia were linked to its English version.
`WILLOWObjectClass`	The WILLOW-ObjectClass dataset from the "Learning Graphs to Match" paper, containing 10 equal keypoints of at least 40 images in each category.
`PascalVOCKeypoints`	The Pascal VOC 2011 dataset with Berkeley annotations of keypoints from the "Poselets: Body Part Detectors Trained Using 3D Human Pose Annotations" paper, containing 0 to 23 keypoints per example over 20 categories.
`PascalPF`	The Pascal-PF dataset from the "Proposal Flow" paper, containing 4 to 16 keypoints per example over 20 categories.
`SNAPDataset`	A variety of graph datasets collected from SNAP at Stanford University.
`SuiteSparseMatrixCollection`	A suite of sparse matrix benchmarks known as the Suite Sparse Matrix Collection collected from a wide range of applications.
`AMiner`	The heterogeneous AMiner dataset from the "metapath2vec: Scalable Representation Learning for Heterogeneous Networks" paper, consisting of nodes of type `"paper"`, `"author"` and `"venue"`.
`WordNet18`	The WordNet18 dataset from the "Translating Embeddings for Modeling Multi-Relational Data" paper, containing 40,943 entities, 18 relations and 151,442 fact triplets, e.g., furniture includes bed.
`WordNet18RR`	The WordNet18RR dataset from the "Convolutional 2D Knowledge Graph Embeddings" paper, containing 40,943 entities, 11 relations and 93,003 fact triplets.
`WikiCS`	The semi-supervised Wikipedia-based dataset from the "Wiki-CS: A Wikipedia-Based Benchmark for Graph Neural Networks" paper, containing 11,701 nodes, 216,123 edges, 10 classes and 20 different training splits.
`WebKB`	The WebKB datasets used in the "Geom-GCN: Geometric Graph Convolutional Networks" paper.
`WikipediaNetwork`	The Wikipedia networks introduced in the "Multi-scale Attributed Node Embedding" paper.
`Actor`	The actor-only induced subgraph of the film-director-actor-writer network used in the "Geom-GCN: Geometric Graph Convolutional Networks" paper.
`OGB_MAG`	The ogbn-mag dataset from the "Open Graph Benchmark: Datasets for Machine Learning on Graphs" paper.
`DBLP`	A subset of the DBLP computer science bibliography website, as collected in the "MAGNN: Metapath Aggregated Graph Neural Network for Heterogeneous Graph Embedding" paper.
`MovieLens`	A heterogeneous rating dataset, assembled by GroupLens Research from the MovieLens web site, consisting of nodes of type `"movie"` and `"user"`.
`IMDB`	A subset of the Internet Movie Database (IMDB), as collected in the "MAGNN: Metapath Aggregated Graph Neural Network for Heterogeneous Graph Embedding" paper.
`LastFM`	A subset of the last.fm music website keeping track of users' listening information from various sources, as collected in the "MAGNN: Metapath Aggregated Graph Neural Network for Heterogeneous Graph Embedding" paper.
`HGBDataset`	A variety of heterogeneous graph benchmark datasets from the "Are We Really Making Much Progress? Revisiting, Benchmarking, and Refining Heterogeneous Graph Neural Networks" paper.
`JODIEDataset`
`MixHopSyntheticDataset`	The MixHop synthetic dataset from the "MixHop: Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing" paper, containing 10 graphs, each with varying degree of homophily (ranging from 0.0 to 0.9).
`UPFD`	The tree-structured fake news propagation graph classification dataset from the "User Preference-aware Fake News Detection" paper.
`GitHub`	The GitHub Web and ML Developers dataset introduced in the "Multi-scale Attributed Node Embedding" paper.
`FacebookPagePage`	The Facebook Page-Page network dataset introduced in the "Multi-scale Attributed Node Embedding" paper.
`LastFMAsia`	The LastFM Asia Network dataset introduced in the "Characteristic Functions on Graphs: Birds of a Feather, from Statistical Descriptors to Parametric Models" paper.
`DeezerEurope`	The Deezer Europe dataset introduced in the "Characteristic Functions on Graphs: Birds of a Feather, from Statistical Descriptors to Parametric Models" paper.
`GemsecDeezer`	The Deezer User Network datasets introduced in the "GEMSEC: Graph Embedding with Self Clustering" paper.
`Twitch`	The Twitch Gamer networks introduced in the "Multi-scale Attributed Node Embedding" paper.
`Airports`	The Airports dataset from the "struc2vec: Learning Node Representations from Structural Identity" paper, where nodes denote airports and labels correspond to activity levels.
`BAShapes`	The BA-Shapes dataset from the "GNNExplainer: Generating Explanations for Graph Neural Networks" paper, containing a Barabasi-Albert (BA) graph with 300 nodes and a set of 80 "house"-structured graphs connected to it.
`LRGBDataset`	The "Long Range Graph Benchmark (LRGB)" datasets, a collection of 5 graph learning datasets with tasks that are based on long-range dependencies in graphs.
`MalNetTiny`	The MalNet Tiny dataset from the "A Large-Scale Database for Graph Representation Learning" paper.
`OMDB`	The Organic Materials Database (OMDB) of bulk organic crystals.
`PolBlogs`	The Political Blogs dataset from the "The Political Blogosphere and the 2004 US Election: Divided they Blog" paper.
`EmailEUCore`	An e-mail communication network of a large European research institution, taken from the "Local Higher-order Graph Clustering" paper.
`StochasticBlockModelDataset`	A synthetic graph dataset generated by the stochastic block model.
`RandomPartitionGraphDataset`	The random partition graph dataset from the "How to Find Your Friendly Neighborhood: Graph Attention Design with Self-Supervision" paper.
`LINKXDataset`	A variety of non-homophilous graph datasets from the "Large Scale Learning on Non-Homophilous Graphs: New Benchmarks and Strong Simple Methods" paper.
`EllipticBitcoinDataset`	The Elliptic Bitcoin dataset of Bitcoin transactions from the "Anti-Money Laundering in Bitcoin: Experimenting with Graph Convolutional Networks for Financial Forensics" paper.
`DGraphFin`	The DGraphFin networks from the "DGraph: A Large-Scale Financial Dataset for Graph Anomaly Detection" paper.
`HydroNet`	The HydroNet dataset from the "HydroNet: Benchmark Tasks for Preserving Intermolecular Interactions and Structural Motifs in Predictive and Generative Models for Molecular Data" paper, consisting of 5 million water clusters held together by hydrogen bonding networks.

class KarateClub(transform: Optional[Callable] = None)

Zachary’s karate club network from the “An Information Flow Model for Conflict and Fission in Small Groups” paper, containing 34 nodes, connected by 156 (undirected and unweighted) edges. Every node is labeled by one of four classes obtained via modularity-based clustering, following the “Semi-supervised Classification with Graph Convolutional Networks” paper. Training is based on a single labeled example per class, i.e. a total number of 4 labeled nodes.

Parameters: transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

Stats:

#nodes	#edges	#features	#classes
34	156	34	4
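The edge count can be read as follows, assuming the usual convention that an undirected edge is stored once per direction: 156 stored entries then correspond to 78 undirected friendships. A minimal pure-Python sketch on a toy graph (the helper `to_bidirectional` is hypothetical and not part of the library; the toy edges are not the karate club data):

```python
def to_bidirectional(undirected_edges):
    """Store every undirected edge (u, v) as both (u, v) and (v, u)."""
    directed = set()
    for u, v in undirected_edges:
        directed.add((u, v))
        directed.add((v, u))
    # Sort for a deterministic ordering by source node, then target node.
    return sorted(directed)


# Toy triangle graph with 3 undirected edges.
toy_edges = [(0, 1), (1, 2), (0, 2)]
edge_list = to_bidirectional(toy_edges)

# Two rows (sources, targets), one column per directed entry.
edge_index = [[u for u, _ in edge_list], [v for _, v in edge_list]]
print(len(edge_index[0]))  # 6 directed entries for 3 undirected edges
```

Under the same assumption, any graph with 78 distinct undirected edges yields 156 stored entries.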

class TUDataset(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None, use_node_attr: bool = False, use_edge_attr: bool = False, cleaned: bool = False)

A variety of graph kernel benchmark datasets, e.g., “IMDB-BINARY”, “REDDIT-BINARY” or “PROTEINS”, collected from the TU Dortmund University. In addition, this dataset wrapper provides cleaned dataset versions as motivated by the “Understanding Isomorphism Bias in Graph Data Sets” paper, containing only non-isomorphic graphs.

Note

Some datasets may not come with any node labels. You can then either make use of the argument use_node_attr to load additional continuous node attributes (if present) or provide synthetic node features using transforms such as torch_geometric.transforms.Constant or torch_geometric.transforms.OneHotDegree.
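To illustrate what synthetic node features can look like, here is a minimal pure-Python sketch of the two ideas named in the note: a constant feature per node and a one-hot encoding of each node's degree. The helper names `constant_features` and `one_hot_degree` are hypothetical, and clamping degrees to `max_degree` is a choice made for this sketch; the library transforms may behave differently.

```python
def constant_features(num_nodes, value=1.0):
    """Give every node the same single feature."""
    return [[value] for _ in range(num_nodes)]


def one_hot_degree(num_nodes, edges, max_degree):
    """Return one one-hot row per node, indexed by the node's degree."""
    degree = [0] * num_nodes
    for source, _ in edges:
        degree[source] += 1  # count stored entries leaving each node
    features = []
    for d in degree:
        row = [0.0] * (max_degree + 1)
        row[min(d, max_degree)] = 1.0  # clamp so the index stays in range
        features.append(row)
    return features


# Path graph 0 - 1 - 2, with each undirected edge stored in both directions.
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
features = one_hot_degree(num_nodes=3, edges=edges, max_degree=2)
print(features)  # [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
```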

Parameters

root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset.
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
use_node_attr (bool, optional) – If True, the dataset will contain additional continuous node attributes (if present). (default: False)
use_edge_attr (bool, optional) – If True, the dataset will contain additional continuous edge attributes (if present). (default: False)
cleaned (bool, optional) – If True, the dataset will contain only non-isomorphic graphs. (default: False)

Stats:

Name	#graphs	#nodes	#edges	#features	#classes
MUTAG	188	~17.9	~39.6	7	2
ENZYMES	600	~32.6	~124.3	3	6
PROTEINS	1,113	~39.1	~145.6	3	2
COLLAB	5,000	~74.5	~4,914.4	0	3
IMDB-BINARY	1,000	~19.8	~193.1	0	2
REDDIT-BINARY	2,000	~429.6	~995.5	0	2
…
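The difference between transform, pre_transform and pre_filter is mostly one of timing. The sketch below is a toy stand-in, not the library class: `TinyDataset` and the dict-based "graphs" are hypothetical, and applying pre_filter before pre_transform is an assumption of this sketch. It shows the two pre-processing hooks running once at construction, standing in for the save-to-disk step, while transform runs on every access.

```python
class TinyDataset:
    """Toy stand-in showing when each hook runs; not the library class."""

    def __init__(self, raw, transform=None, pre_transform=None, pre_filter=None):
        self.transform = transform
        # "Processing" happens once, standing in for the save-to-disk step.
        items = list(raw)
        if pre_filter is not None:
            items = [item for item in items if pre_filter(item)]
        if pre_transform is not None:
            items = [pre_transform(item) for item in items]
        self._stored = items

    def __len__(self):
        return len(self._stored)

    def __getitem__(self, index):
        item = self._stored[index]
        # transform runs on every access and never alters the stored copy.
        return item if self.transform is None else self.transform(item)


calls = {"pre_transform": 0, "transform": 0}


def add_degree(graph):
    calls["pre_transform"] += 1
    return {**graph, "avg_degree": graph["num_edges"] / graph["num_nodes"]}


def tag_access(graph):
    calls["transform"] += 1
    return {**graph, "accessed": True}


raw_graphs = [
    {"num_nodes": 4, "num_edges": 6},
    {"num_nodes": 2, "num_edges": 2},
    {"num_nodes": 10, "num_edges": 30},
]
dataset = TinyDataset(
    raw_graphs,
    transform=tag_access,
    pre_transform=add_degree,
    pre_filter=lambda g: g["num_nodes"] >= 4,  # drop very small graphs
)
first = dataset[0]
again = dataset[0]
print(len(dataset), calls)  # 2 {'pre_transform': 2, 'transform': 2}
```

Expensive, deterministic work fits pre_transform because it is paid once; anything random, such as data augmentation, fits transform because it is re-applied on each access.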

class GNNBenchmarkDataset(root: str, name: str, split: str = 'train', transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)

A variety of artificially and semi-artificially generated graph datasets from the “Benchmarking Graph Neural Networks” paper.

Note

The ZINC dataset is provided via torch_geometric.datasets.ZINC.

Parameters

root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset (one of "PATTERN", "CLUSTER", "MNIST", "CIFAR10", "TSP", "CSL").
split (string, optional) – If "train", loads the training dataset. If "val", loads the validation dataset. If "test", loads the test dataset. (default: "train")
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

Stats:

Name	#graphs	#nodes	#edges	#features	#classes
PATTERN	10,000	~118.9	~6,098.9	3	2
CLUSTER	10,000	~117.2	~4,303.9	7	6
MNIST	55,000	~70.6	~564.5	3	10
CIFAR10	45,000	~117.6	~941.2	5	10
TSP	10,000	~275.4	~6,885.0	2	2
CSL	150	~41.0	~164.0	0	10

class Planetoid(root: str, name: str, split: str = 'public', num_train_per_class: int = 20, num_val: int = 500, num_test: int = 1000, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)

The citation network datasets “Cora”, “CiteSeer” and “PubMed” from the “Revisiting Semi-Supervised Learning with Graph Embeddings” paper. Nodes represent documents and edges represent citation links. Training, validation and test splits are given by binary masks.

Parameters

root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset ("Cora", "CiteSeer", "PubMed").
split (string) – The type of dataset split ("public", "full", "geom-gcn", "random"). If set to "public", the split will be the public fixed split from the “Revisiting Semi-Supervised Learning with Graph Embeddings” paper. If set to "full", all nodes except those in the validation and test sets will be used for training (as in the “FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling” paper). If set to "geom-gcn", the 10 public fixed splits from the “Geom-GCN: Geometric Graph Convolutional Networks” paper are given. If set to "random", train, validation, and test sets will be randomly generated, according to num_train_per_class, num_val and num_test. (default: "public")
num_train_per_class (int, optional) – The number of training samples per class in case of "random" split. (default: 20)
num_val (int, optional) – The number of validation samples in case of "random" split. (default: 500)
num_test (int, optional) – The number of test samples in case of "random" split. (default: 1000)
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

Stats:

Name	#nodes	#edges	#features	#classes
Cora	2,708	10,556	1,433	7
CiteSeer	3,327	9,104	3,703	6
PubMed	19,717	88,648	500	3
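The "random" split can be pictured as follows. This is a minimal pure-Python sketch under stated assumptions, not the library's implementation: the helper `random_split_masks`, its `seed` argument and the toy labels are hypothetical, and drawing validation and test nodes from the shuffled non-training remainder is an assumption about how the three counts interact.

```python
import random


def random_split_masks(labels, num_train_per_class, num_val, num_test, seed=0):
    """Build boolean train/val/test masks over the nodes of one graph."""
    rng = random.Random(seed)
    num_nodes = len(labels)
    train_mask = [False] * num_nodes
    val_mask = [False] * num_nodes
    test_mask = [False] * num_nodes

    # Draw a fixed number of training nodes from every class.
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for i in members[:num_train_per_class]:
            train_mask[i] = True

    # Validation and test nodes come from the shuffled remainder.
    remaining = [i for i in range(num_nodes) if not train_mask[i]]
    rng.shuffle(remaining)
    for i in remaining[:num_val]:
        val_mask[i] = True
    for i in remaining[num_val:num_val + num_test]:
        test_mask[i] = True
    return train_mask, val_mask, test_mask


# Toy labels: 3 classes with 10 nodes each (not a real citation network).
labels = [c for c in range(3) for _ in range(10)]
train_mask, val_mask, test_mask = random_split_masks(
    labels, num_train_per_class=2, num_val=5, num_test=8
)
print(sum(train_mask), sum(val_mask), sum(test_mask))  # 6 5 8
```

Because the three masks are disjoint booleans over the same node set, a loss or metric can be restricted to one split by indexing with the corresponding mask.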

class FakeDataset(num_graphs: int = 1, avg_num_nodes: int = 1000, avg_degree: int = 10, num_channels: int = 64, edge_dim: int = 0, num_classes: int = 10, task: str = 'auto', is_undirected: bool = True, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, **kwargs)

A fake dataset that returns randomly generated Data objects.

Parameters

num_graphs (int, optional) – The number of graphs. (default: 1)
avg_num_nodes (int, optional) – The average number of nodes in a graph. (default: 1000)
avg_degree (int, optional) – The average degree per node. (default: 10)
num_channels (int, optional) – The number of node features. (default: 64)
edge_dim (int, optional) – The number of edge features. (default: 0)
num_classes (int, optional) – The number of classes in the dataset. (default: 10)
task (str, optional) – Whether to return node-level or graph-level labels ("node", "graph", "auto"). If set to "auto", will return graph-level labels if num_graphs > 1, and node-level labels otherwise. (default: "auto")
is_undirected (bool, optional) – Whether the graphs to generate are undirected. (default: True)
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
**kwargs (optional) – Additional attributes and their shapes e.g. global_features=5.
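The "auto" rule for task can be sketched in a few lines. The helpers `resolve_task` and `fake_labels` are hypothetical illustrations of the documented behavior, not the library's code, and the label layout (one label per node versus a single label per graph) is an assumption of this sketch.

```python
import random


def resolve_task(task, num_graphs):
    """Turn 'auto' into a concrete label granularity."""
    if task not in ("node", "graph", "auto"):
        raise ValueError(f"unknown task: {task!r}")
    if task == "auto":
        return "graph" if num_graphs > 1 else "node"
    return task


def fake_labels(task, num_graphs, num_nodes, num_classes, seed=0):
    """Random labels: one per node, or a single one for the whole graph."""
    rng = random.Random(seed)
    resolved = resolve_task(task, num_graphs)
    count = num_nodes if resolved == "node" else 1
    return resolved, [rng.randrange(num_classes) for _ in range(count)]


print(resolve_task("auto", num_graphs=1))   # node
print(resolve_task("auto", num_graphs=16))  # graph
```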

class FakeHeteroDataset(num_graphs: int = 1, num_node_types: int = 3, num_edge_types: int = 6, avg_num_nodes: int = 1000, avg_degree: int = 10, avg_num_channels: int = 64, edge_dim: int = 0, num_classes: int = 10, task: str = 'auto', transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, **kwargs)

A fake dataset that returns randomly generated HeteroData objects.

Parameters

num_graphs (int, optional) – The number of graphs. (default: 1)
num_node_types (int, optional) – The number of node types. (default: 3)
num_edge_types (int, optional) – The number of edge types. (default: 6)
avg_num_nodes (int, optional) – The average number of nodes in a graph. (default: 1000)
avg_degree (int, optional) – The average degree per node. (default: 10)
avg_num_channels (int, optional) – The average number of node features. (default: 64)
edge_dim (int, optional) – The number of edge features. (default: 0)
num_classes (int, optional) – The number of classes in the dataset. (default: 10)
task (str, optional) – Whether to return node-level or graph-level labels ("node", "graph", "auto"). If set to "auto", will return graph-level labels if num_graphs > 1, and node-level labels otherwise. (default: "auto")
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.HeteroData object and returns a transformed version. The data object will be transformed before every access. (default: None)
**kwargs (optional) – Additional attributes and their shapes e.g. global_features=5.

class NELL(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)

The NELL dataset, a knowledge graph from the “Toward an Architecture for Never-Ending Language Learning” paper. The dataset is processed as in the “Revisiting Semi-Supervised Learning with Graph Embeddings” paper.

Note

Entity nodes are described by sparse feature vectors of type torch_sparse.SparseTensor, which can be either used directly, or can be converted via data.x.to_dense(), data.x.to_scipy() or data.x.to_torch_sparse_coo_tensor().

Parameters

root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

Stats:

#nodes	#edges	#features	#classes
65,755	251,550	61,278	186

class CitationFull(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)

The full citation network datasets from the “Deep Gaussian Embedding of Graphs: Unsupervised Inductive Learning via Ranking” paper. Nodes represent documents and edges represent citation links. Datasets include citeseer, cora, cora_ml, dblp, pubmed.

Parameters

root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset ("Cora", "Cora_ML", "CiteSeer", "DBLP", "PubMed").
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

Stats:

Name	#nodes	#edges	#features	#classes
Cora	19,793	126,842	8,710	70
Cora_ML	2,995	16,316	2,879	7
CiteSeer	4,230	10,674	602	6
DBLP	17,716	105,734	1,639	4
PubMed	19,717	88,648	500	3

class CoraFull(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)

Alias for torch_geometric.datasets.CitationFull with name="cora".

Stats:

#nodes	#edges	#features	#classes
19,793	126,842	8,710	70

class Coauthor(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)

The Coauthor CS and Coauthor Physics networks from the “Pitfalls of Graph Neural Network Evaluation” paper. Nodes represent authors that are connected by an edge if they co-authored a paper. Given paper keywords for each author’s papers, the task is to map authors to their respective field of study.

Parameters

root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset ("CS", "Physics").
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

Stats:

Name	#nodes	#edges	#features	#classes
CS	18,333	163,788	6,805	15
Physics	34,493	495,924	8,415	5

class Amazon(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)

The Amazon Computers and Amazon Photo networks from the “Pitfalls of Graph Neural Network Evaluation” paper. Nodes represent goods and edges represent that two goods are frequently bought together. Given product reviews as bag-of-words node features, the task is to map goods to their respective product category.

Parameters

root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset ("Computers", "Photo").
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

Stats:

Name	#nodes	#edges	#features	#classes
Computers	13,752	491,722	767	10
Photo	7,650	238,162	745	8

class PPI(root: str, split: str = 'train', transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)

The protein-protein interaction networks from the “Predicting Multicellular Function through Multi-layer Tissue Networks” paper, containing positional gene sets, motif gene sets and immunological signatures as features (50 in total) and gene ontology sets as labels (121 in total).

Parameters

root (string) – Root directory where the dataset should be saved.
split (string) – If "train", loads the training dataset. If "val", loads the validation dataset. If "test", loads the test dataset. (default: "train")
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

Stats:

#graphs	#nodes	#edges	#features	#tasks
20	~2,245.3	~61,318.4	50	121
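The stats list #tasks rather than #classes because a node may carry several gene ontology labels at once. A minimal sketch of such a multi-label encoding, assuming each node's target is a 0/1 vector with one position per label (the helper `multi_hot` and the example label indices are hypothetical):

```python
NUM_TASKS = 121  # number of gene ontology labels stated above


def multi_hot(active_labels, num_tasks=NUM_TASKS):
    """Encode a set of active label indices as a 0/1 vector."""
    row = [0.0] * num_tasks
    for label in active_labels:
        if not 0 <= label < num_tasks:
            raise ValueError(f"label index out of range: {label}")
        row[label] = 1.0
    return row


# A hypothetical node that belongs to three label sets at once.
target = multi_hot({3, 17, 120})
print(len(target), sum(target))  # 121 3.0
```

Each of the 121 positions is an independent binary decision, which is why such targets are typically trained with a per-position binary loss rather than a single softmax over classes.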

class Reddit(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)

The Reddit dataset from the “Inductive Representation Learning on Large Graphs” paper, containing Reddit posts belonging to different communities.

Parameters

root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

Stats:

#nodes	#edges	#features	#classes
232,965	114,615,892	602	41

class Reddit2(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)

The Reddit dataset from the “GraphSAINT: Graph Sampling Based Inductive Learning Method” paper, containing Reddit posts belonging to different communities.

Note

This is a sparser version of the original Reddit dataset (~23M edges instead of ~114M edges), and is used in papers such as SGC and GraphSAINT.

Parameters

root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

Stats:

#nodes	#edges	#features	#classes
232,965	23,213,838	602	41
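To make "sparser" concrete, the following short calculation compares the two variants using only the node and edge counts from the stats tables for Reddit and Reddit2. Treating #edges as the number of stored edge entries is an assumption of this sketch.

```python
num_nodes = 232_965  # identical for both variants
num_edges = {"Reddit": 114_615_892, "Reddit2": 23_213_838}

# Average number of stored edge entries per node for each variant.
avg_per_node = {name: count / num_nodes for name, count in num_edges.items()}
for name, avg in avg_per_node.items():
    print(f"{name}: ~{avg:.1f} edge entries per node")

# How many times more edge entries the original dataset stores.
ratio = num_edges["Reddit"] / num_edges["Reddit2"]
print(f"Reddit stores ~{ratio:.1f}x as many edge entries as Reddit2")
```

The original dataset therefore stores roughly five times as many edge entries over the same set of nodes.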

class Flickr(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)

The Flickr dataset from the “GraphSAINT: Graph Sampling Based Inductive Learning Method” paper, containing descriptions and common properties of images.

Parameters

root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

Stats:

#nodes	#edges	#features	#classes
89,250	899,756	500	7

class Yelp(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)

The Yelp dataset from the “GraphSAINT: Graph Sampling Based Inductive Learning Method” paper, containing customer reviewers and their friendships.

Parameters

root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

Stats:

#nodes	#edges	#features	#tasks
716,847	13,954,819	300	100

class AmazonProducts(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The Amazon dataset from the “GraphSAINT: Graph Sampling Based Inductive Learning Method” paper, containing products and their categories.

Parameters

root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

Stats:

#nodes	#edges	#features	#classes
1,569,960	264,339,468	200	107

class QM7b(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]

The QM7b dataset from the “MoleculeNet: A Benchmark for Molecular Machine Learning” paper, consisting of 7,211 molecules with 14 regression targets.

Parameters

root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

Stats:

#graphs	#nodes	#edges	#features	#tasks
7,211	~15.4	~245.0	0	14
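The pre_filter argument is a predicate: each data object is passed to it once, and only objects for which it returns True end up in the final dataset. The following is a minimal sketch of that contract in plain Python. ToyGraph, build_dataset and at_most_20_nodes are hypothetical names introduced for illustration and are not part of torch_geometric.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ToyGraph:
    """Hypothetical stand-in for a data object; only the node count is kept."""
    num_nodes: int


def build_dataset(
    graphs: List[ToyGraph],
    pre_filter: Optional[Callable[[ToyGraph], bool]] = None,
) -> List[ToyGraph]:
    # Keep a graph only if pre_filter returns True for it.
    if pre_filter is None:
        return list(graphs)
    return [g for g in graphs if pre_filter(g)]


def at_most_20_nodes(graph: ToyGraph) -> bool:
    """Example predicate: discard graphs with more than 20 nodes."""
    return graph.num_nodes <= 20


raw = [ToyGraph(7), ToyGraph(23), ToyGraph(15)]
kept = build_dataset(raw, pre_filter=at_most_20_nodes)
print([g.num_nodes for g in kept])
```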

class QM9(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]

The QM9 dataset from the “MoleculeNet: A Benchmark for Molecular Machine Learning” paper, consisting of about 130,000 molecules with 19 regression targets. Each molecule includes complete spatial information for the single low energy conformation of the atoms in the molecule. In addition, we provide the atom features from the “Neural Message Passing for Quantum Chemistry” paper.

Target	Property	Description	Unit
0	\(\mu\)	Dipole moment	\(\textrm{D}\)
1	\(\alpha\)	Isotropic polarizability	\({a_0}^3\)
2	\(\epsilon_{\textrm{HOMO}}\)	Highest occupied molecular orbital energy	\(\textrm{eV}\)
3	\(\epsilon_{\textrm{LUMO}}\)	Lowest unoccupied molecular orbital energy	\(\textrm{eV}\)
4	\(\Delta \epsilon\)	Gap between \(\epsilon_{\textrm{HOMO}}\) and \(\epsilon_{\textrm{LUMO}}\)	\(\textrm{eV}\)
5	\(\langle R^2 \rangle\)	Electronic spatial extent	\({a_0}^2\)
6	\(\textrm{ZPVE}\)	Zero point vibrational energy	\(\textrm{eV}\)
7	\(U_0\)	Internal energy at 0K	\(\textrm{eV}\)
8	\(U\)	Internal energy at 298.15K	\(\textrm{eV}\)
9	\(H\)	Enthalpy at 298.15K	\(\textrm{eV}\)
10	\(G\)	Free energy at 298.15K	\(\textrm{eV}\)
11	\(c_{\textrm{v}}\)	Heat capacity at 298.15K	\(\frac{\textrm{cal}}{\textrm{mol K}}\)
12	\(U_0^{\textrm{ATOM}}\)	Atomization energy at 0K	\(\textrm{eV}\)
13	\(U^{\textrm{ATOM}}\)	Atomization energy at 298.15K	\(\textrm{eV}\)
14	\(H^{\textrm{ATOM}}\)	Atomization enthalpy at 298.15K	\(\textrm{eV}\)
15	\(G^{\textrm{ATOM}}\)	Atomization free energy at 298.15K	\(\textrm{eV}\)
16	\(A\)	Rotational constant	\(\textrm{GHz}\)
17	\(B\)	Rotational constant	\(\textrm{GHz}\)
18	\(C\)	Rotational constant	\(\textrm{GHz}\)
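When training on a single property, the index in the first column of the table is what selects it. The sketch below transcribes the table into a lookup and picks one target from a row of 19 values. It assumes the targets of a molecule are stored in table order; QM9_TARGETS and select_target are illustrative names, and the example row contains placeholder numbers rather than real QM9 values.

```python
# Index -> (property, unit), transcribed from the target table above.
QM9_TARGETS = {
    0: ("mu", "D"),
    1: ("alpha", "a_0^3"),
    2: ("eps_HOMO", "eV"),
    3: ("eps_LUMO", "eV"),
    4: ("delta_eps", "eV"),
    5: ("<R^2>", "a_0^2"),
    6: ("ZPVE", "eV"),
    7: ("U_0", "eV"),
    8: ("U", "eV"),
    9: ("H", "eV"),
    10: ("G", "eV"),
    11: ("c_v", "cal/(mol K)"),
    12: ("U_0^ATOM", "eV"),
    13: ("U^ATOM", "eV"),
    14: ("H^ATOM", "eV"),
    15: ("G^ATOM", "eV"),
    16: ("A", "GHz"),
    17: ("B", "GHz"),
    18: ("C", "GHz"),
}


def select_target(targets, index):
    """Return (name, unit, value) for one regression target of a molecule."""
    if len(targets) != len(QM9_TARGETS):
        raise ValueError("expected one value per target in the table")
    name, unit = QM9_TARGETS[index]
    return name, unit, targets[index]


# Placeholder values (not real QM9 data): entry i is simply float(i).
example_row = [float(i) for i in range(19)]
print(select_target(example_row, 4))
```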

Parameters

root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

Stats:

#graphs	#nodes	#edges	#features	#tasks
130,831	~18.0	~37.3	11	19

class MD17(root: str, name: str, train: Optional[bool] = None, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]

A variety of ab-initio molecular dynamics trajectories from the authors of sGDML. This class provides access to the original MD17 datasets as well as all other datasets released by sGDML since then (15 in total).

For every trajectory, the dataset contains the Cartesian positions of atoms (in Angstrom), their atomic numbers, as well as the total energy (in kcal/mol) and forces (kcal/mol/Angstrom) on each atom. The latter two are the regression targets for this collection.
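Because the QM9 targets in this module are given in eV while the energies and forces here are in kcal/mol, a unit conversion is often needed when comparing models across the two. The sketch below assumes the thermochemical calorie, for which 1 eV corresponds to roughly 23.0605 kcal/mol; the function names and the example numbers are illustrative and not taken from the dataset.

```python
# 1 eV corresponds to about 23.0605 kcal/mol (thermochemical calorie).
KCAL_PER_MOL_PER_EV = 23.0605


def kcal_mol_to_ev(value: float) -> float:
    """Convert an energy from kcal/mol to eV."""
    return value / KCAL_PER_MOL_PER_EV


def forces_to_ev_per_angstrom(forces):
    """Convert per-atom force vectors from kcal/mol/Angstrom to eV/Angstrom."""
    return [tuple(kcal_mol_to_ev(c) for c in force) for force in forces]


# Made-up values for two atoms, chosen so the result is easy to check.
forces = [(23.0605, 0.0, 0.0), (0.0, -46.121, 0.0)]
print(kcal_mol_to_ev(23.0605))
print(forces_to_ev_per_angstrom(forces))
```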

Note

Data objects contain no edge indices as these are most commonly constructed via the torch_geometric.transforms.RadiusGraph transform, with its cut-off being a hyperparameter.
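To show what constructing edges from a cut-off means, here is a minimal pure-Python sketch that connects every pair of atoms whose distance does not exceed a radius r. It is not the implementation of torch_geometric.transforms.RadiusGraph; the function name, the quadratic loop and the inclusive boundary are assumptions made for illustration, and the positions are made up.

```python
import math
from typing import List, Sequence, Tuple


def radius_graph(pos: Sequence[Sequence[float]], r: float) -> List[Tuple[int, int]]:
    """Directed edges (i, j), i != j, between atoms at distance <= r."""
    edges = []
    for i, p in enumerate(pos):
        for j, q in enumerate(pos):
            if i != j and math.dist(p, q) <= r:
                edges.append((i, j))
    return edges


# Three made-up positions (in Angstrom) on a line:
# atoms 0 and 1 are 1.0 apart, atoms 1 and 2 are 1.5 apart.
pos = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.5, 0.0, 0.0)]

small = radius_graph(pos, r=1.2)
large = radius_graph(pos, r=2.0)
print(small)
print(large)
```

Raising the cut-off from 1.2 to 2.0 adds the edges between atoms 1 and 2, which is why the radius is treated as a hyperparameter: it directly controls the connectivity the model sees.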

Some of the trajectories were computed at different levels of theory, and for most molecules there exist two versions: a long trajectory at the DFT level of theory and a short trajectory at the coupled cluster level of theory. Check the table below for detailed information on the molecule, level of theory and number of data points contained in each dataset. The name argument determines which trajectory is loaded. For the coupled cluster trajectories, the dataset comes with pre-defined training and testing splits, which are loaded separately via the train argument.

When using these datasets, make sure to cite the appropriate publications listed on the sGDML website.

Molecule	Level of Theory	Name	#Examples
Benzene	DFT	`benzene`	627,983
Benzene	DFT FHI-aims	`benzene FHI-aims`	49,863
Benzene	CCSD(T)	`benzene CCSD(T)`	1,500
Uracil	DFT	`uracil`	133,770
Naphthalene	DFT	`napthalene`	326,250
Aspirin	DFT	`aspirin`	211,762
Aspirin	CCSD	`aspirin CCSD`	1,500
Salicylic acid	DFT	`salicylic acid`	320,231
Malonaldehyde	DFT	`malonaldehyde`	993,237
Malonaldehyde	CCSD(T)	`malonaldehyde CCSD(T)`	1,500
Ethanol	DFT	`ethanol`	555,092
Ethanol	CCSD(T)	`ethanol CCSD(T)`	2,000
Toluene	DFT	`toluene`	442,790
Toluene	CCSD(T)	`toluene CCSD(T)`	1,501
Paracetamol	DFT	`paracetamol`	106,490
Azobenzene	DFT	`azobenzene`	99,999

Parameters

root (string) – Root directory where the dataset should be saved.
name (string) – Keyword of the trajectory that should be loaded.
train (bool, optional) – Determines whether the train or test split gets loaded for the coupled cluster trajectories. (default: None)
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

Stats:

Name	#graphs	#nodes
Benzene FHI-aims	49,863	12
Benzene	627,983	12
Benzene CCSD-T	1,500	12
Uracil	133,770	12
Naphthalene	326,250	10
Aspirin	211,762	21
Aspirin CCSD-T	1,500	21
Salicylic acid	320,231	16
Malonaldehyde	993,237	9
Malonaldehyde CCSD-T	1,500	9
Ethanol	555,092	9
Ethanol CCSD-T	2,000	9
Toluene	442,790	15
Toluene CCSD-T	1,501	15
Paracetamol	106,490	20
Azobenzene	99,999	24

class ZINC(root: str, subset: bool = False, split: str = 'train', transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]

The ZINC dataset from the ZINC database and the “Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules” paper, containing about 250,000 molecular graphs with up to 38 heavy atoms. The task is to regress the penalized logP (also called constrained solubility in some works), given by y = logP - SAS - cycles, where logP is the water-octanol partition coefficient, SAS is the synthetic accessibility score, and cycles denotes the number of cycles with more than six atoms. Penalized logP is a score commonly used for training molecular generation models, see, e.g., the “Junction Tree Variational Autoencoder for Molecular Graph Generation” and “Grammar Variational Autoencoder” papers.
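The regression target defined above is a simple difference of three quantities. As a worked example, the function below evaluates y = logP - SAS - cycles for one set of inputs. The function name is illustrative and the input values are made up, not taken from the dataset.

```python
def penalized_logp(logp: float, sas: float, cycles: int) -> float:
    """y = logP - SAS - cycles, as defined in the description above.

    `cycles` is the number of cycles with more than six atoms.
    """
    return logp - sas - cycles


# Made-up inputs: logP = 2.5, SAS = 3.0, one cycle with more than six atoms.
y = penalized_logp(logp=2.5, sas=3.0, cycles=1)
print(y)  # 2.5 - 3.0 - 1 = -1.5
```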

Parameters

root (string) – Root directory where the dataset should be saved.
subset (boolean, optional) – If set to True, will only load a subset of the dataset (12,000 molecular graphs), following the “Benchmarking Graph Neural Networks” paper. (default: False)
split (string, optional) – If "train", loads the training dataset. If "val", loads the validation dataset. If "test", loads the test dataset. (default: "train")
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

Stats:

Name	#graphs	#nodes	#edges	#features	#classes
ZINC Full	249,456	~23.2	~49.8	1	1
ZINC Subset	12,000	~23.2	~49.8	1	1

class AQSOL(root: str, split: str = 'train', transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]

The AQSOL dataset from the “Benchmarking Graph Neural Networks” paper, based on AqSolDB, a standardized database of 9,982 molecular graphs with their aqueous solubility values, collected from 9 different data sources.

The aqueous solubility targets are collected from experimental measurements and standardized to LogS units in AqSolDB. These final values denote the property to regress in the AQSOL dataset. After filtering out a few graphs with no bonds/edges, the total number of molecular graphs is 9,833. For each molecular graph, the node features are the types of heavy atoms and the edge features are the types of bonds between them, similar to the ZINC dataset.

Parameters

root (string) – Root directory where the dataset should be saved.
split (string, optional) – If "train", loads the training dataset. If "val", loads the validation dataset. If "test", loads the test dataset. (default: "train")
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

Stats:

#graphs	#nodes	#edges	#features	#classes
9,833	~17.6	~35.8	1	1

class MoleculeNet(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]

The MoleculeNet benchmark collection from the “MoleculeNet: A Benchmark for Molecular Machine Learning” paper, containing datasets from physical chemistry, biophysics and physiology. All datasets come with the additional node and edge features introduced by the Open Graph Benchmark.

Parameters

root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset ("ESOL", "FreeSolv", "Lipo", "PCBA", "MUV", "HIV", "BACE", "BBBP", "Tox21", "ToxCast", "SIDER", "ClinTox").
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

Stats:

Name	#graphs	#nodes	#edges	#features	#classes
ESOL	1,128	~13.3	~27.4	9	1
FreeSolv	642	~8.7	~16.8	9	1
Lipophilicity	4,200	~27.0	~59.0	9	1
PCBA	437,929	~26.0	~56.2	9	128
MUV	93,087	~24.2	~52.6	9	17
HIV	41,127	~25.5	~54.9	9	1
BACE	1,513	~34.1	~73.7	9	1
BBBP	2,050	~23.9	~51.6	9	1
Tox21	7,831	~18.6	~38.6	9	12
ToxCast	8,597	~18.7	~38.4	9	617
SIDER	1,427	~33.6	~70.7	9	27
ClinTox	1,484	~26.1	~55.5	9	2

class Entities(root: str, name: str, hetero: bool = False, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The relational entities networks “AIFB”, “MUTAG”, “BGS” and “AM” from the “Modeling Relational Data with Graph Convolutional Networks” paper. Training and test splits are given by node indices.

Parameters

root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset ("AIFB", "MUTAG", "BGS", "AM").
hetero (bool, optional) – If set to True, will save the dataset as a HeteroData object. (default: False)
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

Stats:

Name	#nodes	#edges	#classes
AIFB	8,285	58,086	4
AM	1,666,764	11,976,642	11
MUTAG	23,644	148,454	2
BGS	333,845	1,832,398	2

class RelLinkPredDataset(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The relational link prediction datasets from the “Modeling Relational Data with Graph Convolutional Networks” paper. Training and test splits are given by sets of triplets.

Parameters

root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset ("FB15k-237").
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

Stats:

#nodes	#edges	#features	#classes
14,541	544,230	0	0

class GEDDataset(root: str, name: str, train: bool = True, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]

The GED datasets from the “Graph Edit Distance Computation via Graph Neural Networks” paper. GEDs can be accessed via the global attributes ged and norm_ged for all train/train graph pairs and all train/test graph pairs:

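from torch_geometric.datasets import GEDDataset

# `root` is the directory where the dataset should be saved.
dataset = GEDDataset(root, name="LINUX")
data1, data2 = dataset[0], dataset[1]
ged = dataset.ged[data1.i, data2.i]  # GED between `data1` and `data2`.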

Note that GEDs are not available if both graphs are from the test set. For evaluation, it is recommended to pair up each graph from the test set with each graph in the training set.
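The recommended evaluation protocol amounts to a Cartesian product of the test set with the training set. The sketch below enumerates those pairs with itertools.product on plain integer indices. The index lists are hypothetical and merely play the role of the per-graph index used to look up a GED; no dataset is loaded.

```python
from itertools import product


def evaluation_pairs(train_ids, test_ids):
    """Pair every test graph with every training graph."""
    return list(product(test_ids, train_ids))


# Hypothetical global graph indices: three training and two test graphs.
train_ids = [0, 1, 2]
test_ids = [3, 4]

pairs = evaluation_pairs(train_ids, test_ids)
print(len(pairs))
print(pairs[:3])
```

Every pair produced this way contains exactly one training graph, so none of them falls into the test/test case for which no GED is available.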

Note

ALKANE is missing GEDs for train/test graph pairs since they are not provided in the official datasets.

Parameters

root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset (one of "AIDS700nef", "LINUX", "ALKANE", "IMDBMulti").
train (bool, optional) – If True, loads the training dataset, otherwise the test dataset. (default: True)
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

Stats:

Name	#graphs	#nodes	#edges	#features
AIDS700nef	700	~8.9	~17.6	29
LINUX	1,000	~7.6	~13.9	0
ALKANE	150	~8.9	~15.8	0
IMDBMulti	1,500	~13.0	~131.9	0

class AttributedGraphDataset(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

A variety of attributed graph datasets from the “Scaling Attributed Network Embedding to Massive Graphs” paper.

Parameters

root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset ("Wiki", "Cora", "CiteSeer", "PubMed", "BlogCatalog", "PPI", "Flickr", "Facebook", "Twitter", "TWeibo", "MAG").
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class MNISTSuperpixels(root: str, train: bool = True, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]

The MNIST superpixels dataset from the “Geometric Deep Learning on Graphs and Manifolds Using Mixture Model CNNs” paper, containing 70,000 graphs with 75 nodes each. Every graph is labeled by one of 10 classes.

Parameters

root (string) – Root directory where the dataset should be saved.
train (bool, optional) – If True, loads the training dataset, otherwise the test dataset. (default: True)
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

Stats:

#graphs	#nodes	#edges	#features	#classes
70,000	75	~1,393.0	1	10

class FAUST(root: str, train: bool = True, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]

The FAUST humans dataset from the “FAUST: Dataset and Evaluation for 3D Mesh Registration” paper, containing 100 watertight meshes representing 10 different poses for 10 different subjects.

Note

Data objects hold mesh faces instead of edge indices. To convert the mesh to a graph, use the torch_geometric.transforms.FaceToEdge as pre_transform. To convert the mesh to a point cloud, use the torch_geometric.transforms.SamplePoints as transform to sample a fixed number of points on the mesh faces according to their face area.
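What converting mesh faces to edges involves can be sketched in a few lines: each triangle contributes its three sides, stored in both directions, and sides shared by neighbouring triangles are kept only once. This is an illustration in plain Python, not the implementation of torch_geometric.transforms.FaceToEdge; faces_to_edges is a hypothetical name and the two triangles are made up.

```python
from typing import List, Set, Tuple


def faces_to_edges(faces: List[Tuple[int, int, int]]) -> List[Tuple[int, int]]:
    """Collect the sides of all triangles as directed pairs, without duplicates."""
    edges: Set[Tuple[int, int]] = set()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            edges.add((u, v))
            edges.add((v, u))
    return sorted(edges)


# Two made-up triangles that share the side between vertices 1 and 2.
faces = [(0, 1, 2), (1, 3, 2)]
edges = faces_to_edges(faces)
print(len(edges))
```

The two triangles have five distinct sides, hence ten directed edges. The same counting appears to explain the Stats table of this class: a closed triangle mesh without holes or handles has 3V - 6 undirected edges, which for V = 6,890 gives 20,664, or 41,328 when both directions are stored.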

Parameters

root (string) – Root directory where the dataset should be saved.
train (bool, optional) – If True, loads the training dataset, otherwise the test dataset. (default: True)
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

Stats:

#graphs	#nodes	#edges	#features	#classes
100	6,890	41,328	3	10

class DynamicFAUST(root: str, subjects: Optional[List[str]] = None, categories: Optional[List[str]] = None, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]

The dynamic FAUST humans dataset from the “Dynamic FAUST: Registering Human Bodies in Motion” paper.

Note

Data objects hold mesh faces instead of edge indices. To convert the mesh to a graph, use the torch_geometric.transforms.FaceToEdge as pre_transform. To convert the mesh to a point cloud, use the torch_geometric.transforms.SamplePoints as transform to sample a fixed number of points on the mesh faces according to their face area.

Parameters

root (string) – Root directory where the dataset should be saved.
subjects (list, optional) – List of subjects to include in the dataset. Can include the subjects "50002", "50004", "50007", "50009", "50020", "50021", "50022", "50025", "50026", "50027". If set to None, the dataset will contain all subjects. (default: None)
categories (list, optional) – List of categories to include in the dataset. Can include the categories "chicken_wings", "hips", "jiggle_on_toes", "jumping_jacks", "knees", "light_hopping_loose", "light_hopping_stiff", "one_leg_jump", "one_leg_loose", "personal_move", "punching", "running_on_spot", "running_on_spot_bugfix", "shake_arms", "shake_hips", "shoulders". If set to None, the dataset will contain all categories. (default: None)
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class ShapeNet(root: str, categories: Optional[Union[str, List[str]]] = None, include_normals: bool = True, split: str = 'trainval', transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]

The ShapeNet part level segmentation dataset from the “A Scalable Active Framework for Region Annotation in 3D Shape Collections” paper, containing about 17,000 3D shape point clouds from 16 shape categories. Each category is annotated with 2 to 6 parts.

Parameters

root (string) – Root directory where the dataset should be saved.
categories (string or [string], optional) – The category of the CAD models (one or a combination of "Airplane", "Bag", "Cap", "Car", "Chair", "Earphone", "Guitar", "Knife", "Lamp", "Laptop", "Motorbike", "Mug", "Pistol", "Rocket", "Skateboard", "Table"). Can be explicitly set to None to load all categories. (default: None)
include_normals (bool, optional) – If set to False, will not include normal vectors as input features to data.x. As a result, data.x will be None. (default: True)
split (string, optional) – If "train", loads the training dataset. If "val", loads the validation dataset. If "trainval", loads the training and validation dataset. If "test", loads the test dataset. (default: "trainval")
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

Stats:

#graphs	#nodes	#edges	#features	#classes
16,881	~2,616.2	0	3	50

class ModelNet(root: str, name: str = '10', train: bool = True, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]

The ModelNet10/40 datasets from the “3D ShapeNets: A Deep Representation for Volumetric Shapes” paper, containing CAD models of 10 and 40 categories, respectively.

Note

Data objects hold mesh faces instead of edge indices. To convert the mesh to a graph, use the torch_geometric.transforms.FaceToEdge as pre_transform. To convert the mesh to a point cloud, use the torch_geometric.transforms.SamplePoints as transform to sample a fixed number of points on the mesh faces according to their face area.
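Sampling points "according to their face area" means that a face is chosen with probability proportional to its area, and a point is then drawn uniformly inside it. The sketch below illustrates that idea in plain Python with random.choices and barycentric coordinates. It is not the implementation of torch_geometric.transforms.SamplePoints; all names, the fixed seed and the toy mesh (a unit square split into two triangles) are assumptions for illustration.

```python
import random
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def triangle_area(a: Vec, b: Vec, c: Vec) -> float:
    """Half the norm of the cross product of two edge vectors."""
    u = (b[0] - a[0], b[1] - a[1], b[2] - a[2])
    v = (c[0] - a[0], c[1] - a[1], c[2] - a[2])
    cx = u[1] * v[2] - u[2] * v[1]
    cy = u[2] * v[0] - u[0] * v[2]
    cz = u[0] * v[1] - u[1] * v[0]
    return 0.5 * (cx * cx + cy * cy + cz * cz) ** 0.5


def sample_points(pos: Sequence[Vec], faces: Sequence[Tuple[int, int, int]],
                  num: int, seed: int = 0) -> List[Vec]:
    """Draw `num` points, choosing faces with probability proportional to area."""
    rng = random.Random(seed)
    areas = [triangle_area(pos[i], pos[j], pos[k]) for i, j, k in faces]
    chosen = rng.choices(list(faces), weights=areas, k=num)
    points = []
    for i, j, k in chosen:
        a, b, c = pos[i], pos[j], pos[k]
        u, v = rng.random(), rng.random()
        if u + v > 1.0:  # reflect the sample back into the triangle
            u, v = 1.0 - u, 1.0 - v
        points.append(tuple(a[d] + u * (b[d] - a[d]) + v * (c[d] - a[d])
                            for d in range(3)))
    return points


# Toy mesh: the unit square in the z = 0 plane, split into two triangles.
pos = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
faces = [(0, 1, 2), (0, 2, 3)]

points = sample_points(pos, faces, num=5)
print(len(points))
```

Weighting by area keeps the point density roughly uniform over the surface, so large and small faces are covered evenly instead of every face receiving the same number of points.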

Parameters

root (string) – Root directory where the dataset should be saved.
name (string, optional) – The name of the dataset ("10" for ModelNet10, "40" for ModelNet40). (default: "10")
train (bool, optional) – If True, loads the training dataset, otherwise the test dataset. (default: True)
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

Stats:

Name	#graphs	#nodes	#edges	#features	#classes
ModelNet10	4,899	~9,508.2	~37,450.5	3	10
ModelNet40	12,311	~17,744.4	~66,060.9	3	40

class CoMA(root: str, train: bool = True, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]

The CoMA 3D faces dataset from the “Generating 3D faces using Convolutional Mesh Autoencoders” paper, containing 20,466 meshes of extreme expressions captured over 12 different subjects.

Note

Data objects hold mesh faces instead of edge indices. To convert the mesh to a graph, use the torch_geometric.transforms.FaceToEdge as pre_transform. To convert the mesh to a point cloud, use the torch_geometric.transforms.SamplePoints as transform to sample a fixed number of points on the mesh faces according to their face area.

Parameters

root (string) – Root directory where the dataset should be saved.
train (bool, optional) – If True, loads the training dataset, otherwise the test dataset. (default: True)
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

Stats:

#graphs	#nodes	#edges	#features	#classes
20,465	5,023	29,990	3	12

class SHREC2016(root: str, partiality: str, category: str, train: bool = True, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]

The SHREC 2016 partial matching dataset from the “SHREC’16: Partial Matching of Deformable Shapes” paper. The reference shape can be accessed via dataset.ref.

Note

Data objects hold mesh faces instead of edge indices. To convert the mesh to a graph, use the torch_geometric.transforms.FaceToEdge as pre_transform. To convert the mesh to a point cloud, use the torch_geometric.transforms.SamplePoints as transform to sample a fixed number of points on the mesh faces according to their face area.

Parameters

root (string) – Root directory where the dataset should be saved.
partiality (string) – The partiality of the dataset (one of "Holes", "Cuts").
category (string) – The category of the dataset (one of "Cat", "Centaur", "David", "Dog", "Horse", "Michael", "Victoria", "Wolf").
train (bool, optional) – If True, loads the training dataset, otherwise the test dataset. (default: True)
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class TOSCA(root: str, categories: Optional[List[str]] = None, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]

The TOSCA dataset from the “Numerical Geometry of Non-Rigid Shapes” book, containing 80 meshes. Meshes within the same category have the same triangulation and an equal number of vertices numbered in a compatible way.

Note

Data objects hold mesh faces instead of edge indices. To convert the mesh to a graph, use the torch_geometric.transforms.FaceToEdge as pre_transform. To convert the mesh to a point cloud, use the torch_geometric.transforms.SamplePoints as transform to sample a fixed number of points on the mesh faces according to their face area.

Parameters

root (string) – Root directory where the dataset should be saved.
categories (list, optional) – List of categories to include in the dataset. Can include the categories "Cat", "Centaur", "David", "Dog", "Gorilla", "Horse", "Michael", "Victoria", "Wolf". If set to None, the dataset will contain all categories. (default: None)
transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
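
As a rough illustration of the note above, the sketch below shows what converting mesh faces to edges amounts to: every triangle contributes its three sides as undirected edges. The helper `faces_to_edges` is a hypothetical pure-Python stand-in written for this example, not the library's implementation, and `load_tosca_as_graphs` assumes that torch_geometric is installed and that `root` is a path you are happy to download into.

```python
def faces_to_edges(faces):
    """Return the sorted, de-duplicated directed edge pairs of a triangle mesh.

    Hypothetical helper for illustration only. Each undirected edge is
    stored in both directions, as is common for graph edge indices.
    """
    edges = set()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            edges.add((u, v))
            edges.add((v, u))
    return sorted(edges)


def load_tosca_as_graphs(root, categories=None):
    """Load TOSCA with faces converted to edges once, before saving to disk."""
    # Imported lazily so the helper above runs without torch_geometric.
    import torch_geometric.transforms as T
    from torch_geometric.datasets import TOSCA

    return TOSCA(root, categories=categories, pre_transform=T.FaceToEdge())


# Two triangles sharing the side (1, 2) yield five undirected edges.
edges = faces_to_edges([(0, 1, 2), (1, 2, 3)])
print(len(edges) // 2)
```

Passing the conversion as pre_transform rather than transform means it runs once during processing instead of on every access.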

class PCPNetDataset(root: str, category: str, split: str = 'train', transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]

The PCPNet dataset from the “PCPNet: Learning Local Shape Properties from Raw Point Clouds” paper, consisting of 30 shapes, each given as a point cloud, densely sampled with 100k points. For each shape, surface normals and local curvatures are given as node features.

Parameters

root (string) – Root directory where the dataset should be saved.
category (string) – The training set category (one of "NoNoise", "Noisy", "VarDensity", "NoisyAndVarDensity" for split="train" or split="val", or one of "All", "LowNoise", "MedNoise", "HighNoise", "VarDensityStriped", "VarDensityGradient" for split="test").
split (string) – If "train", loads the training dataset. If "val", loads the validation dataset. If "test", loads the test dataset. (default: "train")
transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
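
Because the valid category values depend on the split, it can help to check the combination before constructing the dataset. The following is a minimal sketch: `check_pcpnet_args` and `load_pcpnet` are hypothetical helper names, the category names are copied from the parameter description above, and the loader assumes torch_geometric is installed.

```python
# Category names copied from the parameter description above.
TRAIN_VAL_CATEGORIES = ("NoNoise", "Noisy", "VarDensity", "NoisyAndVarDensity")
TEST_CATEGORIES = ("All", "LowNoise", "MedNoise", "HighNoise",
                   "VarDensityStriped", "VarDensityGradient")


def check_pcpnet_args(category, split="train"):
    """Raise ValueError if `category` is not documented for `split`."""
    if split in ("train", "val"):
        allowed = TRAIN_VAL_CATEGORIES
    elif split == "test":
        allowed = TEST_CATEGORIES
    else:
        raise ValueError(f"unknown split: {split!r}")
    if category not in allowed:
        raise ValueError(f"{category!r} is not valid for split={split!r}")
    return category, split


def load_pcpnet(root, category, split="train"):
    """Validate the arguments, then build the dataset."""
    category, split = check_pcpnet_args(category, split)
    from torch_geometric.datasets import PCPNetDataset  # lazy import
    return PCPNetDataset(root, category, split=split)
```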

class S3DIS(root: str, test_area: int = 6, train: bool = True, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]

The (pre-processed) Stanford Large-Scale 3D Indoor Spaces dataset from the “3D Semantic Parsing of Large-Scale Indoor Spaces” paper, containing point clouds of six large-scale indoor parts in three buildings with 12 semantic elements (and one clutter class).

Parameters

root (string) – Root directory where the dataset should be saved.
test_area (int, optional) – Which area to use for testing (1-6). (default: 6)
train (bool, optional) – If True, loads the training dataset, otherwise the test dataset. (default: True)
transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
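
The test_area and train arguments together select one side of a held-out-area split. The sketch below assumes that the training set consists of the five areas other than test_area, which the description above implies but does not state outright. `s3dis_areas` and `load_s3dis` are hypothetical helper names, and the loader assumes torch_geometric is installed.

```python
def s3dis_areas(test_area=6):
    """Return (training areas, test areas) for a held-out `test_area`.

    Assumption: every area other than `test_area` is used for training.
    """
    if test_area not in range(1, 7):
        raise ValueError("test_area must be between 1 and 6")
    train_areas = [area for area in range(1, 7) if area != test_area]
    return train_areas, [test_area]


def load_s3dis(root, test_area=6):
    """Return the (train, test) datasets for one held-out area."""
    from torch_geometric.datasets import S3DIS  # lazy import
    train = S3DIS(root, test_area=test_area, train=True)
    test = S3DIS(root, test_area=test_area, train=False)
    return train, test


print(s3dis_areas(5))
```

Looping test_area over 1 to 6 gives a six-fold cross-validation over areas.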

class GeometricShapes(root: str, train: bool = True, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]

Synthetic dataset of various geometric shapes like cubes, spheres or pyramids.

Note

Data objects hold mesh faces instead of edge indices. To convert the mesh to a graph, use the torch_geometric.transforms.FaceToEdge as pre_transform. To convert the mesh to a point cloud, use the torch_geometric.transforms.SamplePoints as transform to sample a fixed number of points on the mesh faces according to their face area.

Parameters

root (string) – Root directory where the dataset should be saved.
train (bool, optional) – If True, loads the training dataset, otherwise the test dataset. (default: True)
transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
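
To make the point-cloud conversion in the note above concrete, here is a pure-Python sketch of area-weighted sampling on mesh faces: a face is picked with probability proportional to its area, then a point is drawn uniformly inside it. `triangle_area` and `sample_points` are hypothetical helpers written for this example and are not the library's implementation. `load_shapes_as_point_clouds` assumes torch_geometric is installed, and the value 1024 is an arbitrary choice.

```python
import math
import random


def triangle_area(p, q, r):
    """Area of a 3D triangle: half the norm of the cross product."""
    ux, uy, uz = (q[i] - p[i] for i in range(3))
    vx, vy, vz = (r[i] - p[i] for i in range(3))
    cx, cy, cz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    return 0.5 * math.sqrt(cx * cx + cy * cy + cz * cz)


def sample_points(pos, faces, num, rng=None):
    """Sample `num` points on the mesh, weighting faces by their area."""
    rng = rng or random.Random(0)
    areas = [triangle_area(pos[a], pos[b], pos[c]) for a, b, c in faces]
    chosen = rng.choices(faces, weights=areas, k=num)
    points = []
    for a, b, c in chosen:
        # Uniform barycentric weights via the square-root trick.
        s, t = math.sqrt(rng.random()), rng.random()
        w = (1 - s, s * (1 - t), s * t)
        points.append(tuple(
            w[0] * pos[a][i] + w[1] * pos[b][i] + w[2] * pos[c][i]
            for i in range(3)))
    return points


def load_shapes_as_point_clouds(root, num_points=1024):
    """Load GeometricShapes, sampling a fresh point cloud on every access."""
    import torch_geometric.transforms as T  # lazy import
    from torch_geometric.datasets import GeometricShapes
    return GeometricShapes(root, train=True,
                           transform=T.SamplePoints(num_points))


# A unit square in the z = 0 plane, split into two triangles.
pos = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
faces = [(0, 1, 2), (0, 2, 3)]
points = sample_points(pos, faces, num=100)
```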

class BitcoinOTC(root: str, edge_window_size: int = 10, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The Bitcoin-OTC dataset from the “EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs” paper, consisting of 138 who-trusts-whom networks of sequential time steps.

Parameters

root (string) – Root directory where the dataset should be saved.
edge_window_size (int, optional) – The window size for the existence of an edge in the graph sequence since its initial creation. (default: 10)
transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
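
One reading of edge_window_size is that an edge created at time step t stays in the graph sequence for edge_window_size consecutive snapshots, starting at t. The sketch below illustrates that reading on plain tuples. `snapshot_edges` is a hypothetical helper, and the exact windowing used by the dataset is an assumption here.

```python
def snapshot_edges(edges, edge_window_size=10):
    """Group timestamped edges into one edge list per time step.

    `edges` is a list of (source, target, time_step) tuples. An edge
    created at step t is kept for steps t, t + 1, ..., up to
    t + edge_window_size - 1 (assumed semantics).
    """
    if not edges:
        return []
    last_step = max(t for _, _, t in edges)
    snapshots = []
    for step in range(last_step + 1):
        snapshots.append([
            (u, v) for u, v, t in edges
            if step - edge_window_size < t <= step
        ])
    return snapshots


toy_edges = [(0, 1, 0), (1, 2, 1), (2, 3, 3)]
for step, snapshot in enumerate(snapshot_edges(toy_edges, edge_window_size=2)):
    print(step, snapshot)
```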

class ICEWS18(root: str, split: str = 'train', transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]

The Integrated Crisis Early Warning System (ICEWS) dataset used in, e.g., the “Recurrent Event Network for Reasoning over Temporal Knowledge Graphs” paper, consisting of events collected from 1/1/2018 to 10/31/2018 (24-hour time granularity).

Parameters

root (string) – Root directory where the dataset should be saved.
split (string) – If "train", loads the training dataset. If "val", loads the validation dataset. If "test", loads the test dataset. (default: "train")
transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class GDELT(root: str, split: str = 'train', transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]

The Global Database of Events, Language, and Tone (GDELT) dataset used in, e.g., the “Recurrent Event Network for Reasoning over Temporal Knowledge Graphs” paper, consisting of events collected from 1/1/2018 to 1/31/2018 (15-minute time granularity).

Parameters

root (string) – Root directory where the dataset should be saved.
split (string) – If "train", loads the training dataset. If "val", loads the validation dataset. If "test", loads the test dataset. (default: "train")
transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
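
The date ranges and granularities quoted for ICEWS18 and GDELT bound the number of distinct time steps each dataset can contain. The arithmetic below is a small worked example that assumes the end dates are inclusive. It gives an upper bound implied by the descriptions, not a statistic read from the data, and `num_time_steps` is a hypothetical helper.

```python
from datetime import date, timedelta


def num_time_steps(start, end, granularity):
    """Number of `granularity`-sized slots from `start` through `end`.

    Assumption: `end` is inclusive, so one day is added to the span.
    """
    span = (end - start) + timedelta(days=1)
    return span // granularity


icews18_steps = num_time_steps(date(2018, 1, 1), date(2018, 10, 31),
                               timedelta(hours=24))
gdelt_steps = num_time_steps(date(2018, 1, 1), date(2018, 1, 31),
                             timedelta(minutes=15))
print(icews18_steps)  # 304 daily slots
print(gdelt_steps)    # 2976 slots of 15 minutes (31 days * 96 per day)
```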

class DBP15K(root: str, pair: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The DBP15K dataset from the “Cross-lingual Entity Alignment via Joint Attribute-Preserving Embedding” paper, where Chinese, Japanese and French versions of DBpedia were linked to its English version. Node features are given by pre-trained and aligned monolingual word embeddings from the “Cross-lingual Knowledge Graph Alignment via Graph Matching Neural Network” paper.

Parameters

root (string) – Root directory where the dataset should be saved.
pair (string) – The pair of languages ("en_zh", "en_fr", "en_ja", "zh_en", "fr_en", "ja_en").
transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class WILLOWObjectClass(root: str, category: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]

The WILLOW-ObjectClass dataset from the “Learning Graphs to Match” paper, containing 10 equal keypoints of at least 40 images in each category. The keypoints contain interpolated features from a pre-trained VGG16 model on ImageNet (relu4_2 and relu5_1).

Parameters

root (string) – Root directory where the dataset should be saved.
category (string) – The category of the images (one of "Car", "Duck", "Face", "Motorbike", "Winebottle").
transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class PascalVOCKeypoints(root: str, category: str, train: bool = True, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]

The Pascal VOC 2011 dataset with Berkeley annotations of keypoints from the “Poselets: Body Part Detectors Trained Using 3D Human Pose Annotations” paper, containing 0 to 23 keypoints per example over 20 categories. The dataset is pre-filtered to exclude difficult, occluded and truncated objects. The keypoints contain interpolated features from a pre-trained VGG16 model on ImageNet (relu4_2 and relu5_1).

Parameters

root (string) – Root directory where the dataset should be saved.
category (string) – The category of the images (one of "Aeroplane", "Bicycle", "Bird", "Boat", "Bottle", "Bus", "Car", "Cat", "Chair", "Cow", "Diningtable", "Dog", "Horse", "Motorbike", "Person", "Pottedplant", "Sheep", "Sofa", "Train", "TVMonitor").
train (bool, optional) – If True, loads the training dataset, otherwise the test dataset. (default: True)
transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class PascalPF(root: str, category: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]

The Pascal-PF dataset from the “Proposal Flow” paper, containing 4 to 16 keypoints per example over 20 categories.

Parameters

root (string) – Root directory where the dataset should be saved.
category (string) – The category of the images (one of "Aeroplane", "Bicycle", "Bird", "Boat", "Bottle", "Bus", "Car", "Cat", "Chair", "Cow", "Diningtable", "Dog", "Horse", "Motorbike", "Person", "Pottedplant", "Sheep", "Sofa", "Train", "TVMonitor").
transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class SNAPDataset(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]

A variety of graph datasets collected from SNAP at Stanford University.

Parameters

root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset.
transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in an torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class SuiteSparseMatrixCollection(root: str, group: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

A suite of sparse matrix benchmarks known as the Suite Sparse Matrix Collection collected from a wide range of applications.

Parameters

root (string) – Root directory where the dataset should be saved.
group (string) – The group of the sparse matrix.
name (string) – The name of the sparse matrix.
transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class AMiner(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The heterogeneous AMiner dataset from the “metapath2vec: Scalable Representation Learning for Heterogeneous Networks” paper, consisting of nodes of type "paper", "author" and "venue". Venue categories and author research interests are available as ground truth labels for a subset of nodes.

Parameters

root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in an torch_geometric.data.HeteroData object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.HeteroData object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class WordNet18(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The WordNet18 dataset from the “Translating Embeddings for Modeling Multi-Relational Data” paper, containing 40,943 entities, 18 relations and 151,442 fact triplets, e.g., furniture includes bed.

Note

The original WordNet18 dataset suffers from test leakage, i.e., more than 80% of test triplets can be found in the training set with another relation type. Therefore, it should no longer be used for research evaluation. We recommend using its cleaned version WordNet18RR instead.

Parameters

root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
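
The test leakage described in the note above can be measured on any set of triplets. The sketch below counts test triplets whose entity pair also occurs in the training set under a different relation type. Treating the pair as unordered (so that inverse relations count as leakage) is an assumption of this example, and `leaked_fraction` is a hypothetical helper operating on plain (head, relation, tail) tuples.

```python
def leaked_fraction(train, test):
    """Fraction of test triplets leaked through another relation type.

    A test triplet (h, r, t) counts as leaked if the training set links
    h and t, in either direction, with some relation other than r.
    """
    relations_by_pair = {}
    for h, r, t in train:
        relations_by_pair.setdefault(frozenset((h, t)), set()).add(r)
    leaked = 0
    for h, r, t in test:
        other_relations = relations_by_pair.get(frozenset((h, t)), set()) - {r}
        if other_relations:
            leaked += 1
    return leaked / len(test) if test else 0.0


# Toy triplets; the relation names are made up for illustration.
train = [("bed", "is_a", "furniture")]
test = [("furniture", "includes", "bed"), ("cat", "is_a", "animal")]
print(leaked_fraction(train, test))
```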

class WordNet18RR(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The WordNet18RR dataset from the “Convolutional 2D Knowledge Graph Embeddings” paper, containing 40,943 entities, 11 relations and 93,003 fact triplets.

Parameters

root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class WikiCS(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, is_undirected: Optional[bool] = None)[source]

The semi-supervised Wikipedia-based dataset from the “Wiki-CS: A Wikipedia-Based Benchmark for Graph Neural Networks” paper, containing 11,701 nodes, 216,123 edges, 10 classes and 20 different training splits.

Parameters

root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
is_undirected (bool, optional) – Whether the graph is undirected. If left as None, the graph is treated as undirected. (default: None)

class WebKB(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The WebKB datasets used in the “Geom-GCN: Geometric Graph Convolutional Networks” paper. Nodes represent web pages and edges represent hyperlinks between them. Node features are the bag-of-words representation of web pages. The task is to classify the nodes into one of the five categories, student, project, course, staff, and faculty.

Parameters

root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset ("Cornell", "Texas", "Wisconsin").
transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class WikipediaNetwork(root: str, name: str, geom_gcn_preprocess: bool = True, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The Wikipedia networks introduced in the “Multi-scale Attributed Node Embedding” paper. Nodes represent web pages and edges represent hyperlinks between them. Node features represent several informative nouns in the Wikipedia pages. The task is to predict the average daily traffic of the web page.

Parameters

root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset ("chameleon", "crocodile", "squirrel").
geom_gcn_preprocess (bool) – If set to True, will load the pre-processed data as introduced in the “Geom-GCN: Geometric Graph Convolutional Networks” paper (https://arxiv.org/abs/2002.05287), in which the average monthly traffic of the web page is converted into five categories to predict. If set to True, the dataset "crocodile" is not available.
transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
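
Since "crocodile" is unavailable when geom_gcn_preprocess is True, a small guard can catch the invalid combination early. This is a minimal sketch: `check_wikipedia_args` and `load_wikipedia_network` are hypothetical helper names, and the loader assumes torch_geometric is installed.

```python
# Dataset names copied from the parameter description above.
WIKIPEDIA_NAMES = ("chameleon", "crocodile", "squirrel")


def check_wikipedia_args(name, geom_gcn_preprocess=True):
    """Raise ValueError for undocumented or unavailable combinations."""
    if name not in WIKIPEDIA_NAMES:
        raise ValueError(f"unknown dataset name: {name!r}")
    if geom_gcn_preprocess and name == "crocodile":
        raise ValueError(
            '"crocodile" requires geom_gcn_preprocess=False')
    return name


def load_wikipedia_network(root, name, geom_gcn_preprocess=True):
    """Validate the arguments, then build the dataset."""
    name = check_wikipedia_args(name, geom_gcn_preprocess)
    from torch_geometric.datasets import WikipediaNetwork  # lazy import
    return WikipediaNetwork(root, name,
                            geom_gcn_preprocess=geom_gcn_preprocess)
```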

class Actor(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The actor-only induced subgraph of the film-director-actor-writer network used in the “Geom-GCN: Geometric Graph Convolutional Networks” paper. Each node corresponds to an actor, and the edge between two nodes denotes co-occurrence on the same Wikipedia page. Node features correspond to some keywords in the Wikipedia pages. The task is to classify the nodes into five categories in terms of the words on the actor’s Wikipedia page.

Parameters

root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class OGB_MAG(root: str, preprocess: Optional[str] = None, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The ogbn-mag dataset from the “Open Graph Benchmark: Datasets for Machine Learning on Graphs” paper. ogbn-mag is a heterogeneous graph composed of a subset of the Microsoft Academic Graph (MAG). It contains four types of entities — papers (736,389 nodes), authors (1,134,649 nodes), institutions (8,740 nodes), and fields of study (59,965 nodes) — as well as four types of directed relations connecting two types of entities. Each paper is associated with a 128-dimensional word2vec feature vector, while all other node types are not associated with any input features. The task is to predict the venue (conference or journal) of each paper. In total, there are 349 different venues.

Parameters

root (string) – Root directory where the dataset should be saved.
preprocess (string, optional) – Pre-processes the original dataset by adding structural features ("metapath2vec", "TransE") to featureless nodes. (default: None)
transform (callable, optional) – A function/transform that takes in an torch_geometric.data.HeteroData object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.HeteroData object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class DBLP(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

A subset of the DBLP computer science bibliography website, as collected in the “MAGNN: Metapath Aggregated Graph Neural Network for Heterogeneous Graph Embedding” paper. DBLP is a heterogeneous graph containing four types of entities - authors (4,057 nodes), papers (14,328 nodes), terms (7,723 nodes), and conferences (20 nodes). The authors are divided into four research areas (database, data mining, artificial intelligence, information retrieval). Each author is described by a bag-of-words representation of their paper keywords.

Parameters

root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in an torch_geometric.data.HeteroData object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.HeteroData object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class MovieLens(root, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, model_name: Optional[str] = 'all-MiniLM-L6-v2')[source]

A heterogeneous rating dataset, assembled by GroupLens Research from the MovieLens web site, consisting of nodes of type "movie" and "user". User ratings for movies are available as ground truth labels for the edges between the users and the movies ("user", "rates", "movie").

Parameters

root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in an torch_geometric.data.HeteroData object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.HeteroData object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
model_name (str) – Name of the model used to transform movie titles to node features. The model comes from Huggingface SentenceTransformer (https://huggingface.co/sentence-transformers). (default: "all-MiniLM-L6-v2")

class IMDB(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

A subset of the Internet Movie Database (IMDB), as collected in the “MAGNN: Metapath Aggregated Graph Neural Network for Heterogeneous Graph Embedding” paper. IMDB is a heterogeneous graph containing three types of entities - movies (4,278 nodes), actors (5,257 nodes), and directors (2,081 nodes). The movies are divided into three classes (action, comedy, drama) according to their genre. Movie features correspond to elements of a bag-of-words representation of its plot keywords.

Parameters

root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in an torch_geometric.data.HeteroData object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.HeteroData object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class LastFM(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

A subset of the last.fm music website keeping track of users’ listening information from various sources, as collected in the “MAGNN: Metapath Aggregated Graph Neural Network for Heterogeneous Graph Embedding” paper. last.fm is a heterogeneous graph containing three types of entities - users (1,892 nodes), artists (17,632 nodes), and artist tags (1,088 nodes). This dataset can be used for link prediction, and no labels or features are provided.

Parameters

root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in an torch_geometric.data.HeteroData object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.HeteroData object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class HGBDataset(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

A variety of heterogeneous graph benchmark datasets from the “Are We Really Making Much Progress? Revisiting, Benchmarking, and Refining Heterogeneous Graph Neural Networks” paper.

Note

Test labels are randomly given to prevent data leakage issues. If you want to obtain final test performance, you will need to submit your model predictions to the HGB leaderboard.

Parameters

root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset (one of "ACM", "DBLP", "Freebase", "IMDB").
transform (callable, optional) – A function/transform that takes in an torch_geometric.data.HeteroData object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.HeteroData object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class JODIEDataset(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

class MixHopSyntheticDataset(root: str, homophily: float, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The MixHop synthetic dataset from the “MixHop: Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing” paper, containing 10 graphs, each with varying degree of homophily (ranging from 0.0 to 0.9). All graphs have 5,000 nodes, where each node corresponds to 1 out of 10 classes. The feature values of the nodes are sampled from a 2D Gaussian distribution, which is distinct for each class.

Parameters

root (string) – Root directory where the dataset should be saved.
homophily (float) – The degree of homophily (one of 0.0, 0.1, …, 0.9).
transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in an torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
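
The description does not define homophily. A common measure, assumed here, is the edge homophily ratio: the fraction of edges whose endpoints share a class. The sketch below computes it on a toy graph and lists the ten documented homophily values. `edge_homophily` and `load_mixhop` are hypothetical helper names, and the loader assumes torch_geometric is installed.

```python
# The ten documented settings: 0.0, 0.1, ..., 0.9.
HOMOPHILY_LEVELS = [i / 10 for i in range(10)]


def edge_homophily(edges, labels):
    """Fraction of edges that connect two nodes of the same class."""
    if not edges:
        return 0.0
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


def load_mixhop(root, homophily):
    """Build the synthetic graph for one documented homophily level."""
    if homophily not in HOMOPHILY_LEVELS:
        raise ValueError(f"homophily must be one of {HOMOPHILY_LEVELS}")
    from torch_geometric.datasets import MixHopSyntheticDataset  # lazy import
    return MixHopSyntheticDataset(root, homophily=homophily)


# Toy graph: nodes 0 and 1 share a class, node 2 differs.
toy_labels = [0, 0, 1]
toy_edges = [(0, 1), (1, 2), (0, 2), (1, 0)]
print(edge_homophily(toy_edges, toy_labels))  # 0.5
```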

class UPFD(root: str, name: str, feature: str, split: str = 'train', transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]

The tree-structured fake news propagation graph classification dataset from the “User Preference-aware Fake News Detection” paper. It includes two sets of tree-structured fake & real news propagation graphs extracted from Twitter. For a single graph, the root node represents the source news, and leaf nodes represent Twitter users who retweeted the same root news. A user node has an edge to the news node if and only if the user retweeted the root news directly. Two user nodes have an edge if and only if one user retweeted the root news from the other user. Four different node features are encoded using different encoders. Please refer to GNN-FakeNews repo for more details.

Note

For an example of using UPFD, see examples/upfd.py.
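
The edge rules described for UPFD can be sketched in plain Python. The record format `(user, retweeted_from)`, the choice of index 0 for the news node, the parent-to-child edge direction, and the helper name `build_propagation_edges` are all assumptions made for illustration; the actual dataset ships pre-built graphs.

```python
from typing import List, Optional, Tuple

NEWS = 0  # index of the root node (the source news item) in this sketch


def build_propagation_edges(
    retweets: List[Tuple[int, Optional[int]]],
) -> List[Tuple[int, int]]:
    """Build (parent, child) edges from (user, retweeted_from) records.

    `retweeted_from` is None when the user retweeted the news directly,
    otherwise it is the user whose retweet was shared.
    """
    edges = []
    for user, source in retweets:
        parent = NEWS if source is None else source
        edges.append((parent, user))
    return edges


# Users 1 and 2 retweet the news directly; user 3 retweets from user 1.
records = [(1, None), (2, None), (3, 1)]
edges = build_propagation_edges(records)
```

Because every user has exactly one parent, the result has one edge fewer than it has nodes, which is what makes each propagation graph a tree.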

Parameters

root (string) – Root directory where the dataset should be saved.
name (string) – The name of the graph set ("politifact", "gossipcop").
feature (string) – The node feature type ("profile", "spacy", "bert", "content"). If set to "profile", the 10-dimensional node feature is composed of ten Twitter user profile attributes. If set to "spacy", the 300-dimensional node feature is composed of Twitter user historical tweets encoded by the spaCy word2vec encoder. If set to "bert", the 768-dimensional node feature is composed of Twitter user historical tweets encoded by bert-as-service. If set to "content", the 310-dimensional node feature is composed of a 300-dimensional “spacy” vector plus a 10-dimensional “profile” vector.
split (string, optional) – If "train", loads the training dataset. If "val", loads the validation dataset. If "test", loads the test dataset. (default: "train")
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
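
The difference between `transform`, `pre_transform` and `pre_filter` is mostly a matter of when each callable runs. The toy class below is a minimal sketch of that timing, not the library implementation: `ToyDataset` is hypothetical, plain integers stand in for data objects, and applying `pre_filter` before `pre_transform` is an assumption of this sketch.

```python
from typing import Any, Callable, List, Optional


class ToyDataset:
    """Toy stand-in showing when each callable runs (not the PyG class)."""

    def __init__(
        self,
        raw_items: List[Any],
        transform: Optional[Callable] = None,
        pre_transform: Optional[Callable] = None,
        pre_filter: Optional[Callable] = None,
    ):
        self.transform = transform
        # "Processing" happens once; the result is what would be saved to disk.
        processed = []
        for item in raw_items:
            if pre_filter is not None and not pre_filter(item):
                continue  # excluded from the final dataset
            if pre_transform is not None:
                item = pre_transform(item)
            processed.append(item)
        self._stored = processed

    def __len__(self) -> int:
        return len(self._stored)

    def __getitem__(self, idx: int) -> Any:
        item = self._stored[idx]
        # Applied on every access; the stored item is left untouched.
        return item if self.transform is None else self.transform(item)


toy = ToyDataset(
    raw_items=[1, 2, 3, 4],
    pre_filter=lambda x: x % 2 == 0,  # keep even items only
    pre_transform=lambda x: x * 10,   # applied once, before "saving"
    transform=lambda x: x + 1,        # applied at access time
)
```

Expensive, deterministic preprocessing fits `pre_transform`, since it runs once; cheap or randomized augmentation fits `transform`, since it runs on each access.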

class GitHub(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The GitHub Web and ML Developers dataset introduced in the “Multi-scale Attributed Node Embedding” paper. Nodes represent developers on GitHub and edges are mutual follower relationships. It contains 37,300 nodes, 578,006 edges, 128 node features and 2 classes.

Parameters

root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class FacebookPagePage(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The Facebook Page-Page network dataset introduced in the “Multi-scale Attributed Node Embedding” paper. Nodes represent verified pages on Facebook and edges are mutual likes. It contains 22,470 nodes, 342,004 edges, 128 node features and 4 classes.

Parameters

root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class LastFMAsia(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The LastFM Asia Network dataset introduced in the “Characteristic Functions on Graphs: Birds of a Feather, from Statistical Descriptors to Parametric Models” paper. Nodes represent LastFM users from Asia and edges are friendships. It contains 7,624 nodes, 55,612 edges, 128 node features and 18 classes.

Parameters

root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class DeezerEurope(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The Deezer Europe dataset introduced in the “Characteristic Functions on Graphs: Birds of a Feather, from Statistical Descriptors to Parametric Models” paper. Nodes represent European users of Deezer and edges are mutual follower relationships. It contains 28,281 nodes, 185,504 edges, 128 node features and 2 classes.

Parameters

root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class GemsecDeezer(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The Deezer User Network datasets introduced in the “GEMSEC: Graph Embedding with Self Clustering” paper. Nodes represent Deezer users and edges are mutual friendships. The task is multi-label multi-class node classification about the genres liked by the users.

Parameters

root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset ("HU", "HR", "RO").
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class Twitch(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The Twitch Gamer networks introduced in the “Multi-scale Attributed Node Embedding” paper. Nodes represent gamers on Twitch and edges are followerships between them. Node features represent embeddings of games played by the Twitch users. The task is to predict whether a user streams mature content.

Parameters

root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset ("DE", "EN", "ES", "FR", "PT", "RU").
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
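
A possible way to load one of the Twitch networks is sketched below. The wrapper `load_twitch`, the constant `VALID_TWITCH_NAMES` and the root path in the commented call are hypothetical; only the `Twitch(root, name)` signature and the list of names come from the documentation above. The wrapper checks `name` up front so that a typo fails before any download is attempted.

```python
VALID_TWITCH_NAMES = ("DE", "EN", "ES", "FR", "PT", "RU")


def load_twitch(root: str, name: str):
    """Validate `name`, then construct the dataset (may download data)."""
    if name not in VALID_TWITCH_NAMES:
        raise ValueError(
            f"name must be one of {VALID_TWITCH_NAMES}, got {name!r}"
        )
    # Imported lazily so that validation works without torch_geometric.
    from torch_geometric.datasets import Twitch

    return Twitch(root=root, name=name)


# Example call (commented out because it triggers a download):
# dataset = load_twitch("data/Twitch", "DE")
# data = dataset[0]  # the single graph stored in the dataset
```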

class Airports(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The Airports dataset from the “struc2vec: Learning Node Representations from Structural Identity” paper, where nodes denote airports and labels correspond to activity levels. Features are given by one-hot encoded node identifiers, as described in the “GraLSP: Graph Neural Networks with Local Structural Patterns” paper.

Parameters

root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset ("USA", "Brazil", "Europe").
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class BAShapes(connection_distribution: str = 'random', transform: Optional[Callable] = None)[source]

The BA-Shapes dataset from the “GNNExplainer: Generating Explanations for Graph Neural Networks” paper, containing a Barabasi-Albert (BA) graph with 300 nodes and a set of 80 “house”-structured graphs connected to it.

Parameters

connection_distribution (string, optional) – Specifies how the houses and the BA graph get connected. Valid inputs are "random" (random BA graph nodes are selected for connection to the houses), and "uniform" (uniformly distributed BA graph nodes are selected for connection to the houses). (default: "random")
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
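
To make the construction concrete, the sketch below attaches 5-node "house" motifs to a base graph of 300 nodes and shows the two ways of choosing connection nodes. It is a simplified illustration rather than the library code: the BA graph itself is not generated, the evenly spaced selection for "uniform", the choice of which house node links to the base graph, and the helper names are all assumptions.

```python
import random
from typing import List, Tuple

# A "house" motif: a 4-node square (0-1-2-3) with a roof node (4) on top.
HOUSE_EDGES = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 4), (1, 4)]
HOUSE_SIZE = 5


def pick_connection_nodes(
    num_base: int,
    num_houses: int,
    connection_distribution: str = "random",
    seed: int = 0,
) -> List[int]:
    """Choose one base-graph node per house."""
    if connection_distribution == "random":
        return random.Random(seed).sample(range(num_base), num_houses)
    if connection_distribution == "uniform":
        # Evenly spaced node indices (an assumption of this sketch).
        return [i * num_base // num_houses for i in range(num_houses)]
    raise ValueError(f"unknown distribution: {connection_distribution!r}")


def attach_houses(
    num_base: int, anchors: List[int]
) -> Tuple[int, List[Tuple[int, int]]]:
    """Return the total node count and the edges added by the houses."""
    edges = []
    next_id = num_base
    for anchor in anchors:
        edges += [(next_id + u, next_id + v) for u, v in HOUSE_EDGES]
        edges.append((anchor, next_id))  # link the house to the base graph
        next_id += HOUSE_SIZE
    return next_id, edges


anchors = pick_connection_nodes(300, 80, "uniform")
total_nodes, house_edges = attach_houses(300, anchors)
```

With 300 base nodes and 80 houses of 5 nodes each, the combined graph in this sketch has 700 nodes.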

class LRGBDataset(root: str, name: str, split: str = 'train', transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]

The “Long Range Graph Benchmark (LRGB)” datasets, a collection of 5 graph learning datasets with tasks that are based on long-range dependencies in graphs. See the original source code for more details on the individual datasets.

Dataset	Domain	Task
`PascalVOC-SP`	Computer Vision	Node Classification
`COCO-SP`	Computer Vision	Node Classification
`PCQM-Contact`	Quantum Chemistry	Link Prediction
`Peptides-func`	Chemistry	Graph Classification
`Peptides-struct`	Chemistry	Graph Regression

Parameters

root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset (one of "PascalVOC-SP", "COCO-SP", "PCQM-Contact", "Peptides-func", "Peptides-struct").
split (string, optional) – If "train", loads the training dataset. If "val", loads the validation dataset. If "test", loads the test dataset. (default: "train")
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

Stats:

Name	#graphs	#nodes	#edges	#classes
PascalVOC-SP	11,355	~479.40	~2,710.48	21
COCO-SP	123,286	~476.88	~2,693.67	81
PCQM-Contact	529,434	~30.14	~61.09	1
Peptides-func	15,535	~150.94	~307.30	10
Peptides-struct	15,535	~150.94	~307.30	11

class MalNetTiny(root: str, split: Optional[str] = None, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]

The MalNet Tiny dataset from the “A Large-Scale Database for Graph Representation Learning” paper. MalNetTiny contains 5,000 malicious and benign software function call graphs across 5 different types. Each graph contains at most 5k nodes.

Parameters

root (string) – Root directory where the dataset should be saved.
split (string, optional) – If "train", loads the training dataset. If "val", loads the validation dataset. If "trainval", loads the training and validation dataset. If "test", loads the test dataset. If None, loads the entire dataset. (default: None)
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class OMDB(root: str, train: bool = True, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]

The Organic Materials Database (OMDB) of bulk organic crystals.

Parameters

root (string) – Root directory where the dataset should be saved.
train (bool, optional) – If True, loads the training dataset, otherwise the test dataset. (default: True)
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

class PolBlogs(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The Political Blogs dataset from the “The Political Blogosphere and the 2004 US Election: Divided they Blog” paper.

Polblogs is a graph with 1,490 vertices (representing political blogs) and 19,025 edges (links between blogs). The links are automatically extracted from a crawl of the front page of the blog. Each vertex receives a label indicating the political leaning of the blog: liberal or conservative.

Parameters

root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

Stats:

#nodes	#edges	#features	#classes
1,490	19,025	0	2

class EmailEUCore(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

An e-mail communication network of a large European research institution, taken from the “Local Higher-order Graph Clustering” paper. Nodes indicate members of the institution. An edge between a pair of members indicates that they exchanged at least one email. Node labels indicate membership in one of the 42 departments.

Parameters

root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class StochasticBlockModelDataset(root: str, block_sizes: Union[List[int], Tensor], edge_probs: Union[List[List[float]], Tensor], num_channels: Optional[int] = None, is_undirected: bool = True, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, **kwargs)[source]

A synthetic graph dataset generated by the stochastic block model. The node features of each block are sampled from normal distributions where the centers of clusters are vertices of a hypercube, as computed by the sklearn.datasets.make_classification() method.

Parameters

root (string) – Root directory where the dataset should be saved.
block_sizes ([int] or LongTensor) – The sizes of blocks.
edge_probs ([[float]] or FloatTensor) – The density of edges going from each block to each other block. Must be symmetric if the graph is undirected.
num_channels (int, optional) – The number of node features. If given as None, node features are not generated. (default: None)
is_undirected (bool, optional) – Whether the graph to generate is undirected. (default: True)
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
**kwargs (optional) – The keyword arguments that are passed down to the sklearn.datasets.make_classification() method for drawing node features.
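
The roles of `block_sizes`, `edge_probs` and `is_undirected` can be illustrated with a small pure-Python sampler. This is a minimal sketch of the stochastic block model itself, assuming no self-loops and omitting node features; `sample_sbm` and its `seed` argument are hypothetical and do not reflect how the library generates its graphs internally.

```python
import random
from typing import List, Tuple


def sample_sbm(
    block_sizes: List[int],
    edge_probs: List[List[float]],
    is_undirected: bool = True,
    seed: int = 0,
) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Return per-node block labels and a sampled edge list."""
    rng = random.Random(seed)
    labels = [b for b, size in enumerate(block_sizes) for _ in range(size)]
    n = len(labels)
    edges = []
    for u in range(n):
        # For undirected graphs, visit each unordered pair only once.
        start = u + 1 if is_undirected else 0
        for v in range(start, n):
            if u == v:
                continue  # no self-loops in this sketch
            if rng.random() < edge_probs[labels[u]][labels[v]]:
                edges.append((u, v))
    return labels, edges


# Two blocks that are fully connected inside and disconnected across.
labels, edges = sample_sbm([3, 2], [[1.0, 0.0], [0.0, 1.0]])
```

In the undirected case only `edge_probs[a][b]` for the visited pair order is consulted, which is why the matrix must be symmetric for the result to be well defined.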

class RandomPartitionGraphDataset(root, num_classes: int, num_nodes_per_class: int, node_homophily_ratio: float, average_degree: float, num_channels: Optional[int] = None, is_undirected: bool = True, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, **kwargs)[source]

The random partition graph dataset from the “How to Find Your Friendly Neighborhood: Graph Attention Design with Self-Supervision” paper. This is a synthetic graph of communities controlled by the node homophily and the average degree, and each community is considered as a class. The node features are sampled from normal distributions where the centers of clusters are vertices of a hypercube, as computed by the sklearn.datasets.make_classification() method.

Parameters

root (string) – Root directory where the dataset should be saved.
num_classes (int) – The number of classes.
num_nodes_per_class (int) – The number of nodes per class.
node_homophily_ratio (float) – The degree of node homophily.
average_degree (float) – The average degree of the graph.
num_channels (int, optional) – The number of node features. If given as None, node features are not generated. (default: None)
is_undirected (bool, optional) – Whether the graph to generate is undirected. (default: True)
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
**kwargs (optional) – The keyword arguments that are passed down to the sklearn.datasets.make_classification() method for drawing node features.
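
One way to see how `node_homophily_ratio` and `average_degree` can control a partition graph is to translate them into block-model edge probabilities. The derivation below is a sketch under stated assumptions, not necessarily what the library computes: with C classes of n nodes each, a node expects about n * p_in same-class neighbors and (C - 1) * n * p_out other-class neighbors; requiring these to sum to the average degree d, with a fraction h of them in the same class, gives the two formulas in the code. The helper name `partition_edge_probs` is hypothetical.

```python
from typing import Tuple


def partition_edge_probs(
    num_classes: int,
    num_nodes_per_class: int,
    node_homophily_ratio: float,
    average_degree: float,
) -> Tuple[float, float]:
    """Return (p_in, p_out) under the approximations described above."""
    if num_classes < 2:
        raise ValueError("need at least two classes")
    # Expected same-class neighbors: n * p_in = h * d.
    p_in = node_homophily_ratio * average_degree / num_nodes_per_class
    # Expected other-class neighbors: (C - 1) * n * p_out = (1 - h) * d.
    p_out = (
        (1.0 - node_homophily_ratio)
        * average_degree
        / (num_nodes_per_class * (num_classes - 1))
    )
    return p_in, p_out


p_in, p_out = partition_edge_probs(
    num_classes=10,
    num_nodes_per_class=500,
    node_homophily_ratio=0.5,
    average_degree=5.0,
)
```

The resulting pair could then serve as the diagonal and off-diagonal entries of an edge probability matrix for a block model with equally sized blocks.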

class LINKXDataset(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

A variety of non-homophilous graph datasets from the “Large Scale Learning on Non-Homophilous Graphs: New Benchmarks and Strong Simple Methods” paper.

Note

Some of the datasets provided in LINKXDataset are from other sources, but have been updated with new features and/or labels.

Parameters

root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset ("penn94", "reed98", "amherst41", "cornell5", "johnshopkins55", "genius").
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

class EllipticBitcoinDataset(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The Elliptic Bitcoin dataset of Bitcoin transactions from the “Anti-Money Laundering in Bitcoin: Experimenting with Graph Convolutional Networks for Financial Forensics” paper.

EllipticBitcoinDataset maps Bitcoin transactions to real entities belonging to licit categories (exchanges, wallet providers, miners, licit services, etc.) versus illicit ones (scams, malware, terrorist organizations, ransomware, Ponzi schemes, etc.).

There are 203,769 transactions (nodes) and 234,355 directed payment flows (edges), with two percent of nodes (4,545) labelled as illicit and twenty-one percent of nodes (42,019) labelled as licit. The remaining transactions are unlabelled (unknown).

Parameters

root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

Stats:

#nodes	#edges	#features	#classes
203,769	234,355	165	2
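
The label proportions quoted for EllipticBitcoinDataset can be checked with a few lines of arithmetic. The counts below are taken from the description above; the variable names are illustrative only.

```python
NUM_NODES = 203_769
NUM_ILLICIT = 4_545
NUM_LICIT = 42_019

# Transactions that carry neither label.
num_unknown = NUM_NODES - NUM_ILLICIT - NUM_LICIT

pct_illicit = 100 * NUM_ILLICIT / NUM_NODES
pct_licit = 100 * NUM_LICIT / NUM_NODES
pct_unknown = 100 * num_unknown / NUM_NODES

print(f"illicit: {pct_illicit:.2f}%")
print(f"licit:   {pct_licit:.2f}%")
print(f"unknown: {pct_unknown:.2f}% ({num_unknown:,} nodes)")
```

Since more than three quarters of the nodes are unlabelled, any supervised evaluation has to be restricted to the labelled subset.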

class DGraphFin(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]

The DGraphFin networks from the “DGraph: A Large-Scale Financial Dataset for Graph Anomaly Detection” paper. It is a directed, unweighted dynamic graph consisting of millions of nodes and edges, representing a realistic user-to-user social network in the financial industry. Each node represents a Finvolution user, and an edge from one user to another means that the user regards the other user as an emergency contact person. Each edge is associated with a timestamp ranging from 1 to 821 and a type of emergency contact ranging from 0 to 11.

Parameters

root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

Stats:

#nodes	#edges	#features	#classes
3,700,550	4,300,999	17	2

class HydroNet(root: str, name: Optional[str] = None, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, num_workers: int = 8, clusters: Optional[Union[int, List[int]]] = None, use_processed: bool = True)[source]

The HydroNet dataset from the “HydroNet: Benchmark Tasks for Preserving Intermolecular Interactions and Structural Motifs in Predictive and Generative Models for Molecular Data” paper, consisting of 5 million water clusters held together by hydrogen bonding networks. This dataset provides atomic coordinates and total energy in kcal/mol for each cluster.

Parameters

root (string) – Root directory where the dataset should be saved.
name (string, optional) – Name of the subset of the full dataset to use: "small" uses 500k graphs sampled from the "medium" dataset, "medium" uses 2.7m graphs with a maximum size of 75 nodes. Mutually exclusive with the clusters argument. (default: None)
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
num_workers (int) – Number of multiprocessing workers to use for pre-processing the dataset. (default: 8)
clusters (int or List[int], optional) – Select a subset of clusters from the full dataset. If set to None, will select all. (default: None)
use_processed (bool) – Option to use a pre-processed version of the original xyz dataset. (default: True)

len() → int[source]: Returns the number of graphs stored in the dataset.