torch_geometric.datasets
Zachary's karate club network from the "An Information Flow Model for Conflict and Fission in Small Groups" paper, containing 34 nodes, connected by 156 (undirected and unweighted) edges. |
|
A variety of graph kernel benchmark datasets, e.g., "IMDB-BINARY", "REDDIT-BINARY" or "PROTEINS", collected from the TU Dortmund University. |
|
A variety of artificially and semi-artificially generated graph datasets from the "Benchmarking Graph Neural Networks" paper. |
|
The citation network datasets "Cora", "CiteSeer" and "PubMed" from the "Revisiting Semi-Supervised Learning with Graph Embeddings" paper. |
|
A fake dataset that returns randomly generated Data objects. |
|
A fake dataset that returns randomly generated HeteroData objects. |
|
The NELL dataset, a knowledge graph from the "Toward an Architecture for Never-Ending Language Learning" paper. |
|
The full citation network datasets from the "Deep Gaussian Embedding of Graphs: Unsupervised Inductive Learning via Ranking" paper. |
|
Alias for torch_geometric.datasets.CitationFull with name="cora". |
|
The Coauthor CS and Coauthor Physics networks from the "Pitfalls of Graph Neural Network Evaluation" paper. |
|
The Amazon Computers and Amazon Photo networks from the "Pitfalls of Graph Neural Network Evaluation" paper. |
|
The protein-protein interaction networks from the "Predicting Multicellular Function through Multi-layer Tissue Networks" paper, containing positional gene sets, motif gene sets and immunological signatures as features (50 in total) and gene ontology sets as labels (121 in total). |
|
The Reddit dataset from the "Inductive Representation Learning on Large Graphs" paper, containing Reddit posts belonging to different communities. |
|
The Reddit dataset from the "GraphSAINT: Graph Sampling Based Inductive Learning Method" paper, containing Reddit posts belonging to different communities. |
|
The Flickr dataset from the "GraphSAINT: Graph Sampling Based Inductive Learning Method" paper, containing descriptions and common properties of images. |
|
The Yelp dataset from the "GraphSAINT: Graph Sampling Based Inductive Learning Method" paper, containing customer reviewers and their friendship. |
|
The Amazon dataset from the "GraphSAINT: Graph Sampling Based Inductive Learning Method" paper, containing products and their categories. |
|
The QM7b dataset from the "MoleculeNet: A Benchmark for Molecular Machine Learning" paper, consisting of 7,211 molecules with 14 regression targets. |
|
The QM9 dataset from the "MoleculeNet: A Benchmark for Molecular Machine Learning" paper, consisting of about 130,000 molecules with 19 regression targets. |
|
A variety of ab-initio molecular dynamics trajectories from the authors of sGDML. |
|
The ZINC dataset from the ZINC database and the "Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules" paper, containing about 250,000 molecular graphs with up to 38 heavy atoms. |
|
The AQSOL dataset from the "Benchmarking Graph Neural Networks" paper, based on AqSolDB, a standardized database of 9,982 molecular graphs with their aqueous solubility values, collected from 9 different data sources. |
|
The MoleculeNet benchmark collection from the "MoleculeNet: A Benchmark for Molecular Machine Learning" paper, containing datasets from physical chemistry, biophysics and physiology. |
|
The relational entities networks "AIFB", "MUTAG", "BGS" and "AM" from the "Modeling Relational Data with Graph Convolutional Networks" paper. |
|
The relational link prediction datasets from the "Modeling Relational Data with Graph Convolutional Networks" paper. |
|
The GED datasets from the "Graph Edit Distance Computation via Graph Neural Networks" paper. |
|
A variety of attributed graph datasets from the "Scaling Attributed Network Embedding to Massive Graphs" paper. |
|
MNIST superpixels dataset from the "Geometric Deep Learning on Graphs and Manifolds Using Mixture Model CNNs" paper, containing 70,000 graphs with 75 nodes each. |
|
The FAUST humans dataset from the "FAUST: Dataset and Evaluation for 3D Mesh Registration" paper, containing 100 watertight meshes representing 10 different poses for 10 different subjects. |
|
The dynamic FAUST humans dataset from the "Dynamic FAUST: Registering Human Bodies in Motion" paper. |
|
The ShapeNet part level segmentation dataset from the "A Scalable Active Framework for Region Annotation in 3D Shape Collections" paper, containing about 17,000 3D shape point clouds from 16 shape categories. |
|
The ModelNet10/40 datasets from the "3D ShapeNets: A Deep Representation for Volumetric Shapes" paper, containing CAD models of 10 and 40 categories, respectively. |
|
The CoMA 3D faces dataset from the "Generating 3D faces using Convolutional Mesh Autoencoders" paper, containing 20,466 meshes of extreme expressions captured over 12 different subjects. |
|
The SHREC 2016 partial matching dataset from the "SHREC'16: Partial Matching of Deformable Shapes" paper. |
|
The TOSCA dataset from the "Numerical Geometry of Non-Rigid Shapes" book, containing 80 meshes. |
|
The PCPNet dataset from the "PCPNet: Learning Local Shape Properties from Raw Point Clouds" paper, consisting of 30 shapes, each given as a point cloud, densely sampled with 100k points. |
|
The (pre-processed) Stanford Large-Scale 3D Indoor Spaces dataset from the "3D Semantic Parsing of Large-Scale Indoor Spaces" paper, containing point clouds of six large-scale indoor parts in three buildings with 12 semantic elements (and one clutter class). |
|
Synthetic dataset of various geometric shapes like cubes, spheres or pyramids. |
|
The Bitcoin-OTC dataset from the "EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs" paper, consisting of 138 who-trusts-whom networks of sequential time steps. |
|
The Integrated Crisis Early Warning System (ICEWS) dataset used in, e.g., the "Recurrent Event Network for Reasoning over Temporal Knowledge Graphs" paper, consisting of events collected from 1/1/2018 to 10/31/2018 (24-hour time granularity). |
|
The Global Database of Events, Language, and Tone (GDELT) dataset used in, e.g., the "Recurrent Event Network for Reasoning over Temporal Knowledge Graphs" paper, consisting of events collected from 1/1/2018 to 1/31/2018 (15-minute time granularity). |
|
The DBP15K dataset from the "Cross-lingual Entity Alignment via Joint Attribute-Preserving Embedding" paper, where Chinese, Japanese and French versions of DBpedia were linked to its English version. |
|
The WILLOW-ObjectClass dataset from the "Learning Graphs to Match" paper, containing 10 equal keypoints of at least 40 images in each category. |
|
The Pascal VOC 2011 dataset with Berkeley annotations of keypoints from the "Poselets: Body Part Detectors Trained Using 3D Human Pose Annotations" paper, containing 0 to 23 keypoints per example over 20 categories. |
|
The Pascal-PF dataset from the "Proposal Flow" paper, containing 4 to 16 keypoints per example over 20 categories. |
|
A variety of graph datasets collected from SNAP at Stanford University. |
|
A suite of sparse matrix benchmarks known as the Suite Sparse Matrix Collection collected from a wide range of applications. |
|
The heterogeneous AMiner dataset from the "metapath2vec: Scalable Representation Learning for Heterogeneous Networks" paper, consisting of nodes from type |
|
The WordNet18 dataset from the "Translating Embeddings for Modeling Multi-Relational Data" paper, containing 40,943 entities, 18 relations and 151,442 fact triplets, e.g., furniture includes bed. |
|
The WordNet18RR dataset from the "Convolutional 2D Knowledge Graph Embeddings" paper, containing 40,943 entities, 11 relations and 93,003 fact triplets. |
|
The semi-supervised Wikipedia-based dataset from the "Wiki-CS: A Wikipedia-Based Benchmark for Graph Neural Networks" paper, containing 11,701 nodes, 216,123 edges, 10 classes and 20 different training splits. |
|
The WebKB datasets used in the "Geom-GCN: Geometric Graph Convolutional Networks" paper. |
|
The Wikipedia networks introduced in the "Multi-scale Attributed Node Embedding" paper. |
|
The actor-only induced subgraph of the film-director-actor-writer network used in the "Geom-GCN: Geometric Graph Convolutional Networks" paper. |
|
The ogbn-mag dataset from the "Open Graph Benchmark: Datasets for Machine Learning on Graphs" paper. |
|
A subset of the DBLP computer science bibliography website, as collected in the "MAGNN: Metapath Aggregated Graph Neural Network for Heterogeneous Graph Embedding" paper. |
|
A heterogeneous rating dataset, assembled by GroupLens Research from the MovieLens web site, consisting of nodes of type |
|
A subset of the Internet Movie Database (IMDB), as collected in the "MAGNN: Metapath Aggregated Graph Neural Network for Heterogeneous Graph Embedding" paper. |
|
A subset of the last.fm music website keeping track of users' listening information from various sources, as collected in the "MAGNN: Metapath Aggregated Graph Neural Network for Heterogeneous Graph Embedding" paper. |
|
A variety of heterogeneous graph benchmark datasets from the "Are We Really Making Much Progress? Revisiting, Benchmarking, and Refining Heterogeneous Graph Neural Networks" paper. |
|
The MixHop synthetic dataset from the "MixHop: Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing" paper, containing 10 graphs, each with varying degree of homophily (ranging from 0.0 to 0.9). |
|
The tree-structured fake news propagation graph classification dataset from the "User Preference-aware Fake News Detection" paper. |
|
The GitHub Web and ML Developers dataset introduced in the "Multi-scale Attributed Node Embedding" paper. |
|
The Facebook Page-Page network dataset introduced in the "Multi-scale Attributed Node Embedding" paper. |
|
The LastFM Asia Network dataset introduced in the "Characteristic Functions on Graphs: Birds of a Feather, from Statistical Descriptors to Parametric Models" paper. |
|
The Deezer Europe dataset introduced in the "Characteristic Functions on Graphs: Birds of a Feather, from Statistical Descriptors to Parametric Models" paper. |
|
The Deezer User Network datasets introduced in the "GEMSEC: Graph Embedding with Self Clustering" paper. |
|
The Twitch Gamer networks introduced in the "Multi-scale Attributed Node Embedding" paper. |
|
The Airports dataset from the "struc2vec: Learning Node Representations from Structural Identity" paper, where nodes denote airports and labels correspond to activity levels. |
|
The BA-Shapes dataset from the "GNNExplainer: Generating Explanations for Graph Neural Networks" paper, containing a Barabasi-Albert (BA) graph with 300 nodes and a set of 80 "house"-structured graphs connected to it. |
|
The "Long Range Graph Benchmark (LRGB)" datasets, a collection of 5 graph learning datasets with tasks that are based on long-range dependencies in graphs. |
|
The MalNet Tiny dataset from the "A Large-Scale Database for Graph Representation Learning" paper. |
|
The Organic Materials Database (OMDB) of bulk organic crystals. |
|
The Political Blogs dataset from the "The Political Blogosphere and the 2004 US Election: Divided they Blog" paper. |
|
An e-mail communication network of a large European research institution, taken from the "Local Higher-order Graph Clustering" paper. |
|
A synthetic graph dataset generated by the stochastic block model. |
|
The random partition graph dataset from the "How to Find Your Friendly Neighborhood: Graph Attention Design with Self-Supervision" paper. |
|
A variety of non-homophilous graph datasets from the "Large Scale Learning on Non-Homophilous Graphs: New Benchmarks and Strong Simple Methods" paper. |
|
The Elliptic Bitcoin dataset of Bitcoin transactions from the "Anti-Money Laundering in Bitcoin: Experimenting with Graph Convolutional Networks for Financial Forensics" paper. |
|
The DGraphFin networks from the "DGraph: A Large-Scale Financial Dataset for Graph Anomaly Detection" paper. |
|
The HydroNet dataset from the "HydroNet: Benchmark Tasks for Preserving Intermolecular Interactions and Structural Motifs in Predictive and Generative Models for Molecular Data" paper, consisting of 5 million water clusters held together by hydrogen bonding networks.
- class KarateClub(transform: Optional[Callable] = None)[source]
Zachary’s karate club network from the “An Information Flow Model for Conflict and Fission in Small Groups” paper, containing 34 nodes, connected by 156 (undirected and unweighted) edges. Every node is labeled by one of four classes obtained via modularity-based clustering, following the “Semi-supervised Classification with Graph Convolutional Networks” paper. Training is based on a single labeled example per class, i.e. a total number of 4 labeled nodes.
- Parameters
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
- Stats:
#nodes | #edges | #features | #classes
34 | 156 | 34 | 4
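The 156 edges reported above are consistent with each of the 78 undirected friendships in Zachary's network being stored once per direction. The following pure-Python sketch illustrates that storage convention on a toy graph. The helper `to_directed_pairs` and the toy edge list are hypothetical and not part of torch_geometric.

```python
def to_directed_pairs(undirected_edges):
    """Store each undirected edge {u, v} as both (u, v) and (v, u)."""
    pairs = set()
    for u, v in undirected_edges:
        pairs.add((u, v))
        pairs.add((v, u))
    return sorted(pairs)


# Hypothetical toy graph: a triangle (0, 1, 2) plus one pendant node (3).
toy_edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
directed = to_directed_pairs(toy_edges)

# The directed representation holds twice as many entries as there are
# undirected edges, which is the relationship between 78 and 156 above.
print(len(toy_edges), len(directed))
```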
- class TUDataset(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None, use_node_attr: bool = False, use_edge_attr: bool = False, cleaned: bool = False)[source]
A variety of graph kernel benchmark datasets, e.g., “IMDB-BINARY”, “REDDIT-BINARY” or “PROTEINS”, collected from the TU Dortmund University. In addition, this dataset wrapper provides cleaned dataset versions as motivated by the “Understanding Isomorphism Bias in Graph Data Sets” paper, containing only non-isomorphic graphs.
Note
Some datasets may not come with any node labels. You can then either make use of the argument use_node_attr to load additional continuous node attributes (if present) or provide synthetic node features using transforms such as torch_geometric.transforms.Constant or torch_geometric.transforms.OneHotDegree.
- Parameters
root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset.
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
use_node_attr (bool, optional) – If True, the dataset will contain additional continuous node attributes (if present). (default: False)
use_edge_attr (bool, optional) – If True, the dataset will contain additional continuous edge attributes (if present). (default: False)
cleaned (bool, optional) – If True, the dataset will contain only non-isomorphic graphs. (default: False)
- Stats:
Name | #graphs | #nodes | #edges | #features | #classes
MUTAG | 188 | ~17.9 | ~39.6 | 7 | 2
ENZYMES | 600 | ~32.6 | ~124.3 | 3 | 6
PROTEINS | 1,113 | ~39.1 | ~145.6 | 3 | 2
COLLAB | 5,000 | ~74.5 | ~4,914.4 | 0 | 3
IMDB-BINARY | 1,000 | ~19.8 | ~193.1 | 0 | 2
REDDIT-BINARY | 2,000 | ~429.6 | ~995.5 | 0 | 2
…
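Several datasets in the table above list 0 features, which is the case the note on synthetic node features addresses. The sketch below shows, in plain Python, the idea behind constant and one-hot-degree features. It is an illustration only and not the code of torch_geometric.transforms.Constant or torch_geometric.transforms.OneHotDegree. The function names are hypothetical, and clamping degrees above `max_degree` is a choice made for this sketch.

```python
def constant_features(num_nodes, value=1.0):
    """Give every node the same single feature."""
    return [[value] for _ in range(num_nodes)]


def one_hot_degree(num_nodes, directed_pairs, max_degree):
    """Return one row per node: a one-hot encoding of its out-degree.

    Degrees above max_degree are clamped to max_degree in this sketch.
    """
    degree = [0] * num_nodes
    for src, _dst in directed_pairs:
        degree[src] += 1
    features = []
    for d in degree:
        row = [0.0] * (max_degree + 1)
        row[min(d, max_degree)] = 1.0
        features.append(row)
    return features


# Hypothetical path graph 0 - 1 - 2, with each edge stored in both directions.
path_pairs = [(0, 1), (1, 0), (1, 2), (2, 1)]
degree_features = one_hot_degree(3, path_pairs, max_degree=2)
constant = constant_features(3)
```

Constant features carry no structural information, whereas one-hot degrees let a model distinguish nodes by how connected they are.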
- class GNNBenchmarkDataset(root: str, name: str, split: str = 'train', transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]
A variety of artificially and semi-artificially generated graph datasets from the “Benchmarking Graph Neural Networks” paper.
Note
The ZINC dataset is provided via torch_geometric.datasets.ZINC.
- Parameters
root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset (one of "PATTERN", "CLUSTER", "MNIST", "CIFAR10", "TSP", "CSL").
split (string, optional) – If "train", loads the training dataset. If "val", loads the validation dataset. If "test", loads the test dataset. (default: "train")
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
- Stats:
Name | #graphs | #nodes | #edges | #features | #classes
PATTERN | 10,000 | ~118.9 | ~6,098.9 | 3 | 2
CLUSTER | 10,000 | ~117.2 | ~4,303.9 | 7 | 6
MNIST | 55,000 | ~70.6 | ~564.5 | 3 | 10
CIFAR10 | 45,000 | ~117.6 | ~941.2 | 5 | 10
TSP | 10,000 | ~275.4 | ~6,885.0 | 2 | 2
CSL | 150 | ~41.0 | ~164.0 | 0 | 10
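The transform, pre_transform and pre_filter hooks differ mainly in when they run. The toy class below is a hypothetical stand-in written in plain Python, not torch_geometric code. It assumes that filtering happens before pre-transforming, and it mimics "saved to disk" with an in-memory list.

```python
class ToyDataset:
    """Hypothetical stand-in that mimics the three hooks described above."""

    def __init__(self, raw_items, transform=None, pre_transform=None, pre_filter=None):
        self.transform = transform
        items = list(raw_items)
        # Processing step: runs once, before items would be saved to disk.
        if pre_filter is not None:
            items = [item for item in items if pre_filter(item)]
        if pre_transform is not None:
            items = [pre_transform(item) for item in items]
        self._stored = items

    def __len__(self):
        return len(self._stored)

    def __getitem__(self, index):
        item = self._stored[index]
        # Access step: runs on every lookup and never changes what is stored.
        return item if self.transform is None else self.transform(item)


calls = {"transform": 0}


def count_and_negate(x):
    calls["transform"] += 1
    return -x


dataset = ToyDataset(
    raw_items=[1, 2, 3, 4],
    pre_filter=lambda x: x % 2 == 0,  # keep even items only
    pre_transform=lambda x: x * 10,   # applied once to the kept items
    transform=count_and_negate,       # applied on each access
)
first = dataset[0]
again = dataset[0]
```

Because transform runs on every access, it suits cheap or randomized operations, while expensive deterministic preprocessing fits better in pre_transform.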
- class Planetoid(root: str, name: str, split: str = 'public', num_train_per_class: int = 20, num_val: int = 500, num_test: int = 1000, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]
The citation network datasets “Cora”, “CiteSeer” and “PubMed” from the “Revisiting Semi-Supervised Learning with Graph Embeddings” paper. Nodes represent documents and edges represent citation links. Training, validation and test splits are given by binary masks.
- Parameters
root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset ("Cora", "CiteSeer", "PubMed").
split (string) – The type of dataset split ("public", "full", "geom-gcn", "random"). If set to "public", the split will be the public fixed split from the “Revisiting Semi-Supervised Learning with Graph Embeddings” paper. If set to "full", all nodes except those in the validation and test sets will be used for training (as in the “FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling” paper). If set to "geom-gcn", the 10 public fixed splits from the “Geom-GCN: Geometric Graph Convolutional Networks” paper are given. If set to "random", train, validation, and test sets will be randomly generated, according to num_train_per_class, num_val and num_test. (default: "public")
num_train_per_class (int, optional) – The number of training samples per class in case of "random" split. (default: 20)
num_val (int, optional) – The number of validation samples in case of "random" split. (default: 500)
num_test (int, optional) – The number of test samples in case of "random" split. (default: 1000)
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
- Stats:
Name | #nodes | #edges | #features | #classes
Cora | 2,708 | 10,556 | 1,433 | 7
CiteSeer | 3,327 | 9,104 | 3,703 | 6
PubMed | 19,717 | 88,648 | 500 | 3
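The "random" split is described by three numbers: num_train_per_class, num_val and num_test. With the defaults and the 7 classes of Cora, that would amount to 20 × 7 = 140 training nodes. The sketch below shows one way such binary masks could be generated in plain Python. The function `random_split`, its `seed` argument and the toy labels are hypothetical, and the library's own sampling procedure may differ in detail.

```python
import random


def random_split(labels, num_train_per_class, num_val, num_test, seed=0):
    """Return boolean train/val/test masks over the nodes in `labels`."""
    rng = random.Random(seed)
    num_nodes = len(labels)
    train_mask = [False] * num_nodes
    # Pick a fixed number of training nodes from every class.
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for i in members[:num_train_per_class]:
            train_mask[i] = True
    # Draw validation and test nodes from whatever is left.
    remaining = [i for i in range(num_nodes) if not train_mask[i]]
    rng.shuffle(remaining)
    val_mask = [False] * num_nodes
    test_mask = [False] * num_nodes
    for i in remaining[:num_val]:
        val_mask[i] = True
    for i in remaining[num_val:num_val + num_test]:
        test_mask[i] = True
    return train_mask, val_mask, test_mask


# Hypothetical labels: 3 classes with 20 nodes each.
labels = [i % 3 for i in range(60)]
train_mask, val_mask, test_mask = random_split(
    labels, num_train_per_class=5, num_val=10, num_test=20
)
```

The three masks are disjoint by construction, since validation and test nodes are drawn only from nodes that were not selected for training.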
- class FakeDataset(num_graphs: int = 1, avg_num_nodes: int = 1000, avg_degree: int = 10, num_channels: int = 64, edge_dim: int = 0, num_classes: int = 10, task: str = 'auto', is_undirected: bool = True, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, **kwargs)[source]
A fake dataset that returns randomly generated Data objects.
- Parameters
num_graphs (int, optional) – The number of graphs. (default: 1)
avg_num_nodes (int, optional) – The average number of nodes in a graph. (default: 1000)
avg_degree (int, optional) – The average degree per node. (default: 10)
num_channels (int, optional) – The number of node features. (default: 64)
edge_dim (int, optional) – The number of edge features. (default: 0)
num_classes (int, optional) – The number of classes in the dataset. (default: 10)
task (str, optional) – Whether to return node-level or graph-level labels ("node", "graph", "auto"). If set to "auto", will return graph-level labels if num_graphs > 1, and node-level labels otherwise. (default: "auto")
is_undirected (bool, optional) – Whether the graphs to generate are undirected. (default: True)
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
**kwargs (optional) – Additional attributes and their shapes, e.g. global_features=5.
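The "auto" rule for the task argument can be written as a small helper. This is a sketch of the documented behaviour only. `resolve_task` is a hypothetical name, and raising ValueError for unknown values is an assumption of this sketch rather than documented library behaviour.

```python
def resolve_task(task, num_graphs):
    """Map the "auto" setting to "graph" or "node" as described above."""
    if task not in ("node", "graph", "auto"):
        raise ValueError(f"unknown task: {task!r}")
    if task == "auto":
        # Several graphs suggest graph-level labels; one graph suggests node-level labels.
        return "graph" if num_graphs > 1 else "node"
    return task


single_graph_task = resolve_task("auto", num_graphs=1)
multi_graph_task = resolve_task("auto", num_graphs=8)
```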
- class FakeHeteroDataset(num_graphs: int = 1, num_node_types: int = 3, num_edge_types: int = 6, avg_num_nodes: int = 1000, avg_degree: int = 10, avg_num_channels: int = 64, edge_dim: int = 0, num_classes: int = 10, task: str = 'auto', transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, **kwargs)[source]
A fake dataset that returns randomly generated HeteroData objects.
- Parameters
num_graphs (int, optional) – The number of graphs. (default: 1)
num_node_types (int, optional) – The number of node types. (default: 3)
num_edge_types (int, optional) – The number of edge types. (default: 6)
avg_num_nodes (int, optional) – The average number of nodes in a graph. (default: 1000)
avg_degree (int, optional) – The average degree per node. (default: 10)
avg_num_channels (int, optional) – The average number of node features. (default: 64)
edge_dim (int, optional) – The number of edge features. (default: 0)
num_classes (int, optional) – The number of classes in the dataset. (default: 10)
task (str, optional) – Whether to return node-level or graph-level labels ("node", "graph", "auto"). If set to "auto", will return graph-level labels if num_graphs > 1, and node-level labels otherwise. (default: "auto")
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.HeteroData object and returns a transformed version. The data object will be transformed before every access. (default: None)
**kwargs (optional) – Additional attributes and their shapes, e.g. global_features=5.
- class NELL(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]
The NELL dataset, a knowledge graph from the “Toward an Architecture for Never-Ending Language Learning” paper. The dataset is processed as in the “Revisiting Semi-Supervised Learning with Graph Embeddings” paper.
Note
Entity nodes are described by sparse feature vectors of type torch_sparse.SparseTensor, which can be either used directly, or can be converted via data.x.to_dense(), data.x.to_scipy() or data.x.to_torch_sparse_coo_tensor().
- Parameters
root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
- Stats:
#nodes | #edges | #features | #classes
65,755 | 251,550 | 61,278 | 186
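The node and feature counts above give a sense of why the features are kept sparse. The short calculation below estimates the size of a dense feature matrix, assuming 32-bit floats. That storage type is an assumption of this sketch and is not stated in the documentation.

```python
num_nodes = 65_755
num_features = 61_278
bytes_per_value = 4  # assumption: 32-bit floats

# A dense matrix needs one value per (node, feature) pair.
dense_entries = num_nodes * num_features
dense_gigabytes = dense_entries * bytes_per_value / 1e9

print(f"{dense_entries:,} entries, about {dense_gigabytes:.1f} GB when dense")
```

Converting to a dense tensor is therefore worth weighing against available memory before it is done on the full feature matrix.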
- class CitationFull(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]
The full citation network datasets from the “Deep Gaussian Embedding of Graphs: Unsupervised Inductive Learning via Ranking” paper. Nodes represent documents and edges represent citation links. Datasets include citeseer, cora, cora_ml, dblp, pubmed.
- Parameters
root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset ("Cora", "Cora_ML", "CiteSeer", "DBLP", "PubMed").
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
- Stats:
Name | #nodes | #edges | #features | #classes
Cora | 19,793 | 126,842 | 8,710 | 70
Cora_ML | 2,995 | 16,316 | 2,879 | 7
CiteSeer | 4,230 | 10,674 | 602 | 6
DBLP | 17,716 | 105,734 | 1,639 | 4
PubMed | 19,717 | 88,648 | 500 | 3
- class CoraFull(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]
Alias for torch_geometric.datasets.CitationFull with name="cora".
- Stats:
#nodes | #edges | #features | #classes
19,793 | 126,842 | 8,710 | 70
- class Coauthor(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]
The Coauthor CS and Coauthor Physics networks from the “Pitfalls of Graph Neural Network Evaluation” paper. Nodes represent authors that are connected by an edge if they co-authored a paper. Given paper keywords for each author’s papers, the task is to map authors to their respective field of study.
- Parameters
root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset ("CS", "Physics").
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
- Stats:
Name | #nodes | #edges | #features | #classes
CS | 18,333 | 163,788 | 6,805 | 15
Physics | 34,493 | 495,924 | 8,415 | 5
- class Amazon(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]
The Amazon Computers and Amazon Photo networks from the “Pitfalls of Graph Neural Network Evaluation” paper. Nodes represent goods and edges represent that two goods are frequently bought together. Given product reviews as bag-of-words node features, the task is to map goods to their respective product category.
- Parameters
root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset ("Computers", "Photo").
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
- Stats:
Name | #nodes | #edges | #features | #classes
Computers | 13,752 | 491,722 | 767 | 10
Photo | 7,650 | 238,162 | 745 | 8
- class PPI(root: str, split: str = 'train', transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]
The protein-protein interaction networks from the “Predicting Multicellular Function through Multi-layer Tissue Networks” paper, containing positional gene sets, motif gene sets and immunological signatures as features (50 in total) and gene ontology sets as labels (121 in total).
- Parameters
root (string) – Root directory where the dataset should be saved.
split (string) – If "train", loads the training dataset. If "val", loads the validation dataset. If "test", loads the test dataset. (default: "train")
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
- Stats:
#graphs | #nodes | #edges | #features | #tasks
20 | ~2,245.3 | ~61,318.4 | 50 | 121
- class Reddit(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]
The Reddit dataset from the “Inductive Representation Learning on Large Graphs” paper, containing Reddit posts belonging to different communities.
- Parameters
root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
- Stats:
#nodes | #edges | #features | #classes
232,965 | 114,615,892 | 602 | 41
- class Reddit2(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]
The Reddit dataset from the “GraphSAINT: Graph Sampling Based Inductive Learning Method” paper, containing Reddit posts belonging to different communities.
Note
This is a sparser version of the original Reddit dataset (~23M edges instead of ~114M edges), and is used in papers such as SGC and GraphSAINT.
- Parameters
root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
- Stats:
#nodes | #edges | #features | #classes
232,965 | 23,213,838 | 602 | 41
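To put the difference between Reddit and Reddit2 in numbers, the calculation below derives the average number of stored edges per node from the two Stats tables, along with the fraction of edges that the sparser version keeps. It uses only the counts listed in this documentation.

```python
num_nodes = 232_965          # same node set in both versions
edges_reddit = 114_615_892   # Reddit
edges_reddit2 = 23_213_838   # Reddit2

# Average number of stored edges per node in each version.
avg_degree_reddit = edges_reddit / num_nodes
avg_degree_reddit2 = edges_reddit2 / num_nodes

# Share of the original edges that remain in the sparser version.
kept_fraction = edges_reddit2 / edges_reddit

print(f"{avg_degree_reddit:.1f} vs {avg_degree_reddit2:.1f} edges per node")
print(f"{kept_fraction:.1%} of the edges are kept")
```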
- class Flickr(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]
The Flickr dataset from the “GraphSAINT: Graph Sampling Based Inductive Learning Method” paper, containing descriptions and common properties of images.
- Parameters
root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
- Stats:
#nodes | #edges | #features | #classes
89,250 | 899,756 | 500 | 7
- class Yelp(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]
The Yelp dataset from the “GraphSAINT: Graph Sampling Based Inductive Learning Method” paper, containing customer reviewers and their friendship.
- Parameters
root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
- Stats:
#nodes | #edges | #features | #tasks
716,847 | 13,954,819 | 300 | 100
- class AmazonProducts(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]
The Amazon dataset from the “GraphSAINT: Graph Sampling Based Inductive Learning Method” paper, containing products and their categories.
- Parameters
root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.Data
object and returns a transformed version. The data object will be transformed before every access. (default:None
)pre_transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.Data
object and returns a transformed version. The data object will be transformed before being saved to disk. (default:None
)
- Stats:
#nodes | #edges | #features | #classes
1,569,960 | 264,339,468 | 200 | 107
- class QM7b(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]
The QM7b dataset from the “MoleculeNet: A Benchmark for Molecular Machine Learning” paper, consisting of 7,211 molecules with 14 regression targets.
- Parameters
root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
- Stats:
#graphs | #nodes | #edges | #features | #tasks
7,211 | ~15.4 | ~245.0 | 0 | 14
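A pre_filter is simply a predicate over data objects. The following sketch shows the idea with plain Python stand-ins rather than real torch_geometric.data.Data instances; the at_most_15_atoms function and the sample node counts are made up for illustration.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for data objects; real ones would be
# torch_geometric.data.Data instances exposing a num_nodes attribute.
molecules = [SimpleNamespace(num_nodes=n) for n in (4, 23, 15, 7)]


def at_most_15_atoms(data) -> bool:
    """Candidate pre_filter: keep molecules with 15 or fewer atoms."""
    return data.num_nodes <= 15


# A dataset would keep exactly the objects for which the predicate is True.
kept = [m for m in molecules if at_most_15_atoms(m)]
print([m.num_nodes for m in kept])
```

A predicate like this could presumably be passed as QM7b(root, pre_filter=at_most_15_atoms). Because filtering happens when the dataset is processed, changing the predicate later generally means the processed files have to be regenerated.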
- class QM9(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]
The QM9 dataset from the “MoleculeNet: A Benchmark for Molecular Machine Learning” paper, consisting of about 130,000 molecules with 19 regression targets. Each molecule includes complete spatial information for the single low energy conformation of the atoms in the molecule. In addition, we provide the atom features from the “Neural Message Passing for Quantum Chemistry” paper.
Target | Property | Description | Unit
0 | \(\mu\) | Dipole moment | \(\textrm{D}\)
1 | \(\alpha\) | Isotropic polarizability | \({a_0}^3\)
2 | \(\epsilon_{\textrm{HOMO}}\) | Highest occupied molecular orbital energy | \(\textrm{eV}\)
3 | \(\epsilon_{\textrm{LUMO}}\) | Lowest unoccupied molecular orbital energy | \(\textrm{eV}\)
4 | \(\Delta \epsilon\) | Gap between \(\epsilon_{\textrm{HOMO}}\) and \(\epsilon_{\textrm{LUMO}}\) | \(\textrm{eV}\)
5 | \(\langle R^2 \rangle\) | Electronic spatial extent | \({a_0}^2\)
6 | \(\textrm{ZPVE}\) | Zero point vibrational energy | \(\textrm{eV}\)
7 | \(U_0\) | Internal energy at 0K | \(\textrm{eV}\)
8 | \(U\) | Internal energy at 298.15K | \(\textrm{eV}\)
9 | \(H\) | Enthalpy at 298.15K | \(\textrm{eV}\)
10 | \(G\) | Free energy at 298.15K | \(\textrm{eV}\)
11 | \(c_{\textrm{v}}\) | Heat capacity at 298.15K | \(\frac{\textrm{cal}}{\textrm{mol K}}\)
12 | \(U_0^{\textrm{ATOM}}\) | Atomization energy at 0K | \(\textrm{eV}\)
13 | \(U^{\textrm{ATOM}}\) | Atomization energy at 298.15K | \(\textrm{eV}\)
14 | \(H^{\textrm{ATOM}}\) | Atomization enthalpy at 298.15K | \(\textrm{eV}\)
15 | \(G^{\textrm{ATOM}}\) | Atomization free energy at 298.15K | \(\textrm{eV}\)
16 | \(A\) | Rotational constant | \(\textrm{GHz}\)
17 | \(B\) | Rotational constant | \(\textrm{GHz}\)
18 | \(C\) | Rotational constant | \(\textrm{GHz}\)
- Parameters
root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
- Stats:
#graphs | #nodes | #edges | #features | #tasks
130,831 | ~18.0 | ~37.3 | 11 | 19
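Since QM9 exposes 19 regression targets, a model is often trained on a single one selected by index. The sketch below encodes the QM9 target table as a lookup and picks one value from a 19-element row. It uses plain lists instead of tensors, and the names QM9_TARGETS and pick_target are hypothetical helpers, not part of the library.

```python
# Target index -> (property, unit), transcribed from the QM9 target table.
QM9_TARGETS = {
    0: ("mu", "D"),
    1: ("alpha", "a_0^3"),
    2: ("eps_HOMO", "eV"),
    3: ("eps_LUMO", "eV"),
    4: ("Delta eps", "eV"),
    5: ("<R^2>", "a_0^2"),
    6: ("ZPVE", "eV"),
    7: ("U_0", "eV"),
    8: ("U", "eV"),
    9: ("H", "eV"),
    10: ("G", "eV"),
    11: ("c_v", "cal/(mol K)"),
    12: ("U_0^ATOM", "eV"),
    13: ("U^ATOM", "eV"),
    14: ("H^ATOM", "eV"),
    15: ("G^ATOM", "eV"),
    16: ("A", "GHz"),
    17: ("B", "GHz"),
    18: ("C", "GHz"),
}


def pick_target(y_row, target: int):
    """Return one regression target from a row holding all 19 values."""
    if target not in QM9_TARGETS:
        raise ValueError(f"target must be in 0..18, got {target}")
    return y_row[target]


# A made-up row whose entries equal their own index, for demonstration.
row = list(range(19))
name, unit = QM9_TARGETS[7]
value = pick_target(row, 7)
```

With real data, the same selection would typically be done inside a transform that slices the target tensor of each data object; the exact tensor layout is an assumption to verify against the loaded dataset.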
- class MD17(root: str, name: str, train: Optional[bool] = None, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]
A variety of ab-initio molecular dynamics trajectories from the authors of sGDML. This class provides access to the original MD17 datasets as well as all other datasets released by sGDML since then (15 in total).
For every trajectory, the dataset contains the Cartesian positions of atoms (in Angstrom), their atomic numbers, as well as the total energy (in kcal/mol) and forces (kcal/mol/Angstrom) on each atom. The latter two are the regression targets for this collection.
Note
Data objects contain no edge indices as these are most commonly constructed via the torch_geometric.transforms.RadiusGraph transform, with its cut-off being a hyperparameter.
Some of the trajectories were computed at different levels of theory, and for most molecules there exist two versions: a long trajectory at the DFT level of theory and a short trajectory at the coupled cluster level of theory. Check the table below for detailed information on the molecule, level of theory and number of data points contained in each dataset. Which trajectory is loaded is determined by the name argument. For the coupled cluster trajectories, the dataset comes with pre-defined training and testing splits which are loaded separately via the train argument.
When using these datasets, make sure to cite the appropriate publications listed on the sGDML website.
Molecule | Level of Theory | Name | #Examples
Benzene | DFT | benzene | 49,863
Benzene | DFT FHI-aims | benzene FHI-aims | 627,983
Benzene | CCSD(T) | benzene CCSD(T) | 1,500
Uracil | DFT | uracil | 133,770
Naphthalene | DFT | napthalene | 326,250
Aspirin | DFT | aspirin | 211,762
Aspirin | CCSD | aspirin CCSD | 1,500
Salicylic acid | DFT | salicylic acid | 320,231
Malonaldehyde | DFT | malonaldehyde | 993,237
Malonaldehyde | CCSD(T) | malonaldehyde CCSD(T) | 1,500
Ethanol | DFT | ethanol | 555,092
Ethanol | CCSD(T) | ethanol CCSD(T) | 2,000
Toluene | DFT | toluene | 442,790
Toluene | CCSD(T) | toluene CCSD(T) | 1,501
Paracetamol | DFT | paracetamol | 106,490
Azobenzene | DFT | azobenzene | 99,999
- Parameters
root (string) – Root directory where the dataset should be saved.
name (string) – Keyword of the trajectory that should be loaded.
train (bool, optional) – Determines whether the train or test split gets loaded for the coupled cluster trajectories. (default: None)
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
- Stats:
Name | #graphs | #nodes | #edges | #features | #classes
Benzene FHI-aims | 49,863 | 12 | 0 | 0 | 0
Benzene | 627,983 | 12 | 0 | 0 | 0
Benzene CCSD-T | 1,500 | 12 | 0 | 0 | 0
Uracil | 133,770 | 12 | 0 | 0 | 0
Naphthalene | 326,250 | 10 | 0 | 0 | 0
Aspirin | 211,762 | 21 | 0 | 0 | 0
Aspirin CCSD-T | 1,500 | 21 | 0 | 0 | 0
Salicylic acid | 320,231 | 16 | 0 | 0 | 0
Malonaldehyde | 993,237 | 9 | 0 | 0 | 0
Malonaldehyde CCSD-T | 1,500 | 9 | 0 | 0 | 0
Ethanol | 555,092 | 9 | 0 | 0 | 0
Ethanol CCSD-T | 2,000 | 9 | 0 | 0 | 0
Toluene | 442,790 | 15 | 0 | 0 | 0
Toluene CCSD-T | 1,501 | 15 | 0 | 0 | 0
Paracetamol | 106,490 | 20 | 0 | 0 | 0
Azobenzene | 99,999 | 24 | 0 | 0 | 0
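Because MD17 data objects carry atom positions but no edges, connectivity is usually derived from a distance cut-off. The sketch below is a brute-force, pure-Python illustration of that idea, not the implementation behind torch_geometric.transforms.RadiusGraph (which may treat the boundary differently and can limit the number of neighbours). The radius_edges helper and the coordinates are made up.

```python
import math
from itertools import permutations


def radius_edges(positions, cutoff):
    """Return directed edges (i, j), i != j, whose distance is <= cutoff."""
    edges = []
    for i, j in permutations(range(len(positions)), 2):
        if math.dist(positions[i], positions[j]) <= cutoff:
            edges.append((i, j))
    return edges


# Three hypothetical atoms on a line, coordinates in Angstrom.
pos = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]

# Only atoms 0 and 1 are within 1.5 Angstrom of each other.
edges = radius_edges(pos, cutoff=1.5)

# A larger cut-off also connects atoms 1 and 2 (distance 2.0).
wider = radius_edges(pos, cutoff=2.5)
```

With the library installed, the corresponding call would presumably look like MD17(root, name="aspirin CCSD", train=True, transform=RadiusGraph(r=5.0)), where the cut-off of 5.0 is an arbitrary illustrative value and, as the note above says, a hyperparameter to tune.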
- class ZINC(root: str, subset: bool = False, split: str = 'train', transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]
The ZINC dataset from the ZINC database and the “Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules” paper, containing about 250,000 molecular graphs with up to 38 heavy atoms. The task is to regress the penalized logP (also called constrained solubility in some works), given by y = logP - SAS - cycles, where logP is the water-octanol partition coefficient, SAS is the synthetic accessibility score, and cycles denotes the number of cycles with more than six atoms. Penalized logP is a score commonly used for training molecular generation models, see, e.g., the “Junction Tree Variational Autoencoder for Molecular Graph Generation” and “Grammar Variational Autoencoder” papers.
- Parameters
root (string) – Root directory where the dataset should be saved.
subset (boolean, optional) – If set to True, will only load a subset of the dataset (12,000 molecular graphs), following the “Benchmarking Graph Neural Networks” paper. (default: False)
split (string, optional) – If "train", loads the training dataset. If "val", loads the validation dataset. If "test", loads the test dataset. (default: "train")
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
- Stats:
Name | #graphs | #nodes | #edges | #features | #classes
ZINC Full | 249,456 | ~23.2 | ~49.8 | 1 | 1
ZINC Subset | 12,000 | ~23.2 | ~49.8 | 1 | 1
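The regression target quoted in the ZINC description, y = logP - SAS - cycles, can be worked through numerically. The following is a minimal sketch with made-up input values; the penalized_logp function is a hypothetical helper, and the dataset itself ships the target precomputed.

```python
def penalized_logp(logp: float, sas: float, cycles: int) -> float:
    """Penalized logP: partition coefficient minus SAS minus large-cycle count."""
    return logp - sas - cycles


# Illustrative values only: logP = 2.5, SAS = 3.0, one cycle with more than six atoms.
y = penalized_logp(logp=2.5, sas=3.0, cycles=1)

# A molecule without large cycles is penalized only by its SAS.
y_no_cycles = penalized_logp(logp=2.5, sas=3.0, cycles=0)
```

Both penalty terms lower the score, so molecules that are hard to synthesize or contain large rings score lower than their raw logP alone would suggest.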
- class AQSOL(root: str, split: str = 'train', transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]
The AQSOL dataset from the Benchmarking Graph Neural Networks paper based on AqSolDB, a standardized database of 9,982 molecular graphs with their aqueous solubility values, collected from 9 different data sources.
The aqueous solubility targets are collected from experimental measurements and standardized to LogS units in AqSolDB. These final values denote the property to regress in the AQSOL dataset. After filtering out a few graphs with no bonds/edges, the total number of molecular graphs is 9,833. For each molecular graph, the node features are the types of heavy atoms and the edge features are the types of bonds between them, similar to the ZINC dataset.
- Parameters
root (string) – Root directory where the dataset should be saved.
split (string, optional) – If "train", loads the training dataset. If "val", loads the validation dataset. If "test", loads the test dataset. (default: "train")
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
- Stats:
#graphs | #nodes | #edges | #features | #classes
9,833 | ~17.6 | ~35.8 | 1 | 1
- class MoleculeNet(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]
The MoleculeNet benchmark collection from the “MoleculeNet: A Benchmark for Molecular Machine Learning” paper, containing datasets from physical chemistry, biophysics and physiology. All datasets come with the additional node and edge features introduced by the Open Graph Benchmark.
- Parameters
root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset ("ESOL", "FreeSolv", "Lipo", "PCBA", "MUV", "HIV", "BACE", "BBBP", "Tox21", "ToxCast", "SIDER", "ClinTox").
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
- Stats:
Name | #graphs | #nodes | #edges | #features | #classes
ESOL | 1,128 | ~13.3 | ~27.4 | 9 | 1
FreeSolv | 642 | ~8.7 | ~16.8 | 9 | 1
Lipophilicity | 4,200 | ~27.0 | ~59.0 | 9 | 1
PCBA | 437,929 | ~26.0 | ~56.2 | 9 | 128
MUV | 93,087 | ~24.2 | ~52.6 | 9 | 17
HIV | 41,127 | ~25.5 | ~54.9 | 9 | 1
BACE | 1,513 | ~34.1 | ~73.7 | 9 | 1
BBBP | 2,050 | ~23.9 | ~51.6 | 9 | 1
Tox21 | 7,831 | ~18.6 | ~38.6 | 9 | 12
ToxCast | 8,597 | ~18.7 | ~38.4 | 9 | 617
SIDER | 1,427 | ~33.6 | ~70.7 | 9 | 27
ClinTox | 1,484 | ~26.1 | ~55.5 | 9 | 2
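The last column of the MoleculeNet stats varies widely between datasets, which matters when sizing a model's output layer. The sketch below transcribes a few rows of the stats table into a lookup; interpreting the last column as the number of prediction targets is an assumption, and MOLECULENET_STATS and output_dim are hypothetical helper names.

```python
# name -> (#graphs, #classes), copied from a few rows of the stats table.
MOLECULENET_STATS = {
    "ESOL": (1128, 1),
    "FreeSolv": (642, 1),
    "HIV": (41127, 1),
    "Tox21": (7831, 12),
    "SIDER": (1427, 27),
    "ToxCast": (8597, 617),
}


def output_dim(name: str) -> int:
    """Look up the last stats column for a dataset, ignoring case."""
    lookup = {key.lower(): value for key, value in MOLECULENET_STATS.items()}
    return lookup[name.lower()][1]


single = output_dim("ESOL")   # one target column
multi = output_dim("tox21")   # twelve target columns
```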
- class Entities(root: str, name: str, hetero: bool = False, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]
The relational entities networks “AIFB”, “MUTAG”, “BGS” and “AM” from the “Modeling Relational Data with Graph Convolutional Networks” paper. Training and test splits are given by node indices.
- Parameters
root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset ("AIFB", "MUTAG", "BGS", "AM").
hetero (bool, optional) – If set to True, will save the dataset as a HeteroData object. (default: False)
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
- Stats:
Name | #nodes | #edges | #features | #classes
AIFB | 8,285 | 58,086 | 0 | 4
AM | 1,666,764 | 11,976,642 | 0 | 11
MUTAG | 23,644 | 148,454 | 0 | 2
BGS | 333,845 | 1,832,398 | 0 | 2
- class RelLinkPredDataset(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]
The relational link prediction datasets from the “Modeling Relational Data with Graph Convolutional Networks” paper. Training and test splits are given by sets of triplets.
- Parameters
root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset ("FB15k-237").
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
- Stats:
#nodes | #edges | #features | #classes
14,541 | 544,230 | 0 | 0
- class GEDDataset(root: str, name: str, train: bool = True, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]
The GED datasets from the “Graph Edit Distance Computation via Graph Neural Networks” paper. GEDs can be accessed via the global attributes ged and norm_ged for all train/train graph pairs and all train/test graph pairs:
    dataset = GEDDataset(root, name="LINUX")
    data1, data2 = dataset[0], dataset[1]
    ged = dataset.ged[data1.i, data2.i]  # GED between `data1` and `data2`.
Note that GEDs are not available if both graphs are from the test set. For evaluation, it is recommended to pair up each graph from the test set with each graph in the training set.
Note
ALKANE is missing GEDs for train/test graph pairs since they are not provided in the official datasets.
- Parameters
root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset (one of "AIDS700nef", "LINUX", "ALKANE", "IMDBMulti").
train (bool, optional) – If True, loads the training dataset, otherwise the test dataset. (default: True)
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
- Stats:
Name | #graphs | #nodes | #edges | #features | #classes
AIDS700nef | 700 | ~8.9 | ~17.6 | 29 | 0
LINUX | 1,000 | ~7.6 | ~13.9 | 0 | 0
ALKANE | 150 | ~8.9 | ~15.8 | 0 | 0
IMDBMulti | 1,500 | ~13.0 | ~131.9 | 0 | 0
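The relationship between a raw and a normalized GED can be sketched numerically. The code below assumes the normalization described in the cited paper, where the edit distance is divided by the mean number of nodes of the two graphs and then mapped to a similarity score with exp(-x). Whether norm_ged stores exactly this quantity is an assumption to verify against the loaded dataset; normalized_ged and similarity are hypothetical helpers.

```python
import math


def normalized_ged(ged: float, n1: int, n2: int) -> float:
    """Divide the edit distance by the average node count of both graphs."""
    return ged / (0.5 * (n1 + n2))


def similarity(ged: float, n1: int, n2: int) -> float:
    """Map a normalized GED into (0, 1]; identical graphs score 1.0."""
    return math.exp(-normalized_ged(ged, n1, n2))


# Made-up example: two 8-node graphs that are 4 edit operations apart.
nged = normalized_ged(ged=4.0, n1=8, n2=8)
score = similarity(ged=4.0, n1=8, n2=8)
```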
- class AttributedGraphDataset(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]
A variety of attributed graph datasets from the “Scaling Attributed Network Embedding to Massive Graphs” paper.
- Parameters
root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset ("Wiki", "Cora", "CiteSeer", "PubMed", "BlogCatalog", "PPI", "Flickr", "Facebook", "Twitter", "TWeibo", "MAG").
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
- class MNISTSuperpixels(root: str, train: bool = True, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]
MNIST superpixels dataset from the “Geometric Deep Learning on Graphs and Manifolds Using Mixture Model CNNs” paper, containing 70,000 graphs with 75 nodes each. Every graph is labeled by one of 10 classes.
- Parameters
root (string) – Root directory where the dataset should be saved.
train (bool, optional) – If True, loads the training dataset, otherwise the test dataset. (default: True)
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
- Stats:
#graphs | #nodes | #edges | #features | #classes
70,000 | 75 | ~1,393.0 | 1 | 10
- class FAUST(root: str, train: bool = True, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]
The FAUST humans dataset from the “FAUST: Dataset and Evaluation for 3D Mesh Registration” paper, containing 100 watertight meshes representing 10 different poses for 10 different subjects.
Note
Data objects hold mesh faces instead of edge indices. To convert the mesh to a graph, use the torch_geometric.transforms.FaceToEdge as pre_transform. To convert the mesh to a point cloud, use the torch_geometric.transforms.SamplePoints as transform to sample a fixed number of points on the mesh faces according to their face area.
- Parameters
root (string) – Root directory where the dataset should be saved.
train (bool, optional) – If True, loads the training dataset, otherwise the test dataset. (default: True)
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
- Stats:
#graphs | #nodes | #edges | #features | #classes
100 | 6,890 | 41,328 | 3 | 10
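Converting mesh faces to graph edges amounts to connecting the three corners of every triangle. The sketch below shows that idea in plain Python on a tiny made-up mesh; it is an illustration of the concept rather than the code behind torch_geometric.transforms.FaceToEdge, and faces_to_edges is a hypothetical helper.

```python
def faces_to_edges(faces):
    """Return the sorted set of directed edges implied by triangle faces."""
    edges = set()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            edges.add((u, v))
            edges.add((v, u))  # store both directions for an undirected graph
    return sorted(edges)


# Two triangles sharing the edge between vertices 1 and 2.
faces = [(0, 1, 2), (1, 3, 2)]
edges = faces_to_edges(faces)

# A single triangle yields three undirected, i.e. six directed, edges.
single = faces_to_edges([(0, 1, 2)])
```

The shared edge is stored only once per direction, so the two triangles give 5 undirected (10 directed) edges rather than 6. If every edge is shared by exactly two triangles, as in a watertight mesh, the number of directed edges equals three times the number of faces; under that assumption the 41,328 edges in the FAUST stats would correspond to 41,328 / 3 = 13,776 faces.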
- class DynamicFAUST(root: str, subjects: Optional[List[str]] = None, categories: Optional[List[str]] = None, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]
The dynamic FAUST humans dataset from the “Dynamic FAUST: Registering Human Bodies in Motion” paper.
Note
Data objects hold mesh faces instead of edge indices. To convert the mesh to a graph, use the torch_geometric.transforms.FaceToEdge as pre_transform. To convert the mesh to a point cloud, use the torch_geometric.transforms.SamplePoints as transform to sample a fixed number of points on the mesh faces according to their face area.
- Parameters
root (string) – Root directory where the dataset should be saved.
subjects (list, optional) – List of subjects to include in the dataset. Can include the subjects "50002", "50004", "50007", "50009", "50020", "50021", "50022", "50025", "50026", "50027". If set to None, the dataset will contain all subjects. (default: None)
categories (list, optional) – List of categories to include in the dataset. Can include the categories "chicken_wings", "hips", "jiggle_on_toes", "jumping_jacks", "knees", "light_hopping_loose", "light_hopping_stiff", "one_leg_jump", "one_leg_loose", "personal_move", "punching", "running_on_spot", "running_on_spot_bugfix", "shake_arms", "shake_hips", "shoulders". If set to None, the dataset will contain all categories. (default: None)
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
- class ShapeNet(root: str, categories: Optional[Union[str, List[str]]] = None, include_normals: bool = True, split: str = 'trainval', transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]
The ShapeNet part level segmentation dataset from the “A Scalable Active Framework for Region Annotation in 3D Shape Collections” paper, containing about 17,000 3D shape point clouds from 16 shape categories. Each category is annotated with 2 to 6 parts.
- Parameters
root (string) – Root directory where the dataset should be saved.
categories (string or [string], optional) – The category of the CAD models (one or a combination of "Airplane", "Bag", "Cap", "Car", "Chair", "Earphone", "Guitar", "Knife", "Lamp", "Laptop", "Motorbike", "Mug", "Pistol", "Rocket", "Skateboard", "Table"). Can be explicitly set to None to load all categories. (default: None)
include_normals (bool, optional) – If set to False, will not include normal vectors as input features to data.x. As a result, data.x will be None. (default: True)
split (string, optional) – If "train", loads the training dataset. If "val", loads the validation dataset. If "trainval", loads the training and validation dataset. If "test", loads the test dataset. (default: "trainval")
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
- Stats:
#graphs | #nodes | #edges | #features | #classes
16,881 | ~2,616.2 | 0 | 3 | 50
- class ModelNet(root: str, name: str = '10', train: bool = True, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]
The ModelNet10/40 datasets from the “3D ShapeNets: A Deep Representation for Volumetric Shapes” paper, containing CAD models of 10 and 40 categories, respectively.
Note
Data objects hold mesh faces instead of edge indices. To convert the mesh to a graph, use the torch_geometric.transforms.FaceToEdge as pre_transform. To convert the mesh to a point cloud, use the torch_geometric.transforms.SamplePoints as transform to sample a fixed number of points on the mesh faces according to their face area.
- Parameters
root (string) – Root directory where the dataset should be saved.
name (string, optional) – The name of the dataset ("10" for ModelNet10, "40" for ModelNet40). (default: "10")
train (bool, optional) – If True, loads the training dataset, otherwise the test dataset. (default: True)
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
- Stats:
Name | #graphs | #nodes | #edges | #features | #classes
ModelNet10 | 4,899 | ~9,508.2 | ~37,450.5 | 3 | 10
ModelNet40 | 12,311 | ~17,744.4 | ~66,060.9 | 3 | 40
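Sampling points "according to face area" means that larger triangles receive proportionally more samples. The sketch below illustrates this in pure Python on a made-up unit square split into two triangles. It is not the implementation of torch_geometric.transforms.SamplePoints; triangle_area and sample_points are hypothetical helpers, and the square-root trick for uniform sampling inside a triangle is a standard technique chosen here for illustration.

```python
import math
import random


def triangle_area(p, q, r):
    """Area of a 3D triangle via the cross product of two edge vectors."""
    ux, uy, uz = (q[i] - p[i] for i in range(3))
    vx, vy, vz = (r[i] - p[i] for i in range(3))
    cx, cy, cz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    return 0.5 * math.sqrt(cx * cx + cy * cy + cz * cz)


def sample_points(vertices, faces, num, seed=0):
    """Draw `num` points; faces are chosen with probability ~ their area."""
    rng = random.Random(seed)
    areas = [triangle_area(*(vertices[i] for i in face)) for face in faces]
    points = []
    for face in rng.choices(faces, weights=areas, k=num):
        p, q, r = (vertices[i] for i in face)
        # Uniform barycentric weights via the square-root trick.
        s, t = math.sqrt(rng.random()), rng.random()
        w = (1 - s, s * (1 - t), s * t)
        points.append(
            tuple(w[0] * p[d] + w[1] * q[d] + w[2] * r[d] for d in range(3))
        )
    return areas, points


# A unit square in the z = 0 plane, split into two triangles.
vertices = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
faces = [(0, 1, 2), (0, 2, 3)]
areas, points = sample_points(vertices, faces, num=5)
```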
- class CoMA(root: str, train: bool = True, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]
The CoMA 3D faces dataset from the “Generating 3D faces using Convolutional Mesh Autoencoders” paper, containing 20,466 meshes of extreme expressions captured over 12 different subjects.
Note
Data objects hold mesh faces instead of edge indices. To convert the mesh to a graph, use the torch_geometric.transforms.FaceToEdge as pre_transform. To convert the mesh to a point cloud, use the torch_geometric.transforms.SamplePoints as transform to sample a fixed number of points on the mesh faces according to their face area.
- Parameters
root (string) – Root directory where the dataset should be saved.
train (bool, optional) – If True, loads the training dataset, otherwise the test dataset. (default: True)
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
- Stats:
#graphs | #nodes | #edges | #features | #classes
20,465 | 5,023 | 29,990 | 3 | 12
- class SHREC2016(root: str, partiality: str, category: str, train: bool = True, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]
The SHREC 2016 partial matching dataset from the “SHREC’16: Partial Matching of Deformable Shapes” paper. The reference shape can be accessed via dataset.ref.
Note
Data objects hold mesh faces instead of edge indices. To convert the mesh to a graph, use the torch_geometric.transforms.FaceToEdge as pre_transform. To convert the mesh to a point cloud, use the torch_geometric.transforms.SamplePoints as transform to sample a fixed number of points on the mesh faces according to their face area.
- Parameters
root (string) – Root directory where the dataset should be saved.
partiality (string) – The partiality of the dataset (one of "Holes", "Cuts").
category (string) – The category of the dataset (one of "Cat", "Centaur", "David", "Dog", "Horse", "Michael", "Victoria", "Wolf").
train (bool, optional) – If True, loads the training dataset, otherwise the test dataset. (default: True)
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
- class TOSCA(root: str, categories: Optional[List[str]] = None, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]
The TOSCA dataset from the “Numerical Geometry of Non-Ridig Shapes” book, containing 80 meshes. Meshes within the same category have the same triangulation and an equal number of vertices numbered in a compatible way.
Note
Data objects hold mesh faces instead of edge indices. To convert the mesh to a graph, use the
torch_geometric.transforms.FaceToEdge
aspre_transform
. To convert the mesh to a point cloud, use thetorch_geometric.transforms.SamplePoints
astransform
to sample a fixed number of points on the mesh faces according to their face area.- Parameters
root (string) – Root directory where the dataset should be saved.
categories (list, optional) – List of categories to include in the dataset. Can include the categories
"Cat"
,"Centaur"
,"David"
,"Dog"
,"Gorilla"
,"Horse"
,"Michael"
,"Victoria"
,"Wolf"
. If set toNone
, the dataset will contain all categories. (default:None
)transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.Data
object and returns a transformed version. The data object will be transformed before every access. (default:None
)pre_transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.Data
object and returns a transformed version. The data object will be transformed before being saved to disk. (default:None
)pre_filter (callable, optional) – A function that takes in an
torch_geometric.data.Data
object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default:None
)
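The note in this entry says a mesh can be turned into a graph by deriving edges from its faces. As a rough illustration of that idea (not the library's implementation), the sketch below derives undirected edges from triangles in plain Python. The helper names `faces_to_edges` and `load_tosca_as_graph` are hypothetical; the loader only combines the classes named in this entry and is not called here.

```python
from itertools import combinations


def faces_to_edges(faces):
    """Return sorted, de-duplicated edge pairs for triangular faces.

    Each undirected edge (i, j) is emitted in both directions, which is a
    common way to store undirected graphs as edge lists.
    """
    edges = set()
    for face in faces:
        for i, j in combinations(face, 2):
            edges.add((i, j))
            edges.add((j, i))
    return sorted(edges)


def load_tosca_as_graph(root, categories=None):
    """Hypothetical helper: load TOSCA with faces converted to edges."""
    # Imported lazily so the pure-Python part above runs without PyG installed.
    from torch_geometric.datasets import TOSCA
    from torch_geometric.transforms import FaceToEdge

    return TOSCA(root, categories=categories, pre_transform=FaceToEdge())


# Two triangles sharing the edge between vertices 1 and 2.
print(faces_to_edges([(0, 1, 2), (1, 2, 3)]))
```

The shared edge appears only once per direction, because the edges are collected in a set before sorting.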
- class PCPNetDataset(root: str, category: str, split: str = 'train', transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]
The PCPNet dataset from the “PCPNet: Learning Local Shape Properties from Raw Point Clouds” paper, consisting of 30 shapes, each given as a point cloud, densely sampled with 100k points. For each shape, surface normals and local curvatures are given as node features.
- Parameters
root (string) – Root directory where the dataset should be saved.
category (string) – The training set category (one of
"NoNoise"
,"Noisy"
,"VarDensity"
,"NoisyAndVarDensity"
forsplit="train"
orsplit="val"
, or one of"All"
,"LowNoise"
,"MedNoise"
,"HighNoise"
,"VarDensityStriped"
,"VarDensityGradient"
forsplit="test"
).split (string) – If
"train"
, loads the training dataset. If"val"
, loads the validation dataset. If"test"
, loads the test dataset. (default:"train"
)transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.Data
object and returns a transformed version. The data object will be transformed before every access. (default:None
)pre_transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.Data
object and returns a transformed version. The data object will be transformed before being saved to disk. (default:None
)pre_filter (callable, optional) – A function that takes in an
torch_geometric.data.Data
object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default:None
)
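Because the valid `category` values depend on `split`, it can help to check the pair before constructing the dataset. The sketch below transcribes the category names from the parameter description in this entry; `VALID_CATEGORIES` and `check_pcpnet_args` are hypothetical names used only for illustration, not part of the library.

```python
# Category names per split, transcribed from the parameter description.
VALID_CATEGORIES = {
    "train": {"NoNoise", "Noisy", "VarDensity", "NoisyAndVarDensity"},
    "val": {"NoNoise", "Noisy", "VarDensity", "NoisyAndVarDensity"},
    "test": {
        "All", "LowNoise", "MedNoise", "HighNoise",
        "VarDensityStriped", "VarDensityGradient",
    },
}


def check_pcpnet_args(category, split="train"):
    """Hypothetical pre-check: True if the pair is documented as valid."""
    return category in VALID_CATEGORIES.get(split, set())


print(check_pcpnet_args("NoNoise", "train"))
print(check_pcpnet_args("NoNoise", "test"))
```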
- class S3DIS(root: str, test_area: int = 6, train: bool = True, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]
The (pre-processed) Stanford Large-Scale 3D Indoor Spaces dataset from the “3D Semantic Parsing of Large-Scale Indoor Spaces” paper, containing point clouds of six large-scale indoor parts in three buildings with 12 semantic elements (and one clutter class).
- Parameters
root (string) – Root directory where the dataset should be saved.
test_area (int, optional) – Which area to use for testing (1-6). (default:
6
)train (bool, optional) – If
True
, loads the training dataset, otherwise the test dataset. (default:True
)transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.Data
object and returns a transformed version. The data object will be transformed before every access. (default:None
)pre_transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.Data
object and returns a transformed version. The data object will be transformed before being saved to disk. (default:None
)pre_filter (callable, optional) – A function that takes in an
torch_geometric.data.Data
object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default:None
)
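One way to read the interplay of `test_area` and `train` is sketched below. It assumes that the training split covers the five areas other than `test_area`; this entry does not state that explicitly, so treat it as an assumption. `s3dis_areas` is a hypothetical helper, not a library function.

```python
def s3dis_areas(test_area=6, train=True):
    """Return the area numbers a split would cover under the stated assumption."""
    if test_area not in range(1, 7):
        raise ValueError("test_area must be between 1 and 6")
    if train:
        # Assumption: training uses every area except the held-out one.
        return [area for area in range(1, 7) if area != test_area]
    return [test_area]


print(s3dis_areas())                 # training areas for the default test_area
print(s3dis_areas(2, train=False))   # held-out area only
```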
- class GeometricShapes(root: str, train: bool = True, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]
Synthetic dataset of various geometric shapes like cubes, spheres or pyramids.
Note
Data objects hold mesh faces instead of edge indices. To convert the mesh to a graph, use torch_geometric.transforms.FaceToEdge as pre_transform. To convert the mesh to a point cloud, use torch_geometric.transforms.SamplePoints as transform to sample a fixed number of points on the mesh faces according to their face area.
- Parameters
root (string) – Root directory where the dataset should be saved.
train (bool, optional) – If
True
, loads the training dataset, otherwise the test dataset. (default:True
)transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.Data
object and returns a transformed version. The data object will be transformed before every access. (default:None
)pre_transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.Data
object and returns a transformed version. The data object will be transformed before being saved to disk. (default:None
)pre_filter (callable, optional) – A function that takes in an
torch_geometric.data.Data
object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default:None
)
- class BitcoinOTC(root: str, edge_window_size: int = 10, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]
The Bitcoin-OTC dataset from the “EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs” paper, consisting of 138 who-trusts-whom networks of sequential time steps.
- Parameters
root (string) – Root directory where the dataset should be saved.
edge_window_size (int, optional) – The window size for the existence of an edge in the graph sequence since its initial creation. (default:
10
)transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.Data
object and returns a transformed version. The data object will be transformed before every access. (default:None
)pre_transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.Data
object and returns a transformed version. The data object will be transformed before being saved to disk. (default:None
)
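The effect of `edge_window_size` can be pictured with a small plain-Python sketch. It assumes that an edge created at time step c stays visible for steps c through c + edge_window_size - 1; the exact boundary handling is an assumption, and `snapshot_edges` is a hypothetical helper.

```python
def snapshot_edges(edges, step, edge_window_size=10):
    """Edges visible at `step`, given (source, target, creation_step) triples.

    Assumption: an edge created at step c is visible for steps
    c, c + 1, ..., c + edge_window_size - 1.
    """
    return [
        (src, dst)
        for src, dst, created in edges
        if step - edge_window_size < created <= step
    ]


edges = [(0, 1, 0), (1, 2, 3), (2, 0, 12)]
print(snapshot_edges(edges, step=9))   # edges created at steps 0 and 3
print(snapshot_edges(edges, step=12))  # the step-0 edge has expired
```

A larger window keeps old trust relations in the graph for longer, so later snapshots become denser.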
- class ICEWS18(root: str, split: str = 'train', transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]
The Integrated Crisis Early Warning System (ICEWS) dataset used in, e.g., the “Recurrent Event Network for Reasoning over Temporal Knowledge Graphs” paper, consisting of events collected from 1/1/2018 to 10/31/2018 (24-hour time granularity).
- Parameters
root (string) – Root directory where the dataset should be saved.
split (string) – If
"train"
, loads the training dataset. If"val"
, loads the validation dataset. If"test"
, loads the test dataset. (default:"train"
)transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.Data
object and returns a transformed version. The data object will be transformed before every access. (default:None
)pre_transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.Data
object and returns a transformed version. The data object will be transformed before being saved to disk. (default:None
)pre_filter (callable, optional) – A function that takes in an
torch_geometric.data.Data
object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default:None
)
- class GDELT(root: str, split: str = 'train', transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]
The Global Database of Events, Language, and Tone (GDELT) dataset used in, e.g., the “Recurrent Event Network for Reasoning over Temporal Knowledge Graphs” paper, consisting of events collected from 1/1/2018 to 1/31/2018 (15-minute time granularity).
- Parameters
root (string) – Root directory where the dataset should be saved.
split (string) – If
"train"
, loads the training dataset. If"val"
, loads the validation dataset. If"test"
, loads the test dataset. (default:"train"
)transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.Data
object and returns a transformed version. The data object will be transformed before every access. (default:None
)pre_transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.Data
object and returns a transformed version. The data object will be transformed before being saved to disk. (default:None
)pre_filter (callable, optional) – A function that takes in an
torch_geometric.data.Data
object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default:None
)
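The date ranges and granularities stated for ICEWS18 and GDELT imply an upper bound on the number of distinct time steps. The arithmetic below assumes the end date is inclusive; the number of timestamps actually present in the data files may be lower. `max_time_steps` is a hypothetical helper.

```python
from datetime import date, timedelta


def max_time_steps(start, end, granularity):
    """Upper bound on distinct time steps between two dates, end day inclusive."""
    span = (end - start) + timedelta(days=1)
    return span // granularity


icews18_steps = max_time_steps(
    date(2018, 1, 1), date(2018, 10, 31), timedelta(hours=24)
)
gdelt_steps = max_time_steps(
    date(2018, 1, 1), date(2018, 1, 31), timedelta(minutes=15)
)
print(icews18_steps, gdelt_steps)
```

Under this assumption the bound is 304 daily steps for ICEWS18 and 31 × 96 = 2976 quarter-hour steps for GDELT.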
- class DBP15K(root: str, pair: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]
The DBP15K dataset from the “Cross-lingual Entity Alignment via Joint Attribute-Preserving Embedding” paper, where Chinese, Japanese and French versions of DBpedia were linked to its English version. Node features are given by pre-trained and aligned monolingual word embeddings from the “Cross-lingual Knowledge Graph Alignment via Graph Matching Neural Network” paper.
- Parameters
root (string) – Root directory where the dataset should be saved.
pair (string) – The pair of languages (
"en_zh"
,"en_fr"
,"en_ja"
,"zh_en"
,"fr_en"
,"ja_en"
).transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.Data
object and returns a transformed version. The data object will be transformed before every access. (default:None
)pre_transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.Data
object and returns a transformed version. The data object will be transformed before being saved to disk. (default:None
)
- class WILLOWObjectClass(root: str, category: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]
The WILLOW-ObjectClass dataset from the “Learning Graphs to Match” paper, containing 10 equal keypoints of at least 40 images in each category. The keypoints contain interpolated features from a pre-trained VGG16 model on ImageNet (
relu4_2
andrelu5_1
).- Parameters
root (string) – Root directory where the dataset should be saved.
category (string) – The category of the images (one of
"Car"
,"Duck"
,"Face"
,"Motorbike"
,"Winebottle"
).transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.Data
object and returns a transformed version. The data object will be transformed before every access. (default:None
)pre_transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.Data
object and returns a transformed version. The data object will be transformed before being saved to disk. (default:None
)pre_filter (callable, optional) – A function that takes in an
torch_geometric.data.Data
object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default:None
)
- class PascalVOCKeypoints(root: str, category: str, train: bool = True, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]
The Pascal VOC 2011 dataset with Berkeley annotations of keypoints from the “Poselets: Body Part Detectors Trained Using 3D Human Pose Annotations” paper, containing 0 to 23 keypoints per example over 20 categories. The dataset is pre-filtered to exclude difficult, occluded and truncated objects. The keypoints contain interpolated features from a pre-trained VGG16 model on ImageNet (
relu4_2
andrelu5_1
).- Parameters
root (string) – Root directory where the dataset should be saved.
category (string) – The category of the images (one of
"Aeroplane"
,"Bicycle"
,"Bird"
,"Boat"
,"Bottle"
,"Bus"
,"Car"
,"Cat"
,"Chair"
,"Diningtable"
,"Dog"
,"Horse"
,"Motorbike"
,"Person"
,"Pottedplant"
,"Sheep"
,"Sofa"
,"Train"
,"TVMonitor"
)train (bool, optional) – If
True
, loads the training dataset, otherwise the test dataset. (default:True
)transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.Data
object and returns a transformed version. The data object will be transformed before every access. (default:None
)pre_transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.Data
object and returns a transformed version. The data object will be transformed before being saved to disk. (default:None
)pre_filter (callable, optional) – A function that takes in an
torch_geometric.data.Data
object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default:None
)
- class PascalPF(root: str, category: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]
The Pascal-PF dataset from the “Proposal Flow” paper, containing 4 to 16 keypoints per example over 20 categories.
- Parameters
root (string) – Root directory where the dataset should be saved.
category (string) – The category of the images (one of
"Aeroplane"
,"Bicycle"
,"Bird"
,"Boat"
,"Bottle"
,"Bus"
,"Car"
,"Cat"
,"Chair"
,"Diningtable"
,"Dog"
,"Horse"
,"Motorbike"
,"Person"
,"Pottedplant"
,"Sheep"
,"Sofa"
,"Train"
,"TVMonitor"
)transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.Data
object and returns a transformed version. The data object will be transformed before every access. (default:None
)pre_transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.Data
object and returns a transformed version. The data object will be transformed before being saved to disk. (default:None
)pre_filter (callable, optional) – A function that takes in an
torch_geometric.data.Data
object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default:None
)
- class SNAPDataset(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]
A variety of graph datasets collected from SNAP at Stanford University.
- Parameters
root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset.
transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.Data
object and returns a transformed version. The data object will be transformed before every access. (default:None
)pre_transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.Data
object and returns a transformed version. The data object will be transformed before being saved to disk. (default:None
)pre_filter (callable, optional) – A function that takes in an
torch_geometric.data.Data
object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default:None
)
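The three hooks that recur throughout these entries (`pre_filter`, `pre_transform`, `transform`) differ in when they run. The toy class below mimics the documented behaviour in plain Python. `ToyDataset` is a hypothetical stand-in, and applying `pre_filter` before `pre_transform` is an assumption, since the parameter descriptions do not state the order.

```python
class ToyDataset:
    """Hypothetical stand-in that mimics the three documented hooks."""

    def __init__(self, raw_items, transform=None, pre_transform=None,
                 pre_filter=None):
        self.transform = transform
        # "Processing" happens once; the result plays the role of the data
        # saved to disk. Assumption: filtering runs before pre_transform.
        kept = [x for x in raw_items if pre_filter is None or pre_filter(x)]
        self._stored = [pre_transform(x) if pre_transform else x for x in kept]

    def __len__(self):
        return len(self._stored)

    def __getitem__(self, idx):
        item = self._stored[idx]
        # `transform` runs on every access and is never cached.
        return self.transform(item) if self.transform else item


ds = ToyDataset(
    raw_items=[1, 2, 3, 4],
    pre_filter=lambda x: x % 2 == 0,   # keep even items only
    pre_transform=lambda x: x * 10,    # applied once, before "saving"
    transform=lambda x: x + 1,         # applied on every access
)
print(len(ds), ds[0], ds[1])
```

Expensive, deterministic steps fit `pre_transform`, while random augmentations belong in `transform` so that they are re-drawn on each access.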
- class SuiteSparseMatrixCollection(root: str, group: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]
A suite of sparse matrix benchmarks known as the Suite Sparse Matrix Collection collected from a wide range of applications.
- Parameters
root (string) – Root directory where the dataset should be saved.
group (string) – The group of the sparse matrix.
name (string) – The name of the sparse matrix.
transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.Data
object and returns a transformed version. The data object will be transformed before every access. (default:None
)pre_transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.Data
object and returns a transformed version. The data object will be transformed before being saved to disk. (default:None
)
- class AMiner(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]
The heterogeneous AMiner dataset from the “metapath2vec: Scalable Representation Learning for Heterogeneous Networks” paper, consisting of nodes of type
"paper"
,"author"
and"venue"
. Venue categories and author research interests are available as ground truth labels for a subset of nodes.- Parameters
root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.HeteroData
object and returns a transformed version. The data object will be transformed before every access. (default:None
)pre_transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.HeteroData
object and returns a transformed version. The data object will be transformed before being saved to disk. (default:None
)
- class WordNet18(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]
The WordNet18 dataset from the “Translating Embeddings for Modeling Multi-Relational Data” paper, containing 40,943 entities, 18 relations and 151,442 fact triplets, e.g., furniture includes bed.
Note
The original WordNet18 dataset suffers from test leakage, i.e. more than 80% of test triplets can be found in the training set with another relation type. Therefore, it should no longer be used for research evaluation. We recommend using its cleaned version WordNet18RR instead.
- Parameters
root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.Data
object and returns a transformed version. The data object will be transformed before every access. (default:None
)pre_transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.Data
object and returns a transformed version. The data object will be transformed before being saved to disk. (default:None
)
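The test leakage described in the note can be measured with a simple check: count the test triplets whose entity pair already occurs in the training set under a different relation. The sketch below uses made-up toy triplets and relation names; `leakage_fraction` is a hypothetical helper, and treating both orientations of a pair as a match is an assumption.

```python
def leakage_fraction(train, test):
    """Fraction of test triplets whose entity pair occurs in train
    under a different relation.

    Triplets are (head, relation, tail). Assumption: a pair counts as leaked
    in either orientation, so inverse relations are caught as well.
    """
    relations_by_pair = {}
    for head, rel, tail in train:
        relations_by_pair.setdefault(frozenset((head, tail)), set()).add(rel)

    leaked = 0
    for head, rel, tail in test:
        seen = relations_by_pair.get(frozenset((head, tail)), set())
        if seen - {rel}:
            leaked += 1
    return leaked / len(test) if test else 0.0


# Toy triplets for illustration only; not taken from the dataset.
train = [("bed", "hyponym_of", "furniture"), ("dog", "hyponym_of", "animal")]
test = [("furniture", "hypernym_of", "bed"), ("cat", "hyponym_of", "animal")]
print(leakage_fraction(train, test))
```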
- class WordNet18RR(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]
The WordNet18RR dataset from the “Convolutional 2D Knowledge Graph Embeddings” paper, containing 40,943 entities, 11 relations and 93,003 fact triplets.
- Parameters
root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.Data
object and returns a transformed version. The data object will be transformed before every access. (default:None
)pre_transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.Data
object and returns a transformed version. The data object will be transformed before being saved to disk. (default:None
)
- class WikiCS(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, is_undirected: Optional[bool] = None)[source]
The semi-supervised Wikipedia-based dataset from the “Wiki-CS: A Wikipedia-Based Benchmark for Graph Neural Networks” paper, containing 11,701 nodes, 216,123 edges, 10 classes and 20 different training splits.
- Parameters
root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.Data
object and returns a transformed version. The data object will be transformed before every access. (default:None
)pre_transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.Data
object and returns a transformed version. The data object will be transformed before being saved to disk. (default:None
)is_undirected (bool, optional) – Whether the graph is undirected. (default:
True
)
- class WebKB(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]
The WebKB datasets used in the “Geom-GCN: Geometric Graph Convolutional Networks” paper. Nodes represent web pages and edges represent hyperlinks between them. Node features are the bag-of-words representation of web pages. The task is to classify the nodes into one of five categories: student, project, course, staff, and faculty.
- Parameters
root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset (
"Cornell"
,"Texas"
,"Wisconsin"
).transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.Data
object and returns a transformed version. The data object will be transformed before every access. (default:None
)pre_transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.Data
object and returns a transformed version. The data object will be transformed before being saved to disk. (default:None
)
- class WikipediaNetwork(root: str, name: str, geom_gcn_preprocess: bool = True, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]
The Wikipedia networks introduced in the “Multi-scale Attributed Node Embedding” paper. Nodes represent web pages and edges represent hyperlinks between them. Node features represent several informative nouns in the Wikipedia pages. The task is to predict the average daily traffic of the web page.
- Parameters
root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset (
"chameleon"
,"crocodile"
,"squirrel"
).geom_gcn_preprocess (bool) – If set to
True
, will load the pre-processed data as introduced in the “Geom-GCN: Geometric Graph Convolutional Networks” paper (https://arxiv.org/abs/2002.05287), in which the average monthly traffic of the web page is converted into five categories to predict. If set to True
, the dataset"crocodile"
is not available.transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.Data
object and returns a transformed version. The data object will be transformed before every access. (default:None
)pre_transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.Data
object and returns a transformed version. The data object will be transformed before being saved to disk. (default:None
)
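The constraint between `name` and `geom_gcn_preprocess` can be checked up front. The sketch below encodes only what the parameter description states; `check_wikipedia_network_args` is a hypothetical helper, and the error messages are illustrative rather than those raised by the library.

```python
def check_wikipedia_network_args(name, geom_gcn_preprocess=True):
    """Hypothetical pre-check mirroring the documented constraints."""
    names = {"chameleon", "crocodile", "squirrel"}
    if name not in names:
        raise ValueError(f"unknown dataset name: {name!r}")
    if geom_gcn_preprocess and name == "crocodile":
        raise ValueError(
            '"crocodile" is not available with geom_gcn_preprocess=True'
        )
    return name, geom_gcn_preprocess


print(check_wikipedia_network_args("chameleon"))
print(check_wikipedia_network_args("crocodile", geom_gcn_preprocess=False))
```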
- class Actor(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]
The actor-only induced subgraph of the film-director-actor-writer network used in the “Geom-GCN: Geometric Graph Convolutional Networks” paper. Each node corresponds to an actor, and the edge between two nodes denotes co-occurrence on the same Wikipedia page. Node features correspond to some keywords in the Wikipedia pages. The task is to classify the nodes into five categories in terms of the words on the actor’s Wikipedia page.
- Parameters
root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.Data
object and returns a transformed version. The data object will be transformed before every access. (default:None
)pre_transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.Data
object and returns a transformed version. The data object will be transformed before being saved to disk. (default:None
)
- class OGB_MAG(root: str, preprocess: Optional[str] = None, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]
The ogbn-mag dataset from the “Open Graph Benchmark: Datasets for Machine Learning on Graphs” paper. ogbn-mag is a heterogeneous graph composed of a subset of the Microsoft Academic Graph (MAG). It contains four types of entities — papers (736,389 nodes), authors (1,134,649 nodes), institutions (8,740 nodes), and fields of study (59,965 nodes) — as well as four types of directed relations connecting two types of entities. Each paper is associated with a 128-dimensional
word2vec
feature vector, while all other node types are not associated with any input features. The task is to predict the venue (conference or journal) of each paper. In total, there are 349 different venues.- Parameters
root (string) – Root directory where the dataset should be saved.
preprocess (string, optional) – Pre-processes the original dataset by adding structural features (
"metapath2vec"
,"TransE"
) to featureless nodes. (default:None
)transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.HeteroData
object and returns a transformed version. The data object will be transformed before every access. (default:None
)pre_transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.HeteroData
object and returns a transformed version. The data object will be transformed before being saved to disk. (default:None
)
- class DBLP(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]
A subset of the DBLP computer science bibliography website, as collected in the “MAGNN: Metapath Aggregated Graph Neural Network for Heterogeneous Graph Embedding” paper. DBLP is a heterogeneous graph containing four types of entities - authors (4,057 nodes), papers (14,328 nodes), terms (7,723 nodes), and conferences (20 nodes). The authors are divided into four research areas (database, data mining, artificial intelligence, information retrieval). Each author is described by a bag-of-words representation of their paper keywords.
- Parameters
root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.HeteroData
object and returns a transformed version. The data object will be transformed before every access. (default:None
)pre_transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.HeteroData
object and returns a transformed version. The data object will be transformed before being saved to disk. (default:None
)
- class MovieLens(root, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, model_name: Optional[str] = 'all-MiniLM-L6-v2')[source]
A heterogeneous rating dataset, assembled by GroupLens Research from the MovieLens web site, consisting of nodes of type
"movie"
and"user"
. User ratings for movies are available as ground truth labels for the edges between the users and the movies("user", "rates", "movie")
.- Parameters
root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.HeteroData
object and returns a transformed version. The data object will be transformed before every access. (default:None
)pre_transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.HeteroData
object and returns a transformed version. The data object will be transformed before being saved to disk. (default:None
)model_name (str) – Name of the model used to transform movie titles to node features. The model comes from Huggingface SentenceTransformer (https://huggingface.co/sentence-transformers). (default:"all-MiniLM-L6-v2"
)
- class IMDB(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]
A subset of the Internet Movie Database (IMDB), as collected in the “MAGNN: Metapath Aggregated Graph Neural Network for Heterogeneous Graph Embedding” paper. IMDB is a heterogeneous graph containing three types of entities - movies (4,278 nodes), actors (5,257 nodes), and directors (2,081 nodes). The movies are divided into three classes (action, comedy, drama) according to their genre. Movie features correspond to elements of a bag-of-words representation of its plot keywords.
- Parameters
root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.HeteroData
object and returns a transformed version. The data object will be transformed before every access. (default:None
)pre_transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.HeteroData
object and returns a transformed version. The data object will be transformed before being saved to disk. (default:None
)
- class LastFM(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]
A subset of the last.fm music website keeping track of users’ listening information from various sources, as collected in the “MAGNN: Metapath Aggregated Graph Neural Network for Heterogeneous Graph Embedding” paper. last.fm is a heterogeneous graph containing three types of entities - users (1,892 nodes), artists (17,632 nodes), and artist tags (1,088 nodes). This dataset can be used for link prediction, and no labels or features are provided.
- Parameters
root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.HeteroData
object and returns a transformed version. The data object will be transformed before every access. (default:None
)pre_transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.HeteroData
object and returns a transformed version. The data object will be transformed before being saved to disk. (default:None
)
- class HGBDataset(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]
A variety of heterogeneous graph benchmark datasets from the “Are We Really Making Much Progress? Revisiting, Benchmarking, and Refining Heterogeneous Graph Neural Networks” paper.
Note
Test labels are randomly given to prevent data leakage issues. If you want to obtain final test performance, you will need to submit your model predictions to the HGB leaderboard.
- Parameters
root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset (one of
"ACM"
,"DBLP"
,"Freebase"
,"IMDB"
)transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.HeteroData
object and returns a transformed version. The data object will be transformed before every access. (default:None
)pre_transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.HeteroData
object and returns a transformed version. The data object will be transformed before being saved to disk. (default:None
)
- class JODIEDataset(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]
- class MixHopSyntheticDataset(root: str, homophily: float, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]
The MixHop synthetic dataset from the “MixHop: Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing” paper, containing 10 graphs, each with a different degree of homophily (ranging from 0.0 to 0.9). All graphs have 5,000 nodes, where each node corresponds to 1 out of 10 classes. The feature values of the nodes are sampled from a 2D Gaussian distribution, which is distinct for each class.
- Parameters
root (string) – Root directory where the dataset should be saved.
homophily (float) – The degree of homophily (one of
0.0
,0.1
, …,0.9
).transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.Data
object and returns a transformed version. The data object will be transformed before every access. (default:None
)pre_transform (callable, optional) – A function/transform that takes in an
torch_geometric.data.Data
object and returns a transformed version. The data object will be transformed before being saved to disk. (default:None
)
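To make the `homophily` parameter more concrete, the sketch below computes one common notion of homophily: the fraction of edges that connect nodes of the same class. Whether the dataset uses exactly this definition is an assumption. `edge_homophily` and `HOMOPHILY_LEVELS` are hypothetical names.

```python
def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints share a class label."""
    if not edges:
        return 0.0
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


# Documented choices for the `homophily` argument: 0.0, 0.1, ..., 0.9.
HOMOPHILY_LEVELS = [round(0.1 * i, 1) for i in range(10)]

labels = [0, 0, 1, 1]
edges = [(0, 1), (2, 3), (0, 2), (1, 3)]
print(edge_homophily(edges, labels))
```

In the toy graph, two of the four edges join nodes of the same class, so the ratio is 0.5.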
- class UPFD(root: str, name: str, feature: str, split: str = 'train', transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]
The tree-structured fake news propagation graph classification dataset from the “User Preference-aware Fake News Detection” paper. It includes two sets of tree-structured fake & real news propagation graphs extracted from Twitter. For a single graph, the root node represents the source news, and leaf nodes represent Twitter users who retweeted the same root news. A user node has an edge to the news node if and only if the user retweeted the root news directly. Two user nodes have an edge if and only if one user retweeted the root news from the other user. Four different node features are encoded using different encoders. Please refer to GNN-FakeNews repo for more details.
Note
For an example of using UPFD, see examples/upfd.py.
- Parameters
root (string) – Root directory where the dataset should be saved.
name (string) – The name of the graph set ("politifact", "gossipcop").
feature (string) – The node feature type ("profile", "spacy", "bert", "content"). If set to "profile", the 10-dimensional node feature is composed of ten Twitter user profile attributes. If set to "spacy", the 300-dimensional node feature is composed of Twitter user historical tweets encoded by the spaCy word2vec encoder. If set to "bert", the 768-dimensional node feature is composed of Twitter user historical tweets encoded by bert-as-service. If set to "content", the 310-dimensional node feature is composed of a 300-dimensional “spacy” vector plus a 10-dimensional “profile” vector.
split (string, optional) – If "train", loads the training dataset. If "val", loads the validation dataset. If "test", loads the test dataset. (default: "train")
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
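The edge rule in the UPFD description (a user connects to the news node if they retweeted it directly, and to another user if they retweeted it from that user) can be sketched in plain Python. The input format, the function name build_propagation_tree and the parent-to-child edge orientation are assumptions made for illustration; this is not the dataset's own preprocessing code.

```python
def build_propagation_tree(retweets):
    """Build node ids and edges for the propagation graph of one news item.

    `retweets` is a hypothetical input format: a list of (user, source)
    pairs, where `source` is "news" if the user retweeted the root news
    directly, or the name of the user they retweeted it from.
    """
    node_id = {"news": 0}  # node 0 is the root, i.e. the source news
    for user, _ in retweets:
        node_id.setdefault(user, len(node_id))

    edges = []
    for user, source in retweets:
        if source not in node_id:
            raise ValueError(f"unknown retweet source: {source!r}")
        # One edge per retweet, stored as (parent, child).
        edges.append((node_id[source], node_id[user]))
    return node_id, edges


retweets = [("alice", "news"), ("bob", "news"), ("carol", "alice")]
node_id, edges = build_propagation_tree(retweets)
print(edges)  # [(0, 1), (0, 2), (1, 3)]
```

Because every user contributes exactly one edge to its source, the result has one edge fewer than it has nodes, which is what makes each graph a tree rooted at the news node.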
- class GitHub(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]
The GitHub Web and ML Developers dataset introduced in the “Multi-scale Attributed Node Embedding” paper. Nodes represent developers on GitHub and edges are mutual follower relationships. It contains 37,300 nodes, 578,006 edges, 128 node features and 2 classes.
- Parameters
root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
- class FacebookPagePage(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]
The Facebook Page-Page network dataset introduced in the “Multi-scale Attributed Node Embedding” paper. Nodes represent verified pages on Facebook and edges are mutual likes. It contains 22,470 nodes, 342,004 edges, 128 node features and 4 classes.
- Parameters
root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
- class LastFMAsia(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]
The LastFM Asia Network dataset introduced in the “Characteristic Functions on Graphs: Birds of a Feather, from Statistical Descriptors to Parametric Models” paper. Nodes represent LastFM users from Asia and edges are friendships. It contains 7,624 nodes, 55,612 edges, 128 node features and 18 classes.
- Parameters
root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
- class DeezerEurope(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]
The Deezer Europe dataset introduced in the “Characteristic Functions on Graphs: Birds of a Feather, from Statistical Descriptors to Parametric Models” paper. Nodes represent European users of Deezer and edges are mutual follower relationships. It contains 28,281 nodes, 185,504 edges, 128 node features and 2 classes.
- Parameters
root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
- class GemsecDeezer(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]
The Deezer User Network datasets introduced in the “GEMSEC: Graph Embedding with Self Clustering” paper. Nodes represent Deezer users and edges are mutual friendships. The task is multi-label multi-class node classification about the genres liked by the users.
- Parameters
root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset ("HU", "HR", "RO").
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
- class Twitch(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]
The Twitch Gamer networks introduced in the “Multi-scale Attributed Node Embedding” paper. Nodes represent gamers on Twitch and edges are followerships between them. Node features represent embeddings of games played by the Twitch users. The task is to predict whether a user streams mature content.
- Parameters
root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset ("DE", "EN", "ES", "FR", "PT", "RU").
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
- class Airports(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]
The Airports dataset from the “struc2vec: Learning Node Representations from Structural Identity” paper, where nodes denote airports and labels correspond to activity levels. Features are given by one-hot encoded node identifiers, as described in the “GraLSP: Graph Neural Networks with Local Structural Patterns” paper.
- Parameters
root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset ("USA", "Brazil", "Europe").
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
- class BAShapes(connection_distribution: str = 'random', transform: Optional[Callable] = None)[source]
The BA-Shapes dataset from the “GNNExplainer: Generating Explanations for Graph Neural Networks” paper, containing a Barabasi-Albert (BA) graph with 300 nodes and a set of 80 “house”-structured graphs connected to it.
- Parameters
connection_distribution (string, optional) – Specifies how the houses and the BA graph get connected. Valid inputs are "random" (random BA graph nodes are selected for connection to the houses), and "uniform" (uniformly distributed BA graph nodes are selected for connection to the houses). (default: "random")
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
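To make the two connection_distribution options concrete, here is an illustrative sketch of how one attachment node per house could be chosen from the 300-node BA graph. It is one plausible reading of "random" and "uniform", not the library's implementation. The five-nodes-per-house figure follows the house motif of the GNNExplainer paper and is not stated in the text above; all names in the block are hypothetical.

```python
import random

NUM_BA_NODES = 300   # from the description above
NUM_HOUSES = 80      # from the description above
NODES_PER_HOUSE = 5  # assumption: five-node "house" motif


def total_nodes():
    """Node count of the combined graph under the assumptions above."""
    return NUM_BA_NODES + NUM_HOUSES * NODES_PER_HOUSE


def pick_connecting_nodes(connection_distribution="random", seed=None):
    """Choose one BA node per house to attach that house to (illustrative)."""
    if connection_distribution == "random":
        # Sample distinct BA nodes without replacement.
        rng = random.Random(seed)
        return rng.sample(range(NUM_BA_NODES), NUM_HOUSES)
    if connection_distribution == "uniform":
        # Evenly spaced node ids across the BA graph.
        return [(i * NUM_BA_NODES) // NUM_HOUSES for i in range(NUM_HOUSES)]
    raise ValueError(
        f"connection_distribution must be 'random' or 'uniform', "
        f"got {connection_distribution!r}"
    )


print(total_nodes())  # 700
```

With "uniform", the attachment points are spread evenly over the node ids; with "random", they can cluster anywhere in the BA graph.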
- class LRGBDataset(root: str, name: str, split: str = 'train', transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]
The “Long Range Graph Benchmark (LRGB)” datasets, a collection of 5 graph learning datasets with tasks that are based on long-range dependencies in graphs. See the original source code for more details on the individual datasets.
| Dataset | Domain | Task |
|---|---|---|
| PascalVOC-SP | Computer Vision | Node Classification |
| COCO-SP | Computer Vision | Node Classification |
| PCQM-Contact | Quantum Chemistry | Link Prediction |
| Peptides-func | Chemistry | Graph Classification |
| Peptides-struct | Chemistry | Graph Regression |
- Parameters
root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset (one of "PascalVOC-SP", "COCO-SP", "PCQM-Contact", "Peptides-func", "Peptides-struct").
split (string, optional) – If "train", loads the training dataset. If "val", loads the validation dataset. If "test", loads the test dataset. (default: "train")
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
- Stats:
| Name | #graphs | #nodes | #edges | #classes |
|---|---|---|---|---|
| PascalVOC-SP | 11,355 | ~479.40 | ~2,710.48 | 21 |
| COCO-SP | 123,286 | ~476.88 | ~2,693.67 | 81 |
| PCQM-Contact | 529,434 | ~30.14 | ~61.09 | 1 |
| Peptides-func | 15,535 | ~150.94 | ~307.30 | 10 |
| Peptides-struct | 15,535 | ~150.94 | ~307.30 | 11 |
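A short sketch of loading all three splits of one LRGB dataset, using the name-to-task mapping from the table above. The dictionary LRGB_TASKS, the helper load_lrgb_splits and the example root path are hypothetical; the LRGBDataset(root, name, split) call follows the documented signature.

```python
# Name -> task, copied from the LRGB table in this section.
LRGB_TASKS = {
    "PascalVOC-SP": "Node Classification",
    "COCO-SP": "Node Classification",
    "PCQM-Contact": "Link Prediction",
    "Peptides-func": "Graph Classification",
    "Peptides-struct": "Graph Regression",
}

SPLITS = ("train", "val", "test")


def load_lrgb_splits(root, name):
    """Hypothetical helper: return {"train": ..., "val": ..., "test": ...}."""
    if name not in LRGB_TASKS:
        raise ValueError(f"name must be one of {sorted(LRGB_TASKS)}, got {name!r}")
    # Deferred import so the name check works without torch_geometric.
    from torch_geometric.datasets import LRGBDataset

    return {split: LRGBDataset(root, name=name, split=split) for split in SPLITS}


# Example (downloads data on first use; the path is an assumption):
# splits = load_lrgb_splits("data/LRGB", "Peptides-func")
# print(LRGB_TASKS["Peptides-func"], len(splits["train"]))
```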
- class MalNetTiny(root: str, split: Optional[str] = None, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]
The MalNet Tiny dataset from the “A Large-Scale Database for Graph Representation Learning” paper.
MalNetTiny contains 5,000 malicious and benign software function call graphs across 5 different types. Each graph contains at most 5k nodes.
- Parameters
root (string) – Root directory where the dataset should be saved.
split (string, optional) – If "train", loads the training dataset. If "val", loads the validation dataset. If "trainval", loads the training and validation dataset. If "test", loads the test dataset. If None, loads the entire dataset. (default: None)
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
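The three hooks documented throughout this page differ in when they run: pre_filter and pre_transform apply once during processing, while transform applies on every access. The toy class below mimics that behaviour with plain dictionaries standing in for torch_geometric.data.Data objects. ToyDataset is a hypothetical stand-in rather than library code, and applying the filter before pre_transform is an assumption of this sketch.

```python
class ToyDataset:
    """Stand-in that mimics the documented hook semantics (not PyG code)."""

    def __init__(self, raw_graphs, transform=None, pre_transform=None,
                 pre_filter=None):
        self.transform = transform
        processed = []
        for graph in raw_graphs:
            # pre_filter decides whether the graph is kept at all.
            if pre_filter is not None and not pre_filter(graph):
                continue
            # pre_transform runs once, before the graph is "saved".
            if pre_transform is not None:
                graph = pre_transform(graph)
            processed.append(graph)
        self._processed = processed  # plays the role of the on-disk copy

    def __len__(self):
        return len(self._processed)

    def __getitem__(self, idx):
        graph = self._processed[idx]
        # transform runs on every access and leaves the stored copy untouched.
        return self.transform(graph) if self.transform is not None else graph


raw = [{"num_nodes": 10}, {"num_nodes": 6000}, {"num_nodes": 42}]
dataset = ToyDataset(
    raw,
    pre_filter=lambda g: g["num_nodes"] <= 5000,    # drop oversized graphs
    pre_transform=lambda g: {**g, "cached": True},  # stored result
    transform=lambda g: {**g, "accessed": True},    # applied per access
)
print(len(dataset))  # 2
```

Expensive, deterministic preprocessing belongs in pre_transform, since its result is stored; random augmentation belongs in transform, since it is re-applied on each access.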
- class OMDB(root: str, train: bool = True, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)[source]
The Organic Materials Database (OMDB) of bulk organic crystals.
- Parameters
root (string) – Root directory where the dataset should be saved.
train (bool, optional) – If True, loads the training dataset, otherwise the test dataset. (default: True)
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
- class PolBlogs(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]
The Political Blogs dataset from the “The Political Blogosphere and the 2004 US Election: Divided they Blog” paper.
Polblogs is a graph with 1,490 vertices (representing political blogs) and 19,025 edges (links between blogs). The links are automatically extracted from a crawl of the front page of the blog. Each vertex receives a label indicating the political leaning of the blog: liberal or conservative.
- Parameters
root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
- Stats:
| #nodes | #edges | #features | #classes |
|---|---|---|---|
| 1,490 | 19,025 | 0 | 2 |
- class EmailEUCore(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]
An e-mail communication network of a large European research institution, taken from the “Local Higher-order Graph Clustering” paper. Nodes indicate members of the institution. An edge between a pair of members indicates that they exchanged at least one e-mail. Node labels indicate membership in one of the 42 departments.
- Parameters
root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
- class StochasticBlockModelDataset(root: str, block_sizes: Union[List[int], Tensor], edge_probs: Union[List[List[float]], Tensor], num_channels: Optional[int] = None, is_undirected: bool = True, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, **kwargs)[source]
A synthetic graph dataset generated by the stochastic block model. The node features of each block are sampled from normal distributions where the centers of clusters are vertices of a hypercube, as computed by the sklearn.datasets.make_classification() method.
- Parameters
root (string) – Root directory where the dataset should be saved.
block_sizes ([int] or LongTensor) – The sizes of blocks.
edge_probs ([[float]] or FloatTensor) – The density of edges going from each block to each other block. Must be symmetric if the graph is undirected.
num_channels (int, optional) – The number of node features. If given as None, node features are not generated. (default: None)
is_undirected (bool, optional) – Whether the graph to generate is undirected. (default: True)
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
**kwargs (optional) – The keyword arguments that are passed down to the sklearn.datasets.make_classification() method for drawing node features.
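How block_sizes and edge_probs interact can be sketched in plain Python: each node is assigned to a block, and each node pair receives an edge with the probability listed for its pair of blocks. The function sample_sbm_edges is a hypothetical, deliberately simple O(n²) illustration of the stochastic block model, not the routine the dataset class uses.

```python
import random


def sample_sbm_edges(block_sizes, edge_probs, is_undirected=True, seed=None):
    """Sample edges of a stochastic block model (illustrative, O(n^2))."""
    num_blocks = len(block_sizes)
    if is_undirected:
        # The documentation requires a symmetric matrix in this case.
        for a in range(num_blocks):
            for b in range(num_blocks):
                if edge_probs[a][b] != edge_probs[b][a]:
                    raise ValueError(
                        "edge_probs must be symmetric for undirected graphs"
                    )

    rng = random.Random(seed)
    # block[i] is the block index of node i.
    block = [b for b, size in enumerate(block_sizes) for _ in range(size)]
    num_nodes = len(block)

    edges = []
    for i in range(num_nodes):
        # Undirected: visit each unordered pair once. Directed: all ordered pairs.
        start = i + 1 if is_undirected else 0
        for j in range(start, num_nodes):
            if i == j:
                continue  # no self-loops
            if rng.random() < edge_probs[block[i]][block[j]]:
                edges.append((i, j))
    return block, edges


# Two blocks that are fully connected internally and never across blocks.
block, edges = sample_sbm_edges([2, 3], [[1.0, 0.0], [0.0, 1.0]])
print(edges)  # [(0, 1), (2, 3), (2, 4), (3, 4)]
```

Using probabilities of exactly 1.0 and 0.0 makes the output deterministic, which is convenient for checking the block structure before switching to realistic densities.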
- class RandomPartitionGraphDataset(root, num_classes: int, num_nodes_per_class: int, node_homophily_ratio: float, average_degree: float, num_channels: Optional[int] = None, is_undirected: bool = True, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, **kwargs)[source]
The random partition graph dataset from the “How to Find Your Friendly Neighborhood: Graph Attention Design with Self-Supervision” paper. This is a synthetic graph of communities controlled by the node homophily and the average degree, and each community is considered as a class. The node features are sampled from normal distributions where the centers of clusters are vertices of a hypercube, as computed by the sklearn.datasets.make_classification() method.
- Parameters
root (string) – Root directory where the dataset should be saved.
num_classes (int) – The number of classes.
num_nodes_per_class (int) – The number of nodes per class.
node_homophily_ratio (float) – The degree of node homophily.
average_degree (float) – The average degree of the graph.
num_channels (int, optional) – The number of node features. If given as None, node features are not generated. (default: None)
is_undirected (bool, optional) – Whether the graph to generate is undirected. (default: True)
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
**kwargs (optional) – The keyword arguments that are passed down to the sklearn.datasets.make_classification() method for drawing node features.
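One way to see how node_homophily_ratio and average_degree control the graph is to translate them into within-class and between-class edge probabilities. With n nodes per class and C classes, a node expects about p_in · n same-class neighbours and p_out · n · (C − 1) other-class neighbours. Requiring these to sum to the average degree, with a same-class share equal to the homophily ratio, gives the formulas below. This is an approximate derivation (it ignores the exclusion of self-loops) and is not claimed to be the library's exact construction; the function name is hypothetical.

```python
def partition_edge_probs(num_classes, num_nodes_per_class,
                         node_homophily_ratio, average_degree):
    """Return (p_in, p_out) matching the target degree and homophily."""
    if num_classes < 2:
        raise ValueError("num_classes must be at least 2")
    if not 0.0 <= node_homophily_ratio <= 1.0:
        raise ValueError("node_homophily_ratio must lie in [0, 1]")

    # Expected number of edges per node, per node of a single class.
    per_node = average_degree / num_nodes_per_class
    # Share of the degree spent on same-class neighbours.
    p_in = node_homophily_ratio * per_node
    # The remaining share is spread over the other (C - 1) classes.
    p_out = (1.0 - node_homophily_ratio) * per_node / (num_classes - 1)
    return p_in, p_out


p_in, p_out = partition_edge_probs(
    num_classes=4, num_nodes_per_class=100,
    node_homophily_ratio=0.8, average_degree=10.0,
)
expected_degree = p_in * 100 + p_out * 100 * (4 - 1)
```

The resulting probabilities could, for instance, fill the diagonal and off-diagonal entries of a block-probability matrix of the kind StochasticBlockModelDataset accepts as edge_probs.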
- class LINKXDataset(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]
A variety of non-homophilous graph datasets from the “Large Scale Learning on Non-Homophilous Graphs: New Benchmarks and Strong Simple Methods” paper.
Note
Some of the datasets provided in LINKXDataset are from other sources, but have been updated with new features and/or labels.
- Parameters
root (string) – Root directory where the dataset should be saved.
name (string) – The name of the dataset ("penn94", "reed98", "amherst41", "cornell5", "johnshopkins55", "genius").
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
- class EllipticBitcoinDataset(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]
The Elliptic Bitcoin dataset of Bitcoin transactions from the “Anti-Money Laundering in Bitcoin: Experimenting with Graph Convolutional Networks for Financial Forensics” paper.
EllipticBitcoinDataset maps Bitcoin transactions to real entities belonging to licit categories (exchanges, wallet providers, miners, licit services, etc.) versus illicit ones (scams, malware, terrorist organizations, ransomware, Ponzi schemes, etc.). There are 203,769 node transactions and 234,355 directed edge payment flows, with two percent of nodes (4,545) labelled as illicit, and twenty-one percent of nodes (42,019) labelled as licit. The remaining transactions are unknown.
- Parameters
root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
- Stats:
| #nodes | #edges | #features | #classes |
|---|---|---|---|
| 203,769 | 234,355 | 165 | 2 |
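The label breakdown quoted for EllipticBitcoinDataset can be checked with a few lines of arithmetic. The counts are taken from the description above; the variable and function names are illustrative only.

```python
NUM_NODES = 203_769   # all transactions
NUM_ILLICIT = 4_545   # labelled illicit
NUM_LICIT = 42_019    # labelled licit

# Everything that is neither licit nor illicit is unlabelled ("unknown").
num_unknown = NUM_NODES - NUM_ILLICIT - NUM_LICIT


def percent(count):
    """Share of all nodes, in percent, rounded to one decimal place."""
    return round(100 * count / NUM_NODES, 1)


print(num_unknown)           # 157205
print(percent(NUM_ILLICIT))  # 2.2
print(percent(NUM_LICIT))    # 20.6
print(percent(num_unknown))  # 77.1
```

Roughly three quarters of the nodes carry no label, so only the labelled nodes can contribute to a supervised loss or to evaluation metrics.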
- class DGraphFin(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)[source]
The DGraphFin networks from the “DGraph: A Large-Scale Financial Dataset for Graph Anomaly Detection” paper. It is a directed, unweighted dynamic graph consisting of millions of nodes and edges, representing a realistic user-to-user social network in the financial industry. Each node represents a Finvolution user, and an edge from one user to another means that the first user regards the other as an emergency contact person. Each edge is associated with a timestamp ranging from 1 to 821 and an emergency contact type ranging from 0 to 11.
- Parameters
root (string) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
- Stats:
| #nodes | #edges | #features | #classes |
|---|---|---|---|
| 3,700,550 | 4,300,999 | 17 | 2 |
- class HydroNet(root: str, name: Optional[str] = None, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, num_workers: int = 8, clusters: Optional[Union[int, List[int]]] = None, use_processed: bool = True)[source]
The HydroNet dataset from the “HydroNet: Benchmark Tasks for Preserving Intermolecular Interactions and Structural Motifs in Predictive and Generative Models for Molecular Data” paper, consisting of 5 million water clusters held together by hydrogen bonding networks. This dataset provides atomic coordinates and total energy in kcal/mol for the cluster.
- Parameters
root (string) – Root directory where the dataset should be saved.
name (string, optional) – Name of the subset of the full dataset to use: "small" uses 500k graphs sampled from the "medium" dataset, "medium" uses 2.7m graphs with a maximum size of 75 nodes. Mutually exclusive with the clusters argument. (default: None)
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
num_workers (int) – Number of multiprocessing workers to use for pre-processing the dataset. (default: 8)
clusters (int or List[int], optional) – Select a subset of clusters from the full dataset. If set to None, will select all. (default: None)
use_processed (bool) – Option to use a pre-processed version of the original xyz dataset. (default: True)