torch_geometric.data¶

class
Batch
(batch=None, **kwargs)[source]¶ A plain old python object modeling a batch of graphs as one big (disconnected) graph. With torch_geometric.data.Data being the base class, all its methods can also be used here. In addition, single graphs can be reconstructed via the assignment vector batch, which maps each node to its respective graph identifier.
static
from_data_list
(data_list, follow_batch=[])[source]¶ Constructs a batch object from a python list holding torch_geometric.data.Data objects. The assignment vector batch is created on the fly. Additionally, creates assignment batch vectors for each key in follow_batch.
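The idea behind the assignment vector can be illustrated without the library. The following is a minimal pure-Python sketch, not PyTorch Geometric's implementation: the helper name `batch_graphs` and the `(num_nodes, edge_list)` graph representation are assumptions made for illustration only.

```python
# Pure-Python sketch of merging graphs into one big disconnected graph.
# `batch_graphs` is a hypothetical helper, not part of PyTorch Geometric.

def batch_graphs(graphs):
    """Merge graphs given as (num_nodes, edge_list) pairs.

    Returns the merged edge list and the assignment vector `batch`,
    where batch[i] is the index of the graph that node i came from.
    """
    edges, batch = [], []
    offset = 0  # number of nodes placed before the current graph
    for graph_id, (num_nodes, edge_list) in enumerate(graphs):
        # Shift node indices so they stay unique in the merged graph.
        edges.extend((src + offset, dst + offset) for src, dst in edge_list)
        batch.extend([graph_id] * num_nodes)
        offset += num_nodes
    return edges, batch


graphs = [
    (3, [(0, 1), (1, 2)]),  # graph 0: a path on three nodes
    (2, [(0, 1)]),          # graph 1: a single edge
]
edges, batch = batch_graphs(graphs)
print(edges)  # [(0, 1), (1, 2), (3, 4)]
print(batch)  # [0, 0, 0, 1, 1]
```

Because the second graph's node indices are shifted by the size of the first, no edge connects the two graphs, and `batch` is enough to tell them apart.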

property
num_graphs
¶ Returns the number of graphs in the batch.

to_data_list
()[source]¶ Reconstructs the list of torch_geometric.data.Data objects from the batch object. The batch object must have been created via from_data_list() in order to be able to reconstruct the initial objects.
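Reconstruction is the inverse of batching. As a rough pure-Python sketch (again not the library's code), a merged edge list can be split back into per-graph edge lists using only the assignment vector. The helper `unbatch_graphs` is hypothetical, and it assumes that the nodes of each graph occupy a contiguous index range and that every edge stays within one graph.

```python
# Pure-Python sketch of splitting a merged graph back into its parts.
# `unbatch_graphs` is a hypothetical helper, not part of PyTorch Geometric.

def unbatch_graphs(edges, batch):
    """Split a merged edge list using the assignment vector `batch`.

    Assumes the nodes of each graph are stored contiguously and that
    no edge connects two different graphs.
    """
    num_graphs = max(batch) + 1 if batch else 0
    sizes = [0] * num_graphs
    for graph_id in batch:
        sizes[graph_id] += 1

    # First merged node index of every graph.
    starts, total = [], 0
    for size in sizes:
        starts.append(total)
        total += size

    graphs = [(sizes[g], []) for g in range(num_graphs)]
    for src, dst in edges:
        g = batch[src]
        # Undo the index shift that was applied when batching.
        graphs[g][1].append((src - starts[g], dst - starts[g]))
    return graphs


merged_edges = [(0, 1), (1, 2), (3, 4)]
merged_batch = [0, 0, 0, 1, 1]
print(unbatch_graphs(merged_edges, merged_batch))
# [(3, [(0, 1), (1, 2)]), (2, [(0, 1)])]
```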

class
ClusterData
(data, num_parts: int, recursive: bool = False, save_dir: Optional[str] = None, log: bool = True)[source]¶ Clusters/partitions a graph data object into multiple subgraphs, as motivated by the “Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks” paper.
 Parameters
data (torch_geometric.data.Data) – The graph data object.
num_parts (int) – The number of partitions.
recursive (bool, optional) – If set to True, will use multilevel recursive bisection instead of multilevel k-way partitioning. (default: False)
save_dir (string, optional) – If set, will save the partitioned data to the save_dir directory for faster reuse. (default: None)
log (bool, optional) – If set to False, will not log any progress. (default: True)

class
ClusterLoader
(cluster_data, **kwargs)[source]¶ The data loader scheme from the “Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks” paper which merges partitioned subgraphs and their between-cluster links from a large-scale graph data object to form a mini-batch.
Note
Use torch_geometric.data.ClusterData and torch_geometric.data.ClusterLoader in conjunction to form mini-batches of clusters. For an example of using Cluster-GCN, see examples/cluster_gcn_reddit.py or examples/cluster_gcn_ppi.py.
 Parameters
cluster_data (torch_geometric.data.ClusterData) – The already partitioned data object.
**kwargs (optional) – Additional arguments of torch.utils.data.DataLoader, such as batch_size, shuffle, drop_last or num_workers.
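To make "merging partitioned subgraphs and their between-cluster links" concrete, here is a small pure-Python sketch. It is an illustration only: the helper `merge_clusters` is hypothetical, and the partition vector is written by hand, whereas ClusterData would obtain it from a graph partitioner.

```python
# Pure-Python sketch of forming one mini-batch from several clusters.
# `merge_clusters` is a hypothetical helper; the partition is hand-written.

def merge_clusters(edges, part, chosen):
    """Return the subgraph induced by all nodes of the chosen clusters.

    `part[n]` is the cluster of node n. Edges between two chosen
    clusters are kept, edges leaving the chosen set are dropped.
    """
    chosen = set(chosen)
    nodes = [n for n, p in enumerate(part) if p in chosen]
    local = {n: i for i, n in enumerate(nodes)}  # global id -> local id
    sub_edges = [
        (local[src], local[dst])
        for src, dst in edges
        if src in local and dst in local
    ]
    return nodes, sub_edges


edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
part = [0, 0, 1, 1, 2, 2]  # three clusters of two nodes each

nodes, sub_edges = merge_clusters(edges, part, chosen=[0, 1])
print(nodes)      # [0, 1, 2, 3]
print(sub_edges)  # [(0, 1), (1, 2), (2, 3)]
```

The edge `(1, 2)` links cluster 0 to cluster 1 and survives because both clusters are in the mini-batch, while `(3, 4)` is dropped because cluster 2 was not chosen.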

class
Data
(x=None, edge_index=None, edge_attr=None, y=None, pos=None, normal=None, face=None, **kwargs)[source]¶ A plain old python object modeling a single graph with various (optional) attributes:
 Parameters
x (Tensor, optional) – Node feature matrix with shape [num_nodes, num_node_features]. (default: None)
edge_index (LongTensor, optional) – Graph connectivity in COO format with shape [2, num_edges]. (default: None)
edge_attr (Tensor, optional) – Edge feature matrix with shape [num_edges, num_edge_features]. (default: None)
y (Tensor, optional) – Graph or node targets with arbitrary shape. (default: None)
pos (Tensor, optional) – Node position matrix with shape [num_nodes, num_dimensions]. (default: None)
normal (Tensor, optional) – Normal vector matrix with shape [num_nodes, num_dimensions]. (default: None)
face (LongTensor, optional) – Face adjacency matrix with shape [3, num_faces]. (default: None)
The data object is not restricted to these attributes and can be extended by any other additional data.
Example:
data = Data(x=x, edge_index=edge_index)
data.train_idx = torch.tensor([...], dtype=torch.long)
data.test_mask = torch.tensor([...], dtype=torch.bool)

__call__
(*keys)[source]¶ Iterates over all attributes *keys in the data, yielding their attribute names and content. If *keys is not given, this method will iterate over all present attributes.

__cat_dim__
(key, value)[source]¶ Returns the dimension along which value of attribute key will get concatenated when creating batches.
Note
This method is for internal use only, and should only be overridden if the batch concatenation process is corrupted for a specific data attribute.

__inc__
(key, value)[source]¶ Returns the incremental count to cumulatively increase the value of the next attribute of key when creating batches.
Note
This method is for internal use only, and should only be overridden if the batch concatenation process is corrupted for a specific data attribute.

__iter__
()[source]¶ Iterates over all present attributes in the data, yielding their attribute names and content.

apply
(func, *keys)[source]¶ Applies the function func to all tensor attributes *keys. If *keys is not given, func is applied to all present attributes.

contiguous
(*keys)[source]¶ Ensures a contiguous memory layout for all attributes *keys. If *keys is not given, all present attributes are ensured to have a contiguous memory layout.

is_coalesced
()[source]¶ Returns True if edge indices are ordered and do not contain duplicate entries.
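As a rough illustration of the condition being checked, the sketch below tests a plain Python edge list. It assumes that "ordered" means sorted lexicographically by (row, col); the standalone function is a stand-in written for this example, not the library method.

```python
# Pure-Python sketch of a coalescence check on an edge list.
# Assumption: "ordered" means sorted lexicographically by (row, col).

def is_coalesced(edges):
    """True if edges are strictly increasing, i.e. sorted and unique."""
    # Tuples compare lexicographically, and the strict "<" rejects
    # duplicates as well as out-of-order pairs.
    return all(a < b for a, b in zip(edges, edges[1:]))


print(is_coalesced([(0, 1), (0, 2), (1, 0)]))  # True
print(is_coalesced([(0, 2), (0, 1)]))          # False: not ordered
print(is_coalesced([(0, 1), (0, 1)]))          # False: duplicate entry
```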

property
keys
¶ Returns all names of graph attributes.

property
num_edge_features
¶ Returns the number of features per edge in the graph.

property
num_edges
¶ Returns the number of edges in the graph.

property
num_faces
¶ Returns the number of faces in the mesh.

property
num_features
¶ Alias for num_node_features.

property
num_node_features
¶ Returns the number of features per node in the graph.

property
num_nodes
¶ Returns or sets the number of nodes in the graph.
Note
The number of nodes in your data object is typically inferred automatically, e.g., when node features x are present. In some cases, however, a graph may only be given by its edge indices edge_index. PyTorch Geometric then guesses the number of nodes according to edge_index.max().item() + 1, but if there exist isolated nodes, this number need not be correct and can therefore result in unexpected batch-wise behavior. Thus, we recommend setting the number of nodes in your data object explicitly via data.num_nodes = ... and you will be given a warning that requests you to do so.
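The failure mode is easy to reproduce with plain Python. In the sketch below (an illustration, not library code), the graph has four nodes, but the last one is isolated and therefore never appears in the edge list.

```python
# A graph with four nodes where node 3 is isolated (it has no edges).
true_num_nodes = 4
edges = [(0, 1), (1, 2)]

# The guess mirrors the rule "largest node index seen in an edge, plus one".
guessed_num_nodes = max(max(src, dst) for src, dst in edges) + 1

print(guessed_num_nodes)  # 3
print(true_num_nodes)     # 4
```

When graphs are merged into a batch, node indices of later graphs are shifted by the node counts of earlier ones, so an undercount of this kind shifts them by too little.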

class
DataListLoader
(dataset, batch_size=1, shuffle=False, **kwargs)[source]¶ Data loader which merges data objects from a torch_geometric.data.dataset to a python list.
Note
This data loader should be used for multi-GPU support via torch_geometric.nn.DataParallel.

class
DataLoader
(dataset, batch_size=1, shuffle=False, follow_batch=[], **kwargs)[source]¶ Data loader which merges data objects from a torch_geometric.data.dataset to a mini-batch.
 Parameters
dataset (Dataset) – The dataset from which to load the data.
batch_size (int, optional) – How many samples per batch to load. (default: 1)
shuffle (bool, optional) – If set to True, the data will be reshuffled at every epoch. (default: False)
follow_batch (list or tuple, optional) – Creates assignment batch vectors for each key in the list. (default: [])

class
Dataset
(root=None, transform=None, pre_transform=None, pre_filter=None)[source]¶ Dataset base class for creating graph datasets. See here for the accompanying tutorial.
 Parameters
root (string, optional) – Root directory where the dataset should be saved. (default: None)
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
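The difference between the three callables is mainly when they run. The following pure-Python sketch illustrates that timing with dictionaries standing in for data objects. `TinyDataset` is a hypothetical class, the in-memory list stands in for "saved to disk", and applying pre_filter before pre_transform is an assumption of this sketch rather than something stated above.

```python
# Pure-Python sketch of when transform, pre_transform and pre_filter run.
# `TinyDataset` is hypothetical; dicts stand in for data objects.

class TinyDataset:
    def __init__(self, raw, transform=None, pre_transform=None, pre_filter=None):
        self.transform = transform
        items = list(raw)
        # Assumption: filtering happens before the one-off pre-transform.
        if pre_filter is not None:
            items = [item for item in items if pre_filter(item)]
        if pre_transform is not None:
            items = [pre_transform(item) for item in items]
        self._stored = items  # stands in for the processed files on disk

    def __len__(self):
        return len(self._stored)

    def __getitem__(self, idx):
        item = self._stored[idx]
        # `transform` runs on every access and never changes what is stored.
        return item if self.transform is None else self.transform(item)


raw = [{"num_nodes": 2}, {"num_nodes": 5}, {"num_nodes": 7}]
dataset = TinyDataset(
    raw,
    transform=lambda d: {**d, "augmented": True},
    pre_transform=lambda d: {**d, "preprocessed": True},
    pre_filter=lambda d: d["num_nodes"] >= 5,
)
print(len(dataset))  # 2
print(dataset[0])    # {'num_nodes': 5, 'preprocessed': True, 'augmented': True}
```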

__getitem__
(idx)[source]¶ Gets the data object at index idx and transforms it (in case a self.transform is given). In case idx is a slicing object, e.g., [2:5], a list, a tuple, a LongTensor or a BoolTensor, will return a subset of the dataset at the specified indices.
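The indexing behavior can be pictured with a small pure-Python dispatcher. This is a sketch of the idea only: the function `select` is hypothetical, plain lists of integers and booleans stand in for LongTensor and BoolTensor, and the result is a plain list rather than a dataset object.

```python
# Pure-Python sketch of index dispatch for a dataset-like container.
# `select` is hypothetical; lists stand in for LongTensor / BoolTensor.

def select(items, idx):
    """Return one item for an int, otherwise the selected subset."""
    if isinstance(idx, int):
        return items[idx]
    if isinstance(idx, slice):
        return items[idx]
    idx = list(idx)
    if idx and all(isinstance(i, bool) for i in idx):
        # Boolean mask: keep the items whose mask entry is True.
        if len(idx) != len(items):
            raise IndexError("mask length must match the number of items")
        return [item for item, keep in zip(items, idx) if keep]
    # Sequence of integer positions.
    return [items[i] for i in idx]


items = list("abcdef")
print(select(items, 1))                # b
print(select(items, slice(2, 5)))      # ['c', 'd', 'e']
print(select(items, [0, 3]))           # ['a', 'd']
print(select(items, (5, 4)))           # ['f', 'e']
print(select(items, [True, False, False, False, False, True]))  # ['a', 'f']
```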

property
num_edge_features
¶ Returns the number of features per edge in the dataset.

property
num_features
¶ Alias for num_node_features.

property
num_node_features
¶ Returns the number of features per node in the dataset.

property
processed_file_names
¶ The names of the files to find in the self.processed_dir folder in order to skip the processing.

property
processed_paths
¶ The filepaths to find in the self.processed_dir folder in order to skip the processing.

property
raw_file_names
¶ The names of the files to find in the self.raw_dir folder in order to skip the download.

property
raw_paths
¶ The filepaths to find in order to skip the download.

class
DenseDataLoader
(dataset, batch_size=1, shuffle=False, **kwargs)[source]¶ Data loader which merges data objects from a torch_geometric.data.dataset to a mini-batch.
Note
To make use of this data loader, all graphs in the dataset need to have the same shape for each of their attributes. Therefore, this data loader should only be used when working with dense adjacency matrices.

class
GraphSAINTEdgeSampler
(data, batch_size: int, num_steps: int = 1, sample_coverage: int = 0, save_dir: Optional[str] = None, log: bool = True, **kwargs)[source]¶ The GraphSAINT edge sampler class (see torch_geometric.data.GraphSAINTSampler).

class
GraphSAINTNodeSampler
(data, batch_size: int, num_steps: int = 1, sample_coverage: int = 0, save_dir: Optional[str] = None, log: bool = True, **kwargs)[source]¶ The GraphSAINT node sampler class (see torch_geometric.data.GraphSAINTSampler).

class
GraphSAINTRandomWalkSampler
(data, batch_size: int, walk_length: int, num_steps: int = 1, sample_coverage: int = 0, save_dir: Optional[str] = None, log: bool = True, **kwargs)[source]¶ The GraphSAINT random walk sampler class (see torch_geometric.data.GraphSAINTSampler).
 Parameters
walk_length (int) – The length of each random walk.

class
GraphSAINTSampler
(data, batch_size: int, num_steps: int = 1, sample_coverage: int = 0, save_dir: Optional[str] = None, log: bool = True, **kwargs)[source]¶ The GraphSAINT sampler base class from the “GraphSAINT: Graph Sampling Based Inductive Learning Method” paper. Given a graph in a data object, this class samples nodes and constructs subgraphs that can be processed in a mini-batch fashion. Normalization coefficients for each mini-batch are given via node_norm and edge_norm data attributes.
Note
See torch_geometric.data.GraphSAINTNodeSampler, torch_geometric.data.GraphSAINTEdgeSampler and torch_geometric.data.GraphSAINTRandomWalkSampler for currently supported samplers. For an example of using GraphSAINT sampling, see examples/graph_saint.py.
 Parameters
data (torch_geometric.data.Data) – The graph data object.
batch_size (int) – The approximate number of samples per batch.
num_steps (int, optional) – The number of iterations per epoch. (default: 1)
sample_coverage (int) – How many samples per node should be used to compute normalization statistics. (default: 0)
save_dir (string, optional) – If set, will save normalization statistics to the save_dir directory for faster reuse. (default: None)
log (bool, optional) – If set to False, will not log any preprocessing progress. (default: True)
**kwargs (optional) – Additional arguments of torch.utils.data.DataLoader, such as batch_size or num_workers.

class
InMemoryDataset
(root=None, transform=None, pre_transform=None, pre_filter=None)[source]¶ Dataset base class for creating graph datasets which fit completely into CPU memory. See here for the accompanying tutorial.
 Parameters
root (string, optional) – Root directory where the dataset should be saved. (default: None)
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

static
collate
(data_list)[source]¶ Collates a python list of data objects to the internal storage format of torch_geometric.data.InMemoryDataset.

property
num_classes
¶ The number of classes in the dataset.

property
processed_file_names
¶ The names of the files to find in the self.processed_dir folder in order to skip the processing.

property
raw_file_names
¶ The names of the files to find in the self.raw_dir folder in order to skip the download.

class
NeighborSampler
(edge_index: Union[torch.Tensor, torch_sparse.tensor.SparseTensor], sizes: List[int], node_idx: Optional[torch.Tensor] = None, num_nodes: Optional[int] = None, return_e_id: bool = True, **kwargs)[source]¶ The neighbor sampler from the “Inductive Representation Learning on Large Graphs” paper, which allows for mini-batch training of GNNs on large-scale graphs where full-batch training is not feasible.
Given a GNN with \(L\) layers and a specific mini-batch of nodes node_idx for which we want to compute embeddings, this module iteratively samples neighbors and constructs bipartite graphs that simulate the actual computation flow of GNNs.
More specifically, sizes denotes how many neighbors we want to sample for each node in each layer. This module then takes in these sizes and iteratively samples sizes[l] neighbors for each node involved in layer l. In the next layer, sampling is repeated for the union of nodes that were already encountered. The actual computation graphs are then returned in reverse-mode, meaning that we pass messages from a larger set of nodes to a smaller one, until we reach the nodes for which we originally wanted to compute embeddings.
Hence, an item returned by NeighborSampler holds the current batch_size, the IDs n_id of all nodes involved in the computation, and a list of bipartite graph objects via the tuple (edge_index, e_id, size), where edge_index represents the bipartite edges between source and target nodes, e_id denotes the IDs of original edges in the full graph, and size holds the shape of the bipartite graph. For each bipartite graph, target nodes are also included at the beginning of the list of source nodes so that one can easily apply skip-connections or add self-loops.
Note
For an example of using NeighborSampler, see examples/reddit.py or examples/ogbn_products_sage.py.
 Parameters
edge_index (Tensor or SparseTensor) – A torch.LongTensor or a torch_sparse.SparseTensor that defines the underlying graph connectivity/message passing flow. edge_index holds the indices of a (sparse) symmetric adjacency matrix. If edge_index is of type torch.LongTensor, its shape must be defined as [2, num_edges], where messages from nodes edge_index[0] are sent to nodes in edge_index[1] (in case flow="source_to_target"). If edge_index is of type torch_sparse.SparseTensor, its sparse indices (row, col) should relate to row = edge_index[1] and col = edge_index[0]. The major difference between both formats is that we need to input the transposed sparse adjacency matrix.
sizes ([int]) – The number of neighbors to sample for each node in each layer. If set to sizes[l] = -1, all neighbors are included in layer l.
node_idx (LongTensor, optional) – The nodes that should be considered for creating mini-batches. If set to None, all nodes will be considered.
num_nodes (int, optional) – The number of nodes in the graph. (default: None)
return_e_id (bool, optional) – If set to False, will not return original edge indices of sampled edges. This is only useful when operating on graphs without edge features, in order to save memory. (default: True)
**kwargs (optional) – Additional arguments of torch.utils.data.DataLoader, such as batch_size, shuffle, drop_last or num_workers.
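The layer-wise sampling described above can be sketched in pure Python. This is a simplified illustration under stated assumptions, not the library's implementation: `sample_layers` and the `in_neighbors` dictionary (mapping a node to the nodes that send it messages) are hypothetical, original edge IDs (e_id) are omitted, and edges are stored as (source position, target position) pairs.

```python
import random

# Pure-Python sketch of layer-wise neighbor sampling.
# `sample_layers` and `in_neighbors` are hypothetical names.

def sample_layers(in_neighbors, batch_nodes, sizes, seed=0):
    """Sample one bipartite graph per entry of `sizes`.

    Returns all involved node IDs and a list of (edges, size) tuples,
    ordered from the largest bipartite graph to the smallest.
    A size of -1 keeps all neighbors.
    """
    rng = random.Random(seed)
    targets = list(batch_nodes)
    layers = []
    for size in sizes:
        # Target nodes come first in the list of source nodes.
        sources = list(targets)
        position = {node: i for i, node in enumerate(sources)}
        edges = []
        for target_pos, target in enumerate(targets):
            neighbors = in_neighbors.get(target, [])
            if size >= 0 and len(neighbors) > size:
                neighbors = rng.sample(neighbors, size)
            for source in neighbors:
                if source not in position:
                    position[source] = len(sources)
                    sources.append(source)
                edges.append((position[source], target_pos))
        layers.append((edges, (len(sources), len(targets))))
        # The next layer samples for every node encountered so far.
        targets = sources
    return targets, layers[::-1]


in_neighbors = {0: [1, 2], 1: [3], 2: [3, 4], 3: [], 4: []}
n_id, layers = sample_layers(in_neighbors, batch_nodes=[0], sizes=[-1, -1])
print(n_id)                            # [0, 1, 2, 3, 4]
print([size for _, size in layers])    # [(5, 3), (3, 1)]
```

Reading the result front to back, messages flow from five nodes to three, and then from three nodes to the single node of the mini-batch. Because targets sit at the start of each source list, `n_id[:1]` is the original mini-batch.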

class
RandomNodeSampler
(data, num_parts: int, shuffle: bool = False, **kwargs)[source]¶ A data loader that randomly samples nodes within a graph and returns their induced subgraph.
Note
For an example of using RandomNodeSampler, see examples/ogbn_proteins_deepgcn.py.
 Parameters
data (torch_geometric.data.Data) – The graph data object.
num_parts (int) – The number of partitions.
shuffle (bool, optional) – If set to True, the data is reshuffled at every epoch. (default: False)
**kwargs (optional) – Additional arguments of torch.utils.data.DataLoader, such as num_workers.
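One way to picture this loader is as a random assignment of nodes to num_parts groups, followed by taking the induced subgraph of each group. The pure-Python sketch below illustrates that idea. The helper `random_node_parts` is hypothetical, and drawing each node's part uniformly at random is an assumption of this sketch.

```python
import random

# Pure-Python sketch of random node partitioning with induced subgraphs.
# `random_node_parts` is a hypothetical helper.

def random_node_parts(num_nodes, edges, num_parts, seed=0):
    """Assign every node to a random part and return induced subgraphs."""
    rng = random.Random(seed)
    # Assumption: each node draws its part uniformly at random.
    part = [rng.randrange(num_parts) for _ in range(num_nodes)]
    subgraphs = []
    for p in range(num_parts):
        nodes = [n for n in range(num_nodes) if part[n] == p]
        local = {n: i for i, n in enumerate(nodes)}  # global id -> local id
        # Keep only edges whose endpoints both fall into this part.
        sub_edges = [
            (local[src], local[dst])
            for src, dst in edges
            if src in local and dst in local
        ]
        subgraphs.append((nodes, sub_edges))
    return subgraphs


edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
subgraphs = random_node_parts(num_nodes=6, edges=edges, num_parts=2)
for nodes, sub_edges in subgraphs:
    print(nodes, sub_edges)
```

Every node ends up in exactly one subgraph, while edges that cross two parts are dropped for that pass.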