torch_geometric.data

class Data(x=None, edge_index=None, edge_attr=None, y=None, pos=None, norm=None, face=None, **kwargs)[source]

A plain old python object modeling a single graph with various (optional) attributes:

Parameters:
  • x (Tensor, optional) – Node feature matrix with shape [num_nodes, num_node_features]. (default: None)
  • edge_index (LongTensor, optional) – Graph connectivity in COO format with shape [2, num_edges]. (default: None)
  • edge_attr (Tensor, optional) – Edge feature matrix with shape [num_edges, num_edge_features]. (default: None)
  • y (Tensor, optional) – Graph or node targets with arbitrary shape. (default: None)
  • pos (Tensor, optional) – Node position matrix with shape [num_nodes, num_dimensions]. (default: None)
  • norm (Tensor, optional) – Normal vector matrix with shape [num_nodes, num_dimensions]. (default: None)
  • face (LongTensor, optional) – Face adjacency matrix with shape [3, num_faces]. (default: None)

The data object is not restricted to these attributes and can be extended by any other additional data.

Example:

data = Data(x=x, edge_index=edge_index)
data.train_idx = torch.tensor([...], dtype=torch.long)
data.test_mask = torch.tensor([...], dtype=torch.bool)
__call__(*keys)[source]

Iterates over all attributes *keys in the data, yielding their attribute names and content. If *keys is not given, this method will iterate over all present attributes.

__cat_dim__(key, value)[source]

Returns the dimension for which value of attribute key will get concatenated when creating batches.

Note

This method is for internal use only, and should only be overridden if the batch concatenation process is corrupted for a specific data attribute.

__contains__(key)[source]

Returns True, if the attribute key is present in the data.

__getitem__(key)[source]

Gets the data of the attribute key.

__inc__(key, value)[source]

Returns the incremental count to cumulatively increase the value of the next attribute of key when creating batches.

Note

This method is for internal use only, and should only be overridden if the batch concatenation process is corrupted for a specific data attribute.
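
For example, a Data subclass can override these hooks so that an additional index-valued attribute is batched correctly. This is a minimal sketch; the attribute name foo_index is purely illustrative:

from torch_geometric.data import Data

class MyData(Data):
    def __cat_dim__(self, key, value):
        # Concatenate the illustrative index attribute along the last dimension.
        if key == 'foo_index':
            return -1
        return super(MyData, self).__cat_dim__(key, value)

    def __inc__(self, key, value):
        # Shift the illustrative index attribute by the number of nodes,
        # so its entries stay valid after graphs are concatenated into a batch.
        if key == 'foo_index':
            return self.num_nodes
        return super(MyData, self).__inc__(key, value)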

__iter__()[source]

Iterates over all present attributes in the data, yielding their attribute names and content.

__len__()[source]

Returns the number of all present attributes.

__setitem__(key, value)[source]

Sets the attribute key to value.

apply(func, *keys)[source]

Applies the function func to all tensor attributes *keys. If *keys is not given, func is applied to all present attributes.

clone()[source]
coalesce()[source]

Orders and removes duplicated entries from edge indices.

contains_isolated_nodes()[source]

Returns True, if the graph contains isolated nodes.

contains_self_loops()[source]

Returns True, if the graph contains self-loops.

contiguous(*keys)[source]

Ensures a contiguous memory layout for all attributes *keys. If *keys is not given, all present attributes are ensured to have a contiguous memory layout.

debug()[source]
classmethod from_dict(dictionary)[source]

Creates a data object from a python dictionary.

is_coalesced()[source]

Returns True, if edge indices are ordered and do not contain duplicate entries.

is_directed()[source]

Returns True, if graph edges are directed.

is_undirected()[source]

Returns True, if graph edges are undirected.
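
For example, assuming data is a Data object with a populated edge_index, these checks can be used to inspect the graph structure:

data.is_directed()              # True unless every edge (i, j) also appears as (j, i)
data.contains_self_loops()      # True if any edge connects a node to itself
data.contains_isolated_nodes()  # True if some node does not appear in edge_index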

keys

Returns all names of graph attributes.

num_edge_features

Returns the number of features per edge in the graph.

num_edges

Returns the number of edges in the graph.

num_faces

Returns the number of faces in the mesh.

num_features

Alias for num_node_features.

num_node_features

Returns the number of features per node in the graph.

num_nodes

Returns or sets the number of nodes in the graph.

Note

The number of nodes in your data object is typically automatically inferred, e.g., when node features x are present. In some cases, however, a graph may only be given by its edge indices edge_index. PyTorch Geometric then guesses the number of nodes according to edge_index.max().item() + 1, but in case there exist isolated nodes, this number may not be correct and can therefore result in unexpected batch-wise behavior. Thus, we recommend setting the number of nodes in your data object explicitly via data.num_nodes = .... You will be given a warning that requests you to do so.
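
For example (a minimal sketch), a graph given only by its edges, whose highest-indexed nodes are isolated, needs the node count set by hand:

edge_index = torch.tensor([[0, 1],
                           [1, 0]])
data = Data(edge_index=edge_index)
data.num_nodes = 4  # nodes 2 and 3 are isolated and cannot be inferred from edge_index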

to(device, *keys)[source]

Performs tensor dtype and/or device conversion to all attributes *keys. If *keys is not given, the conversion is applied to all present attributes.
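
For example, device transfer and dtype casts can be applied to all tensor attributes or restricted to specific ones:

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
data = data.to(device)                       # move every tensor attribute
data = data.to(device, 'x', 'edge_index')    # move only 'x' and 'edge_index'
data = data.apply(lambda t: t.float(), 'x')  # cast only the node features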

class Batch(batch=None, **kwargs)[source]

A plain old python object modeling a batch of graphs as one big (disconnected) graph. With torch_geometric.data.Data being the base class, all its methods can also be used here. In addition, single graphs can be reconstructed via the assignment vector batch, which maps each node to its respective graph identifier.

static from_data_list(data_list, follow_batch=[])[source]

Constructs a batch object from a python list holding torch_geometric.data.Data objects. The assignment vector batch is created on the fly. Additionally, creates assignment batch vectors for each key in follow_batch.

num_graphs

Returns the number of graphs in the batch.

to_data_list()[source]

Reconstructs the list of torch_geometric.data.Data objects from the batch object. The batch object must have been created via from_data_list() in order to be able to reconstruct the initial objects.
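
For example, two small graphs can be collated into a single Batch object and recovered again (a minimal sketch):

import torch
from torch_geometric.data import Data, Batch

edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]])
data_list = [Data(x=torch.randn(3, 16), edge_index=edge_index) for _ in range(2)]

batch = Batch.from_data_list(data_list)
batch.num_graphs  # 2
batch.batch       # tensor([0, 0, 0, 1, 1, 1])
data_list = batch.to_data_list()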

class Dataset(root=None, transform=None, pre_transform=None, pre_filter=None)[source]

Dataset base class for creating graph datasets. See here for the accompanying tutorial.

Parameters:
  • root (string, optional) – Root directory where the dataset should be saved. (default: None)
  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
  • pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
__getitem__(idx)[source]

Gets the data object at index idx and transforms it (in case a self.transform is given). In case idx is a slicing object, e.g., [2:5], a list, a tuple, a LongTensor or a BoolTensor, a subset of the dataset at the specified indices is returned.

__len__()[source]

The number of examples in the dataset.

download()[source]

Downloads the dataset to the self.raw_dir folder.

get(idx)[source]

Gets the data object at index idx.

index_select(idx)[source]
indices()[source]
len()[source]
num_edge_features

Returns the number of features per edge in the dataset.

num_features

Alias for num_node_features.

num_node_features

Returns the number of features per node in the dataset.

process()[source]

Processes the dataset to the self.processed_dir folder.

processed_dir
processed_file_names

The name of the files to find in the self.processed_dir folder in order to skip the processing.

processed_paths

The filepaths to find in the self.processed_dir folder in order to skip the processing.

raw_dir
raw_file_names

The name of the files to find in the self.raw_dir folder in order to skip the download.

raw_paths

The filepaths to find in order to skip the download.

shuffle(return_perm=False)[source]

Randomly shuffles the examples in the dataset.

Parameters:
  • return_perm (bool, optional) – If set to True, will additionally return the random permutation used to shuffle the dataset. (default: False)
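
For example, slicing and shuffling can be combined to create a train/test split of a concrete dataset instance (the 80/20 ratio is illustrative):

dataset = dataset.shuffle()
split = int(0.8 * len(dataset))
train_dataset = dataset[:split]
test_dataset = dataset[split:]
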
class InMemoryDataset(root=None, transform=None, pre_transform=None, pre_filter=None)[source]

Dataset base class for creating graph datasets which fit completely into memory. See here for the accompanying tutorial.

Parameters:
  • root (string, optional) – Root directory where the dataset should be saved. (default: None)
  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
  • pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
collate(data_list)[source]

Collates a python list of data objects to the internal storage format of torch_geometric.data.InMemoryDataset.

download()[source]

Downloads the dataset to the self.raw_dir folder.

get(idx)[source]

Gets the data object at index idx.

len()[source]
num_classes

The number of classes in the dataset.

process()[source]

Processes the dataset to the self.processed_dir folder.

processed_file_names

The name of the files to find in the self.processed_dir folder in order to skip the processing.

raw_file_names

The name of the files to find in the self.raw_dir folder in order to skip the download.
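
A minimal sketch of a custom in-memory dataset, following the pattern from the accompanying tutorial; the single graph built in process() is a placeholder for real parsing logic:

import torch
from torch_geometric.data import InMemoryDataset, Data

class MyOwnDataset(InMemoryDataset):
    def __init__(self, root, transform=None, pre_transform=None):
        super(MyOwnDataset, self).__init__(root, transform, pre_transform)
        self.data, self.slices = torch.load(self.processed_paths[0])

    @property
    def raw_file_names(self):
        return []  # nothing to download in this toy example

    @property
    def processed_file_names(self):
        return ['data.pt']

    def download(self):
        pass

    def process(self):
        # Replace with real logic producing a list of Data objects.
        edge_index = torch.tensor([[0, 1, 2],
                                   [1, 2, 3]])
        data_list = [Data(x=torch.randn(4, 8), edge_index=edge_index)]
        data, slices = self.collate(data_list)
        torch.save((data, slices), self.processed_paths[0])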

class DataLoader(dataset, batch_size=1, shuffle=False, follow_batch=[], **kwargs)[source]

Data loader which merges data objects from a torch_geometric.data.Dataset to a mini-batch.

Parameters:
  • dataset (Dataset) – The dataset from which to load the data.
  • batch_size (int, optional) – How many samples per batch to load. (default: 1)
  • shuffle (bool, optional) – If set to True, the data will be reshuffled at every epoch. (default: False)
  • follow_batch (list or tuple, optional) – Creates assignment batch vectors for each key in the list. (default: [])
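
For example, iterating over a DataLoader yields torch_geometric.data.Batch objects:

from torch_geometric.data import DataLoader

loader = DataLoader(dataset, batch_size=32, shuffle=True)
for batch in loader:
    batch.num_graphs  # number of graphs in this mini-batch
    batch.batch       # assignment vector mapping each node to its graph
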
class DataListLoader(dataset, batch_size=1, shuffle=False, **kwargs)[source]

Data loader which merges data objects from a torch_geometric.data.Dataset to a python list.

Note

This data loader should be used for multi-gpu support via torch_geometric.nn.DataParallel.

Parameters:
  • dataset (Dataset) – The dataset from which to load the data.
  • batch_size (int, optional) – How many samples per batch to load. (default: 1)
  • shuffle (bool, optional) – If set to True, the data will be reshuffled at every epoch. (default: False)
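
A minimal multi-GPU sketch; model is assumed to be an existing torch.nn.Module:

from torch_geometric.data import DataListLoader
from torch_geometric.nn import DataParallel

loader = DataListLoader(dataset, batch_size=32, shuffle=True)
model = DataParallel(model)  # wraps the assumed base model
for data_list in loader:     # a plain python list of Data objects
    out = model(data_list)   # DataParallel scatters the list across GPUs
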
class DenseDataLoader(dataset, batch_size=1, shuffle=False, **kwargs)[source]

Data loader which merges data objects from a torch_geometric.data.Dataset to a mini-batch.

Note

To make use of this data loader, all graphs in the dataset need to have the same shape for each of their attributes. Therefore, this data loader should only be used when working with dense adjacency matrices.

Parameters:
  • dataset (Dataset) – The dataset from which to load the data.
  • batch_size (int, optional) – How many samples per batch to load. (default: 1)
  • shuffle (bool, optional) – If set to True, the data will be reshuffled at every epoch. (default: False)
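
For example, assuming the dataset was created with a dense transform such as torch_geometric.transforms.ToDense(num_nodes), so that every graph has fixed-size attributes:

from torch_geometric.data import DenseDataLoader

loader = DenseDataLoader(dataset, batch_size=32, shuffle=True)
for batch in loader:
    batch.x    # node features of shape [batch_size, num_nodes, num_features]
    batch.adj  # dense adjacency matrices of shape [batch_size, num_nodes, num_nodes]
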
class NeighborSampler(data, size, num_hops, batch_size=1, shuffle=False, drop_last=False, bipartite=True, add_self_loops=False, flow='source_to_target')[source]

The neighbor sampler from the “Inductive Representation Learning on Large Graphs” paper which iterates over graph nodes in a mini-batch fashion and constructs sampled subgraphs of size num_hops.

It returns a generator of DataFlow that defines the message passing flow to the root nodes via a list of num_hops bipartite graph objects edge_index and the initial start nodes n_id.

Parameters:
  • data (torch_geometric.data.Data) – The graph data object.
  • size (int or float or [int] or [float]) – The number of neighbors to sample (for each layer). The value of this parameter can be either set to be the same for each neighborhood or percentage-based.
  • num_hops (int) – The number of layers to sample.
  • batch_size (int, optional) – How many samples per batch to load. (default: 1)
  • shuffle (bool, optional) – If set to True, the data will be reshuffled at every epoch. (default: False)
  • drop_last (bool, optional) – If set to True, will drop the last incomplete batch if the number of nodes is not divisible by the batch size. If set to False and the size of the graph is not divisible by the batch size, the last batch will be smaller. (default: False)
  • bipartite (bool, optional) – If set to False, will not return a generator of DataFlow to mark the computation flow, but instead will return a torch_geometric.data.Data object holding the subgraph information around each mini-batch. If set to False, the add_self_loops option is ignored. (default: True)
  • add_self_loops (bool, optional) – If set to True, will add self-loops to each sampled neighborhood. (default: False)
  • flow (string, optional) – The flow direction of message passing ("source_to_target" or "target_to_source"). (default: "source_to_target")
__call__(subset=None)[source]

Returns a generator of DataFlow that iterates over the nodes in subset in a mini-batch fashion.

Parameters:
  • subset (LongTensor or BoolTensor, optional) – The initial nodes to propagate messages to. If set to None, will iterate over all nodes in the graph. (default: None)
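
A usage sketch in the spirit of the GraphSAGE examples; the layer sizes, the train_mask attribute, and the model and device objects are assumptions:

loader = NeighborSampler(data, size=[25, 10], num_hops=2, batch_size=1000,
                         shuffle=True, add_self_loops=True)
for data_flow in loader(data.train_mask):
    # Each DataFlow holds num_hops bipartite blocks and the initial node ids n_id.
    out = model(data.x.to(device), data_flow.to(device))
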
__get_batches__(subset=None)[source]

Returns a list of mini-batches from the initial nodes in subset.

__produce_bipartite_data_flow__(n_id)[source]

Produces a DataFlow object with a bipartite assignment matrix for a given mini-batch n_id.

__produce_subgraph__(b_id)[source]

Produces a Data object holding the subgraph data for a given mini-batch b_id.

class ClusterData(data, num_parts, recursive=False, save_dir=None)[source]

Clusters/partitions a graph data object into multiple subgraphs, as motivated by the “Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks” paper.

Parameters:
  • data (torch_geometric.data.Data) – The graph data object.
  • num_parts (int) – The number of partitions.
  • recursive (bool, optional) – If set to True, will use multilevel recursive bisection instead of multilevel k-way partitioning. (default: False)
  • save_dir (string, optional) – If set, will save the partitioned data to the save_dir directory for faster re-use. (default: None)
__getitem__(idx)[source]
__len__()[source]
permute_data(data, perm, adj)[source]
class ClusterLoader(cluster_data, batch_size=1, shuffle=False, **kwargs)[source]

The data loader scheme from the “Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks” paper which merges partitioned subgraphs and their between-cluster links from a large-scale graph data object to form a mini-batch.

Parameters:
  • cluster_data (torch_geometric.data.ClusterData) – The already partitioned data object.
  • batch_size (int, optional) – How many samples per batch to load. (default: 1)
  • shuffle (bool, optional) – If set to True, the data will be reshuffled at every epoch. (default: False)
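
A minimal sketch in the spirit of the Cluster-GCN examples; the number of partitions, the batch size, the save directory, and the model object are assumptions:

from torch_geometric.data import ClusterData, ClusterLoader

cluster_data = ClusterData(data, num_parts=1500, recursive=False,
                           save_dir='partition')
loader = ClusterLoader(cluster_data, batch_size=20, shuffle=True)
for sub_data in loader:
    out = model(sub_data.x, sub_data.edge_index)
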
download_url(url, folder, log=True)[source]

Downloads the content of a URL to a specific folder.

Parameters:
  • url (string) – The url.
  • folder (string) – The folder.
  • log (bool, optional) – If False, will not print anything to the console. (default: True)
extract_tar(path, folder, mode='r:gz', log=True)[source]

Extracts a tar archive to a specific folder.

Parameters:
  • path (string) – The path to the tar archive.
  • folder (string) – The folder.
  • mode (string, optional) – The compression mode. (default: "r:gz")
  • log (bool, optional) – If False, will not print anything to the console. (default: True)
extract_zip(path, folder, log=True)[source]

Extracts a zip archive to a specific folder.

Parameters:
  • path (string) – The path to the zip archive.
  • folder (string) – The folder.
  • log (bool, optional) – If False, will not print anything to the console. (default: True)
extract_bz2(path, folder, log=True)[source]
extract_gz(path, folder, log=True)[source]
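
These helpers are typically combined inside a dataset's download() method. A sketch, where self.url is a hypothetical attribute pointing at a zip archive:

from torch_geometric.data import download_url, extract_zip

def download(self):
    path = download_url(self.url, self.raw_dir)  # returns the path of the downloaded file
    extract_zip(path, self.raw_dir)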