torch_geometric.graphgym

Workflow and Register Modules

load_ckpt

Load latest model checkpoint

save_ckpt

Save model checkpoint at given epoch

clean_ckpt

Keep only the latest model checkpoint and remove all older checkpoints

parse_args

Parses the command line arguments.

cfg

CfgNode represents an internal node in the configuration tree.

set_cfg

This function sets the default config values.

load_cfg

Load configurations from file system and command line

dump_cfg

Dumps the config to the output directory specified in cfg.out_dir

set_run_dir

Create the directory for each random seed experiment run

set_agg_dir

Create the directory for aggregated results over all the random seeds

get_fname

Extract filename from file name path

init_weights

Performs weight initialization

create_loader

Create data loader object

set_printing

Set up printing options

create_logger

Create logger for the experiment

compute_loss

Compute loss and prediction score

create_model

Create model for graph machine learning

create_optimizer

Create optimizer for the model

create_scheduler

Create learning rate scheduler for the optimizer

train

The core training pipeline

register_base

Base function for registering a module in GraphGym.

register_act

Registers an activation function in GraphGym.

register_node_encoder

Registers a node feature encoder in GraphGym.

register_edge_encoder

Registers an edge feature encoder in GraphGym.

register_stage

Registers a customized GNN stage in GraphGym.

register_head

Registers a GNN prediction head in GraphGym.

register_layer

Registers a GNN layer in GraphGym.

register_pooling

Registers a GNN global pooling/readout layer in GraphGym.

register_network

Registers a GNN model in GraphGym.

register_config

Registers a configuration group in GraphGym.

register_loader

Registers a data loader in GraphGym.

register_optimizer

Registers an optimizer in GraphGym.

register_scheduler

Registers a learning rate scheduler in GraphGym.

register_loss

Registers a loss function in GraphGym.

register_train

Registers a training function in GraphGym.

load_ckpt(model, optimizer=None, scheduler=None)

Load latest model checkpoint

Parameters
  • model (torch.nn.Module) – The model that will be loaded

  • optimizer (torch.optim.Optimizer, optional) – The optimizer that will be loaded

  • scheduler (torch.optim.lr_scheduler, optional) – The scheduler that will be loaded

Returns

Epoch count after loading the model

save_ckpt(model, optimizer, scheduler, epoch)

Save model checkpoint at given epoch

Parameters
  • model (torch.nn.Module) – The model that will be saved

  • optimizer (torch.optim.Optimizer) – The optimizer that will be saved

  • scheduler (torch.optim.lr_scheduler) – The scheduler that will be saved

  • epoch (int) – The epoch when the model is saved

clean_ckpt()

Keep only the latest model checkpoint and remove all older checkpoints
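The retention policy can be sketched with the standard library. This is a hypothetical helper, not GraphGym's code: it assumes checkpoints live in one directory and are named by their integer epoch (for example `12.ckpt`), and the name `keep_latest_ckpt` is illustrative.

```python
import os


def keep_latest_ckpt(ckpt_dir):
    """Hypothetical helper: delete every '<epoch>.ckpt' file except the newest.

    Assumes checkpoint files are named by their integer epoch, e.g. '12.ckpt'.
    Returns the epoch that was kept, or None if no checkpoint exists.
    """
    epochs = []
    for name in os.listdir(ckpt_dir):
        stem, ext = os.path.splitext(name)
        if ext == '.ckpt' and stem.isdigit():
            epochs.append(int(stem))
    if not epochs:
        return None
    latest = max(epochs)  # compare numerically, so 10 beats 9
    for epoch in epochs:
        if epoch != latest:
            os.remove(os.path.join(ckpt_dir, f'{epoch}.ckpt'))
    return latest
```

Comparing epochs as integers matters: a lexicographic sort of file names would rank `9.ckpt` after `10.ckpt`.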

parse_args()

Parses the command line arguments.
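A minimal `argparse` sketch of such a command-line interface. The flags `--cfg` and `--repeat` and the trailing `opts` list of `KEY VALUE` overrides are assumptions for illustration, not a transcription of `parse_args`; check your installed version for the exact flags.

```python
import argparse


def parse_args_sketch(argv=None):
    """Hypothetical parser: a config file, a repeat count, and free-form overrides."""
    parser = argparse.ArgumentParser(description='GraphGym-style experiment')
    parser.add_argument('--cfg', dest='cfg_file', type=str, required=True,
                        help='Path to the YAML configuration file')
    parser.add_argument('--repeat', type=int, default=1,
                        help='Number of runs with different random seeds')
    # Everything after the known flags is kept as "KEY VALUE" override pairs.
    parser.add_argument('opts', nargs=argparse.REMAINDER, default=[],
                        help='Config overrides, e.g. optim.max_epoch 200')
    return parser.parse_args(argv)
```

For example, `parse_args_sketch(['--cfg', 'example.yaml', 'optim.max_epoch', '200'])` yields `cfg_file='example.yaml'` and `opts=['optim.max_epoch', '200']`.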

set_cfg(cfg)

This function sets the default config values. Note that:
1) For an experiment, only part of the arguments will be used; the remaining unused arguments won’t affect anything, so feel free to register any argument in graphgym.contrib.config.
2) At most two levels of configs are supported, e.g., cfg.dataset.name.

Returns

Configuration used by the experiment.

load_cfg(cfg, args)

Load configurations from file system and command line

Parameters
  • cfg (CfgNode) – Configuration node

  • args (argparse.Namespace) – Parsed command-line arguments, e.g. as returned by parse_args
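The command-line part of this merge can be illustrated with plain dictionaries. This is a simplified, hypothetical stand-in for merging into a `CfgNode`: it assumes two-level keys such as `optim.max_epoch`, rejects unknown keys, and parses each value with `ast.literal_eval`, falling back to the raw string.

```python
import ast


def merge_overrides(cfg, opts):
    """Hypothetical merge of ['group.key', 'value', ...] pairs into a nested dict.

    Only two config levels are supported, mirroring keys like cfg.dataset.name.
    """
    if len(opts) % 2 != 0:
        raise ValueError('Overrides must come in KEY VALUE pairs')
    for full_key, raw_value in zip(opts[0::2], opts[1::2]):
        group, _, key = full_key.partition('.')
        if group not in cfg or key not in cfg[group]:
            raise KeyError(f'Unknown config key: {full_key}')
        try:
            value = ast.literal_eval(raw_value)  # '200' -> 200, '0.01' -> 0.01
        except (ValueError, SyntaxError):
            value = raw_value  # keep non-literal strings such as 'PubMed'
        cfg[group][key] = value
    return cfg
```

A three-level key such as `a.b.c` is rejected here as unknown, which mirrors the two-level limit stated for set_cfg.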

dump_cfg(cfg)

Dumps the config to the output directory specified in cfg.out_dir

Parameters

cfg (CfgNode) – Configuration node

set_run_dir(out_dir, fname)

Create the directory for each random seed experiment run

Parameters
  • out_dir (string) – Directory for output, specified in cfg.out_dir

  • fname (string) – Filename for the yaml format configuration file

set_agg_dir(out_dir, fname)

Create the directory for aggregated results over all the random seeds

Parameters
  • out_dir (string) – Directory for output, specified in cfg.out_dir

  • fname (string) – Filename for the yaml format configuration file

get_fname(fname)

Extract the file name from a file path

Parameters

fname (string) – Filename for the yaml format configuration file
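A minimal sketch of the extraction, assuming the goal is to drop the directory part and a `.yaml`/`.yml` extension, so that `configs/example.yaml` becomes `example`. The name `get_fname_sketch` is hypothetical.

```python
import os


def get_fname_sketch(path):
    """Hypothetical helper: bare file name of a YAML config path, no extension."""
    fname = os.path.basename(path)
    for ext in ('.yaml', '.yml'):
        if fname.endswith(ext):
            return fname[:-len(ext)]
    return fname  # other extensions are left untouched in this sketch
```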

init_weights(m)

Performs weight initialization

Parameters

m (nn.Module) – PyTorch module
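One common scheme for such an initializer is Xavier (Glorot) uniform initialization with a ReLU gain. Assuming that scheme (the exact rule applied may differ across versions), a weight matrix is drawn from U(-a, a) with a = gain * sqrt(6 / (fan_in + fan_out)) and gain = sqrt(2). The pure-Python sketch below computes the bound and fills a matrix; `xavier_uniform_relu` is a hypothetical name.

```python
import math
import random


def xavier_uniform_relu(fan_in, fan_out, rng=None):
    """Hypothetical helper: sample a fan_out x fan_in matrix from U(-a, a).

    a = gain * sqrt(6 / (fan_in + fan_out)), with gain = sqrt(2) for ReLU.
    Returns (bound, weights) so the bound can be inspected.
    """
    rng = rng or random.Random(0)  # seeded for reproducibility
    gain = math.sqrt(2.0)
    bound = gain * math.sqrt(6.0 / (fan_in + fan_out))
    weights = [[rng.uniform(-bound, bound) for _ in range(fan_in)]
               for _ in range(fan_out)]
    return bound, weights
```

Scaling the bound by the fan sizes keeps the variance of activations roughly constant from layer to layer; the ReLU gain compensates for ReLU zeroing half of its inputs.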

create_loader()

Create data loader object

Returns: List of PyTorch data loaders

set_printing()

Set up printing options

create_logger()

Create logger for the experiment

Returns: List of logger objects

compute_loss(pred, true)

Compute loss and prediction score

Parameters
  • pred (torch.Tensor) – Unnormalized prediction

  • true (torch.Tensor) – Ground truth

Returns: Loss, normalized prediction score
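The relationship between the unnormalized prediction and the normalized score can be sketched for two common cases: binary classification (sigmoid score, binary cross-entropy with logits) and multi-class classification (log-softmax, negative log-likelihood). This pure-Python version works on single examples and is an illustration under those assumptions, not the library's tensor-based implementation; both function names are hypothetical.

```python
import math


def binary_loss_and_score(logit, target):
    """Hypothetical helper: BCE-with-logits loss and sigmoid score (target 0 or 1)."""
    # Numerically stable form: max(x, 0) - x * t + log(1 + exp(-|x|))
    loss = max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))
    # Stable sigmoid: never exponentiate a large positive number
    if logit >= 0:
        score = 1.0 / (1.0 + math.exp(-logit))
    else:
        e = math.exp(logit)
        score = e / (1.0 + e)
    return loss, score


def multiclass_loss_and_score(logits, target):
    """Hypothetical helper: NLL of a log-softmax; returns (loss, log-probabilities)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    log_probs = [v - log_z for v in logits]
    return -log_probs[target], log_probs
```

With a logit of 0 and target 1, the binary loss is ln 2 and the score is 0.5.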

create_model(to_device=True, dim_in=None, dim_out=None)

Create model for graph machine learning

Parameters
  • to_device (bool) – Whether to transfer the model to the configured device (cfg.device)

  • dim_in (int, optional) – Input dimension to the model

  • dim_out (int, optional) – Output dimension to the model

create_optimizer(params, optimizer_config: torch_geometric.graphgym.optimizer.OptimizerConfig)

Create optimizer for the model

Parameters
  • params – PyTorch model parameters

  • optimizer_config (OptimizerConfig) – Configuration of the optimizer

Returns: PyTorch optimizer

create_scheduler(optimizer, scheduler_config: torch_geometric.graphgym.optimizer.SchedulerConfig)

Create learning rate scheduler for the optimizer

Parameters
  • optimizer – PyTorch optimizer

  • scheduler_config (SchedulerConfig) – Configuration of the learning rate scheduler

Returns: PyTorch scheduler

train(loggers, loaders, model, optimizer, scheduler)

The core training pipeline

Parameters
  • loggers – List of loggers

  • loaders – List of loaders

  • model – GNN model

  • optimizer – PyTorch optimizer

  • scheduler – PyTorch learning rate scheduler
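The order of operations in such a pipeline can be sketched with plain callables. Everything below (`run_training` and its callback arguments) is hypothetical; the sketch only assumes that training resumes from a loaded epoch, that the learning rate scheduler steps once per epoch, and that evaluation runs on selected epochs.

```python
from typing import Callable, List


def run_training(start_epoch: int, max_epoch: int,
                 train_epoch: Callable[[int], None],
                 evaluate: Callable[[int], None],
                 scheduler_step: Callable[[], None],
                 should_eval: Callable[[int], bool]) -> List[int]:
    """Hypothetical epoch loop (not GraphGym's train function).

    start_epoch would typically be the epoch count returned after loading the
    latest checkpoint, so an interrupted run resumes where it stopped.
    Returns the epochs at which evaluation ran.
    """
    evaluated = []
    for epoch in range(start_epoch, max_epoch):
        train_epoch(epoch)      # one pass over the training loader
        scheduler_step()        # update the learning rate once per epoch
        if should_eval(epoch):
            evaluate(epoch)     # e.g. run the validation and test loaders
            evaluated.append(epoch)
    return evaluated
```

Passing the evaluation rule in as `should_eval` keeps the loop independent of how evaluation epochs are chosen.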

register_base(mapping: Dict[str, Any], key: str, module: Optional[Any] = None) → Union[None, Callable]

Base function for registering a module in GraphGym.

Parameters
  • mapping (dict) – Python dictionary hosting all the registered modules; the new module is added to it.

  • key (string) – The name of the module.

  • module (any, optional) – The module. If set to None, will return a decorator to register a module.
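The two registration styles described above (direct when `module` is given, decorator when it is None) can be illustrated with a self-contained registry. `register_base_sketch` and `act_dict` are hypothetical names, and raising `KeyError` on a duplicate key is an assumption of this sketch.

```python
from typing import Any, Callable, Dict, Optional, Union


def register_base_sketch(mapping: Dict[str, Any], key: str,
                         module: Optional[Any] = None) -> Union[None, Callable]:
    """Hypothetical registry: register `module`, or return a decorator if None."""
    if module is not None:
        if key in mapping:  # duplicate keys are rejected in this sketch
            raise KeyError(f"Module with '{key}' already defined")
        mapping[key] = module
        return None

    def decorator(module: Any) -> Any:
        register_base_sketch(mapping, key, module)
        return module  # return the original object so it stays usable

    return decorator


act_dict: Dict[str, Any] = {}  # hypothetical registry of activation functions

# Direct registration
register_base_sketch(act_dict, 'identity', lambda x: x)


# Decorator registration
@register_base_sketch(act_dict, 'relu')
def relu(x):
    return max(x, 0)
```

After both calls, `act_dict['relu'](-3)` returns 0, and the decorated `relu` remains callable under its own name.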

register_act(key: str, module: Optional[Any] = None)

Registers an activation function in GraphGym.

register_node_encoder(key: str, module: Optional[Any] = None)

Registers a node feature encoder in GraphGym.

register_edge_encoder(key: str, module: Optional[Any] = None)

Registers an edge feature encoder in GraphGym.

register_stage(key: str, module: Optional[Any] = None)

Registers a customized GNN stage in GraphGym.

register_head(key: str, module: Optional[Any] = None)

Registers a GNN prediction head in GraphGym.

register_layer(key: str, module: Optional[Any] = None)

Registers a GNN layer in GraphGym.

register_pooling(key: str, module: Optional[Any] = None)

Registers a GNN global pooling/readout layer in GraphGym.

register_network(key: str, module: Optional[Any] = None)

Registers a GNN model in GraphGym.

register_config(key: str, module: Optional[Any] = None)

Registers a configuration group in GraphGym.

register_loader(key: str, module: Optional[Any] = None)

Registers a data loader in GraphGym.

register_optimizer(key: str, module: Optional[Any] = None)

Registers an optimizer in GraphGym.

register_scheduler(key: str, module: Optional[Any] = None)

Registers a learning rate scheduler in GraphGym.

register_loss(key: str, module: Optional[Any] = None)

Registers a loss function in GraphGym.

register_train(key: str, module: Optional[Any] = None)

Registers a training function in GraphGym.

Model Modules

IntegerFeatureEncoder

Provides an encoder for integer node features.

AtomEncoder

The atom encoder used in OGB molecule datasets.

BondEncoder

The bond encoder used in OGB molecule datasets.

GNNLayer

Wrapper for a GNN layer

GNNPreMP

Wrapper for NN layer before GNN message passing

GNNStackStage

Simple stage that stacks GNN layers

FeatureEncoder

Encodes node and edge features

GNN

General GNN model: encoder + stage + head

GNNNodeHead

GNN prediction head for node prediction tasks.

GNNEdgeHead

GNN prediction head for edge/link prediction tasks.

GNNGraphHead

GNN prediction head for graph prediction tasks.

GeneralLayer

General wrapper for layers

GeneralMultiLayer

General wrapper for a stack of multiple layers

Linear

Basic Linear layer.

BatchNorm1dNode

BatchNorm for node features.

BatchNorm1dEdge

BatchNorm for edge features.

MLP

Basic MLP model.

GCNConv

Graph Convolutional Network (GCN) layer

SAGEConv

GraphSAGE Conv layer

GATConv

Graph Attention Network (GAT) layer

GINConv

Graph Isomorphism Network (GIN) layer

SplineConv

SplineCNN layer

GeneralConv

A general GNN layer

GeneralEdgeConv

A general GNN layer that supports edge features as well

GeneralSampleEdgeConv

A general GNN layer that supports edge features and edge sampling

global_add_pool

Returns batch-wise graph-level outputs by adding node features across the node dimension.

global_mean_pool

Returns batch-wise graph-level outputs by averaging node features across the node dimension.

global_max_pool

Returns batch-wise graph-level outputs by taking the channel-wise maximum across the node dimension.

class IntegerFeatureEncoder(emb_dim, num_classes=None)

Provides an encoder for integer node features.

Parameters
  • emb_dim (int) – Output embedding dimension

  • num_classes (int) – The number of classes for the embedding mapping to learn from

class AtomEncoder(emb_dim, num_classes=None)

The atom encoder used in OGB molecule datasets.

Parameters
  • emb_dim (int) – Output embedding dimension

  • num_classes – Not used (default: None)
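As far as we understand OGB-style encoders, each categorical atom feature column gets its own embedding table, and the looked-up vectors are summed into one `emb_dim`-dimensional embedding. The pure-Python sketch below illustrates that sum-of-embeddings idea with tiny, hypothetical tables; the actual encoder is assumed to use learnable embedding layers sized from OGB's feature vocabularies.

```python
from typing import List

Table = List[List[float]]  # one embedding table: vocabulary size x emb_dim


def sum_of_embeddings(atom_features: List[int], tables: List[Table]) -> List[float]:
    """Hypothetical helper: embed each categorical feature, then sum the vectors."""
    if len(atom_features) != len(tables):
        raise ValueError('Need exactly one embedding table per feature column')
    emb_dim = len(tables[0][0])
    out = [0.0] * emb_dim
    for value, table in zip(atom_features, tables):
        for d, v in enumerate(table[value]):  # look up row `value`
            out[d] += v
    return out


# Two hypothetical feature columns (e.g. atom type, charge), emb_dim = 2
tables = [
    [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],  # vocabulary of size 3
    [[0.1, 0.1], [0.2, -0.2]],             # vocabulary of size 2
]
embedding = sum_of_embeddings([2, 1], tables)
```

Summing rather than concatenating keeps the output at `emb_dim` regardless of how many feature columns there are.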

class BondEncoder(emb_dim)

The bond encoder used in OGB molecule datasets.

Parameters

emb_dim (int) – Output edge embedding dimension

GNNLayer(dim_in, dim_out, has_act=True)

Wrapper for a GNN layer

Parameters
  • dim_in (int) – Input dimension

  • dim_out (int) – Output dimension

  • has_act (bool) – Whether to apply an activation function after the layer

GNNPreMP(dim_in, dim_out, num_layers)

Wrapper for NN layer before GNN message passing

Parameters
  • dim_in (int) – Input dimension

  • dim_out (int) – Output dimension

  • num_layers (int) – Number of layers

class GNNStackStage(dim_in, dim_out, num_layers)

Simple stage that stacks GNN layers

Parameters
  • dim_in (int) – Input dimension

  • dim_out (int) – Output dimension

  • num_layers (int) – Number of GNN layers

class FeatureEncoder(dim_in)

Encodes node and edge features

Parameters

dim_in (int) – Input feature dimension

class GNN(dim_in, dim_out, **kwargs)

General GNN model: encoder + stage + head

Parameters
  • dim_in (int) – Input dimension

  • dim_out (int) – Output dimension

  • **kwargs (optional) – Optional additional args

class GNNNodeHead(dim_in, dim_out)

GNN prediction head for node prediction tasks.

Parameters
  • dim_in (int) – Input dimension

  • dim_out (int) – Output dimension. For binary prediction, dim_out=1.

class GNNEdgeHead(dim_in, dim_out)

GNN prediction head for edge/link prediction tasks.

Parameters
  • dim_in (int) – Input dimension

  • dim_out (int) – Output dimension. For binary prediction, dim_out=1.

class GNNGraphHead(dim_in, dim_out)

GNN prediction head for graph prediction tasks. The optional post_mp layer (specified by cfg.gnn.post_mp) is used to transform the pooled embedding using an MLP.

Parameters
  • dim_in (int) – Input dimension

  • dim_out (int) – Output dimension. For binary prediction, dim_out=1.

class GeneralLayer(name, layer_config: torch_geometric.graphgym.models.layer.LayerConfig, **kwargs)

General wrapper for layers

Parameters
  • name (string) – Name of the layer in registered layer_dict

  • dim_in (int) – Input dimension

  • dim_out (int) – Output dimension

  • has_act (bool) – Whether to apply an activation after the layer

  • has_bn (bool) – Whether to apply BatchNorm in the layer

  • has_l2norm (bool) – Whether to apply L2 normalization after the layer

  • **kwargs (optional) – Additional args

class GeneralMultiLayer(name, layer_config: torch_geometric.graphgym.models.layer.LayerConfig, **kwargs)

General wrapper for a stack of multiple layers

Parameters
  • name (string) – Name of the layer in registered layer_dict

  • num_layers (int) – Number of layers in the stack

  • dim_in (int) – Input dimension

  • dim_out (int) – Output dimension

  • dim_inner (int) – The dimension for the inner layers

  • final_act (bool) – Whether to apply an activation after the layer stack

  • **kwargs (optional) – Additional args

class Linear(layer_config: torch_geometric.graphgym.models.layer.LayerConfig, **kwargs)

Basic Linear layer.

Parameters
  • dim_in (int) – Input dimension

  • dim_out (int) – Output dimension

  • bias (bool) – Whether to include a bias term

  • **kwargs (optional) – Additional args

class BatchNorm1dNode(layer_config: torch_geometric.graphgym.models.layer.LayerConfig)

BatchNorm for node features.

Parameters

dim_in (int) – Input dimension

class BatchNorm1dEdge(layer_config: torch_geometric.graphgym.models.layer.LayerConfig)

BatchNorm for edge features.

Parameters

dim_in (int) – Input dimension

class MLP(layer_config: torch_geometric.graphgym.models.layer.LayerConfig, **kwargs)

Basic MLP model. A 1-layer MLP is equivalent to a Linear layer.

Parameters
  • dim_in (int) – Input dimension

  • dim_out (int) – Output dimension

  • bias (bool) – Whether to include a bias term

  • dim_inner (int) – The dimension for the inner layers

  • num_layers (int) – Number of layers in the stack

  • **kwargs (optional) – Additional args
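How the dimensions chain through the stack can be sketched as follows, assuming the first `num_layers - 1` layers map into `dim_inner` and the last layer maps to `dim_out`. With `num_layers = 1` this reduces to a single `dim_in -> dim_out` map, consistent with the 1-layer note above. Both helpers are hypothetical, and the parameter count assumes every layer has a bias.

```python
from typing import List, Tuple


def mlp_layer_shapes(dim_in: int, dim_out: int, dim_inner: int,
                     num_layers: int) -> List[Tuple[int, int]]:
    """Hypothetical helper: (input, output) size of each linear layer."""
    if num_layers < 1:
        raise ValueError('num_layers must be at least 1')
    dims = [dim_in] + [dim_inner] * (num_layers - 1) + [dim_out]
    return list(zip(dims[:-1], dims[1:]))


def mlp_num_params(shapes: List[Tuple[int, int]], bias: bool = True) -> int:
    """Hypothetical helper: count weights (and biases) over all layers."""
    return sum(i * o + (o if bias else 0) for i, o in shapes)
```

For example, `mlp_layer_shapes(16, 4, 32, 3)` gives `[(16, 32), (32, 32), (32, 4)]`.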

class GCNConv(layer_config: torch_geometric.graphgym.models.layer.LayerConfig, **kwargs)

Graph Convolutional Network (GCN) layer

class SAGEConv(layer_config: torch_geometric.graphgym.models.layer.LayerConfig, **kwargs)

GraphSAGE Conv layer

class GATConv(layer_config: torch_geometric.graphgym.models.layer.LayerConfig, **kwargs)

Graph Attention Network (GAT) layer

class GINConv(layer_config: torch_geometric.graphgym.models.layer.LayerConfig, **kwargs)

Graph Isomorphism Network (GIN) layer

class SplineConv(layer_config: torch_geometric.graphgym.models.layer.LayerConfig, **kwargs)

SplineCNN layer

class GeneralConv(layer_config: torch_geometric.graphgym.models.layer.LayerConfig, **kwargs)

A general GNN layer

class GeneralEdgeConv(layer_config: torch_geometric.graphgym.models.layer.LayerConfig, **kwargs)

A general GNN layer that supports edge features as well

class GeneralSampleEdgeConv(layer_config: torch_geometric.graphgym.models.layer.LayerConfig, **kwargs)

A general GNN layer that supports edge features and edge sampling

global_add_pool(x, batch, size: Optional[int] = None)

Returns batch-wise graph-level outputs by adding node features across the node dimension, so that for a single graph \(\mathcal{G}_i\) its output is computed by

\[\mathbf{r}_i = \sum_{n=1}^{N_i} \mathbf{x}_n\]
Parameters
  • x (Tensor) – Node feature matrix \(\mathbf{X} \in \mathbb{R}^{(N_1 + \ldots + N_B) \times F}\).

  • batch (LongTensor) – Batch vector \(\mathbf{b} \in {\{ 0, \ldots, B-1\}}^N\), which assigns each node to a specific example.

  • size (int, optional) – Batch-size \(B\). Automatically calculated if not given. (default: None)

Return type

Tensor

global_mean_pool(x, batch, size: Optional[int] = None)

Returns batch-wise graph-level outputs by averaging node features across the node dimension, so that for a single graph \(\mathcal{G}_i\) its output is computed by

\[\mathbf{r}_i = \frac{1}{N_i} \sum_{n=1}^{N_i} \mathbf{x}_n\]
Parameters
  • x (Tensor) – Node feature matrix \(\mathbf{X} \in \mathbb{R}^{(N_1 + \ldots + N_B) \times F}\).

  • batch (LongTensor) – Batch vector \(\mathbf{b} \in {\{ 0, \ldots, B-1\}}^N\), which assigns each node to a specific example.

  • size (int, optional) – Batch-size \(B\). Automatically calculated if not given. (default: None)

Return type

Tensor

global_max_pool(x, batch, size: Optional[int] = None)

Returns batch-wise graph-level outputs by taking the channel-wise maximum across the node dimension, so that for a single graph \(\mathcal{G}_i\) its output is computed by

\[\mathbf{r}_i = \mathrm{max}_{n=1}^{N_i} \, \mathbf{x}_n\]
Parameters
  • x (Tensor) – Node feature matrix \(\mathbf{X} \in \mathbb{R}^{(N_1 + \ldots + N_B) \times F}\).

  • batch (LongTensor) – Batch vector \(\mathbf{b} \in {\{ 0, \ldots, B-1\}}^N\), which assigns each node to a specific example.

  • size (int, optional) – Batch-size \(B\). Automatically calculated if not given. (default: None)

Return type

Tensor
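The three readouts above differ only in how node rows sharing a batch index are combined. A pure-Python sketch of the formulas, with nested lists standing in for tensors; `global_pool_sketch` and its `reduce` argument are hypothetical, inferring `size` as `max(batch) + 1` is an assumption, and pooling an empty graph to zeros is a choice of this sketch.

```python
from typing import List, Optional


def global_pool_sketch(x: List[List[float]], batch: List[int],
                       reduce: str = 'add',
                       size: Optional[int] = None) -> List[List[float]]:
    """Hypothetical helper: pool node features x (N x F) into one row per graph."""
    if size is None:
        size = max(batch) + 1  # B graphs, indexed 0..B-1
    num_feats = len(x[0])
    groups: List[List[List[float]]] = [[] for _ in range(size)]
    for row, graph_id in zip(x, batch):
        groups[graph_id].append(row)
    out = []
    for rows in groups:
        if not rows:  # a graph without nodes pools to zeros in this sketch
            out.append([0.0] * num_feats)
            continue
        columns = list(zip(*rows))  # one tuple per feature channel
        if reduce == 'add':
            out.append([sum(c) for c in columns])
        elif reduce == 'mean':
            out.append([sum(c) / len(rows) for c in columns])
        elif reduce == 'max':
            out.append([max(c) for c in columns])
        else:
            raise ValueError(f'Unknown reduction: {reduce}')
    return out
```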

Utility Modules

agg_runs

Aggregate over different random seeds of a single experiment

agg_batch

Aggregate across results from multiple experiments via grid search

params_count

Computes the number of parameters.

match_baseline_cfg

Match the computational budget of a given baseline model.

get_current_gpu_usage

Get the current GPU memory usage.

auto_select_device

Auto select device for the experiment.

is_eval_epoch

Determines if the model should be evaluated at the current epoch.

is_ckpt_epoch

Determines if the model should be checkpointed at the current epoch.

dict_to_json

Dump a Python dictionary to a JSON file

dict_list_to_json

Dump a list of Python dictionaries to a JSON file

dict_to_tb

Add a dictionary of statistics to a TensorBoard writer

makedirs_rm_exist

Make a directory, removing any existing data.

dummy_context

Default context manager that does nothing

agg_runs(dir, metric_best='auto')

Aggregate over different random seeds of a single experiment

Parameters
  • dir (str) – Directory of the results, containing 1 experiment

  • metric_best (str, optional) – The metric for selecting the best validation performance. Options: auto, accuracy, auc.
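Aggregating over seeds can be sketched on in-memory results: for each seed, pick the epoch with the best validation metric, then report the mean and standard deviation of the corresponding test metric. The input layout (per-epoch dicts keyed like `val_accuracy`), "higher is better", and the population standard deviation are assumptions; the sketch does not model the `auto` option or the on-disk result format, and `agg_runs_sketch` is a hypothetical name.

```python
import statistics
from typing import Dict, List


def agg_runs_sketch(runs: Dict[int, List[Dict[str, float]]],
                    metric_best: str = 'accuracy') -> Dict[str, float]:
    """Hypothetical aggregation over seeds (not the file-based agg_runs).

    runs maps each seed to its per-epoch records, e.g.
    {'val_accuracy': 0.8, 'test_accuracy': 0.75}. For every seed, the epoch with
    the highest validation metric is selected; its test metric is aggregated.
    """
    best_test = []
    for epochs in runs.values():
        best = max(epochs, key=lambda rec: rec[f'val_{metric_best}'])
        best_test.append(best[f'test_{metric_best}'])
    return {
        f'test_{metric_best}': statistics.mean(best_test),
        # population standard deviation across seeds (an assumption)
        f'test_{metric_best}_std': statistics.pstdev(best_test),
    }
```

Selecting the epoch by validation score, rather than by test score, keeps the reported test number free of selection bias.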

agg_batch(dir, metric_best='auto')

Aggregate across results from multiple experiments via grid search

Parameters
  • dir (str) – Directory of the results, containing multiple experiments

  • metric_best (str, optional) – The metric for selecting the best validation performance. Options: auto, accuracy, auc.

params_count(model)

Computes the number of parameters.

Parameters

model (nn.Module) – PyTorch model

match_baseline_cfg(cfg_dict, cfg_dict_baseline, verbose=True)

Match the computational budget of a given baseline model. The current configuration dictionary will be modified and returned.

Parameters
  • cfg_dict (dict) – Current experiment’s configuration

  • cfg_dict_baseline (dict) – Baseline configuration

  • verbose (bool, optional) – Whether to print the matched parameter counts
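One way to match a computational budget is to search for the hidden width whose parameter count is closest to the baseline's. The sketch below does this for a hypothetical stack of linear layers with biases; the parameter formula, the search range, and the names `count_params` and `match_dim_inner` are assumptions, and GraphGym's own procedure may differ.

```python
def count_params(dim_in: int, dim_out: int, dim_inner: int, num_layers: int) -> int:
    """Hypothetical helper: parameters of a linear stack with biases."""
    dims = [dim_in] + [dim_inner] * (num_layers - 1) + [dim_out]
    return sum(i * o + o for i, o in zip(dims[:-1], dims[1:]))


def match_dim_inner(baseline_params: int, dim_in: int, dim_out: int,
                    num_layers: int, max_dim: int = 1024) -> int:
    """Hypothetical helper: dim_inner whose parameter count is closest to baseline."""
    return min(range(1, max_dim + 1),
               key=lambda d: abs(count_params(dim_in, dim_out, d, num_layers)
                                 - baseline_params))


# Baseline: 2 layers of width 64. A 4-layer model needs a narrower width.
baseline = count_params(16, 4, 64, 2)   # 1348 parameters
deeper_width = match_dim_inner(baseline, 16, 4, 4)
```

Here the deeper model's count grows as 2 * d**2 + 23 * d + 4, so a width of 21 (1369 parameters) is the closest match to the baseline's 1348.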

get_current_gpu_usage()

Get the current GPU memory usage.

auto_select_device(memory_max=8000, memory_bias=200, strategy='random')

Automatically select a device for the experiment. Useful when multiple GPUs are available.

Parameters
  • memory_max (int) – Threshold of existing GPU memory usage. GPUs with memory usage beyond this threshold will be deprioritized.

  • memory_bias (int) – A bias GPU memory usage added to all the GPUs. Avoids division-by-zero errors.

  • strategy (str, optional) – ‘random’ (randomly select a GPU) or ‘greedy’ (greedily select a GPU)
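The two strategies can be sketched on a list of per-GPU memory readings in MB. Under the assumptions of this sketch, ‘greedy’ picks the least-used GPU, while ‘random’ samples with probability proportional to 1 / (usage + memory_bias) and gives zero weight to GPUs above memory_max. Falling back to greedy when every GPU exceeds the threshold is also an assumption, and `select_gpu` is a hypothetical name.

```python
import random
from typing import List, Optional


def select_gpu(memory_used: List[float], memory_max: float = 8000,
               memory_bias: float = 200, strategy: str = 'random',
               rng: Optional[random.Random] = None) -> int:
    """Hypothetical helper: index of the GPU to use, given per-GPU memory usage."""
    least_used = min(range(len(memory_used)), key=lambda i: memory_used[i])
    if strategy == 'greedy' or all(m > memory_max for m in memory_used):
        return least_used
    if strategy != 'random':
        raise ValueError(f'Unknown strategy: {strategy}')
    # memory_bias keeps the denominator positive even for an idle GPU (usage 0)
    weights = [0.0 if m > memory_max else 1.0 / (m + memory_bias)
               for m in memory_used]
    rng = rng or random.Random()
    return rng.choices(range(len(memory_used)), weights=weights, k=1)[0]
```

The random strategy spreads concurrent experiments across lightly loaded GPUs instead of piling them all onto the single least-used one.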

is_eval_epoch(cur_epoch)

Determines if the model should be evaluated at the current epoch.

is_ckpt_epoch(cur_epoch)

Determines if the model should be checkpointed at the current epoch.
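A plausible form of these two checks, written with the periods passed explicitly instead of read from the configuration. This sketch assumes 0-indexed epochs, evaluation every `eval_period` epochs plus the first and final epochs, and checkpointing every `ckpt_period` epochs plus the final epoch; the function names and exact rules are hypothetical.

```python
def is_eval_epoch_sketch(cur_epoch: int, eval_period: int, max_epoch: int) -> bool:
    """Hypothetical rule: evaluate after this 0-indexed epoch?"""
    return ((cur_epoch + 1) % eval_period == 0
            or cur_epoch == 0
            or cur_epoch + 1 == max_epoch)


def is_ckpt_epoch_sketch(cur_epoch: int, ckpt_period: int, max_epoch: int) -> bool:
    """Hypothetical rule: save a checkpoint after this 0-indexed epoch?"""
    return (cur_epoch + 1) % ckpt_period == 0 or cur_epoch + 1 == max_epoch
```

With `eval_period=4` and `max_epoch=10`, evaluation runs after epochs 0, 3, 7 and 9; with `ckpt_period=4`, checkpoints are saved after epochs 3, 7 and 9.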

dict_to_json(dict, fname)

Dump a Python dictionary to a JSON file

Parameters
  • dict (dict) – Python dictionary

  • fname (str) – Output file name

dict_list_to_json(dict_list, fname)

Dump a list of Python dictionaries to a JSON file

Parameters
  • dict_list (list of dict) – List of Python dictionaries

  • fname (str) – Output file name
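A minimal sketch of both helpers, assuming one JSON object per line (the JSON Lines convention) so that per-epoch statistics can be appended incrementally. The file modes, the layout, and all three function names are assumptions for illustration.

```python
import json
from typing import Any, Dict, List


def dict_to_json_sketch(record: Dict[str, Any], fname: str) -> None:
    """Hypothetical helper: append one dictionary as a single JSON line."""
    with open(fname, 'a') as f:
        json.dump(record, f)
        f.write('\n')


def dict_list_to_json_sketch(records: List[Dict[str, Any]], fname: str) -> None:
    """Hypothetical helper: write a list of dictionaries, one JSON object per line."""
    with open(fname, 'w') as f:
        for record in records:
            json.dump(record, f)
            f.write('\n')


def read_json_lines(fname: str) -> List[Dict[str, Any]]:
    """Hypothetical helper: read the file back, skipping blank lines."""
    with open(fname) as f:
        return [json.loads(line) for line in f if line.strip()]
```

Writing one object per line means a crash mid-run still leaves every completed line readable, which a single large JSON array would not.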

dict_to_tb(dict, writer, epoch)

Add a dictionary of statistics to a TensorBoard writer

Parameters
  • dict (dict) – Statistics of experiments; the keys are attribute names, the values are the attribute values

  • writer – TensorBoard writer object

  • epoch (int) – The current epoch

makedirs_rm_exist(dir)

Make a directory, removing any existing data.

Parameters

dir (str) – The directory to be created.
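The behavior can be sketched with the standard library: delete the directory tree if it exists, then create it empty. Because previous results are discarded irreversibly, such a helper should only point at directories the experiment owns. `makedirs_rm_exist_sketch` is a hypothetical name.

```python
import os
import shutil


def makedirs_rm_exist_sketch(dir_path: str) -> None:
    """Hypothetical helper: make `dir_path` an empty directory."""
    if os.path.isdir(dir_path):
        shutil.rmtree(dir_path)  # irreversible: removes everything below dir_path
    os.makedirs(dir_path, exist_ok=True)  # also creates missing parent directories
```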


class dummy_context

Default context manager that does nothing
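A no-op context manager is useful when a `with` block should apply a real context only in some runs. A minimal sketch; `DummyContext`, `recording`, and `run_step` are illustrative names, and since Python 3.7 the standard library's `contextlib.nullcontext` serves the same purpose.

```python
import contextlib


class DummyContext:
    """Context manager that does nothing on entry or exit."""

    def __enter__(self):
        return None

    def __exit__(self, exc_type, exc_value, traceback):
        return False  # do not suppress exceptions


events = []


@contextlib.contextmanager
def recording(name):
    """Hypothetical 'real' context that records when it is entered and exited."""
    events.append(f'enter {name}')
    yield
    events.append(f'exit {name}')


def run_step(enable_recording):
    # Pick the real context or the no-op one; the with-block stays identical.
    ctx = recording('step') if enable_recording else DummyContext()
    with ctx:
        events.append('work')
```

Choosing the context up front avoids duplicating the body of the `with` block in an if/else.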