torch_geometric.profile

profileit

A decorator to facilitate profiling a function, e.g., obtaining training runtime and memory statistics of a specific model on a specific dataset.

timeit

A context decorator to facilitate timing a function, e.g., obtaining the runtime of a specific model on a specific dataset.

get_stats_summary

Creates a summary of collected runtime and memory statistics.

trace_handler

print_time_total

rename_profile_file

torch_profile

xpu_profile

count_parameters

Given a torch.nn.Module, count its trainable parameters.

get_model_size

Given a torch.nn.Module, get its actual disk size in bytes.

get_data_size

Given a torch_geometric.data.Data object, get its theoretical memory usage in bytes.

get_cpu_memory_from_gc

Returns the used CPU memory in bytes, as reported by the garbage collector.

get_gpu_memory_from_gc

Returns the used GPU memory in bytes, as reported by the garbage collector.

get_gpu_memory_from_nvidia_smi

Returns the free and used GPU memory in megabytes, as reported by nvidia-smi.

get_gpu_memory_from_ipex

Returns the XPU memory statistics.

benchmark

Benchmark a list of functions funcs that receive the same set of arguments args.

GNN profiling package.

profileit(device: str)[source]

A decorator to facilitate profiling a function, e.g., obtaining training runtime and memory statistics of a specific model on a specific dataset. Returns a GPUStats object if device is xpu, or an extended CUDAStats object if device is cuda.

Parameters:

device (str) – Target device for profiling. Options are: cuda and xpu.

@profileit("cuda")
def train(model, optimizer, x, edge_index, y):
    optimizer.zero_grad()
    out = model(x, edge_index)
    loss = criterion(out, y)
    loss.backward()
    optimizer.step()
    return float(loss)

loss, stats = train(model, optimizer, x, edge_index, y)

class timeit(log: bool = True, avg_time_divisor: int = 0)[source]

A context decorator to facilitate timing a function, e.g., obtaining the runtime of a specific model on a specific dataset.

@torch.no_grad()
def test(model, x, edge_index):
    return model(x, edge_index)

with timeit() as t:
    z = test(model, x, edge_index)
time = t.duration
Parameters:
  • log (bool, optional) – If set to False, will not log any runtime to the console. (default: True)

  • avg_time_divisor (int, optional) – If set to a value greater than 1, will divide the total time by this value. Useful for calculating the average of runtimes within a for-loop. (default: 0)

reset()[source]

Prints the duration and resets the current timer.
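
For example, reset() can be used to time each iteration of a loop separately (a sketch reusing the test function, model, x, and edge_index from the example above):

with timeit() as t:
    for epoch in range(3):
        z = test(model, x, edge_index)
        t.reset()  # Prints the runtime of this iteration and restarts the timer.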

get_stats_summary(stats_list: Union[List[GPUStats], List[CUDAStats]]) → Union[GPUStatsSummary, CUDAStatsSummary][source]

Creates a summary of collected runtime and memory statistics. Returns a GPUStatsSummary if a list of GPUStats was passed; otherwise (if a list of CUDAStats was passed), returns a CUDAStatsSummary.

Parameters:

stats_list (Union[List[GPUStats], List[CUDAStats]]) – A list of GPUStats or CUDAStats objects, as returned by profileit().
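
For example, statistics collected with profileit() over several epochs can be aggregated as follows (a sketch reusing the train function from the example above):

stats_list = []
for epoch in range(100):
    loss, stats = train(model, optimizer, x, edge_index, y)
    stats_list.append(stats)
summary = get_stats_summary(stats_list)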

trace_handler(p)[source]
print_time_total(p)[source]
rename_profile_file(*args)[source]
torch_profile(export_chrome_trace=True, csv_data=None, write_csv=None)[source]
xpu_profile(export_chrome_trace=True)[source]
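
A minimal sketch of torch_profile usage, assuming it acts as a context manager around the code to be profiled (model, x, and edge_index are placeholders):

from torch_geometric.profile import torch_profile

with torch_profile():
    model(x, edge_index)
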
count_parameters(model: Module) → int[source]

Given a torch.nn.Module, count its trainable parameters.

Parameters:

model (torch.nn.Module) – The model.
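
For example, a plain torch.nn.Linear(16, 32) layer holds 16 * 32 weights plus 32 biases:

import torch
from torch_geometric.profile import count_parameters

model = torch.nn.Linear(16, 32)
assert count_parameters(model) == 16 * 32 + 32  # 544 trainable parameters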

get_model_size(model: Module) → int[source]

Given a torch.nn.Module, get its actual disk size in bytes.

Parameters:

model (torch.nn.Module) – The model.
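
For example (a sketch; the exact byte count depends on how the model serializes to disk):

from torch_geometric.profile import get_model_size

num_bytes = get_model_size(model)
print(f'Model size: {num_bytes / 1024**2:.2f} MB')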

get_data_size(data: BaseData) → int[source]

Given a torch_geometric.data.Data object, get its theoretical memory usage in bytes.

Parameters:

data (torch_geometric.data.Data or torch_geometric.data.HeteroData) – The Data or HeteroData graph object.
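
For example, a graph with 100 nodes holding 16 float32 features each and 500 edges stored as a 2 × 500 int64 edge_index occupies roughly 100 * 16 * 4 + 2 * 500 * 8 bytes:

import torch
from torch_geometric.data import Data
from torch_geometric.profile import get_data_size

data = Data(
    x=torch.randn(100, 16),                      # 100 * 16 * 4 bytes
    edge_index=torch.randint(0, 100, (2, 500)),  # 2 * 500 * 8 bytes
)
num_bytes = get_data_size(data)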

get_cpu_memory_from_gc() → int[source]

Returns the used CPU memory in bytes, as reported by the garbage collector.

get_gpu_memory_from_gc(device: int = 0) → int[source]

Returns the used GPU memory in bytes, as reported by the garbage collector.

Parameters:

device (int, optional) – The GPU device identifier. (default: 0)

get_gpu_memory_from_nvidia_smi(device: int = 0, digits: int = 2) → Tuple[float, float][source]

Returns the free and used GPU memory in megabytes, as reported by nvidia-smi.

Note

nvidia-smi will generally overestimate the amount of memory used by the actual program.

Parameters:
  • device (int, optional) – The GPU device identifier. (default: 0)

  • digits (int) – The number of decimals to use for megabytes. (default: 2)
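
For example:

from torch_geometric.profile import get_gpu_memory_from_nvidia_smi

free_mem, used_mem = get_gpu_memory_from_nvidia_smi(device=0)
print(f'Free: {free_mem:.2f} MB, Used: {used_mem:.2f} MB')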

get_gpu_memory_from_ipex(device: int = 0, digits=2) → Tuple[float, float, float][source]

Returns the XPU memory statistics.

Parameters:
  • device (int, optional) – The GPU device identifier. (default: 0)

  • digits (int) – The number of decimals to use for megabytes. (default: 2)

benchmark(funcs: List[Callable], args: Union[Tuple[Any], List[Tuple[Any]]], num_steps: int, func_names: Optional[List[str]] = None, num_warmups: int = 10, backward: bool = False, per_step: bool = False, progress_bar: bool = False)[source]

Benchmark a list of functions funcs that receive the same set of arguments args.

Parameters:
  • funcs ([Callable]) – The list of functions to benchmark.

  • args ((Any, ) or [(Any, )]) – The arguments to pass to the functions. Can be a list of arguments for each function in funcs in case their headers differ. Alternatively, you can pass in functions that generate arguments on-the-fly (e.g., useful for benchmarking models on various sizes).

  • num_steps (int) – The number of steps to run the benchmark.

  • func_names ([str], optional) – The names of the functions. If not given, will try to infer the name from the function itself. (default: None)

  • num_warmups (int, optional) – The number of warmup steps. (default: 10)

  • backward (bool, optional) – If set to True, will benchmark both forward and backward passes. (default: False)

  • per_step (bool, optional) – If set to True, will report runtimes per step. (default: False)

  • progress_bar (bool, optional) – If set to True, will print a progress bar during benchmarking. (default: False)
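
For example, to compare the forward and backward runtime of two layers on the same inputs (conv1, conv2, x, and edge_index are placeholders):

from torch_geometric.profile import benchmark

benchmark(
    funcs=[conv1, conv2],
    func_names=['Conv1', 'Conv2'],
    args=(x, edge_index),
    num_steps=100,
    backward=True,
)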