torch_geometric.datasets.HydroNet

class HydroNet(root: str, name: Optional[str] = None, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, force_reload: bool = False, num_workers: int = 8, clusters: Optional[Union[int, List[int]]] = None, use_processed: bool = True)

Bases: InMemoryDataset

The HydroNet dataset from the “HydroNet: Benchmark Tasks for Preserving Intermolecular Interactions and Structural Motifs in Predictive and Generative Models for Molecular Data” paper, consisting of 5 million water clusters held together by hydrogen bonding networks. This dataset provides the atomic coordinates and the total energy in kcal/mol of each cluster.

Parameters:
  • root (str) – Root directory where the dataset should be saved.

  • name (str, optional) – Name of the subset of the full dataset to use: "small" uses 500k graphs sampled from the "medium" dataset; "medium" uses 2.7m graphs with a maximum size of 75 nodes. Mutually exclusive with the clusters argument. (default: None)

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • force_reload (bool, optional) – Whether to re-process the dataset. (default: False)

  • num_workers (int) – Number of multiprocessing workers to use for pre-processing the dataset. (default: 8)

  • clusters (int or List[int], optional) – Select a subset of clusters from the full dataset. If set to None, all clusters are selected. Mutually exclusive with the name argument. (default: None)

  • use_processed (bool) – Option to use a pre-processed version of the original xyz dataset. (default: True)
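
Example: the parameters above can be combined as in the following minimal sketch. The helpers hydronet_kwargs and load_hydronet are hypothetical names introduced here for illustration and are not part of torch_geometric; the path "data/HydroNet" is a placeholder. The check only mirrors the documented rule that name and clusters are mutually exclusive; how the library itself reacts when both are given is not stated above.

```python
from typing import Any, Dict, List, Optional, Union


def hydronet_kwargs(
    root: str,
    name: Optional[str] = None,
    clusters: Optional[Union[int, List[int]]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: collect HydroNet arguments before construction."""
    # The documentation states that `name` and `clusters` are mutually exclusive.
    if name is not None and clusters is not None:
        raise ValueError("'name' and 'clusters' are mutually exclusive")
    # Only the two subset names listed in the documentation are accepted here.
    if name is not None and name not in ("small", "medium"):
        raise ValueError(f"unknown subset name: {name!r}")
    return {"root": root, "name": name, "clusters": clusters}


def load_hydronet(**kwargs: Any):
    """Hypothetical helper: build the dataset from the collected arguments."""
    # Imported lazily so the argument check above works without torch_geometric.
    from torch_geometric.datasets import HydroNet

    return HydroNet(**kwargs)


# Usage (not executed here; the first call may download and process data):
#   dataset = load_hydronet(**hydronet_kwargs("data/HydroNet", name="small"))
```

Validating the arguments in a separate step keeps the mistake of passing both name and clusters cheap to catch, before any download or pre-processing starts.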