torch_geometric.datasets.PCQM4Mv2

class PCQM4Mv2(root: str, split: str = 'train', transform: Optional[Callable] = None, backend: str = 'sqlite')

Bases: OnDiskDataset

The PCQM4Mv2 dataset from the “OGB-LSC: A Large-Scale Challenge for Machine Learning on Graphs” paper. PCQM4Mv2 is a quantum chemistry dataset originally curated under the PubChemQC project. The task is to predict the DFT-calculated HOMO-LUMO energy gap of molecules given their 2D molecular graphs.

Note

This dataset uses the OnDiskDataset base class to load data dynamically from disk.

Parameters:
  • root (str) – Root directory where the dataset should be saved.

  • split (str, optional) – If "train", loads the training dataset. If "val", loads the validation dataset. If "test", loads the test dataset. If "holdout", loads the holdout dataset. (default: "train")

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • backend (str) – The Database backend to use. (default: "sqlite")
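
Example

The parameters above can be combined as in the following minimal sketch. The helper load_pcqm4mv2, the demo function, and the path "data/PCQM4Mv2" are hypothetical names introduced here for illustration and are not part of the library. The sketch assumes torch_geometric and its dataset dependencies are installed. The first instantiation is expected to download and process the raw data, which may take considerable time and disk space.

```python
from typing import Callable, Optional

# The four split names listed in the parameter description.
VALID_SPLITS = ("train", "val", "test", "holdout")


def load_pcqm4mv2(
    root: str,
    split: str = "train",
    transform: Optional[Callable] = None,
    backend: str = "sqlite",
):
    """Hypothetical helper: check `split` early, then build the dataset."""
    if split not in VALID_SPLITS:
        raise ValueError(
            f"Unknown split {split!r}; expected one of {VALID_SPLITS}"
        )
    # Imported lazily so the check above runs even without PyG installed.
    from torch_geometric.datasets import PCQM4Mv2

    return PCQM4Mv2(root, split=split, transform=transform, backend=backend)


def demo(root: str = "data/PCQM4Mv2") -> None:
    """Load the validation split and inspect it (assumed root path)."""
    dataset = load_pcqm4mv2(root, split="val")
    print(len(dataset))  # number of molecules in this split
    print(dataset[0])    # a single Data object, read from the on-disk database
```

Calling demo() triggers the actual download and processing. Checking split before constructing the dataset surfaces a misspelled split name immediately instead of after the dataset files have been prepared.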