- class ZINC(root: str, subset: bool = False, split: str = 'train', transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)
The ZINC dataset from the ZINC database and the “Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules” paper, containing about 250,000 molecular graphs with up to 38 heavy atoms. The task is to regress the penalized logP (also called constrained solubility in some works), given by y = logP - SAS - cycles, where logP is the water-octanol partition coefficient, SAS is the synthetic accessibility score, and cycles denotes the number of cycles with more than six atoms. Penalized logP is a score commonly used for training molecular generation models; see, e.g., the “Junction Tree Variational Autoencoder for Molecular Graph Generation” and “Grammar Variational Autoencoder” papers.
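The penalized logP target is a plain combination of three per-molecule quantities. Below is a minimal sketch in plain Python. It assumes logP, SAS, and the ring sizes have already been computed elsewhere, for example with a cheminformatics toolkit. The functions `count_large_cycles` and `penalized_logp` are hypothetical helpers, not part of PyTorch Geometric. The sketch follows the docstring's wording literally, and other code bases may compute or normalize the individual terms differently.

```python
from typing import Iterable


def count_large_cycles(ring_sizes: Iterable[int], max_size: int = 6) -> int:
    """Count rings with more than `max_size` atoms (the `cycles` term)."""
    return sum(1 for size in ring_sizes if size > max_size)


def penalized_logp(logp: float, sas: float, ring_sizes: Iterable[int]) -> float:
    """Compute y = logP - SAS - cycles, as worded in the docstring."""
    return logp - sas - count_large_cycles(ring_sizes)


# Hypothetical molecule: logP = 2.5, SAS = 1.5, rings of sizes 5, 6 and 8.
# Only the 8-membered ring counts, so y = 2.5 - 1.5 - 1.
print(penalized_logp(2.5, 1.5, [5, 6, 8]))  # → 0.0
```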
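Parameters:
root (str) – Root directory where the dataset should be saved.
subset (bool, optional) – If set to True, will only load a subset of the dataset (12,000 molecular graphs), following the “Benchmarking Graph Neural Networks” paper. (default: False)
split (str, optional) – If "train", loads the training dataset. If "val", loads the validation dataset. If "test", loads the test dataset. (default: "train")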
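transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)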
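The three hooks differ mainly in when they run. `pre_filter` and `pre_transform` are applied once while the dataset is processed and saved, and `transform` is applied every time an item is accessed. The sketch below mimics those documented semantics with a hypothetical `ToyDataset` and `Graph`. These are stand-ins for PyG's dataset class and `torch_geometric.data.Data`, not PyG's actual implementation. Applying `pre_filter` before `pre_transform` is an assumption of this sketch.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, List, Optional


@dataclass
class Graph:
    """Hypothetical stand-in for torch_geometric.data.Data."""
    num_nodes: int
    y: float
    tags: List[str] = field(default_factory=list)


class ToyDataset:
    """Hypothetical sketch of the documented hook semantics (not PyG code)."""

    def __init__(self, raw: List[Graph],
                 transform: Optional[Callable[[Graph], Graph]] = None,
                 pre_transform: Optional[Callable[[Graph], Graph]] = None,
                 pre_filter: Optional[Callable[[Graph], bool]] = None):
        self.transform = transform
        self.stored: List[Graph] = []  # plays the role of the on-disk cache
        for g in raw:
            # pre_filter decides once whether a graph is kept at all.
            if pre_filter is not None and not pre_filter(g):
                continue
            # pre_transform runs once, before the result is "saved".
            if pre_transform is not None:
                g = pre_transform(g)
            self.stored.append(g)

    def __len__(self) -> int:
        return len(self.stored)

    def __getitem__(self, idx: int) -> Graph:
        g = self.stored[idx]
        # transform runs on every access; the stored copy is left untouched.
        return self.transform(g) if self.transform is not None else g


ds = ToyDataset(
    raw=[Graph(10, 1.0), Graph(40, -2.0), Graph(25, 0.5)],
    pre_filter=lambda g: g.num_nodes <= 30,  # keep graphs with <= 30 nodes
    pre_transform=lambda g: replace(g, tags=g.tags + ["pre"]),
    transform=lambda g: replace(g, tags=g.tags + ["access"]),
)
print(len(ds))            # → 2
print(ds[0].tags)         # → ['pre', 'access']
print(ds.stored[0].tags)  # → ['pre']
```

In this model, the `stored` list only reflects `pre_filter` and `pre_transform`, so changing either one means rebuilding it. In contrast, `transform` can be swapped at any time without touching the cached data.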