torch_geometric.datasets.MD17
- class MD17(root: str, name: str, train: Optional[bool] = None, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None)
Bases: InMemoryDataset
A variety of ab-initio molecular dynamics trajectories from the authors of sGDML. This class provides access to the original MD17 datasets as well as all other datasets released by sGDML since then (15 in total).
For every trajectory, the dataset contains the Cartesian positions of atoms (in Angstrom), their atomic numbers, as well as the total energy (in kcal/mol) and the forces (in kcal/mol/Angstrom) on each atom. The latter two are the regression targets for this collection.
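Results on these trajectories are often reported in eV or meV rather than kcal/mol. The following is a minimal sketch of the unit conversion, assuming the approximate factor 1 eV ≈ 23.0605 kcal/mol; the helper names are hypothetical and not part of torch_geometric.

```python
# Approximate conversion factor (assumption: 1 eV is about 23.0605 kcal/mol).
KCAL_PER_MOL_TO_EV = 0.0433641


def kcal_per_mol_to_ev(value):
    """Convert an energy from kcal/mol to eV.

    The same factor converts forces from kcal/mol/Angstrom to eV/Angstrom,
    because the length unit is unchanged.
    """
    return value * KCAL_PER_MOL_TO_EV


def kcal_per_mol_to_mev(value):
    """Convert an energy from kcal/mol to meV."""
    return 1000.0 * kcal_per_mol_to_ev(value)


# Example: express an error of 1 kcal/mol in meV.
print(kcal_per_mol_to_mev(1.0))
```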
Note
Data objects contain no edge indices, as these are most commonly constructed via the torch_geometric.transforms.RadiusGraph transform, with its cut-off being a hyperparameter.
Some of the trajectories were computed at different levels of theory, and for most molecules there exist two versions: a long trajectory at the DFT level of theory and a short trajectory at the coupled cluster level of theory. Check the table below for detailed information on the molecule, level of theory, and number of data points contained in each dataset. Which trajectory is loaded is determined by the name argument. For the coupled cluster trajectories, the dataset comes with pre-defined training and testing splits, which are loaded separately via the train argument.
When using these datasets, make sure to cite the appropriate publications listed on the sGDML website.
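To illustrate what a radius cut-off means for edge construction, here is a minimal pure-Python sketch that connects every pair of atoms closer than a given distance. It is not the implementation used by torch_geometric.transforms.RadiusGraph (which, among other things, can cap the number of neighbors); the function name and the toy coordinates are assumptions made for illustration.

```python
import math


def radius_edges(pos, cutoff):
    """Return directed (source, target) index pairs for atoms within `cutoff`.

    `pos` is a sequence of (x, y, z) coordinates in Angstrom. Self-loops are
    excluded, and each undirected pair appears in both directions.
    """
    edges = []
    for i, p in enumerate(pos):
        for j, q in enumerate(pos):
            if i != j and math.dist(p, q) <= cutoff:
                edges.append((i, j))
    return edges


# Three hypothetical atoms on a line, spaced 1.0 Angstrom apart.
pos = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]

# A small cut-off links only nearest neighbors; a larger one links all pairs.
print(radius_edges(pos, cutoff=1.5))
print(radius_edges(pos, cutoff=2.5))
```

A larger cut-off yields a denser graph, which is why it is usually tuned as a hyperparameter.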
Molecule        Level of Theory  Name                   #Examples
Benzene         DFT              benzene                627,983
Benzene         DFT FHI-aims     benzene FHI-aims       49,863
Benzene         CCSD(T)          benzene CCSD(T)        1,500
Uracil          DFT              uracil                 133,770
Naphthalene     DFT              napthalene             326,250
Aspirin         DFT              aspirin                211,762
Aspirin         CCSD             aspirin CCSD           1,500
Salicylic acid  DFT              salicylic acid         320,231
Malonaldehyde   DFT              malonaldehyde          993,237
Malonaldehyde   CCSD(T)          malonaldehyde CCSD(T)  1,500
Ethanol         DFT              ethanol                555,092
Ethanol         CCSD(T)          ethanol CCSD(T)        2,000
Toluene         DFT              toluene                442,790
Toluene         CCSD(T)          toluene CCSD(T)        1,501
Paracetamol     DFT              paracetamol            106,490
Azobenzene      DFT              azobenzene             99,999
- Parameters
root (str) – Root directory where the dataset should be saved.
name (str) – Keyword of the trajectory that should be loaded.
train (bool, optional) – Determines whether the train or test split gets loaded for the coupled cluster trajectories. (default: None)
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
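The parameters above can be combined as in the following sketch. The helper names, the cut-off of 5.0 Angstrom, and the decision to ignore train for trajectories without pre-defined splits are assumptions made for illustration; only the MD17 constructor arguments and the RadiusGraph transform come from this page.

```python
# Names of the coupled cluster trajectories, copied from the table on this page.
COUPLED_CLUSTER_NAMES = {
    "benzene CCSD(T)",
    "aspirin CCSD",
    "malonaldehyde CCSD(T)",
    "ethanol CCSD(T)",
    "toluene CCSD(T)",
}


def md17_kwargs(root, name, train=None):
    """Build keyword arguments for MD17 (hypothetical helper, not part of PyG).

    `train` is forwarded only for coupled cluster trajectories, since those
    are the ones with pre-defined splits. For other names it is ignored here.
    """
    kwargs = {"root": root, "name": name}
    if name in COUPLED_CLUSTER_NAMES:
        if train is None:
            raise ValueError(
                f"'{name}' has pre-defined splits; pass train=True or train=False"
            )
        kwargs["train"] = train
    return kwargs


def load_md17(root, name, train=None, cutoff=5.0):
    """Load one trajectory and build edges on every access via RadiusGraph."""
    # Imported lazily so that md17_kwargs can be used without PyG installed.
    from torch_geometric.datasets import MD17
    from torch_geometric.transforms import RadiusGraph

    # The cut-off (in Angstrom) is an assumed value, not a recommended default.
    return MD17(transform=RadiusGraph(r=cutoff), **md17_kwargs(root, name, train))
```

For example, load_md17("data/MD17", "aspirin CCSD", train=True) would request the training split of the short aspirin trajectory, while load_md17("data/MD17", "uracil") would request the full DFT trajectory; the root path here is hypothetical.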
STATS:
Name                  #graphs  #nodes  #edges  #features  #classes
Benzene FHI-aims      49,863   12      0       0          0
Benzene               627,983  12      0       0          0
Benzene CCSD-T        1,500    12      0       0          0
Uracil                133,770  12      0       0          0
Naphthalene           326,250  18      0       0          0
Aspirin               211,762  21      0       0          0
Aspirin CCSD-T        1,500    21      0       0          0
Salicylic acid        320,231  16      0       0          0
Malonaldehyde         993,237  9       0       0          0
Malonaldehyde CCSD-T  1,500    9       0       0          0
Ethanol               555,092  9       0       0          0
Ethanol CCSD-T        2,000    9       0       0          0
Toluene               442,790  15      0       0          0
Toluene CCSD-T        1,501    15      0       0          0
Paracetamol           106,490  20      0       0          0
Azobenzene            99,999   24      0       0          0