torch_geometric.datasets.TUDataset

class TUDataset(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None, force_reload: bool = False, use_node_attr: bool = False, use_edge_attr: bool = False, cleaned: bool = False)

Bases: InMemoryDataset

A variety of graph kernel benchmark datasets, e.g., "IMDB-BINARY", "REDDIT-BINARY" or "PROTEINS", collected from TU Dortmund University. In addition, this dataset wrapper provides cleaned versions of the datasets that contain only non-isomorphic graphs, as motivated by the “Understanding Isomorphism Bias in Graph Data Sets” paper.

Note

Some datasets do not come with any node labels. In that case, you can either set use_node_attr=True to load additional continuous node attributes (if present) or provide synthetic node features using transforms such as torch_geometric.transforms.Constant or torch_geometric.transforms.OneHotDegree.

Parameters:
  • root (str) – Root directory where the dataset should be saved.

  • name (str) – The name of the dataset.

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

  • force_reload (bool, optional) – Whether to re-process the dataset. (default: False)

  • use_node_attr (bool, optional) – If True, the dataset will contain additional continuous node attributes (if present). (default: False)

  • use_edge_attr (bool, optional) – If True, the dataset will contain additional continuous edge attributes (if present). (default: False)

  • cleaned (bool, optional) – If True, the dataset will contain only non-isomorphic graphs. (default: False)

STATS:

Name            #graphs  #nodes  #edges   #features  #classes
MUTAG           188      ~17.9   ~39.6    7          2
ENZYMES         600      ~32.6   ~124.3   3          6
PROTEINS        1,113    ~39.1   ~145.6   3          2
COLLAB          5,000    ~74.5   ~4914.4  0          3
IMDB-BINARY     1,000    ~19.8   ~193.1   0          2
REDDIT-BINARY   2,000    ~429.6  ~995.5   0          2