Dataset Cheatsheet

Note

This dataset statistics table is a work in progress. Please consider helping us fill in its content by providing statistics for individual datasets. See here and here for examples of how to do so.
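As a rough illustration of how the statistics for one row might be gathered, the sketch below computes the five columns from a sequence of graphs. It assumes each graph exposes `num_nodes` and `num_edges` attributes, as PyG `Data` objects do; `GraphStub`, `fmt`, and `cheatsheet_row` are hypothetical names used only for this example. The convention of printing an exact count when all graphs agree and a `~` average otherwise is inferred from the table, not taken from the project's tooling.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class GraphStub:
    """Hypothetical stand-in for a graph object with PyG-style counts."""
    num_nodes: int
    num_edges: int


def fmt(value: float, exact: bool) -> str:
    """Format exact counts as '1,113' and averages as '~39.1'."""
    return f"{int(value):,}" if exact else f"~{value:,.1f}"


def cheatsheet_row(name: str, graphs: Sequence[GraphStub],
                   num_features: int, num_classes: int) -> List[str]:
    """Return [name, #graphs, #nodes, #edges, #features, #classes/#tasks]."""
    nodes = [g.num_nodes for g in graphs]
    edges = [g.num_edges for g in graphs]
    # Assumption: show an exact count if every graph has the same value,
    # otherwise show the per-graph average prefixed with '~'.
    node_cell = fmt(sum(nodes) / len(nodes), exact=len(set(nodes)) == 1)
    edge_cell = fmt(sum(edges) / len(edges), exact=len(set(edges)) == 1)
    return [name, fmt(len(graphs), exact=True), node_cell, edge_cell,
            fmt(num_features, exact=True), fmt(num_classes, exact=True)]


if __name__ == "__main__":
    # Toy data only; these numbers do not describe any real dataset.
    toy = [GraphStub(num_nodes=3, num_edges=4),
           GraphStub(num_nodes=4, num_edges=8)]
    print(cheatsheet_row("Toy", toy, num_features=7, num_classes=2))
```

With an actual PyG dataset, the same function could presumably be called as `cheatsheet_row(name, dataset, dataset.num_features, dataset.num_classes)`, since iterating a dataset yields graph objects that carry `num_nodes` and `num_edges`.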

| Name | #graphs | #nodes | #edges | #features | #classes/#tasks |
|---|---|---|---|---|---|
| KarateClub (Paper) | 1 | 34 | 156 | 34 | 4 |
| TUDataset (Paper) | | | | | |
| └─ MUTAG | 188 | ~17.9 | ~39.6 | 7 | 2 |
| └─ ENZYMES | 600 | ~32.6 | ~124.3 | 3 | 6 |
| └─ PROTEINS | 1,113 | ~39.1 | ~145.6 | 3 | 2 |
| └─ COLLAB | 5,000 | ~74.5 | ~4,914.4 | 0 | 3 |
| └─ IMDB-BINARY | 1,000 | ~19.8 | ~193.1 | 0 | 2 |
| └─ REDDIT-BINARY | 2,000 | ~429.6 | ~995.5 | 0 | 2 |
| └─ … | | | | | |
| GNNBenchmarkDataset (Paper) | | | | | |
| └─ PATTERN | 10,000 | ~118.9 | ~6,098.9 | 3 | 2 |
| └─ CLUSTER | 10,000 | ~117.2 | ~4,303.9 | 7 | 6 |
| └─ MNIST | 55,000 | ~70.6 | ~564.5 | 3 | 10 |
| └─ CIFAR10 | 45,000 | ~117.6 | ~941.2 | 5 | 10 |
| └─ TSP | 10,000 | ~275.4 | ~6,885.0 | 2 | 2 |
| └─ CSL | 150 | ~41.0 | ~164.0 | 0 | 10 |
| Planetoid (Paper) | | | | | |
| └─ Cora | 1 | 2,708 | 10,556 | 1,433 | 7 |
| └─ CiteSeer | 1 | 3,327 | 9,104 | 3,703 | 6 |
| └─ PubMed | 1 | 19,717 | 88,648 | 500 | 3 |
| FakeDataset | | | | | |
| FakeHeteroDataset | | | | | |
| NELL (Paper) | 1 | 65,755 | 251,550 | 61,278 | 186 |
| CitationFull (Paper) | | | | | |
| └─ Cora | 1 | 19,793 | 126,842 | 8,710 | 70 |
| └─ Cora_ML | 1 | 2,995 | 16,316 | 2,879 | 7 |
| └─ CiteSeer | 1 | 4,230 | 10,674 | 602 | 6 |
| └─ DBLP | 1 | 17,716 | 105,734 | 1,639 | 4 |
| └─ PubMed | 1 | 19,717 | 88,648 | 500 | 3 |
| CoraFull | 1 | 19,793 | 126,842 | 8,710 | 70 |
| Coauthor (Paper) | | | | | |
| └─ CS | 1 | 18,333 | 163,788 | 6,805 | 15 |
| └─ Physics | 1 | 34,493 | 495,924 | 8,415 | 5 |
| Amazon (Paper) | | | | | |
| └─ Computers | 1 | 13,752 | 491,722 | 767 | 10 |
| └─ Photo | 1 | 7,650 | 238,162 | 745 | 8 |
| PPI (Paper) | 20 | ~2,245.3 | ~61,318.4 | 50 | 121 |
| Reddit (Paper) | 1 | 232,965 | 114,615,892 | 602 | 41 |
| Reddit2 (Paper) | 1 | 232,965 | 23,213,838 | 602 | 41 |
| Flickr (Paper) | 1 | 89,250 | 899,756 | 500 | 7 |
| Yelp (Paper) | 1 | 716,847 | 13,954,819 | 300 | 100 |
| AmazonProducts (Paper) | 1 | 1,569,960 | 264,339,468 | 200 | 107 |
| QM7b (Paper) | 7,211 | ~15.4 | ~245.0 | 0 | 14 |
| QM9 (Paper) | 130,831 | ~18.0 | ~37.3 | 11 | 19 |
| MD17 (Paper) | | | | | |
| └─ Benzene FHI-aims | 49,863 | 12 | 0 | 0 | 0 |
| └─ Benzene | 627,983 | 12 | 0 | 0 | 0 |
| └─ Benzene CCSD-T | 1,500 | 12 | 0 | 0 | 0 |
| └─ Uracil | 133,770 | 12 | 0 | 0 | 0 |
| └─ Naphthalene | 326,250 | 10 | 0 | 0 | 0 |
| └─ Aspirin | 211,762 | 21 | 0 | 0 | 0 |
| └─ Aspirin CCSD-T | 1,500 | 21 | 0 | 0 | 0 |
| └─ Salicylic acid | 320,231 | 16 | 0 | 0 | 0 |
| └─ Malonaldehyde | 993,237 | 9 | 0 | 0 | 0 |
| └─ Malonaldehyde CCSD-T | 1,500 | 9 | 0 | 0 | 0 |
| └─ Ethanol | 555,092 | 9 | 0 | 0 | 0 |
| └─ Ethanol CCSD-T | 2,000 | 9 | 0 | 0 | 0 |
| └─ Toluene | 442,790 | 15 | 0 | 0 | 0 |
| └─ Toluene CCSD-T | 1,501 | 15 | 0 | 0 | 0 |
| └─ Paracetamol | 106,490 | 20 | 0 | 0 | 0 |
| └─ Azobenzene | 99,999 | 24 | 0 | 0 | 0 |
| ZINC (Paper) | | | | | |
| └─ ZINC Full | 249,456 | ~23.2 | ~49.8 | 1 | 1 |
| └─ ZINC Subset | 12,000 | ~23.2 | ~49.8 | 1 | 1 |
| AQSOL (Paper) | 9,833 | ~17.6 | ~35.8 | 1 | 1 |
| MoleculeNet (Paper) | | | | | |
| └─ ESOL | 1,128 | ~13.3 | ~27.4 | 9 | 1 |
| └─ FreeSolv | 642 | ~8.7 | ~16.8 | 9 | 1 |
| └─ Lipophilicity | 4,200 | ~27.0 | ~59.0 | 9 | 1 |
| └─ PCBA | 437,929 | ~26.0 | ~56.2 | 9 | 128 |
| └─ MUV | 93,087 | ~24.2 | ~52.6 | 9 | 17 |
| └─ HIV | 41,127 | ~25.5 | ~54.9 | 9 | 1 |
| └─ BACE | 1,513 | ~34.1 | ~73.7 | 9 | 1 |
| └─ BBBP | 2,050 | ~23.9 | ~51.6 | 9 | 1 |
| └─ Tox21 | 7,831 | ~18.6 | ~38.6 | 9 | 12 |
| └─ ToxCast | 8,597 | ~18.7 | ~38.4 | 9 | 617 |
| └─ SIDER | 1,427 | ~33.6 | ~70.7 | 9 | 27 |
| └─ ClinTox | 1,484 | ~26.1 | ~55.5 | 9 | 2 |
| Entities (Paper) | | | | | |
| └─ AIFB | 1 | 8,285 | 58,086 | 0 | 4 |
| └─ AM | 1 | 1,666,764 | 11,976,642 | 0 | 11 |
| └─ MUTAG | 1 | 23,644 | 148,454 | 0 | 2 |
| └─ BGS | 1 | 333,845 | 1,832,398 | 0 | 2 |
| RelLinkPredDataset (Paper) | 1 | 14,541 | 544,230 | 0 | 0 |
| GEDDataset (Paper) | | | | | |
| └─ AIDS700nef | 700 | ~8.9 | ~17.6 | 29 | 0 |
| └─ LINUX | 1,000 | ~7.6 | ~13.9 | 0 | 0 |
| └─ ALKANE | 150 | ~8.9 | ~15.8 | 0 | 0 |
| └─ IMDBMulti | 1,500 | ~13.0 | ~131.9 | 0 | 0 |
| AttributedGraphDataset (Paper) | | | | | |
| MNISTSuperpixels (Paper) | 70,000 | 75 | ~1,393.0 | 1 | 10 |
| FAUST (Paper) | 100 | 6,890 | 41,328 | 3 | 10 |
| DynamicFAUST (Paper) | | | | | |
| ShapeNet (Paper) | 16,881 | ~2,616.2 | 0 | 3 | 50 |
| ModelNet (Paper) | | | | | |
| └─ ModelNet10 | 4,899 | ~9,508.2 | ~37,450.5 | 3 | 10 |
| └─ ModelNet40 | 12,311 | ~17,744.4 | ~66,060.9 | 3 | 40 |
| CoMA (Paper) | 20,465 | 5,023 | 29,990 | 3 | 12 |
| SHREC2016 (Paper) | | | | | |
| TOSCA (Paper) | | | | | |
| PCPNetDataset (Paper) | | | | | |
| S3DIS (Paper) | | | | | |
| GeometricShapes | | | | | |
| BitcoinOTC (Paper) | | | | | |
| ICEWS18 (Paper) | | | | | |
| GDELT (Paper) | | | | | |
| DBP15K (Paper) | | | | | |
| WILLOWObjectClass (Paper) | | | | | |
| PascalVOCKeypoints (Paper) | | | | | |
| PascalPF (Paper) | | | | | |
| SNAPDataset (Paper) | | | | | |
| SuiteSparseMatrixCollection (Paper) | | | | | |
| AMiner (Paper) | | | | | |
| WordNet18 (Paper) | | | | | |
| WordNet18RR (Paper) | | | | | |
| FB15k_237 (Paper) | | | | | |
| WikiCS (Paper) | | | | | |
| WebKB (Paper) | | | | | |
| WikipediaNetwork (Paper) | | | | | |
| Actor (Paper) | | | | | |
| OGB_MAG (Paper) | | | | | |
| DBLP (Paper) | | | | | |
| MovieLens (Paper) | | | | | |
| IMDB (Paper) | | | | | |
| LastFM (Paper) | | | | | |
| HGBDataset (Paper) | | | | | |
| JODIEDataset (Paper) | | | | | |
| MixHopSyntheticDataset (Paper) | | | | | |
| UPFD (Paper) | | | | | |
| GitHub (Paper) | | | | | |
| FacebookPagePage (Paper) | | | | | |
| LastFMAsia (Paper) | | | | | |
| DeezerEurope (Paper) | | | | | |
| GemsecDeezer (Paper) | | | | | |
| Twitch (Paper) | | | | | |
| Airports (Paper) | | | | | |
| BAShapes (Paper) | | | | | |
| LRGBDataset (Paper) | | | | | |
| └─ PascalVOC-SP | 11,355 | ~479.40 | ~2,710.48 | | 21 |
| └─ COCO-SP | 123,286 | ~476.88 | ~2,693.67 | | 81 |
| └─ PCQM-Contact | 529,434 | ~30.14 | ~61.09 | | 1 |
| └─ Peptides-func | 15,535 | ~150.94 | ~307.30 | | 10 |
| └─ Peptides-struct | 15,535 | ~150.94 | ~307.30 | | 11 |
| MalNetTiny (Paper) | | | | | |
| OMDB (Paper) | | | | | |
| PolBlogs (Paper) | 1 | 1,490 | 19,025 | 0 | 2 |
| EmailEUCore (Paper) | | | | | |
| StochasticBlockModelDataset | | | | | |
| RandomPartitionGraphDataset (Paper) | | | | | |
| LINKXDataset (Paper) | | | | | |
| EllipticBitcoinDataset (Paper) | 1 | 203,769 | 234,355 | 165 | 2 |
| DGraphFin (Paper) | 1 | 3,700,550 | 4,300,999 | 17 | 2 |
| HydroNet (Paper) | | | | | |
| ExplainerDataset (Paper) | | | | | |
| InfectionDataset (Paper) | | | | | |
| BA2MotifDataset (Paper) | 1,000 | 25 | ~51.0 | 10 | 2 |
| BAMultiShapesDataset (Paper) | 1,000 | 40 | ~87.0 | 10 | 2 |
| AirfRANS (Paper) | 1,000 | ~180,000 | 0 | 5 | 4 |
| Taobao (Paper) | | | | | |