Dataset Cheatsheet

Note

This dataset statistics table is a work in progress. Please consider helping us fill in its content by providing statistics for individual datasets. See here and here for examples of how to do so.
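For contributors preparing a row, the following is a minimal sketch of how the five columns of the homogeneous table could be derived. It assumes, judging from how the entries below read, that a leading "~" marks a per-graph average and that a plain number is used when every graph has the same count. `GraphRecord`, `format_column`, and `summarize` are hypothetical names used for illustration only and are not part of any library.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class GraphRecord:
    """Hypothetical stand-in for one graph of a dataset."""
    num_nodes: int
    num_edges: int      # number of stored (directed) edges
    num_features: int   # node feature dimensionality


def format_column(values: List[int]) -> str:
    """Format per-graph counts as a single table cell.

    Assumption: identical counts are shown as an exact number,
    varying counts as a "~"-prefixed average with one decimal.
    """
    if len(set(values)) == 1:
        return f"{values[0]:,}"
    return f"~{sum(values) / len(values):,.1f}"


def summarize(graphs: List[GraphRecord], num_classes: int) -> Dict[str, str]:
    """Build one table row (without the name column) for a dataset."""
    if not graphs:
        raise ValueError("dataset must contain at least one graph")
    return {
        "#graphs": f"{len(graphs):,}",
        "#nodes": format_column([g.num_nodes for g in graphs]),
        "#edges": format_column([g.num_edges for g in graphs]),
        "#features": f"{graphs[0].num_features:,}",
        "#classes/#tasks": f"{num_classes:,}",
    }


# Single-graph example using the KarateClub numbers from the table.
karate_like = [GraphRecord(num_nodes=34, num_edges=156, num_features=34)]
print(summarize(karate_like, num_classes=4))
```

With a real dataset, the per-graph counts would be read from the loaded graph objects instead of being typed in by hand; the formatting step stays the same.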

Homogeneous Datasets

Name

#graphs

#nodes

#edges

#features

#classes/#tasks

KarateClub (Paper)

1

34

156

34

4

TUDataset (Paper)

└─ MUTAG

188

~17.9

~39.6

7

2

└─ ENZYMES

600

~32.6

~124.3

3

6

└─ PROTEINS

1,113

~39.1

~145.6

3

2

└─ COLLAB

5,000

~74.5

~4,914.4

0

3

└─ IMDB-BINARY

1,000

~19.8

~193.1

0

2

└─ REDDIT-BINARY

2,000

~429.6

~995.5

0

2

└─ …

GNNBenchmarkDataset (Paper)

└─ PATTERN

14,000

~118.9

~6,098.9

3

2

└─ CLUSTER

12,000

~117.2

~4,303.9

7

6

└─ MNIST

70,000

~70.6

~564.5

3

10

└─ CIFAR10

60,000

~117.6

~941.2

5

10

└─ TSP

12,000

~275.4

~6,885.0

2

2

└─ CSL

150

~41.0

~164.0

0

10

Planetoid (Paper)

└─ Cora

1

2,708

10,556

1,433

7

└─ CiteSeer

1

3,327

9,104

3,703

6

└─ PubMed

1

19,717

88,648

500

3

NELL (Paper)

1

65,755

251,550

61,278

186

CitationFull (Paper)

└─ Cora

1

19,793

126,842

8,710

70

└─ Cora_ML

1

2,995

16,316

2,879

7

└─ CiteSeer

1

4,230

10,674

602

6

└─ DBLP

1

17,716

105,734

1,639

4

└─ PubMed

1

19,717

88,648

500

3

CoraFull

1

19,793

126,842

8,710

70

Coauthor (Paper)

└─ CS

1

18,333

163,788

6,805

15

└─ Physics

1

34,493

495,924

8,415

5

Amazon (Paper)

└─ Computers

1

13,752

491,722

767

10

└─ Photo

1

7,650

238,162

745

8

PPI (Paper)

20

~2,245.3

~61,318.4

50

121

Reddit (Paper)

1

232,965

114,615,892

602

41

Reddit2 (Paper)

1

232,965

23,213,838

602

41

Flickr (Paper)

1

89,250

899,756

500

7

Yelp (Paper)

1

716,847

13,954,819

300

100

AmazonProducts (Paper)

1

1,569,960

264,339,468

200

107

QM7b (Paper)

7,211

~15.4

~245.0

0

14

QM9 (Paper)

130,831

~18.0

~37.3

11

19

MD17 (Paper)

└─ Benzene

627,983

12

0

1

2

└─ Uracil

133,770

12

0

1

2

└─ Naphthalene

326,250

10

0

1

2

└─ Aspirin

211,762

21

0

1

2

└─ Salicylic acid

320,231

16

0

1

2

└─ Malonaldehyde

993,237

9

0

1

2

└─ Ethanol

555,092

9

0

1

2

└─ Toluene

442,790

15

0

1

2

└─ Paracetamol

106,490

20

0

1

2

└─ Azobenzene

99,999

24

0

1

2

└─ Benzene (R)

100,000

12

0

1

2

└─ Uracil (R)

100,000

12

0

1

2

└─ Naphthalene (R)

100,000

10

0

1

2

└─ Aspirin (R)

100,000

21

0

1

2

└─ Salicylic acid (R)

100,000

16

0

1

2

└─ Malonaldehyde (R)

100,000

9

0

1

2

└─ Ethanol (R)

100,000

9

0

1

2

└─ Toluene (R)

100,000

15

0

1

2

└─ Paracetamol (R)

100,000

20

0

1

2

└─ Azobenzene (R)

99,988

24

0

1

2

└─ Benzene CCSD-T

1,500

12

0

1

2

└─ Aspirin CCSD-T

1,500

21

0

1

2

└─ Malonaldehyde CCSD-T

1,500

9

0

1

2

└─ Ethanol CCSD-T

2,000

9

0

1

2

└─ Toluene CCSD-T

1,501

15

0

1

2

└─ Benzene FHI-aims

49,863

12

0

1

2

ZINC (Paper)

└─ ZINC Full

249,456

~23.2

~49.8

1

1

└─ ZINC Subset

12,000

~23.2

~49.8

1

1

AQSOL (Paper)

9,833

~17.6

~35.8

1

1

MoleculeNet (Paper)

└─ ESOL

1,128

~13.3

~27.4

9

1

└─ FreeSolv

642

~8.7

~16.8

9

1

└─ Lipophilicity

4,200

~27.0

~59.0

9

1

└─ PCBA

437,929

~26.0

~56.2

9

128

└─ MUV

93,087

~24.2

~52.6

9

17

└─ HIV

41,127

~25.5

~54.9

9

1

└─ BACE

1,513

~34.1

~73.7

9

1

└─ BBBP

2,050

~23.9

~51.6

9

1

└─ Tox21

7,831

~18.6

~38.6

9

12

└─ ToxCast

8,597

~18.7

~38.4

9

617

└─ SIDER

1,427

~33.6

~70.7

9

27

└─ ClinTox

1,484

~26.1

~55.5

9

2

PCQM4Mv2 (Paper)

Entities (Paper)

└─ AIFB

1

8,285

58,086

0

4

└─ AM

1

1,666,764

11,976,642

0

11

└─ MUTAG

1

23,644

148,454

0

2

└─ BGS

1

333,845

1,832,398

0

2

RelLinkPredDataset (Paper)

1

14,541

544,230

0

0

GEDDataset (Paper)

└─ AIDS700nef

700

~8.9

~17.6

29

0

└─ LINUX

1,000

~7.6

~13.9

0

0

└─ ALKANE

150

~8.9

~15.8

0

0

└─ IMDBMulti

1,500

~13.0

~131.9

0

0

AttributedGraphDataset (Paper)

└─ Wiki

1

2,405

17,981

4,973

17

└─ Cora

1

2,708

5,429

1,433

7

└─ CiteSeer

1

3,312

4,715

3,703

6

└─ PubMed

1

19,717

44,338

500

3

└─ BlogCatalog

1

5,196

343,486

8,189

6

└─ PPI

1

56,944

1,612,348

50

121

└─ Flickr

1

7,575

479,476

12,047

9

└─ Facebook

1

4,039

88,234

1,283

193

└─ TWeibo

1

2,320,895

9,840,066

1,657

8

└─ MAG

1

59,249,719

978,147,253

2,000

100

MNISTSuperpixels (Paper)

70,000

75

~1,393.0

1

10

FAUST (Paper)

100

6,890

41,328

3

10

DynamicFAUST (Paper)

ShapeNet (Paper)

16,881

~2,616.2

0

3

50

ModelNet (Paper)

└─ ModelNet10

4,899

~9,508.2

~37,450.5

3

10

└─ ModelNet40

12,311

~17,744.4

~66,060.9

3

40

CoMA (Paper)

20,465

5,023

29,990

3

12

SHREC2016 (Paper)

TOSCA (Paper)

PCPNetDataset (Paper)

S3DIS (Paper)

GeometricShapes

80

~148.8

~859.5

3

40

BitcoinOTC (Paper)

138

6,005

~2,573.2

0

0

GDELTLite (Paper)

1

8,831

1,912,909

413

ICEWS18 (Paper)

GDELT (Paper)

WILLOWObjectClass (Paper)

PascalVOCKeypoints (Paper)

PascalPF (Paper)

SNAPDataset (Paper)

SuiteSparseMatrixCollection (Paper)

WordNet18 (Paper)

WordNet18RR (Paper)

FB15k_237 (Paper)

WikiCS (Paper)

WebKB (Paper)

└─ Cornell

1

183

298

1,703

5

└─ Texas

1

183

325

1,703

5

└─ Wisconsin

1

251

515

1,703

5

WikipediaNetwork (Paper)

HeterophilousGraphDataset (Paper)

└─ Roman-empire

1

22,662

32,927

300

18

└─ Amazon-ratings

1

24,492

93,050

300

5

└─ Minesweeper

1

10,000

39,402

7

2

└─ Tolokers

1

11,758

519,000

10

2

└─ Questions

1

48,921

153,540

301

2

Actor (Paper)

1

7,600

30,019

932

5

UPFD (Paper)

GitHub (Paper)

1

37,700

578,006

0

2

FacebookPagePage (Paper)

LastFMAsia (Paper)

DeezerEurope (Paper)

GemsecDeezer (Paper)

Twitch (Paper)

└─ DE

1

9,498

315,774

128

2

└─ EN

1

7,126

77,774

128

2

└─ ES

1

4,648

123,412

128

2

└─ FR

1

6,551

231,883

128

2

└─ PT

1

1,912

64,510

128

2

└─ RU

1

4,385

78,993

128

2

Airports (Paper)

LRGBDataset (Paper)

└─ PascalVOC-SP

11,355

~479.40

~2,710.48

21

└─ COCO-SP

123,286

~476.88

~2,693.67

81

└─ PCQM-Contact

529,434

~30.14

~61.09

1

└─ Peptides-func

15,535

~150.94

~307.30

10

└─ Peptides-struct

15,535

~150.94

~307.30

11

MalNetTiny (Paper)

OMDB (Paper)

PolBlogs (Paper)

1

1,490

19,025

0

2

EmailEUCore (Paper)

LINKXDataset (Paper)

EllipticBitcoinDataset (Paper)

1

203,769

234,355

165

2

EllipticBitcoinTemporalDataset (Paper)

1

203,769

234,355

165

2

DGraphFin (Paper)

1

3,700,550

4,300,999

17

2

HydroNet (Paper)

AirfRANS (Paper)

1,000

~180,000

0

5

4

JODIEDataset (Paper)

└─ Reddit

1

6,509

25,470

172

1

└─ Wikipedia

1

9,227

157,474

172

2

└─ MOOC

1

7,144

411,749

4

2

└─ LastFM

1

1,980

1,293,103

2

1

Wikidata5M (Paper)

MyketDataset (Paper)

1

17,988

694,121

33

1

BrcaTcga (Paper)

1,082

9,288

271,771

1,082

NeuroGraphDataset (Paper)

Heterogeneous Datasets

Name

#nodes/#edges

#features

#classes/#tasks

DBP15K (Paper)

AMiner (Paper)

OGB_MAG (Paper)

DBLP (Paper)

└─ Node Type: Author

4,057

334

4

└─ Node Type: Paper

14,328

4,231

└─ Node Type: Term

7,723

50

└─ Node Type: Conference

20

0

└─ Edge Type: Author-Paper

196,425

└─ Edge Type: Paper-Term

85,810

└─ Edge Type: Conference-Paper

14,328

MovieLens (Paper)

MovieLens100K (Paper)

└─ Node Type: Movie

1,682

18

└─ Node Type: User

943

24

└─ Edge Type: User-Movie

80,000

1

1

MovieLens1M (Paper)

└─ Node Type: Movie

3,883

18

└─ Node Type: User

6,040

30

└─ Edge Type: User-Movie

1,000,209

1

1

IMDB (Paper)

LastFM (Paper)

HGBDataset (Paper)

Taobao (Paper)

IGMCDataset (Paper)

AmazonBook (Paper)

HM (Paper)

OSE_GVCS (Paper)

RCDD (Paper)

OPFDataset (Paper)

Synthetic Datasets

Name

#graphs

#nodes

#edges

#features

#classes/#tasks

FakeDataset

FakeHeteroDataset

StochasticBlockModelDataset

RandomPartitionGraphDataset (Paper)

MixHopSyntheticDataset (Paper)

ExplainerDataset (Paper)

InfectionDataset (Paper)

BA2MotifDataset (Paper)

1,000

25

~51.0

10

2

BAMultiShapesDataset (Paper)

1,000

40

~87.0

10

2

BAShapes (Paper)