Dataset Cheatsheet
Note
This dataset statistics table is a work in progress. Please consider helping us filling its content by providing statistics for individual datasets. See here and here for examples on how to do so.
Homogeneous Datasets
Name |
#graphs |
#nodes |
#edges |
#features |
#classes/#tasks |
---|---|---|---|---|---|
1 |
34 |
156 |
34 |
4 |
|
└─ MUTAG |
188 |
~17.9 |
~39.6 |
7 |
2 |
└─ ENZYMES |
600 |
~32.6 |
~124.3 |
3 |
6 |
└─ PROTEINS |
1,113 |
~39.1 |
~145.6 |
3 |
2 |
└─ COLLAB |
5,000 |
~74.5 |
~4914.4 |
0 |
3 |
└─ IMDB-BINARY |
1,000 |
~19.8 |
~193.1 |
0 |
2 |
└─ REDDIT-BINARY |
2,000 |
~429.6 |
~995.5 |
0 |
2 |
└─ … |
|||||
└─ PATTERN |
14,000 |
~118.9 |
~6,098.9 |
3 |
2 |
└─ CLUSTER |
12,000 |
~117.2 |
~4,303.9 |
7 |
6 |
└─ MNIST |
70,000 |
~70.6 |
~564.5 |
3 |
10 |
└─ CIFAR10 |
60,000 |
~117.6 |
~941.2 |
5 |
10 |
└─ TSP |
12,000 |
~275.4 |
~6,885.0 |
2 |
2 |
└─ CSL |
150 |
~41.0 |
~164.0 |
0 |
10 |
└─ Cora |
1 |
2,708 |
10,556 |
1,433 |
7 |
└─ CiteSeer |
1 |
3,327 |
9,104 |
3,703 |
6 |
└─ PubMed |
1 |
19,717 |
88,648 |
500 |
3 |
1 |
65,755 |
251,550 |
61,278 |
186 |
|
└─ Cora |
1 |
19,793 |
126,842 |
8,710 |
70 |
└─ Cora_ML |
1 |
2,995 |
16,316 |
2,879 |
7 |
└─ CiteSeer |
1 |
4,230 |
10,674 |
602 |
6 |
└─ DBLP |
1 |
17,716 |
105,734 |
1,639 |
4 |
└─ PubMed |
1 |
19,717 |
88,648 |
500 |
3 |
1 |
19,793 |
126,842 |
8,710 |
70 |
|
└─ CS |
1 |
18,333 |
163,788 |
6,805 |
15 |
└─ Physics |
1 |
34,493 |
495,924 |
8,415 |
5 |
└─ Computers |
1 |
13,752 |
491,722 |
767 |
10 |
└─ Photo |
1 |
7,650 |
238,162 |
745 |
8 |
20 |
~2,245.3 |
~61,318.4 |
50 |
121 |
|
1 |
232,965 |
114,615,892 |
602 |
41 |
|
1 |
232,965 |
23,213,838 |
602 |
41 |
|
1 |
89,250 |
899,756 |
500 |
7 |
|
1 |
716,847 |
13,954,819 |
300 |
100 |
|
1 |
1,569,960 |
264,339,468 |
200 |
107 |
|
7,211 |
~15.4 |
~245.0 |
0 |
14 |
|
130,831 |
~18.0 |
~37.3 |
11 |
19 |
|
└─ Benzene |
627,983 |
12 |
0 |
1 |
2 |
└─ Uracil |
133,770 |
12 |
0 |
1 |
2 |
└─ Naphthalene |
326,250 |
10 |
0 |
1 |
2 |
└─ Aspirin |
211,762 |
21 |
0 |
1 |
2 |
└─ Salicylic acid |
320,231 |
16 |
0 |
1 |
2 |
└─ Malonaldehyde |
993,237 |
9 |
0 |
1 |
2 |
└─ Ethanol |
555,092 |
9 |
0 |
1 |
2 |
└─ Toluene |
442,790 |
15 |
0 |
1 |
2 |
└─ Paracetamol |
106,490 |
20 |
0 |
1 |
2 |
└─ Azobenzene |
99,999 |
24 |
0 |
1 |
2 |
└─ Benzene (R) |
100,000 |
12 |
0 |
1 |
2 |
└─ Uracil (R) |
100,000 |
12 |
0 |
1 |
2 |
└─ Naphthalene (R) |
100,000 |
10 |
0 |
1 |
2 |
└─ Aspirin (R) |
100,000 |
21 |
0 |
1 |
2 |
└─ Salicylic acid (R) |
100,000 |
16 |
0 |
1 |
2 |
└─ Malonaldehyde (R) |
100,000 |
9 |
0 |
1 |
2 |
└─ Ethanol (R) |
100,000 |
9 |
0 |
1 |
2 |
└─ Toluene (R) |
100,000 |
15 |
0 |
1 |
2 |
└─ Paracetamol (R) |
100,000 |
20 |
0 |
1 |
2 |
└─ Azobenzene (R) |
99,988 |
24 |
0 |
1 |
2 |
└─ Benzene CCSD-T |
1,500 |
12 |
0 |
1 |
2 |
└─ Aspirin CCSD-T |
1,500 |
21 |
0 |
1 |
2 |
└─ Malonaldehyde CCSD-T |
1,500 |
9 |
0 |
1 |
2 |
└─ Ethanol CCSD-T |
2000 |
9 |
0 |
1 |
2 |
└─ Toluene CCSD-T |
1,501 |
15 |
0 |
1 |
2 |
└─ Benzene FHI-aims |
49,863 |
12 |
0 |
1 |
2 |
└─ ZINC Full |
249,456 |
~23.2 |
~49.8 |
1 |
1 |
└─ ZINC Subset |
12,000 |
~23.2 |
~49.8 |
1 |
1 |
9,833 |
~17.6 |
~35.8 |
1 |
1 |
|
└─ ESOL |
1,128 |
~13.3 |
~27.4 |
9 |
1 |
└─ FreeSolv |
642 |
~8.7 |
~16.8 |
9 |
1 |
└─ Lipophilicity |
4,200 |
~27.0 |
~59.0 |
9 |
1 |
└─ PCBA |
437,929 |
~26.0 |
~56.2 |
9 |
128 |
└─ MUV |
93,087 |
~24.2 |
~52.6 |
9 |
17 |
└─ HIV |
41,127 |
~25.5 |
~54.9 |
9 |
1 |
└─ BACE |
1513 |
~34.1 |
~73.7 |
9 |
1 |
└─ BBBP |
2,050 |
~23.9 |
~51.6 |
9 |
1 |
└─ Tox21 |
7,831 |
~18.6 |
~38.6 |
9 |
12 |
└─ ToxCast |
8,597 |
~18.7 |
~38.4 |
9 |
617 |
└─ SIDER |
1,427 |
~33.6 |
~70.7 |
9 |
27 |
└─ ClinTox |
1,484 |
~26.1 |
~55.5 |
9 |
2 |
└─ AIFB |
1 |
8,285 |
58,086 |
0 |
4 |
└─ AM |
1 |
1,666,764 |
11,976,642 |
0 |
11 |
└─ MUTAG |
1 |
23,644 |
148,454 |
0 |
2 |
└─ BGS |
1 |
333,845 |
1,832,398 |
0 |
2 |
1 |
14,541 |
544,230 |
0 |
0 |
|
└─ AIDS700nef |
700 |
~8.9 |
~17.6 |
29 |
0 |
└─ LINUX |
1,000 |
~7.6 |
~13.9 |
0 |
0 |
└─ ALKANE |
150 |
~8.9 |
~15.8 |
0 |
0 |
└─ IMDBMulti |
1,500 |
~13.0 |
~131.9 |
0 |
0 |
└─ Wiki |
1 |
2,405 |
17,981 |
4,973 |
17 |
└─ Cora |
1 |
2,708 |
5,429 |
1,433 |
7 |
└─ CiteSeer |
1 |
3,312 |
4,715 |
3,703 |
6 |
└─ PubMed |
1 |
19,717 |
44,338 |
500 |
3 |
└─ BlogCatalog |
1 |
5,196 |
343,486 |
8,189 |
6 |
└─ PPI |
1 |
56,944 |
1,612,348 |
50 |
121 |
└─ Flickr |
1 |
7,575 |
479,476 |
12,047 |
9 |
1 |
4,039 |
88,234 |
1,283 |
193 |
|
└─ TWeibo |
1 |
2,320,895 |
9,840,066 |
1,657 |
8 |
└─ MAG |
1 |
59,249,719 |
978,147,253 |
2,000 |
100 |
70,000 |
75 |
~1,393.0 |
1 |
10 |
|
100 |
6,890 |
41,328 |
3 |
10 |
|
16,881 |
~2,616.2 |
0 |
3 |
50 |
|
└─ ModelNet10 |
4,899 |
~9,508.2 |
~37,450.5 |
3 |
10 |
└─ ModelNet40 |
12,311 |
~17,744.4 |
~66,060.9 |
3 |
40 |
20,465 |
5,023 |
29,990 |
3 |
12 |
|
80 |
~148.8 |
~859.5 |
3 |
40 |
|
138 |
6,005 |
~2,573.2 |
0 |
0 |
|
1 |
8,831 |
1,912,909 |
413 |
||
└─ Cornell |
1 |
183 |
298 |
1,703 |
5 |
└─ Texas |
1 |
183 |
325 |
1,703 |
5 |
└─ Wisconsin |
1 |
251 |
515 |
1,703 |
5 |
└─ Roman-empire |
1 |
22,662 |
32,927 |
300 |
18 |
└─ Amazon-ratings |
1 |
24,492 |
93,050 |
300 |
5 |
└─ Minesweeper |
1 |
10,000 |
39,402 |
7 |
2 |
└─ Tolokers |
1 |
11,758 |
519,000 |
10 |
2 |
└─ Questions |
1 |
48,921 |
153,540 |
301 |
2 |
1 |
7,600 |
30,019 |
932 |
5 |
|
1 |
37,700 |
578,006 |
0 |
2 |
|
└─ DE |
1 |
9,498 |
315,774 |
128 |
2 |
└─ EN |
1 |
7,126 |
77,774 |
128 |
2 |
└─ ES |
1 |
4,648 |
123,412 |
128 |
2 |
└─ FR |
1 |
6,551 |
231,883 |
128 |
2 |
└─ PT |
1 |
1,912 |
64,510 |
128 |
2 |
└─ RU |
1 |
4,385 |
78,993 |
128 |
2 |
└─ PascalVOC-SP |
11,355 |
~479.40 |
~2,710.48 |
21 |
|
└─ COCO-SP |
123,286 |
~476.88 |
~2,693.67 |
81 |
|
└─ PCQM-Contact |
529,434 |
~30.14 |
~61.09 |
1 |
|
└─ Peptides-func |
15,535 |
~150.94 |
~307.30 |
10 |
|
└─ Peptides-struct |
15,535 |
~150.94 |
~307.30 |
11 |
|
1 |
1,490 |
19,025 |
0 |
2 |
|
1 |
203,769 |
234,355 |
165 |
2 |
|
1 |
203,769 |
234,355 |
165 |
2 |
|
1 |
3,700,550 |
4,300,999 |
17 |
2 |
|
1,000 |
~180,000 |
0 |
5 |
4 |
|
1 |
6,509 |
25,470 |
172 |
1 |
|
└─ Wikipedia |
1 |
9,227 |
157,474 |
172 |
2 |
└─ MOOC |
1 |
7,144 |
411,749 |
4 |
2 |
└─ LastFM |
1 |
1,980 |
1,293,103 |
2 |
1 |
1 |
17,988 |
694,121 |
33 |
1 |
|
1,082 |
9,288 |
271,771 |
1,082 |
||
Heterogeneous Datasets
Name |
#nodes/#edges |
#features |
#classes/#tasks |
---|---|---|---|
└─ Node Type: Author |
4,057 |
334 |
4 |
└─ Node Type: Paper |
14,328 |
4,231 |
|
└─ Node Type: Term |
7,723 |
50 |
|
└─ Node Type: Conference |
20 |
0 |
|
└─ Edge Type: Author-Paper |
196,425 |
||
└─ Edge Type: Paper-Term |
85,810 |
||
└─ Edge Type: Conference-Paper |
14,328 |
||
└─ Node Type: Movie |
1,682 |
18 |
|
└─ Node Type: User |
943 |
24 |
|
└─ Edge Type: User-Movie |
80,000 |
1 |
1 |
└─ Node Type: Movie |
3,883 |
18 |
|
└─ Node Type: User |
6,040 |
30 |
|
└─ Edge Type: User-Movie |
1,000,209 |
1 |
1 |
Synthetic Datasets
Name |
#graphs |
#nodes |
#edges |
#features |
#classes/#tasks |
---|---|---|---|---|---|
1000 |
25 |
~51.0 |
10 |
2 |
|
1000 |
40 |
~87.0 |
10 |
2 |
|