Available on GitHub [github].
N1. Dolphin [1] [original source] [graphml format]
N2. Karate [2] [original source] [graphml format]
N3. Random [graphml format]
N4. Price [graphml format]
N5. p2p-Gnutella08 [3,4] [original source] [graphml format]
N6. Price 2 [graphml format]
N7. Enhanced Price [graphml format]
N8. Combined-AP/MS [5] [original source] [graphml format]
N9. LC-multiple [6] [original source] [graphml format]
N10. School-Day1 [7] [original source] [graphml format]
N11. School-Day2 [7] [original source] [graphml format]
* If you use these datasets in a publication, please follow the citation policy of each original source.
* We have converted each original dataset to GraphML format with graph-tool. You can load a network with:
>>> import graph_tool.all as gt
>>> g = gt.load_graph("NAME_OF_FILE.xml.gz")
References
[1] Lusseau et al., The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations. Behavioral Ecology and Sociobiology 54, 4 (2003), 396–405.
[2] Zachary, An information flow model for conflict and fission in small groups. Journal of Anthropological Research 33, 4 (1977), 452–473.
[3] Leskovec et al., Graph evolution: Densification and shrinking diameters. ACM Transactions on Knowledge Discovery from Data 1, 1 (2007), 2–es.
[4] Ripeanu et al., Mapping the Gnutella Network. IEEE Internet Computing 6, 1 (2002), 50.
[5] Collins et al., Toward a comprehensive atlas of the physical interactome of Saccharomyces cerevisiae. Molecular & Cellular Proteomics 6, 3 (2007), 439–450.
[6] Reguly et al., Comprehensive curation and analysis of global interaction networks in Saccharomyces cerevisiae. Journal of Biology 5, 4 (2006), 11.
[7] Stehlé et al., High-resolution measurements of face-to-face contact patterns in a primary school. PLoS ONE 6, 8 (2011).
GT: Dolphin, GB: Karate [data]
GT: p2p-Gnutella08, GB: Price 2 [data]
GT: LC-multiple, GB: Combined-AP/MS [data]
GT: School-Day2, GB: School-Day1 [data]
To generate feature matrices with GraphSAGE, run the command below after setting up GraphSAGE. Replace DATA_DIR/FILE_PREFIX with your own file path (e.g., ./data/dolphin-karate).
$ python -m graphsage.unsupervised_train --train_prefix DATA_DIR/FILE_PREFIX --model graphsage_maxpool --max_total_steps 1000 --validate_iter 10 --dim_1 12 --dim_2 12 --base_log_dir .
Indices in the generated data cover both GT and GB, where nT and nB are the numbers of nodes in GT and GB. Indices from 0 to (nT-1) correspond to nodes in GT. Indices from nT to (nT + nB-1) correspond to nodes in GB.
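The index convention above can be sketched as a small helper. This is only an illustration: the function name locate_node is hypothetical, and it assumes that the position of a GB node within GB is its combined index minus nT.

```python
def locate_node(index, n_t, n_b):
    """Map a combined index to (graph label, index within that graph).

    n_t and n_b are the node counts of GT and GB (nT and nB in the text).
    Assumption: GB nodes keep their relative order, so the local GB index
    is the combined index minus n_t.
    """
    if not 0 <= index < n_t + n_b:
        raise IndexError("index %d is outside 0..%d" % (index, n_t + n_b - 1))
    if index < n_t:
        return ("GT", index)          # indices 0 .. nT-1 belong to GT
    return ("GB", index - n_t)        # indices nT .. nT+nB-1 belong to GB
```

For example, with n_t=3 and n_b=2, combined index 3 would map to the first node of GB.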
GT: Dolphin, GB: Karate [DeepGL][GraphSAGE]
GT: Price, GB: Random [DeepGL][GraphSAGE]
GT: Random, GB: Price [DeepGL][GraphSAGE]
GT: p2p-Gnutella08, GB: Price 2 [DeepGL][GraphSAGE]
GT: p2p-Gnutella08, GB: Enhanced Price [DeepGL]
GT: LC-multiple, GB: Combined-AP/MS [DeepGL][GraphSAGE]
GT: School-Day2, GB: School-Day1 [DeepGL][GraphSAGE]
Each set of features learned with DeepGL contains:
*_tg.npy: feature matrix of GT
*_bg.npy: feature matrix of GB
*.feat_defs.npy: learned feature definitions
Each set of features learned with GraphSAGE contains:
*_tg_bg.npy: feature values for each node of both GT and GB
*_node_id.txt: node indices corresponding to lines of *_tg_bg.npy. Indices from 0 to (nT-1) correspond to nodes in GT. Indices from nT to (nT + nB-1) correspond to nodes in GB.
.npy files can be loaded with NumPy: numpy.load(FILE_PATH)
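Because the GraphSAGE output stores GT and GB in a single matrix, it may help to split it back into two matrices. The sketch below is one possible way to do that, not the authors' procedure: the function name split_graphsage_features is hypothetical, and the commented usage assumes that *_node_id.txt holds one integer index per line and uses placeholder file names.

```python
import numpy as np


def split_graphsage_features(features, node_ids, n_t):
    """Split a combined GraphSAGE matrix into (GT rows, GB rows).

    features: array of shape (nT + nB, dim); row k belongs to node node_ids[k].
    node_ids: combined node indices, one per row of `features`.
    n_t: number of nodes in GT (nT in the text).
    Rows in each returned matrix are ordered by node index.
    """
    features = np.asarray(features)
    node_ids = np.asarray(node_ids, dtype=int)
    if features.shape[0] != node_ids.shape[0]:
        raise ValueError("features and node_ids must have the same length")
    order = np.argsort(node_ids)       # row positions sorted by node index
    sorted_ids = node_ids[order]
    sorted_feats = features[order]
    is_target = sorted_ids < n_t       # indices 0 .. nT-1 are GT nodes
    return sorted_feats[is_target], sorted_feats[~is_target]


# Hypothetical usage (file names are placeholders, not files from this page):
# features = np.load("PREFIX_tg_bg.npy")
# node_ids = np.loadtxt("PREFIX_node_id.txt", dtype=int)  # assumes one index per line
# x_t, x_b = split_graphsage_features(features, node_ids, n_t)
```

Sorting by node index first means row i of the GT matrix corresponds to node i of GT, whatever order the rows were written in.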