Interpretable Contrastive Learning for Networks

Source Code, Datasets, and Commands Used for Evaluation

1. Source Code of cNRL and i-cNRL

Available on GitHub [github].


2. Source Code for Generating Experimental Results

[download]


3. Network Datasets

N1. Dolphin [1] [original source] [graphml format]

N2. Karate [2] [original source] [graphml format]

N3. Random [graphml format]

N4. Price [graphml format]

N5. p2p-Gnutella08 [3,4] [original source] [graphml format]

N6. Price 2 [graphml format]

N7. Enhanced Price [graphml format]

N8. Combined-AP/MS [5] [original source] [graphml format]

N9. LC-multiple [6] [original source] [graphml format]

N10. School-Day1 [7] [original source] [graphml format]

N11. School-Day2 [7] [original source] [graphml format]

* If you use these datasets for your publication, please follow the citation policy of each original source.

* We converted each original dataset to GraphML format with graph-tool. You can load a network with:
   >>> import graph_tool.all as gt
   >>> g = gt.load_graph("NAME_OF_FILE.xml.gz")
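
If graph-tool is not installed, the files can still be inspected with the Python standard library, since GraphML is plain XML. The sketch below only counts nodes and edges. It assumes the files are gzip-compressed GraphML that uses the standard GraphML namespace; the helper name count_nodes_and_edges is ours and is not part of graph-tool.

```python
import gzip
import xml.etree.ElementTree as ET

# Standard GraphML namespace (assumed to be the one used in the files).
GRAPHML_NS = "{http://graphml.graphdrawing.org/xmlns}"


def count_nodes_and_edges(path):
    """Return (number of nodes, number of edges) of a GraphML file."""
    # Files ending in .gz are decompressed on the fly.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as f:
        root = ET.parse(f).getroot()
    nodes = root.findall(f".//{GRAPHML_NS}node")
    edges = root.findall(f".//{GRAPHML_NS}edge")
    return len(nodes), len(edges)
```

For example, count_nodes_and_edges("NAME_OF_FILE.xml.gz") gives a quick sanity check of a downloaded file before running any analysis.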

References

[1] Lusseau et al., The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations. Behavioral Ecology and Sociobiology 54, 4 (2003), 396–405.

[2] Zachary, An information flow model for conflict and fission in small groups. Journal of Anthropological Research 33, 4 (1977), 452–473.

[3] Leskovec et al., Graph evolution: Densification and shrinking diameters. ACM Transactions on Knowledge Discovery from Data 1, 1 (2007), 2–es.

[4] Ripeanu et al., Mapping the Gnutella network. IEEE Internet Computing 6, 1 (2002), 50.

[5] Collins et al., Toward a comprehensive atlas of the physical interactome of Saccharomyces cerevisiae. Molecular & Cellular Proteomics 6, 3 (2007), 439–450.

[6] Reguly et al., Comprehensive curation and analysis of global interaction networks in Saccharomyces cerevisiae. Journal of Biology 5, 4 (2006), 11.

[7] Stehlé et al., High-resolution measurements of face-to-face contact patterns in a primary school. PLoS ONE 6, 8 (2011).


5. Input Data and Commands for GraphSAGE to Generate Feature Matrices

GT: Dolphin, GB: Karate [data]

GT: p2p-Gnutella08, GB: Price 2 [data]

GT: LC-multiple, GB: Combined-AP/MS [data]

GT: School-Day2, GB: School-Day1 [data]


To generate feature matrices with GraphSAGE, run the command below after setting up GraphSAGE. Replace DATA_DIR/FILE_PREFIX with your file path (e.g., ./data/dolphin-karate).

$ python -m graphsage.unsupervised_train --train_prefix DATA_DIR/FILE_PREFIX --model graphsage_maxpool --max_total_steps 1000 --validate_iter 10 --dim_1 12 --dim_2 12 --base_log_dir .
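
To run the same command for several dataset pairs, the arguments can be assembled in a small script. This is a minimal sketch, assuming GraphSAGE is set up in the current working directory and that "python" resolves to the interpreter GraphSAGE was installed for. The helper names graphsage_command and run_graphsage are hypothetical. The flag values are the ones from the command above.

```python
import subprocess


def graphsage_command(train_prefix, dim=12, max_total_steps=1000):
    """Build the GraphSAGE training command as an argument list."""
    return [
        "python", "-m", "graphsage.unsupervised_train",
        "--train_prefix", train_prefix,
        "--model", "graphsage_maxpool",
        "--max_total_steps", str(max_total_steps),
        "--validate_iter", "10",
        "--dim_1", str(dim),
        "--dim_2", str(dim),
        "--base_log_dir", ".",
    ]


def run_graphsage(train_prefix):
    """Run the command; raises CalledProcessError if training fails."""
    subprocess.run(graphsage_command(train_prefix), check=True)
```

For example, run_graphsage("./data/dolphin-karate") would launch training for the Dolphin/Karate pair.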

The indices in the generated data cover both GT and GB. Indices from 0 to (nT - 1) correspond to nodes in GT, and indices from nT to (nT + nB - 1) correspond to nodes in GB, where nT and nB are the numbers of nodes in GT and GB.


6. Learned Feature Matrices with DeepGL / GraphSAGE

GT: Dolphin, GB: Karate [DeepGL][GraphSAGE]

GT: Price, GB: Random [DeepGL][GraphSAGE]

GT: Random, GB: Price [DeepGL][GraphSAGE]

GT: p2p-Gnutella08, GB: Price 2 [DeepGL][GraphSAGE]

GT: p2p-Gnutella08, GB: Enhanced Price [DeepGL]

GT: LC-multiple, GB: Combined-AP/MS [DeepGL][GraphSAGE]

GT: School-Day2, GB: School-Day1 [DeepGL][GraphSAGE]


Each set of feature matrices learned with DeepGL contains:

*_tg.npy: feature matrix of GT

*_bg.npy: feature matrix of GB

*.feat_defs.npy: learned feature definitions

Each set of feature matrices learned with GraphSAGE contains:

*_tg_bg.npy: feature values for each node of both GT and GB

*_node_id.txt: node indices corresponding to the rows of *_tg_bg.npy. Indices from 0 to (nT - 1) correspond to nodes in GT. Indices from nT to (nT + nB - 1) correspond to nodes in GB.

A .npy file can be loaded with NumPy: numpy.load(FILE_PATH)
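
The stacked GraphSAGE matrix can be separated back into one matrix per network using the index convention above. This is a minimal sketch, assuming the node indices are available as an integer array with one entry per row of *_tg_bg.npy, and that nT is known from the GT network. The function name split_target_background is ours.

```python
import numpy as np


def split_target_background(features, node_ids, n_target):
    """Split a stacked GraphSAGE matrix into (GT rows, GB rows).

    features: array of shape (n, d), rows in the order given by node_ids
    node_ids: array of shape (n,), node index of each row
    n_target: nT, the number of nodes in GT
    """
    features = np.asarray(features)
    node_ids = np.asarray(node_ids, dtype=int)
    if features.shape[0] != node_ids.shape[0]:
        raise ValueError("features and node_ids must have the same length")
    # Sort rows by node index so each block is in ascending index order.
    order = np.argsort(node_ids)
    sorted_ids = node_ids[order]
    sorted_features = features[order]
    # Indices below nT belong to GT; all remaining indices belong to GB.
    is_target = sorted_ids < n_target
    return sorted_features[is_target], sorted_features[~is_target]
```

With the real files, features would come from numpy.load on *_tg_bg.npy. If *_node_id.txt stores one integer per line (an assumption about its layout), node_ids can be read with numpy.loadtxt(FILE_PATH, dtype=int).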