An ultra-fast, GPU-based large-graph embedding algorithm that utilizes a novel coarsening algorithm and requires no more than a single GPU.
GOSH is a GPU-based graph embedding tool that takes a graph and produces a d-dimensional vector for every node in the graph. The embeddings can then be used for a multitude of machine learning tasks, including node classification, link prediction, graph visualization, and anomaly detection.
GOSH employs a novel coarsening algorithm (MultiEdgeCollapse) to compress the graph into a series of smaller graphs and embeds them to produce highly accurate embeddings very quickly. In addition, it uses a special scheduling algorithm to embed any graph using a single GPU, even if the graph's memory requirement exceeds that of the GPU.
We compiled this program with `nvcc` using CUDA 10.1 and ran it on Ubuntu (kernel 4.4.0-159).
You can build the executables of the project with a single command. Simply clone the repo, navigate to it, and run `make`:
```
git clone https://github.com/SabanciParallelComputing/GOSH
cd GOSH
make clean && make
```
This will produce two executables:
```
execs/gosh.out            # takes inputs as flags and options
execs/gosh_nargparse.out  # takes all parameters as a sequence of strings; none are optional
```
The most basic execution of GOSH can be done as follows:
```
execs/gosh.out --input-graph $string --output-embedding $string --directed $number --epochs $number
```
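As an example, a hypothetical run on a small undirected edge-list graph might look like the following; the paths and the epoch count are placeholders, not recommendations:

```
execs/gosh.out --input-graph ./graphs/toy_graph.txt \
               --output-embedding ./embeddings/toy_graph.emb \
               --directed 0 --epochs 1000
```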
- `--input-graph $string`: the path of the graph to be embedded, given as an edge list with one space-separated pair of vertex IDs per line:

  ```
  i j
  k l
  ...
  a b
  ```

  where `i, j, k, l, a, b` are vertex IDs from the graph. Please note that all vertices will get embeddings, even those without any edges.
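For illustration, a hypothetical `toy_graph.txt` describing a 4-vertex graph with 4 undirected edges would contain:

```
0 1
0 2
1 2
2 3
```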
- `--output-embedding $string`: the path of the file to which the embeddings will be written (in plain text, unless the `--binary-output` flag is given). The text output has the following format:

  ```
  num_vertices dimension
  0 e_1 e_2 e_3 ... e_dimension-1
  1 e_1 e_2 e_3 ... e_dimension-1
  ...
  num_vertices-1 e_1 e_2 e_3 ... e_dimension-1
  ```

  where `num_vertices` and `dimension` are the number of vertices and the embedding dimensionality, respectively, and every line afterward corresponds to a vertex from the graph. A line starts with the ID of the vertex and is followed by its embedding values. All elements within a line are space-separated.
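Continuing the toy example above, with 4 vertices and `--dimension 3` the output file would begin with the header `4 3`, followed by one line per vertex (the embedding values below are made up):

```
4 3
0 0.412 -1.033 0.271
1 -0.508 0.662 0.119
2 0.097 -0.254 0.883
3 -0.716 0.301 -0.442
```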
- `--epochs $number`: the number of epochs to run on the entirety of the graph. Running an epoch on a graph Gi, which has been coarsened i times, corresponds to running |Vi| positive samples on that graph. How epochs are distributed across coarsening levels is controlled by `--epoch-strategy` and `--smoothing-ratio`, which are discussed further below.
- `--directed $number`: whether the graph is directed (1), undirected (0), or given in a binary Compressed Sparse Row (CSR) format (2).

Many optional parameters can be used to fine-tune the embedding:
- `-d --dimension $number`: an integer giving the dimensionality of the embedding.
- `-s --negative_samples $number`: the number of negative samples used with every positive update.
- `--negative-weight $float`: a scaling factor applied to the gradients of negative samples when updating embeddings during negative updates.
- `--device-id $number`: the ID of the GPU device to be used during the embedding.
- `--binary-output`: print the outputs in binary format for compactness and ease of processing in memory.
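Putting some of these options together, a sketch of a tuned invocation might look like this (all values are illustrative):

```
execs/gosh.out --input-graph ./graphs/toy_graph.txt \
               --output-embedding ./embeddings/toy_graph.emb \
               --directed 0 --epochs 1000 \
               -d 128 -s 3 --device-id 0 --binary-output
```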
- `--sampling-algorithm $number`: the method used to create positive samples during training. Currently, two sampling strategies are implemented:
  - `--sampling-algorithm 0`: 1-hop neighbor sampling or PPR sampling as described in VERSE, depending on the value of `--alpha`:
    - `--alpha 0`: positive samples for a node are sampled from its direct neighbors.
    - `--alpha > 0 && --alpha < 1`: positive samples of a node are the nodes reached by a Personalized PageRank random walk, with `--alpha` being the damping factor as defined in VERSE.
  - `--sampling-algorithm 1`: random-walk-based sampling. With this method, random walks are generated on the CPU, samples are extracted from them, and the samples are sent to the GPU. It is controlled with three parameters:
    - `--walk-length $number`: the length of each random walk.
    - `--augmentation-distance $number`: within a walk, all pairs of nodes in each sequence of `$number` consecutive nodes are used as positive samples.
    - `--sample-pool-size`: the number of samples to be added to the pool on the CPU before it is copied to the GPU.
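For instance, random-walk-based sampling could be enabled as follows; the walk length, augmentation distance, and pool size below are placeholders:

```
execs/gosh.out --input-graph ./graphs/toy_graph.txt \
               --output-embedding ./embeddings/toy_graph.emb \
               --directed 0 --epochs 1000 \
               --sampling-algorithm 1 --walk-length 80 \
               --augmentation-distance 3 --sample-pool-size 1000000
```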
- `-a --alpha $number`: a value for the positive sampling strategy to be used in the model, based on VERSE. `alpha = 0` will use an adjacency-similarity positive sampling approach, while `0 < alpha < 100` will use PPR with `alpha/100` as its damping factor.
- `-l --learning_rate $float`: the global learning rate of the model.
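Similarly, a PPR-based run under sampling algorithm 0 might look like the sketch below, with `-a 15` giving a damping factor of 0.15 and `-l` setting the global learning rate (both values are illustrative):

```
execs/gosh.out --input-graph ./graphs/toy_graph.txt \
               --output-embedding ./embeddings/toy_graph.emb \
               --directed 0 --epochs 1000 \
               --sampling-algorithm 0 -a 15 -l 0.045
```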
- `--learning-rate-decay-strategy $num`: the strategy used to decay the learning rate during a level and between different levels. There are four strategies (0/1/2/3), and their differences are shown below. During a level, the learning rate decays with the epoch according to:

  ```
  current_learning_rate = max(1 - current_epoch / total_epochs, 1e-4) * initial_learning_rate
  ```

  where `current_epoch` and `total_epochs` are the current and total epochs for the current coarsening level. Between levels, the strategies set the initial learning rate of each level differently:
  - 1, 3: the initial learning rate for every coarsening level differs based on the following heuristic:

    ```
    lr_i = lr                            if |Vi| < LR_REV_NV
    lr_i = lr / sqrt(|Vi| / LR_REV_NV)   otherwise
    ```

    where `lr` is the input (global) learning rate, `lr_i` is the initial learning rate at coarsening level i, and `LR_REV_NV` is a tunable hyperparameter in `src/main.cu`.
  - 0, 2: the initial learning rate at each level is the same as the original learning rate given as input.
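The per-level heuristic of strategies 1 and 3 can be sketched with a quick shell calculation; the `LR_REV_NV` value here is a made-up placeholder, not the constant defined in `src/main.cu`:

```
# lr = global learning rate, nv = |V_i|, rev = LR_REV_NV (placeholder value).
awk -v lr=0.05 -v nv=500000 -v rev=100000 'BEGIN {
  if (nv < rev) print lr;          # small levels keep the global rate
  else print lr / sqrt(nv / rev);  # larger levels get a dampened rate
}'
```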
- `--no-coarsening`: do not apply any coarsening; run the embedding directly on the original graph.
- `--coarsening-stopping-threshold $num`: the number of vertices at which to stop coarsening, i.e., when a graph Gi is generated with |Vi| < $num, it is added to the coarsened set, but coarsening does not continue.
- `--coarsening-stopping-precision $float`: the acceptable shrinkage of a graph during coarsening, i.e., if a graph Gi is coarsened into a graph Gi+1 and |Vi+1| > |Vi| * $float, graph Gi+1 is *not* added to the coarsened set and coarsening does not continue.
- `--coarsening-matching-threshold-ratio $num`: controls the total number of matches allowed per vertex. Given a graph Gi coarsened i times, a vertex in Gi is not allowed to match more than i * i * ($num / |Vi|) vertices.
- `--coarsening-min-vertices-in-graph $num`: the minimum number of vertices acceptable in a graph for it to be added to the coarsened set, i.e., if a graph Gi is coarsened into a graph Gi+1 and |Vi+1| < $num, the graph Gi+1 is not added to the coarsened set.
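As an example, coarsening could be tuned as follows (all threshold values are illustrative):

```
execs/gosh.out --input-graph ./graphs/toy_graph.txt \
               --output-embedding ./embeddings/toy_graph.emb \
               --directed 0 --epochs 1000 \
               --coarsening-stopping-threshold 100 \
               --coarsening-stopping-precision 0.8 \
               --coarsening-min-vertices-in-graph 10
```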
- `--epoch-strategy`: the strategy used to distribute epochs across coarsening levels. Multiple strategies are available, including the smooth variants:
  - s-fast: `COARSE_SMOOTHING_RATIO * total_epochs` epochs are distributed equally across levels, while the remainder is distributed based on the fast rule.
  - s-accurate: `COARSE_SMOOTHING_RATIO * total_epochs` epochs are distributed equally across levels, while the remainder is distributed based on the accurate rule.

  NOTE: in all of the aforementioned rules, a level is allocated a minimum of 1 epoch.
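A run using a smooth strategy might look like the following; the strategy string and the ratio passed to `--smoothing-ratio` (described below) are placeholders:

```
execs/gosh.out --input-graph ./graphs/toy_graph.txt \
               --output-embedding ./embeddings/toy_graph.emb \
               --directed 0 --epochs 1000 \
               --epoch-strategy s-fast --smoothing-ratio 0.5
```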
- `--smoothing-ratio`: the smoothing ratio used when distributing epochs for the smooth strategies, i.e., s-fast and s-accurate.
- `--epoch-batch-size $num`: the number of epochs to run per large-graph execution round, where a single round consists of a full rotation over all the embedding part pairs.
- `--num-parts $num`: the number of embedding parts to store concurrently on the GPU.
- `--num-pools $num`: the number of sample pools to store concurrently on the GPU.
- `--sampling-threads $num`: the number of threads to work on sampling in parallel.
- `--concurrent-samplers $num`: the number of sample pools to be sampled into concurrently, where a single pool can be sampled into by at most `sampling-threads / concurrent-samplers` threads.
- `--task-queue-threads $num`: the number of threads that execute tasks from the task queue.
- `--num-sample-pool-sets $num`: the number of sample pool sets.

GOSH was developed by Amro Alabsi Aljundi, Taha Atahan Akyildiz, and Kamer Kaya. If you use it in your work, please cite:
```
@INPROCEEDINGS{10.1145/3404397.3404456,
author = {Akyildiz, Taha Atahan and Aljundi, Amro Alabsi and Kaya, Kamer},
title = {GOSH: Embedding Big Graphs on Small Hardware},
year = {2020},
isbn = {9781450388160},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3404397.3404456},
doi = {10.1145/3404397.3404456},
booktitle = {49th International Conference on Parallel Processing - ICPP},
articleno = {4},
numpages = {11},
keywords = {GPU, parallel graph algorithms, link prediction, Graph embedding, graph coarsening},
location = {Edmonton, AB, Canada},
series = {ICPP '20},
note = {Nominated for the best-paper award.}
}
```