:hocho: Neutralize cycles in (large) graphs
Collection of utilities and scripts for neutralizing cycles while performing the least amount of changes to the original edge structure:
greedy_remove_cycles
inside cycle_removal.cpp
)Currently, the best strategy is as follows:
Convert DCG to DAG. Keep track of ancestry when merging two nodes (~10mn to convert full English Wikipedia graph to DAG)
Recover ancestry for nodes that received most merges (most nodes with merges has 1-5 merges, but 3-4 have over a 1000, and contain an eclectic mix of nodes)
Form subgraph from recovered nodes in (2)
On each subgraph of (3) perform a greedy deletion of the edges that participate in the most cycles within those subgraphs. Only remove edges that do not disconnect nodes from the root.
Reconnect subgraph to merged graph from (1)
cd build
cmake ..
make -j9
To run greedy most-cyclic edge deletion do the following:
Select some number of cycles you want to search for at each step of the algorithms (say 10000),
and whether you want to update the most-cyclic edge count after deleting an edge (when cycles are broken the edges that participate in are no longer cyclic, however they were at some point) - for now let’s not update (0). We run this algorithm on a graph that can be loaded from a binary file by storing edges in a binary file as a sequence of from-to pairs made of int32s (graph_out.bin
contains such a graph):
./cycle_removal graph_out.bin 10000 0
To export, specify an output path as the optional 4th argument:
./cycle_removal graph_out.bin 10000 0 graph_out_cycleless.bin