Project author: GiulioRossetti

Project description: Network datasets with ground truth clusterings
Project URL: git://github.com/GiulioRossetti/cdlib_datasets.git
Created: 2020-12-18T07:24:21Z
Project community: https://github.com/GiulioRossetti/cdlib_datasets


CDlib datasets

Remote repository of public domain network datasets (along with their ground truth clusterings) for the CDlib library.

For instructions on how to load the data within CDlib, refer to the official documentation.

Available datasets

The following network datasets are available, both real and synthetically generated.

Real world

Network Name | Network Type | Upstream
Karate Club | Social | UCINET
Youtube | Social | SNAP
DBLP | Scientific Collaboration | SNAP
Amazon | Co-Purchases | SNAP

Synthetic

LFR Benchmark datasets:

Set of networks with planted community partitions generated using the networkx implementation of the Lancichinetti-Fortunato-Radicchi benchmark.

“Benchmark graphs for testing community detection algorithms”, Andrea Lancichinetti, Santo Fortunato, and Filippo Radicchi, Phys. Rev. E 78, 046110 2008

Dataset names follow the pattern:

LFR_N{number of nodes}_ad{average degree}_mc{min community size}_mu{mixing coefficient}

where:

  • number of nodes: [1000, 5000, 10000, 50000, 100000]
  • average degree: [5]
  • min community size: [50]
  • mixing coefficient: [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
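
The naming pattern and parameter grid above can be expanded into the full list of dataset names. The following is a minimal sketch, not part of the repository: the helper names `lfr_dataset_name` and `lfr_dataset_names` are hypothetical, and it assumes the mixing coefficient is written with exactly one decimal digit (for example `mu0.1`), which the pattern itself does not state.

```python
from itertools import product

# Parameter grid as listed in this README.
NODE_COUNTS = [1000, 5000, 10000, 50000, 100000]
AVERAGE_DEGREES = [5]
MIN_COMMUNITY_SIZES = [50]
MIXING_COEFFICIENTS = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]


def lfr_dataset_name(n, ad, mc, mu):
    """Build one name following LFR_N{n}_ad{ad}_mc{mc}_mu{mu}.

    Assumption: mu is rendered with one decimal digit.
    """
    return f"LFR_N{n}_ad{ad}_mc{mc}_mu{mu:.1f}"


def lfr_dataset_names():
    """Return every name in the grid, ordered by node count, then mu."""
    grid = product(
        NODE_COUNTS, AVERAGE_DEGREES, MIN_COMMUNITY_SIZES, MIXING_COEFFICIENTS
    )
    return [lfr_dataset_name(n, ad, mc, mu) for n, ad, mc, mu in grid]


if __name__ == "__main__":
    for name in lfr_dataset_names():
        print(name)
```

With five node counts and nine mixing coefficients, the grid yields 45 names. Whether each of them resolves to a file in the repository should be checked against the repository contents.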

The power-law exponent is fixed at 3 for the degree distribution and at 1.5 for the community size distribution.
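
To make the link between a dataset name and its generation parameters concrete, the sketch below parses a name back into a parameter dictionary. It is an illustration under assumptions rather than the script used to build these datasets: the function name `lfr_params_from_name` is hypothetical, and the dictionary keys are chosen to match the keyword arguments of networkx's `LFR_benchmark_graph` (`n`, `tau1`, `tau2`, `mu`, `average_degree`, `min_community`), where `tau1` is the degree exponent and `tau2` the community size exponent.

```python
import re

# Exponents stated in this README; they are the same for every dataset.
DEGREE_EXPONENT = 3.0          # tau1 in networkx's LFR_benchmark_graph
COMMUNITY_SIZE_EXPONENT = 1.5  # tau2 in networkx's LFR_benchmark_graph

# Matches names such as LFR_N1000_ad5_mc50_mu0.3
_NAME_RE = re.compile(
    r"^LFR_N(?P<n>\d+)_ad(?P<ad>\d+)_mc(?P<mc>\d+)_mu(?P<mu>\d+(?:\.\d+)?)$"
)


def lfr_params_from_name(name):
    """Parse a dataset name into LFR generator parameters.

    Raises ValueError if the name does not follow the pattern.
    """
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not an LFR dataset name: {name!r}")
    return {
        "n": int(match["n"]),
        "tau1": DEGREE_EXPONENT,
        "tau2": COMMUNITY_SIZE_EXPONENT,
        "mu": float(match["mu"]),
        "average_degree": int(match["ad"]),
        "min_community": int(match["mc"]),
    }


if __name__ == "__main__":
    print(lfr_params_from_name("LFR_N1000_ad5_mc50_mu0.3"))
```

The resulting dictionary could in principle be passed as keyword arguments to `networkx.generators.community.LFR_benchmark_graph`. That generator is randomized and may fail to converge for some parameter combinations, so regenerating a graph this way is not expected to reproduce the stored datasets exactly.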