Project author: tr0j4n034

Project description:
Language: C++
Repository: git://github.com/tr0j4n034/Count-Min-Sketch.git
Created: 2019-01-14T11:05:32Z
Project page: https://github.com/tr0j4n034/Count-Min-Sketch


Count-Min-Sketch

Count-Min Sketch and Consistent Weighted Sampling implementation

Quick look:
https://en.wikipedia.org/wiki/Count–min_sketch
https://www.microsoft.com/en-us/research/publication/consistent-weighted-sampling/
https://arxiv.org/abs/1706.01172

This is a tool for generating Count-Min sketches and consistent weighted samples for data streams of varying size. The implementation has been tested on artificially generated datasets of size up to $10^8$. The library contains:

  • Implementations of distance measures (Jaccard, Hamming, Euclidean, Edit, Manhattan, Cosine).
  • Count-Min Sketch tables, with insertion of stream elements and fast look-up.
  • CWS with the settings from the original paper, CWS adapted to Count-Min tables and iterable streams, and sketching of CMS tables.
  • Artificial dataset generation tools and random number generation from uniform, gamma, and beta distributions.
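
To illustrate the Count-Min table idea mentioned above (adding stream elements and looking up counts), here is a minimal sketch in C++11. It does not reproduce this repository's API: the class name `CountMinSketch`, the methods `add` and `estimate`, and the hash-mixing scheme are all assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <limits>
#include <string>
#include <vector>

// Hypothetical illustration class; NOT the repository's actual interface.
class CountMinSketch {
public:
    // depth = number of hash rows, width = number of counters per row.
    CountMinSketch(std::size_t depth, std::size_t width)
        : depth_(depth), width_(width), table_(depth * width, 0) {
        assert(depth > 0 && width > 0);
    }

    // Add `count` occurrences of `item`: bump one counter in every row.
    void add(const std::string& item, std::uint64_t count = 1) {
        for (std::size_t row = 0; row < depth_; ++row) {
            table_[row * width_ + bucket(item, row)] += count;
        }
    }

    // Estimated frequency: the minimum over the rows. Collisions can only
    // inflate a counter, so the estimate never falls below the true count.
    std::uint64_t estimate(const std::string& item) const {
        std::uint64_t best = std::numeric_limits<std::uint64_t>::max();
        for (std::size_t row = 0; row < depth_; ++row) {
            best = std::min(best, table_[row * width_ + bucket(item, row)]);
        }
        return best;
    }

private:
    // Assumed hashing scheme: std::hash mixed with a per-row constant.
    // This is a convenience for the example, not a pairwise-independent
    // family as required by the formal error analysis.
    std::size_t bucket(const std::string& item, std::size_t row) const {
        std::uint64_t h = std::hash<std::string>{}(item);
        h ^= (static_cast<std::uint64_t>(row) + 1) * 0x9e3779b97f4a7c15ULL;
        h ^= h >> 33;
        h *= 0xff51afd7ed558ccdULL;
        h ^= h >> 33;
        return static_cast<std::size_t>(h % width_);
    }

    std::size_t depth_;
    std::size_t width_;
    std::vector<std::uint64_t> table_;  // row-major depth_ x width_ counters
};
```

With this sketch, constructing `CountMinSketch cms(4, 256)`, calling `cms.add("a")` twice and then `cms.estimate("a")` returns a value of at least 2. A wider table lowers the chance of collisions, and more rows lower the chance that every row collides at once.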

Note: Please install and use C++11 or higher. Boost must be installed for some features.