项目作者: fmoessbauer

项目描述 :
Implementation of the Lossy Counting stream processing algorithm
高级语言: C++
项目地址: git://github.com/fmoessbauer/LossyCountingModel.git
创建时间: 2017-07-11T09:35:02Z
项目社区:https://github.com/fmoessbauer/LossyCountingModel

开源协议:BSD 3-Clause "New" or "Revised" License

下载


Lossy Counting Model

CircleCI

Implementation of the Lossy Counting stream processing algorithm in C++.

Parts of the implementation are based on mvogiatzis/freq-count. He also made an excellent blog article on how the algorithms works.

The implementation of the algorithm is capable of processing streams with millions of elements by using only very limited resources. This includes the following features:

  • Single pass
  • Limited memory
  • Real-time data processing

How the model works

The Lossy Counting Model takes a frequency and an error as input. The frequency denotes the threshold for an item to be visible in the output. The error denotes the maximum underestimation of a returned element to be e*N where N is the lenght of the stream.

The model is capable of processing single elements as well as batches of size window size. In both cases no data is copied.

Setup a model and process

  1. LossyCountingModel<T> lcm(frequency, error);
  2. // feed in data
  3. while(...){
  4. lcm.processItem(item);
  5. }
  6. // compute results (map containing the histogram)
  7. auto results = lcm.computeOutput();

Compiling

The implementation requires C++11 and is header only. Hence it can be easily included into any application. You might also have a look at the provided examples.

Contributing

Do you want to improve the implementation or report bugs? Just contact me or open an issue / pull request on GitHub.

Performance

The implementation of the model should be with good performance. However the main limitation in the examples is the generation of the random numbers.