项目作者: ikelin

项目描述 :
Cuckoo Filter is a thread safe probability filter that performs set membership tests.
高级语言: Java
项目地址: git://github.com/ikelin/cuckoofilter.git
创建时间: 2019-01-11T06:33:27Z
项目社区:https://github.com/ikelin/cuckoofilter

开源协议:Apache License 2.0

下载


Cuckoo Filter

A thread safe probability filter that performs set membership tests. A lookup returns either might be in set or definitely not in set.

Maven Central
Build Status
Coverage Status
Codacy Badge

Usage

Maven pom.xml:

  1. <dependency>
  2. <groupId>com.ikelin</groupId>
  3. <artifactId>cuckoofilter</artifactId>
  4. <version>{VERSION}</version>
  5. </dependency>

Creating a Filter

  1. CuckooFilter filter = CuckooFilter.create(10000)
  2. .withFalsePositiveProbability(0.001)
  3. .withConcurrencyLevel(8)
  4. .build();

This creates a cuckoo filter of with expected max capacity of 10,000, false positive probability of 0.001 (or 0.1%), and concurrency level of 8.

Expected Max Capacity specifies the expected number of items this set can hold.

False Positive Probability is the probability that lookup item operation will return a false positive.

The allowed concurrency among read and write operations is guided by Concurrency Level.

Lookup Item

To lookup an item in the filter, use the mightContain() method:

  1. CuckooFilter tables = CuckooFilter.create(100).build();
  2. boolean tableMightExist = tables.mightContain(tableHash)
  3. if (!tableMightExist) {
  4. // table definitely does not exist, do not query database
  5. }

If mightContain() returns true, the item might or might not be in the filter. If mightContain() returns false, the item is definitely not in the set.

Put Item

To put an item into the filter, use the put() method:

  1. CuckooFilter blacklistedWebsites = CuckooFilter.create(3000000).build();
  2. boolean success = blacklistedWebsites.put(websiteHash);
  3. if (!success) {
  4. // set expectedMaxCapacity reached...
  5. }

Always check the boolean returned from the put() method. If put() returns true, the item is successfully inserted. If put() returns false, the filter has reached its capacity. In this case, create a new filter with larger capacity.

Remove Item

To remove an item from the filter, use the remove() method:

  1. CuckooFilter cdnCachedContents = CuckooFilter.create(100000000).build();
  2. filter.remove(purgedCdnCachedContentHash);

Item Hash Value

Use a performant hashing library that generates a well distributed long hash value for items. For example, OpenHFT’s Zero Allocation Hash or Google’s Guava Hashing.

  1. LongHashFunction hashFunction = LongHashFunction.xx();
  2. long itemHash = hashFunction.hashChars("item-foo");
  3. if (!filter.mightContain(itemHash)) {
  4. // item definitely does not exist in filter...
  5. }