Project author: mthh

Project description:
Compute Natural Breaks in Python (Fisher-Jenks algorithm)
Language: Python
Repository: git://github.com/mthh/jenkspy.git
Created: 2016-09-13T09:46:04Z
Project community: https://github.com/mthh/jenkspy

License: MIT License



Jenkspy: Fast Fisher-Jenks breaks for Python

Compute “natural breaks” (Fisher-Jenks algorithm) on list / tuple / array / numpy.ndarray of integers/floats.

The algorithm implemented by this library is also referred to as the Fisher-Jenks algorithm, the Jenks Optimisation Method or the Fisher exact optimization method. It is a deterministic method for calculating the optimal class boundaries.
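To make "optimal class boundaries" concrete: Fisher-Jenks looks for the split of the sorted values into contiguous classes that minimises the total within-class sum of squared deviations from the class means. The sketch below is a brute-force illustration of that objective on a tiny dataset. It is not how jenkspy works internally (the library relies on a C extension), and the helper names `within_class_ssd` and `brute_force_breaks` are hypothetical.

```python
from itertools import combinations


def within_class_ssd(values):
    """Sum of squared deviations of `values` from their mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def brute_force_breaks(data, n_classes):
    """Try every contiguous partition of the sorted data (illustration only)."""
    values = sorted(data)
    n = len(values)
    best_cost, best_cuts = None, None
    # A partition is defined by n_classes - 1 cut positions between elements.
    for cuts in combinations(range(1, n), n_classes - 1):
        bounds = (0, *cuts, n)
        cost = sum(
            within_class_ssd(values[start:stop])
            for start, stop in zip(bounds[:-1], bounds[1:])
        )
        if best_cost is None or cost < best_cost:
            best_cost, best_cuts = cost, cuts
    # Report the minimum, then the upper bound (largest member) of each class.
    return [values[0]] + [values[c - 1] for c in best_cuts] + [values[-1]]


print(brute_force_breaks([1, 2, 3, 10, 11, 12, 20, 21, 22], 3))  # [1, 3, 12, 22]
```

The exhaustive search grows combinatorially with the input size, so it is only useful for checking intuition on a handful of values.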

Intended compatibility: CPython 3.7+

Wheels are provided via PyPI for Windows, macOS and Linux. The package is also available on the conda-forge channel for Anaconda users.




Usage

Two ways of using jenkspy are available:

  • by using the jenks_breaks function, which takes as input a list / tuple / array.array / numpy.ndarray of integers or floats and returns a list of values that correspond to the limits of the classes (starting with the minimum value of the series, which is the lower bound of the first class, and ending with its maximum value, which is the upper bound of the last class).
    >>> import jenkspy
    >>> import json
    >>> with open('tests/test.json', 'r') as f:
    ...     # Read some data from a JSON file
    ...     data = json.loads(f.read())
    ...
    >>> jenkspy.jenks_breaks(data, n_classes=5)  # Asking for 5 classes
    [0.0028109620325267315, 2.0935479691252112, 4.205495140049607, 6.178148351609707, 8.09175917180255, 9.997982932254672]
    # The values are, from left to right:
    #   lower bound of the 1st class (minimum value),
    #   upper bound of the 1st, 2nd, 3rd and 4th classes,
    #   upper bound of the 5th class (maximum value).
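Because the returned list holds the minimum followed by the upper bound of each class, consecutive pairs of values give the interval covered by each class. A minimal sketch, reusing the break values printed above (rounded here for readability; the variable names are illustrative only):

```python
# Break values from the example above, rounded for readability.
breaks = [0.0028, 2.0935, 4.2055, 6.1781, 8.0918, 9.9980]

# Pair each bound with the next one: n_classes + 1 breaks describe n_classes intervals.
intervals = list(zip(breaks[:-1], breaks[1:]))

for number, (lower, upper) in enumerate(intervals, start=1):
    print(f"class {number}: {lower} to {upper}")
```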
  • by using the JenksNaturalBreaks class that is inspired by scikit-learn classes.

The .fit and .group behavior is slightly different from jenks_breaks: they accept values outside the range between the minimum and maximum values of breaks_ while retaining the input size. This means that fit and group use only the inner_breaks_.
All values below the minimum bound are placed in the first group and all values above the maximum bound are placed in the last group.

    >>> from jenkspy import JenksNaturalBreaks
    >>> x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
    >>> jnb = JenksNaturalBreaks(4)  # Asking for 4 clusters
    >>> jnb.fit(x)  # Create the clusters according to values in 'x'
    >>> print(jnb.labels_)  # Labels for fitted data
    [0 0 0 1 1 1 2 2 2 3 3 3]
    >>> print(jnb.groups_)  # Content of each group
    [array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8]), array([ 9, 10, 11])]
    >>> print(jnb.breaks_)  # Break values (including min and max)
    [0.0, 2.0, 5.0, 8.0, 11.0]
    >>> print(jnb.inner_breaks_)  # Inner breaks (i.e. breaks_[1:-1])
    [2.0, 5.0, 8.0]
    >>> print(jnb.predict(15))  # Predict the group of a value
    3
    >>> print(jnb.predict([2.5, 3.5, 6.5]))  # Predict the group of several values
    [1 1 2]
    >>> print(jnb.group([2.5, 3.5, 6.5]))  # Group the elements into their groups
    [array([], dtype=float64), array([2.5, 3.5]), array([6.5]), array([], dtype=float64)]
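The behaviour described above, where only the inner breaks are used and out-of-range values fall into the first or last group, can be sketched in plain Python with `bisect`. This is an illustration only, under the assumption that a value equal to a break belongs to the class below it (as the labels for 2, 5 and 8 in the example suggest). `predict_label` and `group_values` are hypothetical helpers, not part of jenkspy.

```python
from bisect import bisect_left

inner_breaks = [2.0, 5.0, 8.0]  # inner_breaks_ from the example above


def predict_label(value, inner_breaks):
    """Index of the class a value falls into, using only the inner breaks."""
    # bisect_left counts the inner breaks strictly below `value`, so values
    # under the first break get 0 and values over the last get len(inner_breaks).
    return bisect_left(inner_breaks, value)


def group_values(values, inner_breaks):
    """Distribute values into one list per class."""
    groups = [[] for _ in range(len(inner_breaks) + 1)]
    for value in values:
        groups[predict_label(value, inner_breaks)].append(value)
    return groups


print(predict_label(15, inner_breaks))  # 3
print(predict_label(-4, inner_breaks))  # 0
print(group_values([2.5, 3.5, 6.5], inner_breaks))  # [[], [2.5, 3.5], [6.5], []]
```

Since the minimum and maximum of the fitted data never enter the lookup, any real number maps to a valid group index, which is why the output keeps the same size as the input.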

Installation

  • From PyPI
    pip install jenkspy
  • From source
    git clone http://github.com/mthh/jenkspy
    cd jenkspy/
    pip install .
  • For Anaconda users
    conda install -c conda-forge jenkspy

Requirements

  • NumPy

  • Only for building from source: C compiler, Python C headers, setuptools and Cython.

Motivation

  • Providing a C extension that installs painlessly, so that it can be used more easily
    as a dependency in another package (and, along the way, learning how to build wheels,
    at first with AppVeyor / Travis and now with GitHub Actions).
  • Getting the break values! (and fast!). No fancy functionality provided,
    but contributions/forks/etc are welcome.
  • Other Python implementations exist, but they are either not as fast or not available on PyPI.