Project author: amit1rrr

Project description:
Python package to compress numerical series & numpy arrays into strings
Language: Python
Repository: git://github.com/amit1rrr/numcompress.git
Created: 2018-02-01T10:26:49Z
Project community: https://github.com/amit1rrr/numcompress

License: MIT License


numcompress

Simple way to compress and decompress numerical series & numpy arrays.

  • Easily gets you above an 80% compression ratio
  • You can specify the precision you need for floating-point values (up to 10 decimal places)
  • Useful for storing or transmitting stock prices, monitoring data & other time series data in a compressed string format (see the transport sketch below)
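
For instance, the compressed string drops straight into a JSON payload for storage or transport. Here is a minimal sketch; the payload field names ("symbol", "prices") are made up for this illustration:

  import json
  from numcompress import compress, decompress

  prices = [145.7834, 127.5989, 135.2569]
  # "symbol" and "prices" are hypothetical field names used only for this example
  payload = json.dumps({"symbol": "ACME", "prices": compress(prices, precision=4)})

  received = json.loads(payload)
  print(decompress(received["prices"]))  # [145.7834, 127.5989, 135.2569]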

The compression algorithm is based on Google's encoded polyline format. I modified it to preserve arbitrary precision and to apply to any numerical series. The work is motivated by the usefulness of the time-aware polyline built by Arjun Attam at HyperTrack.
After building this, I came across Python's array module, whose arrays are much more efficient than lists in terms of memory footprint. You might consider using it over numcompress if you don't care about conversion to a string for transmission or storage.
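
To make the encoding concrete, here is a minimal sketch of the polyline-style delta encoding the library builds on: values are scaled to the requested precision, delta-encoded, sign-folded, and written out in printable 5-bit chunks. This is an illustration of the idea only, not the library's full output format (numcompress also records the precision inside the string).

  def encode_deltas(values, precision=3):
      """Illustrative polyline-style encoder (not the library's exact API)."""
      scale = 10 ** precision
      out, previous = [], 0
      for value in values:
          scaled = int(round(value * scale))   # fix the decimal precision
          delta = scaled - previous            # consecutive values are usually close,
          previous = scaled                    # so deltas stay small and encode in few chars
          delta = ~(delta << 1) if delta < 0 else (delta << 1)  # fold the sign bit
          while delta >= 0x20:                 # emit 5 bits per printable character
              out.append(chr((0x20 | (delta & 0x1F)) + 63))
              delta >>= 5
          out.append(chr(delta + 63))
      return "".join(out)

  print(encode_deltas([145.7834, 127.5989, 135.2569], precision=4))
  # 'si~wAhdbJgqtC' -- compare with the Usage example below ('Csi~wAhdbJgqtC')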

Installation

  pip install numcompress

Usage

  >>> from numcompress import compress, decompress

  # Integers
  >>> compress([14578, 12759, 13525])
  'B_twxZnv_nB_bwm@'
  >>> decompress('B_twxZnv_nB_bwm@')
  [14578.0, 12759.0, 13525.0]

  # Floats - lossless compression
  # The precision argument specifies how many decimal places to preserve; it defaults to 3.
  >>> compress([145.7834, 127.5989, 135.2569], precision=4)
  'Csi~wAhdbJgqtC'
  >>> decompress('Csi~wAhdbJgqtC')
  [145.7834, 127.5989, 135.2569]

  # Floats - lossy compression
  >>> compress([145.7834, 127.5989, 135.2569], precision=2)
  'Acn[rpB{n@'
  >>> decompress('Acn[rpB{n@')
  [145.78, 127.6, 135.26]
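
The precision setting is a size/fidelity trade-off: higher precision keeps more digits but yields a longer string. A quick, illustrative comparison:

  from numcompress import compress

  values = [145.7834, 127.5989, 135.2569]
  for p in (2, 4, 6):
      # higher precision -> more digits preserved, longer compressed string
      print(p, len(compress(values, precision=p)))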

  # Compressing and decompressing numpy arrays
  >>> from numcompress import compress_ndarray, decompress_ndarray
  >>> import numpy as np

  >>> series = np.random.randint(1, 100, 25).reshape(5, 5)
  >>> compressed_series = compress_ndarray(series)
  >>> decompressed_series = decompress_ndarray(compressed_series)

  >>> series
  array([[29, 95, 10, 48, 20],
         [60, 98, 73, 96, 71],
         [95, 59,  8,  6, 17],
         [ 5, 12, 69, 65, 52],
         [84,  6, 83, 20, 50]])

  >>> compressed_series
  '5*5,Bosw@_|_Cn_eD_fiA~tu@_cmA_fiAnyo@o|k@nyo@_{m@~heAnrbB~{BonT~lVotLoinB~xFnkX_o}@~iwCokuCn`zB_ry@'

  >>> decompressed_series
  array([[29., 95., 10., 48., 20.],
         [60., 98., 73., 96., 71.],
         [95., 59.,  8.,  6., 17.],
         [ 5., 12., 69., 65., 52.],
         [84.,  6., 83., 20., 50.]])

  >>> (series == decompressed_series).all()
  True
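
The same round trip works for float arrays. The sketch below assumes compress_ndarray uses the same default precision of 3 as compress, so a matrix already limited to 3 decimal places should be restored exactly:

  import numpy as np
  from numcompress import compress_ndarray, decompress_ndarray

  matrix = np.random.rand(4, 4).round(3)   # values limited to 3 decimal places
  restored = decompress_ndarray(compress_ndarray(matrix))
  print(np.allclose(matrix, restored))     # expected: True, under the assumption above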

Compression Ratio

Test       # of Numbers   Compression ratio
Integers   10k            91.14%
Floats     10k            81.35%

You can run the test suite with the -s switch to see the compression ratio. You can also modify the tests to see what kind of compression ratio you get for your own input.

  pytest -s

Here's a quick example showing the compression ratio:

  >>> import random
  >>> import sys
  >>> from numcompress import compress

  >>> series = random.sample(range(1, 100000), 50000)  # generate 50k random numbers between 1 and 100k
  >>> text = compress(series)                           # apply compression
  >>> original_size = sum(sys.getsizeof(i) for i in series)
  >>> original_size
  1200000
  >>> compressed_size = sys.getsizeof(text)
  >>> compressed_size
  284092
  >>> compression_ratio = ((original_size - compressed_size) * 100.0) / original_size
  >>> compression_ratio
  76.32566666666666

We get ~76% compression for 50k random numbers between 1 & 100k. The ratio is even higher for real-world numerical series, since the difference between consecutive numbers tends to be smaller. Think of stock prices, monitoring data & other time series data.
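
As a rough illustration, the sketch below compresses a synthetic random-walk series whose consecutive values differ only slightly; the printed ratio varies from run to run but typically comes out noticeably higher than the ~76% above.

  import random
  import sys
  from numcompress import compress

  # Synthetic "monitoring-like" series: a random walk, so consecutive values stay close.
  value, series = 1000.0, []
  for _ in range(50000):
      value += random.uniform(-1.0, 1.0)
      series.append(round(value, 3))

  text = compress(series)  # default precision of 3
  original_size = sum(sys.getsizeof(v) for v in series)
  compressed_size = sys.getsizeof(text)
  print((original_size - compressed_size) * 100.0 / original_size)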

Contribute

If you see any problem, open an issue or send a pull request. You can also write to me at hello@amirathi.com.