Project author: abhinav-upadhyay

Project description:
Sparse array support for Pandas as an extension array
Language: Python
Repository: git://github.com/abhinav-upadhyay/sparsepandas.git
Created: 2018-08-20T05:01:03Z
Project community: https://github.com/abhinav-upadhyay/sparsepandas

License: BSD 3-Clause "New" or "Revised" License

An extension array implementation for pandas to support sparse arrays using the sparse module.

Inspired by the IPArray implementation from cyberpandas. Work in progress.

Example Usage

    In [2]: import numpy as np
    In [3]: import pandas as pd
    In [4]: from sparse_array import SparseExtensionArray
    In [5]: arr = np.random.random(20000)
    In [6]: arr[arr<0.9] = 0.0
    In [7]: sparse_arr = SparseExtensionArray(arr)
    In [9]: df = pd.DataFrame({'sparse_col1': sparse_arr})
    In [11]: df.info()
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 20000 entries, 0 to 19999
    Data columns (total 1 columns):
    sparse_col1    2062 non-null sparsetype
    dtypes: sparsetype(1)
    memory usage: 32.3 KB
    In [12]: df.mean()
    Out[12]:
    sparse_col1    0.09797
    dtype: float64
    In [13]: %timeit df.sum()
    1.6 ms ± 13.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
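
To put the memory figure in the example above in context, here is a rough sketch that estimates how much space a COO-style sparse layout would need compared with the dense array. It uses only NumPy. The helper name `coo_memory_estimate` is hypothetical, and the layout (one int64 coordinate plus one value per non-zero element) is an assumption, not a description of how `SparseExtensionArray` or the sparse module actually store data. Under that assumption, 2062 non-zero float64 values would take about 32.2 KB, which is roughly in line with the 32.3 KB reported by `df.info()`, while the dense array takes about 156 KB.

```python
import numpy as np


def coo_memory_estimate(dense):
    """Estimate bytes needed for a 1-D array in a COO-style layout.

    Assumption (not taken from the project): each non-zero element costs
    one int64 coordinate plus one value of the array's own dtype.
    """
    dense = np.asarray(dense)
    nnz = int(np.count_nonzero(dense))
    coord_bytes = nnz * np.dtype(np.int64).itemsize
    data_bytes = nnz * dense.dtype.itemsize
    return nnz, coord_bytes + data_bytes


# Build an array like the one in the example: about 90% of entries are zero.
rng = np.random.default_rng(0)
arr = rng.random(20000)
arr[arr < 0.9] = 0.0

nnz, sparse_bytes = coo_memory_estimate(arr)
print(f"non-zero entries: {nnz} of {arr.size}")
print(f"dense size:       {arr.nbytes / 1024:.1f} KB")
print(f"sparse estimate:  {sparse_bytes / 1024:.1f} KB")
```

The exact count of non-zero entries depends on the random draw, so the sparse estimate varies slightly from run to run.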

Requirements for using

requirements.txt lists the requirements (pytest and hypothesis are needed only for running the tests). In addition, the package depends on the development versions of the pandas and sparse modules.