Project author: fuyb1992

Project description:
Read, write and update large-scale pandas DataFrames with Elasticsearch
Language: Python
Repository: git://github.com/fuyb1992/es_pandas.git
Created: 2019-11-02T09:31:55Z
Project community: https://github.com/fuyb1992/es_pandas

License: MIT License


es_pandas


Read, write and update large-scale pandas DataFrames with Elasticsearch.

Requirements

This package should work on Python 3 (>= 3.4), and Elasticsearch should be version 5.x, 6.x, or 7.x.

Installation
The package is hosted on PyPI and can be installed with pip:

    pip install es_pandas

Deprecation Notice

Support for Elasticsearch 5.x will be deprecated in a future version.

Usage

    import pandas as pd
    from es_pandas import es_pandas

    # Elasticsearch cluster information
    es_host = 'localhost:9200'
    index = 'demo'

    # Create an es_pandas instance
    ep = es_pandas(es_host)

    # Example DataFrame
    df = pd.DataFrame({'Num': range(100000)})
    df['Alpha'] = 'Hello'
    # pd.datetime was removed in pandas 2.0; pd.Timestamp.now() works in old and new versions
    df['Date'] = pd.Timestamp.now()

    # Optionally initialize an index template from the DataFrame
    doc_type = 'demo'
    ep.init_es_tmpl(df, doc_type)

    # Write data to Elasticsearch, using the template created above
    ep.to_es(df, index, doc_type=doc_type, thread_count=2, chunk_size=10000)

    # Set use_index=True to use the DataFrame index as each record's _id
    ep.to_es(df, index, doc_type=doc_type, use_index=True, thread_count=2, chunk_size=10000)

    # Delete records from Elasticsearch
    ep.to_es(df.iloc[5000:], index, doc_type=doc_type, _op_type='delete', thread_count=2, chunk_size=10000)

    # Update documents by _id
    df.iloc[:1000, 1] = 'Bye'
    df.iloc[:1000, 2] = pd.Timestamp.now()
    ep.to_es(df.iloc[:1000, 1:], index, doc_type=doc_type, _op_type='update')

    # Read data from Elasticsearch
    df = ep.to_pandas(index)
    print(df.head())

    # Return only certain fields
    heads = ['Num', 'Date']
    df = ep.to_pandas(index, heads=heads)
    print(df.head())

    # Set the dtype of certain columns
    dtype = {'Num': 'float', 'Alpha': object}
    df = ep.to_pandas(index, dtype=dtype)
    print(df.dtypes)

    # Infer dtypes from the Elasticsearch template
    df = ep.to_pandas(index, infer_dtype=True)
    print(df.dtypes)

    # Use the query_sql parameter if you want to query with SQL

    # Write data to Elasticsearch with pandas.io.json
    ep.to_es(df, index, doc_type=doc_type, use_pandas_json=True, thread_count=2, chunk_size=10000)
    print('write es doc with pandas.io.json finished')
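
The `chunk_size`, `use_index` and `_op_type` arguments used above correspond to ideas from bulk indexing in general: rows become per-document actions, which are then sent in batches. The sketch below is not how es_pandas is implemented; it is a standalone illustration using only the standard library. The helper names `build_actions` and `chunked` are hypothetical, and the action keys (`_op_type`, `_index`, `_id`, `_source`, `doc`) are assumed to follow the convention of the elasticsearch-py bulk helpers.

```python
from itertools import islice


def build_actions(records, index, ids=None, op_type="index"):
    """Yield one bulk-helper style action dict per record (hypothetical helper).

    records: iterable of dicts, e.g. df.to_dict("records")
    ids:     optional iterable of document ids, e.g. df.index
    """
    if op_type in ("update", "delete") and ids is None:
        raise ValueError(f"op_type={op_type!r} needs explicit document ids")
    if ids is None:
        # No ids given: leave _id out so the server can assign one.
        for record in records:
            yield {"_op_type": op_type, "_index": index, "_source": record}
        return
    for doc_id, record in zip(ids, records):
        action = {"_op_type": op_type, "_index": index, "_id": doc_id}
        if op_type == "update":
            action["doc"] = record        # partial document for updates
        elif op_type != "delete":
            action["_source"] = record    # full document for index/create
        yield action


def chunked(iterable, size):
    """Split an iterable into lists of at most `size` items (hypothetical helper)."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


rows = [{"Num": n, "Alpha": "Hello"} for n in range(5)]
batches = list(chunked(build_actions(rows, "demo", ids=range(5)), 2))
print([len(batch) for batch in batches])  # [2, 2, 1]
```

With a real DataFrame, `records` could come from `df.to_dict("records")` and `ids` from `df.index`. Delete actions carry only the `_id`, which is why deleting or updating by `_id` requires ids that match those used when the documents were written.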