Project author: datapao

Project description:
Six Sigma rules on PySpark DataFrames
Language: Python
Repository: git://github.com/datapao/wilson.git
Created: 2019-11-18T11:26:30Z
Project page: https://github.com/datapao/wilson

License: Apache License 2.0


Six Sigma rules for PySpark DataFrames

The Six Sigma rule generator is a PySpark tool that generates Six Sigma (control chart) rules for DataFrame columns.

Background: https://www.isixsigma.com/tools-templates/control-charts/a-guide-to-control-charts/

The rule generator expects the target DataFrame to have a timestamp column.
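
To make the idea concrete, here is a minimal plain-Python sketch of two commonly used control-chart rules: a point more than three standard deviations from the mean, and a run of eight consecutive points on one side of the mean (the run length differs between rule sets). It is an illustration only and not wilson's implementation; the function names and the sample numbers are hypothetical. The rows are sorted by timestamp first, which is presumably why a timestamp column is needed: run-based rules depend on the order of the observations.

```python
from datetime import datetime, timedelta
from statistics import mean, pstdev


def control_limits(values):
    """Return (center, lower, upper) as mean -/+ 3 standard deviations."""
    center = mean(values)
    sigma = pstdev(values)
    return center, center - 3 * sigma, center + 3 * sigma


def beyond_three_sigma(values, lower, upper):
    """Flag each point that falls outside the control limits."""
    return [v < lower or v > upper for v in values]


def long_run_one_side(values, center, run_length=8):
    """Flag points that complete a run of at least `run_length`
    consecutive observations strictly on one side of the center line."""
    flags = []
    streak, last_side = 0, 0
    for v in values:
        side = (v > center) - (v < center)  # +1 above, -1 below, 0 on the line
        if side == 0:
            streak = 0
        elif side == last_side:
            streak += 1
        else:
            streak = 1
        last_side = side
        flags.append(streak >= run_length)
    return flags


# Hypothetical baseline used to estimate the control limits.
baseline = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7]
center, lower, upper = control_limits(baseline)

# Hypothetical new observations, deliberately stored out of order.
start = datetime(2019, 11, 18)
readings = [10.1, 10.2, 10.1, 10.3, 10.2, 10.1, 10.4, 10.2, 11.0, 9.9]
rows = [(start + timedelta(hours=i), v) for i, v in enumerate(readings)]
rows.reverse()

# Sort by timestamp before applying any order-dependent rule.
ordered = [v for _, v in sorted(rows, key=lambda row: row[0])]

outliers = beyond_three_sigma(ordered, lower, upper)
runs = long_run_one_side(ordered, center)
print(outliers)
print(runs)
```

In this sketch the limits come from a separate baseline sample, so that an outlier in the new data does not widen its own limits.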

Installation

For local usage:

1. Clone or download the repository

2. Install using:

  pip install -e .

For Databricks installation:

1. Clone or download the repository

2. Generate the egg file using:

  python setup.py bdist_egg

3. Install on Databricks:

  • Navigate to the Clusters/[your cluster]/Libraries page
  • Click the Install New button
  • Select Python Egg from the Library Type tab
  • Drag and drop the generated .egg file from the cloned repository’s dist directory into the window
  • Click the Install button

Usage

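  from wilson import SixSigma

  # `spark` is an existing SparkSession, as in a Databricks notebook or the pyspark shell
  # header=True keeps the column names; inferSchema=True parses column types
  df = spark.read.csv('example.csv', header=True, inferSchema=True)
  # timecol names the timestamp column that the rule generator expects
  sixsigma = SixSigma(timecol='timestamp')
  df = sixsigma.apply(df, ['target_column_1'])
  df.show()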
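
The contents of example.csv are not described here. As a hypothetical input for trying out the snippet, the sketch below writes a small CSV with a header row and the two column names used above (timestamp and target_column_1). The values, the hourly spacing, and the helper name write_example_csv are made up for illustration.

```python
import csv
import random
from datetime import datetime, timedelta


def write_example_csv(path, n_rows=48, seed=0):
    """Write hypothetical hourly readings to `path` and return the row count."""
    rng = random.Random(seed)  # fixed seed so the file is reproducible
    start = datetime(2019, 11, 18)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["timestamp", "target_column_1"])
        for i in range(n_rows):
            ts = start + timedelta(hours=i)
            value = rng.gauss(10.0, 0.2)  # made-up process: mean 10, sigma 0.2
            writer.writerow([ts.isoformat(sep=" "), f"{value:.3f}"])
    return n_rows


rows_written = write_example_csv("example.csv")
```

Spark's CSV reader only picks up the column names from such a file when its header option is enabled.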