Six Sigma rules on PySpark DataFrames
The Six Sigma rule generator is a PySpark tool that generates Six Sigma control-chart rules for DataFrame columns.
Background: https://www.isixsigma.com/tools-templates/control-charts/a-guide-to-control-charts/
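Control-chart rules flag observations that are statistically unlikely for a stable process. As a library-independent illustration, the following pure-Python sketch implements two commonly cited Nelson rules: rule 1 (a point more than three standard deviations from the mean) and rule 2 (nine or more consecutive points on the same side of the mean). This is an assumption about the kind of rules involved. It is not wilson's implementation, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def nelson_rule_1(values, mean_=None, sigma=None):
    """Return indices of points more than 3 sigma from the mean.

    mean_ and sigma default to statistics of `values` itself; pass
    baseline values to use control limits from a reference period.
    """
    mu = mean(values) if mean_ is None else mean_
    sd = pstdev(values) if sigma is None else sigma
    return [i for i, v in enumerate(values) if abs(v - mu) > 3 * sd]


def nelson_rule_2(values, mean_=None, run_length=9):
    """Return every index at which a run of at least `run_length`
    consecutive points on one side of the mean has been reached."""
    mu = mean(values) if mean_ is None else mean_
    flagged = []
    run = 0   # length of the current same-side run
    side = 0  # +1 above the mean, -1 below, 0 on the mean
    for i, v in enumerate(values):
        s = (v > mu) - (v < mu)
        if s != 0 and s == side:
            run += 1
        else:
            # A point on the mean breaks the run; otherwise a new run starts.
            run = 1 if s != 0 else 0
            side = s
        if run >= run_length:
            flagged.append(i)
    return flagged


print(nelson_rule_1([10, 11, 9, 14], mean_=10, sigma=1))
print(nelson_rule_2([11] * 9 + [9], mean_=10))
```

Passing a baseline `mean_` and `sigma` mirrors the usual practice of computing control limits from a reference period; deriving them from the same data lets a large outlier inflate the standard deviation and partly mask itself.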
The rule generator expects the target DataFrame to have a timestamp column.
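Run-based rules depend on the order of observations, which is presumably why a timestamp column is required. A minimal sketch, assuming ISO-8601 timestamp strings and hypothetical `(timestamp, value)` records, shows that the same values can produce a different run length depending on whether they are ordered by time:

```python
from datetime import datetime


def values_in_time_order(records):
    """Sort (timestamp_string, value) pairs by parsed timestamp; return the values."""
    ordered = sorted(records, key=lambda r: datetime.fromisoformat(r[0]))
    return [value for _, value in ordered]


def longest_run_above(values, threshold):
    """Length of the longest run of consecutive values strictly above threshold."""
    longest = current = 0
    for v in values:
        current = current + 1 if v > threshold else 0
        longest = max(longest, current)
    return longest


# Hypothetical records, deliberately stored out of time order.
records = [
    ("2021-01-03T00:00:00", 12.0),
    ("2021-01-01T00:00:00", 12.0),
    ("2021-01-04T00:00:00", 8.0),
    ("2021-01-02T00:00:00", 12.0),
]

print(longest_run_above([v for _, v in records], 10))   # storage order
print(longest_run_above(values_in_time_order(records), 10))  # time order
```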
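To install the package locally in editable mode, run from the repository root:
pip install -e .
To build an egg for installation on a Databricks cluster (the egg is written to the dist directory):
python setup.py bdist_egg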
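To install the egg on a Databricks cluster:
1. Go to Clusters > [your cluster] > Libraries.
2. Click the Install New button.
3. Select Python Egg from the Library Type tab.
4. Drag the egg file from the dist directory into the window.
5. Click the Install button.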
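Usage:
from pyspark.sql import SparkSession
from wilson import SixSigma
# On Databricks `spark` already exists; getOrCreate() returns that session.
spark = SparkSession.builder.getOrCreate()
# header=True keeps column names such as 'timestamp'; inferSchema=True infers column types.
df = spark.read.csv('example.csv', header=True, inferSchema=True)
sixsigma = SixSigma(timecol='timestamp')
df = sixsigma.apply(df, ['target_column_1'])
df.show()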