datapao/wilson

Six Sigma rules for PySpark DataFrames

The Six Sigma rule generator is a PySpark tool that generates Six Sigma rules for DataFrame columns.

Background: https://www.isixsigma.com/tools-templates/control-charts/a-guide-to-control-charts/

The rule generator expects the target DataFrame to have a timestamp column.
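For readers new to control charts, the simplest rule flags any point lying more than three standard deviations from the center line. The following is a minimal plain-Python sketch of that idea only; it is not wilson's implementation, and the names `control_limits` and `flag_out_of_control` are hypothetical. It also sorts rows by timestamp, which illustrates why a time column is needed: control-chart rules are evaluated on observations in time order.

```python
from statistics import mean, pstdev


def control_limits(values, n_sigma=3.0):
    """Return (lower, center, upper) control limits for a list of numbers."""
    center = mean(values)
    sigma = pstdev(values)  # population standard deviation of the baseline
    return center - n_sigma * sigma, center, center + n_sigma * sigma


def flag_out_of_control(rows, baseline, n_sigma=3.0):
    """Flag (timestamp, value) rows whose value falls outside the limits.

    Limits are computed from `baseline`; rows are returned in time order
    as (timestamp, value, is_out_of_control) tuples.
    """
    lower, _, upper = control_limits(baseline, n_sigma)
    ordered = sorted(rows, key=lambda row: row[0])
    return [(ts, value, not (lower <= value <= upper)) for ts, value in ordered]


# Hypothetical data: a stable baseline and three new, unordered observations.
baseline = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7]
rows = [(3, 10.1), (1, 9.9), (2, 12.5)]
flags = flag_out_of_control(rows, baseline)
print(flags)
```

Computing the limits from a separate baseline keeps a single extreme point from widening the limits it is tested against. Whether wilson does the same is not stated here.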

Installation

For local usage:

1. Clone or download the repository

2. Install using:

pip install -e .

For Databricks installation:

1. Clone or download the repository

2. Generate an egg file using:

python setup.py bdist_egg

3. Install on Databricks:

  • Navigate to the Clusters/[your cluster]/Libraries page
  • Click the Install New button
  • Select Python Egg from the Library Type tab
  • Drag and drop the generated .egg file from the cloned repository's dist directory into the window
  • Click the Install button

Usage

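from pyspark.sql import SparkSession
from wilson import SixSigma

# Reuse the active session (e.g. on Databricks) or create a local one
spark = SparkSession.builder.getOrCreate()

# header=True keeps the CSV column names, so 'timestamp' and
# 'target_column_1' can be referenced by name; inferSchema=True lets
# Spark detect numeric and timestamp types instead of reading strings
df = spark.read.csv('example.csv', header=True, inferSchema=True)

# Generate Six Sigma rules for the listed columns, ordered by 'timestamp'
sixsigma = SixSigma(timecol='timestamp')
df = sixsigma.apply(df, ['target_column_1'])

df.show()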