ericgordo/Python_Enron_ML_Classifier

This project is my first attempt to apply machine learning to identify Enron employees who may have committed fraud, based on the public Enron financial and email dataset.

This repository contains the files used to build the classifier, along with some additional files that document my work. The files are listed below:

poi_id.py -> The final standalone Python script. It creates the machine learning classifier, the feature list, and the dataset that the tester code runs against.

feature_format.py -> Helper code that formats the data for poi_id.py and for the notebooks.

tester.py -> Run this Python file after poi_id.py to test the machine learning algorithm.

Just_Code_Notebook -> All investigation and experimentation was done in a Jupyter (IPython) notebook. This notebook contains only the code that was run to find the optimized machine learning algorithm.

Final_Notebook -> Includes all the code found in Just_Code_Notebook, plus my thoughts, written notes, and opinions throughout. My final conclusions are in this notebook.
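To give a feel for the kind of formatting step that feature_format.py handles before the classifier sees the data, here is a minimal sketch. It is not the code from this repository: the function names (to_rows, split_labels), the sample records, and the conventions that the label comes first in the feature list and that missing values are stored as the string "NaN" are all assumptions made for illustration.

```python
# Illustrative sketch only; names, data, and conventions are assumptions,
# not taken from feature_format.py.

def to_rows(dataset, feature_list):
    """Turn a dict of per-person records into numeric rows.

    Each row follows the order of feature_list. Missing values, assumed
    here to be stored as the string "NaN", are replaced with 0.0.
    """
    rows = []
    for name in sorted(dataset):  # sorted so the row order is reproducible
        record = dataset[name]
        row = []
        for feature in feature_list:
            value = record.get(feature, "NaN")
            row.append(0.0 if value == "NaN" else float(value))
        rows.append(row)
    return rows


def split_labels(rows):
    """Split rows into labels (first column) and features (the rest)."""
    labels = [row[0] for row in rows]
    features = [row[1:] for row in rows]
    return labels, features


# Hypothetical records, not real Enron data.
dataset = {
    "PERSON A": {"poi": True, "salary": 1000, "bonus": "NaN"},
    "PERSON B": {"poi": False, "salary": "NaN", "bonus": 250},
}
feature_list = ["poi", "salary", "bonus"]  # label first (assumed convention)

rows = to_rows(dataset, feature_list)
labels, features = split_labels(rows)
```

Keeping the label as the first entry of the feature list means a single list can describe both what to predict and what to predict it from, which makes it easy to hand the same list to a separate testing script.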

Resources and Works Cited:
Enron Email Dataset:
 https://www.cs.cmu.edu/~./enron/

Article used to identify persons of interest (POIs) from Enron:
 http://usatoday30.usatoday.com/money/industries/energy/2005-12-28-enron-participants_x.htm

General machine learning support and help:
https://www.udacity.com/course/intro-to-machine-learning--ud120 (including the course forums)

Python Coding Support and Help:
http://stackoverflow.com/

