GitHub - hershyz/standardized-classification-engine: Automated ensemble training engine for geometry-based ML classification models.

Overview

Automated ensemble training engine for geometry-based ML classification models.

Purpose

The end goal of this project is an all-encompassing ML algorithm that automatically finds the best classification method for an individual dataset. It is meant to serve as a standalone module for instantaneous predictions and lightweight model retraining, with lower runtimes and resource utilization than existing frameworks (TensorFlow, PyTorch, scikit-learn) while maintaining similar levels of classification accuracy.
Such a system eliminates human trial and error when choosing a classification technique. Because every algorithm shares one standardized model, the system can switch classification algorithms automatically as a dataset grows.
Finally, this system removes the overhead of large inferencing frameworks, which makes classification and model training faster without the need for GPU acceleration.

Dependencies for tests (pip)

numpy
pandas
scikit-learn (imported as sklearn)
tensorflow
keras

Model Metadata

mean of each input feature, computed per output class
int/float mapping (label serialization) for non-numerical input features
standard deviation of each input feature
sampled raw data, used for knn classifications
classification algorithm with the highest accuracy
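
As a rough illustration of the first and third metadata items, the sketch below computes per-class feature means and per-feature standard deviations for a toy dataset. The function name `build_metadata`, the dictionary keys, and the choice of population standard deviation are assumptions made for this example; the README does not specify how the package computes or stores these values.

```python
from collections import defaultdict
from statistics import mean, pstdev


def build_metadata(rows, labels):
    """Hypothetical helper: rows are numeric feature lists, labels are class names."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    # Mean of each input feature, keyed by output class.
    class_means = {
        label: [mean(column) for column in zip(*members)]
        for label, members in by_class.items()
    }
    # Standard deviation of each input feature over the whole dataset
    # (population form; the package may use a different estimator).
    feature_stddevs = [pstdev(column) for column in zip(*rows)]
    return {"class_means": class_means, "feature_stddevs": feature_stddevs}


rows = [[1.0, 10.0], [3.0, 14.0], [10.0, 2.0], [12.0, 4.0]]
labels = ["A", "A", "B", "B"]
meta = build_metadata(rows, labels)
print(meta["class_means"])  # {'A': [2.0, 12.0], 'B': [11.0, 3.0]}
```

Keeping only these summary statistics, rather than the full dataset, is one way a cached model could stay small; only the knn classifier needs sampled raw rows.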

Standardized Classification Algorithms

sqrt distance classifier
absolute distance classifier
percent distance classifier
standard deviation distance classifier
knn (k-nearest neighbors) classifier
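
The README does not give formulas for these classifiers, so the following is only a minimal sketch of how geometry-based classification of this kind can work, assuming each class is represented by its mean feature vector. All function names are hypothetical, and the sqrt and percent distance variants are left out because their definitions are not stated here.

```python
import math
from collections import Counter


def abs_distance(point, centroid):
    # Sum of absolute per-feature differences (Manhattan distance).
    return sum(abs(p - c) for p, c in zip(point, centroid))


def stddev_distance(point, centroid, stddevs):
    # Per-feature difference expressed in standard deviations;
    # features with zero spread are skipped to avoid division by zero.
    return sum(abs(p - c) / s for p, c, s in zip(point, centroid, stddevs) if s > 0)


def nearest_centroid(point, class_means, distance):
    # Predict the class whose mean vector is closest to the point.
    return min(class_means, key=lambda label: distance(point, class_means[label]))


def knn_predict(point, samples, labels, k=3):
    # Majority vote among the k closest sampled rows (Euclidean distance).
    ranked = sorted(zip(samples, labels), key=lambda pair: math.dist(point, pair[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


class_means = {"A": [2.0, 12.0], "B": [11.0, 3.0]}
stddevs = [4.6, 4.8]
samples = [[1.0, 10.0], [3.0, 14.0], [10.0, 2.0], [12.0, 4.0]]
labels = ["A", "A", "B", "B"]
point = [4.0, 11.0]

print(nearest_centroid(point, class_means, abs_distance))  # A
print(nearest_centroid(point, class_means,
                       lambda p, c: stddev_distance(p, c, stddevs)))  # A
print(knn_predict(point, samples, labels, k=3))  # A
```

The centroid-based variants need only the stored means (and standard deviations), while knn compares against sampled raw rows, which matches the split in the Model Metadata list.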

Package Modules

abs_distance_classifier
percent_distance_classifier
sqrt_distance_classifier
stddev_classifier
common_model_lib
knn
data_sampler
dataframe
model
numerical_feature_converter
prediction_engine
training_engine

Train and cache model

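import training_engine
import common_model_lib

# Train every standardized classifier on the dataset and keep the most accurate one.
model = training_engine.get_model('data/drug200.csv')
# Cache the trained model under the name 'drug200'.
common_model_lib.cache(model, 'drug200')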

(terminal output)
sqrt distance classifier accuracy: 0.38
abs distance classifier accuracy: 0.44
percent distance classifier accuracy: 0.615
stddev classifier accuracy: 0.64
knn accuracy: 0.94
---
training complete: 0.04366495800059056s elapsed
max training accuracy: knn (0.94)
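
The selection step reported in this output can be sketched as scoring each candidate classifier and keeping the one with the highest accuracy. The helpers `accuracy` and `select_best` and the toy classifiers are hypothetical and only illustrate the idea; they are not the internals of `training_engine`.

```python
def accuracy(predict, rows, labels):
    # Fraction of rows for which the classifier returns the true label.
    correct = sum(1 for row, label in zip(rows, labels) if predict(row) == label)
    return correct / len(rows)


def select_best(classifiers, rows, labels):
    # Score every candidate, then return the top name along with all scores.
    scores = {name: accuracy(fn, rows, labels) for name, fn in classifiers.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy candidates standing in for the five standardized classifiers.
rows = [[1], [2], [8], [9]]
labels = ["low", "low", "high", "high"]
classifiers = {
    "threshold_5": lambda row: "high" if row[0] > 5 else "low",
    "always_low": lambda row: "low",
}
best, scores = select_best(classifiers, rows, labels)
print(best, scores)  # threshold_5 {'threshold_5': 1.0, 'always_low': 0.5}

# Applying the same max rule to the accuracies printed above.
reported = {"sqrt distance": 0.38, "abs distance": 0.44,
            "percent distance": 0.615, "stddev": 0.64, "knn": 0.94}
print(max(reported, key=reported.get))  # knn
```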

Parse a cached model and predict

import common_model_lib
import prediction_engine
model = common_model_lib.parse_model('drug200.mlmodel')
'''
input features:
point[0] = age
point[1] = sex
point[2] = bp
point[3] = cholesterol
point[4] = na_to_k (ratio)
real output: DrugY
'''
point = ['23', 'F', 'HIGH', 'HIGH', '25.355']
print(prediction_engine.predict(point, model))

(terminal output)
DrugY
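
The input point above mixes numeric strings with categorical values such as 'F' and 'HIGH'. One plausible way to turn such a point into numbers, in the spirit of the int/float label mapping listed under Model Metadata, is sketched below. The helper names and the alphabetical code assignment are assumptions, and the second and third training rows are invented for illustration; the package's `numerical_feature_converter` may work differently.

```python
def is_number(text):
    # True when the string parses as a float.
    try:
        float(text)
        return True
    except ValueError:
        return False


def build_label_maps(rows):
    """Assign an integer code to each distinct value of every non-numeric column."""
    maps = {}
    for index, column in enumerate(zip(*rows)):
        if all(is_number(value) for value in column):
            continue  # numeric column, no mapping needed
        maps[index] = {value: code for code, value in enumerate(sorted(set(column)))}
    return maps


def to_numeric(point, maps):
    # Replace categorical values with their codes and parse the rest as floats.
    return [float(maps[i][v]) if i in maps else float(v) for i, v in enumerate(point)]


training_rows = [
    ['23', 'F', 'HIGH', 'HIGH', '25.355'],
    ['47', 'M', 'LOW', 'HIGH', '13.093'],    # illustrative row
    ['28', 'F', 'NORMAL', 'HIGH', '7.798'],  # illustrative row
]
maps = build_label_maps(training_rows)
print(maps[1])  # {'F': 0, 'M': 1}
print(to_numeric(['23', 'F', 'HIGH', 'HIGH', '25.355'], maps))
# [23.0, 0.0, 0.0, 0.0, 25.355]
```

In this sketch a categorical value that never appeared during training raises a `KeyError`, so a real converter would need a policy for unseen labels.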
