hershyz/standardized-classification-engine

Overview

Automated ensemble training engine for geometry-based ML classification models.

Purpose

The end goal of this project is an all-encompassing ML algorithm that automatically finds the best classification method for a given dataset. It is meant to serve as a standalone module for near-instant predictions and lightweight model retraining, with lower runtimes and resource utilization than existing frameworks (TensorFlow, PyTorch, scikit-learn) at similar levels of classification accuracy.
Such a system removes the human trial and error involved in choosing a classification technique. Because every algorithm works from one standardized model, the selected algorithm can switch automatically as a dataset grows.
Finally, this system avoids the overhead of large inference frameworks, giving faster classification and model training without the need for GPU acceleration.

Dependencies for tests (pip)

numpy
pandas
scikit-learn
tensorflow
keras

Model Metadata

mean of each input feature, grouped by output feature (class label)
int/float mapping (label serialization) for non-numerical input features
standard deviation per input feature
sampled raw data, for knn classifications
max accuracy classification algorithm
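As a rough illustration of how the metadata listed above could be derived, here is a minimal sketch using only the Python standard library. The function names (`build_label_mapping`, `build_metadata`), the dictionary layout, and the toy data are assumptions made for this example; they are not taken from the package's `model` or `numerical_feature_converter` modules.

```python
from collections import defaultdict
from statistics import mean, pstdev


def build_label_mapping(rows, column):
    """Map each distinct non-numerical value in a column to an int (first-seen order)."""
    mapping = {}
    for row in rows:
        mapping.setdefault(row[column], len(mapping))
    return mapping


def build_metadata(points, labels):
    """Compute per-class feature means and per-feature standard deviations."""
    by_class = defaultdict(list)
    for point, label in zip(points, labels):
        by_class[label].append(point)
    # One mean vector per output class.
    class_means = {
        label: [mean(column) for column in zip(*members)]
        for label, members in by_class.items()
    }
    # One (population) standard deviation per input feature, over all rows.
    feature_stddevs = [pstdev(column) for column in zip(*points)]
    return {"class_means": class_means, "feature_stddevs": feature_stddevs}


# Hypothetical toy data: columns are (age, blood pressure category).
raw_rows = [["20", "HIGH"], ["30", "LOW"], ["40", "HIGH"], ["50", "LOW"]]
labels = ["drugA", "drugB", "drugA", "drugB"]

bp_mapping = build_label_mapping(raw_rows, column=1)
points = [[float(age), float(bp_mapping[bp])] for age, bp in raw_rows]
metadata = build_metadata(points, labels)
print(bp_mapping)
print(metadata["class_means"])
```

Whether the engine uses the population or the sample standard deviation is not stated here; `pstdev` is chosen only to keep the sketch simple.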

Standardized Classification Algorithms

sqrt distance classifier
absolute distance classifier
percent distance classifier
standard deviation distance classifier
knn (k-nearest neighbors) classifier
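The list above names the classifiers without defining their distance functions. The sketch below shows one plausible reading: each distance classifier assigns a point to the class whose mean vector is nearest, and the classifiers differ only in the distance used. Reading "sqrt distance" as Euclidean distance, "percent distance" as deviation relative to the class mean, and "standard deviation distance" as deviation measured in standard deviations are all assumptions, as are every function name and the toy values.

```python
import math
from collections import Counter


def abs_distance(point, center, stddevs):
    # Sum of absolute differences; stddevs is unused, kept for a uniform signature.
    return sum(abs(p - c) for p, c in zip(point, center))


def sqrt_distance(point, center, stddevs):
    # Assumed to mean Euclidean distance (square root of summed squares).
    return math.sqrt(sum((p - c) ** 2 for p, c in zip(point, center)))


def percent_distance(point, center, stddevs):
    # Deviation relative to the class mean; features with a zero mean are skipped.
    return sum(abs(p - c) / abs(c) for p, c in zip(point, center) if c != 0)


def stddev_distance(point, center, stddevs):
    # Deviation in units of each feature's standard deviation.
    return sum(abs(p - c) / s for p, c, s in zip(point, center, stddevs) if s != 0)


def classify(point, class_means, stddevs, distance):
    """Return the class label whose mean vector is closest under `distance`."""
    return min(class_means, key=lambda label: distance(point, class_means[label], stddevs))


def knn_classify(point, samples, k=3):
    """Majority vote among the k sampled (features, label) pairs nearest to point."""
    nearest = sorted(samples, key=lambda s: sqrt_distance(point, s[0], None))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical metadata and sampled data.
class_means = {"drugA": [30.0, 0.0], "drugB": [40.0, 1.0]}
stddevs = [11.18, 0.5]
samples = [([20.0, 0.0], "drugA"), ([30.0, 1.0], "drugB"),
           ([40.0, 0.0], "drugA"), ([50.0, 1.0], "drugB")]
point = [33.0, 1.0]

print(classify(point, class_means, stddevs, abs_distance))     # drugA
print(classify(point, class_means, stddevs, stddev_distance))  # drugB
print(knn_classify(point, samples, k=3))                       # drugA
```

The same point is classified differently depending on the distance function, which is why comparing the classifiers on each dataset is worthwhile.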

Package Modules

abs_distance_classifier
percent_distance_classifier
sqrt_distance_classifier
stddev_classifier
common_model_lib
knn
data_sampler
dataframe
model
numerical_feature_converter
prediction_engine
training_engine

Train and cache model

import training_engine
import common_model_lib

model = training_engine.get_model('data/drug200.csv')
common_model_lib.cache(model, 'drug200')

(terminal output)
sqrt distance classifier accuracy: 0.38
abs distance classifier accuracy: 0.44
percent distance classifier accuracy: 0.615
stddev classifier accuracy: 0.64
knn accuracy: 0.94
---
training complete: 0.04366495800059056s elapsed
max training accuracy: knn (0.94)
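The output above reports an accuracy for each classifier and then names the best one. A minimal sketch of that selection step follows, assuming each candidate is a callable that maps a point to a predicted label. The helper names (`accuracy`, `select_best`) and the toy candidates are hypothetical and do not reflect the internals of `training_engine`.

```python
def accuracy(predict, points, labels):
    """Fraction of points for which predict(point) matches the true label."""
    correct = sum(1 for point, label in zip(points, labels) if predict(point) == label)
    return correct / len(labels)


def select_best(candidates, points, labels):
    """Score every candidate and return (best name, all scores)."""
    scores = {name: accuracy(fn, points, labels) for name, fn in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical candidates standing in for the real classifiers.
candidates = {
    "always_a": lambda point: "a",
    "threshold": lambda point: "a" if point[0] < 5 else "b",
}
points = [[1], [2], [8], [9]]
labels = ["a", "a", "b", "b"]

best, scores = select_best(candidates, points, labels)
for name, score in scores.items():
    print(f"{name} accuracy: {score}")
print(f"max training accuracy: {best} ({scores[best]})")
```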

Parse a cached model and predict

import common_model_lib
import prediction_engine

model = common_model_lib.parse_model('drug200.mlmodel')

'''
input features:
point[0] = age
point[1] = sex
point[2] = bp
point[3] = cholesterol
point[4] = na_to_k (ratio)
real output: DrugY
'''
point = ['23', 'F', 'HIGH', 'HIGH', '25.355']
print(prediction_engine.predict(point, model))

(terminal output)
DrugY
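The input point in this example is passed as strings, including non-numerical values such as 'F' and 'HIGH', so some conversion to numbers presumably happens before any distance is computed. The sketch below shows one way that step could look, using a stored label mapping per feature index. The `to_numeric` helper and the specific mapping values are assumptions for illustration; the actual mappings would come from the cached model.

```python
def to_numeric(point, label_mappings):
    """Convert raw string features to floats.

    label_mappings maps a feature index to a {raw value: number} dict;
    features without a mapping are parsed directly as floats.
    """
    numeric = []
    for index, raw in enumerate(point):
        mapping = label_mappings.get(index)
        if mapping is not None:
            numeric.append(float(mapping[raw]))
        else:
            numeric.append(float(raw))
    return numeric


# Hypothetical mappings for sex, bp, and cholesterol.
label_mappings = {
    1: {"F": 0, "M": 1},
    2: {"HIGH": 0, "LOW": 1, "NORMAL": 2},
    3: {"HIGH": 0, "NORMAL": 1},
}
point = ['23', 'F', 'HIGH', 'HIGH', '25.355']
numeric_point = to_numeric(point, label_mappings)
print(numeric_point)
```

A value that was never seen during training would raise a `KeyError` in this sketch; how the real engine handles that case is not described here.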
