Random Forest Library In Python Compatible with Scikit-Learn

mdh266/RandomForests

Random Forests In Python


Introduction


I started this project to better understand how decision trees and random forests work. At this point, the classifiers choose splits using only the Gini index, and the regression models using only the mean squared error. Both the classifiers and the regression models are built to work with Pandas and Scikit-Learn.
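
To make the two criteria concrete, here is a minimal sketch in plain Python of the Gini index and the mean squared error used as split scores. It follows the standard textbook definitions only. The function names (`gini_impurity`, `node_mse`, `split_cost`) are made up for this illustration and are not part of the randomforests API, and the library's own implementation may differ.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a node: 1 - sum over classes of p_k squared."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def node_mse(values):
    """Mean squared error of a node's targets around the node mean."""
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def split_cost(left, right, criterion):
    """Size-weighted average of the criterion over the two child nodes.

    A tree builder would pick the candidate split with the lowest cost.
    """
    n = len(left) + len(right)
    return (len(left) * criterion(left) + len(right) * criterion(right)) / n


# A perfectly separating split has zero cost; an uninformative one does not.
print(split_cost([0, 0], [1, 1], gini_impurity))  # 0.0
print(split_cost([0, 1], [0, 1], gini_impurity))  # 0.5
print(node_mse([1.0, 3.0]))                       # 1.0
```

Under these definitions, a classification tree would prefer the first split above (cost 0.0) over the second (cost 0.5), and a regression tree would compare candidate splits the same way with `node_mse` as the criterion.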

Examples

Basic classification example using Scikit-learn:

from randomforests import RandomForestClassifier
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_breast_cancer
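
dataset = load_breast_cancer()

# Keep only the first four features so the example stays small
cols = [dataset.data[:,i] for i in range(4)]

# zip stops at the shorter sequence, so only the first four feature names are used
X = pd.DataFrame({k:v for k,v in zip(dataset.feature_names,cols)})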
y = pd.Series(dataset.target)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=24)

pipe   = Pipeline([("forest", RandomForestClassifier())])

params = {"forest__max_depth": [1,2,3]}

grid   = GridSearchCV(pipe, params, cv=5, n_jobs=-1)
model  = grid.fit(X_train,y_train)

preds  = model.predict(X_test)

# accuracy_score expects (y_true, y_pred)
print("Accuracy: ", accuracy_score(y_test, preds))

>> Accuracy:  0.9020979020979021

Basic regression example using Scikit-learn:

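from randomforests import RandomForestRegressor
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.metrics import r2_score
# load_boston was removed in scikit-learn 1.2, so this example needs an older release
from sklearn.datasets import load_boston

dataset = load_boston()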

cols = [dataset.data[:,i] for i in range(4)]

X = pd.DataFrame({k:v for k,v in zip(dataset.feature_names,cols)})
y = pd.Series(dataset.target)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=24)

pipe   = Pipeline([("forest", RandomForestRegressor())])

params = {"forest__max_depth": [1,2,3]}

grid   = GridSearchCV(pipe, params, cv=5, n_jobs=-1)
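# Note: the search is fit on the full data set (X, y), so X_test is not truly
# held out; fit on (X_train, y_train) instead for an unbiased score.
model  = grid.fit(X,y)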

preds  = model.predict(X_test)

print("R^2 : ", r2_score(y_test,preds))

>> R^2 : 0.37948488681649484

Installing


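The project uses the setup.py generated by PyScaffold. To install the library from a clone of the repository, run the following from the repository root (this performs a regular install; for a development-mode install, setuptools provides `python setup.py develop`):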

python setup.py install

Test


The tests run through the same setup.py generated by PyScaffold:

python setup.py test

Dependencies


Dependencies are minimal:

- Python (>= 3.6)
- [Scikit-Learn](https://scikit-learn.org/stable/) (>=0.23)
- [Pandas](https://pandas.pydata.org/) (>=1.0)

References