SciKit-Learn Laboratory

This Python package provides command-line utilities that make it easier to run machine learning experiments with scikit-learn. One of the project's primary goals is to let you run scikit-learn experiments without writing any code beyond what you used to generate or extract the features.

Command-line Interface

The main utility we provide is called run_experiment. It runs a series of learners on the datasets specified in a configuration file like this one:

[General]
experiment_name = Titanic_Evaluate_Tuned
# valid tasks: cross_validate, evaluate, predict, train
task = evaluate

[Input]
# these directories could also be absolute paths
# (and must be if you're not running things in local mode)
train_directory = train
test_directory = dev
# Can specify multiple sets of feature files that are merged together automatically
# (even across formats)
featuresets = [["family.ndj", "misc.csv", "socioeconomic.arff", "vitals.csv"]]
# List of scikit-learn learners to use
learners = ["RandomForestClassifier", "DecisionTreeClassifier", "SVC", "MultinomialNB"]
# Column in CSV containing labels to predict
label_col = Survived
# Column in CSV containing instance IDs (if any)
id_col = PassengerId

[Tuning]
# Should we tune parameters of all learners by searching provided parameter grids?
grid_search = true
# Function to maximize when performing grid search
objectives = ['accuracy']

[Output]
# Also compute the area under the ROC curve as an additional metric
metrics = ['roc_auc']
# The following can/should be absolute paths
log = output
results = output
predictions = output
models = output
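
To make the structure of such a file concrete, here is a minimal sketch that reads a trimmed copy of the configuration above using only the Python standard library. It is not how run_experiment itself parses configuration files; the helper name load_experiment and the use of ast.literal_eval for list-valued fields are assumptions made for illustration.

```python
import ast
import configparser

# Trimmed copy of the configuration shown above.
CONFIG_TEXT = """
[General]
experiment_name = Titanic_Evaluate_Tuned
task = evaluate

[Input]
train_directory = train
test_directory = dev
featuresets = [["family.ndj", "misc.csv", "socioeconomic.arff", "vitals.csv"]]
learners = ["RandomForestClassifier", "DecisionTreeClassifier", "SVC", "MultinomialNB"]
label_col = Survived
id_col = PassengerId

[Tuning]
grid_search = true
objectives = ['accuracy']

[Output]
metrics = ['roc_auc']
"""


def load_experiment(text):
    """Hypothetical helper: parse the INI text into plain Python values."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "name": parser.get("General", "experiment_name"),
        "task": parser.get("General", "task"),
        # List-valued fields are written as Python-style literals.
        "featuresets": ast.literal_eval(parser.get("Input", "featuresets")),
        "learners": ast.literal_eval(parser.get("Input", "learners")),
        "grid_search": parser.getboolean("Tuning", "grid_search"),
        "objectives": ast.literal_eval(parser.get("Tuning", "objectives")),
        "metrics": ast.literal_eval(parser.get("Output", "metrics")),
    }


experiment = load_experiment(CONFIG_TEXT)

# Enumerate the (feature set, learner) combinations the file describes.
combinations = [
    (tuple(featureset), learner)
    for featureset in experiment["featuresets"]
    for learner in experiment["learners"]
]
for featureset, learner in combinations:
    print(experiment["task"], learner, "on", "+".join(featureset))
```

With one feature set and four learners, the file above describes four learner/feature-set combinations for the evaluate task.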

For more information about getting started with run_experiment, please check out our tutorial, or our config file specs.

We also provide utilities for:

converting between machine learning toolkit formats (e.g., ARFF, CSV, MegaM)
filtering feature files
joining feature files
other common tasks
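
As an illustration of what joining feature files means, the sketch below merges two small CSV feature tables on a shared instance ID column using only the standard library. It is not the implementation behind the SKLL utilities; the function name join_feature_files and the sample rows are made up for this example.

```python
import csv
import io


def join_feature_files(files, id_col):
    """Merge CSV feature tables row-wise on a shared ID column.

    Hypothetical helper: returns {instance_id: {feature_name: value}}.
    """
    merged = {}
    for handle in files:
        for row in csv.DictReader(handle):
            instance_id = row.pop(id_col)
            merged.setdefault(instance_id, {}).update(row)
    return merged


# Made-up sample data standing in for two feature files.
family = io.StringIO("PassengerId,SibSp,Parch\n1,1,0\n2,1,0\n")
vitals = io.StringIO("PassengerId,Sex,Age\n1,male,22\n2,female,38\n")

joined = join_feature_files([family, vitals], id_col="PassengerId")
for instance_id, features in joined.items():
    print(instance_id, features)
```

Each instance ends up with the union of the features found for its ID across the input files; all values stay as strings because the csv module does no type conversion.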

Python API

If you just want to avoid writing a lot of boilerplate learning code, you can use our simple Python API, which also supports pandas DataFrames. The main way you'll want to use the API is through the Learner and Reader classes. For more details on our API, see the documentation.

While our API can be broadly useful, the command-line utilities are intended as the primary way of using SKLL. The API is just a nice side effect of our developing the utilities.
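
For readers who do want to try the API, here is a minimal sketch of the Learner and Reader workflow. It assumes SKLL is installed and that the installed version exposes Reader.for_path, Learner.train, and Learner.predict with the arguments used here; check the documentation for your version. The wrapper function train_and_predict is hypothetical, and the column names are taken from the configuration example above.

```python
def train_and_predict(train_path, test_path, learner_name="RandomForestClassifier"):
    """Hypothetical wrapper: train one learner and predict on a test file."""
    # Imported inside the function so this sketch can be loaded
    # even where SKLL is not installed.
    from skll import Learner
    from skll.data import Reader

    # Assumption: both files are CSVs with the label and ID columns
    # named as in the configuration example above.
    reader_options = {"label_col": "Survived", "id_col": "PassengerId"}
    train_features = Reader.for_path(train_path, **reader_options).read()
    test_features = Reader.for_path(test_path, **reader_options).read()

    learner = Learner(learner_name)
    # Grid search is turned off explicitly to keep the sketch small.
    learner.train(train_features, grid_search=False)
    return learner.predict(test_features)
```

A call such as train_and_predict("train/vitals.csv", "dev/vitals.csv") would return the predictions for the instances in the test file, assuming those paths exist.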

A Note on Pronunciation

SciKit-Learn Laboratory (SKLL) is pronounced "skull": that's where the learning happens.

Requirements

Python 3.6+
scikit-learn
tabulate
BeautifulSoup 4
pandas
Grid Map (only required if you plan to run things in parallel on a DRMAA-compatible cluster)
joblib
ruamel.yaml
seaborn

Talks

Simpler Machine Learning with SKLL 1.0, Dan Blanchard, PyData NYC 2014 (video | slides)
Simpler Machine Learning with SKLL, Dan Blanchard, PyData NYC 2013 (video | slides)

Books

SKLL is featured in Data Science at the Command Line by Jeroen Janssens.

Changelog

See GitHub releases.

Contribute

Thank you for your interest in contributing to SKLL! See CONTRIBUTING.md for instructions on how to get started.
