HealthCatalyst/healthcareai-py

healthcareai

The aim of healthcareai is to streamline machine learning in healthcare. The package has two main goals:

  • Allow one to easily create models based on tabular data, and deploy the best model to push predictions to a database such as MSSQL, MySQL, or SQLite, or to a CSV flat file.
  • Provide tools related to data cleaning, manipulation, and imputation.

Installation

Windows

  • If you haven't already, install 64-bit Python 3.5 via the Anaconda distribution
    • Important: When prompted for the Installation Type, select Just Me (recommended). This makes permissions later in the process much simpler.
  • Open a terminal (CMD or PowerShell)
  • Run conda install pyodbc
  • Upgrade to the latest scipy by removing and reinstalling it with the next two commands (the conda upgrade command can take a very long time)
  • Run conda remove scipy
  • Run conda install scipy
  • Run conda install scikit-learn
  • Install healthcareai using one and only one of these methods (ordered from easiest to hardest).
    1. Recommended: Install the latest release with pip by running pip install healthcareai
    2. If you know what you're doing and instead want the bleeding-edge version direct from our GitHub repo, run pip install https://github.com/HealthCatalyst/healthcareai-py/zipball/master
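Because the steps above call for a 64-bit Python, it can help to confirm what the active interpreter actually is before installing anything. The following is a minimal sketch using only the standard library; the function name `interpreter_summary` is a hypothetical helper, not part of healthcareai.

```python
import struct
import sys


def interpreter_summary():
    """Return the (major.minor version string, pointer width in bits) of this Python."""
    # The size of a C pointer is 8 bytes on a 64-bit build and 4 bytes on a 32-bit build.
    bits = struct.calcsize("P") * 8
    version = "{}.{}".format(sys.version_info.major, sys.version_info.minor)
    return version, bits


if __name__ == "__main__":
    version, bits = interpreter_summary()
    print("Python {} ({}-bit)".format(version, bits))
```

If the reported width is 32, the interpreter on your PATH is probably not the 64-bit Anaconda install described above.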

Why Anaconda?

We recommend using the Anaconda Python distribution when working on Windows. There are a number of reasons:

  • When running Anaconda and installing packages using the conda command, you don't need to worry about dependency hell, particularly because packages aren't compiled on your machine; conda installs pre-compiled binaries.
  • A great example of the pain that conda saves you is the Python package scipy, which, by its maintainers' own admission, "is difficult" to install.

Linux

You may need to install the following dependencies:

  • sudo apt-get install python-tk
  • sudo pip install pyodbc
    • Note: you might run into trouble with the pyodbc dependency. You may first need to run sudo apt-get install unixodbc-dev and then retry sudo pip install pyodbc. Credit: Stack Overflow
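To see which of the dependencies above are still missing, one option is to ask Python whether it can locate each module without importing it. This is a rough sketch using the standard library; `missing_modules` is a hypothetical helper, and the module name `tkinter` assumes Python 3 (the apt package name and the import name differ).

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that Python cannot locate."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names for the two dependencies listed above.
    missing = missing_modules(["tkinter", "pyodbc"])
    if missing:
        print("Missing dependencies: " + ", ".join(missing))
    else:
        print("All listed dependencies were found.")
```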

Once the dependencies are satisfied, run pip install healthcareai or sudo pip install healthcareai

macOS

  • pip install healthcareai or sudo pip install healthcareai

Linux and macOS (via docker)

  • Install docker
  • Clone this repo (look for the green button on the repo main page)
  • cd into the cloned directory
  • Run docker build -t healthcareai .
  • Run the docker instance with docker run -p 8888:8888 healthcareai
  • You should then have a Jupyter notebook available at http://localhost:8888.
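If the notebook does not open in your browser, a quick way to tell whether anything is listening on the published port is to request the URL from Python. The sketch below uses only the standard library; `is_reachable` is a hypothetical helper, and it only reports whether an HTTP server answered, not whether Jupyter itself is healthy.

```python
import urllib.error
import urllib.request


def is_reachable(url, timeout=3.0):
    """Return True if an HTTP server answers at url, False if the connection fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, even though it returned an error status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar problems.
        return False


if __name__ == "__main__":
    print(is_reachable("http://localhost:8888"))
```

A result of False usually means the container is not running or the -p 8888:8888 port mapping was omitted.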

Verify Installation

To verify that healthcareai installed correctly, open a terminal and run python. This opens an interactive python console (also known as a REPL). Then enter this command: from healthcareai import SupervisedModelTrainer and hit enter. If no error is thrown, you are ready to rock.
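The same check can be run as a script, which is handy when you want to capture the exact error text for a bug report. This is a minimal sketch: the import of `SupervisedModelTrainer` from `healthcareai` comes from the instructions above, while `check_import` is a hypothetical helper written for illustration.

```python
import importlib


def check_import(module_name, attribute):
    """Return (True, "") if module_name.attribute can be loaded, else (False, error text)."""
    try:
        module = importlib.import_module(module_name)
        getattr(module, attribute)
    except (ImportError, AttributeError) as error:
        return False, "{}: {}".format(type(error).__name__, error)
    return True, ""


if __name__ == "__main__":
    ok, message = check_import("healthcareai", "SupervisedModelTrainer")
    print("healthcareai is ready" if ok else "Import failed -> " + message)
```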

If you did get an error, or ran into other installation issues, please let us know or, better yet, post on Stack Overflow (with the healthcare-ai tag) so we can help others through this process.

Getting started

  1. Read through the Getting Started section of the healthcareai-py documentation.

  2. Read through the example files to learn how to use the healthcareai-py API.

    • For examples of how to train and evaluate a supervised model, inspect and run either example_regression_1.py or example_classification_1.py using our sample diabetes dataset.
    • For examples of how to use a model to make predictions, inspect and run either example_regression_2.py or example_classification_2.py after running one of the first examples.
    • For examples of more advanced use cases, inspect and run example_advanced.py.
  3. To train and evaluate your own model, modify the queries and parameters in either example_regression_1.py or example_classification_1.py to match your own data.

  4. Decide what type of prediction output you want. See Choosing a Prediction Output Type for details.

  5. Set up your database tables to match the schema of the output type you chose.

  6. Congratulations! After running one of the example files with your own data, you should have a trained model. To use your model to make predictions, modify either example_regression_2.py or example_classification_2.py to use your new model. You can then run it to see the results.
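To make step 5 more concrete, here is a rough sketch of creating a prediction table in SQLite and writing rows to it with Python's built-in sqlite3 module. The table name and columns (`predictions`, `patient_id`, `predicted_probability`) are purely hypothetical and are not the schema healthcareai expects; use the schema described in Choosing a Prediction Output Type for real deployments.

```python
import sqlite3

# Hypothetical schema for illustration only; NOT the schema healthcareai requires.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS predictions (
    patient_id INTEGER PRIMARY KEY,
    predicted_probability REAL NOT NULL
)
"""


def save_predictions(connection, rows):
    """Create the table if needed and upsert (patient_id, probability) rows."""
    # Using the connection as a context manager commits on success
    # and rolls back if an exception is raised.
    with connection:
        connection.execute(CREATE_TABLE_SQL)
        connection.executemany(
            "INSERT OR REPLACE INTO predictions VALUES (?, ?)", rows
        )


def load_predictions(connection):
    """Return all stored rows ordered by patient_id."""
    cursor = connection.execute(
        "SELECT patient_id, predicted_probability FROM predictions ORDER BY patient_id"
    )
    return cursor.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory database for the demo
    save_predictions(conn, [(1, 0.25), (2, 0.75)])
    print(load_predictions(conn))
    conn.close()
```

Creating the table ahead of time and checking that a few hand-written rows round-trip is a cheap way to catch type or naming mismatches before running the prediction examples against it.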

For Issues

  • Double check that the code follows the examples here
  • If you're still seeing an error, create a post on Stack Overflow (with the healthcare-ai tag) that contains
    • Details on your environment (OS, database type, R vs Py)
    • Goals (i.e., what you are trying to accomplish)
    • Crystal clear steps for reproducing the error
  • You can also log a new issue in the GitHub repo by clicking here
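The environment details requested above can be gathered with a few lines of Python and pasted into your post. This is a small sketch assuming Python 3.8 or newer (for `importlib.metadata`); `environment_report` is a hypothetical helper, and it does not detect your database type, which you would still need to state yourself.

```python
import platform
import sys
from importlib import metadata


def environment_report(package_name="healthcareai"):
    """Collect OS, Python version, and the installed version of package_name."""
    try:
        package_version = metadata.version(package_name)
    except metadata.PackageNotFoundError:
        package_version = "not installed"
    return {
        "os": platform.platform(),
        "python": sys.version.split()[0],
        "package": package_name,
        "package_version": package_version,
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print("{}: {}".format(key, value))
```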