ATOM Modeling PipeLine (AMPL) for Drug Discovery

Created by the Accelerating Therapeutics for Opportunities in Medicine (ATOM) Consortium

AMPL is an open-source, modular, extensible software pipeline for building and sharing models to advance in silico drug discovery.

The ATOM Modeling PipeLine (AMPL) extends the functionality of DeepChem and supports an array of machine learning and molecular featurization tools. AMPL is an end-to-end data-driven modeling pipeline to generate machine learning models that can predict key safety and pharmacokinetic-relevant parameters. AMPL has been benchmarked on a large collection of pharmaceutical datasets covering a wide range of parameters.

A preprint of a manuscript describing this project is available through arXiv. Documentation is also available on Read the Docs.

Public release

This release marks the first public availability of the ATOM Modeling PipeLine (AMPL). Installation instructions for setting up and running AMPL are described below. Basic examples of model fitting and prediction are also included. AMPL has been deployed to and tested in multiple computing environments by ATOM Consortium members. Detailed documentation for the majority of the available features is included, but the documentation does not cover all developed features. This is a living software project with active development. Check back for continued updates. Feedback is welcomed and appreciated, and the project is open to contributions!

Useful links

Getting started

Welcome to the ATOM Modeling PipeLine (AMPL) for Drug Discovery! These instructions will explain how to install this pipeline for model fitting and prediction.

Prerequisites

AMPL is a Python 3 package that has been developed and run in a specific conda environment. The following prerequisites are necessary to install AMPL:

conda (Anaconda 3 or Miniconda 3, Python 3)

Install

Clone the git repository

git clone https://github.com/ATOMconsortium/AMPL.git

Create conda environment

cd conda

conda create -y -n atomsci --file conda_package_list.txt

conda activate atomsci

pip install -r pip_requirements.txt

Note: Depending on system performance, creating the environment can take some time.

Install AMPL

Go to the AMPL root directory and install the AMPL package:

conda activate atomsci

cd ..

./build.sh && ./install.sh

After this process, you will have an atomsci conda environment with all dependencies installed. The AMPL package is named atomsci-ampl; the install.sh script installs it into that environment using the environment's pip.
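
As a quick sanity check after installation, a short script along the following lines can report whether the main dependencies are visible from the activated atomsci environment. This is only a sketch: the module names are assumptions taken from the "Built with" list and the atomsci/ddm paths in this README, and the helper check_modules is hypothetical, not part of AMPL.

```python
import importlib.util

# Top-level module names assumed from the "Built with" list and the
# atomsci/ddm paths in this README; adjust them to match your install.
EXPECTED_MODULES = ["atomsci", "deepchem", "rdkit", "mordred"]


def check_modules(names):
    """Return {module_name: bool} saying whether each module can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules(EXPECTED_MODULES).items():
        print(f"{name:10s} {'found' if found else 'MISSING'}")
```

The check only locates the modules without importing them, so it stays fast even for heavy packages. A module reported as MISSING usually means the environment was not activated or an install step did not complete.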

More installation information

More details on installation can be found in Advanced installation.

Example AMPL usage

An example Jupyter notebook is available to get you started: atomsci/ddm/Delaney_Example.ipynb

Tests

AMPL includes a suite of software tests. This section explains how to run one simple, fast test. The Python test fits a random forest model using Mordred descriptors on a set of compounds with solubility data from Delaney et al. A molecular scaffold-based split is used to create the training and test sets. In addition, an external holdout set is used to demonstrate how to make predictions on new compounds.
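
To illustrate the idea behind a scaffold-based split, here is a minimal sketch in plain Python. It is not AMPL's own implementation. It assumes that a scaffold identifier has already been computed for each compound (for example, a Murcko scaffold SMILES from RDKit), and the function name scaffold_split and the example labels are hypothetical. Compounds that share a scaffold are kept together, so the test set contains only scaffolds that the model did not see during training.

```python
from collections import defaultdict


def scaffold_split(scaffolds, test_fraction=0.2):
    """Split compound indices so that no scaffold appears in both sets.

    scaffolds: list of precomputed scaffold identifiers, one per compound.
    Returns (train_indices, test_indices).
    """
    groups = defaultdict(list)
    for index, scaffold in enumerate(scaffolds):
        groups[scaffold].append(index)

    # Visit the largest scaffold groups first (ties broken by first index).
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))

    n_train_target = len(scaffolds) * (1.0 - test_fraction)
    train, test = [], []
    for group in ordered:
        # A group goes to training only if it fits within the target size;
        # otherwise the whole group goes to the test set.
        if len(train) + len(group) <= n_train_target:
            train.extend(group)
        else:
            test.extend(group)
    return train, test


if __name__ == "__main__":
    # Hypothetical scaffold labels for six compounds.
    labels = ["benzene", "benzene", "benzene", "pyridine", "pyridine", "indole"]
    train_idx, test_idx = scaffold_split(labels, test_fraction=0.3)
    print("train:", train_idx)
    print("test: ", test_idx)
```

Because whole groups are assigned at once, the achieved test fraction only approximates the requested one.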

To run the Delaney Python script that curates a dataset, fits a model, and makes predictions, run the following commands:

conda activate atomsci

cd atomsci/ddm/test/integrative/delaney_RF

pytest

Note: This test generally takes a few minutes on a modern system.

The important files for this test are listed below:

test_delaney_RF.py: This script loads and curates the dataset, generates a model pipeline object, and fits a model. The model is reloaded from the filesystem and then used to predict solubilities for a new dataset.
config_delaney_fit_RF.json: Basic parameter file for fitting
config_delaney_predict_RF.json: Basic parameter file for predicting

More example and test information

More details on examples and tests can be found in Advanced testing.

AMPL Features

AMPL enables tasks for modeling and prediction from data ingestion to data analysis and can be broken down into the following stages:

Data ingestion and curation
Featurization
Model training and tuning
Prediction generation
Visualization and analysis

1. Data curation

Generation of RDKit molecular SMILES structures
Processing of qualified or censored data
Curation of activity and property values

2. Featurization

Extended connectivity fingerprints (ECFP)
Graph convolution latent vectors from DeepChem
Chemical descriptors from Mordred package
Descriptors generated by MOE (requires MOE license)

3. Model training and tuning

Test set selection
Cross-validation
Uncertainty quantification
Hyperparameter optimization

4. Supported models

scikit-learn random forest models
XGBoost models
Fully connected neural networks
Graph convolution models

5. Visualization and analysis

Visualization and analysis tools

Details of running specific features are within the parameter (options) documentation. More detailed documentation is in the library documentation.

Running AMPL

AMPL can be run from the command line or by importing into Python scripts and Jupyter notebooks.

Python scripts and Jupyter notebooks

AMPL can be used to fit models and predict molecular activities and properties by importing the appropriate modules. See the examples for more detail on how to fit models and make predictions using AMPL.

Pipeline parameters (options)

AMPL includes many parameters to run various model fitting and prediction tasks.

Pipeline options (parameters) can be set within JSON files containing a parameter list.
The parameter list with detailed explanations of each option can be found at atomsci/ddm/docs/PARAMETERS.md.
Example pipeline JSON files can be found in the tests directory and the example directory.
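
Since the options live in an ordinary JSON file, a parameter file can be generated and inspected with Python's standard json module. The sketch below shows one way to do that. The option names and values in it are placeholders invented for illustration and are not confirmed AMPL options; the real names are listed in atomsci/ddm/docs/PARAMETERS.md. The helpers write_config and read_config are likewise hypothetical.

```python
import json
import os
import tempfile

# Placeholder option names and values, for illustration only; the real
# option names are documented in atomsci/ddm/docs/PARAMETERS.md.
example_options = {
    "example_dataset_path": "my_dataset.csv",
    "example_model_type": "RF",
    "example_featurizer": "descriptors",
}


def write_config(options, path):
    """Write a flat dictionary of options to a JSON parameter file."""
    with open(path, "w") as handle:
        json.dump(options, handle, indent=4, sort_keys=True)


def read_config(path):
    """Load a JSON parameter file back into a dictionary."""
    with open(path) as handle:
        return json.load(handle)


if __name__ == "__main__":
    config_path = os.path.join(tempfile.mkdtemp(), "config_example.json")
    write_config(example_options, config_path)
    print(read_config(config_path))
```

Generating parameter files from a script can be convenient when many runs differ in only one or two options.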

Library documentation

AMPL includes detailed docstrings and comments to explain the modules. Full HTML documentation of the Python library is available with the package at atomsci/ddm/docs/build/html/index.html.

More information on AMPL usage

More information on AMPL usage can be found in Advanced AMPL usage.

Advanced AMPL usage

Command line

AMPL can fit models from the command line with:

python model_pipeline.py --config_file test.json
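
The same command can be launched from a Python driver script, which may be handy when fitting several configurations in a row. The following is a sketch only: it assumes model_pipeline.py is in the current working directory, as the command above does, and the helper names build_fit_command and run_fit are hypothetical.

```python
import subprocess
import sys


def build_fit_command(config_file, script="model_pipeline.py"):
    """Build the argument list equivalent to the command shown above."""
    return [sys.executable, script, "--config_file", config_file]


def run_fit(config_file, script="model_pipeline.py"):
    """Run the fit; raises CalledProcessError if the script exits with an error."""
    return subprocess.run(build_fit_command(config_file, script), check=True)


if __name__ == "__main__":
    # Only prints the command; call run_fit("test.json") to execute it.
    print(" ".join(build_fit_command("test.json")))
```

Using sys.executable keeps the child process in the same Python interpreter, and therefore the same conda environment, as the driver script.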

Hyperparameter optimization

Hyperparameter optimization for AMPL model fitting is available to run on SLURM clusters. Examples of running hyperparameter optimization will be added.

Advanced installation

Deployment

AMPL has been developed and tested on the following Linux systems:

Red Hat Enterprise Linux 7 with SLURM
Ubuntu 16.04

Uninstallation

To remove AMPL from a conda environment use:

conda activate atomsci
pip uninstall atomsci-ampl

To remove the atomsci conda environment entirely from a system use:

conda deactivate
conda remove --name atomsci --all

Advanced testing

Running all tests

To run the full set of tests, use Pytest from the test directory:

conda activate atomsci

cd atomsci/ddm/test

pytest

Running SLURM tests

Several of the tests take some time to fit. These tests can be submitted to a SLURM cluster as a batch job. Example general SLURM submit scripts are included as pytest_slurm.sh.

conda activate atomsci

cd atomsci/ddm/test/integrative/delaney_NN

sbatch pytest_slurm.sh

cd ../../../../..

cd atomsci/ddm/test/integrative/wenzel_NN

sbatch pytest_slurm.sh

Running tests without internet access

AMPL works without internet access. Curation, fitting, and prediction do not require internet access.

However, the public datasets used in tests and examples are not included in the repo due to licensing concerns. These are automatically downloaded when the tests are run.

If a system does not have internet access, the datasets will need to be downloaded before running the tests and examples. From a system with internet access, run the following shell script to download the public datasets. Then, copy the AMPL directory to the offline system.

cd atomsci/ddm/test

bash download_datset.sh

cd ../../..

# Copy AMPL directory to offline system

Development

Installing AMPL for development

To install AMPL for development, use the following commands instead:

conda activate atomsci
./build.sh && ./install_dev.sh

This will create a namespace package in your conda directory that points back to your git working directory, so every time you reimport a module you'll be in sync with your working code. Since site-packages is already in your sys.path, you won't have to fuss with PYTHONPATH or setting sys.path in your notebooks.
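
One way to confirm that a development install resolves to your working copy is to ask Python where it would load a package from. The sketch below is an illustration under assumptions: the helper module_locations is hypothetical, and the standard library package json stands in for atomsci so that the example runs anywhere. In a development environment you would pass "atomsci" instead and expect a path inside your git working directory.

```python
import importlib
import importlib.util


def module_locations(name):
    """Return the directories or file a module would be loaded from."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return []
    # Packages (including namespace packages) expose their search locations.
    if spec.submodule_search_locations is not None:
        return list(spec.submodule_search_locations)
    return [spec.origin]


if __name__ == "__main__":
    # "json" is a stand-in; replace it with "atomsci" in a development install.
    print(module_locations("json"))

    # In a long-running notebook, reload a module to pick up on-disk edits.
    import json
    importlib.reload(json)
```

Note that importlib.reload refreshes only the module passed to it, so submodules you have edited need to be reloaded individually.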

Versioning

Versions are managed through GitHub tags on this repository.

Built with

DeepChem: The basis for the graph convolution models
RDKit: Molecular informatics library
Mordred: Chemical descriptors
Other Python package dependencies

Project information

Authors

The Accelerating Therapeutics for Opportunities in Medicine (ATOM) Consortium

Amanda J. Minnich (1)
Kevin McLoughlin (1)
Margaret Tse (2)
Jason Deng (2)
Andrew Weber (2)
Neha Murad (2)
Benjamin D. Madej (3)
Bharath Ramsundar (4)
Tom Rush (2)
Stacie Calad-Thomson (2)
Jim Brase (1)
Jonathan E. Allen (1)

(1) Lawrence Livermore National Laboratory
(2) GlaxoSmithKline Inc.
(3) Frederick National Laboratory for Cancer Research
(4) Computable

Support

Please contact the AMPL repository owners for bug reports, questions, and comments.

Contributing

Thank you for contributing to AMPL!

Contributions must be submitted through pull requests. Please let the repository owners know about new pull requests.
All new contributions must be made under the MIT license.

Release

AMPL is distributed under the terms of the MIT license. All new contributions must be made under this license.

See MIT license and NOTICE for more details.

LLNL-CODE-795635
CRADA TC02264

Readme date

November 7, 2019
