Overview

Authors: Ian Pendleton, Michael Tynes, Aaron Dharna

Science Contact: jschrier .at. fordham.edu, ian .at. pendletonian.com

Technical Debugging: vshekar .at. haverford.edu, gcattabrig .at. haverford.edu

FAQs

Developer Wiki

Overview

Retrieves experiment files from supported locations and processes them into intermediary JSON files on the user's local machine. The generated JSON files are used to build a 2D CSV of the data in a format compatible with most machine learning software (e.g. scikit-learn). Additional configuration is required to map the existing data structures to headers that resemble the user's desired configuration. These mappings are typically trivial for computer scientists, but may be more challenging for non-domain experts or individuals unfamiliar with manipulating dataframes. The dataset is augmented with chemical calculations such as concentrations, temperatures derived from models of plate temperature, and other empirical observations. In the final steps the dataset is supplemented with chemical features and calculations derived from ChemAxon, RDKit, and local datasets saved to this repository. Additional information on how to control the generation of _feat_ and _calc_ columns can be found in the user documentation here.
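The JSON-to-CSV step described above can be pictured with a minimal sketch. This is not ESCALATE's implementation: the record layout, the key names (run, reagent, score) and the helper functions flatten and records_to_csv are all hypothetical, and the real intermediary JSON files are structured differently. It only illustrates how nested records can become one flat row per experiment.

```python
import csv
import io
import json


def flatten(record, parent_key="", sep="_"):
    """Flatten nested dicts into a single-level dict with joined keys."""
    flat = {}
    for key, value in record.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


def records_to_csv(records):
    """Return CSV text with one row per record and one column per flat key."""
    rows = [flatten(r) for r in records]
    # Union of all keys in first-seen order, so records with missing
    # fields still fit the same header.
    header = list(dict.fromkeys(k for row in rows for k in row))
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=header, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical experiment records, for illustration only.
example = json.loads("""
[
  {"run": "exp_001", "reagent": {"volume_ul": 50, "name": "GBL"}, "score": 4},
  {"run": "exp_002", "reagent": {"volume_ul": 75}, "score": 2}
]
""")
csv_text = records_to_csv(example)
print(csv_text)
```

Missing fields are written as empty cells, which most dataframe libraries read back as missing values.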

The original ESCALATE publication can be found here.

User documents, relating to a complete cycle of escalate, can be found here.

Installation

This build process has been tested on MacOS High Sierra (10.13.5), MacOS Catalina (10.15.3), Ubuntu Bionic Beaver (18.04), and Windows 10 (version 1909, OS Build 18363.418).

Windows Users: Please note that while Windows has been tested, it is not the recommended operating system. The installation is messier, logging is limited, and the file system interaction is more brittle.

Mac and Linux

Initial Setup

Pip Install

Create a new Python 3.8 environment in conda and activate it:

conda create -n escalate_report python=3.8

conda activate escalate_report
Install the latest version of the pip package manager:

conda install pip
Then install the requirements (still in escalate_report):

pip install -r requirements.txt
Then install the conda-dependent pieces:

conda install -c conda-forge rdkit

Conda Install

Execute:

conda update conda

conda env create -f environment.yml

The conda env create command will automatically create an escalate_report environment.

Custom Environment (Package List)

Windows users will likely need to use this.

Pip install the following Python packages prior to use (json ships with the Python standard library and does not need to be installed):

pandas, numpy, gspread, pydrive, cerberus, google-api-python-client==1.7.4, xlrd, xlwt, tqdm, pytest

conda install -c conda-forge rdkit

Please report any failures of the above steps to the repo admins.
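After a custom install, a quick check like the sketch below can show which packages are still missing from the active environment. The helper name missing_modules is hypothetical and not part of ESCALATE. Note that import names can differ from pip names: google-api-python-client is imported as googleapiclient.

```python
import importlib.util

# Import names for the packages in the list above, plus rdkit from conda-forge.
REQUIRED_MODULES = [
    "pandas", "numpy", "gspread", "pydrive", "cerberus",
    "googleapiclient", "xlrd", "xlwt", "tqdm", "pytest", "rdkit",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All listed packages were found.")
```

This only confirms that each module can be located. It does not check versions, such as the 1.7.4 pin on google-api-python-client.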

Authentication Setup

Download the securekey files and move them into the root folder (./, i.e. the current working directory, or ESCALATE_report-master/ if downloaded from git). Do not distribute these keys! (Contact a dev for access.)

Note: Navigate to the wiki for more information on setting up a new lab or generating additional authentication keys.
Ensure that the files 'client_secrets.json' and 'creds.json' are both present in the root folder. The correct folder for these keys is the one that contains the runme.py script.
Stop here if you don't want to use ChemAxon for feature generation. RDKit and the available ESCALATE features will still be generated.
- Note: ESCALATE will throw warnings if ChemAxon features are specified in type_command.csv. These can be ignored if running without ChemAxon is the desired behavior.
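As a convenience, the key file placement can be verified before the first run. The sketch below is not part of ESCALATE, and the function name missing_key_files is hypothetical. It only checks that the two files named above sit next to runme.py, and it does not validate their contents.

```python
from pathlib import Path

# File names taken from the authentication instructions above.
KEY_FILES = ("client_secrets.json", "creds.json")


def missing_key_files(root):
    """Return the key files that are absent from the folder `root`."""
    root = Path(root)
    return [name for name in KEY_FILES if not (root / name).is_file()]


# Check the current working directory, which should contain runme.py.
root = Path(".")
if not (root / "runme.py").is_file():
    print("runme.py not found here; run this from the repository root.")
for name in missing_key_files(root):
    print(f"Missing key file: {name}")
```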

Optional for ChemAxon Support

Download and install ChemAxon JChemSuite and obtain a ChemAxon license (free for academic use).
Follow the installation instructions found on ChemAxon's website. Be sure to note the location of the JChemSuite installation (e.g. ~/opt/chemaxon/jchemsuite/bin on Linux or /Applications/JChemSuite/bin/ on MacOSX).
- There are also docs on installing the license using a graphical user interface (GUI) here: https://docs.chemaxon.com/display/docs/Licenses.html
You will need to specify the location of your ChemAxon installation at the bottom of ./expworkup/devconfig.py.

Running The Code

Currently supported google_drive_target_name (user defined folder names):

MIT Data: MIT_PVLab
HC and LBL Data: 4-Data-WF3_Iodide, 4-Data-WF3_Alloying, 4-Data-Bromides, 4-Data-Iodides
Development: dev

Basic Overview

A more detailed instruction manual, including videos showing how to operate the code, can be found in the ESCALATE user manual.

Definitions

<my_local_folder>: the name of the folder where files should be created. This folder will be created automatically by ESCALATE_report if it does not exist. The specified name will also be used for the final exported CSV (i.e. if <my_local_folder> is perovskitedata, perovskitedata.csv will be generated).

<google_drive_target_name>: one or more of the available datasets. See the examples below.

You can always get runtime information by executing:

python runme.py --help
To execute a normal run with ChemAxon, RDKit, and ESCALATE calcs (see the installation instructions above for more details):

python runme.py <my_local_folder> -d <google_drive_target_name>
To improve the clarity of column headers, specify them in the dataset_rename.json file. All columns can be viewed in the initial run by executing:

python runme.py <my_local_folder> -d <google_drive_target_name> --raw 1
Columns that do not conform to the _{category}_ naming scheme (e.g., _feat_, _rxn_) will be omitted unless --raw 1 is enabled!
- A list of the columns not conforming to the naming scheme will be exported to './<my_local_folder>/logging/UNNAMED_REPORT_COLUMNS.txt'.
- The user can specify an appropriate name in dataset_rename.json.
- To see all columns with names taken directly from the data source, use: --raw 1
- Conflicting namespaces will be purged!
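The column filtering described above can be pictured with the following sketch. It rests on several assumptions that are not confirmed by this README: that the category tag appears at the start of the header, that dataset_rename.json behaves like a flat old-name to new-name mapping, and that renaming happens before filtering. The helper names and the example headers are hypothetical.

```python
import re

# Assumed form of the naming scheme: a leading tag such as _feat_ or _rxn_.
CATEGORY_PATTERN = re.compile(r"^_[A-Za-z0-9]+_")


def apply_renames(columns, rename_map):
    """Rename columns that appear in rename_map and leave the rest as-is."""
    return [rename_map.get(col, col) for col in columns]


def split_columns(columns):
    """Split columns into (kept, unnamed) according to the category tag."""
    kept = [col for col in columns if CATEGORY_PATTERN.match(col)]
    unnamed = [col for col in columns if not CATEGORY_PATTERN.match(col)]
    return kept, unnamed


# Hypothetical raw headers and a hypothetical rename mapping.
raw_columns = ["_rxn_temperature_c", "_feat_molweight", "plate_notes", "operator"]
rename_map = {"operator": "_rxn_operator"}

kept, unnamed = split_columns(apply_renames(raw_columns, rename_map))
print("kept:", kept)
print("unnamed:", unnamed)
```

In this picture, the unnamed list corresponds to what would be reported in UNNAMED_REPORT_COLUMNS.txt, and adding an entry to the rename mapping moves a column into the kept set.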
Significant flexibility is available for specifying _feat_ columns (via type_command.csv) and _calc_ columns (via ./utils/calc_command.py). For examples, discussion, and limitations of these specifications, please see the user docs.
- _calc_ generation can be skipped by using the --disablecalcs True flag on the CLI.
- To speed up calc and feature development, the first portion of the code can be skipped by:
  1. Running the code with --offline 1
  2. After the first iteration completes, running future instances with --offline 2
A file named <my_local_folder>.csv will contain the 2D CSV of the dataset using the configured headers from the data or the mapping developed for the lab. The data/ folder will contain the generated JSONs.
Intermediate dataframes can be exported in bulk by specifying:

python runme.py <my_local_folder> -d <google_drive_target_name> --debug 1

To add additional target directories please see the how-to guide here. If you would like to add these to the existing datasets, please issue a git merge request after you add the necessary information.

Report to Versioned Data to ESCALATion

More detailed instructions can be found in the ESCALATE user manual.

If you are using Windows 10, please follow these instructions on what you will need to set up your environment. Consider using Ubuntu or WSL instead!

1. Ensure that the versioned data repo and ESCALATion are installed.
2. Create an issue on the versioned data repo with the new crank-number.
3. Run: python runme.py <my_local_folder> -d <google_drive_target_name> -v <crank-number>
4. This will generate files for upload to the versioned data repo with the names:
   - <crank-number>.<dataset-name>.csv
   - <crank-number>.<dataset-name>.index.csv
5. Move these files to /pathto/versioned-dataset/data/perovskite/<my_local_folder>
6. Follow the Readme.md instructions for versioned-datasets.
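The file naming and destination used in the steps above can be sketched as below. The helper names are hypothetical and not part of ESCALATE. The example also assumes that <dataset-name> matches <my_local_folder>, which this README does not state. The /pathto/ prefix is the placeholder from the instructions and must be replaced with the real location of the versioned data repo.

```python
from pathlib import PurePosixPath


def versioned_filenames(crank_number, dataset_name):
    """Return the two file names expected for a given crank number."""
    stem = f"{crank_number}.{dataset_name}"
    return [f"{stem}.csv", f"{stem}.index.csv"]


def destination_paths(crank_number, dataset_name, local_folder,
                      repo_data_root="/pathto/versioned-dataset/data/perovskite"):
    """Return where each generated file should be moved (paths only)."""
    target_dir = PurePosixPath(repo_data_root) / local_folder
    return [str(target_dir / name)
            for name in versioned_filenames(crank_number, dataset_name)]


# Example values borrowed from the usage examples in this README.
for path in destination_paths("0111", "perovskitedata", "perovskitedata"):
    print(path)
```

The sketch only builds path strings. It does not move files or touch the file system.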

Include a `state-set` file with Crank

1. Obtain or generate a state-set.
2. Run: python runme.py <my_local_folder> -d <google_drive_target_name> -v <crank-number> -s <state-set_file_name.csv>
3. Follow steps 5-6 above.

Example Usage

python runme.py 4-Data-Iodides -d 4-Data-Iodides
python runme.py 4-Data-Iodides -d 4-Data-Iodides 4-Data-WF3_Iodide 4-Data-WF3_Alloying
python runme.py dev -d dev --debug 1 --raw 1 --offline 1
python runme.py perovskitedata -d 4-Data-Iodides --verdata 0111 --state example.csv
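When these runs are scripted, the command lines above can be assembled programmatically. The sketch below only uses flags that appear in this README (-d, --raw, --debug, --offline, --verdata, --state). The function build_command is hypothetical, and it builds the argument list without executing anything.

```python
import shlex


def build_command(local_folder, targets, raw=None, debug=None,
                  offline=None, verdata=None, state=None):
    """Build the runme.py argument list for one or more dataset targets."""
    cmd = ["python", "runme.py", local_folder, "-d", *targets]
    optional = (
        ("--raw", raw),
        ("--debug", debug),
        ("--offline", offline),
        ("--verdata", verdata),
        ("--state", state),
    )
    for flag, value in optional:
        # Only include flags that were explicitly requested.
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


dev_cmd = build_command("dev", ["dev"], raw=1, debug=1, offline=1)
print(shlex.join(dev_cmd))

multi_cmd = build_command(
    "4-Data-Iodides",
    ["4-Data-Iodides", "4-Data-WF3_Iodide", "4-Data-WF3_Alloying"],
)
print(shlex.join(multi_cmd))
```

The resulting list can be passed to subprocess.run from the folder that contains runme.py.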

FAQs, Troubleshooting, and Tutorials

FAQs
Troubleshooting help: please send the log file, any terminal output, and a brief explanation to ipendlet .at. haverford.edu.
Tutorials
1. Adding a new target for data workup
2. Adding a new target for experiment generation

darkreactions/ESCALATE_report