TAPIR (Thermomechanical Advanced Polymer Informatics & Resource)

Requirements

Conda is installed

Installation

git clone https://github.com/peterpaohuang/tapir.git
conda create -c rdkit -n tapir_env rdkit
conda activate tapir_env
Download polymer_db.csv
Move polymer_db.csv into the tapir directory
Run python setup.py while inside the tapir_env conda environment

Initialize

from depablo_box import PDBML, model

dx = PDBML()

Understand the database

Access database as pandas dataframe

df = dx.df

List all polymers and corresponding smiles

# list both polymer names and smiles
df[["polymer_name", "smiles"]]

# list only polymer names
df["polymer_name"]

# list only smiles
df["smiles"]

# list only inchi keys
df["inchi"]

# retrieve polymer row by polymer_name
df.loc[df["polymer_name"] == polymer_name]

# retrieve polymer row by smiles
df.loc[df["smiles"] == smiles]

# retrieve polymer row by inchi key
df.loc[df["inchi"] == inchi_key]

List Descriptors

Supported Chemical Descriptors

dx.chemical_descriptors

ExactMolWt
FpDensityMorgan1
FpDensityMorgan2
FpDensityMorgan3
HeavyAtomMolWt
MolWt
etc

Supported Thermo-Physical Descriptors

dx.experimental_descriptors

Molar Volume Vm
Density ρ
Solubility Parameter δ
Molar Cohesive Energy Ecoh
Glass Transition Temperature Tg
Molar Heat Capacity Cp
Entanglement Molecular Weight Me
Index of Refraction n
Coefficient of Thermal Expansion α
Molecular Weight of Repeat unit
Van-der-Waals Volume VvW

See distribution of NaN values in database for Thermo-Physical Descriptors

dx.na_distribution()
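
To illustrate what a per-column count of missing values looks like, here is a minimal standard-library sketch. It does not call depablo_box: the helper count_missing and the toy rows are hypothetical, and the real dx.na_distribution() may present its result differently (for example as a plot). With a pandas dataframe, the same counts are available from df.isna().sum().

```python
import math


def count_missing(rows, columns):
    """Count None/NaN entries per column in a list of dict rows."""
    counts = {col: 0 for col in columns}
    for row in rows:
        for col in columns:
            value = row.get(col)
            if value is None or (isinstance(value, float) and math.isnan(value)):
                counts[col] += 1
    return counts


# Toy rows; the numbers are made up for illustration only.
toy_rows = [
    {"molar_heat_capacity": 120.0, "glass_transition_temperature": 373.0},
    {"molar_heat_capacity": float("nan"), "glass_transition_temperature": 250.0},
    {"molar_heat_capacity": 95.5, "glass_transition_temperature": None},
    {"molar_heat_capacity": float("nan"), "glass_transition_temperature": 300.0},
]

missing = count_missing(
    toy_rows, ["molar_heat_capacity", "glass_transition_temperature"]
)
print(missing)
```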

List Machine Learning Methods

dx.ml_methods

List Conversion Formats Directly from SMILES

dx.conversion_formats

How to use

Note: currently, depablo_box can only calculate chemical descriptors. Experimental descriptors already exist in the database (dx.df).

Get Chemical Descriptors

descriptor_list = ["ExactMolWt", "HeavyAtomMolWt"]
polymer_identifier = "C=CC(=O)NC(C)(C)C" # can also be the polymer_name
descriptor_df = dx.get_descriptors(polymer_identifier, descriptor_list)

Generate Input Files for Quantum Chemistry Codes

Supported Conversion Formats

Protein Data Bank
Gaussian 98/03 Input

polymer_identifier = 'CC(=O)OC=C' # can also be the polymer_name
conversion_format = 'Gaussian 98/03 Input'
outpath = '/file/path/your_polymer.xyz'
dx.create_input_file(polymer_identifier, conversion_format, outpath)

Add Chemical Descriptors to dataframe

dx.add_descriptors(descriptor_list)

Plot Properties as scatterplot

dx.plot_properties(property_x="glass_transition_temperature", property_y="ExactMolWt")

Plot Many Properties as Pairplot

dx.plot_many(property_list)

Get Correlation Between Two Properties

dx.property_correlation("molar_heat_capacity", "HeavyAtomMolWt")
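
This README does not say which correlation measure dx.property_correlation returns. As a point of reference, the sketch below computes a Pearson correlation coefficient with the standard library, skipping pairs where either value is missing. The function name pearson and the toy values are hypothetical, and the choice of Pearson is an assumption, not a description of the library's internals.

```python
import math


def pearson(xs, ys):
    """Pearson correlation over pairs where both values are present."""
    pairs = [
        (x, y)
        for x, y in zip(xs, ys)
        if x is not None and y is not None
        and not math.isnan(x) and not math.isnan(y)
    ]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


# Toy columns; the third pair is dropped because one value is NaN.
heat_capacity = [100.0, 150.0, float("nan"), 200.0]
mol_weight = [50.0, 75.0, 90.0, 100.0]

r = pearson(heat_capacity, mol_weight)
print(r)
```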

Plot Correlation Heatmap of Many Properties

dx.correlation_map(property_list)

Export Dataframe as CSV file

dx.export_csv(outpath)
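
After exporting, it can be worth reading the file back to confirm the columns you expect are present. The sketch below uses only the standard csv module. It writes a small stand-in file first so that it runs on its own; in practice you would point outpath at the file produced by dx.export_csv. The assumption that the exported file has a header row, and the placeholder row itself, are illustrative.

```python
import csv
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    outpath = os.path.join(tmp_dir, "polymers.csv")

    # Stand-in for the exported file (hypothetical content).
    with open(outpath, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["polymer_name", "smiles"])
        writer.writerow(["example polymer", "CC(=O)OC=C"])

    # Read it back and inspect the header and rows.
    with open(outpath, newline="") as f:
        reader = csv.DictReader(f)
        header = reader.fieldnames
        rows = list(reader)

print(header)
print(len(rows))
```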

Initialize Model Training

# input_properties must have already been added to PDBML().df
input_properties = ["molar_heat_capacity", "ExactMolWt", "HeavyAtomMolWt"]
output_property = "solubility_parameter"
na_strategy = "remove"
ml = model(dx.df, input_properties, output_property, na_strategy=na_strategy)
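
The effect of na_strategy = "remove" is not spelled out here. A reasonable reading is that rows with a missing value in any input or output column are dropped before training. The standard-library sketch below shows that idea on toy data; is_missing, drop_incomplete, and the numbers are hypothetical, and the library's actual behaviour may differ.

```python
import math


def is_missing(value):
    """True for None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_incomplete(rows, columns):
    """Keep only rows that have a value in every listed column."""
    return [
        row for row in rows
        if not any(is_missing(row.get(col)) for col in columns)
    ]


input_properties = ["molar_heat_capacity", "ExactMolWt"]
output_property = "solubility_parameter"

# Toy rows; values are made up for illustration only.
toy_rows = [
    {"molar_heat_capacity": 120.0, "ExactMolWt": 86.0, "solubility_parameter": 19.0},
    {"molar_heat_capacity": float("nan"), "ExactMolWt": 100.1, "solubility_parameter": 18.0},
    {"molar_heat_capacity": 95.5, "ExactMolWt": 127.1, "solubility_parameter": None},
]

complete_rows = drop_incomplete(toy_rows, input_properties + [output_property])
print(len(complete_rows))
```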

Train Model

Supported Model Types

Support Vector Regression
Linear Regression
Ridge Regression
Lasso Regression
Gaussian Process Regression

model_type = "Support Vector Regression"
ml.train(model_type)

View Trained Model R^2 Score

ml.r_2
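
For reference, the coefficient of determination is R^2 = 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the true values from their mean. The sketch below computes it by hand with toy numbers. Whether ml.r_2 is evaluated on training data or on a held-out split is not stated in this README; r_squared is a hypothetical helper, not part of depablo_box.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot


# Toy values: only the last prediction is off, by 1.0.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]

score = r_squared(y_true, y_pred)
print(score)
```

A score of 1.0 means the predictions match exactly; a score near 0 means the model does no better than predicting the mean.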

Predict on new data

# one row per sample; values in the same order as input_properties
new_data = [[10.5, 29.0, 102.1]]
results = ml.predict(new_data)

Plot Feature Importances

Note: model type Gaussian Process Regression does not support feature importances

ml.feature_importance()

Export Trained Model as Pickle File

ml.export_fitted_model(outpath)

Load Pickle File as Trained Model

import pickle
with open(outpath, "rb") as f:
  ml = pickle.load(f)
results = ml.predict(new_data)
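
The export and load steps together form a pickle round trip. The sketch below shows the same pattern with the standard library, using a plain dictionary as a stand-in for a fitted model so that it runs without depablo_box. The stand-in object and file name are hypothetical. Two general points about pickle apply: only unpickle files from sources you trust, and the code that defines the pickled object must be importable when loading.

```python
import os
import pickle
import tempfile

# Stand-in for a fitted model (hypothetical content).
fitted_stand_in = {
    "model_type": "Linear Regression",
    "coefficients": [0.5, -1.25, 2.0],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    outpath = os.path.join(tmp_dir, "fitted_model.pkl")

    # Write the object to disk in binary mode.
    with open(outpath, "wb") as f:
        pickle.dump(fitted_stand_in, f)

    # Read it back; the result is an equal but separate object.
    with open(outpath, "rb") as f:
        restored = pickle.load(f)

print(restored == fitted_stand_in)
```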

Scrape CROW Polymer DB for experimental thermo-physical properties

from depablo_box import polymer_scraper

Initialize scraper

scraper = polymer_scraper()

Start Scraping

scraper.start()

Once Finished, Store Scraped Data

outpath = "/file/path/to/store/FILE.csv"
scraper.store_data(outpath)
