Skip to content

Data enhancement scheme for addressing the dilemma of small data on machine learning for polymer.

Notifications You must be signed in to change notification settings

hkqiu/DataAugmentation4SmallData

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

87 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Heat-Resistant Polymer Discovery by Utilizing Interpretable Graph Neural Network with Small Data

Workflow: Data Augmentation Graph Neural Network (DAGNN) for Polymer Property Prediction
and Inverse Design


This is the repo. of this paper "Heat-Resistant Polymer Discovery by Utilizing Interpretable Graph Neural Network with Small Data" on Macromolecules. This work features a chemistry-based solution that cleverly addresses the long-standing challenge of small datasets in polymer machine learning modeling, which has achieved SOAT model performance.

All data and code about model training and polyScreen software are in this repository.

ablation: The ablation results of random split.

gene: The gene fragments of each datasets.

model: The data used to train the models mentioned in this work can be found at /model, and the model for PI-0 can also be found with all ensemble GNN_enhanced models for further polymer design at/model/AllModels.

notebooks: The experimental codes for training, inference, visualization and data analysis in .ipynb format.

polyScreen2: The related code and the toolkit – polyScreen2.

src: The original polyimides candidates.

polyScreen2--Streamlined Property Prediction and Inverse Design with Conversational AI

Hard requirements

These packages must be available to use polyScreen2:

  -	python=3.9
  -	numpy=1.24.3=pypi_0
  -	pandas=2.0.1=pypi_0
  -	deepchem=2.6.1=pypi_0
  -	tensorflow=2.10.0=pypi_0
  -	tensorflow-estimator=2.10.0=pypi_0
  -	scikit-learn=1.2.2=pypi_0
  -	joblib=1.2.0=pypi_0

Tutorials

An example of property prediction by calling the model is given here: (Don't want to code? Just skip this part and read the following part.)

import glob,os
import pandas as pd
import deepchem as dc
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import Draw, PyMol, rdFMCS
from rdkit.Chem.Draw import IPythonConsole
from rdkit import rdBase
from deepchem import metrics
from IPython.display import Image, display
from rdkit.Chem.Draw import SimilarityMaps
import tensorflow as tf

Val_DATASET_FILE = 'Path/2/your/file.csv'			# .csv
Restore_MODEL_DIR = ' Path/2/model'
#####Featurizerization#######
featurizer = dc.feat.ConvMolFeaturizer()
loader = dc.data.CSVLoader(tasks=[], feature_field="Smiles", featurizer=featurizer)
testset = loader.create_dataset(Val_DATASET_FILE, shard_size=10000)
###########Model##############
model = dc.models.GraphConvModel(1, mode="regression", model_dir=Restore_MODEL_DIR)
model.restore()
############Predict#############
val_pred = model.predict(testset)

It is convenient to make predictions and conduct structure design directly using polyScreen2 by simply downloading this repository and:

cd DataAugmentation4SmallData/polyScreen2
python polyScreen2.py

The GUI of polyScreen2 is now in your display as follows.


Here are two demos of polyScreen2 for demonstration purposes, and one can find other .gif demos at /polyScreen2/demo of our repo. to help for the usage of other functions. Property Prediction PolyChat

By the way, our hugging face space is coming, where you can use polyScreen2 more easily.

Acknowledgments

We would like to express our gratitude to the authors and research team of the article "Machine learning enables interpretable discovery of innovative polymers for gas separation membranes" for their inspiring insights and providing valuable data for our virtual design of polyimides.

Q&A

Any issue on this article or the usage of polyScreen2, please email hkqiu@ciac.ac.cn.

About

Data enhancement scheme for addressing the dilemma of small data on machine learning for polymer.

Topics

Resources

Stars

Watchers

Forks

Packages

No packages published

Languages