NISTADS is a python application developed to extract adsorption isotherms data from the NIST/ARPA-E Database of Novel and Emerging Adsorbent Materials (https://adsorption.nist.gov/index.php#home) through their dedicated API. The user can either collect data regarding adsorbent materials and adsorbate species or fetch adsorption isotherm experimental data. Experiments are identified by name upon building the entire database experiments index from the API endpoint. Furthermore, NISTADS exploits the PUG REST API (see https://pubchempy.readthedocs.io/en/latest/ for more information) to enrich the adsorbate species dataset with molecular properties (such as molecular weight, canonical smiles, complexity, heavy atoms, etc.). Eventually, the adsorption isotherm dataset is split into two datasets, one containing data on single component adsorption and the other including experiments with binary mixtures.
The collected data is saved locally in 4 different .csv files, located in the NISTADS/data
folder. Adsorption isotherm datasets are saved in NISTADS/data/experiments
, while the adsorbents and adsorbates datasets are saved into NISTADS/data/materials
. The former will include the experiments datasets for both single component and binary mixture measurements, while the latter will host datasets on guest and host species.
The installation process is designed for simplicity, using .bat scripts to automatically create a virtual environment with all necessary dependencies. Please ensure that Anaconda or Miniconda is installed on your system before proceeding.
- The
setup/create_environment.bat
file, located in the scripts folder, offers a convenient one-click solution to set up your virtual environment.
The project is organized into subfolders, each dedicated to specific tasks.
data: run NISTADS/data/compose_experiments_dataset.py
or NISTADS/data/compose_materials_dataset.py
to respectively fetch data for adsorption experiments or for the guest/host entries. The data collection operation may take long time due to the large number of queries to perform, and it heavily depends on your internet connection performance (more than 30k experiments are available as of now). You can select a fraction of data that you wish to extract (guest, host, or experiments data), and you can also split the total number of adsorption isotherm experiments in chunks, so that each chunk will be collected and saved as file iteratively. Use the notebook NISTADS/data/dataset_info.ipynb
to perform explorative data analysis on the collected datasets.
experimental: contains experimental features to integrate further information into the dataset. Description of chemicals (both adsorbate species and adsorbent materials) can be generated using the pretrained GPT2 model using NISTADS/experimental/gpt_enhancement.py
. Due to the model limitations, description may not be very accurate and lack context for more complex molecules and materials.
The NISTADS/config/configurations.py
file allows to change the script configuration.
Category | Setting | Description |
---|---|---|
Data settings | GUEST_FRACTION | fraction of adsorbate species data to be fetched |
HOST_FRACTION | fraction of adsorbent materials data to be fetched | |
EXP_FRACTION | fraction of adsorption isotherm data to be fetched | |
CHUNK_SIZE | fraction of data chunks to extract and save | |
Series settings | MIN_POINTS | Minimum number of measurements per experiment |
MAX_PRESSURE | Max pressure to consider (in Pa) | |
MAX_UPTAKE | Max uptake to consider (in mol/g) |
This project is licensed under the terms of the MIT license. See the LICENSE file for details.