Skip to content
Geun Ho Gu edited this page Mar 12, 2018 · 19 revisions

Welcome to the VlachosGroupAdditivity wiki!

This package can load a group additivity scheme of your interest, and estimate molecules' thermodynamic properties. This package uses SMILES string for the description of molecules.

A simple Benson's group additivity example is shown here:

In:
from VGA.GroupAdd.Library import GroupLibrary
import VGA.ThermoChem
lib = GroupLibrary.Load('BensonGA')
descriptors = lib.GetDescriptors('C1CO1')
print descriptors
thermochem = lib.Estimate(descriptors,'thermochem')
print thermochem.eval_ND_H(298.15)

Out:
defaultdict(<type 'int'>, {'O(C)2': 1, 'C(C)(H)2(O)': 2, 'C1CO1': 1})
-19.9132141547

Using the package

The package usage for user is streamlined. The work flow follows:

  • Loading the scheme
  • Getting groups for a SMILES string provided
  • Estimating thermodynamic properties

Importing package requires loading the library module and VGA.ThermoChem:

from VGA.GroupAdd.Library import GroupLibrary
import VGA.ThermoChem

importing VGA.ThermoChem append a ThermoChem thermochemistry computation module to group additivity library module which is used to estimate thermodynamic properties. The code won't work if VGA.ThermoChem is not loaded.

Next, we load up the scheme:

lib = GroupLibrary.Load('BensonGA')

Group additivity schemes are stored in ./VlachosGroupAdditivity/data folder. When you load 'BensonGA', it reads scheme.yaml and library.yaml in ./VlachosGroupAdditivity/data/BensonGA folder. See below if you are interested in creating your own scheme. Below are the list of the scheme names:

  • "BensonGA" S. W. Benson, J. H. Buss. "Additivity Rules for the Estimation of Molecular Properties. Thermodynamic Properties". Journal of Chemical Education, 1965, 42, 502-518

  • "SalciccioliGA2012" M. Salciccioli, S. M. Edie, D. G. Vlachos. "Adsorption of Acid, Ester, and Ether Functional Groups on Pt: Fast Prediction of Thermochemical Properties of Adsorbed Oxygenates via DFT-Based Group Additivity Methods" Journal of Physical Chemistry C, 2012, 116 (2), 1873-1886

  • "GuSolventGA2017Aq" G. H. Gu, B. Schweitzer, C. Michel, S. Stephan, S. Philippe, D. G. Vlachos "Group Additivity for Aqueous Phase Thermochemical Properties of Alcohols on Pt(111)". Journal of Physical Chemistry C, submitted. (Aqueous Phase. Cavitation energy accounted )

  • "GuSolventGA2017Vac" G. H. Gu, B. Schweitzer, C. Michel, S. Stephan, S. Philippe, D. G. Vlachos "Group Additivity for Aqueous Phase Thermochemical Properties of Alcohols on Pt(111)". Journal of Physical Chemistry C, submitted. (Vacuum Phase)

Next, we get descriptors and estimate thermodyamic properties.

descriptors = lib.GetDescriptors('C1CO1')
thermochem = lib.Estimate(descriptors,'thermochem')
print thermochem.eval_ND_H(298.15)

GetDescriptors command breaks down a given molecular graph into groups and corrections. Then the descriptor dictionary can be used to estimate thermochemical properties. When VGA.ThermoChem is imported, the ThemoChem thermodynamic property estimation object is appended to the group additivity library module. By specifying 'thermochem' in the second input, you are asking the software to use ThermoChem object for property estimation. Then, the created objective can be used to estimate thermochemical properties at given temperature. Thermochem module also has:

  • eval_ND_Cp
  • eval_ND_S

definitions to estimate heat capacity and entropy.

If the scheme comes with uncertainty quantification information, following is available as well:

  • eval_ND_Cp_SE(temperature)
  • eval_ND_S_SE(temperature)
  • eval_ND_S_SE(temperature)

Important notes for using surface group additivity scheme

In RDKit, transition metal must be written in square bracket, i.e. [Pt].

Salciccioli et al 2012 GA scheme, and Gu et al 2017 GA scheme both assigns the number of connected surface atom as the number of valence electrons. For example, consider an adsorbate below:

Here, the molecule is CH2CHCHCH2OH, black, white, red, and blue spheres represent carbon, hydrogen, oxygen and platinum atoms, respectively. Three carbons each have one "hanging" valence electron interacting with surface. As the diagram shows, the two electrons from the two carbons at the center are interacting with a single Pt atom together through a pi-mode interaction. However, these GA schemes do not require these binding mode information, and SMILES is written so that these two carbons are bonded to two separate Pt atoms. i.e. C([Pt])C([Pt])C([Pt])CO is used instead of C([Pt])C([Pt]1)C1CO. Here is an another example:

Here, the molecule is CHOHCH2CCOH. In this example, the carbon at the right side has two valence electron, but bonded to a single Pt atom, seemingly making a double bond. The same rule applies here and the connectivity to surface atom will be described as two Pt atom with single bond. i.e. C(O)([Pt])CC([Pt])([Pt])C([Pt])([Pt])O is used instead of C(O)([Pt])CC([Pt])([Pt])C(=[Pt])O

RDkit, for some reason, cannot read SMILES of hydrogen bonded to the surface (e.g. [HPt], [H][Pt] do not work). I recommend simply using the hydrogen thermodynamic properties provided in our publications.

The C=O carbonyl group of a molecule can either bond to the surface with single bond each removing the double between C and O, or stay as C=O non-interacting with the surface. The surface group additivity can differentiate when an adsorbate has a former or latter case. E.g. C=O for the former case, or C([Pt])O([Pt]) for the latter case.

In the Gu et al. 2017 GA scheme, methane and CO2 in the gas phase is differentiated from chemisorbed methane and CO2. There are energetic difference due to dispersion interaction. To compute thermodynamic property for chemisorbed species, use SMILES of C~[Pt] for methane and C(=O)(=O)~[Pt] for CO2. ~ represents a partial weak bond.

Scheme

This section explains the structure of the schemes stored in ./VlachosGroupAdditivity/data folder as well as code flow of the loading. When the VGA.GroupAdd.Library.Load is called, it first see if there is a matching folder name in the scheme folder, and, if not found, treat it as a path to a folder that has scheme information. Then, it reads library.yaml and scheme.yaml file that is in the folder.

library.yaml is a data file that contains group thermochemistry information written in yaml format, and often has "include" list variable written. The VGA.GroupAdd.Library.Load use yaml_io module which detects this include list, then go through other yaml files specified in "include" list and build scheme objective.

Library.yaml has two components:

  • scheme: this info shows where the corresponding scheme is located
  • contents: This has a list of groups and thermochemistry values. See the existing scheme for the format.

Scheme.yaml contains 4 sub data:

  • patterns: This list variable contains the RING language based pattern description that assign atoms based on how the scheme assigns atoms.
  • pretreatment_rules: This contains a reaction query that is, again, based on the RING language, that the code will apply to the molecular graph before finding groups. Currently, this functionality is not applied yet.
  • remaps: This list contains remapping information of groups. For example, the original group additivity, treat C(C[d])(H)3 group the same as C(C)(H)3.
  • other_descriptors: This list contains any other corrections that doesn't follow the group scheme.

Group assigning mechanism

Given an example group A(B)b(C)c(D)d..., A is called central sub-group, and B C and D in parentheses are called peripheral sub-group in our work. b,c,d are the frequency of the sub-group. The algorithm goes through all the patterns listed in the scheme.yaml and assign the central and peripheral sub-group to each atom in the molecule. For example, the sub-group 'C' is a sp3 hybridized carbon, thus the pattern listed in scheme.yaml has sp3 connectivity info, and the code looks for any carbon with sp3 hybridization.

Next, the code go through each atoms again, and use assigned sub-groups to make up a group. See an ethane example below: Here red circle shows the picked central sub-group (CSG), and other red fonts in connected atom represents the picked peripheral sub-group (psg) to make up a group.

The code is smart enough that, if an atom in a molecule matches several pattern, it will give you error and which atom had multiple matching pattern. It also spits error, if an atom does not match any pattern. Thus, giving you an ability to debug your group scheme.