moleculetda

A framework to use topological data analysis to extract topological information from a structure (e.g., molecule or crystal), which can then be used in downstream tasks.

Installation

The library can be installed as follows:

pip install moleculetda

Examples

As an example, we will start with a metal-organic framework (MOF) and construct topological summaries of all the channels and voids in the structure.

Persistence diagrams can be generated from an example structure file such as a .cif file.

from moleculetda.structure_to_vectorization import structure_to_pd
from moleculetda.plotting import plot_pds
import matplotlib.pyplot as plt
import numpy as np

filename = 'files/mof_structs/str_m4_o1_o1_acs_sym.10.cif'

# return a dict containing persistence diagrams for different dimensions (1d - channels, 2d - voids)
arr_dgms = structure_to_pd(filename, supercell_size=20)

# plot out the 1d and 2d diagrams
dgm_1d = arr_dgms['dim1']
dgm_2d = arr_dgms['dim2']

plot_pds(dgm_1d, dgm_2d)
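To make the contents of a persistence diagram more concrete, here is a minimal sketch in plain Python. It assumes a diagram can be viewed as a collection of (birth, death) pairs. The sample values and the helper name `most_persistent` are hypothetical and not part of moleculetda, and the objects returned by `structure_to_pd` may store these values differently.

```python
# Hypothetical (birth, death) pairs standing in for a 1d persistence diagram.
# Real diagrams come from structure_to_pd; these numbers are made up.
sample_dgm = [(0.8, 1.1), (1.0, 3.4), (1.5, 1.6), (2.0, 4.5)]


def most_persistent(dgm, k=2):
    """Return the k points with the largest persistence (death - birth)."""
    return sorted(dgm, key=lambda p: p[1] - p[0], reverse=True)[:k]


top = most_persistent(sample_dgm, k=2)
for birth, death in top:
    print(f"birth={birth:.2f} death={death:.2f} persistence={death - birth:.2f}")
```

Points with a large persistence correspond to features that survive over a wide range of length scales, which is why they are often the first ones to inspect.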


Starting from arr_dgms (dict storing the persistence diagrams), vectorized representations can be generated. Axes units are the same as the units of the original structure file:

# initialize parameters for the "image" representation:
# spread: Gaussian spread of the kernel, pixels: size of representation (n, n),
# weighting_type: how to weigh the persistence diagram points
# Optional: specs can be provided to give bounds on the representation
from moleculetda.vectorize_pds import PersImage, pd_vectorization
from moleculetda.plotting import plot_pers_images

pim = PersImage(spread=0.15, pixels=[50, 50], weighting_type='identity')

# get both the 1d and 2d representations
images = []
for dim in [1, 2]:
    dgm = arr_dgms[f"dim{dim}"]
    images.append(pd_vectorization(dgm, spread=0.15, weighting='identity', pixels=[50, 50]))

plot_pers_images(images, arr_dgms)
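The role of `spread` and `pixels` may be easier to see in a small standalone sketch of a persistence-image style vectorization. This is not moleculetda's implementation. The function name `persistence_image`, the `bounds` argument, the use of (birth, persistence) coordinates, and the reading of 'identity' weighting as "every point has weight 1" are all assumptions made for illustration.

```python
import math


def persistence_image(dgm, spread=0.15, pixels=(50, 50), bounds=None):
    """Rasterize (birth, death) pairs into a pixels[1] x pixels[0] grid.

    Each point is moved to (birth, persistence) coordinates and smeared
    with an isotropic Gaussian of standard deviation `spread`. Every point
    gets weight 1 (an assumed reading of 'identity' weighting).
    """
    pts = [(b, d - b) for b, d in dgm]
    if bounds is None:
        # Assumed default: a square window that covers all points.
        hi = max(max(x, y) for x, y in pts) if pts else 1.0
        bounds = (0.0, hi, 0.0, hi)
    x_min, x_max, y_min, y_max = bounds
    nx, ny = pixels
    dx = (x_max - x_min) / nx
    dy = (y_max - y_min) / ny
    image = [[0.0] * nx for _ in range(ny)]
    for row in range(ny):
        cy = y_min + (row + 0.5) * dy  # pixel-center persistence
        for col in range(nx):
            cx = x_min + (col + 0.5) * dx  # pixel-center birth
            for px, py in pts:
                r2 = (cx - px) ** 2 + (cy - py) ** 2
                image[row][col] += math.exp(-r2 / (2 * spread ** 2))
    return image


# Hypothetical diagram with a single feature born at 1.0 that dies at 2.96.
img = persistence_image(
    [(1.0, 2.96)], spread=0.15, pixels=(50, 50), bounds=(0.0, 4.0, 0.0, 4.0)
)
# Locate the brightest pixel as (value, row, col).
peak = max((v, r, c) for r, row in enumerate(img) for c, v in enumerate(row))
print(len(img), len(img[0]), peak[1:])
```

In this sketch, a larger `spread` blurs each point over more pixels, while a larger `pixels` value gives a finer grid over the same window. Fixing `bounds` across structures keeps the images comparable with one another.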

The resulting 1d and 2d image representations can be used for other tasks.
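One common way to pass such images to a downstream model is to flatten them and concatenate the results into a single feature vector. The sketch below shows that step in plain Python. The helper name `images_to_feature_vector` and the tiny 2x2 sample images are hypothetical, and the exact type returned by `pd_vectorization` is not assumed here.

```python
def images_to_feature_vector(images):
    """Flatten each 2D image row by row and concatenate the results."""
    return [value for image in images for row in image for value in row]


# Hypothetical 2x2 stand-ins for the 1d and 2d images (50x50 in the example above).
image_1d = [[0.0, 0.5], [1.0, 0.0]]
image_2d = [[0.2, 0.0], [0.0, 0.3]]

features = images_to_feature_vector([image_1d, image_2d])
print(features)
```

If the images are NumPy arrays, `np.concatenate([im.ravel() for im in images])` would produce the same ordering.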

Citation

Aditi S. Krishnapriyan, Maciej Haranczyk, Dmitriy Morozov. Topological Descriptors Help Predict Guest Adsorption in Nanoporous Materials. J. Phys. Chem. C (2020)

@article{doi:10.1021/acs.jpcc.0c01167,
author = {Krishnapriyan, Aditi S. and Haranczyk, Maciej and Morozov, Dmitriy},
title = {Topological Descriptors Help Predict Guest Adsorption in Nanoporous Materials},
journal = {The Journal of Physical Chemistry C},
volume = {124},
number = {17},
pages = {9360-9368},
year = {2020},
doi = {10.1021/acs.jpcc.0c01167}
}

Aditi S. Krishnapriyan, Joseph Montoya, Maciej Haranczyk, Jens Hummelshoej, Dmitriy Morozov. Machine learning with persistent homology and chemical word embeddings improves prediction accuracy and interpretability in metal-organic frameworks. Scientific Reports (2021)

@article{krishnapriyan_machine_2021,
  title={Machine learning with persistent homology and chemical word embeddings improves prediction accuracy and interpretability in metal-organic frameworks},
  author={Krishnapriyan, Aditi S and Montoya, Joseph and Haranczyk, Maciej and Hummelsh{\o}j, Jens and Morozov, Dmitriy},
  journal = {Scientific Reports},
  volume = {11},
  number = {1},
  issn = {2045-2322},
  pages = {8888},
  year={2021},
  doi = {10.1038/s41598-021-88027-8}
}
