Skip to content

dbkgroup/GVAE4Smiles

Repository files navigation

GVAE4Smiles

Grammar Variational Autoencoder for Smiles.

Based off: https://github.com/kanojikajino/grammarVAE Paper: https://arxiv.org/abs/1703.01925

Now includes code to cache one-hot-encodings to disk via h5py and the corresponding data generator. This is needed to cope with training on large datasets e.g. six million SMILES strings sampled from ZincDB. [Not included here as too big for github].

Any SMILES strings need to be filtered to conform to the MAX_LENGTH criterion and the SMILES-Grammar used.