Cliquify - Robust representation of molecular graphs to trees structures is an extension of the work on the Junction Tree Variational Autoencoder (JTVAE).
This work aims to improve the tree representation of molecular graphs by introducing a variation of Hugin's Algorithm that forms chordal graphs.
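As a rough illustration of what forming a chordal graph means for a ring, the sketch below splits a ring into triangles that all share one atom (a fan triangulation). This is only one simple way to triangulate a cycle; the chords Cliquify actually adds are not specified here, and the function names `fan_triangulate` and `added_chords` are hypothetical.

```python
def fan_triangulate(ring):
    """Split a ring (atom indices in cyclic order) into triangles sharing ring[0].

    Assumption: a fan triangulation, chosen for simplicity; it is not
    necessarily the triangulation that Cliquify itself performs.
    """
    if len(ring) < 3:
        raise ValueError("a ring needs at least three atoms")
    anchor = ring[0]
    # Each triangle is the anchor atom plus two consecutive ring atoms.
    return [(anchor, ring[i], ring[i + 1]) for i in range(1, len(ring) - 1)]


def added_chords(ring):
    """Chords introduced by the fan: anchor to every non-adjacent ring atom."""
    return [(ring[0], atom) for atom in ring[2:-1]]


# A six-membered ring, with atoms numbered in ring order.
six_ring = [0, 1, 2, 3, 4, 5]
triangles = fan_triangulate(six_ring)
chords = added_chords(six_ring)

print(triangles)
print(chords)
```

Under this sketch, a ring of n atoms yields n - 2 triangles and n - 3 chords, so rings of any size are expressed with the same three-node building block rather than one vocabulary entry per ring type.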
- We define the importance of a tree molecular vocabulary by its ability to represent more, and more diverse, molecules.
- The JTVAE ring vocabulary constrains the number of molecules that can be generated because it generalizes poorly.
- Our solution addresses this problem by using more generalizable triangular cliques as the vocabulary.
- A generalizable vocabulary helps a generative model, e.g. a VAE or GAN, generate more diverse molecules without the need to redefine the vocabulary for each new dataset.
- Vocabulary used
- Samples generated (Pruned)
- Vocabulary used
- Samples generated (Pruned)
As you can see from the comparison above, with the more generalizable vocabulary from Cliquify, random sampling produces molecules with more diverse components.
Cliquify uses triangular clique decomposition, which helps in
- reducing the number of candidates generated per node (the candidate generation explosion for large rings mentioned in hgraph2graph),
- controlling the number and characteristics of the candidates generated for each fragment.
- The diagram above shows how Cliquify reduces the number of possible candidates generated per node.
- The diagram above shows the average number of candidates generated per tree node for molecules that have rings with 6 or more members.
- Cliquify shows lower fluctuation in the average number of candidates generated than JTVAE.
- The JTVAE junction tree (ring vocabulary) is not deterministic, since potentially many molecules correspond to the same junction tree.
- With Cliquify, using the triangulation clique method
We quantify the tree similarity between molecules using the Graph Edit Distance (GED) from the NetworkX library.
- GED based on tree nodes
- GED based on tree nodes and edges
- Based on the two diagrams above, we can infer that Cliquify produces more unique trees than JTVAE. This makes the tree structure more deterministic for decoding and encourages a one-to-one relationship between molecules and their tree representations.
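The GED comparison can be sketched with NetworkX's `graph_edit_distance`. The example below is a minimal illustration rather than the evaluation script behind the diagrams: the helper names `make_tree` and `tree_distance`, the `label` node attribute, and the toy fragment labels are all assumptions made for this sketch.

```python
import networkx as nx


def make_tree(labels, edges):
    """Build a small tree whose nodes carry a fragment label (hypothetical layout)."""
    tree = nx.Graph()
    for idx, label in enumerate(labels):
        tree.add_node(idx, label=label)
    tree.add_edges_from(edges)
    return tree


def tree_distance(t1, t2):
    """Graph Edit Distance where two nodes match only if their labels are equal."""
    return nx.graph_edit_distance(
        t1, t2, node_match=lambda a, b: a["label"] == b["label"]
    )


# Two toy trees with the same shape that differ in one fragment label.
tree_a = make_tree(["C1=CC=CC=C1", "CC", "CO"], [(0, 1), (1, 2)])
tree_b = make_tree(["C1=CC=CC=C1", "CC", "CN"], [(0, 1), (1, 2)])

print(tree_distance(tree_a, tree_a))
print(tree_distance(tree_a, tree_b))
```

A distance of zero between the trees of two different molecules means the tree alone cannot tell them apart, which is the ambiguity that a more deterministic decomposition aims to reduce. Exact GED is expensive, so a sketch like this is only practical for small trees.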
- JTVAE: because of its neighborhood-to-neighborhood decoding process, it does not consider the orientation of the already decoded molecule.
- Cliquify eliminates this possibility by restricting the locations of possible attachments, reducing or eliminating orientation identification errors.
- It does so by prioritizing non-ring bond attachments during graph-to-tree decomposition, which reduces the number of possible triangular cliques attached to a non-ring bond.
- Honeycomb structures are prevalent in large organic molecules. JTVAE fails to capture such formations.
- Honeycomb formation requires a recursive build, so the more complicated the neighboring molecules, the larger the candidate count becomes.
- This would likely result in a candidate explosion.
Because of its inherently tree-structured decoding, JTVAE fails to capture how multiple children of the same parent are connected to one another.