CDK 2.9

@johnmay johnmay released this 21 Aug 19:50

Summary

  • Improved abbreviation handling
  • More arrow types
  • Multi-step Reaction SMILES
  • Reaction Set and Multi-step depiction
  • More correct PubChemFingerprinter
  • Universal (InChI) SMILES for large molecules
  • Dependency updates and stability improvements, huge kudos to @uli-f for finding some longstanding issues

Improved abbreviation handling

#991. Abbreviation handling now has more options and a cleaner way to set them:

Abbreviations abbreviations = new Abbreviations();
// abbreviations.setContractToSingleLabel(true); // old (still supported)
abbreviations.with(Abbreviations.Option.ALLOW_SINGLETON); // new
// abbreviations.setContractOnHetero(true); // old (still supported)
abbreviations.with(Abbreviations.Option.AUTO_CONTRACT_HETERO); // new

The full set of options is described in Abbreviations.Option.

More arrow types

Reaction arrows now include NoGo, Equilibrium, and RetroSynthetic (#927). See IReaction.Direction. Examples:

Figure: example depictions of the new arrow types.

Multi-step Reaction SMILES

#986

A new entry point has been added to the SMILES parser to parse a "multi-step" reaction, in which the product of one step is the reactant of the next. The idea is to allow more than two '>' characters. Counting from zero, parts at even positions are reactants/products and parts at odd positions are agents/catalysts/solvents.

Basic usage:

SmilesParser sp = new SmilesParser(SilentChemObjectBuilder.getInstance());
IReactionSet rset = sp.parseReactionSetSmiles("[Pb]>>[Ag]>>[Au] lead-to-silver-to-gold");
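
The even/odd position rule can be illustrated without CDK. The following is a minimal plain-Java sketch, not the parser's actual implementation: the class MultiStepSmilesSketch, its Step type, and splitSteps are hypothetical names, and it assumes that everything after the first whitespace (title or CXSMILES layer) can simply be dropped and that a well-formed input has an even number of '>' characters.

```java
import java.util.ArrayList;
import java.util.List;

final class MultiStepSmilesSketch {

    /** Hypothetical holder for one step: reactants > agents > products. */
    static final class Step {
        final String reactants, agents, products;
        Step(String reactants, String agents, String products) {
            this.reactants = reactants;
            this.agents = agents;
            this.products = products;
        }
    }

    /** Splits a multi-step reaction SMILES into overlapping steps (illustration only). */
    static List<Step> splitSteps(String input) {
        // Assumption: drop the title / CXSMILES layer after the first whitespace.
        String smiles = input.trim().split("\\s+", 2)[0];
        // Limit -1 keeps empty parts, e.g. the empty agents in "A>>B".
        String[] parts = smiles.split(">", -1);
        // Even positions (0, 2, 4, ...) are reactants/products, so a complete
        // chain has an odd number of parts and at least three of them.
        if (parts.length < 3 || parts.length % 2 == 0)
            throw new IllegalArgumentException("expected an even number of '>' (two or more)");
        List<Step> steps = new ArrayList<>();
        for (int i = 0; i + 2 < parts.length; i += 2) {
            // The product part of step i is reused as the reactant part of the next step.
            steps.add(new Step(parts[i], parts[i + 1], parts[i + 2]));
        }
        return steps;
    }

    public static void main(String[] args) {
        for (Step s : splitSteps("[Pb]>>[Ag]>>[Au] lead-to-silver-to-gold"))
            System.out.println(s.reactants + " -> " + s.products);
    }
}
```

For the lead-to-silver-to-gold input this yields two steps, [Pb] to [Ag] and [Ag] to [Au], each with an empty agents part.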

Real example (see the next section for its depiction):

ClC1=NC=2N(C(=C1)N(CC3=CC=CC=C3)CC4=CC=CC=C4)N=CC2C(OCC)=O>C1(=CC(=CC(=N1)C)N)N2C[C@H](CCC2)O.O1CCOCC1.CC1(C2=C(C(=CC=C2)P(C3=CC=CC=C3)C4=CC=CC=C4)OC5=C(C=CC=C15)P(C6=CC=CC=C6)C7=CC=CC=C7)C.C=1C=CC(=CC1)\C=C\C(=O)\C=C\C2=CC=CC=C2.C=1C=CC(=CC1)\C=C\C(=O)\C=C\C2=CC=CC=C2.C=1C=CC(=CC1)\C=C\C(=O)\C=C\C2=CC=CC=C2.[Pd].[Pd].[Cs]OC(=O)O[Cs]>C1(=CC(=CC(=N1)C)NC2=NC=3N(C(=C2)N(CC4=CC=CC=C4)CC5=CC=CC=C5)N=CC3C(OCC)=O)N6C[C@H](CCC6)O>CO.C1CCOC1.O.O[Li]>C1(=CC(=CC(=N1)C)NC2=NC=3N(C(=C2)N(CC4=CC=CC=C4)CC5=CC=CC=C5)N=CC3C(O)=O)N6C[C@H](CCC6)O>CN(C)C(=[N+](C)C)ON1C2=C(C=CC=N2)N=N1.F[P-](F)(F)(F)(F)F.[NH4+].[Cl-].CN(C)C=O.CCN(C(C)C)C(C)C>C1(=CC(=CC(=N1)C)NC2=NC=3N(C(=C2)N(CC4=CC=CC=C4)CC5=CC=CC=C5)N=CC3C(N)=O)N6C[C@H](CCC6)O>>C1(=CC(=CC(=N1)C)NC2=NC=3N(C(=C2)N)N=CC3C(N)=O)N4C[C@H](CCC4)O |f:4.5.6.7.8,16.17,18.19|  US20190241576A1

Reaction Set and Multi-step depiction

#986

The DepictionGenerator has been extended to depict reaction sets. If the product of the previous reaction is the same as the reactant in the next (object identity) it is omitted for a terser depiction:

Figure: multi-step depiction of the US20190241576A1 example.
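
The "object identity" condition is worth spelling out: a reactant is only dropped when it is the very same object as the previous product, not merely an equal-looking one. The sketch below shows that distinction in plain Java. It is not the DepictionGenerator code; Mol, Step, and layout are hypothetical stand-ins introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TerseStepLayoutSketch {

    /** Hypothetical stand-in for a molecule; only its identity matters here. */
    static final class Mol {
        final String smiles;
        Mol(String smiles) { this.smiles = smiles; }
    }

    /** Hypothetical stand-in for one reaction step. */
    static final class Step {
        final Mol reactant, product;
        Step(Mol reactant, Mol product) {
            this.reactant = reactant;
            this.product = product;
        }
    }

    /** Lists what would be drawn, skipping a reactant that IS the previous product. */
    static List<String> layout(List<Step> steps) {
        List<String> out = new ArrayList<>();
        Mol previousProduct = null;
        for (Step step : steps) {
            // Reference comparison: only the same object instance is omitted.
            if (step.reactant != previousProduct)
                out.add(step.reactant.smiles);
            out.add("->");
            out.add(step.product.smiles);
            previousProduct = step.product;
        }
        return out;
    }

    public static void main(String[] args) {
        Mol pb = new Mol("[Pb]"), ag = new Mol("[Ag]"), au = new Mol("[Au]");
        // Shared intermediate: 'ag' is the product of step 1 and the reactant of step 2.
        System.out.println(String.join(" ",
            layout(Arrays.asList(new Step(pb, ag), new Step(ag, au)))));
        // Equal-looking but distinct intermediate: it is listed twice.
        System.out.println(String.join(" ",
            layout(Arrays.asList(new Step(pb, ag), new Step(new Mol("[Ag]"), au)))));
    }
}
```

In this sketch the first call lists the silver intermediate once and the second lists it twice, which is why reusing the same molecule object across steps matters when a terse depiction is wanted.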

More correct PubChemFingerprinter

Explicit hydrogens are no longer required, and there is an option to use a more correct ring set definition that more closely matches the original CACTVS substructure keys. This option is now on by default:

IChemObjectBuilder builder = SilentChemObjectBuilder.getInstance();
new PubchemFingerprinter(builder); // new - default is to use "ESSSR-like" ring set
new PubchemFingerprinter(builder, false); // old - for backwards compatibility with fingerprints generated by older CDK versions

Universal (InChI) SMILES for large molecules

#979.

InChI now supports more than 999 atoms. Since we have the option to generate a SMILES using the InChI canonical labelling, it makes sense to use the InChI large molecules flag there so that these larger molecules are supported as well.

All Contributors

  75 John Mayfield
  17 Egon Willighagen
   6 Uli Fechner
   4 Mark J. Williamson
   3 Mark Williamson
   1 Parit Bansal
   1 Matthias Mailänder

Full Changelog: cdk-2.8...cdk-2.9