Summary
- Improved abbreviation handling
- More arrow types
- Multi-step Reaction SMILES
- Reaction Set and Multi-step depiction
- More correct PubchemFingerprinter
- Universal (InChI) SMILES for large molecules
- Dependency updates and stability improvements; huge kudos to @uli-f for finding some long-standing issues
Improved abbreviation handling
#991. Abbreviation handling has been tweaked with more, and cleaner, options:
import org.openscience.cdk.depict.Abbreviations;
Abbreviations abbreviations = new Abbreviations();
// abbreviations.setContractToSingleLabel(true); // old (still supported)
abbreviations.with(Abbreviations.Option.ALLOW_SINGLETON); // new
// abbreviations.setContractOnHetero(true); // old (still supported)
abbreviations.with(Abbreviations.Option.AUTO_CONTRACT_HETERO); // new
The full set of options is described in Abbreviations.Option.
More arrow types
Now includes NoGo, Equilibrium and RetroSynthetic arrows (#927); see IReaction.Direction.
Multi-step Reaction SMILES
A new entry point to the SMILES parser has been added to parse a "multi-step" reaction, whereby the product of one step is the reactant of the next. The basic idea is to allow more than two '>' characters. Counting from zero, parts at even positions are reactants/products and parts at odd positions are agents (catalysts, solvents).
Basic usage:
SmilesParser sp = new SmilesParser(SilentChemObjectBuilder.getInstance());
IReactionSet rset = sp.parseReactionSetSmiles("[Pb]>>[Ag]>>[Au] lead-to-silver-to-gold");
Real example (see the next section for its depiction):
ClC1=NC=2N(C(=C1)N(CC3=CC=CC=C3)CC4=CC=CC=C4)N=CC2C(OCC)=O>C1(=CC(=CC(=N1)C)N)N2C[C@H](CCC2)O.O1CCOCC1.CC1(C2=C(C(=CC=C2)P(C3=CC=CC=C3)C4=CC=CC=C4)OC5=C(C=CC=C15)P(C6=CC=CC=C6)C7=CC=CC=C7)C.C=1C=CC(=CC1)\C=C\C(=O)\C=C\C2=CC=CC=C2.C=1C=CC(=CC1)\C=C\C(=O)\C=C\C2=CC=CC=C2.C=1C=CC(=CC1)\C=C\C(=O)\C=C\C2=CC=CC=C2.[Pd].[Pd].[Cs]OC(=O)O[Cs]>C1(=CC(=CC(=N1)C)NC2=NC=3N(C(=C2)N(CC4=CC=CC=C4)CC5=CC=CC=C5)N=CC3C(OCC)=O)N6C[C@H](CCC6)O>CO.C1CCOC1.O.O[Li]>C1(=CC(=CC(=N1)C)NC2=NC=3N(C(=C2)N(CC4=CC=CC=C4)CC5=CC=CC=C5)N=CC3C(O)=O)N6C[C@H](CCC6)O>CN(C)C(=[N+](C)C)ON1C2=C(C=CC=N2)N=N1.F[P-](F)(F)(F)(F)F.[NH4+].[Cl-].CN(C)C=O.CCN(C(C)C)C(C)C>C1(=CC(=CC(=N1)C)NC2=NC=3N(C(=C2)N(CC4=CC=CC=C4)CC5=CC=CC=C5)N=CC3C(N)=O)N6C[C@H](CCC6)O>>C1(=CC(=CC(=N1)C)NC2=NC=3N(C(=C2)N)N=CC3C(N)=O)N4C[C@H](CCC4)O |f:4.5.6.7.8,16.17,18.19| US20190241576A1
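To make the positional rule concrete, here is a minimal pure-Java sketch that splits a multi-step reaction SMILES into its individual steps without CDK. `MultiStepSplitter`, its nested `Step` class and `split` are hypothetical illustrations, not part of the CDK API. The sketch assumes that any title or CXSMILES layer follows the first whitespace and that '>' only ever appears as a step separator; it works on raw text and does not parse molecules.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of CDK): maps '>'-separated parts onto reaction steps. */
public class MultiStepSplitter {

    /** One step as raw SMILES text: reactants > agents > products. */
    static final class Step {
        final String reactants;
        final String agents;
        final String products;

        Step(String reactants, String agents, String products) {
            this.reactants = reactants;
            this.agents = agents;
            this.products = products;
        }

        @Override
        public String toString() {
            return reactants + ">" + agents + ">" + products;
        }
    }

    static List<Step> split(String input) {
        // Assumption: the title / CXSMILES layer starts after the first whitespace.
        String smiles = input.trim().split("\\s+", 2)[0];
        // A limit of -1 keeps empty parts, e.g. the empty agent field in "[Pb]>>[Ag]".
        String[] parts = smiles.split(">", -1);
        if (parts.length < 3 || parts.length % 2 == 0) {
            throw new IllegalArgumentException("expected an odd number (>= 3) of '>'-separated parts");
        }
        List<Step> steps = new ArrayList<>();
        // Even positions (0, 2, 4, ...) hold molecules and odd positions hold agents,
        // so the product at position i + 2 is also the reactant of the following step.
        for (int i = 0; i + 2 < parts.length; i += 2) {
            steps.add(new Step(parts[i], parts[i + 1], parts[i + 2]));
        }
        return steps;
    }

    public static void main(String[] args) {
        for (Step step : split("[Pb]>>[Ag]>>[Au] lead-to-silver-to-gold")) {
            System.out.println(step); // prints [Pb]>>[Ag], then [Ag]>>[Au]
        }
    }
}
```

Applied to the real example above, the same rule gives nine parts and therefore four steps. The actual parsing into an IReactionSet is done by SmilesParser.parseReactionSetSmiles; this sketch only shows how the parts line up.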
Reaction Set and Multi-step depiction
The DepictionGenerator has been extended to depict reaction sets. If the product of one reaction is the same object (object identity) as the reactant of the next, it is drawn only once for a terser depiction.
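The identity rule can be illustrated with a small pure-Java sketch. It is not the DepictionGenerator's actual code: `TerseDepictionSketch`, `Step`, `containsSameObject` and `moleculesToDraw` are hypothetical names, and plain objects stand in for CDK molecules. A reactant is skipped only when it is the very same object (`==`) as a product of the previous step, so an equal but separately created copy is still drawn.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch (not CDK code): which molecules a terse multi-step layout would draw. */
public class TerseDepictionSketch {

    /** Minimal stand-in for one reaction step. */
    static final class Step {
        final List<Object> reactants;
        final List<Object> products;

        Step(List<Object> reactants, List<Object> products) {
            this.reactants = reactants;
            this.products = products;
        }
    }

    /** True if candidate is the very same object as one of items. */
    static boolean containsSameObject(List<Object> items, Object candidate) {
        for (Object item : items) {
            if (item == candidate) { // identity (==), deliberately not equals()
                return true;
            }
        }
        return false;
    }

    /** Molecules to draw, in order; a reactant identical to a previous product is drawn once. */
    static List<Object> moleculesToDraw(List<Step> steps) {
        List<Object> drawn = new ArrayList<>();
        List<Object> previousProducts = new ArrayList<>();
        for (Step step : steps) {
            for (Object reactant : step.reactants) {
                if (!containsSameObject(previousProducts, reactant)) {
                    drawn.add(reactant);
                }
            }
            drawn.addAll(step.products);
            previousProducts = step.products;
        }
        return drawn;
    }

    public static void main(String[] args) {
        Object lead = new String("Pb");
        Object silver = new String("Ag");
        Object gold = new String("Au");

        // Step 2 reuses the silver object produced by step 1, so it is drawn once.
        List<Step> shared = Arrays.asList(
                new Step(Arrays.asList(lead), Arrays.asList(silver)),
                new Step(Arrays.asList(silver), Arrays.asList(gold)));
        System.out.println(moleculesToDraw(shared)); // prints [Pb, Ag, Au]

        // An equal but distinct copy is a different object, so it is drawn again.
        List<Step> copied = Arrays.asList(
                new Step(Arrays.asList(lead), Arrays.asList(silver)),
                new Step(Arrays.asList((Object) new String("Ag")), Arrays.asList(gold)));
        System.out.println(moleculesToDraw(copied)); // prints [Pb, Ag, Ag, Au]
    }
}
```

In this sketch, using `==` rather than `equals()` leaves the choice with the caller: sharing one molecule object between consecutive steps gives the terser layout, while passing copies keeps every structure visible.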
More correct PubchemFingerprinter
Explicit hydrogens are no longer required, and there is an option to use a more correct ring set definition that more closely matches the original CACTVS substructure keys. This is now on by default:
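import org.openscience.cdk.fingerprint.PubchemFingerprinter;
import org.openscience.cdk.interfaces.IChemObjectBuilder;
import org.openscience.cdk.silent.SilentChemObjectBuilder;
IChemObjectBuilder builder = SilentChemObjectBuilder.getInstance();
new PubchemFingerprinter(builder);        // new - default is to use an "ESSSR-like" ring set
new PubchemFingerprinter(builder, false); // old - for backwards compatibility with fingerprints generated by older CDK versions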
Universal (InChI) SMILES for large molecules
#979.
InChI now supports molecules with more than 999 atoms. Since there is an option to generate SMILES using the InChI canonical labelling (Universal SMILES), it makes sense to use the large-molecules flag so that these larger molecules are supported too.
New Contributors
- @Mailaender made their first contribution in #934
- @parit made their first contribution in #980
All Contributors
75 John Mayfield
17 Egon Willighagen
6 Uli Fechner
4 Mark J. Williamson
3 Mark Williamson
1 Parit Bansal
1 Matthias Mailänder
Full Changelog: cdk-2.8...cdk-2.9