Releases: cdk/cdk
CDK 2.9
Summary
- Improved abbreviation handling
- More arrow types
- Multi-step Reaction SMILES
- Reaction Set and Multi-step depiction
- More correct PubChemFingerprinter
- Universal (InChI) SMILES for large molecules
- Dependency updates and stability improvements, huge kudos to @uli-f for finding some longstanding issues
Improved abbreviation handling
#991. The Abbreviation handling has been tweaked with more and cleaner options:
Abbreviations abbreviations = new Abbreviations();
// abbreviations.setContractToSingleLabel(true); // old (still supported)
abbreviations.with(Abbreviations.Option.ALLOW_SINGLETON); // new
// abbreviations.setContractOnHetero(true); // old (still supported)
abbreviations.with(Abbreviations.Option.AUTO_CONTRACT_HETERO); // new
The full options are described here: Abbreviations.Option.
More arrow types
Now includes NoGo/Equilibrium/RetroSynthetic arrows (#927). See IReaction.Direction.
Multi-step Reaction SMILES
A new entry point to the SMILES parser has been added to parse into a "multi-step" reaction, whereby the product of one step is the reactant of the next. The basic idea is to allow more than two '>'. Parts at even positions are reactants/products and parts at odd positions are agents/catalysts/solvents.
Basic idea:
SmilesParser sp = new SmilesParser(SilentChemObjectBuilder.getInstance());
IReactionSet rset = sp.parseReactionSetSmiles("[Pb]>>[Ag]>>[Au] lead-to-silver-to-gold");
Real example (see next bullet for depiction):
ClC1=NC=2N(C(=C1)N(CC3=CC=CC=C3)CC4=CC=CC=C4)N=CC2C(OCC)=O>C1(=CC(=CC(=N1)C)N)N2C[C@H](CCC2)O.O1CCOCC1.CC1(C2=C(C(=CC=C2)P(C3=CC=CC=C3)C4=CC=CC=C4)OC5=C(C=CC=C15)P(C6=CC=CC=C6)C7=CC=CC=C7)C.C=1C=CC(=CC1)\C=C\C(=O)\C=C\C2=CC=CC=C2.C=1C=CC(=CC1)\C=C\C(=O)\C=C\C2=CC=CC=C2.C=1C=CC(=CC1)\C=C\C(=O)\C=C\C2=CC=CC=C2.[Pd].[Pd].[Cs]OC(=O)O[Cs]>C1(=CC(=CC(=N1)C)NC2=NC=3N(C(=C2)N(CC4=CC=CC=C4)CC5=CC=CC=C5)N=CC3C(OCC)=O)N6C[C@H](CCC6)O>CO.C1CCOC1.O.O[Li]>C1(=CC(=CC(=N1)C)NC2=NC=3N(C(=C2)N(CC4=CC=CC=C4)CC5=CC=CC=C5)N=CC3C(O)=O)N6C[C@H](CCC6)O>CN(C)C(=[N+](C)C)ON1C2=C(C=CC=N2)N=N1.F[P-](F)(F)(F)(F)F.[NH4+].[Cl-].CN(C)C=O.CCN(C(C)C)C(C)C>C1(=CC(=CC(=N1)C)NC2=NC=3N(C(=C2)N(CC4=CC=CC=C4)CC5=CC=CC=C5)N=CC3C(N)=O)N6C[C@H](CCC6)O>>C1(=CC(=CC(=N1)C)NC2=NC=3N(C(=C2)N)N=CC3C(N)=O)N4C[C@H](CCC4)O |f:4.5.6.7.8,16.17,18.19| US20190241576A1
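The positional rule described above (even parts are reactants/products, odd parts are agents) can be sketched in plain Java. This is not the CDK parser: the class `MultiStepSmilesSketch` and its `splitSteps` method are hypothetical names used only for illustration, and the sketch assumes that everything after the first whitespace is a title or CXSMILES layer that can be ignored.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: not the CDK parser, just the positional rule for '>' separated parts. */
final class MultiStepSmilesSketch {

    /**
     * Splits a multi-step reaction SMILES into steps. Each step is a String[3] of
     * {reactants, agents, products}; consecutive steps share the intermediate part.
     */
    static List<String[]> splitSteps(String input) {
        // Assumption: anything after the first whitespace is a title or CXSMILES layer.
        String smiles = input.trim().split("\\s+", 2)[0];
        // A limit of -1 keeps empty parts, e.g. the empty agent slot in "A>>B".
        String[] parts = smiles.split(">", -1);
        if (parts.length < 3 || parts.length % 2 == 0)
            throw new IllegalArgumentException("Expected an odd number of parts (3 or more): " + smiles);
        List<String[]> steps = new ArrayList<>();
        for (int i = 0; i + 2 < parts.length; i += 2) {
            // Even index i: reactants, odd index i+1: agents, even index i+2: products.
            steps.add(new String[]{parts[i], parts[i + 1], parts[i + 2]});
        }
        return steps;
    }

    public static void main(String[] args) {
        for (String[] step : splitSteps("[Pb]>>[Ag]>>[Au] lead-to-silver-to-gold"))
            System.out.println(step[0] + " -> " + step[2] + " (agents: '" + step[1] + "')");
    }
}
```

For the lead-to-silver-to-gold input this yields two steps, with the silver part acting as the product of the first step and the reactant of the second.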
Reaction Set and Multi-step depiction
The DepictionGenerator has been extended to depict reaction sets. If the product of the previous reaction is the same as the reactant in the next (object identity), it is omitted for a terser depiction.
More correct PubChemFingerprinter
Explicit hydrogens are no longer required, and there is an option to use a more correct ring set definition that more closely matches the original CACTVS substructure keys. This is now on by default:
IChemObjectBuilder builder = SilentChemObjectBuilder.getInstance();
new PubchemFingerprinter(builder); // new - default is to use "ESSSR-like" ring set
new PubchemFingerprinter(builder, false); // old - for backwards compatibility with fingerprints generated by older CDK versions
Universal (InChI) SMILES for large molecules
#979.
InChI now supports more than 999 atoms. Since we have the option to generate a SMILES using the InChI canonical labelling, it makes sense to use the large-molecules flag and support these larger structures as well.
New Contributors
- @Mailaender made their first contribution in #934
- @parit made their first contribution in #980
All Contributors
75 John Mayfield
17 Egon Willighagen
6 Uli Fechner
4 Mark J. Williamson
3 Mark Williamson
1 Parit Bansal
1 Matthias Mailänder
Full Changelog: cdk-2.8...cdk-2.9
CDK 2.8
Key Changes
- JDK Versions:
- JDK 8 (minimum)
- JDK 11 (minimum if using cdk-iordf)
- JDK 17 (recommended)
- The project is now built with Java 11+ but compiled to target Java 8. If you have any issues please let us know.
- The master branch has been renamed to main.
- log4j-core is no longer a dependency of cdk-log4j; you should include it separately if you intend to use Log4j
- A new cdk-slf4j module allows connecting logging to SLF4J
- MayGen structure generator: provides the ability to generate millions and millions of structures for a given formula.
Maygen maygen = new Maygen(SilentChemObjectBuilder.getInstance());
maygen.setFormula("C3Cl2H4");
maygen.setConsumer(new SmiOutputConsumer(new OutputStreamWriter(System.out)));
maygen.run();
Maygen is pure Java; if you need more speed, consider Surge by the same author.
- New Smallest Ring utilities for single atom/bond
if (Cycles.smallRingSize(atom, 7) != 0) {
// atom is in a ring 7 or smaller
}
if (Cycles.smallRingSize(bond, 7) != 0) {
// bond is in a ring 7 or smaller
}
- RAW/Count Path Fingerprints
IFingerprinter fpr = new Fingerprinter();
Map<String, Integer> feats = fpr.getRawFingerprint(mol);
- Where possible, "re-inflate" convex rings on cyclophanes. [Figure: depiction before and after]
- New substructure/copy utility that allows a whole or part of a structure to be copied. Atoms and bonds are selected by providing a predicate:
IAtomContainer dst = builder.newAtomContainer();
AtomContainerManipulator.copy(dst, src, a -> a.isInRing(), b -> b.isInRing()); // select the cyclic part of a molecule
// select atoms in a set, the bonds will also be selected
Set<IAtom> subset = ...
AtomContainerManipulator.copy(dst, src, a -> subset.contains(a));
- New exclusive atoms filter that provides non-overlapping substructure matches. Note that the input order can determine which matches are selected.
for (int[] mapping : Pattern.findSubstructure(query).matchAll(mol).exclusiveAtoms()) {
// ...
}
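To see why input order matters for a non-overlapping filter, here is a minimal plain-Java sketch of a greedy, first-come selection over atom-index mappings. It is an assumption that this mirrors the behaviour described for the exclusive atoms filter; it is not CDK's implementation, and `ExclusiveAtomsSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: greedy selection of matches that do not share any atom index. */
final class ExclusiveAtomsSketch {

    /** Keeps a mapping only if none of its atom indices was used by an earlier kept mapping. */
    static List<int[]> exclusive(List<int[]> mappings) {
        Set<Integer> used = new HashSet<>();
        List<int[]> kept = new ArrayList<>();
        for (int[] mapping : mappings) {
            boolean overlaps = false;
            for (int atomIdx : mapping) {
                if (used.contains(atomIdx)) {
                    overlaps = true;
                    break;
                }
            }
            if (overlaps)
                continue; // shares an atom with an earlier match, skip it
            for (int atomIdx : mapping)
                used.add(atomIdx);
            kept.add(mapping);
        }
        return kept;
    }

    public static void main(String[] args) {
        // Three overlapping two-atom matches along a chain 0-1-2-3.
        List<int[]> inOrder = Arrays.asList(new int[]{0, 1}, new int[]{1, 2}, new int[]{2, 3});
        List<int[]> reordered = Arrays.asList(new int[]{1, 2}, new int[]{0, 1}, new int[]{2, 3});
        System.out.println(exclusive(inOrder).size() + " vs " + exclusive(reordered).size());
    }
}
```

With the first ordering both outer matches survive; when the middle match comes first it blocks the other two, so the same set of matches gives a different result.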
Summary
- Merged all PRs and resolved all open issues related to bugs
- InChINumbersTools: Use JNA InChI options by @johnmay in #799
- Avoid integer overflow in MF by @johnmay in #808, #810
- Ensure correct stereo consistency (Fix #812) by @johnmay in #813
- SMILES: Fix an issue with stereochemistry being lost on generic atoms - @johnmay in #814, #866
- Maygen structure generator by @MehmetAzizYirik in #811
- Weighted path descriptor performance improvements by @johnmay in #817
- Depiction: Fix missing bond annotations by @johnmay in #819
- Utility functions for determining the smallest ring size of an atom/b… by @johnmay in #820
- Better consistency in Stereochemistry and Sgroups when removing atoms by @johnmay in #821
- Unify MOLfile V2000/V3000 options by @johnmay in #824, #852
- Improved stereochemistry perception by @johnmay in #826, #839
- Replace Atom symbol (String) comparison with atomic number (integer) by @johnmay in #827
- Improved/fix bugs with XLogP, PiContact, and BCUT, HuLuIndex descriptors by @johnmay in #833, #656, #822, #832
- Additional Raw and count path fingerprints by @johnmay in #834
- "Re-inflate" convex rings in macrocycles. The macrocycle layout can en… by @johnmay in #836
- Fix a corner case in repeat crossing bonds when we have variable atta… by @johnmay in #835
- Restore space as delimiter for string-based definition of InChI options by @marco-foscato in #846
- Update to Apache Jena 4.2 (requires JDK 11) by @egonw in #748
- Fix localisation of alpha channel floats in SVG by @egonw in #868
- Check string bounds on PDB COMPND line. Fixes #870 by @johnmay in #871
- Methods to manipulate atom types in ReactionManipulator by @uli-f in #883, #879
- Added ChemObjectBuilder.newReaction() by @uli-f in #888
- Utilities for selecting a substructure of a molecule. by @johnmay in #889
- Improved CDK Log4J/SLF4J interactions by @johnmay in #878, #876
- Additional SMARTS/matching utilities by @johnmay in #896, #900
- Use Junit5 by @johnmay in #901
- Fix issue with hose code nesting by @johnmay in #828
Authors
278 John Mayfield
13 Egon Willighagen
11 Uli Fechner
5 Mark Williamson
3 Valentyn Kolesnikov
2 MehmetAzizYirik
2 Marco Foscato
1 dependabot[bot]
1 Tim Dudgeon
1 Otto Brinkhaus
1 Christoph Steinbeck
1 Alex
New Contributors
- @marco-foscato made their first contribution in #846
- @tdudgeon made their first contribution in #847
- @OBrink made their first contribution in #851
- @sashashura made their first contribution in #885
Full Changelog: cdk-2.7.1...cdk-2.8
CDK 2.7.1
This page documents the changes for CDK v2.7 and v2.7.1. The patch version was made after some minor issues with how the new InChI code was organised were discovered by downstream projects.
Features
Switch from JNI to JNA InChI.
There are two main technologies for calling native code: JNI (Java Native Interface) and JNA (Java Native Access). JNI requires writing a custom native wrapper which is then bound to Java code, whereas JNA allows you to call the native methods of an existing SO/DYLIB directly. In practice this means that to expose the native InChI library in Java via JNI one first needs to write (and maintain) a native wrapper, while with JNA we can just drop the InChI SO in directly. JNI InChI exposed InChI v1.03 and worked well for many years. Unfortunately this project was no longer maintained, and as newer, more stable versions of InChI were released (now v1.06) an alternative was needed. A few years ago Daniel Lowe started JNA InChI and recently made it feature complete and released v1.0.
ChemAxon have also independently used the JNA path to integrate newer InChI libraries into their tools (slides). It is not clear whether this was made available; it is not listed on GitHub/ChemAxon.
Build on Java 17
The Maven plugins were updated to allow building on Java 17
Verify declared dependencies
The Maven modules were checked for unused declared dependencies and used undeclared dependencies (mvn dependency:analyze).
Organise and restructure test-jar and testdata
CDK was originally built with the ant build tool. Under this scheme there was one jar for the main/ code and one for the test/ code, and test modules could share and inherit dependencies. To replicate this in Maven we install and deploy "test-jar" artefacts. The project test code was restructured to put all common test code in the "cdk-test" module.
All test data was stored in a cdk-testdata module; this data has now been relocated to the test/resources of each module where it is used. This meant some data was duplicated, but it means the ~18MB test-jar no longer needs to be uploaded to Maven Central.
Remove Guava dependency
We have removed the use of Guava. The functionality could mostly be replaced directly with newer JDK idioms (Function/Predicate/Stream) which were not available in the past.
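As a rough illustration of the kind of replacement meant here, the sketch below filters, transforms and joins a list using only java.util.function and java.util.stream, where older code might have reached for Guava helpers such as Iterables.filter, Lists.transform or Joiner. The class `JdkIdiomsSketch` and the element-symbol example are hypothetical and are not taken from the CDK code base.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Illustrative only: JDK 8+ idioms that can stand in for common Guava helpers. */
final class JdkIdiomsSketch {

    /** Keeps non-carbon, non-hydrogen symbols, brackets each one and joins them with '.'. */
    static String describeHeteroAtoms(List<String> symbols) {
        Predicate<String> isHetero = s -> !s.equals("C") && !s.equals("H"); // filter step
        Function<String, String> bracket = s -> "[" + s + "]";              // transform step
        return symbols.stream()
                      .filter(isHetero)
                      .map(bracket)
                      .collect(Collectors.joining("."));                    // join step
    }

    public static void main(String[] args) {
        System.out.println(describeHeteroAtoms(Arrays.asList("C", "N", "H", "O")));
    }
}
```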
Use XorShift PRNG in ShortestPathFingerprinter (different fingerprint)
Commons Math3 was used in a single place to hash paths (Mersenne Twister) in the ShortestPathFingerprinter. Since this fingerprint method is not widely used and the hashes do not need to be cryptographically secure, a simple Xorshift (https://en.wikipedia.org/wiki/Xorshift) random number generator is now used instead. This allows us to remove the dependency on Commons Math3. This does mean the fingerprint bits have changed; note that the CDK version description is accessible via the Fingerprinter.getVersionDescription() method.
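For readers unfamiliar with Xorshift, a minimal 64-bit variant is sketched below using the common 13/7/17 shift triple. This is a generic illustration of the technique and not the code used in the ShortestPathFingerprinter; the class `XorShiftSketch`, the zero-seed remapping and the `nextBit` helper are assumptions made for the example.

```java
/** Illustrative only: a generic xorshift64 generator, not CDK's implementation. */
final class XorShiftSketch {

    private long state;

    XorShiftSketch(long seed) {
        // Xorshift never leaves the all-zero state, so remap a zero seed (arbitrary constant).
        this.state = seed == 0 ? 0x9E3779B97F4A7C15L : seed;
    }

    /** Advances the generator with three shift-and-xor steps and returns the new state. */
    long next() {
        long x = state;
        x ^= x << 13;
        x ^= x >>> 7;
        x ^= x << 17;
        state = x;
        return x;
    }

    /** Maps the next value to a bit index in [0, size), e.g. for setting a fingerprint bit. */
    int nextBit(int size) {
        return (int) Long.remainderUnsigned(next(), size);
    }

    public static void main(String[] args) {
        // Seeding with a path hash gives a reproducible bit position for that path.
        XorShiftSketch rng = new XorShiftSketch("C-C-N".hashCode());
        System.out.println(rng.nextBit(1024));
    }
}
```

The generator is deterministic for a given seed, which is what a fingerprint needs, but it differs from a Mersenne Twister sequence, so bits produced this way will not match those from the older hashing.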
Authors
137 John Mayfield
6 Egon Willighagen
1 dependabot[bot]
Full Change Log
- Bump version ready for development John Mayfield on 2021-12-14
- Bumped the log4j version Egon Willighagen on 2021-12-14
- Make log4j a test-only dependency of the InChI module John Mayfield on 2021-12-16
- Make sure TOTAL_DEGREE works correctly John Mayfield on 2021-12-16
- Additional test for converting "Molecules" to queries using the useful SMARTS theorem: D=X+h. John Mayfield on 2021-12-16
- Updated CMLXOM version Egon Willighagen on 2021-12-16
- Update Mockito to 4.1.0 - fixing test failures (except InChI) on M1 AARCH64. John Mayfield on 2021-12-16
- Add in a Java 17 build John Mayfield on 2021-12-16
- Fix indents John Mayfield on 2021-12-16
- Update Jacoco plugin John Mayfield on 2021-12-16
- Formatting only - to make changes easier to follow John Mayfield on 2021-12-18
- First pass at moving from JNA to JNI inchi - some tests need adjusting. John Mayfield on 2021-12-18
- Fix Test - ignore longer extended tetrahedral - could be a warning. John Mayfield on 2021-12-18
- 5000L (long) default timeout in ms John Mayfield on 2021-12-18
- This is not a status message rather than log - makes sense John Mayfield on 2021-12-18
- These tests add the same bond twice a1E=a2E which is now a warning - used to be ignored. The tests were wrong John Mayfield on 2021-12-18
- This is now a warning, there is no EOF status. However it should perhaps set a sensible message John Mayfield on 2021-12-18
- More double bonds added twice. John Mayfield on 2021-12-18
- Message is empty string rather than null. John Mayfield on 2021-12-18
- This is the most questionable change but believe to be a bug in InChI 1.03. Using JNI INCHI setting the chiral flag = on or off we get "rA:9n..." without it we get ""rA:9...". In JNA INCHI we always get "rA:9n..." - this molecule is not chiral so it seems odd that the setting would change anything. Since this is only a change in AuxInfo this is acceptable. John Mayfield on 2021-12-18
- Looks like we can different timeout messages based on the system? John Mayfield on 2021-12-18
- Bump log4j-core from 2.16.0 to 2.17.0 dependabot[bot] on 2021-12-18
- Some minor version/scope cleanups. Hamcrest should be a test dependency. Make sure we pull in Log4j2 2.17.0. In QSARCML log4j-core should only be in the tests John Mayfield on 2021-12-22
- Move over to Log4J2 configuration - allows us to remove some log4j-1.2 deps John Mayfield on 2021-12-22
- We don't need the Log4J 1.2 API in these locations - unfortunately it still comes in via CMLXOM and JENA but better for now. John Mayfield on 2021-12-22
- Cleanup of the cdk/base modules - using dependency:analyze to ensure all used undeclared dependencies are included and unused declared are removed. John Mayfield on 2021-12-22
- Cleanup of dependencies in CDK storage/io modules John Mayfield on 2021-12-22
- Significant dependency cleanup in the descriptor/qsar modules. John Mayfield on 2021-12-22
- Cleanup dependencies in depict/render module John Mayfield on 2021-12-22
- Cleanup dependencies in CDK tool modules. John Mayfield on 2021-12-22
- Cleanup dependencies in the misc/ module John Mayfield on 2021-12-22
- Broken by changes to another module - it implicitly depended on the CDK atomtyping. John Mayfield on 2021-12-22
- Make sure everything is used is declared in the cdk-legacy module John Mayfield on 2021-12-22
- More cleanup of base/ modules now I've got better at using dependency:analyze John Mayfield on 2021-12-22
- Minor issues of non-test dependencies now a clean build is tested John Mayfield on 2021-12-22
- More left overs - all good now. John Mayfield on 2021-12-22
- This should probably be install instead of test John Mayfield on 2021-12-22
- Looks like some things were folded into JDK 17 John Mayfield on 2021-12-22
- JENA-CORE pulls in a very specific version XML-APIS. There may well be a conflict but a fix should be to leave it as a transient dependency in cdk-io John Mayfield on 2021-12-22
- Avoid test-jar dependency for qsarcml - we only need the roundtrip function. John Mayfield on 2021-12-24
- This code is deprecated we don't need the full Desctiptor ba...
CDK 2.7
CDK 2.6
CDK 2.5
CDK 2.3
CDK 2.2
Please see 2.2 Release Notes for full details.
CDK 2.1.1 (patch release)
This patch release removes the SNAPSHOT dependency. Release Notes