a minimized set of JARs for rCDK #915

Closed
zachcp opened this issue Sep 19, 2022 · 6 comments

Comments


zachcp commented Sep 19, 2022

Hi Team-CDK,

I am following up on a set of tweets about the performance of some of the code in rcdk (Tweet here), and I am wondering if you can help us figure out a way to reduce the JAR size we include in rcdk.

Note: I think CDK would be a really good substrate for lots of chemical work in R. Java is fast and the $ operator makes it easy to work with. There are currently two major dev issues: the massive JAR size of the CDK libs and the compiled-code step for rCDK.

Although @egonw mentioned that the CDK JARs for 2.8 are smaller, I see that the 2.8 bundle is now 45 MB. The maximum package size for CRAN is 5 MB. rcdk has been granted a size exception, but the latest JAR is already >25 MB, and 45 MB is too big. Ideally, we could find the minimal set of non-bundle JARs that would let us retain backwards compatibility with rcdk (and all of its dependencies in CRAN/Bioconductor). Do you have any suggestions on the best way to do this?

zach cp

(note: @rajarshi, if the above works you might be able to rewrite the compiled Java code back to R/rJava for a smoother dev cycle)

Member

egonw commented Sep 19, 2022

rcdk should not use the cdk bundle. Also, in the past the full CDK has been larger than this :)

I guess this issue should go to the rcdk repository, but don't mind discussing this here either.

Author

zachcp commented Sep 19, 2022

Agree this could go in rcdk. The main thing that I thought would be relevant to larger CDK users is how to identify JARs representing some subset of functionality.

Member

johnmay commented Sep 19, 2022

Please see the already open #911. Long story short: don’t include cdk-iordf, which will cut the size basically in half. The next target is JNA-InChI, which targets more platforms than the old JNI libs; you can selectively drop support for 32-bit Linux/Windows, for example. You can also shade the JNA JAR in a similar manner.

Member

johnmay commented Sep 19, 2022

If I have time this year I will try and rewrite the CML processing to be more dependency efficient…

Oh, the other one to remove is builder3d; the CDK 3D code is not that good anyway :).

As Egon said, don’t use the cdk-bundle JAR; pull in only what you need.
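
As a rough illustration of the advice above (leave out cdk-iordf and builder3d, and pull in individual modules rather than the bundle), the following Python sketch partitions a set of JAR files into "keep" and "drop" groups and reports the size of each. It is only a sketch under stated assumptions: the file-name prefixes `cdk-iordf` and `cdk-builder3d`, the helper names, and the idea of scanning a local directory of module JARs are all hypothetical and are not taken from rcdk or the CDK build. It matches on file names only and does not resolve transitive dependencies, so third-party JARs pulled in by an excluded module would need to be listed separately.

```python
from pathlib import Path

# Prefixes of JARs to leave out. These follow the suggestions in the thread,
# but the exact file names are an assumption; adjust them to your build output.
EXCLUDED_PREFIXES = ("cdk-iordf", "cdk-builder3d")


def split_jars(jar_sizes, excluded_prefixes=EXCLUDED_PREFIXES):
    """Partition {jar file name: size in bytes} into (kept, dropped) dicts."""
    prefixes = tuple(excluded_prefixes)
    kept, dropped = {}, {}
    for name, size in jar_sizes.items():
        target = dropped if name.startswith(prefixes) else kept
        target[name] = size
    return kept, dropped


def scan_jar_dir(directory):
    """Map each *.jar file name directly inside `directory` to its size in bytes."""
    return {p.name: p.stat().st_size for p in Path(directory).glob("*.jar")}


def report(jar_sizes, excluded_prefixes=EXCLUDED_PREFIXES):
    """Return a one-line summary of how much would be kept and dropped."""
    kept, dropped = split_jars(jar_sizes, excluded_prefixes)
    mb = 1024 * 1024
    return (
        f"keep {len(kept)} JARs ({sum(kept.values()) / mb:.1f} MB), "
        f"drop {len(dropped)} JARs ({sum(dropped.values()) / mb:.1f} MB)"
    )
```

Calling `report(scan_jar_dir("path/to/jars"))` on a directory of module JARs (the path is a placeholder) gives a quick estimate of how far a given exclusion list moves the total toward a package-size limit.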

Author

zachcp commented Sep 19, 2022

Thank you, @johnmay!

Author

zachcp commented Sep 20, 2022

PR in place for cdk-2.8 jars with a size compatible with previous releases. A bit hacky to generate the individual jars but seems to work for now.

zachcp closed this as completed Sep 20, 2022