Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Fix CMake project for spark-rapids-jni library and its unit tests #1687

Open
ttnghia opened this issue Jan 10, 2024 · 1 comment
Open
Labels

Comments

@ttnghia
Copy link
Collaborator

ttnghia commented Jan 10, 2024

Currently, there are two problems with the CMake project in the repo:

  • The spark-rapids-jni project is built into libcudf.so library. This is bad. It should have its own name, instead of reusing libcudf which leads to name clash with the true libcudf. I would recommend to name libsparkrapidsjni.so or something similar.
  • Unit tests are both dynamic-linked with libcudf.so (the library built from spark-rapids-jni, not the true libcudf library) and static-linked with libcudf.a (the true libcudf library). Indeed, it only needs to dynamic-linked with spark-rapids-jni since this library already embeds libcudf inside it.

Fixing these issues is not urgent for now as they don't cause any bugs but they should be addressed soon when we can, to improve the code cleanness.

@ttnghia ttnghia added bug Something isn't working ? - Needs Triage labels Jan 10, 2024
@jlowe
Copy link
Member

jlowe commented Jan 10, 2024

Building to libcudf.so is intentional for multiple reasons:

  • cudf's native dependency loader expects to find libcudf.so
  • RAPIDS accelerated UDFs will be dynamically linked against libcudf.so, so at runtime they need to be able to find this library when the dynamic linker loads them.
  • Since we are statically linking the CUDA runtime, we only want one .so artifact so there aren't multiple CUDA contexts

If there's another solution that addresses these problems without clobbering the lbcudf,.so name I'd be happy to discuss them. Using a dynamic CUDA runtime seems like the easiest solution for this, as it allows us to provide multiple .so artifacts without duplicating the CUDA contexts. However we explicitly moved away from dynamically linking the CUDA runtime since we had issues with the correct CUDA runtime being properly installed in all the environments and it adds an additional step for users to setup the environment. As it is now, they just need the CUDA driver installed and we're good to go.

@mattahrens mattahrens added build and removed bug Something isn't working ? - Needs Triage labels Jan 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants