Migrate to Jitify2 #1150

Draft
wants to merge 30 commits into
base: master

Robadob (Member) commented Nov 15, 2023

  • RTC and Execution (Works with CUDA12.0, Windows/Linux)
  • Serialisation/Deserialisation (jitify2 pre-processed serialised objects are 2-3x larger)
  • CUDA 12.3 support
  • Investigate access violation from cuda().ModuleUnload() during sim shutdown, when CUDAAgent map is cleared by CUDASimulation destructor.
  • Reimplement jitify1 demangle used in curve_rtc.cpp
  • Optimise serialisation load time
  • Optimise compile time
    • Rework of old header hack? (would require loading headers from file)
      • Preloading fgpu headers only cuts agent fn time from 6.8s to 4.1s.
      • Loading CUDA headers too makes a big difference, but these may not be particularly stable between versions
    • Offline pre-process FLAMEGPU2 include hierarchy into a single header file? (using jitify tools?)
    • Wait for this PR to be merged?
  • Visual Studio 2019 support (we may be able to drop this)
  • ManyLinux2014 support

@Robadob Robadob added the RTC label Nov 15, 2023
@Robadob Robadob self-assigned this Nov 15, 2023
if [[ ${package} == *devel* ]] && version_lt "$CUDA_VERSION_MAJOR_MINOR" "11.0" ; then
package="${package//devel/dev}"
# libnvjitlink not required prior to CUDA 12.0
if [[ ${package} == libnvjitlink-dev* ]] && version_lt "$CUDA_VERSION_MAJOR_MINOR" "12.0" ;then
Robadob (Member, Author) commented Nov 17, 2023

Not sure if this is as narrow as it could be, but I wanted to make minimal changes to get it working.

The wildcard on libnvjitlink-dev can probably be removed.

The same issue exists in the Ubuntu script.
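
A minimal sketch of what the narrower check could look like, assuming the package names are still unversioned at this point in the script. The `version_lt` body here is a hypothetical stand-in (built on GNU `sort -V`), not the helper the install script actually defines, and `filter_packages` is an illustrative wrapper that does not exist in the repository.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the install script's version_lt helper:
# succeeds when $1 is strictly lower than $2 (relies on GNU sort -V).
version_lt() {
    [ "$1" != "$2" ] && [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Print only the packages that should be installed for a given CUDA version.
# Usage: filter_packages <cuda major.minor> <package>...
filter_packages() {
    local cuda_version="$1"
    shift
    local package
    for package in "$@"; do
        # libnvjitlink is not required prior to CUDA 12.0, so skip it there.
        # Exact match: no trailing wildcard on the package name.
        if [[ ${package} == libnvjitlink-dev ]] && version_lt "${cuda_version}" "12.0"; then
            continue
        fi
        printf '%s\n' "${package}"
    done
}
```

Dropping the wildcard only works if nothing has been appended to the package name before the comparison runs, so that would need checking against both install scripts.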

Robadob (Member, Author) commented Nov 20, 2023

Update regarding header pre-loading with Jitify2/CUDA 12.3

Windows/CUDA 12.0

No preload
Millis: 6822.000000
Millis: 6853.000000

Preloading FLAMEGPU headers
Millis: 4045.000000
Millis: 4277.000000

Preload FLAMEGPU + CUDA headers
Millis: 1296.000000
Millis: 1667.000000

Linux/CUDA 12.3

Jitify 2 from scratch (Waimu)
Millis: 25318.000000
Millis: 24143.000000

Preload FLAMEGPU + CUDA headers
Millis: 1376.000000
Millis: 2218.000000
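
One rough way to compare these runs is to average each pair of timings and take the ratio against the no-preload baseline. The helper names `mean_ms` and `speedup` are made up for this sketch, and with only two samples per configuration the resulting ratios are indicative at best.

```shell
#!/usr/bin/env bash
# Mean of any number of millisecond timings, printed to one decimal place.
mean_ms() {
    printf '%s\n' "$@" | awk '{ sum += $1 } END { printf "%.1f\n", sum / NR }'
}

# Ratio baseline/candidate, printed to two decimal places.
speedup() {
    awk -v base="$1" -v cand="$2" 'BEGIN { printf "%.2f\n", base / cand }'
}

# Timings copied from the Windows/CUDA 12.0 runs above.
no_preload="$(mean_ms 6822 6853)"
fgpu_preload="$(mean_ms 4045 4277)"
full_preload="$(mean_ms 1296 1667)"

echo "FLAMEGPU headers only: $(speedup "${no_preload}" "${fgpu_preload}")x"
echo "FLAMEGPU + CUDA headers: $(speedup "${no_preload}" "${full_preload}")x"
```

The same two functions can be pointed at the Linux/CUDA 12.3 numbers by swapping in those values.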

CUDA 12.0 has ~30 CUDA headers to preload.
CUDA 12.3 has ~257 CUDA headers to preload. (List contains some dupes)

It is not clear whether we want to generalise this code to better handle different CUDA versions, because we could potentially need to update it with each CUDA release.
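
If the list were generated rather than hand-maintained, something along these lines could produce a de-duplicated candidate list per CUDA install. This is only a sketch under assumptions: `list_preload_headers` is a hypothetical helper, it lists every header with a common extension rather than only those a kernel actually pulls in, and any extensionless headers would be missed.

```shell
#!/usr/bin/env bash
# List headers under one or more include directories as a sorted,
# de-duplicated set of paths relative to each directory (one per line).
# Usage: list_preload_headers <include dir>...
list_preload_headers() {
    local include_dir
    for include_dir in "$@"; do
        # Subshell so the cd does not leak into the caller.
        ( cd "${include_dir}" && find . -type f \( -name '*.h' -o -name '*.hpp' -o -name '*.cuh' \) )
    done | sed 's|^\./||' | LC_ALL=C sort -u
}
```

Called with the include directory of the installed toolkit, this would give a superset of what needs preloading, which trades a longer list for not having to track per-version changes by hand.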

Edit: Removed the from-cache times; with the latest commit these match Jitify1.

Robadob and others added 24 commits November 21, 2023 16:17
Having issues on windows, will try Linux
Slow (as we haven't got our pre-header hack) and lacks serialization.
Triggering compile of the preprocessed source after deserialisation is still fast.
This only reduces time from 6.8s to 4.1s (Windows/CUDA 12.0) and can't easily extend it to system headers.
Quick windows test shows it to be much faster to deserialize.
Robadob (Member, Author) commented Nov 21, 2023

The current issue holding back the Jitify2 preprocessor branch is that it expects our flamegpu headers to be included as system headers (<>) rather than with quotes (" "). Waiting to hear back from the dev (Ben) before I try to correct that on our side.
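
For illustration only, one way the include style could be changed on the FLAMEGPU side is a text rewrite of quoted `flamegpu/` includes into angle-bracket form. Whether this is the right fix is exactly what is being waited on; `to_system_includes` is a hypothetical helper and the `flamegpu/` path prefix is an assumption about how the headers are referenced.

```shell
#!/usr/bin/env bash
# Rewrite quoted flamegpu includes to angle-bracket form.
# Reads source text on stdin and writes the result to stdout;
# includes of anything outside flamegpu/ are left untouched.
to_system_includes() {
    sed -E 's|^([[:space:]]*#[[:space:]]*include[[:space:]]*)"(flamegpu/[^"]+)"|\1<\2>|'
}
```

For example, piping a header through `to_system_includes` would turn `#include "flamegpu/flamegpu.h"` into `#include <flamegpu/flamegpu.h>`.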

Robadob (Member, Author) commented Nov 23, 2023

Did three full test runs last night and all passed. However, in those runs the CMake jitify dependency was pointing at the preprocess branch. That branch is not currently used here, as it causes all Windows CI to fail with WError.

Linux/CUDA12.3/Seatbelts ON/GLM ON/Release
Linux/CUDA12.3/Seatbelts OFF/GLM ON/Release
Windows/CUDA12.0/Seatbelts ON/GLM OFF/Debug

In Release builds, kernels take ~1 second each to compile. As Jitify is now doing the pre-processing, this is closer to 2.5 seconds under Debug builds.
