-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove workRequest layer from GPU code #166
Conversation
…naged by DataManager
Remove WorkRequest code from TreePiecePartListDataTransferLocal
@trquinn I'm having a look through the PR changes to see where it would be helpful to add comments and further documentation. Obviously, a docstring with a @brief and some @param descriptions would probably be useful above any functions I added, especially ones that get called from another module. I'm wondering what to do about most of the functions in HostCUDA.cu, such as TreePieceCellListDataTransferLocal. The function parameters are all contained with CudaRequest objects. I'm not sure what the best way to use doxygen here would be. Would just a '@brief' comment above each be sufficient? I didn't any examples elsewhere in the code to follow. On a related note: do you know how doxygen handles ifdefs (such as TreePiece::commenceCalculateGravityLocal)? If there are multiple versions of a function, does it work to have multiple docstrings? |
Incorporate gpu_multicopy changes
@trquinn I think this PR is ready for review and merge. I misplaced an #ifdef inside of commenceCalculateGravityLocal that was preventing the dual tree walk from initializing properly when CUDA was defined, but GPU_LOCAL_TREE_WALK was not, hence some TreePieces skipping their gravity calculation. 'teststep' and 'testcosmo' are now passing on Frontera and Lonestar, with and without the local gpu tree walk enabled, on both single and multiple nodes. |
I'm getting an abort on line 844 of DataManager.cpp. "partIndex" is vestigial, and its declaration at line 791 should also be deleted. |
For some reason that 'CkAssert' on line 844 isn't firing when I run my tests. I just tried stepping through the function with GDB and it goes straight from line 839 to line 848 without a clear reason why. This was compiled with gcc/12.2.0 on Fontera with optimization disabled.
In any case, I've removed the unneeded references to partIndex in serializeLocal. I already did an upstream merge of the gpu_multicopy changes into this PR and the auto-merge created a bit of a mess I had to clean up. |
CkAssert()s don't fire if Charm is compiled "--with-production" |
That was indeed the issue. Just re-ran the tests on Frontera without '--with-production' and confirmed everything is passing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bunch of code cleanups recommended.
Requested changes have been made and I think this is ready for merge. See comment about docstrings for TreePieceDataTransferBasic. Following the old convention, I added these to HostCUDA.cu, but let me know if you would prefer to move this and the other function documentation to the respective .h files instead. |
This gets rid of the workRequest functions which allowed ChaNGa to asynchronously interact with the GPU. Instead, GPU requests are more directly handled by submitting memory transfer/allocation requests, kernel launches and callback assignments to CUDA streams.
Each TreePiece is assigned a CUDA stream in a round-robin fashion. The amount of available streams can be controlled via the runtime parameter 'numStreams'.