Very wide model support #729

Draft
wants to merge 1 commit into base: master

Member

@ptheywood ptheywood commented Oct 27, 2021

Allow models with layers more than 128 functions wide to run without error, by no longer using a fixed number of CUDAScanCompactionConfig instances.

Closes #727

Todo

  • Add tests which expose the current issue when a number of different features are used
  • Malloc a dynamic array based on the width of the widest layer in the model
  • CUDAScatter relies on MAX_STREAMS, so refactoring is required.
  • FLAMEGPUDeviceException relies on MAX_STREAMS, so refactoring is required.
  • Remove CUDAScanCompaction::MAX_STREAMS, replacing with a member variable of the current number allocated.
  • Remove/adjust related exceptions (where CUDAScanCompaction::MAX_STREAMS is/was checked)
  • Test with seatbelts
  • Test without seatbelts

Notes

CUDAScanCompaction::MAX_STREAMS is hardcoded to 128, the maximum number of kernels that can run concurrently on a (<= SM75) device. Treating this as a limit on layer width is a bad assumption.

Models can have more than 128 functions per layer, which requires that many streams.
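
A minimal host-only sketch of how the required stream count could be derived from the model, assuming one stream per function in a layer. `LayerWidths`, `requiredStreamCount` and `streamForFunction` are hypothetical names for illustration, not part of the existing code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the model description: each entry is the number
// of functions in one layer. This is not the real model type.
using LayerWidths = std::vector<std::size_t>;

// Number of streams needed so that every function in the widest layer gets
// its own stream. An empty model still requests one stream, so per-stream
// buffers are never zero-sized.
std::size_t requiredStreamCount(const LayerWidths& layerWidths) {
    std::size_t widest = 1;
    for (const std::size_t width : layerWidths) {
        widest = std::max(widest, width);
    }
    return widest;
}

// Stream index for the i-th function of a layer (one stream per function).
// The assert is a debug-only check that enough streams were allocated.
std::size_t streamForFunction(std::size_t indexInLayer, std::size_t streamCount) {
    assert(indexInLayer < streamCount && "layer is wider than the allocated streams");
    return indexInLayer;
}
```

With this approach a layer of 200 functions simply yields a stream count of 200, rather than tripping a check against a fixed 128.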

CUDAScatter is initialised as a singleton member of CUDASimulation. The fixed model properties are known at that point, so a call can be added there to allocate enough data.

DeviceExceptionManager holds an array of device pointers, one DeviceExceptionBuffer per stream, plus host memory to copy each buffer back to. DeviceExceptionManager is a member of CUDASimulation::singletons, so it can also be allocated during singleton initialisation.
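
A rough host-only illustration of sizing the per-stream exception buffers at initialisation time instead of from a compile-time constant. `ExceptionBufferSketch` and `ExceptionManagerSketch` are hypothetical; plain `std::vector` storage stands in for the device allocations and the device-to-host copy, so this shows the shape of the change rather than the real class.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a per-stream exception buffer.
struct ExceptionBufferSketch {
    unsigned int errorCount = 0;
};

// Hypothetical analogue of the exception manager: one "device" buffer and
// one host mirror per stream, both sized when the manager is constructed.
class ExceptionManagerSketch {
 public:
    explicit ExceptionManagerSketch(std::size_t streamCount)
        : deviceBuffers_(streamCount), hostBuffers_(streamCount) {}

    std::size_t streamCount() const { return deviceBuffers_.size(); }

    // Stand-in for a kernel on the given stream recording an error.
    void reportError(std::size_t streamId) {
        assert(streamId < deviceBuffers_.size());
        ++deviceBuffers_[streamId].errorCount;
    }

    // Stand-in for copying one stream's buffer back to host memory.
    unsigned int copyBackErrorCount(std::size_t streamId) {
        assert(streamId < deviceBuffers_.size());
        hostBuffers_[streamId] = deviceBuffers_[streamId];
        return hostBuffers_[streamId].errorCount;
    }

 private:
    std::vector<ExceptionBufferSketch> deviceBuffers_;
    std::vector<ExceptionBufferSketch> hostBuffers_;
};
```

The stream count would be passed in during singleton initialisation, once the widest layer of the model is known.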

CUDAScanCompaction is a member variable of CUDAScatter, which is default initialised (rather than being manually constructed or mentioned in an initialiser list).
This will need to be changed to pass the number of streams to create during construction, or to allocate the required number of elements later.
This appears to be the only instantiation of CUDAScanCompaction, AFAIK.

Destruction / deletion will also be required.
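
The two options above (size at construction, or allocate later) could look roughly like this. It is a host-only sketch under stated assumptions: `StreamConfig` and `ScanCompactionSketch` are hypothetical names, and a `std::vector` stands in for the device allocations, so cleanup comes from the container's destructor. A real implementation would need to free its device memory explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-stream state; the real config would hold device pointers.
struct StreamConfig {
    std::vector<unsigned int> scanFlags;  // host stand-in for a device buffer
};

// Hypothetical analogue of CUDAScanCompaction without a MAX_STREAMS constant:
// the number of allocated configs is tracked by the container itself.
class ScanCompactionSketch {
 public:
    // Option 1: pass the number of streams at construction.
    // The default of 0 keeps default initialisation as a member valid.
    explicit ScanCompactionSketch(std::size_t streamCount = 0)
        : configs_(streamCount) {}

    // Option 2: allocate later, once the widest layer is known.
    // Only ever grows, so existing per-stream state is preserved.
    void ensureStreams(std::size_t streamCount) {
        if (streamCount > configs_.size()) {
            configs_.resize(streamCount);
        }
    }

    // Replaces checks against MAX_STREAMS with the current allocation.
    std::size_t streamCount() const { return configs_.size(); }

    // Debug-only bounds check. Returned references are invalidated if
    // ensureStreams() later grows the container.
    StreamConfig& config(std::size_t streamId) {
        assert(streamId < configs_.size());
        return configs_[streamId];
    }

 private:
    std::vector<StreamConfig> configs_;
};
```

Growing on demand keeps the default-initialised member in CUDAScatter working unchanged, at the cost of needing an explicit call before the first layer runs.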

@ptheywood ptheywood added this to the v2.0.0-alpha.2 milestone Oct 28, 2021
@ptheywood ptheywood modified the milestones: v2.0.0-alpha.2, v2.0.0-alpha.N Nov 29, 2021
@ptheywood ptheywood removed this from the v2.0.0-alpha.N milestone Dec 13, 2022

Successfully merging this pull request may close these issues.

Allow more then 128 CUDAScanCompactionConfig