Allow more than 128 CUDAScanCompactionConfig #727

Open
ptheywood opened this issue Oct 27, 2021 · 0 comments · May be fixed by #729

CUDAScanCompactionConfig objects hold per-stream, per-data-type data used during compaction and other per-stream operations, e.g. during PBM construction.

These are currently stored in a fixed-size array of 128 elements, because at most 128 streams can run concurrently on a single device. This fixed size is a problem.

In practice it is perfectly fine to create and use more than 128 streams; they just will not all run concurrently on the same device.

Instead, because the width of the widest layer of the model is known before any CUDA scan compaction data is used, we know in advance how many configs are needed, so a correctly sized array of them can be allocated at runtime.
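A minimal C++ sketch of the proposed runtime sizing, assuming one config is needed per stream and that the stream count equals the number of functions in the widest layer. All names here (`ScanConfig`, `ScanConfigStore`, `widestLayer`) are hypothetical illustrations, not FLAME GPU's actual API; the real configs also own CUDA device buffers, which are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for CUDAScanCompactionConfig. The real type owns
// device memory; this one only records the stream index it belongs to.
struct ScanConfig {
    unsigned int streamId = 0;
};

// Hypothetical per-data-type store, sized at runtime instead of being a
// fixed array of 128 elements.
class ScanConfigStore {
 public:
    // Grow (never shrink) so there is one config per stream of the widest layer.
    void resize(std::size_t streamCount) {
        if (streamCount > configs_.size()) {
            const std::size_t oldSize = configs_.size();
            configs_.resize(streamCount);
            for (std::size_t i = oldSize; i < configs_.size(); ++i) {
                configs_[i].streamId = static_cast<unsigned int>(i);
            }
        }
    }

    // Bounds-checked access: an out-of-range stream is an error rather than
    // silently indexing past a fixed-size array.
    ScanConfig& getConfig(std::size_t streamId) {
        if (streamId >= configs_.size()) {
            throw std::out_of_range("streamId exceeds allocated scan configs");
        }
        return configs_[streamId];
    }

    std::size_t size() const { return configs_.size(); }

 private:
    std::vector<ScanConfig> configs_;
};

// Hypothetical helper: the widest layer is the maximum function count over
// all layers, i.e. the largest number of streams any layer will use.
std::size_t widestLayer(const std::vector<std::size_t>& functionsPerLayer) {
    std::size_t widest = 0;
    for (std::size_t n : functionsPerLayer) {
        if (n > widest) widest = n;
    }
    return widest;
}
```

Calling `resize(widestLayer(...))` once before the first layer executes keeps the storage grow-only. Resizing a `std::vector` invalidates references to its elements, so in a sketch like this the resize should happen before any stream obtains its config.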

A better fix would be to refactor this, and the general use of per-stream data, into an alternative abstraction, but that is a more significant investment of time, so it can wait for #449.

This issue arose in the concurrency benchmark when using more than 128 species for spatial messaging.
