Use 64-bit integers for variables, use SimplexId for data #817

Open
klacansky opened this issue Aug 3, 2022 · 2 comments

Comments

@klacansky

Is your feature request related to a problem? Please describe.
I run into crashes with datasets of around 1024x1024x1024 vertices because SimplexId is used to store sizes. For example, the discrete gradient stores the number of cells per dimension using the SimplexId type:

std::vector<SimplexId> numberOfCells(numberOfDimensions);
This count overflows and causes the following allocation to fail:
(*gradient_)[2 * i + 1].resize(numberOfCells[i + 1], -1);
The necessary fix is to make SimplexId 64 bits wide, but that doubles the size of every SimplexId array, such as the offset array.
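To make the overflow concrete, here is a minimal sketch that is not taken from TTK. It counts only the axis-aligned edges of an n x n x n vertex grid, which is an assumption that underestimates the edge count of a triangulated grid, and checks whether the result fits in 32 bits. The names `SimplexId32`, `axisAlignedEdgeCount` and `fitsIn32Bits` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for a 32-bit SimplexId (an assumption, not TTK code).
using SimplexId32 = std::int32_t;

// Number of axis-aligned edges in an n x n x n vertex grid.
// Each of the 3 axes contributes n * n lines of (n - 1) edges.
// The arithmetic is done in 64 bits so the count itself cannot wrap here.
std::int64_t axisAlignedEdgeCount(std::int64_t n) {
  assert(n >= 1);
  return 3 * n * n * (n - 1);
}

// True when a 64-bit count can be stored in the 32-bit id type without loss.
bool fitsIn32Bits(std::int64_t count) {
  return count >= 0 && count <= std::numeric_limits<SimplexId32>::max();
}
```

For n = 1024 the vertex count (1,073,741,824) still fits in 32 bits, but the axis-aligned edge count (3,218,079,744) does not, so any per-dimension cell count held in a 32-bit type wraps before it reaches `resize`.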

Describe the solution you'd like
I think a good solution is to use 64-bit integers (int64_t) for variables that are not data arrays. For indexing arithmetic, I would still use SimplexId because 64-bit idiv has a higher latency than 32-bit idiv (https://www.agner.org/optimize/instruction_tables.pdf). Ideally, benchmarks would test whether 64-bit index calculations are noticeably slower than 32-bit ones.

An additional benefit of this solution, I think, is that TTK could then robustly detect whether an input dataset can be represented using SimplexId and, if not, exit with a clear error message.
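One way such a check could look is sketched below. It is an illustration only: `toSimplexIdChecked` is a hypothetical helper, not an existing TTK function, and the `SimplexId` alias assumes a 32-bit build. Sizes are computed in int64_t and narrowed to SimplexId only through this helper.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>

// Assumption: SimplexId is a 32-bit signed integer in this build.
using SimplexId = std::int32_t;

// Hypothetical helper: narrows a 64-bit size to SimplexId and throws
// with a descriptive message instead of silently wrapping around.
SimplexId toSimplexIdChecked(std::int64_t value, const std::string &what) {
  if(value < 0 || value > std::numeric_limits<SimplexId>::max()) {
    throw std::overflow_error(what + " = " + std::to_string(value)
                              + " does not fit in SimplexId; "
                                "a 64-bit SimplexId build is required");
  }
  return static_cast<SimplexId>(value);
}
```

Calling this at the few places where mesh sizes are first computed would turn the allocation failure into an early, readable error.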

Describe alternatives you've considered
Compile TTK using 64-bit indices at the cost of increased memory consumption.

@julien-tierny
Collaborator

julien-tierny commented Aug 4, 2022 via email

@klacansky
Author

Hi Julien,

That's what I used to force SimplexId to be 64 bits. I was thinking it may be better to decouple mesh indices from other variables (such as sizes and loop induction variables) by giving them different data types. Of course, this adds complexity, but I think it would be possible to check at a few spots whether the mesh fits into a SimplexId and otherwise warn the user. All other variables could be 64 bits.

For example, I use by default int64_t for all variables except when reducing the data type size offers memory savings, such as 16-bit indices inside a grid tile to represent segmentation.
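As a rough illustration of that convention (a sketch with hypothetical names, not code from TTK or from my own project), a grid tile can store 16-bit tile-local segment ids per voxel while every size, index and global id stays 64-bit:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical tile: per-voxel storage is 16-bit to save memory,
// while sizes, voxel indices and global segment ids are 64-bit.
struct Tile {
  std::int64_t nx, ny, nz;                  // tile extent in voxels
  std::vector<std::uint16_t> localSegment;  // tile-local segment id per voxel
  std::vector<std::int64_t> localToGlobal;  // tile-local id -> global segment id

  Tile(std::int64_t nx_, std::int64_t ny_, std::int64_t nz_)
    : nx(nx_), ny(ny_), nz(nz_),
      localSegment(static_cast<std::size_t>(nx_ * ny_ * nz_), 0) {
  }

  // Linear voxel index inside the tile, computed in 64 bits.
  std::int64_t voxelIndex(std::int64_t x, std::int64_t y, std::int64_t z) const {
    return x + nx * (y + ny * z);
  }

  // Global segment id of a voxel, resolved through the tile-local table.
  std::int64_t globalSegment(std::int64_t x, std::int64_t y, std::int64_t z) const {
    const auto v = static_cast<std::size_t>(voxelIndex(x, y, z));
    return localToGlobal[localSegment[v]];
  }
};
```

The narrow type appears only in the large per-voxel array, where it halves memory relative to 32-bit ids, under the assumption that a tile never holds more than 65,536 distinct segments.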

I am curious about the reasoning behind using SimplexId for (almost) all variables in TTK. What were the advantages and disadvantages? Was it to support 32-bit processors?

Thank you,
P
